Senior Scientist, Foundation Models for Speech
About Translated
Translated is on a mission to allow everyone to understand and be understood, in their own language. We are a technology-powered professional translation provider. We partner with over 200,000 professional translators worldwide, covering 200 languages. Our 310,000 clients range from private individuals who need their CV translated to large enterprises such as Uber and Airbnb.
Our progress is largely powered by our ability to leverage scientific progress and realize the best synergy between humans and machines. We invest heavily in R&D, such as LLMs applied to translation, expressive speech synthesis, and privacy-preserving training for translation. We operate as a science-driven startup, so that our scientific innovations quickly make it to production and make a measurable impact on our operations.
The ideal candidate has a strong enthusiasm for contributing to the design and implementation of large-scale language models for speech-related tasks, and is capable of coordinating the technical, communication, and team activities of our Meetween project.
The project: Meetween
Translated has just been awarded a grant for Meetween, a €7M, 4-year collaborative research project that started in January 2024 and is led by Translated. Meetween uses LLMs and multimodal foundation models to enhance human communication.
The project spans the following research areas: Deep Learning, Large Language and Multimodal Models, Machine Translation, Automatic Speech Recognition and Translation, Summarization, and AI Digital Assistants. It offers the opportunity to collaborate with leading speech processing teams from both academia and industry.
With Meetween, we want to "solve speech": we will build foundation models that cover all three modalities of speech (text, audio, and video, including lip movement, facial expressions, and gestures) in a single architecture. Through transfer learning and conditioning, these models will power many speech-related downstream tasks: ASR, zero-shot TTS and voice cloning, speech-to-speech translation, lip reading and lip resync, and mutual audio/video reconstruction and enhancement.
We have secured an ambitious computing budget on the Polish HPC infrastructure, amounting to hundreds of thousands of A100-hours, in addition to our in-house compute infrastructure of several hundred GPUs.
For maximum impact, all research outcomes of Meetween, including trained models, datasets, and evaluation benchmarks, will be open-sourced on Hugging Face.
What You’ll Do
You will be part of a team of researchers dedicated to Meetween within Translated's AI Research team, which works on several pieces of technology, such as Large Language Models, Machine Translation, Speech Synthesis, and privacy-preserving Machine Learning. The AI Research team collaborates closely with product and engineering teams and develops the technology that powers Translated's next generation of products.
In this role, you will:
- work with data, compute, and algorithms
- design multimodal deep learning architectures
- design experiments, implement them in code, run them on large-scale GPU/HPC compute, and evaluate the results
- monitor and benchmark the state of the art
- have the opportunity to mentor more junior team members, such as PhD students and interns
- coordinate with our partners on our research roadmap
- adapt the project's pace to rapid scientific developments in our field
- organize publications and open-sourcing efforts
Qualifications Required
- a completed PhD or 4 years of industry research experience in a relevant area of deep learning, typically language modeling or speech recognition
- excellent programming skills, particularly in PyTorch
- familiarity with Docker and Unix-like operating systems, including running GPU experiments
- interest in carrying out experimental research
- relevant scientific publications, teaching and research experience
- experience in the industry of speech and language technology
- experience in coordinating a team of researchers
- excellent command of English
Bonus points if...
- you have strong experience with multi-GPU training and GPU training optimization
- you're a polyglot
- you have open-source contributions
- you have published at tier-1 ML/AI conferences such as NeurIPS, ICML, ICASSP, Interspeech, ACL, or EMNLP
Our office
Translated is hosted at Pi Campus, a working environment immersed in nature where 6 luxury villas in Rome (Italy) have been converted into functional offices to foster talent growth. Pi Campus is also a venture firm created by Translated to reinvest part of its profits into promising AI startups.
Benefits and Perks
Our working environment is both relaxed and intense. We are passionate about our mission, and our work is highly regarded in our industry.
- A competitive and exciting work environment: you will be surrounded by innovators and experts at Pi Campus, a venture fund and startup ecosystem, and a great place to grow your skills.
- We host regular tech and entrepreneurship talks and events, in which you can take part as a Pi Citizen.
- Gym
- Swimming Pool
- Kickboxing
- Water aerobics
- Fitness
- Pilates
- Table tennis and table football
- Kitchen and snacks
- Bonuses and incentives for employees who quit smoking, ride a bicycle to work, or adopt a child.
Learn more about our company: https://translated.com/work-at-translated-onboarding
Diversity
At Translated, we proudly embrace and celebrate the unique qualities each individual brings to our team, regardless of race, sexual orientation, gender identity, or any other differences. We recognize that these diverse perspectives empower us to overcome challenges, foster innovation, and drive excellence. As an inclusive and equal-opportunity employer, we are committed to cultivating an environment where everyone feels welcome, valued, and supported to achieve their full potential.