Senior Scientist, Foundation Models for Speech
About Translated
Translated is on a mission to allow everyone to understand and be understood, in their own language. We are a technology-powered professional translation provider. We partner with over 200,000 professional translators worldwide, covering 200 languages. Our 310,000 clients range from private individuals who need their CV translated to large enterprises such as Uber and Airbnb.
Our progress is largely powered by our ability to leverage scientific progress and realize the best synergy between humans and machines. We invest heavily in R&D, such as LLMs applied to translation, expressive speech synthesis, and privacy-preserving training for translation. We operate as a science-driven startup, so that our scientific innovations quickly make it to production and make a measurable impact on our operations.
The ideal candidate has a strong enthusiasm for contributing to the design and implementation of large-scale language models for speech-related tasks, and is capable of coordinating the technical, communication, and team activities of our Meetween project.
The project: Meetween
Translated has just been awarded a grant for Meetween, a €7M, 4-year collaborative research project that started in January 2024 and is led by Translated. Meetween uses LLMs and multimodal foundation models to enhance human communication.
The project spans the following research areas: Deep Learning, Large Language and Multimodal Models, Machine Translation, Automatic Speech Recognition and Translation, Summarization, and AI Digital Assistants. It offers the opportunity to collaborate with leading speech processing teams from both academia and industry.
With Meetween, we want to "solve speech": we will build foundation models that cover all three modalities of speech (text, audio, and video, including lip movement, facial expressions, and gestures) in a single architecture. Through transfer learning and conditioning, these models will power many speech-related downstream tasks: ASR, zero-shot TTS and voice cloning, speech-to-speech translation, lip reading and lip resync, and mutual audio/video reconstruction and enhancement.
We have secured an ambitious computing budget on the Polish HPC infrastructure, amounting to hundreds of thousands of A100-hours, in addition to our in-house compute infrastructure of several hundred GPUs.
For maximum impact, all research outcomes of Meetween, including trained models, datasets, and evaluation benchmarks, will be open-sourced on Hugging Face.
What You’ll Do
You will be part of a team of researchers dedicated to Meetween within Translated's AI Research team, which works on several pieces of technology, such as Large Language Models, Machine Translation, Speech Synthesis, and privacy-preserving Machine Learning. The AI Research team collaborates closely with product and engineering teams and develops the technology that powers Translated's next generation of products.
In this role, you will:
- work with data, compute, and algorithms
- design multimodal deep learning architectures
- design experiments, implement them in code, run them on large-scale GPU/HPC compute, and evaluate the results
- monitor and benchmark the state of the art
- have the opportunity to mentor more junior team members, such as PhD students and interns
- coordinate with our partners on our research roadmap
- adapt the project's pace to rapid scientific developments in our field
- organize publications and open-sourcing efforts
Qualifications Required
- a completed PhD or 4 years of industry research experience in a relevant area of deep learning, typically language modeling or speech recognition
- excellent programming skills, particularly in PyTorch
- familiarity with Docker and Unix-like operating systems, including running GPU experiments
- interest in carrying out experimental research
- relevant scientific publications, teaching and research experience
- experience in the industry of speech and language technology
- experience in coordinating a team of researchers
- excellent command of English
Bonus points if...
- you have strong experience with multi-GPU training and GPU training optimization
- you're a polyglot
- you have open-source contributions
- you have published at tier-1 ML/AI conferences such as NeurIPS, ICML, ICASSP, Interspeech, ACL, or EMNLP
Our office
Translated is hosted at Pi Campus, a working environment immersed in nature where 6 luxury villas in Rome (Italy) have been converted into functional offices to foster talent growth. Pi Campus is also a venture firm created by Translated to reinvest part of its profits into promising AI startups.
Benefits and Perks
Our working environment is both relaxed and intense. We are passionate about our mission, and our work is highly regarded in our industry.
- A competitive and exciting work environment: you will be surrounded by innovators and experts at Pi Campus, a venture fund and startup ecosystem, and a great place to grow your skills.
- We host regular tech and entrepreneurship talks and events, in which you can take part as a Pi Citizen.
- Gym
- Swimming Pool
- Kickboxing
- Water aerobics
- Fitness
- Pilates
- Table tennis and table football
- Kitchen and snacks
- Bonuses and incentives for employees who quit smoking, ride a bicycle to work, or adopt a child.
Learn more about our company: https://translated.com/work-at-translated-onboarding
Diversity
At Translated, we proudly embrace and celebrate the unique qualities each individual brings to our team, regardless of race, sexual orientation, gender identity, or any other differences. We recognize that these diverse perspectives empower us to overcome challenges, foster innovation, and drive excellence. As an inclusive and equal-opportunity employer, we are committed to cultivating an environment where everyone feels welcome, valued, and supported to achieve their full potential.