Voice AI Engineer
Experience: 2–5 years
Function: Engineering
Work mode: Onsite, India
Company: Tier 2
What you will work on
KGen is looking for a Voice AI Engineer to build and own the engineering pipelines for high-signal voice datasets focused on underserved Global South languages. The role involves designing end-to-end audio collection, annotation, and automated quality control systems, as well as building proprietary tooling atop existing ASR models. You will work with a diverse stack including Python, FFmpeg, and various audio-ML frameworks to deliver high-quality data to frontier AI labs. The successful candidate will be responsible for creating reproducible ML pipelines that track data lineage and performance metrics at scale.
TAL's take
Strong specialized role in a high-growth sector with clear ownership of audio infrastructure at a recognized scale.
Extremely clear technical requirements and mission-focused scope related to audio data pipelines and Global South language support.
Must haves
- 2–5 years engineering experience in speech AI or audio ML
- Hands-on experience with ASR fine-tuning (Whisper, SpeechBrain, etc.)
- Fluency with audio tooling (FFmpeg, librosa, torchaudio, etc.)
- Strong Python development skills for pipeline orchestration
- Experience building automated QC systems for audio data
- Experience shipping reproducible ML environments using Docker and CI/CD
Tools and skills
Nice to have: MLflow, Weights & Biases (W&B), AWS S3, Google Cloud Storage (GCS), FastAPI.
About the company
Growth-stage startup with significant user base and revenue but not yet established as a global top-tier tech leader.