About the job
🎯 What You’ll Do
You’ll be responsible for making sure our AI brains work, scale, and improve. That means:
• Owning the full AI pipeline orchestration: STT → LLM → TTS.
• Implementing and monitoring prompt flows, fallback strategies, and recovery logic.
• Measuring and improving conversational quality: latency, hallucination, success rate, barge-in.
• Building test sets, running A/B experiments, and catching regressions before users do.
• Working closely with the Tech Lead and Voice UX designer to align model behavior with real-world conversations.

🧩 What You Bring
• Solid Python skills and experience integrating with LLM APIs (OpenAI, Anthropic...).
• You’ve shipped or maintained a product that uses LLMs in production.
• Familiarity with STT/TTS tools (e.g. Whisper, ElevenLabs, Coqui, Google, Azure).
• Experience with prompt engineering, context control, session state, and fallback orchestration.
• You know how to log, trace, and debug model behavior like it’s backend logic.

✅ Bonus Points If You’ve
• Used Langfuse, TruLens, Helix, Ragas or built custom eval tooling.
• Worked with latency-sensitive systems (real-time or streaming pipelines).
• Set up CI for prompt evaluation or had a hand in LLMOps pipelines.
• Managed golden sets, live traffic shadowing, or prompt regression testing.

🧠 What You’ll Get
• Ownership over AI behavior and quality across the entire product.
• A team that values pragmatism, iteration, and clean architecture.
• Input into how our agents sound, think, and adapt over time.
• The chance to shape an AI system that actually talks to people, and gets judged on every sentence.

🚫 What This Role Is Not
• It’s not prompt monkey work or “make it sound nicer” QA tasks.
• It’s not about training foundation models or tuning a llama from scratch.
• It’s not a research role.

👉 This is for someone who wants to ship LLM-powered voice agents that hold up under real-world load and get better over time.
Requirements
- Helix
- Managed golden sets
- prompt regression testing
- Langfuse
- STT/TTS tools
- Python
About the company
Our goal is to make hiring reliable, simple, and fast. Our role is to help our talent find and apply for relevant onsite contract opportunities and progress in their careers. We will support you with any grievances or challenges you face during the engagement.