June 19, 2026
Building Real-Time Voice Agents with LiveKit and aiortc
Tags: Generative AI, Python, Voice AI
Voice is one of the most natural interfaces for AI — but building it well means wrestling with real-time audio, latency budgets and turn-taking.
The stack
I used LiveKit for managed real-time transport, and dropped down to coturn (a TURN/STUN server) plus aiortc (a Python WebRTC implementation) when I needed custom WebRTC behaviour.
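async def on_audio(track):
    # Assumes `track` yields audio frames when iterated asynchronously.
    async for frame in track:
        text = await transcribe(frame)      # speech-to-text
        reply = await agent.respond(text)   # agent / LLM turn
        await synthesize_and_send(reply)    # text-to-speech, then send audio back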
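The loop above awaits each stage in turn, so no new frame is read while a reply is being generated. One possible way to keep audio intake from stalling is to split the loop into two tasks joined by an asyncio.Queue. This is only a sketch under assumptions: `fake_track`, `transcribe` and `respond` are hypothetical stand-ins, and frames are plain strings rather than real audio.

```python
import asyncio


async def transcribe(frame: str) -> str:
    # Hypothetical stand-in for a speech-to-text call.
    await asyncio.sleep(0.01)
    return frame.upper()


async def respond(text: str) -> str:
    # Hypothetical stand-in for the agent / LLM call.
    await asyncio.sleep(0.03)
    return f"reply to {text}"


async def fake_track(frames):
    # Hypothetical stand-in for an audio track that yields frames.
    for frame in frames:
        await asyncio.sleep(0.005)
        yield frame


async def listen(track, texts: asyncio.Queue) -> None:
    # Stage 1: keep reading and transcribing, even while a reply is pending.
    async for frame in track:
        await texts.put(await transcribe(frame))
    await texts.put(None)  # sentinel: no more input


async def speak(texts: asyncio.Queue, sent: list) -> None:
    # Stage 2: generate replies in arrival order; a real agent would send audio here.
    while (text := await texts.get()) is not None:
        sent.append(await respond(text))


async def run_pipeline(frames) -> list:
    # Bounded queue: if the agent falls behind, the listener waits instead of piling up work.
    texts: asyncio.Queue = asyncio.Queue(maxsize=8)
    sent: list = []
    await asyncio.gather(listen(fake_track(frames), texts), speak(texts, sent))
    return sent
```

Calling `asyncio.run(run_pipeline(["hello", "world"]))` runs both stages concurrently and returns the replies in order. The bounded queue is a design choice: it turns a slow agent into visible backpressure rather than unbounded buffering.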
Lessons learned
- Keep the agent loop non-blocking; every millisecond shows up as awkward silence.
- Stream tokens to TTS as they arrive instead of waiting for the full response.
- Benchmark relentlessly — I wrote scripts to measure end-to-end latency.
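The token-streaming point can be sketched as follows: buffer tokens only until a sentence boundary appears, then hand that chunk to TTS while the rest of the response is still being generated. This assumes the model exposes its output as an async iterator of strings; `fake_llm_tokens` is a hypothetical stand-in, and the punctuation check is a deliberately simple heuristic.

```python
import asyncio
from typing import AsyncIterator

SENTENCE_END = (".", "!", "?")


async def fake_llm_tokens() -> AsyncIterator[str]:
    # Hypothetical stand-in for a streaming LLM response.
    for token in ["Hi", " there", ".", " How", " can", " I", " help", "?"]:
        await asyncio.sleep(0.01)
        yield token


async def sentence_chunks(tokens: AsyncIterator[str]) -> AsyncIterator[str]:
    # Buffer tokens and emit a chunk as soon as a sentence boundary appears.
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush trailing text that lacks punctuation


async def collect_chunks() -> list:
    # In a real agent, each chunk would be passed to the TTS engine here.
    return [chunk async for chunk in sentence_chunks(fake_llm_tokens())]
```

With this approach the first sentence can start playing while later tokens are still arriving, so the time to first audio depends on the first sentence rather than the whole response.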
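For the benchmarking point, a latency script might look roughly like the sketch below. It times repeated turns with a monotonic clock and reports the median, an approximate 95th percentile and the maximum. `fake_turn` is a hypothetical placeholder for one full turn (speech in, first audio out); this is not the script used for this post.

```python
import asyncio
import statistics
import time


async def fake_turn() -> None:
    # Hypothetical stand-in for one full turn: STT -> agent -> first TTS audio.
    await asyncio.sleep(0.02)


async def measure(turn, runs: int = 20) -> dict:
    # Time each run with time.perf_counter and summarise in milliseconds.
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        await turn()
        samples.append((time.perf_counter() - start) * 1000)
    samples.sort()
    # Approximate p95: pick the nearest index in the sorted samples.
    p95_index = min(len(samples) - 1, round(0.95 * (len(samples) - 1)))
    return {
        "runs": runs,
        "median_ms": statistics.median(samples),
        "p95_ms": samples[p95_index],
        "max_ms": samples[-1],
    }
```

Running `asyncio.run(measure(fake_turn))` returns a small dictionary of summary statistics. Tail values such as p95 are worth tracking alongside the median, since occasional slow turns are the ones a listener notices.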
More write-ups coming soon.