June 19, 2026 1 min read

Building Real-Time Voice Agents with LiveKit and aiortc

Generative AI, Python, Voice AI

Voice is one of the most natural interfaces for AI, but building it well means wrestling with real-time audio, latency budgets and turn-taking.

The stack

I used LiveKit for managed real-time transport, and dropped down to coturn (a TURN/STUN server) plus aiortc (a Python WebRTC implementation) when I needed custom WebRTC behaviour.

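async def on_audio(track):
    # transcribe, agent and synthesize_and_send are application helpers defined elsewhere.
    async for frame in track:
        text = await transcribe(frame)       # speech-to-text on the incoming audio
        reply = await agent.respond(text)    # generate the agent's reply
        await synthesize_and_send(reply)     # text-to-speech, then send audio back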
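
The handler above awaits each stage in turn, so no new frame is read while a reply is being generated. One possible way to decouple the two, sketched here with only the standard library, is to put a queue between the receiver and the slower stages. The names fake_track, slow_respond, receive, process and run_pipeline are hypothetical stand-ins for illustration, not LiveKit or aiortc APIs.

```python
import asyncio


async def fake_track(n_frames):
    """Hypothetical stand-in for an audio track: yields n dummy frames."""
    for i in range(n_frames):
        await asyncio.sleep(0.001)  # simulate frames arriving over time
        yield f"frame-{i}"


async def slow_respond(frame):
    """Hypothetical stand-in for the transcribe -> respond -> synthesize stages."""
    await asyncio.sleep(0.01)  # simulate slow downstream work
    return f"reply to {frame}"


async def receive(track, queue):
    """Producer: enqueue frames without waiting on downstream processing."""
    async for frame in track:
        queue.put_nowait(frame)
    queue.put_nowait(None)  # sentinel: no more frames


async def process(queue, replies):
    """Consumer: drain the queue at its own pace until the sentinel arrives."""
    while True:
        frame = await queue.get()
        if frame is None:
            break
        replies.append(await slow_respond(frame))


async def run_pipeline(n_frames):
    # Unbounded queue for brevity; a real agent would likely bound it
    # and decide what to do when the consumer falls behind.
    queue = asyncio.Queue()
    replies = []
    await asyncio.gather(
        receive(fake_track(n_frames), queue),
        process(queue, replies),
    )
    return replies


if __name__ == "__main__":
    print(asyncio.run(run_pipeline(3)))
```

In this sketch the receiver never waits on slow_respond, so incoming audio keeps being read even while an earlier reply is still in progress.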

Lessons learned

Keep the agent loop non-blocking; any time the loop spends waiting is heard by the caller as awkward silence.
Stream tokens to TTS as they arrive instead of waiting for the full response.
Benchmark relentlessly: I wrote scripts to measure end-to-end latency.
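
The streaming and benchmarking points above can be illustrated together. The following is a minimal sketch, not the scripts mentioned in the post: it groups a simulated token stream into sentences that could be handed to TTS early, and records time to first chunk alongside total time. fake_token_stream, sentence_chunks and measure are hypothetical names, and the 10 ms per-token delay is an arbitrary assumption.

```python
import asyncio
import time

SENTENCE_END = (".", "!", "?")


async def fake_token_stream(text, delay=0.01):
    """Hypothetical stand-in for a streaming LLM reply: one word per step."""
    for word in text.split():
        await asyncio.sleep(delay)  # assumed per-token generation delay
        yield word + " "


async def sentence_chunks(tokens):
    """Group streamed tokens into sentences so TTS can start before the reply ends."""
    buffer = ""
    async for token in tokens:
        buffer += token
        if buffer.rstrip().endswith(SENTENCE_END):
            yield buffer.strip()
            buffer = ""
    if buffer.strip():
        yield buffer.strip()  # flush any trailing text without punctuation


async def measure(text):
    """Return the chunks, time to first chunk and total time (both in seconds)."""
    start = time.perf_counter()
    chunks = []
    first_chunk_at = None
    async for chunk in sentence_chunks(fake_token_stream(text)):
        if first_chunk_at is None:
            first_chunk_at = time.perf_counter() - start
        chunks.append(chunk)  # a real agent would pass the chunk to TTS here
    total = time.perf_counter() - start
    return chunks, first_chunk_at, total


if __name__ == "__main__":
    chunks, first, total = asyncio.run(
        measure("Sure. I can help with that. What city are you in?")
    )
    print(chunks)
    print(f"first chunk after {first * 1000:.0f} ms, full reply after {total * 1000:.0f} ms")
```

The gap between the two timings is roughly the silence a caller would avoid when speech starts on the first sentence instead of the full response. Splitting on punctuation is a simplification; abbreviations such as "e.g." would trigger an early split.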

More write-ups coming soon.