Integrate Gemini 2.0 Live Api with Phone Provider ... - Page 2

andrewbull · 12-27-2024 07:03 PM

Hey All,

I've been experimenting with connecting the Gemini 2.0 Live API to a phone line, and I'm somewhat surprised by the output audio format Google chose (raw 16-bit PCM at 24 kHz, little-endian). Twilio only supports 8-bit mu-law at 8 kHz and Vonage only supports 16-bit PCM at 16 kHz, so both require conversion and resampling. I've gotten things working, but let's just say it's not ready for production. The Vonage version resamples with the ffmpeg CLI tool, which is spotty and slow for real-time conversion, and the Twilio version required a manual PCM -> mu-law conversion.
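For anyone following along, here is roughly what the two conversions look like in plain Python. This is a minimal sketch under some assumptions, not the code I'm actually running: it assumes mono audio, uses simple linear interpolation with no anti-aliasing filter (so quality will be worse than a proper resampler), and the function names are made up for illustration. The mu-law step follows the standard G.711 encoding.

```python
import struct

MULAW_BIAS = 0x84   # 132, standard G.711 mu-law bias
MULAW_CLIP = 32635  # largest magnitude that survives adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), MULAW_CLIP) + MULAW_BIAS
    # Find the segment (exponent): position of the highest set bit, 7 down to 0.
    exponent, mask = 7, 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are transmitted bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def resample_linear(samples, src_rate, dst_rate):
    """Resample by linear interpolation (assumption: no low-pass filtering)."""
    if not samples:
        return []
    n_out = len(samples) * dst_rate // src_rate
    step = src_rate / dst_rate
    last = len(samples) - 1
    out = []
    for i in range(n_out):
        pos = i * step
        j = min(int(pos), last)
        frac = pos - j
        nxt = samples[min(j + 1, last)]
        out.append(int(round(samples[j] * (1 - frac) + nxt * frac)))
    return out


def _unpack_s16le(pcm_bytes: bytes):
    """Interpret raw bytes as little-endian signed 16-bit samples."""
    n = len(pcm_bytes) // 2  # drop a trailing odd byte, if any
    return list(struct.unpack("<%dh" % n, pcm_bytes[: n * 2]))


def pcm24k_to_mulaw8k(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: 24 kHz s16le PCM -> 8 kHz mu-law (Twilio side)."""
    samples = resample_linear(_unpack_s16le(pcm_bytes), 24000, 8000)
    return bytes(linear_to_mulaw(s) for s in samples)


def pcm24k_to_pcm16k(pcm_bytes: bytes) -> bytes:
    """Hypothetical helper: 24 kHz s16le PCM -> 16 kHz s16le PCM (Vonage side)."""
    samples = resample_linear(_unpack_s16le(pcm_bytes), 24000, 16000)
    return struct.pack("<%dh" % len(samples), *samples)
```

Each audio chunk from Gemini would go through `pcm24k_to_mulaw8k` or `pcm24k_to_pcm16k` before being forwarded to the provider. Converting chunk by chunk like this can leave small discontinuities at chunk boundaries, so a real implementation would probably need to carry resampler state across chunks.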

Has anybody else gotten this working nicely? I found this demo, which uses a service called Daily to set up a WebRTC room and has Twilio connect to it via SIP: https://github.com/kkacquah/gemini-multimodal-example/blob/main/bot_runner.py

I already have this working with the OpenAI Realtime API + Twilio, since OpenAI worked with Twilio on the launch and made sure the two were compatible.
