Hey All,
I've been experimenting with connecting the Gemini 2.0 Live API to a phone line, and I'm sort of surprised that Google chose the output audio format they did (raw 16-bit PCM at 24kHz, little-endian). Twilio only supports 8-bit mu-law (at 8kHz) and Vonage only supports 16-bit PCM at 16kHz, so both require conversion/resampling. I've gotten stuff working...but we'll just say it's not ready for production. The Vonage version resamples with the ffmpeg CLI tool, but it's spotty/slow for realtime conversion, and the Twilio version required a manual PCM -> mu-law conversion.
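For anyone wondering what that conversion involves, here is a minimal pure-Python sketch, assuming the input is raw 16-bit little-endian PCM at 24kHz as described above. The function names are made up for illustration, and the resampling is deliberately naive (a 3-sample average for 24kHz -> 8kHz, linear interpolation for 24kHz -> 16kHz) rather than a proper low-pass filter, so treat it as a starting point and not production code.

```python
import struct

BIAS = 0x84   # standard G.711 mu-law bias (132)
CLIP = 32635  # largest magnitude that still fits after adding the bias


def linear_to_mulaw(sample: int) -> int:
    """Encode one signed 16-bit PCM sample as one G.711 mu-law byte."""
    sign = 0x80 if sample < 0 else 0x00
    magnitude = min(abs(sample), CLIP) + BIAS
    # Find the segment (exponent): position of the highest set bit, 7..0.
    exponent = 7
    mask = 0x4000
    while exponent > 0 and not (magnitude & mask):
        exponent -= 1
        mask >>= 1
    mantissa = (magnitude >> (exponent + 3)) & 0x0F
    # mu-law bytes are stored bit-inverted.
    return ~(sign | (exponent << 4) | mantissa) & 0xFF


def _unpack_pcm16le(pcm: bytes) -> tuple:
    """Unpack little-endian 16-bit samples, ignoring a trailing odd byte."""
    n = len(pcm) // 2
    return struct.unpack("<%dh" % n, pcm[: n * 2])


def pcm24k_to_mulaw8k(pcm: bytes) -> bytes:
    """24kHz PCM16 -> 8kHz mu-law: average each group of 3 samples, then encode."""
    samples = _unpack_pcm16le(pcm)
    out = bytearray()
    for i in range(0, len(samples) - len(samples) % 3, 3):
        avg = (samples[i] + samples[i + 1] + samples[i + 2]) // 3
        out.append(linear_to_mulaw(avg))
    return bytes(out)


def pcm24k_to_pcm16k(pcm: bytes) -> bytes:
    """24kHz PCM16 -> 16kHz PCM16: every 3 input samples become 2 output samples."""
    samples = _unpack_pcm16le(pcm)
    out = []
    for i in range(0, len(samples) - len(samples) % 3, 3):
        out.append(samples[i])                              # output position 0.0
        out.append((samples[i + 1] + samples[i + 2]) // 2)  # output position 1.5
    return struct.pack("<%dh" % len(out), *out)
```

Both converters only consume whole groups of 3 samples (6 bytes), so when streaming, either send chunks whose length is a multiple of 6 bytes or carry the leftover bytes over to the next chunk. For example, 20ms of 24kHz audio is 480 samples (960 bytes), which comes out as 160 mu-law bytes or 640 bytes of 16kHz PCM.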
Has anybody else gotten this working nicely? I've found this demo using a service called Daily, which sets up a WebRTC room and has Twilio connect to it via SIP: https://github.com/kkacquah/gemini-multimodal-example/blob/main/bot_runner.py
I already have this working with the OpenAI Realtime API + Twilio, since OpenAI worked with Twilio on the launch and made sure the two were compatible.