
Text input to an LLM, response needed as an audio stream from Gemini

I want to send text to an LLM, say Gemini Pro, and have the response streamed to the frontend, for example a mobile app.

Below is the approach that is working right now:

================
We convert speech to text on the device and send it to the server in a POST request. The server calls the LLM, gets the text response, and creates a temporary audio file from that text using Google Text-to-Speech (googleTextToSpeech). We then send that audio file back to the device, so the device indirectly ends up with another audio file.
================
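
The flow above can be sketched roughly as follows. This is a simplified illustration, not the actual server code: `handle_request`, `call_llm`, and `synthesize` are hypothetical names standing in for the POST handler, the Gemini call, and the Google Text-to-Speech call, and the `.mp3` suffix is an assumption.

```python
import os
import tempfile
from typing import Callable


def handle_request(
    user_text: str,
    call_llm: Callable[[str], str],      # hypothetical: wraps the LLM call
    synthesize: Callable[[str], bytes],  # hypothetical: wraps text-to-speech
) -> str:
    """Return the path of a temporary audio file holding the spoken LLM reply."""
    # 1. The device has already converted speech to text and POSTed it here.
    reply_text = call_llm(user_text)

    # 2. The whole text reply is converted to audio in one step (no streaming).
    audio_bytes = synthesize(reply_text)

    # 3. The audio is written to a temporary file, which is sent back to the device.
    fd, path = tempfile.mkstemp(suffix=".mp3")
    with os.fdopen(fd, "wb") as f:
        f.write(audio_bytes)
    return path
```

Nothing reaches the device until both the full text reply and the full audio file exist, which is the delay that direct audio streaming would avoid.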

Given the above, what we would like instead is to send the text and get the response back as audio that can be streamed directly.
Does Gemini support that? I have tried to check, but it didn't help...

Also, the text-to-speech output sounds robotic, which doesn't add much value for the user.
