I have been trying to use an ESP32 microcontroller (programmed in C++) to call Google Speech-to-Text (STT) and then Text-to-Speech (TTS). The controller seems to struggle with large audio files embedded in the JSON payload of these calls: either the Base64 data gets corrupted or the controller crashes outright. I am therefore trying to avoid transferring the audio data in this way.
For the STT part, I was able to have the microcontroller upload the binary audio file to a signed URL on Google Cloud, and then call the API so that it fetches the file from that URL and returns the derived text to the microcontroller.
However, I have been unable to figure out how to get the TTS part to write the audio file to a signed URL instead of embedding it as Base64 in the JSON response. The API doesn't seem to have an outputConfig option for specifying a GCS URL.
Any suggestions on how to achieve this would be gratefully received.
Hi @astromikemerri,
Welcome to Google Cloud Community!
It looks like your ESP32 microcontroller has trouble handling large audio files for the Google Speech-to-Text and Text-to-Speech APIs: when the audio is embedded in the JSON payload, the Base64 data gets corrupted or the controller crashes.
Here are some approaches that might help with your use case:
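If the TTS response has to arrive as Base64 inside JSON, one option is to decode the `audioContent` value incrementally as the HTTP response streams in, so the device never holds the whole payload in RAM. The following is only a minimal sketch under several assumptions: the class name `StreamingBase64Decoder` is hypothetical, the standard Base64 alphabet (`+` and `/`, with `=` padding) is assumed, and locating the start and end of the `audioContent` string in the response stream is left to the caller. A `std::string` is used as the output sink purely for illustration; on the ESP32 the decoded bytes would more likely be written to flash, an SD card, or an audio peripheral after each chunk.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Hypothetical helper: decodes Base64 text fed in arbitrary-sized chunks.
// State (leftover bits) is kept between calls, so chunk boundaries do not
// need to align with 4-character Base64 groups.
class StreamingBase64Decoder {
public:
    // Decodes `len` characters from `data` and appends the resulting bytes
    // to `out`. Returns false if an invalid character is encountered, or if
    // data follows the '=' padding.
    bool feed(const char* data, std::size_t len, std::string& out) {
        for (std::size_t i = 0; i < len; ++i) {
            const char c = data[i];
            if (c == '\r' || c == '\n') continue;           // tolerate line breaks
            if (c == '=') { finished_ = true; continue; }   // padding ends the data
            const int v = value(c);
            if (v < 0 || finished_) return false;
            acc_ = (acc_ << 6) | static_cast<std::uint32_t>(v);
            bits_ += 6;
            if (bits_ >= 8) {
                bits_ -= 8;
                out.push_back(static_cast<char>((acc_ >> bits_) & 0xFFu));
                acc_ &= (1u << bits_) - 1u;                 // keep only leftover bits
            }
        }
        return true;
    }

private:
    // Maps a character of the standard Base64 alphabet to its 6-bit value,
    // or returns -1 for anything else.
    static int value(char c) {
        if (c >= 'A' && c <= 'Z') return c - 'A';
        if (c >= 'a' && c <= 'z') return c - 'a' + 26;
        if (c >= '0' && c <= '9') return c - '0' + 52;
        if (c == '+') return 62;
        if (c == '/') return 63;
        return -1;
    }

    std::uint32_t acc_ = 0;   // bit accumulator
    int bits_ = 0;            // number of valid bits currently in acc_
    bool finished_ = false;   // set once '=' padding has been seen
};
```

In use, each chunk read from the HTTP client would be passed to `feed`, and the caller would flush and clear the output buffer after every call, so memory use stays bounded by the chunk size rather than the size of the audio file.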
You may refer to the documentation below, which offers pertinent information on service accounts, IAM permissions, naming conventions, and troubleshooting:
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.