Re: Sending output from Text-to-Speech API directly to Cloud Storage

astromikemerri · 01-01-2025 11:50 AM

I have been trying to use an ESP32 microcontroller (coded in C++) to interact with Google STT and then TTS. The controller seems to struggle handling large audio files as part of the JSON payload for the calls, either corrupting the Base64 data or even crashing the controller. I am therefore trying to avoid transferring the audio data in this way.

For the STT part, I was able to use the microcontroller to upload the binary audio file to a signed URL on Google Cloud Storage, and then call the API to read the file from that location and return the derived text to the microcontroller.

But I have been unable to figure out how to get the TTS part to write the audio file to Cloud Storage (for download via a signed URL) instead of attaching it in Base64 to the JSON response. The API doesn't seem to have an outputConfig option for specifying a Cloud Storage URI.

Any suggestions as to how to achieve this gratefully received.

MarvinLlamas

Hi @astromikemerri,

Welcome to Google Cloud Community!

It looks like your ESP32 runs into trouble when large audio files are embedded as Base64 in the JSON payloads for the Speech-to-Text and Text-to-Speech APIs, which leads to corrupted data or crashes on the microcontroller.

Here is one approach that might help with your use case, together with a few points to keep in mind:

Post-Processing with Cloud Functions/Cloud Run: You may want to put a serverless intermediary between your ESP32 and the TTS API. The ESP32 sends its text to a serverless function, which calls the TTS API, decodes the Base64 audio in the response, writes the binary file to a Cloud Storage bucket, and returns a signed URL from which the ESP32 can download it. This keeps the large Base64 payload off the microcontroller, scales with demand, and separates concerns, though it adds a component to deploy and a small cost.
Permissions: Make sure the service account that your serverless function (or proxy server) runs as has the IAM permissions it needs to call the Text-to-Speech API and to write to the Cloud Storage bucket.
File Naming: You may want to use unique file names for each of your audio files to prevent different responses from overwriting each other. Consider incorporating the current timestamp or a random identifier as part of your file name.
Error Handling: You may want to add error handling to both your ESP32 and your server-side code to address potential failures in the process.
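
As a rough illustration of the points above, here is a minimal Python sketch of what the serverless side could look like. It is not an official sample. The helper names (unique_object_name, build_tts_request, decode_audio, synthesize_to_bucket) and the post_json callable are hypothetical, and bucket is assumed to behave like a google.cloud.storage.Bucket. The request body follows the text:synthesize REST method, whose response carries the audio as Base64 in the audioContent field.

```python
import base64
import datetime
import json
import time
import uuid


def unique_object_name(prefix="tts", ext="wav"):
    """Timestamp plus random suffix, so responses are very unlikely to collide."""
    return f"{prefix}/{int(time.time())}-{uuid.uuid4().hex[:8]}.{ext}"


def build_tts_request(text, language_code="en-US"):
    """JSON body for the Text-to-Speech text:synthesize REST method."""
    return {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "LINEAR16"},
    }


def decode_audio(response_body):
    """Extract the raw audio bytes from the Base64 'audioContent' field."""
    payload = json.loads(response_body)
    if "audioContent" not in payload:
        # Surface API errors instead of uploading an empty or broken file.
        raise ValueError(f"TTS response has no audioContent: {payload}")
    return base64.b64decode(payload["audioContent"])


def synthesize_to_bucket(text, bucket, post_json, ttl_minutes=15):
    """Synthesize text, store the audio in the bucket, return a signed URL.

    bucket    -- assumed to behave like google.cloud.storage.Bucket
    post_json -- hypothetical callable that sends the request dict to the
                 TTS endpoint (with authentication) and returns the raw
                 response bytes
    """
    audio = decode_audio(post_json(build_tts_request(text)))
    blob = bucket.blob(unique_object_name())
    blob.upload_from_string(audio, content_type="audio/wav")
    # The ESP32 only ever receives this short URL, never the Base64 payload.
    return blob.generate_signed_url(
        version="v4",
        expiration=datetime.timedelta(minutes=ttl_minutes),
        method="GET",
    )
```

In a deployed function, bucket could come from storage.Client().bucket(...) with your own bucket name, and post_json would wrap an authenticated HTTPS POST to the Text-to-Speech endpoint. Depending on the credentials available to the runtime, generating the signed URL may need additional signing setup, so treat that step as something to verify in your own environment.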

You may refer to the documentation below, which offers pertinent information on service accounts, IAM permissions, naming conventions, and troubleshooting:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Sending output from Text-to-Speech API directly to Cloud Storage