
Bidirectional Google TTS API: Receiving Synthesized Audio While Sending Text


I am currently learning how to synthesize speech with bidirectional streaming in Google Text-to-Speech, following Google's "Synthesize speech with bidirectional streaming" documentation.

I have encountered an issue that exceeds my current capabilities. Specifically, after sending the initial request using:

client.streaming_synthesize(requests=request_generator(text))

(I'm writing the code in Python.)

I receive audio_content as expected. However, while this audio is being processed, I would like to send new text for synthesis and receive it as a new audio_content stream.

I am not sure how to achieve this and would greatly appreciate any guidance or examples on how to implement continuous bidirectional streaming.

Thank you for your time and assistance.

Here is the code I am currently using:

def request_generator(textrequest):
    # The first request must carry the streaming config;
    # subsequent requests carry the text to synthesize.
    yield texttospeech.StreamingSynthesizeRequest(streaming_config=streaming_config)
    yield texttospeech.StreamingSynthesizeRequest(
        input=texttospeech.StreamingSynthesisInput(text=textrequest)
    )

def STTS(text):
    global stream, p

    responses = client.streaming_synthesize(requests=request_generator(text))

    for response in responses:
        stream.write(response.audio_content)

Hi @viktorb,

Welcome to Google Cloud Community!

I see you're using a generator function (request_generator) to send requests and handle the responses. To achieve continuous streaming, where new text can be sent during the synthesis process, you need to keep the request stream open so that new text can be fed in while responses are still arriving. The challenge you're facing is that after sending the initial request, you want to send new text while the synthesis continues.

Here are some things to check and try in order to resolve the issue:

1. Concurrent Requests: You can use threading or asynchronous programming to handle continuous requests and responses. Note that your current request_generator yields only two requests and then returns, which half-closes the request side of the stream; to keep sending new text, the generator must stay alive (for example, by blocking on a queue) while another thread or task feeds it. This will require managing asynchronous calls or multiple threads.

2. Handling Audio Streaming: In your current code, you’re writing audio to a stream (stream.write(response.audio_content)). This should continue while you're sending new text. The challenge is to make sure that the streaming request and response processes don't block each other.

Here's how a threading approach to continuous text-to-speech synthesis fits together:

  • Threading: Python's threading module lets the request-sending and response-handling work run concurrently, so you can keep queuing new text for synthesis while audio content is still arriving.
  • Audio streaming: the playback stream (for example, a PyAudio output stream) is a placeholder; replace it with your actual audio output logic.
  • Request generator: the request generator yields a StreamingSynthesizeRequest for each text input as it becomes available; this is how continuous text is handled.

If you'd prefer an asynchronous approach instead of using threading, you can use asyncio to manage the concurrent tasks. This would require making the requests asynchronously and using async for loops to handle the responses.

If the issue persists, you can reach out to Google Cloud Support. When reaching out, include detailed information and relevant screenshots of the errors you’ve encountered. This will assist them in diagnosing and resolving your issue more efficiently.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.