Google TTS Long form synthesis working sporadically

Cas1

Hi, I'm Cas, new to using Google Cloud, Google APIs, and Python (I know, a good idea to try and figure it all out by myself).

I had the idea to take public domain ebooks and use Google TTS to turn them into audiobooks. I have used various chatbots to help me set up Google's TTS API in a (surely clumsy) way: I run Python from my command line and call a script (written in a .txt file on my local hard drive) that calls the TTS API (code provided below). After a lot of trying, I did get it to work: the script takes the input text from a locally saved .txt file, writes the output to my Google Cloud Storage bucket (via the GCS URI), and applies the audio config parameters.

However, while some of my texts (i.e. books) are synthesized into one neat file and dropped into my bucket without any problem, other books run into trouble: I get a bunch of 5-minute chunks uploaded to the bucket instead. I'm guessing this is how TTS synthesizes longer texts (breaking them up, synthesizing the pieces in parallel, and then stitching them back together). But a seemingly random number of chunks is missing (they are numbered, so I can check, and I can confirm it by listening to the chunks), and the chunks are never combined into one audio file. I've also had one instance where all the chunks seemed to be there, but they just weren't glued back together.
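Since the chunks are numbered, the gap check can be automated. The sketch below assumes (hypothetically, since the actual object names aren't shown) that the chunk number is the last run of digits in each file name; `find_missing_chunks` is an illustrative helper, not part of any Google library. It can only detect gaps between the lowest and highest number it sees, not chunks missing from the very end.

```python
import re


def find_missing_chunks(names):
    """Return chunk numbers missing between the lowest and highest number found.

    Hypothetical naming assumption: the chunk number is the last run of digits
    in each name, e.g. "Frankenstein_chunk_007.wav" -> 7.
    """
    numbers = set()
    for name in names:
        digits = re.findall(r"\d+", name)
        if digits:
            numbers.add(int(digits[-1]))
    if not numbers:
        return []
    # Every number between the smallest and largest seen should be present.
    expected = set(range(min(numbers), max(numbers) + 1))
    return sorted(expected - numbers)


if __name__ == "__main__":
    example = ["Frankenstein_0.wav", "Frankenstein_1.wav", "Frankenstein_3.wav", "Frankenstein_6.wav"]
    print(find_missing_chunks(example))  # [2, 4, 5]
```

The names could come straight from the bucket, e.g. `[blob.name for blob in storage.Client().list_blobs("tts_audioquestbooks_bucket", prefix="Frankenstein")]` with the `google-cloud-storage` package.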

I've tried this several times, and the same .txt files consistently work or fail, regardless of when I run them. So it seems to me the problem must lie in the .txt files themselves, possibly a part of the text that the TTS API cannot synthesize and skips over? I have tried reading through the texts, and synthesizing parts of the text until I could narrow it down to a specific paragraph in one of the 'bad texts' that was likely causing the problem, but I couldn't find anything odd in that paragraph, except perhaps its very long sentences.
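The manual narrowing-down described above can be complemented by listing the longest sentences in a file, which makes it quick to compare the 'bad texts' against the good ones. This is a rough sketch: sentences are split naively on `.`, `!` or `?` followed by whitespace, so abbreviations like "Mr." cause extra splits, and `longest_sentences` is a hypothetical helper name.

```python
import re
import sys


def longest_sentences(text, top_n=5):
    """Return (word_count, line_number, preview) for the top_n longest sentences."""
    results = []
    start = 0
    # A sentence ends at ".", "!" or "?" followed by whitespace, or at the end of the text.
    for match in re.finditer(r"(?<=[.!?])\s+|\Z", text):
        sentence = text[start:match.start()]
        words = sentence.split()
        if words:
            line_number = text.count("\n", 0, start) + 1
            results.append((len(words), line_number, " ".join(words[:8])))
        start = match.end()
    results.sort(key=lambda r: r[0], reverse=True)
    return results[:top_n]


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python longest_sentences.py Frankenstein.txt
    with open(sys.argv[1], "r", encoding="utf-8") as file:
        for word_count, line_number, preview in longest_sentences(file.read()):
            print(f"line {line_number}: {word_count} words: {preview} ...")
```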

Any ideas of why this might happen, and how to fix it? Thanks so much!

My Python script (largely courtesy of ChatGPT):

```python
import time

from google.api_core import operations_v1  # Import OperationsClient
from google.api_core.exceptions import GoogleAPICallError
from google.cloud import texttospeech

# Predefined variables
project_id = "audioquest-books"
input_text_file = r"C:\Users\caspe\TTS\Books\Frankenstein\Frankenstein.txt"
output_audio_file = "Frankenstein"
output_gcs_uri = f"gs://tts_audioquestbooks_bucket/{output_audio_file}"
voice_name = "en-US-Chirp3-HD-Charon"
speaking_rate = 1.0


def synthesize_long_audio():
    """
    Reads text from a file, submits a long-form synthesis job, and returns the operation name.
    """
    # Read input text from file
    with open(input_text_file, "r", encoding="utf-8") as file:
        text_content = file.read()

    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    input_text = texttospeech.SynthesisInput(text=text_content)

    audio_config = texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.LINEAR16,
        speaking_rate=speaking_rate,
    )

    voice = texttospeech.VoiceSelectionParams(
        language_code="en-US", name=voice_name
    )

    parent = f"projects/{project_id}/locations/global"

    request = texttospeech.SynthesizeLongAudioRequest(
        parent=parent,
        input=input_text,
        audio_config=audio_config,
        voice=voice,
        output_gcs_uri=output_gcs_uri,
    )

    try:
        operation = client.synthesize_long_audio(request=request)
        print(f"Processing started. Operation name: {operation.operation.name}")
        return operation.operation.name
    except GoogleAPICallError as e:
        print(f"Error submitting synthesis request: {e}")
        return None


def check_job_status(operation_name, start_time):
    """
    Periodically checks the status of the synthesis job using OperationsClient.
    """
    client = texttospeech.TextToSpeechLongAudioSynthesizeClient()

    # Poll the long-running operation over the TTS client's gRPC channel
    operations_client = operations_v1.OperationsClient(client.transport.grpc_channel)

    while True:
        try:
            operation = operations_client.get_operation(name=operation_name)
        except GoogleAPICallError as e:
            print(f"⚠️ Error checking job status: {e}")
            return

        if not operation.done:
            print("⏳ Synthesis still in progress... Checking again in 30 seconds...")
            time.sleep(30)  # Wait 30 seconds before checking again
            continue

        elapsed_minutes = (time.time() - start_time) / 60
        # `done` only means the job has finished: a failed job is also "done",
        # with the failure reported in the operation's `error` field.
        if operation.HasField("error"):
            print(f"❌ Synthesis failed: code={operation.error.code}, message={operation.error.message}")
        else:
            print("✅ Synthesis completed successfully! Check your GCS bucket.")
        print(f"⏱️ Total execution time: {elapsed_minutes:.2f} minutes")
        return


if __name__ == "__main__":
    start_time = time.time()  # Start time recording
    operation_name = synthesize_long_audio()
    if operation_name:
        check_job_status(operation_name, start_time)
```
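For the one case described in the question where all the numbered chunks were present but never combined, a local workaround is to download the chunks and join them yourself. The sketch below uses Python's standard `wave` module and assumes each chunk is a complete WAV file (header included) with the same channel count, sample width and sample rate; that is an assumption about what the LINEAR16 chunks look like, not something confirmed in this thread. `concatenate_wavs` is a hypothetical helper, and the paths must be passed in playback order.

```python
import wave


def concatenate_wavs(chunk_paths, output_path):
    """Append the audio frames of several WAV files into one WAV file.

    Assumes every input is a complete WAV file with the same number of channels,
    sample width and sample rate; a mismatch raises ValueError.
    """
    if not chunk_paths:
        raise ValueError("no chunk files given")
    expected = None
    with wave.open(output_path, "wb") as out:
        for path in chunk_paths:
            with wave.open(path, "rb") as chunk:
                fmt = (chunk.getnchannels(), chunk.getsampwidth(), chunk.getframerate())
                if expected is None:
                    # Copy the format of the first chunk to the output file.
                    expected = fmt
                    out.setnchannels(fmt[0])
                    out.setsampwidth(fmt[1])
                    out.setframerate(fmt[2])
                elif fmt != expected:
                    raise ValueError(f"{path} has format {fmt}, expected {expected}")
                out.writeframes(chunk.readframes(chunk.getnframes()))
```

If the file names contain unpadded numbers (e.g. `_2` and `_10`), sort the paths numerically rather than alphabetically before calling `concatenate_wavs`; otherwise chunk 10 ends up before chunk 2.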

marckevin

Hi @Cas1,

Welcome to Google Cloud Community!

I understand that Google Cloud TTS long-form synthesis is working sporadically for you: some text files are synthesized without issues while others are not, which suggests that the text content itself could be the culprit. Although you've already checked specific paragraphs in one of the "bad texts", it might help to inspect those files character by character for unusual or invisible characters, and to confirm that the encoding your code uses to read the file (UTF-8) matches the file's actual encoding. Long and complex sentences can sometimes cause issues with Text-to-Speech, so try cleaning your text file by simplifying long sentences and removing unusual characters.
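A character-level inspection like the one suggested can be scripted. This sketch reports every character outside printable ASCII (other than tabs and line breaks) with its line, column and Unicode name, so invisible characters such as a byte-order mark (U+FEFF) or stray control characters stand out. Many flagged characters (curly quotes, accented letters) are perfectly normal in books, so the list is only a starting point for review; `find_unusual_characters` is a hypothetical helper.

```python
import unicodedata


def find_unusual_characters(text, max_reports=50):
    """Return (line, column, character, unicode_name) for characters worth a look.

    Everything outside printable ASCII is reported except tabs.
    """
    reports = []
    # Split on "\n" only: str.splitlines() would also split on form feeds and
    # other separators, hiding exactly the characters we want to see.
    for line_number, line in enumerate(text.split("\n"), start=1):
        for column, char in enumerate(line, start=1):
            if " " <= char <= "~" or char == "\t":
                continue  # printable ASCII or a tab
            # Control characters have no Unicode name, so fall back to the code point.
            name = unicodedata.name(char, f"U+{ord(char):04X}")
            reports.append((line_number, column, char, name))
            if len(reports) >= max_reports:
                return reports
    return reports
```

Running it on a "good" and a "bad" book side by side and comparing the kinds of characters reported is a quick way to spot something the bad one has that the good one doesn't.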

Another possible cause could be the Text-to-Speech Quotas and Limits. Make sure you're not exceeding your rate limit.
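If limits turn out to matter, or simply to make failures easier to localize, one hypothetical workaround is to split each book into smaller parts on paragraph boundaries and submit one long-audio job per part. In the sketch below, `max_bytes` is a placeholder you would set from the limits on the quotas page; no specific value is assumed here. A single paragraph larger than `max_bytes` is kept whole, so it can still exceed the cap.

```python
def split_into_parts(text, max_bytes):
    """Split text on blank lines into parts whose UTF-8 size stays within max_bytes.

    Joining the returned parts with "\\n\\n" reproduces the original text exactly.
    """
    sep = "\n\n"
    sep_size = len(sep.encode("utf-8"))
    parts, current, current_size = [], [], 0
    for paragraph in text.split(sep):
        size = len(paragraph.encode("utf-8"))
        # Adding a paragraph to a non-empty part also costs one separator.
        extra = size + (sep_size if current else 0)
        if current and current_size + extra > max_bytes:
            parts.append(sep.join(current))
            current, current_size = [], 0
            extra = size
        current.append(paragraph)
        current_size += extra
    if current:
        parts.append(sep.join(current))
    return parts
```

Each part would then go into its own `SynthesizeLongAudioRequest`, as in the script above, with a distinct `output_gcs_uri` per part (e.g. ending in `Frankenstein_part01`), so a failed part can be rerun on its own.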

Additionally, you can refer to the documentation on how to create long-form audio, which might provide helpful guidance.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
