Example:
text= 'Hello, World. I can speak any language. I would like to help you.'
Hello, World: starts 00:00 ends 00:03
I can speak any language: starts 00:04 ends 00:09
I would like to help you: starts 00:10 ends 00:13
Is there something for that in Python? Here is the main code:
"""Synthesizes speech from the input string of text or ssml.

Note: ssml must be well-formed according to:
    https://www.w3.org/TR/speech-synthesis/
"""
from google.cloud import texttospeech

# Instantiates a client
client = texttospeech.TextToSpeechClient()

# Set the text input to be synthesized
synthesis_input = texttospeech.types.SynthesisInput(
    text="Hello, World. I can speak any language. I would like to help you."
)

# Build the voice request, select the language code ("en-US") and the ssml
# voice gender ("neutral")
voice = texttospeech.types.VoiceSelectionParams(
    language_code="en-US",
    ssml_gender=texttospeech.enums.SsmlVoiceGender.NEUTRAL,
)

# Select the type of audio file you want returned
audio_config = texttospeech.types.AudioConfig(
    audio_encoding=texttospeech.enums.AudioEncoding.MP3
)

# Perform the text-to-speech request on the text input with the selected
# voice parameters and audio file type
response = client.synthesize_speech(
    input_=synthesis_input, voice=voice, audio_config=audio_config
)

# The response's audio_content is binary.
with open("./output.mp3", "wb") as out:
    # Write the response to the output file.
    out.write(response.audio_content)
    print('Audio content written to file "output.mp3"')
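To make the desired output concrete, here is a minimal sketch of the formatting step only. It does not call the Text-to-Speech API. It assumes the length of each spoken sentence is already known somehow (the `durations` list is hypothetical input, for example measured from separately synthesized clips), and it assumes a one-second pause between sentences, which is only inferred from the example timings above. The helper names `split_sentences`, `fmt`, and `timing_lines` are made up for illustration.

```python
import re


def split_sentences(text):
    """Split text on sentence-ending punctuation and drop the punctuation."""
    parts = re.split(r"[.!?]+\s*", text)
    return [p.strip() for p in parts if p.strip()]


def fmt(seconds):
    """Format a whole number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def timing_lines(text, durations, gap=1):
    """Build 'sentence: starts MM:SS ends MM:SS' lines.

    durations: assumed per-sentence lengths in whole seconds (hypothetical input).
    gap: assumed pause in seconds between consecutive sentences.
    """
    lines = []
    start = 0
    for sentence, duration in zip(split_sentences(text), durations):
        end = start + duration
        lines.append(f"{sentence}: starts {fmt(start)} ends {fmt(end)}")
        start = end + gap
    return lines


text = "Hello, World. I can speak any language. I would like to help you."
for line in timing_lines(text, [3, 5, 3]):
    print(line)
```

With the assumed durations of 3, 5, and 3 seconds this reproduces the three lines from the example. The open part of the question is how to obtain those durations from the synthesized audio.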