Solved: SSML mark timepointing (v1beta1) suddenly only returns the timepoints until first period

troyshu · 03-06-2023 11:02 AM

I'm loving GCP's Text-to-Speech API! I have a live product (used by thousands of users every day) that relies on the TTS API v1beta1 (the `text.synthesize` method in the Cloud Text-to-Speech documentation). Over the past few days I've received a lot of bug reports, and I traced them to a change in the behavior of the `text.synthesize` method (v1beta1) when `enableTimePointing: ["SSML_MARK"]` is set.

Before: `text.synthesize` with `enableTimePointing: ["SSML_MARK"]` would return an entry in `timepoints`, with `markName=i` and a `timeSeconds` value, for each `<mark name="i"/>` in the input SSML.

Now: with `enableTimePointing: ["SSML_MARK"]`, the `timepoints` in the `text.synthesize` response contain entries for only some of the `<mark>` tags in the input SSML.
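
One way to catch this regression automatically is to compare the mark names declared in the SSML with the names that come back in `timepoints`. A minimal sketch, assuming the REST (JSON) response shape where each timepoint has `markName` and `timeSeconds`, and a simple regex that expects `name` to be the mark's first attribute; `missing_marks` is a hypothetical helper, and the example data is synthetic:

```python
import re
from typing import Any

# Matches <mark name="..."/> or <mark name='...'/> and captures the name.
# Assumes `name` is the first attribute of the <mark> element.
_MARK_RE = re.compile(r"""<mark\s+name\s*=\s*["']([^"']*)["']\s*/?>""")


def missing_marks(ssml: str, response: dict[str, Any]) -> list[str]:
    """Return mark names declared in `ssml` that have no entry in response["timepoints"]."""
    expected = _MARK_RE.findall(ssml)
    returned = {tp.get("markName") for tp in response.get("timepoints", [])}
    return [name for name in expected if name not in returned]


# Synthetic example: 15 marks ("0".."14"), but the response stops after mark "5".
ssml = "<speak>" + "".join(f"<mark name='{i}'/>w{i} " for i in range(15)) + "</speak>"
resp = {"timepoints": [{"markName": str(i), "timeSeconds": 0.1 * i} for i in range(6)]}  # placeholder times
print(missing_marks(ssml, resp))  # → ['6', '7', '8', '9', '10', '11', '12', '13', '14']
```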

For example:

1) Open the API Explorer on the `text.synthesize` method page of the Cloud Text-to-Speech documentation.

2) Set the request body to:

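```json
{
  "enableTimePointing": [
    "SSML_MARK"
  ],
  "input": {
    "ssml": "<speak><prosody><mark name='0'/>I <mark name='1'/>am <mark name='2'/>my <mark name='3'/>aunt's <mark name='4'/>sister's <mark name='5'/>daughter. <mark name='6'/>He <mark name='7'/>was <mark name='8'/>sure <mark name='9'/>the <mark name='10'/>Devil <mark name='11'/>created <mark name='12'/>red <mark name='13'/>sparkly <mark name='14'/>glitter.</prosody></speak>"
  },
  "voice": {
    "name": "en-US-Standard-A",
    "languageCode": "en-US"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
```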

3) Notice that the `timepoints` in the response only have entries for mark names "0" through "5", even though the input SSML contains fifteen mark tags ("0" through "14").

4) In the input SSML, remove the period after "daughter", execute the request again, and notice that the `timepoints` in the response now include all mark names "0" through "14".
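
The request bodies for steps 2 and 4 can also be generated instead of hand-edited. A minimal sketch, assuming one `<mark>` is placed before every word and named by its zero-based index, wrapped in the same `<speak><prosody>` elements as the request above; `build_request` is a hypothetical helper, not part of any Google library:

```python
import json
from xml.sax.saxutils import escape


def build_request(text: str, voice_name: str = "en-US-Standard-A") -> dict:
    """Build a v1beta1 text:synthesize body with a <mark> before every word."""
    words = text.split()
    # escape() replaces &, < and > so each word is safe inside SSML.
    marked = " ".join(f"<mark name='{i}'/>{escape(w)}" for i, w in enumerate(words))
    return {
        "enableTimePointing": ["SSML_MARK"],
        "input": {"ssml": f"<speak><prosody>{marked}</prosody></speak>"},
        "voice": {"name": voice_name, "languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
    }


text = "I am my aunt's sister's daughter. He was sure the Devil created red sparkly glitter."
body = build_request(text)  # step 2: 15 marks, "0" through "14"
print(json.dumps(body, indent=2))

# Step 4: the same text with the period after "daughter" removed.
body_no_period = build_request(text.replace("daughter.", "daughter", 1))
```

Generating both variants from one string makes it easy to show that the only difference between the failing and the working request is that single period.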

Can someone confirm that this is unexpected behavior? And if it's being worked on, roughly when can we expect a fix? Thank you!

Joevanie

I used your request body and got timepoints all the way up to mark "14". Are you still getting only a fraction of the timepoints?

troyshu

Thank you for responding. I now see the expected behavior: the response returns a timepoint for every `<mark>` tag. This seems to have been an intermittent issue with the Cloud Text-to-Speech API.

Just curious, is there a timeline for when the v1beta1 API will graduate out of beta, so we can expect a more stable service?

Thank you again for your help!
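
Since the missing timepoints turned out to be intermittent, a client could defensively re-request when some marks are missing. A minimal sketch, assuming a caller-supplied `synthesize` callable that takes the request body and returns the JSON response as a dict (for example, a thin wrapper around the REST call). The helper name, retry count and backoff values are arbitrary choices for illustration, not recommendations from Google's documentation:

```python
import re
import time
from typing import Any, Callable

# Captures the name of <mark name="..."/> elements (name must be the first attribute).
_MARK_RE = re.compile(r"""<mark\s+name\s*=\s*["']([^"']*)["']\s*/?>""")


def synthesize_with_all_marks(
    synthesize: Callable[[dict[str, Any]], dict[str, Any]],
    body: dict[str, Any],
    attempts: int = 3,
    backoff_s: float = 0.5,
) -> dict[str, Any]:
    """Call synthesize(body) until every <mark> in the SSML has a timepoint.

    Returns the last response even if marks are still missing, so the
    caller decides how to handle an incomplete result.
    """
    expected = set(_MARK_RE.findall(body["input"]["ssml"]))
    response: dict[str, Any] = {}
    for attempt in range(attempts):
        response = synthesize(body)
        returned = {tp.get("markName") for tp in response.get("timepoints", [])}
        if expected <= returned:
            return response
        if attempt + 1 < attempts:
            time.sleep(backoff_s * (2 ** attempt))  # exponential backoff between retries
    return response


# Usage with a fake client: the first call is truncated, the second is complete.
calls = []


def fake_synthesize(req: dict[str, Any]) -> dict[str, Any]:
    calls.append(req)
    n = 6 if len(calls) == 1 else 15
    return {"timepoints": [{"markName": str(i), "timeSeconds": 0.1 * i} for i in range(n)]}


body = {"input": {"ssml": "<speak>" + "".join(f"<mark name='{i}'/>w{i} " for i in range(15)) + "</speak>"}}
resp = synthesize_with_all_marks(fake_synthesize, body, backoff_s=0.0)
print(len(calls), len(resp["timepoints"]))  # → 2 15
```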

voxoid

I'm having the same problem: I'm suddenly getting no timepoints at all for `<mark>` tags, though I was getting them before:

```python
import google.cloud.texttospeech_v1beta1 as tts

# The original code passed credentials=_get_credentials(), a helper that was not
# included in the post; this version uses Application Default Credentials instead.
client = tts.TextToSpeechClient()
text_input = tts.SynthesisInput(
    ssml="Now try that again: start with the root<break time='0.5s'/>, and then go to the branches."
)
voice_params = tts.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-F",
)
audio_config = tts.AudioConfig(
    audio_encoding=tts.AudioEncoding.LINEAR16,
    pitch=2,
    speaking_rate=1.1,
)

# Request a timepoint for every <mark> element in the SSML.
response = client.synthesize_speech(
    request=tts.SynthesizeSpeechRequest(
        input=text_input,
        voice=voice_params,
        audio_config=audio_config,
        enable_time_pointing=[tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK],
    )
)
print(f"response.timepoints={response.timepoints}")
timepoints = [(tp.mark_name, tp.time_seconds) for tp in response.timepoints]
print(f"timepoints={timepoints}")
```
Output:

```
response.timepoints=[]
timepoints=[]

```
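
Because `SSML_MARK` timepoints can only be produced for `<mark>` elements, a quick local check of the SSML string can rule out a client-side cause before suspecting the service. A minimal sketch using the standard library's XML parser, assuming (as in Google's SSML examples) that the document should have a `<speak>` root; `ssml_problems` is a hypothetical helper, and the mark names in the second string are made up for illustration:

```python
import xml.etree.ElementTree as ET


def ssml_problems(ssml: str) -> list[str]:
    """Return local problems that would prevent SSML_MARK timepoints."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    names = [m.get("name") for m in root.iter("mark")]
    if not names:
        problems.append("no <mark> elements, so no SSML_MARK timepoints can be returned")
    if any(n is None for n in names):
        problems.append("a <mark> element has no name attribute")
    if len(set(names)) != len(names):
        problems.append("duplicate <mark> names")
    return problems


# The SSML string exactly as it appears in the post: plain text before any element.
as_posted = "Now try that again: start with the root<break time='0.5s'/>, and then go to the branches."
print(ssml_problems(as_posted))

# A variant with a <speak> root and two hypothetical marks passes the check.
marked = ("<speak>Now try that again: <mark name='root'/>start with the root"
          "<break time='0.5s'/>, <mark name='branches'/>and then go to the branches.</speak>")
print(ssml_problems(marked))  # → []
```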
