
SSML mark timepointing (v1beta1) suddenly returns timepoints only up to the first period

I'm loving GCP's Text-to-Speech API! I have a live product, used by thousands of users every day, that relies on the v1beta1 `text.synthesize` method (documented on the "Method: text.synthesize" page of the Cloud Text-to-Speech documentation). Over the past few days I've received many bug reports, and I traced them to a change in the behavior of `text.synthesize` (v1beta1) when the request sets `enableTimePointing: ["SSML_MARK"]`.

Before: `text.synthesize` with `enableTimePointing: ["SSML_MARK"]` returned one entry in `timepoints`, with `markName` set to i and the corresponding `timeSeconds`, for each `<mark name="i"/>` in the input SSML.

Now: with the same setting, the response's `timepoints` contains entries for only some of the `<mark name="i"/>` tags in the input SSML (in my tests, only the marks before the first period).
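To see exactly which marks were dropped, one option is to compare the mark names in the input SSML with the `timepoints` list of the REST (JSON) response, whose entries carry `markName` and `timeSeconds`. This is a minimal sketch: `mark_names` and `missing_marks` are hypothetical helper names, and the regex assumes every mark's `name` value is quoted.

```python
import re
from typing import Iterable, Mapping

# Matches <mark name="..."/> or <mark name='...'/> (assumes the name is quoted)
MARK_RE = re.compile(r"""<mark\s+name\s*=\s*(["'])(.*?)\1""")


def mark_names(ssml: str) -> list[str]:
    """Return the mark names in the order they appear in the SSML."""
    return [m.group(2) for m in MARK_RE.finditer(ssml)]


def missing_marks(ssml: str, timepoints: Iterable[Mapping[str, object]]) -> list[str]:
    """Return mark names present in the SSML but absent from the response.

    `timepoints` is the response's "timepoints" list; each entry is a
    mapping with "markName" and "timeSeconds" keys.
    """
    returned = {str(tp.get("markName")) for tp in timepoints}
    return [name for name in mark_names(ssml) if name not in returned]
```

For the fifteen-mark request in the reproduction steps that follow, a response covering only marks "0" through "5" would make `missing_marks` return `["6", "7", ..., "14"]`, while a complete response returns an empty list.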

 

For example:

1) Open the API Explorer on the "Method: text.synthesize" page of the Cloud Text-to-Speech documentation.

2) Set the request body to:

```json
{
  "enableTimePointing": [
    "SSML_MARK"
  ],
  "input": {
    "ssml": "<speak><prosody><mark name=\"0\"/>I <mark name=\"1\"/>am <mark name=\"2\"/>my <mark name=\"3\"/>aunt's <mark name=\"4\"/>sister's <mark name=\"5\"/>daughter. <mark name=\"6\"/>He <mark name=\"7\"/>was <mark name=\"8\"/>sure <mark name=\"9\"/>the <mark name=\"10\"/>Devil <mark name=\"11\"/>created <mark name=\"12\"/>red <mark name=\"13\"/>sparkly <mark name=\"14\"/>glitter.</prosody></speak>"
  },
  "voice": {
    "name": "en-US-Standard-A",
    "languageCode": "en-US"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
```

3) Execute the request. The `timepoints` array in the response contains entries only for mark names "0" through "5", even though the input SSML has fifteen mark tags ("0" through "14"). Mark "5" is the one attached to "daughter.", the word ending the first sentence.

4) In the input SSML, remove the period after `<mark name=\"5\"/>daughter.`, execute the request again, and note that `timepoints` now contains entries for all mark names "0" through "14".
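For a request like the one in the steps above, the SSML with one mark per word can be generated instead of written by hand, which also makes it easy to toggle the period for step 4. This is a minimal sketch assuming whitespace-separated words; `ssml_with_word_marks` and `synthesize_request_body` are hypothetical helper names, and the `<prosody>` wrapper and voice settings simply mirror the example request.

```python
import json
from xml.sax.saxutils import escape


def ssml_with_word_marks(text: str) -> str:
    """Put <mark name="i"/> before the i-th whitespace-separated word."""
    words = text.split()
    # escape() only rewrites &, < and >, so apostrophes like "aunt's" stay as-is
    body = " ".join(f'<mark name="{i}"/>{escape(word)}' for i, word in enumerate(words))
    return f"<speak><prosody>{body}</prosody></speak>"


def synthesize_request_body(ssml: str) -> dict:
    """JSON body for v1beta1 text:synthesize with SSML_MARK timepointing."""
    return {
        "enableTimePointing": ["SSML_MARK"],
        "input": {"ssml": ssml},
        "voice": {"name": "en-US-Standard-A", "languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
    }


text = "I am my aunt's sister's daughter. He was sure the Devil created red sparkly glitter."
print(json.dumps(synthesize_request_body(ssml_with_word_marks(text)), indent=2))
```

Printing the body gives the request from step 2; for step 4, pass the same text with the period after "daughter" removed.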

Can someone confirm that this is unexpected behavior? And if it's being worked on, roughly when can we expect a fix? Thank you!

Accepted solution:

I used your request body and got timepoints for all the marks, up to "14". Are you still getting only a fraction of the timepoints?

[Screenshot attached: Screenshot 2023-03-08 2.42.23 AM.png]


Thank you for responding. I now see the expected behavior: the response returns timepoints for all the `<mark>` tags. This seems to have been an intermittent issue with the Cloud Text-to-Speech API.

Just curious, is there a timeline for when the v1beta1 API will graduate out of beta, so we can expect a more stable service?

Thank you again for your help!

I'm having a similar problem: I'm suddenly getting no timepoints at all for `<mark name='1'/>`, though I was getting them before:

```python
# Imports used by parts of the script not shown here (e.g. _get_credentials())
import google.auth.exceptions
import simpleaudio
from google.auth.transport.requests import Request
from google_auth_oauthlib.flow import InstalledAppFlow

import google.cloud.texttospeech_v1beta1 as tts

# _get_credentials() is a helper (not shown) that builds OAuth credentials.
# TextToSpeechClient() with no arguments would use Application Default Credentials.
client = tts.TextToSpeechClient(credentials=_get_credentials())

# A single <mark> immediately followed by a half-second <break>
text_input = tts.SynthesisInput(
    ssml="Now try that again: start with the root<mark name='1'/><break time='0.5s'/>, and then go to the branches."
)
voice_params = tts.VoiceSelectionParams(
    language_code="en-US",
    name="en-US-Wavenet-F",
)
audio_config = tts.AudioConfig(
    audio_encoding=tts.AudioEncoding.LINEAR16,
    pitch=2,
    speaking_rate=1.1,
)

response = client.synthesize_speech(
    request=tts.SynthesizeSpeechRequest(
        input=text_input,
        voice=voice_params,
        audio_config=audio_config,
        # Ask the service to report a timepoint for each <mark> tag
        enable_time_pointing=[tts.SynthesizeSpeechRequest.TimepointType.SSML_MARK],
    )
)
print(f"response.timepoints={response.timepoints}")
# Each Timepoint carries the mark's name and its offset (in seconds) into the audio
timepoints = [(tp.mark_name, tp.time_seconds) for tp in response.timepoints]
print(f"timepoints={timepoints}")
```
Output:

```
response.timepoints=[]
timepoints=[]
```
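If the problem really is intermittent, as the earlier reply concludes, a client could retry when marks are missing instead of passing incomplete timepoints on to users. This is a minimal sketch under that assumption: `synthesize_with_all_marks` is a hypothetical helper, and it accepts any zero-argument callable so it stays independent of credentials and networking. With the Python client, that callable could be `lambda: client.synthesize_speech(request=request)` for a prebuilt `SynthesizeSpeechRequest`.

```python
import time
from typing import Any, Callable, Iterable


def synthesize_with_all_marks(
    synthesize: Callable[[], Any],
    expected_marks: Iterable[str],
    attempts: int = 3,
    delay_seconds: float = 1.0,
) -> Any:
    """Call `synthesize` until its response has a timepoint for every expected mark.

    `synthesize` returns an object with a `timepoints` sequence whose items have
    a `mark_name` attribute (the shape of a v1beta1 SynthesizeSpeechResponse).
    Returns the first complete response; raises RuntimeError if marks are still
    missing after `attempts` calls.
    """
    expected = set(expected_marks)
    missing = expected
    for attempt in range(attempts):
        response = synthesize()
        missing = expected - {tp.mark_name for tp in response.timepoints}
        if not missing:
            return response
        if attempt + 1 < attempts:
            # Exponential backoff between attempts: delay, 2*delay, 4*delay, ...
            time.sleep(delay_seconds * 2 ** attempt)
    raise RuntimeError(f"timepoints still missing for marks: {sorted(missing)}")
```

Retrying only helps with transient failures: if the service consistently omits marks for a given input, every attempt comes back incomplete and the helper raises `RuntimeError` rather than returning partial timepoints.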