I'm loving GCP's Text-to-Speech API! I have a live product, used by thousands of users every day, that relies on TTS API v1beta1 (the `text.synthesize` method in the Cloud Text-to-Speech documentation). Over the past few days I've received a lot of bug reports and traced them to a change in the behavior of the `text.synthesize` method (v1beta1) when `enableTimePointing: ["SSML_MARK"]` is set.
Before: `text.synthesize` with `enableTimePointing: ["SSML_MARK"]` returned one entry in the response's `timepoints` array for each `<mark name="i"/>` in the input SSML, with `markName` set to `i` and a `timeSeconds` value.
Now: with the same setting, the response's `timepoints` array contains entries for only a fraction of the `<mark name="i"/>` tags in the input SSML.
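To make the "fraction" concrete, a small check can list which marks are missing from a response. This is a minimal Python sketch, assuming the response is the parsed JSON from the REST API (camelCase `timepoints` / `markName`) and that marks are written as self-closing tags with double-quoted names, as in the example request in this post. `mark_names` and `missing_marks` are hypothetical helper names, not part of any Google library.

```python
import re

# Assumption: marks look like <mark name="5"/> (double quotes, self-closing).
MARK_RE = re.compile(r'<mark\s+name\s*=\s*"([^"]*)"\s*/>')


def mark_names(ssml: str) -> list[str]:
    """Return the mark names in the SSML, in document order."""
    return MARK_RE.findall(ssml)


def missing_marks(ssml: str, response: dict) -> list[str]:
    """Return mark names present in the SSML but absent from response["timepoints"]."""
    returned = {tp.get("markName") for tp in response.get("timepoints", [])}
    return [name for name in mark_names(ssml) if name not in returned]


# Usage with a made-up response (the timeSeconds values are invented).
ssml = '<speak><mark name="0"/>I <mark name="1"/>am <mark name="2"/>here.</speak>'
response = {"timepoints": [{"markName": "0", "timeSeconds": 0.1},
                           {"markName": "1", "timeSeconds": 0.3}]}
print(missing_marks(ssml, response))  # ['2']
```

Under the old behavior described above, `missing_marks` would return an empty list for every request.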
For example:
1) Open the API Explorer for the `text.synthesize` method in the Cloud Text-to-Speech documentation.
2) Set the request body to:
{
  "enableTimePointing": [
    "SSML_MARK"
  ],
  "input": {
    "ssml": "<speak><prosody><mark name=\"0\"/>I <mark name=\"1\"/>am <mark name=\"2\"/>my <mark name=\"3\"/>aunt's <mark name=\"4\"/>sister's <mark name=\"5\"/>daughter. <mark name=\"6\"/>He <mark name=\"7\"/>was <mark name=\"8\"/>sure <mark name=\"9\"/>the <mark name=\"10\"/>Devil <mark name=\"11\"/>created <mark name=\"12\"/>red <mark name=\"13\"/>sparkly <mark name=\"14\"/>glitter.</prosody></speak>"
  },
  "voice": {
    "name": "en-US-Standard-A",
    "languageCode": "en-US"
  },
  "audioConfig": {
    "audioEncoding": "MP3"
  }
}
3) Execute the request. The `timepoints` array in the response only has entries for mark names "0" through "5", even though the input SSML contains fifteen mark tags ("0" through "14").
4) In the input SSML, remove the period at the end of `<mark name=\"5\"/>daughter.`, execute again, and notice that the `timepoints` array now has entries for all mark names "0" through "14".
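The same reproduction can be scripted so that both variants (with and without the period after "daughter") are sent and compared. This is a minimal sketch using only the Python standard library, assuming the v1beta1 REST endpoint `https://texttospeech.googleapis.com/v1beta1/text:synthesize` and an OAuth access token supplied by the caller (for example from `gcloud auth print-access-token`; some setups may need extra headers). `build_marked_ssml`, `build_request`, `make_request`, `synthesize`, and `returned_mark_names` are hypothetical helpers, not part of any Google client library.

```python
import json
import urllib.request
from xml.sax.saxutils import escape

# Assumed v1beta1 REST endpoint for text.synthesize.
ENDPOINT = "https://texttospeech.googleapis.com/v1beta1/text:synthesize"


def build_marked_ssml(text: str) -> str:
    """Put <mark name="i"/> before the i-th whitespace-separated word (XML-escaped)."""
    parts = [f'<mark name="{i}"/>{escape(word)}' for i, word in enumerate(text.split())]
    return "<speak><prosody>" + " ".join(parts) + "</prosody></speak>"


def build_request(text: str) -> dict:
    """Request body with the same shape as the one in step 2."""
    return {
        "enableTimePointing": ["SSML_MARK"],
        "input": {"ssml": build_marked_ssml(text)},
        "voice": {"name": "en-US-Standard-A", "languageCode": "en-US"},
        "audioConfig": {"audioEncoding": "MP3"},
    }


def make_request(body: dict, access_token: str) -> urllib.request.Request:
    """Build (but do not send) the authenticated POST request."""
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )


def synthesize(body: dict, access_token: str) -> dict:
    """Send the request and return the parsed JSON response (needs network access)."""
    with urllib.request.urlopen(make_request(body, access_token)) as resp:
        return json.load(resp)


def returned_mark_names(response: dict) -> list[str]:
    """Mark names that actually came back in the response's timepoints."""
    return [tp["markName"] for tp in response.get("timepoints", [])]


WITH_PERIOD = "I am my aunt's sister's daughter. He was sure the Devil created red sparkly glitter."
WITHOUT_PERIOD = WITH_PERIOD.replace("daughter.", "daughter", 1)
```

With a token in hand, comparing `returned_mark_names(synthesize(build_request(WITH_PERIOD), token))` against the same call for `WITHOUT_PERIOD` would, if the behavior still reproduces, show marks "0" to "5" for the first and "0" to "14" for the second. `build_marked_ssml(WITH_PERIOD)` produces exactly the SSML string used in step 2.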
Can someone confirm that this is unexpected behavior? And if it's being worked on, roughly when can we expect a fix? Thank you!
I used your request body and got timepoints for every mark, all the way through "14". Are you still getting only a fraction of the timepoints?