TTS mark timepoint gives wrong output for certain sentences.

Hi,

I am using Google TTS with the SSML <mark> tag to get the timepoints for certain keywords. It works fine in most cases, but it gives incorrect output in the following example.

E.g.: <speak>Why did you choose <mark name="1"/>Southern California University<mark name="2"/> and what factors influenced your choice?</speak>

If I use the generated timepoint information to split the audio from time 0 to the time of mark 1, I only hear "Why did you choo", whereas I should hear "Why did you choose".

If I use some other university name, like "<speak>Why did you choose <mark name="1"/>Fresno State University<mark name="2"/> and what factors influenced your choice?</speak>", it works perfectly fine and I can hear "Why did you choose".
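For anyone reproducing the split described above, here is a minimal sketch of cutting a clip between two times. It assumes the audio was requested as LINEAR16 and arrives as WAV bytes with a valid header; `split_wav_at` is a hypothetical helper name, not part of any Google library. It only performs the cut, so a timepoint that lands mid-word will still produce a clipped word.

```python
import io
import wave


def split_wav_at(wav_bytes: bytes, start_s: float, end_s: float) -> bytes:
    """Return a new WAV containing only the audio between start_s and end_s."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as src:
        params = src.getparams()
        rate = src.getframerate()
        # Convert seconds to whole frames; a frame is one sample per channel.
        start_frame = round(start_s * rate)
        end_frame = round(end_s * rate)
        # Clamp the start so setpos() never seeks past the end of the audio.
        src.setpos(min(start_frame, src.getnframes()))
        frames = src.readframes(max(0, end_frame - start_frame))

    out = io.BytesIO()
    with wave.open(out, "wb") as dst:
        # Reuse the source format; the frame count in the header is
        # corrected automatically when the writer is closed.
        dst.setparams(params)
        dst.writeframes(frames)
    return out.getvalue()
```

Calling `split_wav_at(audio, 0.0, mark_1_time)` with the `timeSeconds` value reported for mark 1 gives the "time 0 to mark 1" clip, which makes it easy to compare the two university names side by side.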

Here is another example where Text-to-Speech gives wrong timepoints:

<speak>It's okay that you're not feeling too talkative today <mark name="1"/>Supriya<mark name="2"/>, but unfortunately, super-short replies don't work well in a real-life interview.</speak>

The timepoints result is:

[
{ timeSeconds: 0, markName: '2' },
{ timeSeconds: 2.627833366394043, markName: '1' }
]

Clearly, the timepoint for mark '2' should be greater than 2.627833366394043, since mark '2' comes after mark '1' in the SSML.
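A check like the following can flag this kind of result automatically. It is only a sketch: the timepoints are mirrored as plain Python dicts with the same `timeSeconds` and `markName` keys as the output above, and `find_order_violations` is a hypothetical helper, not an API function.

```python
def find_order_violations(timepoints, expected_order):
    """Return (earlier, later) mark pairs whose times go backwards.

    expected_order lists the mark names in the order they appear in the SSML.
    """
    times = {tp["markName"]: tp["timeSeconds"] for tp in timepoints}
    # Ignore marks that the service did not return at all.
    present = [name for name in expected_order if name in times]
    violations = []
    for earlier, later in zip(present, present[1:]):
        if times[later] < times[earlier]:
            violations.append((earlier, later))
    return violations


timepoints = [
    {"timeSeconds": 0, "markName": "2"},
    {"timeSeconds": 2.627833366394043, "markName": "1"},
]
print(find_order_violations(timepoints, ["1", "2"]))  # [('1', '2')]
```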

Thanks,

Supriya

1 Reply

I was able to garner this result:

[Image: Poala_Tenorio_0-1694585713040.png]

Try changing your SSML by using this format:

<speak>Why did you choose <mark name=\"1\"/>Southern California University<mark name=\"2\"/> and what factors influenced your choice?</speak>
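One way to read the escaped quotes above, assuming the SSML is being placed inside a JSON request body: the backslashes are JSON string escaping and disappear once the body is parsed. The sketch below builds such a body with the standard `json` module. The field names (`input.ssml`, `voice`, `audioConfig`, `enableTimePointing` with `SSML_MARK`) follow my understanding of the v1beta1 `text:synthesize` REST request and should be checked against the API reference; the voice settings are placeholders.

```python
import json

# Plain SSML, written with ordinary double quotes around the mark names.
ssml = (
    '<speak>Why did you choose <mark name="1"/>Southern California University'
    '<mark name="2"/> and what factors influenced your choice?</speak>'
)

# Assumed request shape for the v1beta1 text:synthesize endpoint.
request_body = {
    "input": {"ssml": ssml},
    "voice": {"languageCode": "en-US"},
    "audioConfig": {"audioEncoding": "LINEAR16"},
    "enableTimePointing": ["SSML_MARK"],
}

# json.dumps produces the backslash-escaped form shown in the reply.
encoded = json.dumps(request_body)
print(encoded)

# Parsing the body back yields the original, unescaped SSML.
decoded_ssml = json.loads(encoded)["input"]["ssml"]
print(decoded_ssml == ssml)  # True
```

If the SSML is passed through a client library as an ordinary string rather than hand-written JSON, the escaping is normally handled for you, so adding backslashes manually would change the text that reaches the service.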