TTS mark timepoint gives wrong output for certain sentences.

ssingla · 09-06-2023 02:50 AM

Hi,

I am using Google TTS (with the mark tag) to get the timepoints for certain keywords. It works fine in most cases, but it gives incorrect output for the following example.

E.g.: <speak>Why did you choose Southern California University and what factors influenced your choice?</speak>

If I use the generated timepoint information to split the audio from time 0 to the time of mark 1, I only hear "Why did you choo", whereas I should be hearing "Why did you choose".

If I use some other university name, like "<speak>Why did you choose Fresno State University and what factors influenced your choice?</speak>", it works perfectly fine and I can hear "Why did you choose".
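
For readers who want to reproduce the splitting step described above, here is a minimal sketch. It assumes the audio is raw 16-bit mono PCM at 24000 Hz with no container header; the function name pcm_prefix and all of its defaults are hypothetical and are not taken from the original post, so adjust them to match your own audio config.

```python
def pcm_prefix(pcm: bytes, end_seconds: float, sample_rate: int = 24000,
               sample_width: int = 2, channels: int = 1) -> bytes:
    """Return the audio from time 0 up to end_seconds.

    Assumption: `pcm` holds raw PCM frames only (no WAV header), and the
    sample rate, sample width and channel count match the synthesized audio.
    """
    if end_seconds < 0:
        raise ValueError("end_seconds must be non-negative")
    frame_size = sample_width * channels      # bytes per frame
    frames = int(end_seconds * sample_rate)   # whole frames before the mark
    return pcm[:frames * frame_size]          # slicing clamps at end of audio


# Example: one second of silence, cut at 0.5 s.
one_second = bytes(24000 * 2)
first_half = pcm_prefix(one_second, 0.5)
```

Because the cut lands exactly at the reported time, a timepoint that is reported too early clips the end of the word, which matches the "Why did you choo" symptom.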

Another example where text-to-speech gives wrong timepoints:

<speak>It's okay that you're not feeling too talkative today Supriya, but unfortunately, super-short replies don't work well in a real-life interview.</speak>

The timepoints result is:

[
{ timeSeconds: 0, markName: '2' },
{ timeSeconds: 2.627833366394043, markName: '1' }
]

Clearly, the timepoint for the mark named '2' should be greater than 2.627833366394043.
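
A sanity check of this kind can be automated. The sketch below assumes that marks are named with ascending integers in the order they appear in the SSML, so a later mark should never have an earlier time. The helper out_of_order_marks is hypothetical, and the timepoints are written as plain Python dicts mirroring the result shown above.

```python
def out_of_order_marks(timepoints):
    """Return (earlier_mark, later_mark) name pairs whose times go backwards.

    Assumption: mark names are integers that increase in SSML order.
    """
    ordered = sorted(timepoints, key=lambda tp: int(tp["markName"]))
    return [
        (a["markName"], b["markName"])
        for a, b in zip(ordered, ordered[1:])
        if b["timeSeconds"] < a["timeSeconds"]
    ]


# The result reported above, as Python dicts.
timepoints = [
    {"timeSeconds": 0, "markName": "2"},
    {"timeSeconds": 2.627833366394043, "markName": "1"},
]
problems = out_of_order_marks(timepoints)  # mark '2' is timed before mark '1'
```

An empty list means the timepoints are consistent with the mark order; any returned pair points at a response worth reporting.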

Thanks,

Supriya

Poala_Tenorio

I was able to garner this result:

Try changing your SSML by using this format:

<speak>Why did you choose Southern California University and what factors influenced your choice?</speak>
