
Text-to-Speech seems to ignore SSML input

Greetings, all! Since getting started with the TTS service, I have had good success with submitting JSON files that specify simple text input. I am using the instructions for PowerShell as described here:

https://cloud.google.com/text-to-speech/docs/quickstart-protocol#windows

When submitting JSON files that specify SSML input, however, it seems that some of the SSML elements are being ignored by the speech synthesizer. I'd like to use the <prosody> and <emphasis> elements, but the output isn't reflecting the values I specified. Here's an example:

{
  "input":{
    "ssml":"<speak><prosody rate=\"fast\" pitch=\"low\"><emphasis level=\"strong\">Guten Tag!</emphasis> Sie sind mit dem Anrufbeantworter verbunden. Gerne können Sie uns nach dem Signal-Ton eine Nachricht hinterlassen. <emphasis level=\"strong\">Vielen Dank für Ihren Anruf!</emphasis></prosody></speak>"
  },
  "voice":{
    "languageCode":"de-DE",
    "name":"de-DE-Wavenet-A",
    "ssmlGender":"FEMALE"
  },
  "audioConfig":{
    "audioEncoding":"MP3"
  }
}

It doesn't seem to matter how I specify the rate and pitch attributes—the output comes back with no alteration.

Thank you for taking the time to read this. If you have information or suggestions, please reply with your ideas!

1 REPLY

When you submit JSON with SSML input, the <prosody> element takes its rate and pitch attributes in the following format [1][2].

Example:

```
<prosody rate="slow" pitch="-2st">Can you hear me now?</prosody>
```
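If it helps, here is a minimal Python sketch for producing the request JSON from an SSML string, so the inner quotes are escaped by `json.dumps` rather than by hand. The helper name `build_request` is hypothetical, and the field names and voice are simply copied from the request in the question. It only checks that the SSML is well-formed XML; it does not confirm that a given voice honors <prosody> or <emphasis>.

```python
import json
import xml.etree.ElementTree as ET


def build_request(ssml: str,
                  language_code: str = "de-DE",
                  voice_name: str = "de-DE-Wavenet-A") -> str:
    """Return a JSON request body that carries the given SSML (hypothetical helper)."""
    # Fail early with ParseError if the SSML is not well-formed XML.
    ET.fromstring(ssml)
    body = {
        "input": {"ssml": ssml},
        "voice": {
            "languageCode": language_code,
            "name": voice_name,
            "ssmlGender": "FEMALE",
        },
        "audioConfig": {"audioEncoding": "MP3"},
    }
    # json.dumps escapes the double quotes inside the SSML attributes.
    # The default ensure_ascii=True also writes umlauts as \uXXXX escapes,
    # so the resulting file is plain ASCII.
    return json.dumps(body, indent=2)


ssml = '<speak><prosody rate="slow" pitch="-2st">Can you hear me now?</prosody></speak>'
request_json = build_request(ssml)
print(request_json)
```

The printed text can be saved as the request file used in the quickstart. Keeping the output ASCII-only is a design choice here to sidestep file-encoding differences between shells.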

[1] https://cloud.google.com/text-to-speech/docs/ssml#prosody

[2] https://www.w3.org/TR/speech-synthesis11/#:~:text=3.2.4%20prosody%20Element