Hello,
I'm trying to use the Text-to-Speech API to generate a multi-speaker audio file. When I use the older WaveNet voices it works just fine, but when I replace the speakers with the newer Neural2 voices I get a 400 error saying:
InvalidArgument: 400 Request contains an invalid argument.
How can I get this to work for multiple speakers using the newer models?
Here is a sample:
<speak>
<voice name="en-US-Neural2-J">
<p>Hello, everyone! Welcome to today's podcast. I'm your host A, and joining me is my co-host, B.</p>
</voice>
<voice name="en-US-Neural2-I">
<p>Hi, A! It's great to be here. Today, we're going to discuss an interesting topic that's been making headlines recently.</p>
</voice>
<voice name="en-US-Neural2-J">
<p>That's right, B. We're talking about the collapse of Silicon Valley Bank, which was triggered by a massive online bank run.</p>
</voice>
<voice name="en-US-Neural2-I">
<p>Indeed, A. This bank run was unlike any other we've seen before, as it was primarily fueled by social media platforms and private chat groups.</p>
</voice>
</speak>
This sample works when I replace the en-US-Neural2 voices with en-US-Wavenet ones.
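For reference, this is roughly how I'm sending the request, using the Python client library (a minimal sketch; the MP3 output and the top-level en-US voice selection are just what I happen to use, not requirements):

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# ssml holds the multi-speaker document shown above
ssml = """<speak>...</speak>"""

response = client.synthesize_speech(
    input=texttospeech.SynthesisInput(ssml=ssml),
    # Top-level voice selection; the per-speaker voices come from the
    # <voice name="..."> tags inside the SSML itself
    voice=texttospeech.VoiceSelectionParams(language_code="en-US"),
    audio_config=texttospeech.AudioConfig(
        audio_encoding=texttospeech.AudioEncoding.MP3
    ),
)

with open("podcast.mp3", "wb") as f:
    f.write(response.audio_content)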
Possible workarounds would be converting the text into chunks of under 500 bytes, or splitting the request into smaller pieces (one per speaker turn), as sketched below. This might be a related case.
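To illustrate the second workaround, here is a rough sketch that issues one request per speaker turn, each with a single Neural2 voice and no <voice> tags, and then joins the audio at the end (the turn list is copied from the script above; the naive byte concatenation of MP3 segments is an assumption that most players tolerate, and a proper audio tool would give cleaner joins):

from google.cloud import texttospeech

client = texttospeech.TextToSpeechClient()

# (voice_name, text) pairs taken from the podcast script above
turns = [
    ("en-US-Neural2-J", "Hello, everyone! Welcome to today's podcast..."),
    ("en-US-Neural2-I", "Hi, A! It's great to be here..."),
    ("en-US-Neural2-J", "That's right, B. We're talking about..."),
    ("en-US-Neural2-I", "Indeed, A. This bank run was unlike any other..."),
]

segments = []
for voice_name, text in turns:
    # Each request is small and uses exactly one Neural2 voice
    response = client.synthesize_speech(
        input=texttospeech.SynthesisInput(ssml=f"<speak><p>{text}</p></speak>"),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US", name=voice_name
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    segments.append(response.audio_content)

# Naively concatenate the MP3 segments into one file
with open("podcast.mp3", "wb") as f:
    f.write(b"".join(segments))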