Re: When will Studio Voices move out of Preview?

cookbookcoding · 07-11-2023 08:57 AM

I'm very excited to use Studio Voices in the Text-to-Speech API. The voices sound a lot more realistic than WaveNet. However, if the input is too large, I get this error:

INVALID_ARGUMENT: Input size limit exceeded for Studio Voice. This is a temporary constraint while in Preview. Please try again with a shorter input or different voice type.

When will this "temporary constraint" be lifted?

lsolatorio

Hi @cookbookcoding,

I understand your concern, and thank you for your interest in Studio Voices. There is no official announcement yet, but the voices will likely move out of Preview in the near future.

Google Cloud has said it is working on improvements to the service, including a higher input size limit. In the meantime, here are some workarounds you can try:

Split the input into smaller requests so that each audio file is smaller than 1 GB
Use a different text-to-speech service that offers a larger input size limit
Wait until Studio Voices are out of Preview and the input size limit is increased
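The first workaround (splitting the input) can be sketched in Python as below. This is a minimal illustration, not an official approach: the function name `split_text` and the `max_bytes` value are hypothetical, and the actual Studio Voice input limit is not stated in this thread, so check the current Cloud Text-to-Speech quotas before picking a number. The sketch prefers sentence boundaries, falls back to word boundaries for very long sentences, and rejoins pieces with single spaces (so original line breaks are not preserved).

```python
import re


def _utf8_len(s: str) -> int:
    """Size of a string in bytes when encoded as UTF-8."""
    return len(s.encode("utf-8"))


def split_text(text: str, max_bytes: int) -> list[str]:
    """Split text into chunks of at most max_bytes (UTF-8).

    max_bytes is a hypothetical limit; look up the real one for your voice type.
    """
    # Break into sentences; sentences that are too long on their own
    # are broken further into individual words.
    units: list[str] = []
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        if _utf8_len(sentence) <= max_bytes:
            units.append(sentence)
        else:
            units.extend(sentence.split())

    # Greedily pack units into chunks without exceeding the limit.
    chunks: list[str] = []
    current = ""
    for unit in units:
        if _utf8_len(unit) > max_bytes:
            raise ValueError(f"a single word exceeds {max_bytes} bytes: {unit[:20]!r}")
        candidate = f"{current} {unit}" if current else unit
        if _utf8_len(candidate) <= max_bytes:
            current = candidate
        else:
            chunks.append(current)
            current = unit
    if current:
        chunks.append(current)
    return chunks
```

For example, `split_text("Hello there. This is a test! Is it short? Yes.", 20)` yields three chunks, each of which could be sent as a separate synthesis request and the resulting audio joined afterwards.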

I hope this helps.

powderblock

Useful info. Thank you. The community looks forward to studio voices going out of preview mode!!

SolveForIT

I had the same question and would like to encourage Google to move the Studio voices out of preview sooner rather than later.

Is the limit on the text being sent or is it on the audio file being returned? If it's the latter, how are we to know the size until the exception is thrown? Are you proposing we handle this in the exception handler?
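If the limit can't be known in advance, one option is to handle it reactively: try the whole input, and if the request is rejected as too large, split it in half at a word boundary and retry each half. The sketch below assumes a hypothetical `synthesize` callable and a hypothetical `InputTooLarge` exception standing in for the API error; with the Python client library the INVALID_ARGUMENT status presumably surfaces as `google.api_core.exceptions.InvalidArgument`, which can also be raised for unrelated reasons (for example a bad voice name), so inspect the error message before treating it as a size problem. Joining the returned audio segments is left to the caller.

```python
from typing import Callable


class InputTooLarge(Exception):
    """Hypothetical stand-in for the API's 'Input size limit exceeded' error."""


def synthesize_all(text: str, synthesize: Callable[[str], bytes]) -> list[bytes]:
    """Synthesize text, recursively halving it whenever it is rejected as too large.

    `synthesize` is a hypothetical callable wrapping the real TTS request.
    """
    try:
        return [synthesize(text)]
    except InputTooLarge:
        words = text.split()
        if len(words) < 2:
            raise  # a single word cannot be split further at word boundaries
        mid = len(words) // 2
        left = " ".join(words[:mid])
        right = " ".join(words[mid:])
        return synthesize_all(left, synthesize) + synthesize_all(right, synthesize)
```

Compared with splitting up front, this costs an extra failed request whenever the input is too large, but it needs no knowledge of the exact limit.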

nicbcr

Are Studio Voices still in Preview? I'd like to use them on prod to generate very short audio files (less than 15 seconds). Is there a risk of the API returning an error as long as the audio file is (significantly) smaller than 1GB?

Exper1mental

I am also very excited for the full rollout of Studio voices. I plan on migrating my audio projects from Microsoft Azure to Google Cloud once Studio voices exit preview mode. In particular, I am hoping to use Studio voices with SSML and the long-form audio feature to improve the quality of my audiobook recordings for YouTube. Microsoft Azure's neural AI suffers from a limitation where it will insert pauses into long sentences every 500 characters, and they cannot be fully hidden because the tone and pitch of the speaker can shift before and after the pause.