About “Cloud Speech-to-Text”

sidleo · 08-28-2023 11:49 PM

I hope someone can help me understand the difference between the v1 and v2 APIs of the “Cloud Speech-to-Text” product. Are their functions exactly the same, with only the price being different?

Poala_Tenorio

Technically, Speech-to-Text v2 is better than v1. It has enhanced accuracy across diverse accents, varying acoustic settings, and a range of microphones, even in the presence of background noise. However, v1 can still be used and has not been deprecated.

As quoted from the documentation:

The difference between the v1 and v2 versions of the Speech-to-Text API in the definition of RecognitionConfig message is the addition of the AutoDetectDecodingConfig message, which automatically detects the audio specifications.
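
To make that concrete, here is a rough sketch (not official sample code) of how the JSON request bodies for the two REST versions might differ. The helper names build_v1_body and build_v2_body are made up for illustration, and the "LINEAR16" encoding, the "en-US" language and the "long" model are only assumed defaults. In v1 the audio format is usually spelled out for raw audio, while in v2 an empty autoDecodingConfig asks the service to detect it.

```python
import base64
import json


def build_v1_body(audio: bytes, sample_rate_hz: int, language_code: str = "en-US") -> dict:
    """Hypothetical helper: a v1 speech:recognize body with explicit audio specs."""
    return {
        "config": {
            "encoding": "LINEAR16",             # assumed: raw 16-bit PCM
            "sampleRateHertz": sample_rate_hz,  # caller has to know the sample rate
            "languageCode": language_code,
        },
        "audio": {"content": base64.b64encode(audio).decode("ascii")},
    }


def build_v2_body(audio: bytes, language_code: str = "en-US", model: str = "long") -> dict:
    """Hypothetical helper: a v2 recognizers:recognize body using auto-detection."""
    return {
        "config": {
            "autoDecodingConfig": {},           # empty message: let the service detect the format
            "languageCodes": [language_code],   # v2 takes a list of language codes
            "model": model,                     # assumed model name
        },
        "content": base64.b64encode(audio).decode("ascii"),
    }


if __name__ == "__main__":
    fake_audio = b"\x00\x01" * 4  # placeholder bytes, not real speech
    print(json.dumps(build_v1_body(fake_audio, 16000), indent=2))
    print(json.dumps(build_v2_body(fake_audio), indent=2))
```

This only builds the payloads. Sending them still needs an authenticated POST to the v1 speech:recognize endpoint or to a v2 recognizer's :recognize endpoint.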

sidleo

Thanks for the reply. So the functionality of v2 is the same as that of v1, and the performance of v2 is better. Can I understand it like this?

hlevring

What we are looking for above all else is accurate word-level timestamps. Does v2 have any major improvements with respect to word-level timestamps? We have been looking at hosting WhisperX on GCP, but I would prefer to save a lot of time and go with an out-of-the-box API from Google if it is competitive on word-level timestamps.

kiikoo

I just want to add some additional information: you can also check the release notes.

loct

There does not seem to be any mention whatsoever of v2 in the release notes, so 👎