Technically, Speech-to-Text v2 is better than v1. It offers improved accuracy across diverse accents, varying acoustic conditions, and a range of microphones, even in the presence of background noise. However, v1 can still be used and has not been deprecated.
As quoted from the documentation:
The difference between the v1 and v2 versions of the Speech-to-Text API in the definition of the `RecognitionConfig` message is the addition of the `AutoDetectDecodingConfig` message, which automatically detects the audio specifications.
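To make the quoted difference concrete, here is a minimal sketch of a v2 recognize request body that relies on auto-detected decoding instead of an explicit encoding and sample rate. The helper name `build_v2_recognize_body` is hypothetical, and the JSON field names (`autoDecodingConfig`, `languageCodes`, `model`) are my reading of the v2 REST reference, so check them against the current documentation before relying on them.

```python
import base64


def build_v2_recognize_body(
    audio_bytes: bytes,
    language_code: str = "en-US",
    model: str = "long",
) -> dict:
    """Build a JSON-serializable body for a v2 recognize call (illustrative helper)."""
    return {
        "config": {
            # An empty object asks the service to detect the audio encoding,
            # sample rate, and channel count on its own (assumed field name).
            "autoDecodingConfig": {},
            "languageCodes": [language_code],
            "model": model,
        },
        # Inline audio is sent as base64-encoded bytes.
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }
```

In v1 you would typically have to state the encoding and sample rate yourself for headerless audio; in this sketch the empty `autoDecodingConfig` takes their place.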
Thanks for the reply. So v2 offers the same functionality as v1, but with better performance. Is my understanding correct?
What we are looking for above all else is accurate word-level timestamps. Does v2 have any major improvements with respect to word-level timestamps? We have been looking at hosting WhisperX on GCP, but I would prefer to save a lot of time and go with an out-of-the-box API from Google if it is competitive on word-level timestamps.
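For anyone who wants to compare timestamp quality themselves, here is a rough sketch of pulling word-level timings out of a v2 JSON response so they can be lined up against WhisperX output. It says nothing about which is more accurate. It assumes the request enabled word time offsets and that each word entry carries `word`, `startOffset`, and `endOffset` fields with durations formatted like `"1.200s"`; the function names are hypothetical.

```python
def parse_offset(offset: str) -> float:
    """Convert a duration string such as "1.200s" to seconds."""
    stripped = offset.rstrip("s")
    return float(stripped) if stripped else 0.0


def word_timestamps(response: dict) -> list:
    """Return (word, start_seconds, end_seconds) from the top alternative of each result."""
    words = []
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Only the first (highest-ranked) alternative is used here.
        for info in alternatives[0].get("words", []):
            words.append(
                (
                    info.get("word", ""),
                    parse_offset(info.get("startOffset", "0s")),
                    parse_offset(info.get("endOffset", "0s")),
                )
            )
    return words
```

Running both systems on the same audio and diffing the resulting tuples against a hand-labelled reference is probably the quickest way to see whether the out-of-the-box API is good enough for your use case.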
I just want to add that you can also check the release notes.
There does not seem to be any mention whatsoever of v2 in the release notes, so 👎