So, for example, at my work we use WEBM_OPUS encoding, which, if I understand correctly, specifies the sample rate in the audio stream metadata itself. Yet https://cloud.google.com/speech-to-text/docs/basics#sample-rates says the sample rate field is only optional for FLAC or WAV formats.
And indeed, when I try the GSTT API with some example code (streaming recognition with a WEBM_OPUS stream encoded at a 48000 Hz sample rate), GSTT actually accepts sample rates other than 48000, and, depending on the recognition model, produces different results for different sample rate values!
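For anyone who wants to check what the stream itself declares, here is a minimal sketch that looks for the Opus identification header (`OpusHead`) in the raw bytes and reads its "input sample rate" field. The function name `opus_head_info` is made up for illustration, and the approach assumes the header bytes appear verbatim in the WebM data (they normally sit in the track's codec private data). It is a byte search rather than a real WebM parser, so treat it as a quick sanity check only. As far as I know, that field records the rate of the original input and is informational, so it does not by itself explain how GSTT treats the value passed in the config.

```python
import struct

def opus_head_info(data: bytes):
    """Return the fields of the first OpusHead header found in data, or None.

    Layout after the 8-byte magic "OpusHead" (little-endian):
    version (1 byte), channel count (1 byte), pre-skip (2 bytes),
    input sample rate (4 bytes), then output gain and mapping family.
    """
    idx = data.find(b"OpusHead")
    # A complete OpusHead is at least 19 bytes long.
    if idx < 0 or len(data) < idx + 19:
        return None
    version, channels, pre_skip, sample_rate = struct.unpack_from(
        "<BBHI", data, idx + 8
    )
    return {
        "version": version,
        "channels": channels,
        "pre_skip": pre_skip,
        "input_sample_rate": sample_rate,
    }
```

To use it, read the first few kilobytes of the recording (for example `open(path, "rb").read(4096)`, where `path` is your own file) and pass the bytes to `opus_head_info`.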