So for example, at my work we are using WEBM_OPUS encoding, which from what I understand specifies the sample rate in the audio stream metadata itself? Yet according to https://cloud.google.com/speech-to-text/docs/basics#sample-rates the field is only optional for FLAC or WAV formats.
And indeed, when I try the GSTT API with some example code (Streaming Recognition and a WEBM_OPUS file encoded at a 48000 Hz sample rate), GSTT actually accepts sample rates other than 48000 - and, depending on the recognition model, produces different results for different selected sample rates!
Did you try setting 8000 Hz in the data that you are sending to Speech-to-Text? To answer your question, the field is optional for those two files since they are the most commonly used.
What "two files"? ...
I was referring to WAV and FLAC audio files.
Yes, it's optional for WAV and FLAC - but my question was about using WEBM_OPUS format...
It's because Google uses automatically defined sample rates, which are the following: the sample rate must be one of 8000 Hz, 12000 Hz, 16000 Hz, 24000 Hz, or 48000 Hz. You can also see this in the documentation here.
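For anyone landing on this thread, here is a minimal sketch of checking the sample rate against that list before building a request. It does not call the API. The helper name `build_recognition_config` and the default language code are my own assumptions for illustration; the dictionary keys follow the JSON field names of the REST `RecognitionConfig` (`encoding`, `sampleRateHertz`, `languageCode`).

```python
# Sample rates listed in the reply above for Opus-encoded audio.
OPUS_SAMPLE_RATES_HZ = (8000, 12000, 16000, 24000, 48000)


def build_recognition_config(sample_rate_hertz, language_code="en-US"):
    """Return a dict shaped like a REST RecognitionConfig for WEBM_OPUS audio.

    Hypothetical helper: it only validates the rate locally and builds the
    config; sending it to Speech-to-Text is left to the caller.
    """
    if sample_rate_hertz not in OPUS_SAMPLE_RATES_HZ:
        allowed = ", ".join(str(rate) for rate in OPUS_SAMPLE_RATES_HZ)
        raise ValueError(
            f"sample_rate_hertz={sample_rate_hertz} is not one of: {allowed}"
        )
    return {
        "encoding": "WEBM_OPUS",
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
    }


# Example: config for a WebM/Opus recording made at 48000 Hz.
config = build_recognition_config(48000)
```

Passing a rate outside the list (44100, for example) raises a `ValueError` locally, so a mismatch like the one described in the question is caught before the request is sent.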