
timscott460 · 12-01-2022 06:36 PM

For several months I've been using S2T to transcribe mp3 audio files (1 - 40 minutes long). It has given great results, and since I'm using the gcloud CLI, I can script batches of submissions.

Today I submitted 10 jobs totalling 40 minutes and the results are all junk. The JSON transcript files, which are normally 50-300K in size, are a few hundred bytes long and consist of just a handful of individual random words. I had already run one of these files on Nov-11, and back then it gave a good result (a 230K JSON file of basically correct transcriptions).
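For anyone scripting batches like this, the size difference described above makes a simple sanity check possible. The following is only a rough sketch: the 1024-byte threshold and the function name find_suspect_transcripts are assumptions chosen for illustration (based on the "few hundred bytes" versus "50-300K" figures above), not anything provided by gcloud or Speech-to-Text.

```python
import json
from pathlib import Path

# Assumed threshold: good transcripts in this thread were 50-300K,
# bad ones only a few hundred bytes, so 1 KB separates the two.
MIN_BYTES = 1024


def find_suspect_transcripts(directory, min_bytes=MIN_BYTES):
    """Return the names of .json files that are too small or not valid JSON."""
    suspects = []
    for path in sorted(Path(directory).glob("*.json")):
        # A tiny file is the symptom reported in the post.
        if path.stat().st_size < min_bytes:
            suspects.append(path.name)
            continue
        # Also flag files that do not parse as JSON at all.
        try:
            json.loads(path.read_text(encoding="utf-8"))
        except (json.JSONDecodeError, UnicodeDecodeError):
            suspects.append(path.name)
    return suspects
```

Running find_suspect_transcripts on the output folder after a batch would list the transcripts worth resubmitting or checking by hand. It does not explain why they came back short.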

To test this, I ran the same file through the "Create Transcription" GUI and it gave exactly the same correct result.

I modified my gcloud call (which was "gcloud beta ml speech ....") to remove the "beta" option, and the submission failed on encoding=mp3. I then added back in the "alpha" option after gcloud; this accepted the mp3 encoding but again returned the defective JSON transcription file.

It would really be a massive inconvenience to have to use the GUI to submit jobs one at a time.

I went to the S2T "What's new" page and found nothing that seemed to explain this. (Incidentally, there is a bug there: if you click on the "Speech-To-Text v1" drop-down and choose "Speech-to-Text" under Public Features, you actually end up at a page titled "Speed-to-Text V2" with "Speech-To-Text On-Prem" above it, and no information on either one.)

Any suggestions will be greatly appreciated!

Calling speech-to-text suddenly giving me bad transcripts (starting 2022-Dec-1)