Turn off text normalization in the Speech-to-Text API

agupta54 · 04-06-2023 02:20 AM

I am using the Speech-to-Text API to transcribe audio files. The output contains a lot of symbols that I suspect come from inverse text normalization: $ for dollars and other currency symbols, and numbers written as digits rather than words. Is there an option in RecognitionConfig that gives me verbatim output, with words instead of numbers and symbols?

I see there's a "transcriptNormalization" option in the config, but it requires me to provide my own replacement rules.
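A minimal sketch of what supplying your own rules could look like: a JSON body for the v1p1beta1 `speech:recognize` method whose `RecognitionConfig` carries a `transcriptNormalization` block. The `entries`, `search`, `replace` and `caseSensitive` field names reflect my reading of the v1p1beta1 RecognitionConfig reference and should be verified there; the two rules, the `build_config` helper and the `gs://YOUR_BUCKET/audio.flac` URI are hypothetical placeholders. Assuming the entries are literal text substitutions, they can expand a symbol in place ("50%" to "50 percent") but cannot reorder "$5" into "5 dollars".

```python
import json

# Hypothetical example rules: (search, replace, case_sensitive).
# Assumed to be applied as literal text substitutions on the transcript.
RULES = [
    ("%", " percent", False),
    ("&", " and ", False),
]


def build_config(rules, language_code="en-US"):
    """Return a v1p1beta1 RecognitionConfig dict with transcriptNormalization entries."""
    entries = [
        {"search": search, "replace": replace, "caseSensitive": case_sensitive}
        for search, replace, case_sensitive in rules
    ]
    return {
        # Encoding and sample rate are omitted: for FLAC/WAV they are read from the header.
        "languageCode": language_code,
        "transcriptNormalization": {"entries": entries},
    }


if __name__ == "__main__":
    body = {
        "config": build_config(RULES),
        # Placeholder Cloud Storage URI; replace with your own audio file.
        "audio": {"uri": "gs://YOUR_BUCKET/audio.flac"},
    }
    # JSON body for POST https://speech.googleapis.com/v1p1beta1/speech:recognize
    print(json.dumps(body, indent=2))
```

Each rule has to be written by hand, which is the limitation raised in the question: there is no single flag here that returns spoken-form words.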

Aris_O

Hi @agupta54,

Welcome back to the Google Cloud Community.

A synchronous recognition request is the simplest way to perform recognition on speech audio data with the Speech-to-Text API. Speech-to-Text can process up to 1 minute of audio sent in a synchronous request, and it returns a response only after it has processed and recognized all of the audio.

A synchronous request is blocking: Speech-to-Text must return a response before it processes the next request. Speech-to-Text typically processes audio faster than real time, handling 30 seconds of audio in 15 seconds on average. With poor audio quality, a recognition request can take significantly longer.
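A minimal sketch of the synchronous request described above, using only the Python standard library to build the JSON body for the v1 `speech:recognize` REST method (sending it with an authenticated HTTP client is left out). It assumes 16-bit PCM WAV input, hence `LINEAR16`; the helper names `build_sync_request` and `make_silent_wav` are hypothetical, and the 60-second check mirrors the 1-minute limit for synchronous requests.

```python
import base64
import io
import wave

# Synchronous requests handle up to 1 minute of audio.
SYNC_LIMIT_SECONDS = 60.0


def build_sync_request(wav_bytes, language_code="en-US"):
    """Build a JSON body for POST https://speech.googleapis.com/v1/speech:recognize.

    Raises ValueError if the audio is not 16-bit PCM or is too long for a
    synchronous request.
    """
    with wave.open(io.BytesIO(wav_bytes), "rb") as wav:
        sample_rate = wav.getframerate()
        sample_width = wav.getsampwidth()
        duration = wav.getnframes() / sample_rate
    if sample_width != 2:
        raise ValueError("this sketch only handles 16-bit PCM WAV (LINEAR16)")
    if duration > SYNC_LIMIT_SECONDS:
        raise ValueError(
            f"{duration:.1f} s of audio exceeds the {SYNC_LIMIT_SECONDS:.0f} s "
            "synchronous limit; use speech:longrunningrecognize instead"
        )
    return {
        "config": {
            "encoding": "LINEAR16",
            "sampleRateHertz": sample_rate,
            "languageCode": language_code,
        },
        # Inline audio must be base64-encoded inside the JSON body.
        "audio": {"content": base64.b64encode(wav_bytes).decode("ascii")},
    }


def make_silent_wav(seconds, rate=16000):
    """Hypothetical test helper: mono 16-bit silent WAV of the given length."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)
        wav.setframerate(rate)
        wav.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


if __name__ == "__main__":
    request = build_sync_request(make_silent_wav(2))
    print(request["config"])
```

Checking the duration on the client side avoids sending a request that the synchronous method would reject; longer files would go through the asynchronous `speech:longrunningrecognize` method instead.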

Here are some references that might help you.
https://cloud.google.com/speech-to-text/docs/speech-to-text-requests

https://cloud.google.com/speech-to-text/docs/reference/rest/v1p1beta1/RecognitionConfig