Solved: Re: How does Google Cloud Speech-to-Text API deal with invalid inputs?

vaibhav_jio · 09-25-2022 10:32 AM

I am using Google Cloud streaming recognition (AsyncStreamingRecognize) for speech-to-text conversion in my applications. I have gone through the link below to understand the structure of the response returned by the API:

StreamingRecognizeResponse

My application can run into several invalid-input scenarios, and I do not understand what the responses would look like in those cases. For example:

- The user speaks in a different language than the one passed in the configuration.

- The user does not speak at all (no input).

- Only noise gets passed, or data is lost.

Is there any parameter in the response that points to the above scenarios?

josegutierrez

1.- If a user speaks a different language, you can use language recognition in audio requests. Speech-to-Text supports alternative language codes for all speech recognition methods. It is also good practice to show the user a sample phrase, or a note stating which language has been selected for Speech-to-Text to recognize.

2.- There are multiple reasons why Speech-to-Text can return an empty response. The source of the problem could be the RecognitionConfig or the audio itself.

3.- To avoid sending only noise and losing data, you can pre-process the audio as described in the best practices doc.
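
To illustrate point 1, here is a minimal sketch of a request configuration that lists alternative languages, plus a check on the language reported back in a result. It uses plain dicts with the REST field names (`languageCode`, `alternativeLanguageCodes`) so that it runs without the client library. The helper names `build_recognition_config` and `language_mismatch`, and the encoding and sample rate values, are assumptions for illustration only.

```python
def build_recognition_config(primary, alternatives=()):
    """Return a RecognitionConfig-shaped dict using the REST field names."""
    config = {
        "encoding": "LINEAR16",        # assumed audio format, adjust to your input
        "sampleRateHertz": 16000,      # assumed sample rate, adjust to your input
        "languageCode": primary,
    }
    if alternatives:
        # Extra languages the service may pick from for this request.
        config["alternativeLanguageCodes"] = list(alternatives)
    return config


def language_mismatch(result, expected):
    """True if a result reports a language other than the expected one."""
    detected = result.get("languageCode", "")
    # Compare case-insensitively, since the casing of the returned tag may differ.
    return bool(detected) and detected.lower() != expected.lower()
```

If `language_mismatch` returns True for a result, the application could prompt the user to switch languages. A language outside the configured list will not be reported this way, so this only covers the languages you list.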
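
For point 2, the response does not carry a dedicated "no speech" or "noise" flag that I can point to, so one option is to infer it on the client side. The sketch below walks over streaming responses and reports whether an error status was set, whether any final transcript arrived, or whether nothing was recognized. It relies only on the `error`, `results`, `is_final`, `alternatives`, and `transcript` fields of the streaming response; the function name `summarize_stream` and the three status labels are hypothetical.

```python
from types import SimpleNamespace


def summarize_stream(responses):
    """Collect final transcripts and note whether an error was reported."""
    transcripts = []
    error_message = None
    for response in responses:
        error = getattr(response, "error", None)
        # A non-zero status code means the service reported a failure.
        if error is not None and getattr(error, "code", 0) != 0:
            error_message = getattr(error, "message", "")
            break
        for result in getattr(response, "results", []):
            # Skip interim results and results without any alternative.
            if not getattr(result, "is_final", False) or not result.alternatives:
                continue
            text = result.alternatives[0].transcript.strip()
            if text:
                transcripts.append(text)
    if error_message is not None:
        status = "error"
    elif transcripts:
        status = "speech"
    else:
        status = "no_speech"  # silence, noise only, or unrecognized audio
    return SimpleNamespace(status=status, transcripts=transcripts, error=error_message)
```

A `no_speech` outcome cannot tell silence apart from noise or from a language mismatch; it only says that no final transcript came back, so checking the RecognitionConfig and the audio remains the next step.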
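
For point 3, one simple client-side check (my own assumption, not something taken from the best practices doc) is to measure the loudness of each audio chunk before streaming it, so that fully silent input can be flagged early. The sketch assumes 16-bit little-endian PCM (LINEAR16); the names `rms_level` and `looks_silent` and the threshold of 500 are hypothetical and would need tuning. It only measures the audio and does not modify it.

```python
import math
import struct


def rms_level(pcm_bytes):
    """Root-mean-square amplitude of 16-bit little-endian PCM samples."""
    # Drop a trailing odd byte so the buffer is a whole number of samples.
    usable = pcm_bytes[: len(pcm_bytes) - len(pcm_bytes) % 2]
    samples = [s[0] for s in struct.iter_unpack("<h", usable)]
    if not samples:
        return 0.0
    return math.sqrt(sum(s * s for s in samples) / len(samples))


def looks_silent(pcm_bytes, threshold=500.0):
    """True if the chunk's RMS level is below the (assumed) threshold."""
    return rms_level(pcm_bytes) < threshold
```

A loudness check like this catches silence only. It cannot distinguish background noise from speech, so noisy input still has to be judged from the recognition results.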

