
How does the Google Cloud Speech-to-Text API deal with invalid inputs?

I am using Google Cloud streaming recognition (AsyncStreamingRecognize) for speech-to-text conversion in my application. I went through the link below to understand the structure of the response returned by the API:

StreamingRecognizeResponse  

Several invalid-input scenarios can occur, and I do not understand what the response would look like in each of them. For example:

- The user speaks a different language from the one passed in the configuration.

- The user does not speak at all (no input).

- Only noise gets passed, or audio data is lost.

Is there any field in the response that indicates one of the above scenarios?

ACCEPTED SOLUTION

1.- If a user may speak a language other than the configured one, you can use language recognition in audio requests. Speech-to-Text supports alternative language codes for all speech recognition methods. It is also good practice to show the user a sample phrase, or a hint about which language has been selected for recognition by Speech-to-Text.

2.- Speech-to-Text can return an empty response for several reasons. The source of the problem could be the RecognitionConfig or the audio itself.

3.- To reduce the chance that only noise gets passed and the speech is lost, you can pre-process the audio as the best practices doc describes.
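To make the three cases above a bit more concrete, here is a minimal sketch of how a client might label each streaming response. It is not an official recipe. It assumes the response objects expose the fields I understand `StreamingRecognizeResponse` to have (`error`, `results`, and on each result `alternatives`, `is_final` and `language_code`). The helper names `classify_response` and `fake_response`, the labels, and the `LOW_CONFIDENCE` threshold are hypothetical. The `language_code` check is only likely to be useful when alternative language codes were set in the config, and low confidence is just a heuristic hint for noise, not a dedicated flag.

```python
from types import SimpleNamespace

# Hypothetical threshold; tune it against your own audio.
LOW_CONFIDENCE = 0.5


def classify_response(response, expected_language="en-US"):
    """Return a coarse label for one streaming response-like object."""
    error = getattr(response, "error", None)
    # A non-zero status code means the stream reported an error.
    if error is not None and getattr(error, "code", 0) != 0:
        return "error"

    results = list(getattr(response, "results", []) or [])
    finals = [r for r in results if getattr(r, "is_final", False)]
    if not finals:
        # Either only interim hypotheses so far, or nothing recognized.
        return "interim" if results else "empty"

    result = finals[0]
    alternatives = list(getattr(result, "alternatives", []) or [])
    if not alternatives or not alternatives[0].transcript.strip():
        return "empty"

    # Compare case-insensitively, since the casing of the tag may differ.
    detected = getattr(result, "language_code", "") or ""
    if detected and detected.lower() != expected_language.lower():
        return "other_language"

    if alternatives[0].confidence < LOW_CONFIDENCE:
        return "low_confidence"
    return "ok"


def fake_response(transcript, confidence, language_code="en-us"):
    """Build a stand-in object shaped like a final streaming response."""
    alt = SimpleNamespace(transcript=transcript, confidence=confidence)
    result = SimpleNamespace(
        alternatives=[alt], is_final=True, language_code=language_code
    )
    return SimpleNamespace(results=[result], error=None)


if __name__ == "__main__":
    print(classify_response(SimpleNamespace(results=[], error=None)))  # empty
    print(classify_response(fake_response("hello", 0.9)))              # ok
    print(classify_response(fake_response("hola", 0.9, "es-es")))      # other_language
    print(classify_response(fake_response("uh", 0.2)))                 # low_confidence
```

In a real client you would call something like `classify_response` on each item yielded by the streaming call instead of the stand-in objects. If a whole stream ends without a single "ok" label, that would suggest the no-input or noise-only case, and the RecognitionConfig and the audio are the first things to check.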
