I am using Google Cloud streaming recognition (AsyncStreamingRecognize) for speech-to-text conversion in my applications. I have gone through the link below to understand the structure of the response returned by the API:
There are various scenarios where I can end up with invalid input, and I do not understand what the responses would look like in those cases. Invalid scenarios include:
- The user speaks a different language than the one passed in the configuration.
- The user does not speak at all (no input).
- Only noise gets passed, or data is lost.
Is there any parameter inside the response that can point to the above scenarios?
1.- If a user speaks a different language, you can use multi-language recognition in your audio requests: Speech-to-Text supports alternative language codes for all speech recognition methods, and it reports back which language it actually recognized. A good practice is also to show the user a sample phrase, or to tell them which language you have configured Speech-to-Text to recognize. See the sketch after this list.
2.- There are multiple ways Speech-to-Text can return an empty response. The source of the problem can be the RecognitionConfig or the audio itself, so check both the config fields and whether the results actually contain a transcript (also covered in the sketch below).
3.- To avoid passing only noise and losing data, you can pre-process the audio just as the best practices doc recommends, as shown in the second sketch below.
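Here is a minimal sketch of points 1 and 2 using the Python client, assuming the v1p1beta1 surface (where `alternative_language_codes` was introduced) and an `audio_chunks` iterable of raw LINEAR16 bytes that you would supply from your own capture code; the language codes and print statements are illustrative, not prescriptive:

```python
from google.cloud import speech_v1p1beta1 as speech

client = speech.SpeechClient()

# Point 1: declare a primary language plus alternatives; the service picks
# the best match and reports it back in result.language_code.
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",                          # primary language
    alternative_language_codes=["es-ES", "hi-IN"],  # illustrative alternatives
)
streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    single_utterance=True,  # lets the service cut the stream after one utterance
)

def detect_scenarios(audio_chunks):
    """audio_chunks: iterable of raw LINEAR16 byte chunks (assumed helper)."""
    requests = (
        speech.StreamingRecognizeRequest(audio_content=chunk)
        for chunk in audio_chunks
    )
    responses = client.streaming_recognize(
        config=streaming_config, requests=requests
    )

    got_transcript = False
    for response in responses:
        # Point 2: an "empty" response simply carries no results; silence or
        # pure noise typically produces a stream with no final transcript.
        if not response.results:
            continue
        for result in response.results:
            if not result.is_final:
                continue
            transcript = (
                result.alternatives[0].transcript if result.alternatives else ""
            )
            if transcript.strip():
                got_transcript = True
                # Compare the recognized language with config.language_code
                # to detect a language mismatch (your scenario 1).
                print(f"[{result.language_code}] {transcript}")

    if not got_transcript:
        # Your scenarios 2 and 3: nothing usable came back.
        print("No usable speech detected: silence, noise, or unsupported audio.")
```

So rather than one dedicated "invalid input" parameter, you inspect `results` being empty, the transcript being blank, and `result.language_code` differing from the configured language.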
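For point 3, one common pre-processing step is to gate out chunks whose energy sits at the noise floor before they ever reach the stream. A minimal sketch, assuming little-endian 16-bit PCM (LINEAR16) and an illustrative RMS threshold that you would tune against your own microphone's ambient level:

```python
import math
import struct

def rms_16bit(chunk: bytes) -> float:
    """Root-mean-square amplitude of little-endian 16-bit PCM samples."""
    count = len(chunk) // 2
    if count == 0:
        return 0.0
    samples = struct.unpack(f"<{count}h", chunk[: count * 2])
    return math.sqrt(sum(s * s for s in samples) / count)

def speech_chunks(audio_chunks, noise_floor: float = 500.0):
    """Yield only chunks that rise above the (assumed) noise floor.

    noise_floor is an illustrative value: record a second of ambient
    audio from your device and set the threshold just above its RMS.
    """
    for chunk in audio_chunks:
        if rms_16bit(chunk) >= noise_floor:
            yield chunk
```

Feeding `speech_chunks(mic_chunks)` into the request generator from the first sketch means near-silent audio is never sent; the best-practices guidance of capturing at 16 kHz in a lossless encoding (LINEAR16 or FLAC) still applies upstream of this gate.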