Hello there!
I'm using the Speech-to-Text API to transcribe Amharic in real time. I'm currently on the V1 API version.
The transcription quality is quite bad, with a very high Word Error Rate (above 60%).
For comparison, I tried the Web Speech API in Chrome: the quality is definitely better, and the real-time factor is also very good.
To give a real example, here are translated transcriptions of a short speech:
### Google Speech API V1P1:
**When a woman speaks, the traveler says, "How are you?" "6 women, 6 movies, 6 feet."**
### Web Speech API on Chrome: https://www.google.com/intl/it/chrome/demos/speech.html
**As we saw yesterday, an object has 6 faces when placed in any position. When we say six faces, what do we mean by a face? What are those faces called? What is the first face? The opposite is called the backface.**
### **Reference translated transcription:**
**As we saw yesterday, an object has 6 faces when placed in any position. When we say 6 faces, what do we mean by a face? What are those faces called? What is the first one? We call it the front face. If there is a front face, we call it the opposite of it, which is called the back face.**
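For anyone who wants to reproduce the comparison, here is a minimal sketch of how a Word Error Rate figure like the one above can be computed. The function name `word_error_rate` and the simple lowercase, whitespace-based tokenization are assumptions for illustration, not part of any Google API. It should be run on the original Amharic transcripts rather than on the translations shown here.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level edit distance divided by the number of reference words.

    Hypothetical helper: tokenization is a plain lowercase whitespace split,
    which is an assumption and may need adjusting for Amharic punctuation.
    """
    ref = reference.lower().split()
    hyp = hypothesis.lower().split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the first i-1 reference
    # words and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = prev[j] + 1      # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr

    return prev[-1] / len(ref)


if __name__ == "__main__":
    # One substitution (b -> x) and one deletion (d) over 4 reference words.
    print(word_error_rate("a b c d", "a x c"))  # 0.5
```

Note that WER can exceed 1.0 when the hypothesis contains many inserted words, so a value above 60% does not by itself say how many words were correct.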
So my question is: which model does the Chrome Web Speech API use? Can it be used with the Google Cloud Speech API?
Hi @nefasto,
Welcome to Google Cloud Community!
It looks like you are dealing with poor transcription quality of Amharic speech when using the Google Cloud Speech-to-Text API (v1).
Here are potential ways that might help with your use case:
Answer to your primary question:
You may refer to the documentation below, which covers Google Cloud’s Speech-to-Text APIs and references:
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.