Hello everyone,
I'm currently having a problem using Speech-to-Text API v2 with homophones. I tried to recognize the spoken word "u/you". The result is always "you", no matter how much boost I give to the phrase "u".
This is the request payload without adaptation:
{
  "config": {
    "model": "short",
    "languageCodes": [
      "en-US"
    ],
    "autoDecodingConfig": {},
    "features": {
      "enableWordConfidence": true,
      "maxAlternatives": 10
    }
  },
  "content": "<base64_audio>"
}
And this is the payload with adaptation:
{
  "config": {
    "model": "short",
    "languageCodes": [
      "en-US"
    ],
    "autoDecodingConfig": {},
    "adaptation": {
      "phraseSets": [
        {
          "inlinePhraseSet": {
            "phrases": [
              {
                "value": "u",
                "boost": 20
              }
            ]
          }
        }
      ]
    },
    "features": {
      "enableWordConfidence": true,
      "maxAlternatives": 10
    }
  },
  "content": "<base64_audio>"
}
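For anyone who wants to reproduce this with different phrases or boost values, here is a minimal Python sketch that builds the same two request bodies. The function name `build_recognize_payload` and its parameters are my own (hypothetical), and it only constructs the JSON; it does not call the API.

```python
import json


def build_recognize_payload(audio_b64, phrases=None, boost=20):
    """Build a request body shaped like the payloads in this post.

    `phrases` is an optional list of strings to bias toward; when it is
    empty or None, no "adaptation" block is added (the first payload).
    """
    config = {
        "model": "short",
        "languageCodes": ["en-US"],
        "autoDecodingConfig": {},
        "features": {"enableWordConfidence": True, "maxAlternatives": 10},
    }
    if phrases:
        # One inline phrase set, with the same boost applied to every phrase.
        config["adaptation"] = {
            "phraseSets": [
                {
                    "inlinePhraseSet": {
                        "phrases": [{"value": p, "boost": boost} for p in phrases]
                    }
                }
            ]
        }
    return {"config": config, "content": audio_b64}


# "<base64_audio>" is a placeholder, exactly as in the payloads above.
plain_payload = build_recognize_payload("<base64_audio>")
adapted_payload = build_recognize_payload("<base64_audio>", phrases=["u"], boost=20)
print(json.dumps(adapted_payload, indent=2))
```

Building both bodies from one function makes it easier to confirm that the adaptation block is the only difference between the two requests.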
Both requests got the same response: the speech is recognized as "you" even though I gave a boost of 20 to "u".
{
  "metadata": {
    "totalBilledDuration": "3s"
  },
  "results": [
    {
      "alternatives": [
        {
          "transcript": "you",
          "confidence": 0.66239685,
          "words": [
            {
              "word": "you",
              "confidence": 0.66239685
            }
          ]
        }
      ],
      "resultEndOffset": "2.790s",
      "languageCode": "en-us"
    }
  ]
}
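To check whether "u" shows up in any returned alternative (not just the top one), a small sketch like the following can scan the parsed response. The helper names are hypothetical, and the `response` dict is just the JSON above pasted in as a Python literal.

```python
def list_alternatives(response):
    """Return (transcript, confidence) pairs for every alternative in every result."""
    pairs = []
    for result in response.get("results", []):
        for alt in result.get("alternatives", []):
            pairs.append((alt.get("transcript", ""), alt.get("confidence")))
    return pairs


def contains_word(response, target):
    """True if any word-level entry in any alternative equals `target` (case-insensitive)."""
    target = target.lower()
    for result in response.get("results", []):
        for alt in result.get("alternatives", []):
            for word_info in alt.get("words", []):
                if word_info.get("word", "").lower() == target:
                    return True
    return False


# The response from this post, copied as a Python dict.
response = {
    "metadata": {"totalBilledDuration": "3s"},
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "you",
                    "confidence": 0.66239685,
                    "words": [{"word": "you", "confidence": 0.66239685}],
                }
            ],
            "resultEndOffset": "2.790s",
            "languageCode": "en-us",
        }
    ],
}

print(list_alternatives(response))
print(contains_word(response, "u"))
```

With the response above, only one alternative comes back even though `maxAlternatives` is 10, so "u" is not present anywhere in the result.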
It looks like adding phraseSets has no effect on recognition in this case. Is there a way to bias the result toward "u"? Or is there a mistake in the request payload?
I'm curious if you ever got this to work. According to Gemini chat's answer:
2. Limited Functionality or Bugs:
There are reports of inconsistent behavior or lack of improvement with inline phrase sets in v2. While the code structure seems correct, the functionality might not be fully functional yet.
Here are some resources for further exploration:
Official Documentation on Adaptation: https://cloud.google.com/speech-to-text/docs/adaptation
Community Discussion on Inline Phrase Sets not Working: https://cloud.google.com/php/docs/reference/cloud-speech/latest/V2.PhraseSet.Phrase
So it really looks like they released v2 with this feature not working, which basically makes it useless. Until it catches up to v1p1beta1, it's impossible to make the switch.