Hello everyone,
I'm currently having a problem using the Speech-to-Text API v2 with homophones. I tried to recognize the spoken word "u/you". The result is always "you", no matter how much boost I give to the phrase "u".
This is the request payload without adaptation:
{
  "config": {
    "model": "short",
    "languageCodes": [
      "en-US"
    ],
    "autoDecodingConfig": {},
    "features": {
      "enableWordConfidence": true,
      "maxAlternatives": 10
    }
  },
  "content": "<base64_audio>"
}
And this is the payload with adaptation:
{
  "config": {
    "model": "short",
    "languageCodes": [
      "en-US"
    ],
    "autoDecodingConfig": {},
    "adaptation": {
      "phraseSets": [
        {
          "inlinePhraseSet": {
            "phrases": [
              {
                "value": "u",
                "boost": 20
              }
            ]
          }
        }
      ]
    },
    "features": {
      "enableWordConfidence": true,
      "maxAlternatives": 10
    }
  },
  "content": "<base64_audio>"
}
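For reference, the two payloads differ only in the "adaptation" block. A minimal Python sketch of how they can be built from one shared config is below. `BASE_CONFIG` and `build_payload` are hypothetical names used only for illustration, and sending the request to the API is not shown.

```python
import copy
import json

# Shared settings, copied from the payloads in this post.
BASE_CONFIG = {
    "model": "short",
    "languageCodes": ["en-US"],
    "autoDecodingConfig": {},
    "features": {"enableWordConfidence": True, "maxAlternatives": 10},
}


def build_payload(audio_b64, phrases=None):
    """Return a request body; `phrases` maps phrase text to its boost.

    Hypothetical helper: it only assembles the JSON and does not call the API.
    """
    config = copy.deepcopy(BASE_CONFIG)
    if phrases:
        inline_phrases = [
            {"value": value, "boost": boost} for value, boost in phrases.items()
        ]
        config["adaptation"] = {
            "phraseSets": [{"inlinePhraseSet": {"phrases": inline_phrases}}]
        }
    return {"config": config, "content": audio_b64}


plain = build_payload("<base64_audio>")
adapted = build_payload("<base64_audio>", {"u": 20})
print(json.dumps(adapted, indent=2))
```

Building both bodies from the same base makes it easy to confirm that the adaptation block is the only thing that changes between the two requests.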
Both requests return the same response: the speech is recognized as "you" even though I gave "u" a boost of 20.
{
  "metadata": {
    "totalBilledDuration": "3s"
  },
  "results": [
    {
      "alternatives": [
        {
          "transcript": "you",
          "confidence": 0.66239685,
          "words": [
            {
              "word": "you",
              "confidence": 0.66239685
            }
          ]
        }
      ],
      "resultEndOffset": "2.790s",
      "languageCode": "en-us"
    }
  ]
}
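To check whether "u" shows up in any alternative at all, the response can be scanned with a small script. This is only a sketch against the response shape shown above; `find_word` is a hypothetical helper name, not part of any client library.

```python
def find_word(response, target):
    """Return (alternative_rank, confidence) for each word equal to `target`."""
    hits = []
    for result in response.get("results", []):
        for rank, alternative in enumerate(result.get("alternatives", [])):
            for word in alternative.get("words", []):
                if word.get("word", "").lower() == target.lower():
                    hits.append((rank, word.get("confidence")))
    return hits


# The response received for both requests, as posted above.
response = {
    "metadata": {"totalBilledDuration": "3s"},
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "you",
                    "confidence": 0.66239685,
                    "words": [{"word": "you", "confidence": 0.66239685}],
                }
            ],
            "resultEndOffset": "2.790s",
            "languageCode": "en-us",
        }
    ],
}

print(find_word(response, "u"))    # []
print(find_word(response, "you"))  # [(0, 0.66239685)]
```

Even with maxAlternatives set to 10, only one alternative comes back here, and "u" is not in it.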
It looks like adding phraseSets has no effect on recognition in this case. Is there a way to bias recognition toward "u"? Or is there a mistake in the request payload?