Speech-to-Text API v2 hints phrase has no effect on homophone

Polikuy · 03-15-2024 12:28 AM

Hello everyone,

I'm currently having a problem using Speech-to-Text API v2 with homophones. I'm trying to get the spoken word "u"/"you" recognized as "u", but the result is always "you" no matter how much boost I give to the phrase "u".

This is the request payload without adaptation:

{
    "config": {
        "model": "short",
        "languageCodes": [
            "en-US"
        ],
        "autoDecodingConfig": {},
        "features": {
            "enableWordConfidence": true,
            "maxAlternatives": 10
        }
    },
    "content": "<base64_audio>"
}

And this is the payload with adaptation:

{
    "config": {
        "model": "short",
        "languageCodes": [
            "en-US"
        ],
        "autoDecodingConfig": {},
        "adaptation": {
            "phraseSets": [
                {
                    "inlinePhraseSet": {
                        "phrases": [
                            {
                                "value": "u",
                                "boost": 20
                            }
                        ]
                    }
                }
            ]
        },
        "features": {
            "enableWordConfidence": true,
            "maxAlternatives": 10
        }
    },
    "content": "<base64_audio>"
}
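
For anyone who wants to reproduce this, here is a minimal Python sketch that builds both request bodies shown above. The helper name `build_recognize_payload` and its parameters are my own and are not part of any Google client library. It only constructs the JSON body and does not send the request.

```python
import base64
import json


def build_recognize_payload(audio_bytes, phrase=None, boost=None):
    """Build the recognize request body from this post (hypothetical helper).

    With phrase=None the body matches the payload without adaptation;
    otherwise a single inline phrase set is added.
    """
    config = {
        "model": "short",
        "languageCodes": ["en-US"],
        "autoDecodingConfig": {},
    }
    if phrase is not None:
        entry = {"value": phrase}
        if boost is not None:
            entry["boost"] = boost
        config["adaptation"] = {
            "phraseSets": [{"inlinePhraseSet": {"phrases": [entry]}}]
        }
    config["features"] = {"enableWordConfidence": True, "maxAlternatives": 10}
    return {
        "config": config,
        # The API expects the audio as a base64 string in "content".
        "content": base64.b64encode(audio_bytes).decode("ascii"),
    }


if __name__ == "__main__":
    fake_audio = b"\x00\x01"  # placeholder bytes, not real audio
    print(json.dumps(build_recognize_payload(fake_audio, "u", 20), indent=4))
```

Calling it without `phrase` gives the first payload, and calling it with `"u"` and `20` gives the second.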

Both requests got the same response: the speech is recognized as "you" even though I gave "u" a boost of 20.

{
    "metadata": {
        "totalBilledDuration": "3s"
    },
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "you",
                    "confidence": 0.66239685,
                    "words": [
                        {
                            "word": "you",
                            "confidence": 0.66239685
                        }
                    ]
                }
            ],
            "resultEndOffset": "2.790s",
            "languageCode": "en-us"
        }
    ]
}
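
To check whether "u" shows up in any alternative at all (the requests ask for up to 10), a small sketch like the following can be run over the response. The function names are hypothetical, and the sample data is just the response above copied into a Python dict.

```python
def collect_transcripts(response):
    """Return (transcript, confidence) for every alternative in the response."""
    found = []
    for result in response.get("results", []):
        for alt in result.get("alternatives", []):
            found.append((alt.get("transcript", ""), alt.get("confidence")))
    return found


def contains_word(response, target):
    """True if any alternative transcript contains `target` as a whole word."""
    return any(
        target in transcript.lower().split()
        for transcript, _ in collect_transcripts(response)
    )


# The response from this post, copied as a Python dict.
sample_response = {
    "metadata": {"totalBilledDuration": "3s"},
    "results": [
        {
            "alternatives": [
                {
                    "transcript": "you",
                    "confidence": 0.66239685,
                    "words": [{"word": "you", "confidence": 0.66239685}],
                }
            ],
            "resultEndOffset": "2.790s",
            "languageCode": "en-us",
        }
    ],
}

if __name__ == "__main__":
    print(collect_transcripts(sample_response))
    print(contains_word(sample_response, "u"))
```

For the response above, only one alternative comes back even though `maxAlternatives` is 10, and "u" is not among the transcripts.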

It looks like adding phraseSets has no effect on recognition in this case. Is there a way to bias recognition toward "u"? Or is there a mistake in the request payload?
