I’m using Google Cloud Speech-to-Text V2 within an Android app.
I'm encountering an issue where the SPEECH_ACTIVITY_END event is never emitted when I use the 'ja-JP' language code, although it works as expected with 'en-US'.
Here is the relevant code snippet for the streaming recognition features:
protected StreamingRecognitionFeatures clientFeaturesOf(GoogleV2AISpeechConfig config) {
    return config.getGoogleV2().getFeatures().apply(
        StreamingRecognitionFeatures.newBuilder()
            .setInterimResults(true)
            .setEnableVoiceActivityEvents(true)
            .setVoiceActivityTimeout(
                StreamingRecognitionFeatures.VoiceActivityTimeout.newBuilder()
                    .setSpeechEndTimeout(Duration.newBuilder().setSeconds(3).build())
                    .build()
            )
            .build()
    );
}
With languageCode = "en-US", the response stream includes the expected events:
SpeechEventType: SPEECH_ACTIVITY_BEGIN
SpeechEventType: SPEECH_EVENT_TYPE_UNSPECIFIED
SpeechEventType: SPEECH_EVENT_TYPE_UNSPECIFIED
...
SpeechEventType: SPEECH_ACTIVITY_END
With languageCode = "ja-JP", everything else remains the same (code, endpoint, model, audio config, etc.), except that the audio content is in Japanese. In this case, however, I never receive the SPEECH_ACTIVITY_END event. The sequence looks like this:
SpeechEventType: SPEECH_ACTIVITY_BEGIN
SpeechEventType: SPEECH_EVENT_TYPE_UNSPECIFIED
SpeechEventType: SPEECH_EVENT_TYPE_UNSPECIFIED
...
// No SPEECH_ACTIVITY_END
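To make the difference between the two runs concrete, here is a minimal sketch of a check over the logged event names. It uses plain strings instead of the library's SpeechEventType enum so that it runs without the client library; the class name SpeechEventCheck and the method hasUnclosedSpeechActivity are hypothetical helpers for illustration and are not part of the Speech-to-Text API.

```java
import java.util.List;

class SpeechEventCheck {
    // Event names as plain strings, mirroring the logged SpeechEventType values.
    static final String BEGIN = "SPEECH_ACTIVITY_BEGIN";
    static final String END = "SPEECH_ACTIVITY_END";

    /**
     * Returns true if the stream finished while a speech segment was still open,
     * i.e. the most recent SPEECH_ACTIVITY_BEGIN was never followed by a
     * SPEECH_ACTIVITY_END. All other event names are ignored.
     */
    static boolean hasUnclosedSpeechActivity(List<String> events) {
        boolean open = false;
        for (String event : events) {
            if (BEGIN.equals(event)) {
                open = true;
            } else if (END.equals(event)) {
                open = false;
            }
        }
        return open;
    }

    public static void main(String[] args) {
        // Sequences copied from the two logs above (the "..." entries are omitted).
        List<String> enUs = List.of(
            BEGIN,
            "SPEECH_EVENT_TYPE_UNSPECIFIED",
            "SPEECH_EVENT_TYPE_UNSPECIFIED",
            END);
        List<String> jaJp = List.of(
            BEGIN,
            "SPEECH_EVENT_TYPE_UNSPECIFIED",
            "SPEECH_EVENT_TYPE_UNSPECIFIED");

        System.out.println("en-US unclosed: " + hasUnclosedSpeechActivity(enUs));
        System.out.println("ja-JP unclosed: " + hasUnclosedSpeechActivity(jaJp));
    }
}
```

Run against the two sequences above, the check reports the en-US stream as closed and the ja-JP stream as still open, which is exactly the behavior I am asking about.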
Is this an expected limitation or behavior for certain languages (like Japanese)?
Is SPEECH_ACTIVITY_END supported only for some language models or locales?
Is there anything I should change in the configuration when using ja-JP?
Any suggestions or insights are greatly appreciated.