Google Cloud Speech to Text V2 streaming audio feed from a microphone

spy · 01-24-2024 12:27 AM

I'm running streaming speech-to-text on an Android device with microphone input, and it works smoothly with V1.

Here is the V1 tutorial:
https://cloud.google.com/speech-to-text/docs/transcribe-streaming-audio

I used StreamingRecognizeRequest with a ResponseObserver as the callback, and the final transcripts were returned through it.

However, when I tried to migrate the code to V2, it no longer worked.
Here is my code (Java):

```java
import android.util.Log;

import com.google.api.gax.rpc.ResponseObserver;
import com.google.api.gax.rpc.StreamController;
import com.google.cloud.speech.v2.AutoDetectDecodingConfig;
import com.google.cloud.speech.v2.RecognitionConfig;
import com.google.cloud.speech.v2.StreamingRecognitionConfig;
import com.google.cloud.speech.v2.StreamingRecognizeRequest;
import com.google.cloud.speech.v2.StreamingRecognizeResponse;
import com.google.protobuf.ByteString;

ResponseObserver<StreamingRecognizeResponse> responseObserver = new ResponseObserver<>() {
    @Override
    public void onStart(StreamController controller) {
        Log.d(TAG, "onStart = " + controller);
    }

    @Override
    public void onResponse(StreamingRecognizeResponse response) {
        // Log the whole response so any results would be visible
        Log.d(TAG, "onResponse = " + response);
    }

    @Override
    public void onComplete() {
        Log.d(TAG, "onComplete = ");
    }

    @Override
    public void onError(Throwable t) {
        Log.d(TAG, "onError = " + t);
    }
};

// First request: recognizer name and streaming config, no audio
RecognitionConfig recognitionConfig = RecognitionConfig.newBuilder()
        .addLanguageCodes("en-US")
        .setAutoDecodingConfig(AutoDetectDecodingConfig.newBuilder().build())
        .build();
StreamingRecognitionConfig streamingRecognitionConfig = StreamingRecognitionConfig.newBuilder()
        .setConfig(recognitionConfig)
        .build();
StreamingRecognizeRequest streamingRecognizeRequest = StreamingRecognizeRequest.newBuilder()
        .setStreamingConfig(streamingRecognitionConfig)
        .setRecognizer(recognizer.getName())
        .build();
mClientStream = mSpeechClient.streamingRecognizeCallable().splitCall(responseObserver);
mClientStream.send(streamingRecognizeRequest);

// Receive audio buffers continuously; each later request carries an audio chunk
if (mAudioEmitter != null) {
    mAudioEmitter.start((ByteString bytes) -> {
        StreamingRecognizeRequest.Builder sBuilder = StreamingRecognizeRequest.newBuilder()
                .setRecognizerBytes(recognizer.getNameBytes())
                .setAudio(bytes);
        mClientStream.send(sBuilder.build());
    });
}
```
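The audio callback in the code above forwards raw microphone bytes. The emitter itself isn't shown, so as an assumption: if it reads 16-bit samples from AudioRecord into a short[], those samples have to be packed as signed 16-bit little-endian PCM (the layout LINEAR16 describes) before being wrapped in a ByteString. A minimal sketch follows; Linear16Packer and its methods are hypothetical helpers, not part of the Speech API, and 100 ms is only an illustrative chunk duration.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.util.Arrays;

// Hypothetical helper (not part of the Speech API): packs 16-bit PCM samples,
// e.g. from AudioRecord.read(short[], int, int), into little-endian bytes
// and computes how many bytes a chunk of a given duration occupies.
final class Linear16Packer {

    private Linear16Packer() {}

    /** Packs the first {@code count} samples as signed 16-bit little-endian bytes. */
    static byte[] toLittleEndianBytes(short[] samples, int count) {
        ByteBuffer buffer = ByteBuffer.allocate(count * 2).order(ByteOrder.LITTLE_ENDIAN);
        // The short view inherits the little-endian order set above
        buffer.asShortBuffer().put(samples, 0, count);
        return buffer.array();
    }

    /** Bytes of 16-bit PCM for a duration: rate * channels * 2 bytes * ms / 1000. */
    static int bytesPerChunk(int sampleRateHertz, int channels, int durationMs) {
        // long arithmetic avoids overflow for long durations or high sample rates
        return (int) ((long) sampleRateHertz * channels * 2 * durationMs / 1000);
    }

    public static void main(String[] args) {
        short[] samples = {1, -2, 0x1234};
        System.out.println(Arrays.toString(toLittleEndianBytes(samples, samples.length)));
        // prints [1, 0, -2, -1, 52, 18]
        System.out.println(bytesPerChunk(16000, 1, 100));
        // prints 3200
    }
}
```

The resulting array could then be wrapped with ByteString.copyFrom(bytes) and passed to setAudio(...). If the emitter already reads into a byte[], the data is typically already in native (little-endian) order on Android, so no repacking should be needed.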

I realized that one of the differences between V1 and V2 is the Recognizer resource, so I set that parameter and made sure it is correct.
But it still doesn't work: onStart() is called, but onResponse() never is.
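Since the Recognizer is now named in the requests, its full resource name, and the endpoint the client connects to, are worth double-checking. A minimal sketch that builds both strings, assuming the V2 name format projects/{project}/locations/{location}/recognizers/{recognizer}, `_` as the recognizer ID for an implicit recognizer, and the `{location}-speech.googleapis.com:443` endpoint pattern for regional locations; SpeechV2Names and "my-project" are hypothetical placeholders.

```java
// Hypothetical helper for V2 resource names and endpoints. Assumes:
//   - recognizer names look like projects/{project}/locations/{location}/recognizers/{id}
//   - "_" as the recognizer ID selects an implicit (empty) recognizer
//   - non-global locations are served from "{location}-speech.googleapis.com:443"
final class SpeechV2Names {

    private SpeechV2Names() {}

    static String recognizerName(String project, String location, String recognizerId) {
        return String.format("projects/%s/locations/%s/recognizers/%s",
                project, location, recognizerId);
    }

    static String endpointFor(String location) {
        return "global".equals(location)
                ? "speech.googleapis.com:443"
                : location + "-speech.googleapis.com:443";
    }

    public static void main(String[] args) {
        System.out.println(recognizerName("my-project", "global", "_"));
        // prints projects/my-project/locations/global/recognizers/_
        System.out.println(endpointFor("us-central1"));
        // prints us-central1-speech.googleapis.com:443
    }
}
```

For a regional location, the endpoint would typically be passed through SpeechSettings.newBuilder().setEndpoint(...) before SpeechClient.create(settings), so that the client and the recognizer's location agree.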

Also, the V2 developer guides have no samples for live audio input (a microphone); they all cover audio file recognition.
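One difference between the file-based samples and microphone input that may be worth checking (an assumption, not confirmed here): audio files usually come in a container such as WAV, FLAC or MP3 whose header describes the encoding, while AudioRecord delivers headerless PCM. If AutoDetectDecodingConfig, as used in the code above, depends on such a header, a raw microphone stream would give it nothing to detect. A minimal sketch that checks a buffer for a RIFF/WAVE header to illustrate the difference; WavHeaderCheck is a hypothetical helper.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper: reports whether a buffer starts with a RIFF/WAVE header.
// WAV files carry one; raw PCM read from AudioRecord does not, so nothing in
// the stream itself states the sample rate or encoding.
final class WavHeaderCheck {

    private WavHeaderCheck() {}

    static boolean hasWavHeader(byte[] data) {
        if (data.length < 12) {
            return false; // too short to hold "RIFF" + 4-byte size + "WAVE"
        }
        String riff = new String(data, 0, 4, StandardCharsets.US_ASCII);
        String wave = new String(data, 8, 4, StandardCharsets.US_ASCII);
        return riff.equals("RIFF") && wave.equals("WAVE");
    }

    public static void main(String[] args) {
        byte[] wav = "RIFF\0\0\0\0WAVEfmt ".getBytes(StandardCharsets.US_ASCII);
        byte[] rawPcm = new byte[3200]; // 100 ms of 16 kHz mono silence, no header
        System.out.println(hasWavHeader(wav));    // true
        System.out.println(hasWavHeader(rawPcm)); // false
    }
}
```

If that assumption holds, headerless microphone audio would instead be described explicitly, for example with RecognitionConfig.newBuilder().setExplicitDecodingConfig(...) using ExplicitDecodingConfig.AudioEncoding.LINEAR16 together with setSampleRateHertz(...) and setAudioChannelCount(...) matching the AudioRecord settings.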

Is there some restriction in V2 that prevents this?

Thanks
