Google Speech-to-text for live audio

asnal_rizvi · 01-10-2022 04:05 AM

I was working with Google Speech-to-Text to transcribe live audio. I was able to use the automatic language detection feature to detect which language the user is speaking. It worked perfectly when transcribing an audio file, but I could not get the same result with live audio. I followed every sample and all the documentation Google makes available, but still no luck.
Platform: Python 3.9
Here is my snippet:

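from google.cloud import speech

# MicrophoneStream is the helper class from Google's transcribe_streaming_mic.py
# sample; it is not part of the library and must be copied into this module.
RATE = 16000
CHUNK = int(RATE / 10)  # 100 ms of audio per buffer
language_code = "hi-IN"  # assumed primary language; replace with your own

client = speech.SpeechClient()

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=RATE,
    language_code=language_code,
    model="command_and_search",
    # Additional candidate languages for automatic language detection
    alternative_language_codes=['mr-IN', 'en-IN']
)

streaming_config = speech.StreamingRecognitionConfig(
    config=config, interim_results=True
)

with MicrophoneStream(RATE, CHUNK) as stream:
    audio_generator = stream.generator()
    requests = (
        speech.StreamingRecognizeRequest(audio_content=content)
        for content in audio_generator
    )

    # Returns an iterator of StreamingRecognizeResponse messages. Consume it
    # here, inside the with-block, while the microphone is still open
    # (the sample's listen_print_loop(responses) does this).
    responses = client.streaming_recognize(streaming_config, requests)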
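Because `interim_results=True`, the stream returns provisional results that are later replaced by a final one, and each streaming result reports the language the service detected in its `language_code` field. A minimal sketch for checking detection on live audio, assuming the responses come from the `client.streaming_recognize(...)` call above: it yields the detected language and top transcript of each final result. `final_results` is a hypothetical helper name, and the `SimpleNamespace` objects are stand-ins for real API responses so the example runs without credentials.

```python
from types import SimpleNamespace


def final_results(responses):
    """Yield (language_code, transcript) for each final streaming result.

    Works with StreamingRecognizeResponse messages or any objects exposing
    the same attributes: results, is_final, alternatives, language_code.
    """
    for response in responses:
        for result in response.results:
            # Interim results may still change; skip them.
            if not result.is_final or not result.alternatives:
                continue
            top = result.alternatives[0]  # most likely transcript
            yield result.language_code, top.transcript


# Hypothetical stand-in responses (not real API output):
fake_responses = [
    SimpleNamespace(results=[SimpleNamespace(
        is_final=False, language_code="en-IN",
        alternatives=[SimpleNamespace(transcript="hel")])]),
    SimpleNamespace(results=[SimpleNamespace(
        is_final=True, language_code="en-IN",
        alternatives=[SimpleNamespace(transcript="hello")])]),
]
print(list(final_results(fake_responses)))  # [('en-IN', 'hello')]
```

With a real stream, iterate lazily inside the `with` block, e.g. `for lang, text in final_results(responses): print(lang, text)`, since the microphone generator does not end on its own.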

Any help will be appreciated.

Thanks

sf

It's not very clear what is different with live audio. Can you please clarify a bit?

1) What differences did you notice in your results?

2) Which documentation did you follow?

There are many different issues that can come up, for example:

- with speaker diarization (multiple-speaker recognition), different speakers are not identified;

- spaces/pauses, or the start or end of the speech, cannot be determined;

- some special words are not recognized, etc.
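Several of the issues listed above map to specific RecognitionConfig fields. A sketch, assuming the REST (JSON) field names of the v1 `RecognitionConfig` (the Python client uses the snake_case equivalents, e.g. `sample_rate_hertz`), that enables speaker diarization, word time offsets (useful for locating pauses) and phrase hints. The helper name `build_recognition_config`, the speaker counts and the example phrase are illustrative assumptions, and the code only builds the request body; it does not call the API.

```python
import json


def build_recognition_config(sample_rate_hertz, language_code, phrases=None):
    """Build a RecognitionConfig body using the REST (JSON) field names."""
    config = {
        "encoding": "LINEAR16",
        "sampleRateHertz": sample_rate_hertz,
        "languageCode": language_code,
        # Speaker diarization: words get a speaker tag (counts are assumed).
        "diarizationConfig": {
            "enableSpeakerDiarization": True,
            "minSpeakerCount": 2,
            "maxSpeakerCount": 2,
        },
        # Word start/end times help locate pauses and utterance boundaries.
        "enableWordTimeOffsets": True,
    }
    if phrases:
        # Speech adaptation hints for domain-specific or unusual words.
        config["speechContexts"] = [{"phrases": list(phrases)}]
    return config


cfg = build_recognition_config(16000, "en-IN", phrases=["Speech-to-Text"])
print(json.dumps(cfg, indent=2))
```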

Many factors and RecognitionConfig parameters are taken into account when transcribing audio, for example encoding, sampleRateHertz, languageCode, speechContexts, and the length of the speech.
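The encoding and sampleRateHertz values have to describe the audio that is actually sent. For LINEAR16 (16-bit PCM, 2 bytes per sample), the byte rate follows directly from the sample rate, which gives a quick sanity check on what a microphone stream delivers. A small sketch of that arithmetic; the helper name `linear16_chunk_ms` and the mono default are illustrative assumptions, and `RATE`/`CHUNK` mirror the values used in Google's microphone sample.

```python
def linear16_chunk_ms(num_bytes, sample_rate_hertz, channels=1):
    """Duration in milliseconds of a LINEAR16 (16-bit PCM) audio chunk."""
    bytes_per_sample = 2  # LINEAR16 uses 16-bit samples
    frames = num_bytes // (bytes_per_sample * channels)
    return 1000 * frames / sample_rate_hertz


RATE = 16000
CHUNK = int(RATE / 10)   # frames per buffer
chunk_bytes = CHUNK * 2  # mono LINEAR16

print(chunk_bytes)                            # 3200
print(linear16_chunk_ms(chunk_bytes, RATE))   # 100.0
# The same bytes declared at the wrong rate imply a different duration:
print(linear16_chunk_ms(chunk_bytes, 44100))  # approximately 36.28
```

If the rate in the config does not match the rate the microphone is opened with, the service interprets the samples at the wrong speed, which is one of the first things worth ruling out with live audio.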

You may find the following documentation helpful:

[1] https://cloud.google.com/speech-to-text/docs/concepts
[2] https://cloud.google.com/speech-to-text/docs/basics
[3] https://cloud.google.com/speech-to-text/docs/best-practices
[4] https://cloud.google.com/speech-to-text/docs/adaptation-model
[5] https://cloud.google.com/architecture/architecture-for-production-ready-live-transcription-tutorial
[6] https://github.com/googleapis/python-speech/blob/main/samples/microphone/transcribe_streaming_infini...