I am using Google Speech-to-Text to transcribe live audio. When transcribing an audio file, the automatic language detection feature correctly detects the language the user is speaking, but I cannot get the same result with live (streaming) audio. I followed every sample and piece of documentation that Google makes available, but still no luck.
Platform: Python 3.9
Here is my snippet:
config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=RATE,
    language_code=language_code,
    model="command_and_search",
    alternative_language_codes=['mr-IN', 'en-IN']
)

streaming_config = speech.StreamingRecognitionConfig(
    config=config,
    interim_results=True
)

with MicrophoneStream(RATE, CHUNK) as stream:
    audio_generator = stream.generator()
    requests = (
        speech.StreamingRecognizeRequest(audio_content=content)
        for content in audio_generator
    )
    responses = client.streaming_recognize(streaming_config, requests)
Any help will be appreciated.
Thanks
It is not clear what exactly differs with live audio. Could you please clarify?
1) What differences did you notice in your results?
2) Which documentation did you follow?
A whole range of issues can come up with live audio, for example:
- speaker diarization (multiple-speaker recognition) cannot identify the different speakers
- pauses, or the start or end of speech, cannot be determined
- some special words are not recognized, etc.
Many factors and RecognitionConfig parameters are taken into account when transcribing audio, for example encoding, sampleRateHertz, language code, speechContext, and the length of the speech.
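To narrow down which of these factors is involved, one option is to log which language the service reports for each finalized streaming result. The following is only a minimal sketch: it assumes each result exposes `is_final` and `language_code` attributes (as I understand streaming results do, but please verify this against the API reference for your client library version). The helper name `summarize_detected_languages` and the stand-in response objects are hypothetical and only illustrate the shape of the data.

```python
from collections import Counter
from types import SimpleNamespace


def summarize_detected_languages(responses):
    """Count the language_code reported on each final streaming result."""
    counts = Counter()
    for response in responses:
        for result in response.results:
            # Interim results can still change, so only count finalized ones.
            if not result.is_final:
                continue
            # Assumption: language_code may be missing or empty on a result.
            code = getattr(result, "language_code", "") or "unknown"
            counts[code] += 1
    return counts


# Stand-in objects shaped like streaming responses, for a dry run without
# calling the API (hypothetical data, not real service output).
fake_responses = [
    SimpleNamespace(results=[SimpleNamespace(is_final=False, language_code="en-in")]),
    SimpleNamespace(results=[SimpleNamespace(is_final=True, language_code="mr-in")]),
    SimpleNamespace(results=[SimpleNamespace(is_final=True, language_code="")]),
]
language_counts = summarize_detected_languages(fake_responses)
print(dict(language_counts))
```

With a real stream, the same function could be given the `responses` iterator returned by `client.streaming_recognize(...)`. If every final result comes back under the primary language or as "unknown", that would suggest the alternative languages are not being applied to the streaming request.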
You may find the following documentation helpful:
[1] https://cloud.google.com/speech-to-text/docs/concepts
[2] https://cloud.google.com/speech-to-text/docs/basics
[3] https://cloud.google.com/speech-to-text/docs/best-practices
[4] https://cloud.google.com/speech-to-text/docs/adaptation-model
[5] https://cloud.google.com/architecture/architecture-for-production-ready-live-transcription-tutorial
[6] https://github.com/googleapis/python-speech/blob/main/samples/microphone/transcribe_streaming_infini...