Re: Problem with diarization using Portuguese Braz...

rbarcellos · 06-13-2023 06:21 AM

I am using Google Speech-to-Text to transcribe my call center's phone calls. All of them are in Portuguese and recorded in WAV format (encoding: LINEAR16, sample rate: 8000 Hz).
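To confirm that a recording really matches the format stated above (16-bit LINEAR16 at 8000 Hz), and to see how many channels it has, you can read the WAV header with Python's standard `wave` module. This is a minimal sketch. `describe_wav` is a hypothetical helper, and `wave` only handles uncompressed PCM files, so a mu-law or A-law recording would raise `wave.Error` instead.

```python
import wave


def describe_wav(path: str) -> dict:
    """Read a WAV header and report the properties relevant to Speech-to-Text."""
    with wave.open(path, "rb") as wav:
        frames = wav.getnframes()
        rate = wav.getframerate()
        return {
            "channels": wav.getnchannels(),
            "sample_rate_hz": rate,
            # LINEAR16 means 16-bit PCM samples, i.e. a sample width of 2 bytes.
            "is_16_bit_pcm": wav.getsampwidth() == 2,
            "duration_s": frames / rate,
        }
```

The duration is also worth checking. Synchronous `recognize` only accepts short audio (roughly one minute), so longer calls have to go through long-running or batch recognition.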

I'm using API V2 with spoken language pt-BR, the telephony model, and the following Recognizer:

But the transcription recognizes only one speaker. What am I doing wrong?

Thank you.

Ricardo Barcellos

kvandres

Good day @rbarcellos,

Welcome to Google Cloud Community!

You are encountering this issue because you have set the minimum speaker count to 1. If an audio recording contains two different speakers, you should set both the minimum and the maximum speaker count to 2. Try setting both to 2 instead of 1 and see whether that solves the problem.
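Here is a sketch of that change for the V2 API. The function below builds a recognition config, using the REST/JSON field names, with both speaker counts set to 2. It illustrates the idea under some assumptions and is not the poster's actual Recognizer. The helper name is hypothetical, and `autoDecodingConfig` is used because WAV files carry their own header. It is still worth checking the Speech-to-Text supported-languages table to confirm that diarization is available for `pt-BR` with the `telephony` model.

```python
import json


def diarization_config(language: str, model: str, min_speakers: int, max_speakers: int) -> dict:
    """Build a V2 RecognitionConfig (REST/JSON field names) with speaker diarization."""
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("need 1 <= min_speakers <= max_speakers")
    return {
        # WAV files have a header, so let the service detect encoding and sample rate.
        "autoDecodingConfig": {},
        "languageCodes": [language],
        "model": model,
        "features": {
            "diarizationConfig": {
                "minSpeakerCount": min_speakers,
                "maxSpeakerCount": max_speakers,
            }
        },
    }


# A call between an agent and a customer: exactly two speakers.
config = diarization_config("pt-BR", "telephony", min_speakers=2, max_speakers=2)
print(json.dumps({"config": config}, indent=2))
```

Setting the minimum and maximum to the same value tells the service to look for exactly that many speakers, which matches a two-party call.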

The documentation includes a sample POST request for audio with two different speakers. In that sample, the minimum and maximum speaker counts are both set to 2 because there are two speakers in the audio:
https://cloud.google.com/speech-to-text/docs/multiple-voices#use_a_local_file

Hope this helps!

Yasumitsu

Diarization not available in pt-BR?

diegoCassol

Since Google launched this feature, I have not been able to get it working for pt-BR with longRunningRecognize. Even when I add the diarization parameters to the processing code exactly as the documentation describes, the transcription comes out on just one channel.
For audio shorter than one minute, you can use the synchronous recognize method and compare the channels. However, the result is unreliable because the speaker diarization confuses the interlocutors. Google shouldn't offer this feature until it's out of beta.
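One way to check whether diarization produced more than one speaker is to group the words in the response by their speaker label. The sketch below assumes the JSON returned by the V1 `longRunningRecognize` operation (its `response` field) with diarization enabled. In V1, the final result carries the complete word list with a speaker label on each word. Depending on the API version, that label may be the integer `speakerTag` or the string `speakerLabel`, so the hypothetical helper accepts either. The sample response is made up for illustration.

```python
from itertools import groupby


def _speaker(word: dict) -> str:
    # Accept either the integer speakerTag or the string speakerLabel.
    return str(word.get("speakerLabel") or word.get("speakerTag") or "unknown")


def speaker_turns(response: dict) -> list[tuple[str, str]]:
    """Group consecutive words from the final result into (speaker, text) turns."""
    results = response.get("results", [])
    if not results:
        return []
    words = results[-1]["alternatives"][0].get("words", [])
    return [
        (speaker, " ".join(w["word"] for w in group))
        for speaker, group in groupby(words, key=_speaker)
    ]


# Hypothetical response: the last result holds every word with its speaker tag.
sample = {"results": [
    {"alternatives": [{"transcript": "alô bom dia", "words": []}]},
    {"alternatives": [{"words": [
        {"word": "alô", "speakerTag": 1},
        {"word": "bom", "speakerTag": 2},
        {"word": "dia", "speakerTag": 2},
    ]}]},
]}
turns = speaker_turns(sample)
print(turns)  # [('1', 'alô'), ('2', 'bom dia')]
print(len({speaker for speaker, _ in turns}))  # 2
```

If every turn comes back with the same label, the service has effectively treated the call as a single speaker, which is the symptom described in this thread.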
