
Problem with diarization using Portuguese Brazil (pt-BR)

I am using Google Speech-to-Text to transcribe my call center's phone calls. All of them are in Portuguese and recorded in WAV format (encoding: LINEAR16, sample rate: 8000 Hz).

I'm using API V2, spoken language (pt-BR), model Telephony, Recognizer (below):

rbarcellos_0-1686662206945.png

But the transcription recognized only one speaker. What am I doing wrong?

rbarcellos_1-1686662458878.png

 

Thank you.

Ricardo Barcellos


kvandres
Former Googler

Good day @rbarcellos,

Welcome to Google Cloud Community!

You are encountering this issue because you have set the minimum speaker count to 1. If there are two different speakers in an audio recording, you should set both the minimum speaker count and the maximum speaker count to 2. Try setting both to 2 instead of 1 and see whether that solves the problem.

Here is a sample POST request from the documentation for audio with two different speakers; the minimum and maximum speaker counts are both set to 2:
https://cloud.google.com/speech-to-text/docs/multiple-voices#use_a_local_file
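As a rough illustration of the suggestion above, here is a minimal Python sketch that builds a request body with both speaker counts set to 2. The helper name build_recognize_body is hypothetical, and the JSON field names (explicitDecodingConfig, languageCodes, features.diarizationConfig, minSpeakerCount, maxSpeakerCount) are my assumption of how the V2 REST API spells them; the linked page documents the V1 request, so please check the names against the V2 reference before relying on this.

```python
import json


def build_recognize_body(min_speakers: int, max_speakers: int) -> dict:
    """Build a V2-style recognize request body with diarization enabled.

    Field names are assumed from the V2 REST reference; verify before use.
    """
    if not 1 <= min_speakers <= max_speakers:
        raise ValueError("expected 1 <= min_speakers <= max_speakers")
    return {
        "config": {
            # Audio format taken from the original question.
            "explicitDecodingConfig": {
                "encoding": "LINEAR16",
                "sampleRateHertz": 8000,
            },
            "languageCodes": ["pt-BR"],
            "model": "telephony",
            "features": {
                # Both counts set to the number of speakers on the call.
                "diarizationConfig": {
                    "minSpeakerCount": min_speakers,
                    "maxSpeakerCount": max_speakers,
                }
            },
        }
    }


body = build_recognize_body(2, 2)
print(json.dumps(body, indent=2))
```

The printed JSON only covers the config part; the audio content or URI and the recognizer path still have to be supplied as described in the documentation.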

Hope this helps!

Diarization not available in pt-BR?

Since Google launched this feature, I have not been able to get it working for pt-BR using longRunningRecognize. When you add the diarization parameters to the request, even following the documentation, the transcription comes back on just one channel.
For shorter audio, it is possible to use recognize and see a difference between the channels; however, the result is unreliable, since the recognizer's speaker diarization confuses the interlocutors. Google shouldn't offer this feature until it's out of beta.
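
For anyone who wants to check how many speakers the recognizer actually distinguished, here is a small Python sketch that groups the returned words by speaker label. The function name words_by_speaker is hypothetical, the sample response is made up for illustration, and I am assuming the per-word field is called speakerLabel (as I understand the V2 JSON response); pass a different label_key if your response uses another name.

```python
from collections import defaultdict


def words_by_speaker(response: dict, label_key: str = "speakerLabel") -> dict:
    """Group recognized words by speaker label from a parsed JSON response."""
    grouped = defaultdict(list)
    for result in response.get("results", []):
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue
        # Only the top alternative is inspected here.
        for word in alternatives[0].get("words", []):
            label = word.get(label_key, "unlabeled")
            grouped[label].append(word.get("word", ""))
    return dict(grouped)


# Made-up response for illustration only, not real API output.
sample = {
    "results": [
        {
            "alternatives": [
                {
                    "words": [
                        {"word": "bom", "speakerLabel": "1"},
                        {"word": "dia", "speakerLabel": "1"},
                        {"word": "obrigado", "speakerLabel": "2"},
                    ]
                }
            ]
        }
    ]
}

speakers = words_by_speaker(sample)
print(len(speakers), "speaker label(s) found:", speakers)
```

If every word ends up under a single label, the diarization settings were probably not applied to the request at all, which is a different problem from the recognizer confusing who said what.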