Hi, I'm trying to use the Speech-to-Text v2 API for transcription and speaker diarization.
Per this supported languages page (https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages), I should be able to create a Recognizer using the "long" model for the language "en-US" that supports diarization.
And yet every time I try to create such a Recognizer (I've tried using both the UI and the API), I get an error message. In the API it says "Recognizer does not support feature: speaker_diarization" and in the UI it shows the attached message.
Am I missing something here?
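For reference, here's roughly the call I'm making (a minimal sketch using the Python client, assuming a recent google-cloud-speech version; the project ID and recognizer ID are placeholders):

```
# Minimal sketch of the failing create-recognizer call (v2 Python client).
# PROJECT_ID and recognizer_id are placeholders.
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"

client = SpeechClient()

operation = client.create_recognizer(
    request=cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/global",
        recognizer_id="diarization-test",
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                language_codes=["en-US"],
                model="long",
                features=cloud_speech.RecognitionFeatures(
                    diarization_config=cloud_speech.SpeakerDiarizationConfig(
                        min_speaker_count=2,
                        max_speaker_count=2,
                    ),
                ),
            ),
        ),
    )
)
# This is where I get "Recognizer does not support feature: speaker_diarization".
print(operation.result())
```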
Hi @jaypinho!
The error appears because no available combination satisfies all of your settings. You can alter one or two of the conditions to make it work.
For example, if you want to keep diarization, you could change the model from 'long' to 'short', and use a specific location, e.g. 'asia-south1', instead of 'global'.
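Something like this, roughly (an untested sketch; note that a regional recognizer has to be created through the matching regional endpoint, and the project ID is a placeholder):

```
# Untested sketch: v2 recognizer with the 'short' model in a specific region.
# Regional resources must be accessed via the region's own endpoint.
from google.api_core.client_options import ClientOptions
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"  # placeholder
LOCATION = "asia-south1"

client = SpeechClient(
    client_options=ClientOptions(api_endpoint=f"{LOCATION}-speech.googleapis.com")
)

operation = client.create_recognizer(
    request=cloud_speech.CreateRecognizerRequest(
        parent=f"projects/{PROJECT_ID}/locations/{LOCATION}",
        recognizer_id="short-diarization-test",
        recognizer=cloud_speech.Recognizer(
            default_recognition_config=cloud_speech.RecognitionConfig(
                auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
                language_codes=["en-US"],
                model="short",
                features=cloud_speech.RecognitionFeatures(
                    diarization_config=cloud_speech.SpeakerDiarizationConfig(
                        min_speaker_count=2,
                        max_speaker_count=6,
                    ),
                ),
            ),
        ),
    )
)
print(operation.result())
```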
I hope it helps.😊
Best wishes,
Thanks for the note! I don't think that's the issue though. Here's a similar error message when I try 'asia-south1' and 'short':
And also, the 'long' / 'en-US' / 'global' combo is listed as acceptable in the Supported Languages page: https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages
I have the same problem with the medical_conversation model. Almost no model supports diarization! Best to remove that from the options!
I also tried every combination I could think of, and it looks like diarization is not supported for any combination on Speech-to-Text v2.
When switching to v1, diarization is available.
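For comparison, a v1 request with diarization looks like this (a minimal sketch; the file path is a placeholder, and synchronous recognize only accepts short audio):

```
# v1 diarization sketch: enable_speaker_diarization lives on the v1 config.
from google.cloud import speech_v1 as speech

client = speech.SpeechClient()

with open("audio.wav", "rb") as f:  # placeholder: short LINEAR16 audio
    audio = speech.RecognitionAudio(content=f.read())

config = speech.RecognitionConfig(
    encoding=speech.RecognitionConfig.AudioEncoding.LINEAR16,
    sample_rate_hertz=16000,
    language_code="en-US",
    diarization_config=speech.SpeakerDiarizationConfig(
        enable_speaker_diarization=True,
        min_speaker_count=2,
        max_speaker_count=6,
    ),
)

response = client.recognize(config=config, audio=audio)

# The last result carries the full word list with per-word speaker tags.
for word in response.results[-1].alternatives[0].words:
    print(word.speaker_tag, word.word)
```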
I'm having this issue on v2 as well (or some version of it).
I'm using the API in GCP with REST calls, and if I submit to a 'recognize' endpoint, I get an error that the audio file is too long for that endpoint.
And when I submit to a 'batchRecognize' endpoint, changing the payload appropriately, I get an error that batchRecognize does not support diarization.
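For reference, this is roughly the shape of the batchRecognize call (shown as a Python-client sketch rather than my raw REST payload; the GCS URI, model, and project ID are placeholders):

```
# Sketch of the failing batchRecognize call with diarization requested.
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"  # placeholder

client = SpeechClient()

operation = client.batch_recognize(
    request=cloud_speech.BatchRecognizeRequest(
        # "_" uses an ad-hoc recognizer defined entirely by this config.
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=cloud_speech.RecognitionConfig(
            auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
            language_codes=["en-US"],
            model="long",  # placeholder; I actually need medical_conversation
            features=cloud_speech.RecognitionFeatures(
                diarization_config=cloud_speech.SpeakerDiarizationConfig(
                    min_speaker_count=2,
                    max_speaker_count=6,
                ),
            ),
        ),
        files=[cloud_speech.BatchRecognizeFileMetadata(uri="gs://my-bucket/audio.wav")],
        recognition_output_config=cloud_speech.RecognitionOutputConfig(
            inline_response_config=cloud_speech.InlineOutputConfig(),
        ),
    )
)
# Fails with: batchRecognize does not support diarization.
print(operation.result())
```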
This is extremely frustrating; I'll need to build out with a different provider/method, which creates quite a lot of overhead for my medical_conversation transcription needs.
The fact that this issue is still ongoing 4 months later suggests to me there's not a lot of usage on V2, or else this would have been fixed by now. Very frustrating!
I'm also trying to use the v2 API with diarization for long-form English audio. The feature table for v2 (https://cloud.google.com/speech-to-text/v2/docs/speech-to-text-supported-languages) doesn't show speaker diarization as supported for any English model, except for medical_conversation (which also doesn't work).
When I try to send an API call with diarization enabled to the v2 endpoint, I get a 403 Permission Denied error (which is misleading).
Perhaps diarization doesn't work and they're no longer supporting it?
Any updates on this? When can we use diarization with the v2 API?
Seconding this. I am trying to use diarization with the v2 API, and all I get is "Error 404: Request contains an invalid argument."
I can't tell what I am doing wrong here. Removing the diarization config makes the error go away, but that is not what I am trying to do.
```
from google.cloud.speech_v2.types import cloud_speech

config = cloud_speech.RecognitionConfig(
    auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
    language_codes=["en-US"],
    model="long",
    features=cloud_speech.RecognitionFeatures(
        enable_automatic_punctuation=True,
        diarization_config=cloud_speech.SpeakerDiarizationConfig(
            min_speaker_count=1,
            max_speaker_count=6,
        ),
    ),
)
```
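And this is how I'm invoking it (a sketch; the audio file and project ID are placeholders):

```
# Invoking the config above against the v2 'recognize' endpoint.
from google.cloud.speech_v2 import SpeechClient
from google.cloud.speech_v2.types import cloud_speech

PROJECT_ID = "my-project"  # placeholder

client = SpeechClient()

with open("audio.wav", "rb") as f:  # placeholder: short audio clip
    content = f.read()

# "_" targets the ad-hoc recognizer, so only `config` matters here.
response = client.recognize(
    request=cloud_speech.RecognizeRequest(
        recognizer=f"projects/{PROJECT_ID}/locations/global/recognizers/_",
        config=config,  # the RecognitionConfig from the snippet above
        content=content,
    )
)
print(response)
# The error only goes away once diarization_config is removed from `config`.
```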
It's already 2025, and Google still hasn't dealt with this bug from three years ago. The response time is really slow. Have they given up on this product? Is there a better alternative? Maybe I should just use an AI model instead.