Gemini text-to-speech model

fengye

Hi, I was testing the gemini-2.5-flash-preview-tts model in a Vertex AI Workbench instance but got an error. Other models, such as gemini-2.0-flash, work fine in the same instance. Any suggestions on how to fix this, or is there any setup needed in GCP?

Error: multi_speaker_voice_config parameter is not supported in Vertex AI

The script used is as follows:

from google import genai
from google.genai import types
import wave

# Set up the wave file to save the output:
def wave_file(filename, pcm, channels=1, rate=24000, sample_width=2):
   with wave.open(filename, "wb") as wf:
      wf.setnchannels(channels)
      wf.setsampwidth(sample_width)
      wf.setframerate(rate)
      wf.writeframes(pcm)


prompt = """TTS the following conversation between Aaron and Alice as a podcast."""    
        
# genai_client = genai.Client(vertexai=True,
#                             project=project,
#                             location=location)
        
genai_client = genai.Client(vertexai=True, api_key=MY_API_KEY)
        
response = genai_client.models.generate_content(
            model="gemini-2.5-flash-preview-tts",
            contents=[prompt, input_json],
            config=types.GenerateContentConfig(
                response_modalities=["AUDIO"],
                speech_config=types.SpeechConfig(
                    multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                    speaker_voice_configs=[
                       types.SpeakerVoiceConfig(
                          speaker='Aaron',
                          voice_config=types.VoiceConfig(
                             prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                voice_name='Kore',
                             )
                          )
                       ),
                       types.SpeakerVoiceConfig(
                          speaker='Alice',
                          voice_config=types.VoiceConfig(
                             prebuilt_voice_config=types.PrebuiltVoiceConfig(
                                voice_name='Puck',
                             )
                          )
                       ),
                    ]
                    )
                )
            )
        )


data = response.candidates[0].content.parts[0].inline_data.data

file_name='out.wav'
wave_file(file_name, data)

nikacalupas

Hi fengye,

Welcome to the Google Cloud Community!

The error "multi_speaker_voice_config parameter is not supported in Vertex AI" suggests that multi-speaker TTS is not yet supported when the gemini-2.5-flash-preview-tts model is called through the Vertex AI endpoint.

Here are some workarounds that might help your use case:

Use the Gemini API instead of Vertex AI: the Gemini API currently supports multi-speaker TTS in preview.
Verify the model version and region: make sure you're using the correct model version and that your project has access to the preview features. Certain Gemini 2.5 endpoints may be updated or restricted depending on usage history.
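
To illustrate the first suggestion, here is a minimal sketch of the same request sent through the Gemini API rather than Vertex AI. The only intended change from the original script is that the client is created without vertexai=True, so it assumes you have a Gemini API key. The GEMINI_API_KEY environment variable name, the sample transcript lines, and the helper names write_wave and synthesize_podcast are illustrative assumptions, not part of the original script, and preview models may change behavior.

```python
import os
import wave


def write_wave(filename, pcm, channels=1, rate=24000, sample_width=2):
    """Write raw PCM bytes to a WAV file (defaults mirror the original script)."""
    with wave.open(filename, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)


def synthesize_podcast(api_key, transcript, model="gemini-2.5-flash-preview-tts"):
    """Request two-speaker audio from the Gemini API and return raw PCM bytes."""
    # Imported lazily so the WAV helper above works without the SDK installed.
    from google import genai
    from google.genai import types

    # No vertexai=True here: the client targets the Gemini API, not Vertex AI.
    client = genai.Client(api_key=api_key)

    def voice(speaker, voice_name):
        # Map one speaker name in the transcript to one prebuilt voice.
        return types.SpeakerVoiceConfig(
            speaker=speaker,
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name=voice_name)
            ),
        )

    response = client.models.generate_content(
        model=model,
        contents=transcript,
        config=types.GenerateContentConfig(
            response_modalities=["AUDIO"],
            speech_config=types.SpeechConfig(
                multi_speaker_voice_config=types.MultiSpeakerVoiceConfig(
                    speaker_voice_configs=[
                        voice("Aaron", "Kore"),
                        voice("Alice", "Puck"),
                    ]
                )
            ),
        ),
    )
    return response.candidates[0].content.parts[0].inline_data.data


if __name__ == "__main__":
    # Hypothetical transcript; GEMINI_API_KEY is an assumed variable name.
    transcript = (
        "TTS the following conversation between Aaron and Alice as a podcast.\n"
        "Aaron: Welcome to the show.\n"
        "Alice: Glad to be here."
    )
    pcm = synthesize_podcast(os.environ["GEMINI_API_KEY"], transcript)
    write_wave("out.wav", pcm)
```

The speaker names in the voice configuration need to match the names used in the transcript, as they do in the original script.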

Additionally, keep an eye on the release notes for the latest updates and new features related to Vertex AI.
