Vertex AI deployed model

king7475 · 04-03-2025 09:33 AM

I have deployed Sesame CSM 1B on Vertex AI, and I think the output should be audio, but I'm not getting it in that format. I also don't know what the input should be.

SuwarnaKale

Hello @king7475,

If you’ve deployed Sesame CSM 1B on Google Cloud Vertex AI but aren’t receiving audio output, the issue likely stems from one of the following:

Incorrect input format (text vs. audio).
Missing task specification (e.g., T2S for text-to-speech).
Vertex AI’s default text-only response.
Lack of post-processing (e.g., converting spectrograms to audio).

To resolve this, ensure your input matches the expected format (text for TTS, audio for S2S) and explicitly request speech output in your API call. If Vertex AI still returns text, additional configuration or a vocoder may be needed.

I hope this helped! 🙂

Best regards,

Suwarna

ilnardo92

Hi @king7475 ,

Thank you for the question! Below is an example of how to get a prediction from Sesame CSM deployed on Google Cloud Vertex AI.

import base64

from google.cloud import aiplatform
from IPython.display import Audio, display

# Each instance is one utterance: a speaker ID and the text to synthesize.
instances = [
    {"speaker": 0, "text": "I just won a million dollar lottery."},
    {"speaker": 1, "text": "You're kidding me!"},
]

# Replace the placeholders with your project ID, region, and endpoint ID.
sesame_endpoint = aiplatform.Endpoint(
    "projects/{your-project-id}/locations/{your-endpoint-region}/endpoints/{your-endpoint-id}"
)

response = sesame_endpoint.predict(instances=instances)

# Each prediction carries base64-encoded audio; decode it and play it inline.
for prediction in response.predictions:
    display(Audio(base64.b64decode(prediction["audio"])))

If you want to know more, check out the official notebook in the model card on Vertex AI Model Garden.
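
If you are running outside a notebook, the same decoded bytes can be written to disk instead of played inline. The following is only a minimal sketch: the helper names `sniff_extension` and `save_predictions`, the output directory, and the file naming are all hypothetical, and it assumes each prediction is a dict whose "audio" value is a base64-encoded audio file, as in the snippet above. The container format is not stated in this thread, so the sketch guesses an extension from the leading bytes and falls back to `.bin`.

```python
import base64
from pathlib import Path


def sniff_extension(data: bytes) -> str:
    """Guess a file extension from the leading bytes; fall back to .bin."""
    if data[:4] == b"RIFF" and data[8:12] == b"WAVE":
        return ".wav"
    if data[:4] == b"OggS":
        return ".ogg"
    if data[:4] == b"fLaC":
        return ".flac"
    return ".bin"


def save_predictions(predictions, out_dir="csm_audio", audio_key="audio"):
    """Decode each prediction's base64 audio and write it into out_dir.

    Assumes every prediction is a dict holding base64 text under audio_key.
    Returns the list of paths that were written.
    """
    out_path = Path(out_dir)
    out_path.mkdir(parents=True, exist_ok=True)
    written = []
    for index, prediction in enumerate(predictions):
        data = base64.b64decode(prediction[audio_key])
        target = out_path / f"utterance_{index}{sniff_extension(data)}"
        target.write_bytes(data)
        written.append(target)
    return written
```

With the response from the example above, this would be called as `save_predictions(response.predictions)`.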

Hope it helped!

Best

nikacalupas

Hi king7475,

Welcome to the Google Cloud Community!

In addition to @SuwarnaKale and @ilnardo92’s input, here are a few approaches to consider:

Inspect Your Vertex AI Endpoint Configuration: To ensure everything is set up correctly, check the endpoint's details in the console, review the associated model for its expected inputs and outputs, and enable request/response logging for debugging.
Incorrect Output Interpretation: Vertex AI may be returning the audio in a format you don't expect, such as base64-encoded text. To obtain the actual audio, you’ll need to decode this data. Check the response from the Vertex AI endpoint to see which field holds the audio and how it is encoded. The field name depends on the model: the example above reads it from 'audio', while other models may use a field such as 'content'.
Permissions: Make sure the service account used by your Vertex AI endpoint has the appropriate permissions to access Cloud Storage buckets if you use Cloud Storage URIs.
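
To see which field of a prediction actually holds the audio, it can help to summarize the response before decoding anything. The sketch below is illustrative only: `decode_if_base64` and `summarize_prediction` are hypothetical helper names, the 64-character threshold is an arbitrary assumption to skip short strings, and it assumes each prediction is a dict (Vertex AI predictions can be any JSON value, so adapt as needed).

```python
import base64
import binascii


def decode_if_base64(value, min_length=64):
    """Return decoded bytes if value looks like a base64 string, else None."""
    if not isinstance(value, str) or len(value) < min_length:
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(value, validate=True)
    except (binascii.Error, ValueError):
        return None


def summarize_prediction(prediction):
    """Map each key of a prediction dict to a short description of its value."""
    summary = {}
    for key, value in prediction.items():
        decoded = decode_if_base64(value)
        if decoded is not None:
            summary[key] = (
                f"base64, {len(decoded)} bytes, starts with {decoded[:4]!r}"
            )
        else:
            summary[key] = type(value).__name__
    return summary
```

Printing `summarize_prediction(response.predictions[0])` shows the key names and, for base64 fields, the first decoded bytes, which usually identify the container (for example `b'RIFF'` for WAV).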

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.