
Help with Gemini 1.5 Pro model output token limit in Vertex AI

Hi everyone,

I’m currently using the Gemini 1.5 Pro model on Vertex AI for transcription. However, I’ve run into an issue: the output is getting truncated because of the 8,192-token output limit.

  1. How can I overcome this limitation? Are there any techniques or best practices to handle larger transcription outputs while using this model?

  2. I’m also curious, does Gemini internally use Chirp for transcription? Or is its transcription capability entirely native to Gemini itself?

Any help or insights would be greatly appreciated! Thanks in advance!


Hi @Sakshijain25,

Welcome to Google Cloud Community!

You're hitting the model's maximum output token limit, which is a common issue when a single request needs to produce a large amount of text.

Here are a few workarounds for handling larger transcription outputs:

Chunking - To stay within the token limit, divide the audio into shorter segments and transcribe each one individually. Once each segment is transcribed, you can combine the results.
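
Here's a minimal sketch of the chunking approach in Python, assuming the Vertex AI SDK (google-cloud-aiplatform) and pydub are installed; the project ID, audio file name, segment length, and prompt are all placeholders you'd adjust for your setup:

```python
# Chunked transcription sketch: split the audio, transcribe each piece,
# then stitch the transcripts back together.
# Assumes: pip install google-cloud-aiplatform pydub (pydub needs ffmpeg).
import io

import vertexai
from vertexai.generative_models import GenerativeModel, Part
from pydub import AudioSegment

vertexai.init(project="your-project-id", location="us-central1")  # placeholder project
model = GenerativeModel("gemini-1.5-pro")

# 10-minute segments; tune this so each transcript stays under the output cap.
CHUNK_MS = 10 * 60 * 1000

audio = AudioSegment.from_file("recording.mp3")  # placeholder file
transcripts = []
for start in range(0, len(audio), CHUNK_MS):
    buf = io.BytesIO()
    audio[start:start + CHUNK_MS].export(buf, format="mp3")
    audio_part = Part.from_data(data=buf.getvalue(), mime_type="audio/mp3")
    response = model.generate_content(
        [audio_part, "Transcribe this audio segment verbatim."]
    )
    transcripts.append(response.text)

full_transcript = "\n".join(transcripts)
```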

Use of Longer Context Models - Gemini 1.5 Pro has a very large context window (up to 2 million tokens), so it can accept long audio files as input. Note that this applies to the input side only; the output limit is separate, so for long recordings you will still need chunking to keep each transcript within the output cap.
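
Even so, it's worth explicitly requesting the maximum output size, since the default may be lower. A short sketch, reusing the `model` object and an `audio_part` built as in the chunking example above:

```python
from vertexai.generative_models import GenerationConfig

# Explicitly request the documented ceiling; values above 8,192 are rejected
# for Gemini 1.5 Pro, so longer transcripts still require chunking.
config = GenerationConfig(max_output_tokens=8192, temperature=0.0)
response = model.generate_content(
    [audio_part, "Transcribe this audio segment verbatim."],
    generation_config=config,
)
print(response.text)
```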

Streaming - If your use case allows it, you might consider a streaming approach: send the audio in shorter segments and stream the response for each one, so the transcript arrives token by token in near real time. Keep in mind that streaming changes how the output is delivered, not how much can be generated per request.
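
A sketch of the streaming call, again reusing `model` and `audio_part` from the chunking example:

```python
# Stream the transcript for one segment; tokens print as they arrive.
for chunk in model.generate_content(
    [audio_part, "Transcribe this audio segment verbatim."],
    stream=True,
):
    print(chunk.text, end="", flush=True)
```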

Regarding Gemini's transcription capability, Google hasn't publicly disclosed the specific components used within Gemini. It's likely that its transcription is a combination of Gemini's own multimodal architecture and potentially other Google technologies, including Chirp.

I hope the above information is helpful.