Hello,
I can no longer use the Vertex AI API for Gemini models with long contexts; this is the error I get:
run with [gemini-1.5-pro-002] failed:
Unable to submit request because the input token count is 53163 but model only supports up to 32768. Reduce the input token count and try again. You can also use the CountTokens API to calculate prompt token count and billable characters. Learn more: https://cloud.google.com/vertex-ai/generative-ai/docs/learn/models
The same code and the same models used to work, as expected: per the documentation, the Pro and Flash 002 models should support context windows of at least 1M tokens.
I wonder if I have been enrolled in some live experiment that routes requests to a model with a short context window (like the most recent Gemini experimental model, which supports only a 32K context window).
Hi @AskingQuestions,
Welcome to Google Cloud Community!
The error message indicates that the context length you provided (53,163 tokens) exceeds the maximum token limit (32,768 tokens) supported by the Gemini 1.5 Pro model. It’s possible that you're now using a version of the model with a smaller token limit than you expected.
As a temporary workaround, you might consider exploring other models that offer similar context lengths, or test with a smaller prompt to confirm that the issue really is the context window size and not something else in your request.
If the issue persists, I suggest contacting Google Cloud Support as they can provide more insights to see if the behavior you've encountered is a known issue or specific to your project.
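The suggestion above can be sketched as a pre-flight check with the CountTokens API, assuming the Vertex AI Python SDK (`google-cloud-aiplatform`). Note that the 32,768 cap and the `fits_context` helper come from the error message in this thread, not from official documentation:

```python
# Pre-flight check a prompt before calling generateContent.
# OBSERVED_LIMIT is the cap reported by the error in this thread,
# not the documented 1M+ context window.
OBSERVED_LIMIT = 32_768

def count_prompt_tokens(model_name: str, prompt: str) -> int:
    """Count tokens via the CountTokens API (network call; needs GCP credentials)."""
    # Imported lazily so the pure helper below works without the SDK installed.
    from vertexai.generative_models import GenerativeModel  # pip install google-cloud-aiplatform
    return GenerativeModel(model_name).count_tokens(prompt).total_tokens

def fits_context(token_count: int, limit: int = OBSERVED_LIMIT) -> bool:
    """Pure check: would a prompt of this size pass at the observed limit?"""
    return token_count <= limit

# The 53,163-token prompt from the error would be rejected at this limit:
print(fits_context(53_163))  # False
print(fits_context(30_000))  # True
```

Running `count_prompt_tokens("gemini-1.5-pro-002", prompt)` before submitting would let you log the exact token count the service sees and compare it against the limit in the error.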
I hope the above information is helpful.
As mentioned in the initial submission, the models I am using (Gemini 1.5 Pro and Flash 002) have much longer context windows (over a million tokens) than what I am sending (less than 60,000 tokens).
This suggests my requests are sometimes being routed to a different model behind the scenes, since the same request often succeeds without issues.
Facing the same issue, and all of a sudden. I never hit the maximum-tokens error before.
It looks like something changed in one of Google's recent releases. Or am I wrong?
Would love a follow-up on this. Also seeing similar errors with 1.5 Pro (periodically):
[GoogleGenerativeAI Error]: Error fetching from https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro-002:
generateContent: [400 Bad Request] The input token count (42685) exceeds the maximum number of tokens allowed (32768).
I am having the same issue... I would love a follow-up as well!
I'm having the same issue too....
Same issue here!