
429 Resource exhausted on VertexAI with just one large request

Hi folks, I'm trying to send a prompt to Gemini (1.5 Pro or 1.5 Flash; both show the same problem). It's a single prompt, not a batch, and it's the first request I've sent that day.

And I get an error: google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/quotas#error-code-429 for more details.

What makes it special, though, is its size:
Prompt Token Count: 605995 # total_tokens
Prompt Character Count: 1294 # total_billable_characters

It's made up as follows:
contents = [
    foo, # Part.from_data
    bar, # Part.from_data
    prompt # Part.from_text
]

I've checked the quotas, and none of them appears to be exceeded. The input token quota is about 4 million, so at roughly 606,000 tokens I'm well under it.

Do you have any idea what might be causing my problem? Thanks!

ACCEPTED SOLUTION

We received information that the issue is caused by resource contention under a new way of handling quotas and requests. It affects version 002 models when many users call 002 in the same region at the same time; version 001 still works because it uses the old way of handling quotas and requests.
Other ways to work around it are to try another region (if you are on pay-as-you-go) or to buy dedicated GSUs via Provisioned Throughput.
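For anyone who wants to automate the "try again later" and "try another region" advice, here is a minimal sketch, not an official recipe. The helper name `call_with_region_fallback`, its parameters, and the retry counts and delays are all made up for illustration. `send` is whatever function issues your request in a given region, and `retryable` is the exception type(s) to retry on, which in this thread would be `google.api_core.exceptions.ResourceExhausted`.

```python
import random
import time
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_region_fallback(
    send: Callable[[str], T],
    regions: Sequence[str],
    retryable: Tuple[Type[BaseException], ...],
    attempts_per_region: int = 3,   # illustrative default, tune to taste
    base_delay: float = 1.0,        # seconds; illustrative default
    max_delay: float = 30.0,        # cap on a single wait; illustrative default
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call send(region) for each region in order, retrying retryable errors.

    Within a region, waits a random time up to an exponentially growing
    bound between attempts. After the last attempt in a region, moves on
    to the next region without waiting. Re-raises the last error if every
    region fails.
    """
    if not regions:
        raise ValueError("regions must not be empty")

    last_error: Optional[BaseException] = None
    for region in regions:
        for attempt in range(attempts_per_region):
            try:
                return send(region)
            except retryable as err:
                last_error = err
                is_last_attempt = attempt == attempts_per_region - 1
                if not is_last_attempt:
                    bound = min(max_delay, base_delay * 2 ** attempt)
                    sleep(random.uniform(0, bound))

    assert last_error is not None
    raise last_error
```

With the Vertex AI SDK, `send` could be a small function that calls `vertexai.init(project=..., location=region)` and then `generate_content(contents)` on the model, with `retryable=(ResourceExhausted,)`. The region list is yours to choose; check which regions actually serve the model version you need. Retrying only helps if the 429 comes from shared capacity being busy, as described above. It will not help if you are genuinely over a quota.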
