Hi folks, I'm trying to send a prompt to Gemini (1.5 Pro or 1.5 Flash, same problem). It's a single prompt, not a batch of several, and it's the first request I've tried to send that day.
And I get an error: google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted. Please try again later. Please refer to https://cloud.google.com/vertex-ai/generative-ai/docs/quotas#error-code-429 for more details.
What makes it special, though, is its size:
Prompt Token Count: 605995 # total_tokens
Prompt Character Count: 1294 # total_billable_characters
It's made up as follows:
contents = [
    foo,     # Part.from_data
    bar,     # Part.from_data
    prompt,  # Part.from_text
]
I've checked the quotas, and none of them seem to be breached. And the input token quota is about 4 million, so I'm well under.
Do you have any idea what might be causing my problem? Thanks!
We received information that the issue is caused by resource constraints resulting from a new way of handling quotas and requests. It happens with version 002 (when many users try to use 002 at the same time in a given region), but 001 still works because it uses the old way of handling quotas and requests.
Other ways to solve it are to try another region (if you are on pay-as-you-go) or to buy dedicated GSUs via Provisioned Throughput.
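For the "try another region" route, here is a rough sketch of a retry helper with region fallback and exponential backoff. It is only an illustration, not an official pattern: the helper name call_with_region_fallback, its parameters, and the region list are hypothetical, and the Vertex AI wiring is shown only in comments, assuming the usual vertexai.init(location=...) and GenerativeModel(...).generate_content(...) calls.

```python
import time
from typing import Callable, Optional, Sequence, Type, TypeVar

T = TypeVar("T")


def call_with_region_fallback(
    call: Callable[[str], T],
    regions: Sequence[str],
    retriable: Type[Exception],
    attempts_per_region: int = 2,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run call(region) for each region in order, retrying on `retriable`.

    Hypothetical helper: within one region the wait doubles after each
    failed attempt; once a region's attempts are used up, the next region
    is tried. The last error is re-raised if every region fails.
    """
    last_error: Optional[Exception] = None
    for region in regions:
        for attempt in range(attempts_per_region):
            try:
                return call(region)
            except retriable as err:
                last_error = err
                # Exponential backoff: base_delay, 2 * base_delay, ...
                sleep(base_delay * (2 ** attempt))
    if last_error is None:
        raise ValueError("nothing was attempted: check regions and attempts_per_region")
    raise last_error


# Assumed wiring with the Vertex AI SDK (not executed here; the project id
# and region names are placeholders):
#
#   import vertexai
#   from google.api_core.exceptions import ResourceExhausted
#   from vertexai.generative_models import GenerativeModel
#
#   def generate(region):
#       vertexai.init(project="my-project", location=region)
#       return GenerativeModel("gemini-1.5-pro-001").generate_content(contents)
#
#   response = call_with_region_fallback(
#       generate, ["us-central1", "europe-west4"], ResourceExhausted
#   )
```

Pinning the model to 001 in the sketch follows the note above that 001 still uses the old quota handling. Backoff alone may not help if the 002 capacity in a region stays saturated, which is why the helper moves on to the next region rather than retrying indefinitely.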