Hello,
I'm experiencing a persistent, critical failure with the Vertex AI API, and I'm hoping to find out whether others are seeing this issue or whether a Googler can investigate. All requests to the generateContent endpoint are failing.
System Details:
Project ID: curious-nucleus-440815-i6
API Endpoint: europe-west2-aiplatform.googleapis.com
Model: gemini-1.5-pro-002
Error Codes: The primary error is RESOURCE_EXHAUSTED (Code 8). This is causing secondary 429 (Too Many Requests) and 504 (Gateway Timeout) errors.
Comprehensive Troubleshooting Steps Performed:
I have worked extensively to debug this and have ruled out all client-side causes:
Quota Verification: Confirmed via the GCP console that all Vertex AI API quotas are at 0% usage. This is not a project-level quota issue.
Request Complexity: The error occurs even on the most basic requests. As a final test, I reduced my API call's tool definition to a single, minimal function. The request still fails with the same error, proving the issue is not related to the complexity of my request.
No Widespread Outage: The Google Cloud Service Health dashboard shows no active incidents for Vertex AI.
Permissions Confirmed: The service account being used is active and has the required "Vertex AI User" role.
Persistent Issue: The issue has been ongoing for several hours, and our client's exponential backoff retry logic is unable to overcome it.
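For readers comparing notes, retry logic of the kind mentioned above can be sketched as follows. This is a minimal illustration of capped exponential backoff with random jitter, not the actual client code from this project. `ResourceExhausted`, `backoff_delays`, and `call_with_backoff` are hypothetical names, and the base delay, cap, and attempt count are assumed values.

```python
import random
import time


class ResourceExhausted(Exception):
    """Hypothetical stand-in for the client library's 429 / RESOURCE_EXHAUSTED error."""


def backoff_delays(base=1.0, cap=60.0, attempts=5):
    """Return the maximum wait (in seconds) allowed after each attempt."""
    return [min(cap, base * 2 ** i) for i in range(attempts)]


def call_with_backoff(fn, base=1.0, cap=60.0, attempts=5, sleep=time.sleep):
    """Call fn(), retrying on ResourceExhausted with jittered exponential backoff."""
    delays = backoff_delays(base, cap, attempts)
    for i, ceiling in enumerate(delays):
        try:
            return fn()
        except ResourceExhausted:
            if i == len(delays) - 1:
                raise  # out of attempts: surface the last error to the caller
            # Pick a random wait up to the ceiling so that parallel clients
            # do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))
```

If the shortage is on the server side and lasts for hours, as described here, a loop like this will still run out of attempts. Backoff only helps with short-lived contention.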
This appears to be a definitive internal, server-side fault specific to the infrastructure handling our project. Any insights or escalation help would be greatly appreciated.
Thank you.
Hi @SafelySystems,
Welcome to Google Cloud Community!
I understand that you have already performed multiple troubleshooting steps to address the issue. This error may indicate regional capacity exhaustion for the gemini-1.5-pro-002 model, that is, a temporary shortage of resources in that model's regional pool. One option is to send your requests to a different region that supports the model.
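As a rough illustration of the region suggestion, the sketch below tries a list of regions in order and moves to the next one when a request fails with a capacity error. It is a minimal sketch under stated assumptions, not an official client pattern. `ResourceExhausted`, `endpoint_for`, and `generate_with_region_fallback` are hypothetical names, `send` is a caller-supplied function that performs the actual request, and any region other than europe-west2 is an assumption that should be checked against the model's supported locations.

```python
class ResourceExhausted(Exception):
    """Hypothetical stand-in for the client library's 429 / RESOURCE_EXHAUSTED error."""


def endpoint_for(region):
    """Build a regional endpoint host, following the pattern used in this thread."""
    return f"{region}-aiplatform.googleapis.com"


def generate_with_region_fallback(send, regions):
    """Try each region in order; return (region, response) from the first success.

    `send` is a caller-supplied callable that takes an endpoint host and
    performs the request, raising ResourceExhausted on a capacity error.
    """
    failed = []
    for region in regions:
        try:
            return region, send(endpoint_for(region))
        except ResourceExhausted:
            failed.append(region)  # remember the region and try the next one
    raise RuntimeError(f"all regions exhausted: {failed}")
```

A call might look like `generate_with_region_fallback(send, ["europe-west2", "europe-west4"])`, where the second region is only an example. Data residency requirements may limit which regions are acceptable for a given workload.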
In addition, please note that gemini-1.5-pro-002 is a legacy stable model scheduled for retirement on September 24, 2025, with gemini-2.0-flash as the recommended upgrade. However, gemini-2.0-flash is not available in europe-west2. As an alternative, you may consider using gemini-2.5-flash with Provisioned Throughput for a more consistent level of service. This option provides reserved, dedicated capacity that avoids resource contention and queuing, which suits critical workloads that need a consistent and predictable experience.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.