Hi!
I'm facing inconsistent performance with the Vertex AI Gemini API (Flash 2.0). Latency fluctuates significantly, sometimes exceeding a minute, and I'm seeing intermittent, random errors. The issues are temporary, with performance returning to normal after a delay. I've ruled out input/output as the cause. Is it possible these problems are related to quota or other service limitations?
Hi @TomerH,
Welcome to Google Cloud Community!
When encountering inconsistent performance, including latency fluctuations and intermittent errors, with the Vertex AI Gemini API, it's important to consider quota and service limitations as potential causes. Here are the possible causes and solutions:
1. Quota Limits:
2. API Rate Limits:
3. Service Availability and Regional Issues:
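For the quota and rate-limit cases above, a common client-side mitigation is to retry transient failures with exponential backoff and jitter. The following is only a minimal sketch under stated assumptions, not an official Vertex AI recipe: `TransientError` and `call_with_backoff` are hypothetical names, and you would need to map whatever retryable errors your client actually raises (for example, HTTP 429 or 503 responses) onto `TransientError` yourself.

```python
import random
import time


class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure (e.g. HTTP 429 or 503)."""


def call_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                      sleep=time.sleep):
    """Call fn(); on TransientError, wait and retry with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            # The delay ceiling doubles on each attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * (2 ** attempt))
            # Full jitter: wait a random time in [0, cap] so that many
            # clients do not retry in lockstep.
            sleep(random.uniform(0, cap))
```

To use it, wrap your request in a zero-argument callable, e.g. `call_with_backoff(lambda: send_request(prompt))`, where `send_request` is a placeholder for your own function that raises `TransientError` on retryable failures. Retries help with short rate-limit spikes; they will not fix a regional outage or a quota that is simply too low.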
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Thank you for your response.
I had already increased my quota limits, so I don't believe that's the issue.
I suspect it was a service availability issue. Moving forward, I will keep an eye on the Status dashboard.
(I did notice that another user reported the same issue on this board. Hopefully it's now resolved).