
Error: ResourceExhausted: 429 Quota exceeded for aiplatform.googleapis.com/online_prediction_request

Is there any limitation on generating embeddings? I tried to generate embeddings for 152 chunks, each chunk containing only one array element, but I am still encountering the '429 Quota Exceeded' error. I have attached screenshots for reference. The documentation lists the request quota as 600 requests per minute. Can you help me fix this?

(Screenshots attached: KannanG03_1-1691757297307.png, KannanG03_2-1691757324350.png)
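For context, this is roughly what I'm running (simplified; the project, region, model name, and batch size below are placeholders, not my exact code):

```
import vertexai
from vertexai.language_models import TextEmbeddingModel

# Placeholders: project, region, and model name are illustrative.
vertexai.init(project="MY_PROJECT", location="us-central1")
model = TextEmbeddingModel.from_pretrained("textembedding-gecko@003")

chunks = ["chunk 1 text", "chunk 2 text"]  # 152 chunks in my case, one element each

embeddings = []
for i in range(0, len(chunks), 5):
    batch = chunks[i:i + 5]
    # Each get_embeddings() call is one online prediction request,
    # so it counts against the per-minute quota.
    embeddings.extend(model.get_embeddings(batch))
```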

 

7 REPLIES

Hi @KannanG03

Thank you for your inquiry and for raising this with the community.

I understand that the documentation mentions a limit of 600 requests per minute for embeddings. It also notes that quota usage is not determined solely by the number of requests listed in the quota table; it can be affected by other factors as well, such as the number of jobs you run and the accumulated size of the input and output data.


* Resource management requests include any request that is not a job, long-running operation, online prediction request, or Vertex AI Vizier request.

You can review your project's Quotas page to see which related services are exceeding their quota. You may need to request a higher quota limit to resolve the issue.
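In the meantime, a client-side retry with exponential backoff on 429 responses can help you stay within the per-minute limit. Here is a minimal sketch (the wrapped call at the end is just an example of any Vertex AI request):

```
import time

from google.api_core.exceptions import ResourceExhausted


def call_with_backoff(request_fn, max_attempts=5):
    """Retry a Vertex AI call when the per-minute quota is exhausted (HTTP 429)."""
    delay = 2.0
    for attempt in range(max_attempts):
        try:
            return request_fn()
        except ResourceExhausted:
            if attempt == max_attempts - 1:
                raise  # give up after the final attempt
            time.sleep(delay)  # wait for the quota window to refill
            delay *= 2


# Example usage (hypothetical `model` and `batch` from an embedding workflow):
# embeddings = call_with_backoff(lambda: model.get_embeddings(batch))
```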

I also found a community thread that is somewhat related to your concern (service health history); this may be a recurrence that needs the Vertex AI Support team's attention.

Hope this helps.

 

I have a similar problem:

>promise catch: [VertexAI.ClientError]: got status: 429 Too Many Requests, code: undefined

although I'm certainly not hitting the Vertex AI quota limit; see the screenshot attached.

What could be the problem?

Screenshot 2024-02-08 at 07.25.31.png

 

No reply to a problem posted back on 02-07-2024?

I'm getting the same error, but only with LangChain v0.2.

Works:

```
from anthropic import AnthropicVertex

client = AnthropicVertex(region="us-east5", project_id="MY_PROJECT")

message = client.messages.create(
    max_tokens=1024,
    messages=[
        {
            "role": "user",
            "content": "Send me a recipe for banana bread.",
        }
    ],
    model="claude-3-haiku@20240307",
)
print(message.model_dump_json(indent=2))
```
 
Does NOT work:
 
```
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_vertexai import ChatVertexAI

chat = ChatVertexAI(model="claude-3-haiku@20240307", location="us-east5", project="MY_PROJECT")

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
chat.invoke(messages)
```
 
The error:
 
```
Retrying langchain_google_vertexai.chat_models._completion_with_retry.<locals>._completion_with_retry_inner in 4.0 seconds as it raised ResourceExhausted: 429 Quota exceeded for aiplatform.googleapis.com/generate_content_requests_per_minute_per_project_per_base_model with base model: anthropic-claude-3-haiku. Please submit a quota increase request. https://cloud.google.com/vertex-ai/docs/generative-ai/quotas-genai..
```
 
I definitely have not exceeded my quota, since I've made only a handful of requests over the past several minutes.
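One thing that may be worth trying, if your langchain-google-vertexai release ships it, is the dedicated Anthropic wrapper, which goes through the Anthropic messages endpoint (like the working AnthropicVertex snippet above) rather than the generate_content path named in the error. This is a sketch, not a confirmed fix, and parameter names may differ between versions:

```
from langchain_core.messages import HumanMessage, SystemMessage
from langchain_google_vertexai.model_garden import ChatAnthropicVertex

# Sketch only: assumes ChatAnthropicVertex is available in your installed version.
chat = ChatAnthropicVertex(
    model_name="claude-3-haiku@20240307",
    location="us-east5",
    project="MY_PROJECT",
)

messages = [
    SystemMessage(content="You're a helpful assistant"),
    HumanMessage(content="What is the purpose of model regularization?"),
]
print(chat.invoke(messages).content)
```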

Any update on this? I'm having a similar issue with `text-unicorn`. It was working in an earlier version of LangChain, but after updating to the most recent release, I'm getting the 429 Quota exceeded error 🤔

I have the same issue: I keep getting a 429 error with the Vertex AI API but am nowhere near my quota limits. (Screenshot attached: Screenshot 2024-11-05 at 20.36.55.png)

I'm also encountering this; I'm not really sure why it's happening.

We are also having this same problem, specifically when using the Gemini chat function. It is making it prohibitively difficult to build and test applications based on AI agents. This is a critical problem. Has anyone made progress on it?
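Not a fix for the quota accounting itself, but if you are calling Gemini through LangChain like the earlier posters, throttling requests client-side has helped in similar situations. A sketch, assuming a langchain-core version recent enough to provide InMemoryRateLimiter and the rate_limiter field (the model name and rates are examples):

```
from langchain_core.rate_limiters import InMemoryRateLimiter
from langchain_google_vertexai import ChatVertexAI

# Stay well under the project's requests-per-minute quota (values are examples).
limiter = InMemoryRateLimiter(requests_per_second=0.5, max_bucket_size=5)

chat = ChatVertexAI(
    model="gemini-1.5-pro",   # example model name
    project="MY_PROJECT",
    location="us-central1",
    rate_limiter=limiter,
)

print(chat.invoke("Explain what a 429 ResourceExhausted error means.").content)
```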