google.api_core.exceptions.ResourceExhausted: 429 ...

_Gerald_ · 11-14-2024 04:40 AM

Hi Community,

I am blocked trying to get an answer from gemini-1.5-flash or gemini-1.5-pro.

I am making a simple call, but I keep getting inconsistent errors (sometimes it works, most of the time it doesn't).

The error is "google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted. Please try again later."

I tried:
- checking my quotas & limits from the console and everything is in the green with less than 20% usage
- changing region
- checking my IAM policy
- checking my API service details:

Methods	Requests	Errors
google.cloud.aiplatform.ui.JobService.ListDataLabelingJobs	9	100%
google.cloud.aiplatform.v1.PredictionService.GenerateContent	144	24.31%
google.cloud.aiplatform.v1.PredictionService.StreamGenerateContent	2	50%
google.cloud.aiplatform.v1beta1.GenAiCacheService.GetCachedContent	4	100%

The code is very simple:

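import time
from typing import Optional

import vertexai
from vertexai.generative_models import GenerativeModel, Part


# `config`, `generation_config`, `safety_settings`, `log`, and
# `extract_filename` are defined elsewhere in the module.
def fetch_and_save_raw_output(
    prompt: str, uri: str, system_prompt: str
) -> Optional[str]: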

    vertexai.init(project=config["project_id"], location=config["location"])
    model = GenerativeModel(config["model_name"], system_instruction=[system_prompt])

    document = Part.from_uri(mime_type="application/pdf", uri=uri)
    try:
        response = model.generate_content(
            [prompt, document],
            generation_config=generation_config,
            safety_settings=safety_settings,
            stream=False,
        )

        # finish_reason is reported per candidate, not on the response object
        if response.candidates:
            finish_reason = response.candidates[0].finish_reason.name
            if finish_reason == "MAX_TOKENS":
                log.warning(f"Response truncated due to MAX_TOKENS for {uri}")
            elif finish_reason != "STOP":
                log.warning(
                    f"Unexpected finish reason: {finish_reason} for {uri}"
                )

        # Save raw response with metadata
        pdf_name = extract_filename(uri)
        output_dict = {
            "metadata": {"uri": uri, "timestamp": time.strftime("%Y-%m-%d %H:%M:%S")},
            "response": response.to_dict(),
        }
        print(f"output_dict: {output_dict}")
        return pdf_name
    except Exception as e:
        log.exception(f"Error fetching and saving raw output for {uri}: {str(e)}")
        return None

MarvinLlamas

Hi @_Gerald_,

Welcome to Google Cloud Community!

The error ‘google.api_core.exceptions.ResourceExhausted: 429 Resource exhausted’ that you encountered suggests you have hit the quota limit or resource limit of your service account.

Here are some potential ways to address your issue:

- Review Quotas and Limits: Since you mentioned that your quotas and limits are in the green zone with less than 20% usage, I recommend double-checking your specific quotas for the services you are utilizing, such as PredictionService and JobService. Some quotas might be exhausted more rapidly than others.
- Retry strategy: Incorporate recovery logic into your code to handle transient errors by using exponential backoff to delay and retry the request.
- API Key and Project Setup: Make sure that your API key and project configuration are properly set up. Misconfigurations can sometimes result in resource exhaustion errors.
- Monitor API rate limits: Some APIs have restrictions on the number of requests you can make within a specific time frame. Ensure you don't hit these limits.
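
For the retry strategy, here is a minimal sketch of exponential backoff with jitter. It uses only the standard library, and the helper name `call_with_backoff` and its parameters are my own illustration, not part of the Vertex AI SDK. In your code you would presumably pass `google.api_core.exceptions.ResourceExhausted` as the exception to retry on; the default attempt count and delays are assumptions you should tune.

```python
import random
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_backoff(
    fn: Callable[[], T],
    retry_on: Tuple[Type[BaseException], ...],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 60.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying on the given exceptions with exponential backoff.

    Hypothetical helper for illustration; defaults are assumptions.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # The delay cap doubles each attempt: base, 2*base, 4*base, ...
            # and never exceeds max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: sleep a random time in [0, cap] so that parallel
            # workers do not all retry at the same moment.
            sleep(random.uniform(0, cap))
    raise ValueError("max_attempts must be at least 1")


# Possible usage with the code from the question (assumes the SDK is installed):
#
#   from google.api_core.exceptions import ResourceExhausted
#   response = call_with_backoff(
#       lambda: model.generate_content([prompt, document]),
#       retry_on=(ResourceExhausted,),
#   )
```

Only the 429 error is retried here, so other exceptions still fail immediately instead of being hidden behind repeated attempts.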

I hope the above information is helpful.

P-Train

Hi,
You can also test Gemini 1.5 Flash version 001, which worked for us.
Another solution is to buy dedicated GSUs with Provisioned Throughput (PT), or to test pay-as-you-go in another region. If it is a production environment, though, the suggestion we received is to use PT.