Unexpected pauses or lack of responses from the 'gemini-1.5-flash-002' model in Vertex AI

I have been using the "gemini-1.5-flash-002" model with Vertex AI to generate content for the past few weeks. It generally works well, but it occasionally pauses unexpectedly after processing some number of requests.

I attempted to identify a pattern, such as the number of requests or the time elapsed before the pauses occur, but no consistent trend emerged. Sometimes the model handles around 1,500 requests without issue; other times it pauses after approximately 100 requests.

The variation in the number of input tokens between requests is minimal, as the input data is relatively consistent in length.

When a pause occurs, the call hangs for about 10 minutes before the following error is thrown:
```
Traceback (most recent call last):
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 76, in error_remapped_callable
    return callable_(*args, **kwargs)
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/grpc/_channel.py", line 1181, in __call__
    return _end_unary_response_blocking(state, call, False, None)
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/grpc/_channel.py", line 1006, in _end_unary_response_blocking
    raise _InactiveRpcError(state)  # pytype: disable=not-instantiable
grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INTERNAL
details = "Internal error encountered."
debug_error_string = "UNKNOWN:Error received from peer ipv4:142.250.67.170:443 {created_time:"2025-01-02T15:39:22.823504078+05:30", grpc_status:13, grpc_message:"Internal error encountered."}"
>
 
The above exception was the direct cause of the following exception:
 
Traceback (most recent call last):
  File "/home/user/Documents/other_projects/lab/scripts/other/misc.py", line 58, in <module>
    output = context_based_match(text)
  File "/home/user/Documents/other_projects/lab/scripts/other/misc.py", line 31, in context_based_match
    response = model.generate_content(
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/vertexai/generative_models/_generative_models.py", line 619, in generate_content
    return self._generate_content(
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/vertexai/generative_models/_generative_models.py", line 744, in _generate_content
    gapic_response = self._prediction_client.generate_content(request=request)
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/google/cloud/aiplatform_v1/services/prediction_service/client.py", line 2147, in generate_content
    response = rpc(
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/google/api_core/gapic_v1/method.py", line 131, in __call__
    return wrapped_func(*args, **kwargs)
  File "/home/user/Documents/other_projects/lab/.venv/lib/python3.10/site-packages/google/api_core/grpc_helpers.py", line 78, in error_remapped_callable
    raise exceptions.from_grpc_error(exc) from exc
google.api_core.exceptions.InternalServerError: 500 Internal error encountered.
WARNING: All log messages before absl::InitializeLog() is called are written to STDERR
E0000 00:00:1735812565.706618   32847 init.cc:229] grpc_wait_for_shutdown_with_timeout() timed out.
```
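
Since the status is a plain 500 INTERNAL (gRPC status 13) rather than a 429 quota error, the only workaround I can think of is retrying transient server errors with exponential backoff. Here is a minimal sketch of what I mean; the backoff values and the 5-minute overall timeout are arbitrary assumptions, and `generate_with_retry` is just a hypothetical wrapper around the `model` object from my script below:

```
from google.api_core import exceptions, retry

# Retry only transient server-side failures; all backoff values are illustrative.
transient_retry = retry.Retry(
    predicate=retry.if_exception_type(
        exceptions.InternalServerError,   # 500, as in the traceback above
        exceptions.ServiceUnavailable,    # 503
    ),
    initial=1.0,      # first wait: 1 second
    maximum=60.0,     # cap each individual wait at 60 seconds
    multiplier=2.0,   # double the wait after every failed attempt
    timeout=300.0,    # stop retrying after 5 minutes in total
)

@transient_retry
def generate_with_retry(prompt, generation_config):
    # Reuses the `model` object initialized in the script below.
    return model.generate_content(prompt, generation_config=generation_config)
```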

Here is my sample code:

```
import base64
import json
from datetime import datetime

import pytz
import vertexai
from google.oauth2 import service_account  # service-account credentials for auth
from vertexai.generative_models import GenerationConfig, GenerativeModel

indian_tz = pytz.timezone("Asia/Kolkata")

cred_in_base64_encoding = "base64-encoded-google-app-credentials"
google_app_creds = json.loads(base64.b64decode(cred_in_base64_encoding).decode("utf-8"))
credentials = service_account.Credentials.from_service_account_info(
    google_app_creds, scopes=["https://www.googleapis.com/auth/cloud-platform"]
)
# Initialize Vertex AI
vertexai.init(
    project=google_app_creds["project_id"],
    location="europe-west3",
    credentials=credentials,
)
model_name = "gemini-1.5-flash-002"
model = GenerativeModel(model_name)


def context_based_match(input_text):
    llm_query = f"LLM prompt with {input_text}"
    # Your response should be in this JSON-format: {{"relevant": boolean}}
    response = model.generate_content(
        llm_query,
        generation_config=GenerationConfig(
            response_mime_type="application/json",
            max_output_tokens=32,
            temperature=0,
            seed=1102,
            response_schema={
                "type": "object",
                "properties": {
                    "relevant": {
                        "type": "boolean",
                    }
                },
                "required": ["relevant"],
            },
        ),
    )
    try:
        return json.loads(response.text)
    except Exception as e:
        print(f"Error: {e}")
        return {"relevant": None}


data = list()  # List of Texts
for i, text in enumerate(data, 1):
    output = context_based_match(text)
    print(f"{i} | {datetime.now(indian_tz)} | {output['relevant']} | {text}")
​

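To make the stalls easier to pin down, I am considering logging the wall-clock duration of each call in the driver loop. A small sketch of that idea (the 10-second "slow" threshold is an arbitrary assumption):

```
import time

for i, text in enumerate(data, 1):
    start = time.monotonic()
    output = context_based_match(text)
    elapsed = time.monotonic() - start
    # Flag unusually long calls; the 10 s threshold is arbitrary.
    marker = "SLOW" if elapsed > 10 else "ok"
    print(f"{i} | {elapsed:7.2f}s | {marker} | {output['relevant']} | {text}")
```
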
I couldn't find anything related to this issue in the Google documentation or anywhere else online. Even the "Quotas & System Limits" page doesn't show any usage statistics for the "gemini-1.5-flash-002" model; I can only see the "Online prediction requests per minute per region" statistics, shown in the screenshot below.
[Screenshot: "Quotas & System Limits" page showing only "Online prediction requests per minute per region"]
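
Quota consumption can also be read programmatically from Cloud Monitoring; here is a sketch that reuses `credentials` and `google_app_creds` from my script above. The `quota_metric` label below is only my guess at the Gemini request-rate quota name, so it would need to be replaced with the exact name shown on the Quotas page:

```
import time

from google.cloud import monitoring_v3

client = monitoring_v3.MetricServiceClient(credentials=credentials)
project_name = f"projects/{google_app_creds['project_id']}"

# Look at the last hour of quota usage.
now = int(time.time())
interval = monitoring_v3.TimeInterval(
    {"start_time": {"seconds": now - 3600}, "end_time": {"seconds": now}}
)

results = client.list_time_series(
    request={
        "name": project_name,
        # ASSUMPTION: the quota_metric name is a guess; copy the exact
        # name from the Quotas & System Limits page instead.
        "filter": (
            'metric.type="serviceruntime.googleapis.com/quota/rate/net_usage" '
            'AND metric.label.quota_metric="aiplatform.googleapis.com/'
            'generate_content_requests_per_minute_per_project_per_base_model"'
        ),
        "interval": interval,
        "view": monitoring_v3.ListTimeSeriesRequest.TimeSeriesView.FULL,
    }
)
for series in results:
    for point in series.points:
        print(point.interval.end_time, point.value)
```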

Any insights would be appreciated.
