
quota error using text model

Hi there!

I'm running into the following quota issue when performing predictions with Vertex AI's text models:

Failed to run inference job. Exceeded rate limits: too many concurrent queries that use ML.GENERATE_TEXT table-valued function for this project

And I can't tell from the quota overview dashboard which quota is actually being hit (meaning I don't even know which one I should ask to be raised...)

[screenshot of the quota overview dashboard]

Any clues?
Pseudo-code below (running on Cloud Run):

import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project=project_id, location="us-central1", credentials=creds)

# Generation parameters for the batch job
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 2048,
    "top_p": 0.2,
    "top_k": 8,
}

model = TextGenerationModel.from_pretrained("text-bison")

# Input prompts (JSONL files in GCS) and output destination prefix
dataset = f"gs://{US_BUCKET_NAME}/metadata/{path}/{file_name}_{file_id}_prompts*"
destination_uri = f"gs://{US_BUCKET_NAME}/metadata/{path}/{file_name}_{file_id}_output"

response = model.batch_predict(
    dataset=dataset,
    destination_uri_prefix=destination_uri,
    model_parameters=parameters,  # was defined but never passed to the job
)
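Not an answer to which quota is being hit, but since the error is a concurrency rate limit, one workaround while waiting for a quota increase is to retry with exponential backoff when the limit is reached. Below is a minimal, generic sketch (the `with_backoff` helper and its parameters are my own naming, not part of the SDK); in practice you would pass `google.api_core.exceptions.ResourceExhausted` as the retryable exception type, since that is what quota errors surface as in the Python client libraries:

```python
import random
import time


def with_backoff(fn, retryable=(Exception,), max_retries=5, base_delay=2.0):
    """Call `fn`, retrying with exponential backoff on retryable errors.

    `retryable` is the tuple of exception types to retry on; for Vertex AI
    quota errors this would typically be
    (google.api_core.exceptions.ResourceExhausted,).
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the quota error
            # Sleep base_delay * 2^attempt seconds, plus jitter so that
            # concurrent workers don't all retry at the same moment.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))


# Hypothetical usage with the batch job above:
# response = with_backoff(
#     lambda: model.batch_predict(
#         dataset=dataset,
#         destination_uri_prefix=destination_uri,
#         model_parameters=parameters,
#     ),
#     retryable=(ResourceExhausted,),
# )
```

This only spreads the load over time; if several Cloud Run instances each launch batch jobs concurrently, you may also want to cap how many run at once.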

4 REPLIES

Facing this issue as well, but with the error coming from BigQuery.

Same here in BigQuery, does anyone know how to solve this?

Same here in BigQuery...

Job exceeded rate limits: Your project exceeded quota for concurrent queries that use ML.GENERATE_TEXT_EMBEDDING table-valued function

Same here, any news?