
quota error using text model

Hi there!

I'm running into the following quota issue when performing predictions with Vertex AI's text models:

Failed to run inference job. Exceeded rate limits: too many concurrent queries that use ML.GENERATE_TEXT table-valued function for this project

And I can't tell from the quota overview dashboard which quota is actually being hit (meaning I don't even know which one I should ask to be raised...)

[screenshot of the quota overview dashboard]

Any clues?
Pseudo-code below (running on Cloud Run):

import vertexai
from vertexai.preview.language_models import TextGenerationModel

vertexai.init(project=project_id, location="us-central1", credentials=creds)

# Generation parameters for the batch job
parameters = {
    "temperature": 0.2,
    "max_output_tokens": 2048,
    "top_p": 0.2,
    "top_k": 8,
}

model = TextGenerationModel.from_pretrained("text-bison")

# Input prompts (JSONL files in GCS) and output destination prefix
dataset = f"gs://{US_BUCKET_NAME}/metadata/{path}/{file_name}_{file_id}_prompts*"
destination_uri = f"gs://{US_BUCKET_NAME}/metadata/{path}/{file_name}_{file_id}_output"

response = model.batch_predict(
    dataset=dataset,
    destination_uri_prefix=destination_uri,
    model_parameters=parameters,  # was defined but never passed to the job
)
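Not an answer to which quota is being hit, but since the error is a concurrency rate limit, one workaround while waiting for a quota increase is to retry with exponential backoff when the limit is reached. Below is a minimal, generic sketch (the `with_backoff` helper and its parameters are my own naming, not part of the SDK); in practice you would pass `google.api_core.exceptions.ResourceExhausted` as the retryable exception type, since that is what quota errors surface as in the Python client libraries:

```python
import random
import time


def with_backoff(fn, retryable=(Exception,), max_retries=5, base_delay=2.0):
    """Call `fn`, retrying with exponential backoff on retryable errors.

    `retryable` is the tuple of exception types to retry on; for Vertex AI
    quota errors this would typically be
    (google.api_core.exceptions.ResourceExhausted,).
    """
    for attempt in range(max_retries):
        try:
            return fn()
        except retryable:
            if attempt == max_retries - 1:
                raise  # out of retries, surface the quota error
            # Sleep base_delay * 2^attempt seconds, plus jitter so that
            # concurrent workers don't all retry at the same moment.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 1))


# Hypothetical usage with the batch job above:
# response = with_backoff(
#     lambda: model.batch_predict(
#         dataset=dataset,
#         destination_uri_prefix=destination_uri,
#         model_parameters=parameters,
#     ),
#     retryable=(ResourceExhausted,),
# )
```

This only spreads the load over time; if several Cloud Run instances each launch batch jobs concurrently, you may also want to cap how many run at once.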

4 REPLIES

Facing this issue as well, but with the error coming from BigQuery.

Same here in BigQuery, does anyone know how to solve this?

Same here in BigQuery...

Job exceeded rate limits: Your project exceeded quota for concurrent queries that use ML.GENERATE_TEXT_EMBEDDING table-valued function

Same here, any news?