
Error with MultiModalEmbeddingModel: Too many VMS requests in process:100

I receive the following error when using the GCP multimodal embedding model (vertexai.vision_models.MultiModalEmbeddingModel) in Python:

grpc._channel._InactiveRpcError: <_InactiveRpcError of RPC that terminated with:
status = StatusCode.INVALID_ARGUMENT
details = "Invalid input image string with the following error message: Too many VMS requests in process: 100"
debug_error_string = "UNKNOWN:Error received from peer ipv4:**** {created_time:"*****", grpc_status:3, grpc_message:"Invalid input image string with the following error message: Too many VMS requests in process: 100"}"
...
google.api_core.exceptions.InvalidArgument: 400 Invalid input image string with the following error message: Too many VMS requests in process: 100
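
For context, each worker in my pipeline does roughly the following (simplified; the project ID and image path are placeholders). The error appears when many of these calls run in parallel:

import vertexai
from vertexai.vision_models import Image, MultiModalEmbeddingModel

# Placeholders: substitute your own project and a real image path.
vertexai.init(project="my-project", location="us-central1")
model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

# One online prediction call per image; my pipeline runs many of these
# concurrently, which is when the error above starts appearing.
embeddings = model.get_embeddings(image=Image.load_from_file("path/to/image.jpg"))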

I've been unable to find similar errors online, so I don't understand exactly why this happens or how to solve it.
What I have tested so far is checking the quotas in the GCP project. One was near its maximum, so I increased it to a very high value, but one week after that change I still receive the same error. Here is the current quota configuration:

Vertex AI API

Quota: Regional online prediction requests per base model per minute per region per base_model
Dimensions:
  • region : us-central1
  • base_model : multimodalembedding
Current value: 5,000

I'm open to providing more details, but I'm not sure what is causing this. Thank you in advance.
2 REPLIES

Hi @G_rubio,

Welcome to the Google Cloud Community!

It looks like you are hitting a concurrency limit rather than a quota limit. The backend serving this model handles up to 100 requests at the same time and rejects anything beyond that, which is why raising the per-minute quota didn't change anything.
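
One quick way to verify this is to cap the number of simultaneous requests on the client side. Here is a minimal sketch, not an official pattern; the image_paths list and the worker count are just illustrative:

from concurrent.futures import ThreadPoolExecutor

from vertexai.vision_models import Image, MultiModalEmbeddingModel

model = MultiModalEmbeddingModel.from_pretrained("multimodalembedding@001")

def embed_one(path: str):
    # One online prediction per image; each call counts against the
    # backend's concurrent-request budget while it is in flight.
    return model.get_embeddings(image=Image.load_from_file(path))

# image_paths: your list of local image files (assumed here).
# Keep max_workers well below 100 so the service never sees more than
# this many simultaneous requests from this client.
image_paths = ["img_0.jpg", "img_1.jpg"]  # placeholder
with ThreadPoolExecutor(max_workers=50) as pool:
    results = list(pool.map(embed_one, image_paths))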

Here are a couple of approaches that might help with your use case:

  • Use batching: instead of sending one request per image, send a single request covering multiple images. This reduces the number of individual requests and makes the workflow significantly more efficient.
  • Retry strategy: if you're hitting throttling errors, reduce your request rate and add a retry mechanism with backoff so you stay within the API's limits (see the sketch after this list).
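
Since the throttling error in your trace surfaces as a 400 InvalidArgument rather than a 429, a generic retry decorator won't catch it; here is a rough sketch that matches on the message instead (the helper name and backoff parameters are just illustrative):

import random
import time

from google.api_core.exceptions import InvalidArgument

def embed_with_retry(call, max_attempts=5):
    # call: a zero-argument function that issues one get_embeddings request.
    for attempt in range(max_attempts):
        try:
            return call()
        except InvalidArgument as exc:
            # Only retry the concurrency error; genuine bad inputs also raise
            # InvalidArgument and should fail fast.
            if "Too many VMS requests" not in str(exc):
                raise
            # Exponential backoff with jitter: ~1s, ~2s, ~4s, ...
            time.sleep(2 ** attempt + random.random())
    raise RuntimeError(f"Still throttled after {max_attempts} attempts")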

If you continue to run into issues, consider reaching out to Google Cloud Support so they can investigate further. When you contact them, provide as much detail as possible and include screenshots; this will help them understand your problem and resolve it more quickly.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

Hi @MarvinLlamas!

Thank you for your answer! I had considered the batch request option before; in fact, I have used the batch request functionality for Gemini requests. But I couldn't find any documentation where several images/texts are sent in one batch request using the multimodalembedding model. In fact, in the link you sent, I think the embeddings are requested individually. I have seen batch requests of embeddings with a text embedding model, but in my case I need image embeddings.

Below are a couple of attempts to use the multimodal embedding model in batch:

TEST 1

from google.cloud import aiplatform  # Vertex AI SDK

# Attempt to reference the embedding model as a standard Model resource.
multimodal_model = aiplatform.Model(
    model_name="multimodalembedding@001"
)

NotFound: 404 The Model does not exist.

TEST 2

from google.cloud import aiplatform  # Vertex AI SDK

# Attempt to load the model and launch a batch prediction job over a
# JSONL manifest in Cloud Storage.
multimodal_model = aiplatform.Model.from_pretrained("multimodalembedding@001")

batch_prediction_job = multimodal_model.batch_predict(
    job_display_name="multimodal_embedding_batch_job",
    gcs_source=[f"gs://{BUCKET_NAME}/batch_inputs/{input_file_name}"],
    gcs_destination_prefix=f"gs://{BUCKET_NAME}/batch_outputs/",
    instances_format="jsonl",
    predictions_format="jsonl"
)

AttributeError: type object 'Model' has no attribute 'from_pretrained'

Could you send me a code snippet with a dummy example of how to request embeddings for multiple images at once? It would be very helpful; I can't find any examples online, and I'm not sure it is even possible.

Regards, Guille