
The following quotas are exceeded: MatchingEngineDeployedIndexNodes

I am trying to deploy an Index to an Endpoint in Vector Search. Despite the embeddings I used to create the index being generated by the Vertex AI API, the deployment is still failing. This is the first index I have tried deploying in the current Google account, so it really frustrates me that using the APIs is this painful.

Solved
1 ACCEPTED SOLUTION

The error message "The following quotas are exceeded: MatchingEngineDeployedIndexNodes" indicates that you have exceeded the quota for deployed index nodes in the Matching Engine service on Google Vertex AI. To resolve this issue, you will need to request a quota increase for the MatchingEngineDeployedIndexNodes resource in your Google Cloud project.

Go to the Google Cloud Console: https://console.cloud.google.com/. In the left-hand navigation pane, click on "IAM & Admin" and then select "Quotas."

In the "Quotas" page, you can filter by service. Search for the "Matching Engine" service or similar service related to Vector Search.

Find the specific quota named "MatchingEngineDeployedIndexNodes" and click on it. You will see the current usage and limit. Click the "Edit Quotas" button.

In the form, specify the new quota limit you require. Explain the reason for the increase and provide any relevant information about your use case. Click "Submit request" to send your quota increase request. Google's support team will review your request, and once approved, the quota increase should be applied to your project.

Please note that quota increases are subject to approval and may take some time to process. Be sure to provide a clear and compelling reason for the quota increase, as this will help expedite the approval process.
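
If you prefer the CLI to the console, here is a rough sketch of how you might check the current limit before filing the request. This assumes the alpha services quota commands are available in your gcloud installation, and the exact metric name for deployed index nodes may differ, so the grep is deliberately loose; PROJECT_ID is a placeholder.

# Sketch only: list Vertex AI quotas for the project and narrow to Matching Engine entries.
gcloud alpha services quota list \
    --service=aiplatform.googleapis.com \
    --consumer=projects/PROJECT_ID \
    --format=json | grep -i -B 2 -A 5 "matching"

The console route described above is still where the actual increase request gets submitted; the CLI only shows current usage and limits.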


11 REPLIES

Same issue.


Thank you for pointing me in the right direction. Just in case it helps others: trying to submit a request for a quota higher than 1 is not allowed and generates an error right away. Sending an email to Cloud Support seems like the best option.

There is no MatchingEngineDeployedIndexNodes entry in the quotas page; instead it shows 'Matching Index Engine per region' with a value of 10. Still, I am getting this error on my first deployment. Please help me figure out how to deploy my first vector index; if I can't, all the effort spent creating embeddings is wasted.

It is not solved. This issue comes up even when you deploy an index endpoint for the first time. This is so frustrating.

TL;DR => Go to the bottom

Now I have a better understanding of this issue. The reason the MatchingEngineDeployedIndexNodes quota is exceeded is that a new node is required for the index to be deployed, BUT the current quota is set to 1 😱😱.

Why is a new node required?

Because more resources are needed! Ohhh, what a discovery??? It seems obvious, but stating the fact lets us reason about it better. Think of the nodes as virtual machines in Compute Engine: if we have a quota restriction of just 1 virtual machine per project, how can we deploy an application whose CPU/memory needs exceed what the current machine provides? We deploy a machine with better capabilities!!

Answer

Deploy your index to a machine type that has either better CPU or better memory capabilities.
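
For concreteness, here is a minimal sketch of what that looks like with gcloud. The machine type e2-highmem-16 is only an example of a higher-memory shape and the other values are placeholders; check the Vector Search docs for the machine types currently supported.

# Sketch only: deploy the index to a higher-memory machine type instead of the default.
gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
    --deployed-index-id=DEPLOYED_INDEX_ID \
    --display-name=DEPLOYED_INDEX_NAME \
    --index=INDEX_ID \
    --machine-type=e2-highmem-16 \
    --min-replica-count=1 \
    --region=LOCATION \
    --project=PROJECT_ID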

I tried that, it didn't work for me

Tried a standard machine type with autoscaling, but no luck 😞

Try a machine that has high memory

Tried the highest available memory, i.e. 128 GiB, and that deployment failed too. I fail to understand whether anyone on the GCP Vertex team ever QA-tested endpoint deployment!!

Hi, so this just happened to me as well with many failed deploys. You need to set MIN_REPLICA_COUNT = 1 (no need to set MAX_REPLICA_COUNT, as per the docs at https://cloud.google.com/vertex-ai/docs/vector-search/deploy-index-public#deploy_index_autoscaling-g...). This will ensure your node count stays at 1 if your quota is still at the default. Like this:

# Notes: MACHINE_TYPE is e.g. "e2-standard-2"; set --min-replica-count to 1,
# and set --max-replica-count to 1 or leave it out entirely.
gcloud ai index-endpoints deploy-index INDEX_ENDPOINT_ID \
    --deployed-index-id=DEPLOYED_INDEX_ID \
    --display-name=DEPLOYED_INDEX_NAME \
    --index=INDEX_ID \
    --machine-type=MACHINE_TYPE \
    --min-replica-count=1 \
    --max-replica-count=1 \
    --region=LOCATION \
    --project=PROJECT_ID

I added the machine-type here too... I couldn't find it anywhere in the documentation, but I saw a Reddit comment saying you can configure it. @Poala_Tenorio, I strongly recommend that Google Cloud update their docs to reflect this. I spent about US$50 more than I wanted deploying a project due to this 😕; there was no need to have a large machine running, and I was following a GCP staff member's blog post on it.
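
In case someone else ends up with a larger machine running than they need, here is a sketch of undeploying the index so the node stops accruing charges; the placeholders are assumed to match the deploy command above.

# Sketch only: undeploy the index from the endpoint to release the node.
gcloud ai index-endpoints undeploy-index INDEX_ENDPOINT_ID \
    --deployed-index-id=DEPLOYED_INDEX_ID \
    --region=LOCATION \
    --project=PROJECT_ID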