Hi team,
I am trying to deploy the deepseek-ai/deepseek-r1-distill-qwen-7b model from Vertex AI Model Garden to an endpoint in the europe-west1 region using the provided Python code. I am requesting 1 NVIDIA_TESLA_T4 GPU for this deployment.
However, the deployment fails with a quota error. I am currently using the $300 Google Cloud credit.
What I've already tried: I am using the recommended vertexai.preview.model_garden method as shown in the code below to perform the one-click deployment. I have verified the Project ID and region are correct in my code.
Relevant code:
MODEL_ID = "deepseek-ai/deepseek-r1-distill-qwen-7b"
ENDPOINT_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-mg-one-click-deploy"
MODEL_DISPLAY_NAME = "deepseek-r1-distill-qwen-7b-deployed-model"
MACHINE_TYPE = "n1-standard-4"
ACCELERATOR_TYPE = "NVIDIA_TESLA_T4" # The quota error is related to this
ACCELERATOR_COUNT = 1
Error message: 429 The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota. 8: The following quotas are exceeded: CustomModelServingT4GPUsPerProjectPerRegion. Please follow https://cloud.google.com/docs/quotas/view-manage to manage quota.
How can I resolve this quota issue to deploy and test this model? Do I need to request a quota increase even with the free trial credits?
Thank you for your help!
Certain resources, including GPUs, have a soft quota that you can ask to have raised.
Navigate to IAM & Admin > Quotas & System Limits in the Google Cloud console.
In my screenshot, notice the first and third lines: there is a quota for the L4 GPU, which is used for Colab Enterprise.
Find GPUs (all regions) under the Name column.
Go to the end of that row, click the three-dot menu, and request additional GPUs.
I suggest requesting 1 at a time. Approval usually takes a few minutes. Hope this helps.
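If it helps to know exactly which quota name to filter for on that page, the name can be pulled out of the error text itself. This is only a minimal sketch: `exceeded_quotas` is a hypothetical helper, not part of any Google library, and it assumes that multiple quota names (if there are ever several) would be comma-separated, which the error in the question does not confirm.

```python
import re

# Error text copied from the question above.
ERROR_TEXT = (
    "429 The following quotas are exceeded: "
    "CustomModelServingT4GPUsPerProjectPerRegion. Please follow "
    "https://cloud.google.com/docs/quotas/view-manage to manage quota."
)


def exceeded_quotas(message: str) -> list[str]:
    """Return the unique quota names listed after 'quotas are exceeded:'.

    Hypothetical helper. Assumes comma-separated names ending with a period.
    """
    names: list[str] = []
    for match in re.finditer(r"quotas are exceeded:\s*([^.]+)\.", message):
        for name in match.group(1).split(","):
            name = name.strip()
            # The message may repeat the same quota, so keep each name once.
            if name and name not in names:
                names.append(name)
    return names


print(exceeded_quotas(ERROR_TEXT))
```

The resulting name can then be pasted into the filter box on the quotas page to find the matching row for the region you are deploying to.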