Hi,
I tried to deploy a 7b and a 13b Llama 2-derived model to Vertex AI.
However, the models I want are not yet available in Model Garden.
So I modified the Colab notebook (model_garden_pytorch_llama2_peft.ipynb) to load the models stored in my Google Cloud Storage bucket, which I had downloaded from elsewhere.
For the 7b model, this works fine with a single GPU.
However, loading a 13b model requires more than 1 GPU (V100 or L4): the weights alone take roughly 26 GB in fp16 (13B parameters × 2 bytes), which exceeds the 16 GB of a V100 or the 24 GB of an L4.
When I request 2 L4 GPUs via the API, the following error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion
When I request 2 V100 GPUs via the API, a similar error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion
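For context, the deployment request looks roughly like this (a simplified sketch of the notebook's upload-and-deploy flow; the project, bucket path, and serving image URI are placeholders, not the exact values I use):

```python
from google.cloud import aiplatform

# Placeholder project and region; substitute your own values
aiplatform.init(project="my-project", location="us-central1")

# Register the model weights from my own GCS bucket (placeholder path);
# the serving container image URI comes from the notebook
model = aiplatform.Model.upload(
    display_name="llama2-13b-custom",
    artifact_uri="gs://my-bucket/llama2-13b/",
    serving_container_image_uri="<serving-image-uri-from-the-notebook>",
)

# Requesting 2 GPUs here is what triggers the 429 quota error
endpoint = model.deploy(
    machine_type="g2-standard-24",   # a machine type that supports 2 L4 GPUs
    accelerator_type="NVIDIA_L4",    # or "NVIDIA_TESLA_V100" with an n1 machine type
    accelerator_count=2,
)
```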
It has been painful trying to figure out why this does not work.
I cannot find any documentation related to this problem.
By the way, I am currently on the free tier.
Does this mean I have to upgrade my account?
If I upgrade my account, is it guaranteed that I can request at least 2 GPUs to deploy the 13b model?
Any answer is appreciated.
Hi @BryanKWNI,
Welcome, and thank you for reaching out to our community for clarification.
I understand that you are getting Error code 429, which prevents you from adding GPUs to load your model. Unfortunately, free trial users are limited to the default GPU configuration of the service they are using, as mentioned in the Free Trial Program Coverage:
You can't add GPUs to your VM instances.
Upgrading to a paid Cloud Billing account will address this limitation. Please reach out to Vertex AI Support regarding this matter.
I have been facing an issue on Vertex AI Workbench. Even though I am able to provision an instance with a GPU, I am not able to use those GPUs. I tried installing drivers, but nothing has worked so far.
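For what it's worth, this is the quick check I run from the notebook kernel to see whether the driver is actually visible (assuming PyTorch is installed in the environment):

```python
import torch

# If the driver is installed and visible, this prints True and the GPU name
print(torch.cuda.is_available())
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))
```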