Hi,
I tried to deploy a 7B and a 13B model derived from Llama 2 to Vertex AI.
However, the models I want are not yet available in Model Garden.
So I modified the Colab notebook (model_garden_pytorch_llama2_peft.ipynb) to load models stored in my own Google Cloud Storage bucket, which I downloaded from elsewhere.
The 7B model works fine on a single GPU.
However, loading the 13B model requires more than one GPU (V100 or L4).
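For context, my deploy step looks roughly like this (a minimal sketch of what I changed in the notebook; the project, bucket, display name, and serving image are placeholders, not my exact values):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder for the serving container image used in the original notebook.
SERVE_DOCKER_URI = "<SERVE_DOCKER_URI from the notebook>"

# Upload the weights that I downloaded into my own GCS bucket.
model = aiplatform.Model.upload(
    display_name="llama2-13b-custom",
    artifact_uri="gs://my-bucket/llama2-13b",
    serving_container_image_uri=SERVE_DOCKER_URI,
)

# Requesting 2 L4 GPUs -- this deploy call is what raises the 429 below.
# (For the V100 attempt I use machine_type="n1-standard-16" with
# accelerator_type="NVIDIA_TESLA_V100" and the same count.)
endpoint = model.deploy(
    machine_type="g2-standard-24",
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
)
```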
When I request 2 L4 GPUs via the API, the following error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion
When I request 2 V100 GPUs via the API, a similar error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion
It has been painful trying to figure out why this does not work, and I cannot find any documentation related to this problem.
By the way, I am currently a new free-tier user.
Does this mean I have to upgrade my account?
If I upgrade, am I guaranteed to be able to request at least 2 GPUs to deploy the 13B model?
Any help would be appreciated.