
Requesting multiple GPUs while deploying a model in Vertex AI

Hi,

I tried to deploy a 7B and a 13B Llama-2-derived model to Vertex AI.

However, the models I want are not yet available in Model Garden.

So I modified the Colab notebook (model_garden_pytorch_llama2_peft.ipynb) to load the models from my Google Cloud Storage bucket, where I had uploaded weights downloaded from elsewhere.
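Concretely, the change just points the model path at my own bucket instead of a Model Garden ID (a minimal sketch; the variable name and bucket path below are placeholders, not the notebook's exact values):

# Placeholder: load my own weights from Cloud Storage instead of the
# Model Garden copy. The variable name is illustrative only.
base_model_path = "gs://my-bucket/models/llama2-13b-derived"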

For the 7B model, deployment works fine with a single GPU.

However, loading the 13B model requires more than one GPU (V100 or L4).
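For reference, the deploy request looks roughly like this (a sketch using the Vertex AI Python SDK; the project, bucket, serving image, and display name are placeholders rather than my exact values):

from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")  # placeholders

# Upload the custom model artifacts from Cloud Storage with a serving container.
model = aiplatform.Model.upload(
    display_name="llama2-13b-derived",
    artifact_uri="gs://my-bucket/models/llama2-13b-derived",  # placeholder path
    serving_container_image_uri="us-docker.pkg.dev/my-project/serving/llama2:latest",  # placeholder image
)

# Deploy on a machine with 2 L4 GPUs (a g2-standard-24 comes with 2 L4s attached).
endpoint = model.deploy(
    machine_type="g2-standard-24",
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
)

# For 2 V100s instead, I use:
#   machine_type="n1-standard-16",
#   accelerator_type="NVIDIA_TESLA_V100",
#   accelerator_count=2,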

When I request 2 L4 GPUs via the API, the following error occurs:

ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion

When I request 2 V100 GPUs via the API, a similar error occurs:

ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion

It has been painful trying to figure out why this does not work.

I cannot find any documentation related to this problem.

By the way, I am currently a new free-tier user.

Does this mean I have to upgrade my account?

If I upgrade my account, am I guaranteed to be able to request at least 2 GPUs to deploy the 13B model?

Any answer is appreciated.
