Hi,
I tried to deploy a 7B and a 13B model derived from Llama 2 to Vertex AI.
However, the models I want are not yet available in Model Garden.
So I modified the Colab notebook (model_garden_pytorch_llama2_peft.ipynb) to load models stored in my own Google Cloud Storage bucket, which I downloaded from elsewhere.
The 7B model works fine on a single GPU.
However, loading the 13B model requires more than one GPU (V100 or L4).
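For context, my deploy step looks roughly like this (a minimal sketch of what I changed in the notebook; the project, bucket, display name, and serving image are placeholders, not my exact values):

```python
from google.cloud import aiplatform

aiplatform.init(project="my-project", location="us-central1")

# Placeholder for the serving container image used in the original notebook.
SERVE_DOCKER_URI = "<SERVE_DOCKER_URI from the notebook>"

# Upload the weights that I downloaded into my own GCS bucket.
model = aiplatform.Model.upload(
    display_name="llama2-13b-custom",
    artifact_uri="gs://my-bucket/llama2-13b",
    serving_container_image_uri=SERVE_DOCKER_URI,
)

# Requesting 2 L4 GPUs -- this deploy call is what raises the 429 below.
# (For the V100 attempt I use machine_type="n1-standard-16" with
# accelerator_type="NVIDIA_TESLA_V100" and the same count.)
endpoint = model.deploy(
    machine_type="g2-standard-24",
    accelerator_type="NVIDIA_L4",
    accelerator_count=2,
)
```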
When I request 2 L4 GPUs via the API, the following error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingL4GPUsPerProjectPerRegion
When I request 2 V100 GPUs via the API, a similar error occurs:
ResourceExhausted: 429 The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion 8: The following quotas are exceeded: CustomModelServingV100GPUsPerProjectPerRegion
It has been painful trying to figure out why this does not work, and I cannot find any documentation related to this problem.
By the way, I am currently a new free-tier user.
Does this mean I have to upgrade my account?
If I upgrade, am I guaranteed to be able to request at least 2 GPUs to deploy the 13B model?
Any help would be appreciated.