
Vertex AI endpoints: custom model support and billing

Hello everyone!

My team would like to use Google Cloud to host custom Huggingface models for inference. Ideally, we are looking for a serverless solution. I found a few tutorials on creating Vertex AI endpoints (both on Google Cloud: https://cloud.google.com/vertex-ai/generative-ai/docs/model-garden/deploy-and-inference-tutorial and Huggingface: https://huggingface.co/docs/google-cloud/en/examples/vertex-ai-notebooks-deploy-llama-3-1-405b-on-ve...) that look like what we need, but I still have some questions:

1) Is it correct that we will be billed for the entire time the endpoint exists? I could not find information on whether endpoints have periods of inactivity, how to enforce a cooldown period, or whether active and inactive periods are billed differently;

2) Do the endpoints support both custom models and custom adapters? If so, can adapters be added during prediction?

3) Apart from the Cloud GPU and Vertex AI prediction, are there any other billable components we must have for a successful deployment?

1 ACCEPTED SOLUTION

Hi @AnnaD,

Welcome to Google Cloud Community!

To address your questions, here are some points that might help with your use case:

1. Endpoint Billing: In general, you are billed for the compute resources (node hours) of each deployed replica for as long as a model remains deployed to an endpoint, whether or not it is serving traffic, and standard dedicated endpoints do not scale to zero during idle periods (the minimum replica count is 1). For confirmation of how your specific configuration is charged, I recommend reaching out to Google Cloud Billing support, as they can provide more detail on the billing structure, including whether inactive periods are billed differently and how cooldown times can be enforced.
2. Custom Models and Adapters Support: Yes, Vertex AI endpoints support both custom models and custom adapters. You can deploy a custom model (for example, a Hugging Face model served from a custom container) to an endpoint and add adapters during prediction, which gives you flexibility in managing and updating models; see the deployment sketch after this list.
3. Additional Billable Components: In addition to the Cloud GPUs and Vertex AI prediction node hours, charges may apply for Cloud Storage (model artifacts), Artifact Registry (serving container images), networking/data transfer, and any other Google Cloud services used alongside the endpoint. For a full cost breakdown, you can check the Vertex AI pricing page.
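
To make points 1 and 2 concrete, here is a minimal sketch using the google-cloud-aiplatform Python SDK. The project ID, region, container image URI, Hugging Face model ID, and machine/accelerator settings below are placeholders for illustration only (they are not values from this thread); substitute the serving image you actually use, such as a Hugging Face Deep Learning Container, and treat this as a sketch rather than a production recipe.

```python
from google.cloud import aiplatform

# Placeholders -- replace with your own project, region, and artifacts.
aiplatform.init(project="my-project", location="us-central1")

# Upload the custom model together with its serving container image
# (for example, a Hugging Face DLC or your own inference image pushed
# to Artifact Registry).
model = aiplatform.Model.upload(
    display_name="my-hf-model",
    serving_container_image_uri=(
        "us-central1-docker.pkg.dev/my-project/my-repo/my-hf-serving:latest"
    ),
    serving_container_environment_variables={
        "MODEL_ID": "my-org/my-model",  # placeholder Hugging Face model ID
    },
)

# Deploy to an endpoint. Node hours for this machine + accelerator are
# billed for as long as the model stays deployed, so choose replica
# counts deliberately (min_replica_count must be at least 1).
endpoint = model.deploy(
    machine_type="g2-standard-12",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=2,
)

# Online prediction. The exact request/response schema depends on the
# serving container you deployed.
response = endpoint.predict(instances=[{"inputs": "Hello, Vertex AI!"}])
print(response.predictions)

# Undeploy and delete to stop the recurring node-hour charges.
endpoint.undeploy_all()
endpoint.delete()
model.delete()
```

Undeploying the model and deleting the endpoint, as in the last lines of the sketch, is what stops the per-node-hour charges, since billing continues while a model stays deployed even if no requests arrive.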

You may refer to the Vertex AI pricing and model deployment documentation, which will help you understand the overall billing and deployment options for Vertex AI.

I hope the above information is helpful. 

 
