
Vertex AI private endpoint deployment issue

We have a Vertex AI public endpoint, and our custom model deploys to it successfully. But when we tried to deploy the same model to a private endpoint, it failed with the following error in the log.

"OSError: We couldn't connect to 'https://huggingface.co' to load this file, couldn't find it in the cached files and it looks like meta-llama/Meta-Llama-3-8B is not the path to a directory containing a file named config.json." 
"Checkout your internet connection or see how to run the library in offline mode at 'https://huggingface.co/docs/transformers/installation#offline-mode'."

It looks like the compute node has no internet access when using the private endpoint. I used the following command to create the private endpoint.

gcloud beta ai endpoints create \
  --display-name=ENDPOINT_DISPLAY_NAME \
  --network=FULLY_QUALIFIED_NETWORK_NAME \
  --region=REGION

Has anyone faced a similar issue, and is there a solution for this?


Hi @Brian_oozou,

Welcome to Google Cloud Community!

The error message indicates that your Vertex AI private endpoint's compute instances lack internet access, which prevents them from downloading the necessary model files (meta-llama/Meta-Llama-3-8B) from Hugging Face. This is a common issue when deploying models to private environments where external access is restricted for security reasons. Here are some things you can consider to address the issue:

Check Network Configuration: Ensure that your private endpoint has the necessary network configurations to access the internet. You might need to configure Private Service Connect or VPC Network Peering.

Pre-download and Upload Model Weights:

  • Download: Download the entire model (meta-llama/Meta-Llama-3-8B) on a machine with internet access using the Hugging Face libraries. This fetches config.json, the tokenizer, the weight files, and everything else needed to load the model offline.
  • Upload: Upload the entire downloaded model directory to a Cloud Storage bucket, and make the bucket accessible to your Vertex AI private endpoint's service account.
  • Deploy: Modify your deployment code to load the model from Cloud Storage instead of directly from Hugging Face. Your code will need to point to the gs:// path of your model in Cloud Storage (see the sketches after this list).
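As a rough illustration of the first two steps, here is a minimal Python sketch, assuming you have huggingface_hub and google-cloud-storage installed, a Hugging Face token for the gated Llama 3 repo, and a bucket named YOUR_BUCKET (all placeholder values):

import os
from huggingface_hub import snapshot_download
from google.cloud import storage

# Download the full model snapshot (config.json, tokenizer, weight shards).
local_dir = snapshot_download(
    repo_id="meta-llama/Meta-Llama-3-8B",
    local_dir="./Meta-Llama-3-8B",
    token=os.environ["HF_TOKEN"],  # the repo is gated, so a token is required
)

# Upload every file in the snapshot to gs://YOUR_BUCKET/models/Meta-Llama-3-8B/.
# (Equivalently: gsutil -m cp -r ./Meta-Llama-3-8B gs://YOUR_BUCKET/models/)
client = storage.Client()
bucket = client.bucket("YOUR_BUCKET")
for root, _, files in os.walk(local_dir):
    for name in files:
        path = os.path.join(root, name)
        rel = os.path.relpath(path, local_dir)
        bucket.blob(f"models/Meta-Llama-3-8B/{rel}").upload_from_filename(path)

And on the serving side, a sketch of the deploy step: copy the model out of the bucket to local disk at container startup and load it from there, so no call to huggingface.co is ever made. This assumes gsutil is available in your serving image; the paths are placeholders:

import subprocess
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_DIR = "/tmp/Meta-Llama-3-8B"

# Copy the model artifacts from the bucket to local disk at startup.
subprocess.run(
    ["gsutil", "-m", "cp", "-r",
     "gs://YOUR_BUCKET/models/Meta-Llama-3-8B", "/tmp/"],
    check=True,
)

# Load from the local directory; setting HF_HUB_OFFLINE=1 in the container
# environment additionally guarantees no network calls to Hugging Face.
tokenizer = AutoTokenizer.from_pretrained(MODEL_DIR)
model = AutoModelForCausalLM.from_pretrained(MODEL_DIR)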

Ensure that the service account used by your Vertex AI endpoint has the necessary permissions to read from the GCS bucket.
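For example, one way to grant that read access with the google-cloud-storage client (SA_EMAIL and YOUR_BUCKET are placeholders; the equivalent gcloud command is noted in the comment):

from google.cloud import storage

# Grant the endpoint's service account read access to the model bucket.
# Equivalent CLI: gcloud storage buckets add-iam-policy-binding gs://YOUR_BUCKET
#   --member=serviceAccount:SA_EMAIL --role=roles/storage.objectViewer
client = storage.Client()
bucket = client.bucket("YOUR_BUCKET")
policy = bucket.get_iam_policy(requested_policy_version=3)
policy.bindings.append({
    "role": "roles/storage.objectViewer",
    "members": {"serviceAccount:SA_EMAIL"},
})
bucket.set_iam_policy(policy)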

If you face any issues with permissions, you might need to adjust IAM roles or bucket permissions in the Google Cloud Console.

Use a Model Registry: Instead of downloading the model every time you deploy, consider using the Vertex AI Model Registry.

  • Train or import your model: Train your model (if you have your own training pipeline), or import the pre-trained model artifacts into the Model Registry.
  • Deploy from the registry: When deploying to your endpoint, specify the model from the registry instead of downloading it during deployment. This eliminates the need for internet access at deployment time (a minimal SDK sketch follows below).
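As a rough sketch with the Vertex AI Python SDK (the project, bucket, serving image, and machine type are placeholder assumptions; adjust them to your setup):

from google.cloud import aiplatform

aiplatform.init(project="YOUR_PROJECT", location="us-central1")

# Import the model artifacts from Cloud Storage into the Model Registry.
model = aiplatform.Model.upload(
    display_name="llama-3-8b-custom",
    artifact_uri="gs://YOUR_BUCKET/models/Meta-Llama-3-8B",
    serving_container_image_uri="YOUR_SERVING_IMAGE_URI",
)

# Create the private endpoint (the SDK equivalent of the gcloud command above)
# and deploy the registered model to it.
endpoint = aiplatform.PrivateEndpoint.create(
    display_name="ENDPOINT_DISPLAY_NAME",
    network="FULLY_QUALIFIED_NETWORK_NAME",
)
model.deploy(
    endpoint=endpoint,
    machine_type="a2-highgpu-1g",          # placeholder GPU machine type
    accelerator_type="NVIDIA_TESLA_A100",  # placeholder accelerator
    accelerator_count=1,
)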

I hope the above information is helpful.