
Error deploying Docker container on Vertex AI

Hi everyone,

I am trying to deploy a custom YOLO model on Vertex AI as a custom container. So far I have successfully:

Built and tested the Docker image locally.
Verified that the API (FastAPI + YOLO) runs correctly in the container.
Successfully deployed and tested the same image on Cloud Run.

However, when deploying on Vertex AI as a custom container model, I am facing issues.

Setup Details:

  • Base Image: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime (for GPU support)
  • Serving Framework: FastAPI + Uvicorn + Gunicorn
  • Hardware Target: n1-standard-4 with an NVIDIA Tesla P4 GPU
  • Docker CMD: The container runs startup.sh which loads models and starts the FastAPI server.
  • Inference Request Format: Expects base64 encoded images in application/json.
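For context, the request body is built roughly like this (the "image" field name is an assumption about my handler; Vertex AI itself only requires the outer "instances" list):

```python
import base64
import json

def build_request(image_bytes: bytes) -> str:
    """Wrap a base64-encoded image in the {"instances": [...]} envelope
    that Vertex AI's predict endpoint expects. The "image" key name is
    an assumption -- it must match whatever the FastAPI handler reads."""
    encoded = base64.b64encode(image_bytes).decode("utf-8")
    return json.dumps({"instances": [{"image": encoded}]})
```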

Deployment Steps:

  1. Built the Docker image locally and tested it with docker run.
  2. Pushed the image to Google Artifact Registry.
  3. Created a Vertex AI Model using:

      - name: Upload Mobile Model
        run: |
          EXISTING_MODEL=$(gcloud ai models list --region=$REGION --filter="displayName=freshpet-mobile" --format="value(name)" --limit=1)
          if [ -z "$EXISTING_MODEL" ]; then
            gcloud ai models upload --region=$REGION --display-name=freshpet-mobile --container-image-uri=$IMAGE_NAME_MOBILE
          else
            echo "Mobile model already exists, skipping upload."
          fi

      - name: Wait for Mobile Model to be Registered
        run: |
            timeout=$TIMEOUT_SECONDS
            start_time=$(date +%s)
            while true; do
              MODEL_MOBILE_NAME=$(gcloud ai models list --region=$REGION --filter="displayName=freshpet-mobile" --format="value(name)" --limit=1)
              if [ -n "$MODEL_MOBILE_NAME" ]; then
                echo "Model Registered: $MODEL_MOBILE_NAME"
                echo "MODEL_MOBILE_NAME=$MODEL_MOBILE_NAME" >> $GITHUB_ENV
                break
              fi
              if [ $(( $(date +%s) - start_time )) -gt $timeout ]; then
                echo "Timeout waiting for model registration." && exit 1
              fi
              sleep 10
            done
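One thing I suspect: the upload step above never tells Vertex AI which port and routes the container serves on. A variant of the step with the container flags might look like this (the port and route values are assumptions that would need to match my FastAPI server, not verified defaults):

```yaml
      - name: Upload Mobile Model
        run: |
          # The --container-* values below are assumptions; they must
          # match what the FastAPI server inside the image listens on.
          gcloud ai models upload \
            --region=$REGION \
            --display-name=freshpet-mobile \
            --container-image-uri=$IMAGE_NAME_MOBILE \
            --container-ports=8080 \
            --container-predict-route=/predict \
            --container-health-route=/health
```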

 

  4. Created an Endpoint and deployed the model.

      - name: Deploy Mobile Model to Endpoint
        run: |
          ENDPOINT_MOBILE_ID=$(gcloud ai endpoints list --region=$REGION --filter="displayName=freshpet-mobile-endpoint" --format="value(name)" --limit=1)
          if [ -z "$ENDPOINT_MOBILE_ID" ]; then
            ENDPOINT_MOBILE_ID=$(gcloud ai endpoints create --region=$REGION --display-name=freshpet-mobile-endpoint --format="value(name)")
          fi
          gcloud ai endpoints deploy-model $ENDPOINT_MOBILE_ID \
            --region=$REGION \
            --model=$MODEL_MOBILE_NAME \
            --display-name=mobile-container-deploy \
            --machine-type=$MACHINE_TYPE \
            --accelerator=count=1,type=$GPU_TYPE \
            --min-replica-count=$MIN_REPLICAS \
            --enable-access-logging \
            --autoscaling-metric-specs=$AUTOSCALING_METRIC

 

Issues Faced:

  • The model fails to load on Vertex AI (while it works perfectly on Cloud Run).
  • No logs appear in the Vertex AI endpoint, making debugging difficult.
  • (Attached screenshot: Screenshot from 2025-03-17 13-59-47.png)
  • When I try to send a request, I get 503 Service Unavailable or Container Failed to Start errors.
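Since the endpoint page shows nothing, I have been trying to pull the model server logs from Cloud Logging directly. Something like the query below should surface them (the resource-type filter is my best guess for prediction containers; adjust as needed):

```shell
# Prediction/model-server logs often land in Cloud Logging even when the
# Vertex AI endpoint UI shows nothing; filter on the Endpoint resource.
gcloud logging read \
  'resource.type="aiplatform.googleapis.com/Endpoint"' \
  --limit=100 \
  --format="value(timestamp, jsonPayload.message)"
```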

Questions:

  1. Does Vertex AI require additional configurations for FastAPI-based custom containers?
  2. How can I enable GPU support correctly inside the Vertex AI container?
  3. Is there a different logging mechanism I should use to debug why the container is failing?
  4. Are there specific health check requirements for Vertex AI containers?
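On question 4, my understanding is that Vertex AI injects AIP_HTTP_PORT, AIP_HEALTH_ROUTE, and AIP_PREDICT_ROUTE into the container and expects the server to honor them. A minimal sketch of reading them follows; the fallback values are placeholders for local docker runs, not Vertex AI's documented defaults:

```python
import os

def serving_config() -> dict:
    """Read the AIP_* variables Vertex AI injects into custom serving
    containers. The fallbacks here are placeholders for local runs only."""
    return {
        "port": int(os.environ.get("AIP_HTTP_PORT", "8080")),
        "health_route": os.environ.get("AIP_HEALTH_ROUTE", "/health"),
        "predict_route": os.environ.get("AIP_PREDICT_ROUTE", "/predict"),
    }
```

Gunicorn/Uvicorn would then need to bind to that port, e.g. `--bind 0.0.0.0:$AIP_HTTP_PORT` in startup.sh.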

Any help would be greatly appreciated! Thanks in advance. 🙏


Error email received from Vertex AI:

Hello Vertex AI Customer,

Due to an error, Vertex AI was unable to deploy model "freshpet-mobile@1".
Additional Details:
Operation State: Failed with errors
Resource Name: 
projects/1061052074258/locations/us-central1/models/5571072585824731136
Error Messages: Model server never became ready. Please validate that your 
model file or container configuration are valid. Model server logs can be 
found at