TL;DR: Deploy open models on Vertex AI in just three lines of code! The new Vertex AI Model Garden CLI and SDK, powered by the Deploy API, offer a model-centric interface that provides a consistent and fluid deployment experience for your open models on Vertex AI.
Have you ever tried to deploy open models on Vertex AI? It can involve navigating numerous API calls and parameters, leading to a slower, more error-prone deployment experience.

But what if deploying state-of-the-art open models were as simple as choosing your model and hitting the 'deploy' button?
The new Vertex AI Model Garden CLI and SDK, powered by the new Deploy API, are designed to do just that. These tools are truly model-centric, providing you with a more consistent and fluid experience. The SDK simplifies prototyping by removing the need to specify container details. The CLI provides a precise and interactive command-line interface for managing models, offering a programmatic alternative to the UI's one-click deployment and enabling scriptable automation.
Let’s see how you can use the new Vertex AI Model Garden SDK. For more information about the Vertex AI Model Garden CLI, check out the official documentation.
Let’s imagine that you want to deploy the new Gemma 3 model on Vertex AI. Without the new Deploy API, this requires around 50 lines of code, as demonstrated in the "Deploying Gemma 3 on Vertex AI" section of the "Gemma 3 on Vertex AI" blog. With the new Deploy API, you can deploy an open model in three lines of code.
To deploy an open model, the Vertex AI Model Garden SDK provides the OpenModel class, which simplifies the process of deploying these models on Vertex AI for inference. After you initialize the model, you use the deploy method to serve the selected open model on a Vertex AI Endpoint, a managed service that allows you to deploy and scale your AI models.
```python
from vertexai.preview import model_garden

model_id = "google/gemma3@gemma-3-1b-it"
gemma_model = model_garden.OpenModel(model_id)
gemma_endpoint = gemma_model.deploy()
```
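As a side note, the model identifier above follows a `publisher/model@version` pattern. Below is a minimal, hypothetical helper (not part of the SDK) that splits such an ID into its parts, just to make the format explicit:

```python
def parse_model_id(model_id: str) -> dict:
    """Split an ID like 'google/gemma3@gemma-3-1b-it' into its parts.

    Hypothetical helper for illustration only; not part of the SDK.
    """
    publisher, rest = model_id.split("/", 1)
    name, _, version = rest.partition("@")
    return {"publisher": publisher, "model": name, "version": version or None}

print(parse_model_id("google/gemma3@gemma-3-1b-it"))
# → {'publisher': 'google', 'model': 'gemma3', 'version': 'gemma-3-1b-it'}
```

Hugging Face model IDs such as `stabilityai/stable-diffusion-xl-base-1.0` follow the same `publisher/model` shape, just without an explicit version suffix.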
For those requiring granular control, the Vertex AI Model Garden SDK lets you verify the default deployment configuration with the list_deploy_options method, as shown below.
```python
gemma_model.list_deploy_options()
# [model_display_name: "gemma-3-1b-it"
# ...
# dedicated_resources {
#   machine_spec {
#     machine_type: "g2-standard-12"
#     accelerator_type: NVIDIA_L4
#     accelerator_count: 1
#   }
# }
# ...
```
The list_deploy_options method returns details about the serving container, dedicated resources, and sample requests. This information gives you insight into the deployment, including cost and serving considerations.
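If you want to reason about these options programmatically, you can treat each configuration as a simple record and filter it. The sketch below works on mock data shaped like the printed output; the field names are assumptions made for illustration, not the SDK's actual return types:

```python
# Mock deploy options shaped like the list_deploy_options() output above.
# Field names are assumptions for illustration, not the SDK's actual types.
deploy_options = [
    {"machine_type": "g2-standard-12", "accelerator_type": "NVIDIA_L4", "accelerator_count": 1},
    {"machine_type": "a2-highgpu-1g", "accelerator_type": "NVIDIA_TESLA_A100", "accelerator_count": 1},
]

def pick_option(options, accelerator_type):
    """Return the first configuration using the requested accelerator, or None."""
    return next((o for o in options if o["accelerator_type"] == accelerator_type), None)

choice = pick_option(deploy_options, "NVIDIA_L4")
print(choice["machine_type"])  # → g2-standard-12
```

A filter like this can help you decide, for example, whether a cheaper L4-based configuration is available before falling back to A100s.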
The Vertex AI Model Garden SDK also allows specifying advanced deployment configurations. You can precisely define compute resources, specifying machine types, replica counts, and the number and type of accelerators. Additionally, you have the flexibility to choose your deployment infrastructure, opting, for example, for either Spot VMs or dedicated endpoints based on your needs. Finally, you can customize your serving container with custom image specifications, port mappings, health checks, and environment variables, ensuring your deployment aligns with your requirements.
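To keep these many knobs manageable, it can help to assemble the deployment arguments in one place with a few sanity checks before calling deploy. The following is an illustrative sketch only: the parameter names mirror those used in this post's examples, but the validation helper itself is not part of the SDK:

```python
def build_deploy_kwargs(machine_type, accelerator_type=None, accelerator_count=0,
                        min_replica_count=1, max_replica_count=1):
    """Assemble keyword arguments for a deploy call, with basic sanity checks.

    Hypothetical helper: parameter names mirror this post's examples,
    but the checks themselves are not part of the SDK.
    """
    if (accelerator_type is None) != (accelerator_count == 0):
        raise ValueError("accelerator_type and accelerator_count must be set together")
    if min_replica_count > max_replica_count:
        raise ValueError("min_replica_count cannot exceed max_replica_count")
    kwargs = {
        "machine_type": machine_type,
        "min_replica_count": min_replica_count,
        "max_replica_count": max_replica_count,
    }
    if accelerator_type:
        kwargs.update(accelerator_type=accelerator_type, accelerator_count=accelerator_count)
    return kwargs

print(build_deploy_kwargs("g2-standard-4", "NVIDIA_L4", 1))
```

You could then pass the result with `model.deploy(**kwargs)`, catching configuration mistakes locally instead of waiting for a failed deployment.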
Below you can see how to set some advanced configuration options to deploy one of the most downloaded diffusion models from Stability AI on Hugging Face.
```python
from vertexai.preview import model_garden

sd_model = model_garden.OpenModel("stabilityai/stable-diffusion-xl-base-1.0")
sd_endpoint = sd_model.deploy(
    machine_type="g2-standard-4",
    accelerator_type="NVIDIA_L4",
    accelerator_count=1,
    min_replica_count=1,
    max_replica_count=1,
    endpoint_display_name="sd-endpoint",
    model_display_name="sd-model",
    deploy_request_timeout=3 * 60 * 60,
)

prediction = sd_endpoint.predict(
    instances=["photorealistic, ultra-detailed, close-up portrait of a fluffy Maine Coon cat, emerald green eyes, golden hour lighting, soft focus background, intricate fur texture, 8k resolution, cinematic"]
)
plot_image_from_bytes(prediction.predictions[0])
```
Here is the generated image.
To minimize wasted time and prevent deployment failures due to configuration issues, the Vertex AI Model Garden SDK offers detailed error handling for common roadblocks. These include insufficient quota for deploying the selected model, organizational policies that restrict deployment, missing End-User License Agreements (EULAs) required for model access, and other potential issues. For example, the following error message demonstrates how the SDK handles an invalid token when attempting to deploy a gated model from the Hugging Face Hub.
```python
from vertexai.preview import model_garden

try:
    model = model_garden.OpenModel("black-forest-labs/FLUX.1-dev")
    endpoint = model.deploy(hugging_face_access_token="invalid-token")
except Exception as e:
    print(f"Error: {e}")

# INFO:vertexai.model_garden._model_garden:Deploying model: black-forest-labs/FLUX.1-dev
# Error: 400 Model publishers/hf-black-forest-labs/models/flux.1-dev@001 is a gated
# Hugging Face model. The token is not valid or does not have permission.
# Please provide a valid Hugging Face access token.
```
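In automation scripts, you may want to turn such error messages into actionable guidance rather than surfacing them raw. Below is a hypothetical helper (not part of the SDK) whose matched substrings are based on the failure categories described above, not an exhaustive list:

```python
def classify_deploy_error(message: str) -> str:
    """Map common deployment error messages to actionable hints.

    Hypothetical helper for illustration; the substrings below reflect
    the failure categories discussed in this post, not an exhaustive list.
    """
    msg = message.lower()
    if "gated hugging face model" in msg or "access token" in msg:
        return "Provide a valid Hugging Face access token with access to the model."
    if "quota" in msg:
        return "Request additional accelerator quota or pick a smaller machine type."
    if "eula" in msg or "license" in msg:
        return "Accept the model's End-User License Agreement before deploying."
    return "See the full error message and the Vertex AI documentation."

print(classify_deploy_error(
    "400 Model ... is a gated Hugging Face model. The token is not valid ..."
))
```

Wrapping deploy calls with a classifier like this keeps retry logic and user-facing messages in one place.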
To start using the new Vertex AI Model Garden SDK, check out the resources below, where you’ll find documentation and some notebook samples:
Let’s connect on LinkedIn and X to share feedback and questions about Vertex AI!