TL;DR Llama 4, the first multimodal models in the Llama family featuring a Mixture-of-Experts (MoE) architecture, are now available on Vertex AI! You can deploy Llama 4 Scout (with a context window of up to 10M tokens) and Llama 4 Maverick on Vertex AI with three lines of code using the Vertex AI Model Garden SDK.
Deploying Llama 4 in the Vertex AI console
Today, we're excited to announce that Llama 4, the latest generation of open models from Meta, is available for you to use on Vertex AI! This is a significant leap forward, especially for those of you looking to build more sophisticated and personalized multimodal applications.
Llama 4 marks the family's first multimodal models powered by a Mixture-of-Experts (MoE) architecture. What does this mean for you? MoE allows models to be very large in total parameters while only activating a subset ("experts") for any given input token, leading to more efficient training and inference. Furthermore, Llama 4 utilizes early fusion, a technique that integrates text and vision information right from the initial processing stages within a unified model backbone. This joint pre-training with text and image data allows the models to grasp complex, nuanced relationships between modalities more effectively than ever before.
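To make the routing idea concrete, here is a toy NumPy sketch of top-k expert routing. The shapes, function names, and softmax-over-selected-experts gating are illustrative assumptions for a minimal MoE layer, not Llama 4's actual router:

```python
import numpy as np

def moe_forward(x, gate_W, experts, top_k=2):
    """Route one token through the top-k experts of a toy MoE layer.

    x: (d,) token representation
    gate_W: (d, n_experts) router weights
    experts: list of callables, each mapping (d,) -> (d,)
    """
    logits = x @ gate_W                    # one router score per expert
    top = np.argsort(logits)[-top_k:]      # indices of the k highest-scoring experts
    weights = np.exp(logits[top])
    weights /= weights.sum()               # softmax over the selected experts only
    # Only the selected experts run; the rest are skipped entirely,
    # which is why inference cost scales with active (not total) parameters.
    return sum(w * experts[i](x) for w, i in zip(weights, top))

# Toy demo: 4 experts, only 2 run per token.
rng = np.random.default_rng(0)
d, n_experts = 8, 4
gate_W = rng.normal(size=(d, n_experts))
experts = [(lambda W: (lambda v: v @ W))(rng.normal(size=(d, d)))
           for _ in range(n_experts)]
y = moe_forward(rng.normal(size=d), gate_W, experts)
print(y.shape)  # (8,)
```

The output has the same dimensionality as the input, so MoE layers can be stacked like dense layers; only the per-token compute changes.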
Llama 4 comes in two released flavors, Scout and Maverick, giving you options based on your performance needs and resource constraints. There is also a larger "teacher" model, Behemoth, currently in training.
To help developers create safe and useful Llama-supported applications and reduce the risk of adversarial failures, both models incorporate tunable system-level and multi-layered mitigations at each stage of development, from pre-training to post-training.
The easiest way to get Llama 4 up and running is through the Vertex AI Model Garden.
We've streamlined the deployment process – you can deploy an optimized Llama 4 endpoint with just a few lines of code using the Vertex AI Model Garden SDK.
Here’s a quick example of how to deploy the Llama 4 Scout Instruct model. First, initialize the OpenModel instance with the associated model ID. You can find the model ID in the Vertex AI Model Garden UI or with the list_deployable_models method. Then, start the deployment process.
```python
# pip install 'google-cloud-aiplatform>=1.84.0' 'openai' 'google-auth' 'requests'
import vertexai
from vertexai.preview import model_garden

vertexai.init(project="your-project-id", location="your-region")

llama4_model = model_garden.OpenModel("meta/llama4@llama-4-scout-17b-16e-instruct")
llama4_endpoint = llama4_model.deploy(accept_eula=True)
```
By default, the model will use the deployment recipe Vertex AI Model Garden provides. You can review the recipe using the list_deploy_options method associated with your OpenModel instance.
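For example, a sketch of inspecting the recipe before deploying (this assumes a configured GCP project and uses the list_deploy_options method named above; the printed fields depend on the SDK version):

```python
import vertexai
from vertexai.preview import model_garden

vertexai.init(project="your-project-id", location="your-region")

llama4_model = model_garden.OpenModel("meta/llama4@llama-4-scout-17b-16e-instruct")

# Each option describes a deployment recipe: machine type, accelerator,
# and serving container that Model Garden would use for this model.
for option in llama4_model.list_deploy_options():
    print(option)
```

If one of the listed recipes fits your quota and latency needs better than the default, you can pass its settings to deploy instead of accepting the default.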
After you start the deployment, you can monitor its progress from the Vertex AI Prediction page, as shown below.

Screenshot of model deployment in Vertex AI

Deploying the model in this case takes ~20 minutes. After the model is deployed, you can use either the Vertex AI API for Python (here an example) or the Chat Completions API to start using Llama 4. Below is an example of how to use Llama 4 Scout for a simple image captioning task.
```python
import google.auth
import google.auth.transport.requests
import openai

# Authenticate with Application Default Credentials.
creds, project = google.auth.default()
auth_req = google.auth.transport.requests.Request()
creds.refresh(auth_req)

PROJECT_ID = "your-project-id"
ENDPOINT_REGION = "your-endpoint-region"
ENDPOINT_ID = "your-endpoint-id"

ENDPOINT_RESOURCE_NAME = (
    f"projects/{PROJECT_ID}/locations/{ENDPOINT_REGION}/endpoints/{ENDPOINT_ID}"
)
BASE_URL = (
    f"https://{ENDPOINT_REGION}-aiplatform.googleapis.com/v1beta1/{ENDPOINT_RESOURCE_NAME}"
)

client = openai.OpenAI(base_url=BASE_URL, api_key=creds.token)

model_response = client.chat.completions.create(
    model="",  # the model is implied by the dedicated endpoint
    messages=[
        {
            "role": "user",
            "content": [
                {
                    "type": "image_url",
                    "image_url": {
                        "url": "https://upload.wikimedia.org/wikipedia/commons/thumb/c/cb/The_Blue_Marble_%28remastered%29.jpg/580px-The_Blue_Marble_%28remastered%29.jpg"
                    },
                },
                {"type": "text", "text": "What is in the image?"},
            ],
        }
    ],
    temperature=0,
    max_tokens=50,
)
print(model_response)
```
# The image presents a stunning visual representation of Earth, showcasing its diverse geography and atmospheric features. The planet's surface is predominantly blue, with swirling white clouds scattered across the oceans, while the landmasses are visible in shades of brown and gray, set against the inky blackness of space.
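The same request can also be sent through the Vertex AI SDK's Endpoint.predict, without the OpenAI client. This is a hedged sketch: the "@requestFormat" field and payload shape are assumptions about the Model Garden serving container and may differ in your deployment:

```python
from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="your-region")

# Attach to the endpoint created by the deploy step above.
endpoint = aiplatform.Endpoint("your-endpoint-id")

# The instance payload mirrors the chat schema; exact field names
# depend on the serving container backing the endpoint.
response = endpoint.predict(
    instances=[
        {
            "@requestFormat": "chatCompletions",
            "messages": [{"role": "user", "content": "What is in the image?"}],
            "max_tokens": 50,
        }
    ]
)
print(response.predictions)
```

Using the SDK keeps authentication and retries inside the Google client libraries, while the Chat Completions route shown above is convenient if you already have OpenAI-style code.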
To start using Llama 4 models on Vertex AI, here are a few suggestions:
Thank you for reading! I encourage you to connect and reach out on LinkedIn and X to share feedback, questions, and what you build on Vertex AI.