
Open Models on Vertex AI with Hugging Face: let’s get started!

ilnardo92

Introduction

Imagine that you are throwing a party and want to build a simple “Guess what” app: you submit a riddle like “I have keys, but no locks. I have a space, but no room. You can enter, but can’t go outside. What am I?” and an AI app provides the answer and generates an image of the subject.

You share this idea with a couple of friends who are full-stack app developers. They suggest using models from Hugging Face to generate both the image and its witty caption. However, none of you is an AI expert.

How do you identify the appropriate model? How do you test it? How do you make it accessible within the app? Answering these questions could take weeks you don’t have; all you have is limited resources and engineering time!

Wouldn't it be great to have a place where you could explore and test various models without requiring in-depth AI expertise? Imagine a space where you could easily find familiar models, or models you've heard about, and test them to determine whether they fit your idea. And if a model meets your requirements, you could incorporate it quickly into your app.

This place exists, and it is called Vertex AI Model Garden. Vertex AI Model Garden provides a set of capabilities to discover, test, customize, and deploy models, including Hugging Face models.

With Vertex AI Model Garden, for example, you can find the most popular models hosted on the Hugging Face Hub, deploy them, and start integrating them into your AI application. Let’s see how!

Accessing Hugging Face models on Vertex AI

This is Model Garden in Vertex AI on Google Cloud.

[Image: Vertex AI Model Garden in the Google Cloud console]

You’re searching for a popular ‘text-to-image’ model to generate images from your riddle. You can go to the partners section and click on Hugging Face. This opens a search portal where you can find 4,000+ additional models from Hugging Face, ready to deploy on Vertex AI using one-click deployment. You can filter by “Model objectives” to find the best-suited model, as seen below.

[Image: filtering Hugging Face models by “Model objectives” in the search portal]

Let’s try to deploy the FLUX.1-dev model from Black Forest Labs, famous for its ability to transform text prompts into highly detailed and visually stunning images. After clicking on the model you are looking for, you just need to provide a Hugging Face access token and deploy it, as shown below.

[Image: the one-click deployment dialog for FLUX.1-dev]

After deployment, a new FLUX model instance is registered in Vertex AI Model Registry and is ready to generate images from a Vertex AI endpoint.

[Image: the deployed FLUX model in Vertex AI Model Registry]
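If you prefer to verify the registration programmatically, a quick check with the Vertex AI SDK could look like the sketch below (the project, region, and display-name filter are illustrative placeholders, not values produced by the console flow):

from google.cloud import aiplatform

aiplatform.init(project="your-project-id", location="us-central1")  # placeholders

# List registered models whose display name matches the FLUX deployment.
for model in aiplatform.Model.list(filter='display_name="flux--generate"'):
    print(model.resource_name)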

But wait, how did that happen? You might have noticed that Vertex AI Model Garden provides a recommended deployment recipe with an associated machine type. But what about the runtime that actually serves inference? That’s where the Hugging Face Deep Learning Container comes into play!

Hugging Face Deep Learning Containers

Hugging Face Deep Learning Containers (DLCs) for Google Cloud are optimized Docker containers designed for training and deploying generative AI models. They come pre-installed with essential deep learning libraries such as Transformers, Datasets, Tokenizers, and Diffusers, and with purpose-built inference solutions for performant text and embedding generation.

Hugging Face DLCs allow for direct serving and training of models, eliminating the complexity of building and optimizing environments from scratch. This makes it easier for developers to focus on their application.

Hugging Face DLCs are integrated with Google Cloud services like Vertex AI. When you select the model you want to deploy in the search portal, Vertex AI automatically fetches the corresponding Hugging Face DLC together with the machine specification to enable a one-click deployment experience.

To use a Hugging Face DLC programmatically, you register the model using the Vertex AI Python SDK, where HF_PYTORCH_URI is the URI of the Hugging Face Deep Learning Container you use to deploy your HF models on Vertex AI.

To find the Hugging Face DLC that can serve your model (the FLUX model in this example), you can visit the “Available DLCs on Google Cloud” page in the Hugging Face documentation. Today, Hugging Face provides three families of DLCs:

  • Text Generation Inference (TGI)
  • Text Embeddings Inference (TEI)
  • (Regular) PyTorch (both Training and Inference)

When choosing a serving container, keep in mind the following:

  • Large Language Models (LLMs) and Vision Language Models (VLMs) are meant to be used within TGI if supported. You can check this on the Hugging Face Hub by looking for the "text-generation-inference" tag.
  • Text embedding models are meant to be served with TEI and can be found on the Hub with the "text-embeddings-inference" tag.
  • Any other model, including those supported by TGI and TEI, can be served using PyTorch Inference.

Notice how PyTorch DLCs offer the greatest flexibility, enabling you to serve almost any inference workload, while TGI and TEI are further optimized for specific models: TGI for LLMs and TEI for embedding models. Additionally, PyTorch DLCs support both training and fine-tuning.
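For instance, if you were serving an LLM rather than a text-to-image model, a minimal registration sketch with the TGI DLC might look like the following. The image URI and environment variables here are illustrative assumptions; check the “Available DLCs on Google Cloud” page and the TGI documentation for current values.

from google.cloud import aiplatform

# Illustrative TGI DLC URI: verify the current tag in the Hugging Face docs.
HF_TGI_URI = "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-text-generation-inference-cu121.2-2.ubuntu2204.py310"

llm_model = aiplatform.Model.upload(
    display_name="llm--generate",
    serving_container_image_uri=HF_TGI_URI,
    serving_container_environment_variables={
        # Any model tagged "text-generation-inference" on the Hub.
        "MODEL_ID": "HuggingFaceH4/zephyr-7b-beta",
        "NUM_SHARD": "1",  # number of GPUs to shard the model across
    },
)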

In this scenario, you use the regular PyTorch Inference DLC to register and serve the FLUX model with the Vertex AI SDK, as shown below. Grab the container URI from the documentation page.

from google.cloud import aiplatform
from huggingface_hub import get_token

# Hugging Face PyTorch Inference DLC; grab the current URI from the
# "Available DLCs on Google Cloud" documentation page.
HF_PYTORCH_URI = "us-docker.pkg.dev/deeplearning-platform-release/gcr.io/huggingface-pytorch-inference-cu121.2-2.transformers.4-44.ubuntu2204.py311"

# Register the model in Vertex AI Model Registry.
flux_model = aiplatform.Model.upload(
    display_name="flux--generate",
    serving_container_image_uri=HF_PYTORCH_URI,
    serving_container_environment_variables={
        "HF_MODEL_ID": "black-forest-labs/FLUX.1-dev",  # model to pull from the Hub
        "HF_TASK": "text-to-image",                     # pipeline task to serve
        "HF_TOKEN": get_token(),                        # needed for gated models like FLUX.1-dev
    },
)

Be sure to run huggingface-cli login in advance, or set the HF_TOKEN environment variable, so the get_token function can find your token.
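For instance, here is a quick way to confirm which token get_token will pick up; the token value is a placeholder, and the HF_TOKEN environment variable takes precedence over the cached login:

import os
from huggingface_hub import get_token

os.environ["HF_TOKEN"] = "hf_your_token_here"  # placeholder value
assert get_token() == "hf_your_token_here"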

After registering the model, you can deploy it as shown below.

# Deploy the registered model to a new endpoint backed by 4 NVIDIA L4 GPUs.
deployed_flux_model = flux_model.deploy(
    endpoint=aiplatform.Endpoint.create(display_name="flux--generate-endpoint"),
    machine_type="g2-standard-48",
    accelerator_type="NVIDIA_L4",
    accelerator_count=4,
    sync=False,  # returns immediately; the deployment continues in the background
)
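Because sync=False returns immediately while the deployment continues in the background, you may want to block until the endpoint is ready before requesting predictions, for example:

# Block until the background deployment completes.
deployed_flux_model.wait()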

And finally get your generated image!

import base64
import io

from IPython.display import display
from PIL import Image

response = deployed_flux_model.predict(
    instances=["A mesmerizing close-up photograph of a roaring bonfire at night, flames licking upwards with vibrant orange, yellow, and red hues, embers glowing intensely, smoke billowing dramatically against a dark, starlit sky.  The image should have a cinematic, almost surreal quality."],
    parameters={
        "width": 512,
        "height": 512,
        "num_inference_steps": 8,
        "guidance_scale": 3.5,
    },
)

image = Image.open(io.BytesIO(base64.b64decode(response.predictions[0])))
display(image)

Here you can see an example of an image you may get.

[Image: an example image generated by FLUX.1-dev from the bonfire prompt]

Now that you know the secret sauce of deploying Hugging Face models on Vertex AI and how to get predictions, let’s see how to build the “Guess what” application.

Build your first Gen AI application

For demonstration purposes, consider a very simple way to quickly create the user interface for the “Guess what” application. In this case, you can use Gradio, which allows you to easily wire APIs or any arbitrary Python function to UI components such as Textbox, Image, Button, and more.
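As a minimal illustration of that pattern (a hypothetical stand-in, not the final app), a plain Python function can be wired to Gradio components like this:

import gradio as gr

def echo_riddle(riddle):
    # Placeholder logic: the real app calls Gemini and FLUX instead.
    return f"You asked: {riddle}"

demo = gr.Interface(
    fn=echo_riddle,
    inputs=gr.Textbox(label="Riddle"),
    outputs=gr.Textbox(label="Answer"),
)
# demo.launch()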

For the “Guess what” application, you may define a set of simple functions like the ones below.

import base64
import io

import gradio as gr
from PIL import Image

from google.cloud import aiplatform
import vertexai
from vertexai.generative_models import GenerativeModel, SafetySetting

# Assumes vertexai.init(project=..., location=...) has been called, and that
# MODEL (a Gemini GenerativeModel), GENERATION_CONFIG, SAFETY_SETTINGS, and
# ENDPOINT (the FLUX Vertex AI endpoint deployed above) are defined elsewhere.

def generate_gemini_content(prompt_template, **kwargs):
    prompt = prompt_template.format(**kwargs)
    response = MODEL.generate_content(
        [prompt],
        generation_config=GENERATION_CONFIG,
        safety_settings=SAFETY_SETTINGS,
        stream=False,
    )
    return response.text

def generate_subject(riddle):
    riddle_solver_prompt_template = """
    You are the best riddle solver. Given a riddle, your goal is to solve it and indicate only the subject of the riddle.
    RIDDLE: {riddle}
    SUBJECT:
    """
    subject = generate_gemini_content(riddle_solver_prompt_template, riddle=riddle)
    return subject.replace("SUBJECT:", "").strip()

def generate_prompt(subject):
    image_gen_prompt_template = """
    You are a professional prompt engineer. Given a subject, prototype the most appropriate prompt to best visualize the subject.
    Only return the preferred prompt.
    SUBJECT: {subject}
    PROMPT:
    """
    return generate_gemini_content(image_gen_prompt_template, subject=subject)

def generate_image(image_gen_prompt):
    response = ENDPOINT.predict(
        instances=[image_gen_prompt],
        parameters={
            "width": 512,
            "height": 512,
            "num_inference_steps": 8,
            "guidance_scale": 3.5,
        }
    )
    return Image.open(io.BytesIO(base64.b64decode(response.predictions[0])))

def guess_game(riddle):
    answer = generate_subject(riddle)
    prompt = generate_prompt(answer)
    image = generate_image(prompt)
    return image, answer, prompt

def increment_counter(counter):
    # Add one correct guess and clear the board for the next riddle.
    return counter + 1, None, "", "", ""

def reset_game(counter):
    # Clear the board for a new round; the score is kept as-is.
    return counter, None, "", "", ""

As you can see, `generate_gemini_content` is a shared helper that calls Gemini; `generate_subject` and `generate_prompt` use it to solve the user’s riddle and to produce an image-generation prompt for visualization; `generate_image` creates the image associated with the riddle’s solution using FLUX from Hugging Face; and `guess_game` ties it all together, producing the image and answer from a user’s riddle.

After you define these core functions, you wrap them in a Gradio interface using any components you prefer. Below is an example of a Gradio app for the “Guess what” game.

with gr.Blocks(theme=gr.themes.Ocean()) as app:
    with gr.Row():
        gr.Markdown("# Guess What Game ❓")
        counter_state = gr.State(value=0)

    with gr.Row():
        prompt_input = gr.Textbox(label="Describe someone or something 💬 ")

    submit_btn = gr.Button("Submit")

    with gr.Row():
        image_prompt = gr.Textbox(label="Generated Image Prompt with Gemini 🎨 ", visible=True)
        image_output = gr.Image(label="Generated Image 🖼️ ")
        answer_output = gr.Textbox(label="Generated Answer with Gemini 🌌 ", interactive=False)

    with gr.Row():
        correct_btn = gr.Button("+1 Correct")
        reset_btn = gr.Button("Reset")

    counter_display = gr.Number(value=0, label="Correct Guesses 👍", interactive=False)

    submit_btn.click(
        guess_game,
        inputs=[prompt_input],
        outputs=[image_output, answer_output, image_prompt]
    )

    correct_btn.click(
        increment_counter,
        inputs=[counter_state],
        outputs=[counter_state, image_output, answer_output, image_prompt, prompt_input]
    ).then(
        lambda x: x,
        inputs=[counter_state],
        outputs=[counter_display]
    )

    reset_btn.click(
        reset_game,
        inputs=[counter_state],
        outputs=[counter_state, image_output, answer_output, image_prompt, prompt_input]
    ).then(
        lambda x: x,
        inputs=[counter_state],
        outputs=[counter_display]
    )

app.launch()

Once you define your app, you just need to launch it, and you get a Gradio endpoint where you can start playing your “Guess what” game.

[Image: the “Guess what” Gradio app in action]
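If you want your party guests to play from their own devices, Gradio can also create a temporary public link via an optional flag (not used in the snippet above):

app.launch(share=True)  # generates a temporary public gradio.live URL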

Conclusions

Remember how this article started with the idea of creating a “Guess What” app for your party? With Vertex AI Model Garden and its Hugging Face integration, you are closer to realizing it than you think. So take your time and explore all the open models available on Vertex AI Model Garden. And once you are ready, get them deployed on Vertex AI!

So what will you build next?

What’s next

To learn more about Vertex AI Model Garden and Hugging Face Deep Learning Containers, check out the following resources.

  • Documentation
  • GitHub examples

Thanks for reading

I hope you enjoyed the article. If so, follow me, 👏 this article, or leave a comment. Also, let’s connect on LinkedIn or X to share feedback and any questions 🤗 about Vertex AI you’d like answered.
