Analyzing images with Gemini Pro Vision and BigQuery

aliciamw · 01-25-2024 09:34 AM

The Gemini API recently launched, marking a significant leap forward in the capabilities of Google's AI technology. One of the biggest differences between Gemini and its predecessors is that it is multimodal: a single prompt can combine different types of information, including text, code, audio, images, and video.

Because Gemini can respond to multimodal prompts, it is an excellent choice for analyzing images. You can even combine multiple images in a single question. The example below prompts the Gemini Pro Vision model with an image of a pricing table and an image of a bowl of fruit, and asks it to calculate the cost of the fruit.

Example prompt that combines text and images
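
If you would like to try a prompt like this in code, here is a minimal Python sketch. It assumes the Vertex AI SDK (the `google-cloud-aiplatform` package), the `gemini-pro-vision` model name, and JPEG images stored in Cloud Storage. The bucket paths and question text are hypothetical placeholders, not the ones used in the example above.

```python
from typing import List, Tuple

PromptPart = Tuple[str, str]  # ("image", gs:// URI) or ("text", prompt text)


def build_parts(image_uris: List[str], question: str) -> List[PromptPart]:
    """Order the prompt so that every image comes first, then the question."""
    return [("image", uri) for uri in image_uris] + [("text", question)]


def ask_gemini(parts: List[PromptPart], mime_type: str = "image/jpeg") -> str:
    """Send the parts to Gemini Pro Vision (needs the SDK and credentials)."""
    # Imported lazily so the rest of the sketch runs without the SDK installed.
    from vertexai.preview.generative_models import GenerativeModel, Part

    model = GenerativeModel("gemini-pro-vision")
    contents = [
        Part.from_uri(value, mime_type=mime_type) if kind == "image" else value
        for kind, value in parts
    ]
    return model.generate_content(contents).text


# Hypothetical inputs for illustration only.
parts = build_parts(
    ["gs://my-bucket/pricing-table.jpg", "gs://my-bucket/fruit-bowl.jpg"],
    "Using the prices in the first image, how much does the fruit in the second image cost?",
)
print(parts)
```

Calling `ask_gemini(parts)` sends the request. It is kept separate from `build_parts` so that you can inspect the prompt before spending any quota.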

What if you want to analyze a large set of data, rather than just one prompt? Because you can index images stored in Google Cloud Storage using BigQuery Object Tables, there is a huge opportunity in analyzing data with the Gemini model using BigQuery. This approach has several advantages:

It allows users who are familiar with SQL to leverage the power of Gemini without needing to write additional code
You can more easily analyze a large batch of data rather than having to make individual requests for each image or text prompt
You don't need to export your data from BigQuery before you can analyze it with Gemini

Let's see how you can set up a process to analyze images in batch using BigQuery and the Gemini Pro Vision model.

Prompting Gemini from BigQuery

Let's ask Gemini to describe and summarize a collection of images of landmarks. Here are the steps involved to do this with BigQuery:

Add your images to a BigQuery Object Table
Create a BigQuery Remote Function that calls the Gemini Pro Vision model
Use the remote function within a SQL statement in BigQuery to analyze your images
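
To make steps 1 and 2 more concrete, the sketch below renders the kind of BigQuery DDL they typically involve: an object table over a Cloud Storage path, and a remote function bound to an HTTPS endpoint. This is an approximation rather than the exact statements the demo uses. The connection ID, bucket path, endpoint URL, and the `STRING` return type are assumptions; only the dataset, table, and function names come from this post.

```python
def object_table_ddl(table: str, connection: str, bucket_glob: str) -> str:
    """Step 1: DDL for an object table that indexes files in Cloud Storage."""
    return (
        f"CREATE OR REPLACE EXTERNAL TABLE `{table}`\n"
        f"WITH CONNECTION `{connection}`\n"
        f"OPTIONS (object_metadata = 'SIMPLE', uris = ['{bucket_glob}'])"
    )


def remote_function_ddl(function: str, connection: str, endpoint: str) -> str:
    """Step 2: DDL for a remote function backed by an HTTPS endpoint."""
    return (
        f"CREATE OR REPLACE FUNCTION `{function}`(uri STRING) RETURNS STRING\n"
        f"REMOTE WITH CONNECTION `{connection}`\n"
        f"OPTIONS (endpoint = '{endpoint}')"
    )


# Hypothetical connection, bucket, and endpoint values.
CONNECTION = "my-project.us-central1.my-connection"
print(object_table_ddl("gemini_demo.image_object_table", CONNECTION, "gs://my-bucket/images/*"))
print()
print(
    remote_function_ddl(
        "gemini_demo.gemini_bq_demo_image",
        CONNECTION,
        "https://us-central1-my-project.cloudfunctions.net/my-function",
    )
)
```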

Creating the Remote Function

We've created a Terraform module that brings all three of these steps together into one deployable package, including the sample image data. Here's how you get started:

Create a Google Cloud Platform project and enable billing.

We suggest creating a new project so that you can easily shut down the project when you are finished exploring this demo, and prevent any further charges.

Enable the Cloud Resource Manager API in your project so that Terraform can deploy resources.

Open Cloud Shell and clone the GitHub repository into your project:

```
# Point gcloud at your project
gcloud config set project <PROJECT ID>
# Fetch the sample code and move into the demo directory
git clone https://github.com/GoogleCloudPlatform/generative-ai/
cd ./generative-ai/gemini/use-cases/applying-llms-to-data/using-gemini-with-bigquery-remote-functions
```

Deploy the Terraform code by running terraform init, terraform plan, and terraform apply as three consecutive commands.
```
terraform init
terraform plan
terraform apply
```

The final two commands, terraform plan and terraform apply, will request that you provide your project ID and region. This sample has been tested using region us-central1.

Once the terraform apply command has successfully completed, head to the BigQuery console where you'll see a few resources within the Explorer pane:

An overview of new resources that you will find in your BigQuery Explorer pane after deployment

Using the Remote Function

You're now ready to use the Remote Function directly in BigQuery using SQL.

The following query analyzes the images by passing each image URI (available in the object table) to the gemini_bq_demo_image remote function. The function combines the image with a text prompt ("Describe and summarize this image. Use no more than 5 sentences to do so") and makes the call to Gemini:

```
SELECT
  uri AS image_input,
  `gemini_demo.gemini_bq_demo_image`(uri) AS image_description
FROM
  `gemini_demo.image_object_table`
```

For each image, the SELECT statement returns the image URI and the response from Gemini.

You can either copy and paste to run this query yourself, or invoke the stored procedure, image_query_remote_function_sp, which contains the same query.
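
If you prefer to run the query from Python instead of the console, a sketch along these lines may help. It assumes the `google-cloud-bigquery` client library and application default credentials. The `LIMIT` clause is an addition for cheap trial runs. The `CALL` statement assumes the stored procedure lives in the `gemini_demo` dataset and takes no arguments, which this post does not confirm.

```python
def preview_sql(limit: int) -> str:
    """The query from this post, restricted to a few rows for a trial run."""
    if limit < 1:
        raise ValueError("limit must be a positive integer")
    return (
        "SELECT\n"
        "  uri AS image_input,\n"
        "  `gemini_demo.gemini_bq_demo_image`(uri) AS image_description\n"
        "FROM\n"
        "  `gemini_demo.image_object_table`\n"
        f"LIMIT {int(limit)}"
    )


# Assumption: the procedure is in gemini_demo and has no parameters.
STORED_PROCEDURE_SQL = "CALL `gemini_demo.image_query_remote_function_sp`()"


def run(sql: str, client=None) -> list:
    """Run one statement and return its rows as a list of dicts."""
    if client is None:
        # Imported lazily: needs google-cloud-bigquery and credentials.
        from google.cloud import bigquery

        client = bigquery.Client()
    return [dict(row) for row in client.query(sql).result()]


print(preview_sql(3))
```

With credentials in place, `run(preview_sql(3))` would return up to three rows, each with `image_input` and `image_description` keys. Every row triggers a model call, so starting with a small limit keeps experiments inexpensive.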

Let's take a look at the results:

The query results containing the Gemini responses

After you are finished with the demo, remember to clean up the resources to avoid incurring further charges. You can do this by following the instructions provided in the repo. Before you wrap up, though, make sure to check out the "Next steps" section below; there's more to explore and experiment with!

Next steps

The flexibility of Gemini as a general and multimodal model opens a new world of possibilities for analysis within BigQuery. Once a BigQuery Remote Function is created, accessing Gemini becomes as simple as a SQL query.

Try it out for yourself! And take it a step further by:

Learning more about Gemini and BigQuery Remote Functions.
Perusing more multimodal use cases in the sample notebook Gemini: An Overview of Multimodal use cases.
Using the Remote Function with your own images. (Learn how in the "Make it your own" section of the GitHub repository README.)
Changing the prompt to test your own ideas and use cases: navigate to the Google Cloud Function, edit the code in the inline editor, and redeploy the function.
Checking out the GitHub repository README to learn more about the Terraform module and the additional BigQuery Remote Function that calls the text-only Gemini Pro model.
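
If you plan to edit the Cloud Function, it helps to know the contract it has to honor. BigQuery batches rows into a JSON request with a `calls` array (one inner list of arguments per row) and expects a JSON response with a `replies` array in the same order. The sketch below illustrates that contract only and is not the code deployed by the demo. The `describe` callable is a hypothetical stand-in for the Gemini call so that the example runs offline, and the choice of status code 400 for failures is an assumption.

```python
import json
from typing import Callable, Tuple

# The prompt quoted in this post; edit it to try your own ideas.
PROMPT = "Describe and summarize this image. Use no more than 5 sentences to do so"


def handle_remote_call(
    request_json: dict, describe: Callable[[str, str], str]
) -> Tuple[str, int]:
    """Answer one BigQuery remote function request.

    Returns a (body, HTTP status) pair. On success the body holds one reply
    per input row, in the same order as the incoming calls.
    """
    try:
        replies = [describe(call[0], PROMPT) for call in request_json["calls"]]
        return json.dumps({"replies": replies}), 200
    except Exception as exc:
        # Report the failure to BigQuery instead of letting the function crash.
        return json.dumps({"errorMessage": str(exc)}), 400


def fake_describe(uri: str, prompt: str) -> str:
    """Hypothetical stand-in for the model call."""
    return f"(model response for {uri})"


# Hypothetical request with two rows, each carrying one argument (the URI).
request = {"calls": [["gs://my-bucket/a.jpg"], ["gs://my-bucket/b.jpg"]]}
print(handle_remote_call(request, fake_describe))
```

Keeping the model call behind a plain callable means that changing the prompt or swapping the model touches one place, and the request handling can be tested without any cloud access.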

This blog post was co-authored by Shane Glass (Twitter, Medium) and Alicia Williams (Twitter, Medium). If you have any questions, please leave a comment below. Thanks!