Analyzing images with Gemini Pro Vision and BigQuery

aliciamw
Staff

bigquery-gemini-images.png

The Gemini API recently launched, marking a significant leap forward in the capabilities of Google's AI technology. One of the biggest differences between Gemini and its predecessors is that it is multimodal, meaning it can accept different types of information as part of a prompt, including text, code, audio, image, and video.

Because of Gemini's ability to respond to multimodal prompts, it is an excellent choice for analyzing images. You can even combine multiple images into a question, such as this one (seen below) that prompts the Gemini Pro Vision model with an image of a pricing table and a bowl of fruit with the ask to calculate its cost.

Example prompt that combines text and imagesExample prompt that combines text and images

What if you want to analyze a large set of data, rather than just one prompt? Because you can index images stored in Google Cloud Storage using BigQuery Object Tables, there is a huge opportunity in analyzing data with the Gemini model using BigQuery. This approach has several advantages:

  • It allows users who are familiar with SQL to leverage the power of Gemini without needing to write additional code
  • You can more easily analyze a large batch of data rather than having to make individual requests for each image or text prompt
  • You don't need to export your data from BigQuery before you can analyze it with Gemini

Let's see how you can set up a process to analyze images in batch using BigQuery and the Gemini Pro Vision model. 

Prompting Gemini from BigQuery

Let's ask Gemini to describe and summarize a collection of images of landmarks. Here are the steps involved to do this with BigQuery:

  1. Add your images to a BigQuery Object Table
  2. Create a BigQuery Remote Function that calls the Gemini Pro Vision model
  3. Use the remote function within a SQL statement in BigQuery to analyze your images

Creating the Remote Function

We've created a Terraform module that brings all three of these steps together into one deployable package, including the sample image data. Here's how you get started:

  1. Create a Google Cloud Platform project and enable billing.
    • We suggest creating a new project so that you can easily shut down the project when you are finished exploring this demo, and prevent any further charges.
  2. Enable the Cloud Resource Manager API in your project which will enable Terraform to do its job deploying resources.
  3. Open Cloud Shell and clone the Github repository into your project:
    gcloud config set project <PROJECT ID>
    git clone  https://github.com/GoogleCloudPlatform/generative-ai/
    cd ./generative-ai/gemini/use-cases/applying-llms-to-data/using-gemini-with-bigquery-remote-functions
     
  4. Deploy the Terraform code by running terraform init, terraform plan, and terraform apply as three consecutive commands.
    terraform init
    terraform plan
    terraform apply

The final two commands, terraform plan and terraform apply, will request that you provide your project ID and region. This sample has been tested using region us-central1.

Once the terraform apply command has successfully completed, head to the BigQuery console where you'll see a few resources within the Explorer pane:

An overview of new resources that you will find in your BigQuery Explorer pane after deploymentAn overview of new resources that you will find in your BigQuery Explorer pane after deployment

Using the Remote Function

You're now ready to use the Remote Function directly in BigQuery using SQL. 

The following query analyzes the images by passing the image uris (available in the object table) into the gemini_bq_demo_image remote function, which then concatenates the image with the text into a prompt ("Describe and summarize this image. Use no more than 5 sentences to do so") and makes the call to Gemini:

 

SELECT
 uri AS image_input,
 `gemini_demo.gemini_bq_demo_image` (uri) AS image_description
FROM
 `gemini_demo.image_object_table`

 

The output returned by the SELECT statement will be the image uri and the response from Gemini.

You can either copy and paste to run this query yourself, or invoke the stored procedure, image_query_remote_function_sp, which contains the same query.

Let's take a look at the results:

The query results containing the Gemini responsesThe query results containing the Gemini responses
After you are finished with the demo, remember to clean up the resources to avoid incurring further charges. You can easily do this using the instructions provided in the repo. But, before you wrap up, make sure to check out the "Next Steps" section below - there's more to explore and experiment with!

Next steps

The flexibility of Gemini as a general and multimodal model opens a new world of possibilities for analysis within BigQuery. Once a BigQuery Remote Function is created, accessing Gemini becomes as simple as a SQL query.

Try it out for yourself! And take it a step further by:

This blog post was co-authored by Shane Glass (Twitter, Medium) and Alicia Williams (Twitter, Medium). If you have any questions, please leave a comment below. Thanks!