The Gemini API recently launched, marking a significant leap forward in the capabilities of Google's AI technology. One of the biggest differences between Gemini and its predecessors is that it is multimodal, meaning it can accept different types of information as part of a prompt, including text, code, audio, image, and video.
Because of Gemini's ability to respond to multimodal prompts, it is an excellent choice for analyzing images. You can even combine multiple images into a question, such as this one (seen below) that prompts the Gemini Pro Vision model with an image of a pricing table and a bowl of fruit with the ask to calculate its cost.
What if you want to analyze a large set of data, rather than just one prompt? Because you can index images stored in Google Cloud Storage using BigQuery Object Tables, there is a huge opportunity in analyzing data with the Gemini model using BigQuery. This approach has several advantages:
Let's see how you can set up a process to analyze images in batch using BigQuery and the Gemini Pro Vision model.
Let's ask Gemini to describe and summarize a collection of images of landmarks. Here are the steps involved to do this with BigQuery:
We've created a Terraform module that brings all three of these steps together into one deployable package, including the sample image data. Here's how you get started:
gcloud config set project <PROJECT ID>
git clone https://github.com/GoogleCloudPlatform/generative-ai/
cd ./generative-ai/gemini/use-cases/applying-llms-to-data/using-gemini-with-bigquery-remote-functions
terraform init
terraform plan
terraform apply
The final two commands, terraform plan and terraform apply, will request that you provide your project ID and region. This sample has been tested using region us-central1.
Once the terraform apply command has successfully completed, head to the BigQuery console where you'll see a few resources within the Explorer pane:
You're now ready to use the Remote Function directly in BigQuery using SQL.
The following query analyzes the images by passing the image uris (available in the object table) into the gemini_bq_demo_image remote function, which then concatenates the image with the text into a prompt ("Describe and summarize this image. Use no more than 5 sentences to do so") and makes the call to Gemini:
SELECT
uri AS image_input,
`gemini_demo.gemini_bq_demo_image` (uri) AS image_description
FROM
`gemini_demo.image_object_table`
The output returned by the SELECT statement will be the image uri and the response from Gemini.
You can either copy and paste to run this query yourself, or invoke the stored procedure, image_query_remote_function_sp, which contains the same query.
Let's take a look at the results:
After you are finished with the demo, remember to clean up the resources to avoid incurring further charges. You can easily do this using the instructions provided in the repo. But, before you wrap up, make sure to check out the "Next Steps" section below - there's more to explore and experiment with!
The flexibility of Gemini as a general and multimodal model opens a new world of possibilities for analysis within BigQuery. Once a BigQuery Remote Function is created, accessing Gemini becomes as simple as a SQL query.
Try it out for yourself! And take it a step further by:
This blog post was co-authored by Shane Glass (Twitter, Medium) and Alicia Williams (Twitter, Medium). If you have any questions, please leave a comment below. Thanks!