This tip is a follow-up to the tip linked below:
Integrate AppSheet with Google Gemini AI API using... - Google Cloud Community
We can analyze images with the Google Gemini AI API.
Use Cases
This tip demonstrates the following use cases:
1. Take a photo of an instrument and extract the readings into an AppSheet table record field: The sample app demonstrates taking photos of electrical energy meters and converting the readings into an AppSheet record field. This use case can be extended to similar needs on the factory floor, at construction sites, or in any such outdoor application.
2. OCR functionality: One can take a photo of a business card or a document and convert the contents into text to be stored in an AppSheet record.
Google Gemini Model Used and Pricing
I used the Google Gemini 1.5 Pro API free tier along with Google Apps Script.
Please note the Gemini 1.5 Pro pricing given in the link below:
Gemini API pricing | Google AI for Developers
Implementation and acknowledgements:
I have mainly extended the earlier tip that analyzed textual data. This time we use the Google Gemini API to analyze images.
Here is the link to the earlier tip:
Integrate AppSheet with Google Gemini AI API using... - Google Cloud Community
I have mainly used the Google Apps Script code given in the Google codelab mentioned below. Please take a look at sections 4 and 5. I have made minor modifications to the code to read inputs from an AppSheet app and return the results back to AppSheet.
Automate Google Workspace tasks with the Gemini API | Google Codelabs
For getting the token count of the inline data (prompt plus image in an AppSheet record), I have referred to and adapted code from the following article by the Google Apps Script expert Tanaike.
The images in the sample app are from Google image search, and the business card is based on a sample in Microsoft Word.
The GAS Code:
const properties = PropertiesService.getScriptProperties().getProperties();
const geminiApiKey = properties['GOOGLE_API_KEY'];
//const geminiEndpoint = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=${geminiApiKey}`;
const geminiProVisionEndpoint = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:generateContent?key=${geminiApiKey}`;

// The function takes two inputs covering two modalities: the prompt (the text input from the
// AppSheet column [Instruction]) and the image from an AppSheet app record, which is
// base64-encoded into the constant imageData.
function callGeminiProVision(prompt, image, temperature=0) {
  const imageData = Utilities.base64Encode(image.getAs('image/png').getBytes());
  const payload = {
    "contents": [
      {
        "parts": [
          {
            "text": prompt
          },
          {
            "inlineData": {
              "mimeType": "image/png",
              "data": imageData
            }
          }
        ]
      }
    ],
    "generationConfig": {
      "temperature": temperature,
    },
  };
  // The code block below computes the total token count that the AI query will have as input.
  // The query is bimodal: it has the prompt as text input and an image input that is converted
  // to a base64 string. Note that the base64 string is concatenated into a text part here, so
  // the count reflects the base64 characters tokenized as text (see the tokens discussion below).
  const baseUrl = "https://generativelanguage.googleapis.com";
  const model = "models/gemini-1.5-pro";
  const version = "v1beta";
  const payload1 = { contents: [{ parts: [{ text: prompt + imageData }] }] };
  const totalTokens = JSON.parse(UrlFetchApp.fetch(`${baseUrl}/${version}/${model}:countTokens?key=${geminiApiKey}`, { payload: JSON.stringify(payload1), contentType: "application/json" }).getContentText());
  Logger.log(totalTokens);
  const options = {
    'method': 'post',
    'contentType': 'application/json',
    'payload': JSON.stringify(payload)
  };
  const response = UrlFetchApp.fetch(geminiProVisionEndpoint, options);
  const data = JSON.parse(response.getContentText());
  const content = data["candidates"][0]["content"]["parts"][0]["text"];
  return content;
}
// The function below calls the function callGeminiProVision() coded above.
// The output returned by Gemini is returned back to the AppSheet app using the
// "Apps Script tasks return value" option.
// The inputs to the function are the image URL from the current app record and the prompt.
// The user can write the prompt to ask the Gemini API for specific information from the image.
// The function reads the input image and the prompt and returns the requested details from the image.
function testGeminiVision(prompt1, imageInput) {
  var prompt = prompt1;
  var image = UrlFetchApp.fetch(imageInput).getBlob();
  const output = callGeminiProVision(prompt, image);
  console.log(prompt, output);
  return output;
}
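To try the functions from the Apps Script editor before wiring up the AppSheet bot, a minimal test sketch like the one below can help. The image URL is a hypothetical placeholder; in the sample app, AppSheet passes the record's image URL and the prompt as the Apps Script task inputs.
// A minimal test sketch, not part of the AppSheet task wiring.
// The image URL below is a hypothetical placeholder; replace it with the URL
// of an image your script can fetch, e.g. an image stored by your AppSheet app.
function demoGeminiVision() {
  const prompt = 'Read the meter only in digits without any units';
  const imageUrl = 'https://example.com/energy-meter.png'; // hypothetical placeholder
  const result = testGeminiVision(prompt, imageUrl);
  Logger.log(result);
}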
Some GIFs to show the Gemini AI API with AppSheet in action:
Read the energy meter and business card
The images to be read are captured in an AppSheet card view, as the screenshot below shows.
The GIF below shows the energy meter reading given by Gemini after reading the image and the prompt shown in blue font (Prompt: Read the meter only in digits without any units). The GAS is invoked by a data change bot, triggered by the action at the bottom right in the card view.
The meter reading output returned by the Gemini AI API is given in red font for easy identification.
It takes a while for Gemini to process the input and return the results, so please wait until each GIF below plays through fully. 🙂
GIF for reading the second energy meter.
A GIF for reading a business card is shown below.
Some insights on tokens
The Gemini AI pricing is based on tokens, so it is good to know how many tokens are consumed in testing.
The energy meter photos seem to carry denser information; their API calls consumed substantial tokens, as captured in the GAS log screenshots shared below.
The energy meter image calls consumed around 300 K tokens.
The business card image analysis consumed only about 20 K tokens.
Disclaimer
The tip shared is purely to demonstrate the possibilities of the Gemini AI API.
Please read the applicable Google terms and pricing before integrating the Gemini AI API into your apps, and ensure it fits your use case in terms of accuracy and the nature of your business. The AI field is still relatively new, so please doubly satisfy yourself with proper testing and with the legal and commercial terms of use of AI before deploying it in production-level apps.
Google APIs Terms of Service | Google for Developers
What other use cases can community colleagues think of for the image-reading capability of the Gemini AI API in an AppSheet app?
Outstanding!
Thank you very much Steve.
Great work 🤝 Thanks for sharing 😊
That is excellent @Denzil_Snyman
Thank you for sharing a great use case.
Wow, how is it consuming 300K tokens?
That's crazy.
I tried your images in AI Studio and the token counts are much lower. Have a look at this.
What am I missing here 🧐
Excellent work!
@Rifad wrote:
Wow how is it consuming 300k tokens ?
Thank you very much for bringing up the tokens topic.
At the outset, as with any tip or sample app, this tip is about demonstrating a concept. It is certainly not a production-level implementation optimized in all respects, as amply mentioned in the tip.
At the same time, the tip very much emphasizes the importance of tokens in AI queries, a couple of times, as follows. Otherwise it would have been easier to skip the tokens topic altogether.
@Rifad wrote:
What am I missing here 🧐
As per my understanding, the main difference is that you have used Google AI Studio directly with an image feed, whereas the tip uses Google Apps Script as an integration tool with a base64-encoded image string. There are several other minor differences that can impact the token count.
Like you, I also asked Google AI Studio about the token count differences. Here is the main part of its answer.
Google AI Studio mentions that base64 encoding can consume around 33% more tokens. I believe that is the main difference between the token count you mentioned from Google AI Studio (around 200 K) and the roughly 300 K tokens from Google Apps Script using base64 encoding.
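To verify this, one could count the tokens with the image passed as inlineData, the same structure the generateContent call uses, instead of concatenating the base64 string into the text part. Here is a minimal sketch, assuming the same GOOGLE_API_KEY script property and image blob as in the main code; counted this way, the image is metered as image input rather than as base64 text.
// Sketch: count tokens with the image passed as inlineData, so the base64
// string is metered as image input rather than tokenized as plain text.
// Assumes the same GOOGLE_API_KEY script property as the main code above.
function countTokensInlineData(prompt, image) {
  const apiKey = PropertiesService.getScriptProperties().getProperty('GOOGLE_API_KEY');
  const imageData = Utilities.base64Encode(image.getAs('image/png').getBytes());
  const payload = {
    contents: [{
      parts: [
        { text: prompt },
        { inlineData: { mimeType: "image/png", data: imageData } }
      ]
    }]
  };
  const url = `https://generativelanguage.googleapis.com/v1beta/models/gemini-1.5-pro:countTokens?key=${apiKey}`;
  const response = UrlFetchApp.fetch(url, {
    method: 'post',
    contentType: 'application/json',
    payload: JSON.stringify(payload)
  });
  Logger.log(response.getContentText()); // e.g. {"totalTokens": ...}
  return JSON.parse(response.getContentText()).totalTokens;
}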
I think it's sending the entire image as base64 text, which is why the token count for the image you provided is high. I tried pasting it in Gemini and you are right.
This is inefficient and extremely expensive; the actual token count remains very low with the same image uploaded in the setup below. Interesting!