Hi there,
I'm trying to create a fine-tuned Gemini model that takes 2 images and text as input and returns text as output. Essentially object detection, e.g. 'is image 1 found in image 2?'
According to the documentation under 'Generative AI on Vertex AI > Documentation':
"Visual inspection: Train a model to identify specific objects or defects within images, automating quality control or inspection processes."
- https://cloud.google.com/vertex-ai/generative-ai/docs/models/tune_gemini/image_tune
However, the example training dataset only provides one image within parts for the user role:
{ "contents": [ { "role": "user", "parts": [ { "fileData": { "mimeType": "image/jpeg", "fileUri": "gs://cloud-samples-data/ai-platform/generative_ai/gemini-1_5/image/longcap100/100.jpeg" } }, { "text": "Describe this image in detail that captures the essence of it." } ] }, { "role": "model", "parts": [ { "text": "A man stands on a road, wearing a blue denim jacket, tan pants, and white sneakers. He has his hands in his pockets and is wearing a white t-shirt under his jacket. The man's pants are cuffed, and his shoes are white. The road is dark grey, and the leaves are green. The man is standing in the shade, and the light is shining on the ground." } ] } ] }
"""
Install the Google AI Python SDK
$ pip install google-generativeai
"""
import os
import google.generativeai as genai
genai.configure(api_key=os.environ["GEMINI_API_KEY"])
def upload_to_gemini(path, mime_type=None):
    """Uploads the given file to Gemini.

    See https://ai.google.dev/gemini-api/docs/prompting_with_media
    """
    file = genai.upload_file(path, mime_type=mime_type)
    print(f"Uploaded file '{file.display_name}' as: {file.uri}")
    return file
# Create the model
generation_config = {
    "temperature": 1,
    "top_p": 0.95,
    "top_k": 64,
    "max_output_tokens": 8192,
    "response_mime_type": "text/plain",
}

model = genai.GenerativeModel(
    model_name="gemini-1.5-pro",
    generation_config=generation_config,
    # safety_settings = Adjust safety settings
    # See https://ai.google.dev/gemini-api/docs/safety-settings
)
# TODO Make these files available on the local file system
# You may need to update the file paths
files = [
    upload_to_gemini("image_architecture2.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_animal1.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_architecture2.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_objects2.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_food4.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_space3.jpeg", mime_type="image/jpeg"),
    upload_to_gemini("image_transportation3.jpeg", mime_type="image/jpeg"),
]
response = model.generate_content([
    # Task instruction, followed by few-shot examples (image pair + label)
    # and a final query pair left open for the model to complete.
    "Tasks:1. Analyze image 1 and image 2. Return Found if image 2 is in image 1, and Not Found if not.",
    "input: Image 1: \n",
    files[0],
    "\nImage 2:\n",
    files[1],
    "output: Not found",
    "input: Image 1:\n",
    files[2],
    "\nImage 2:\n",
    files[3],
    "output: Not found",
    "input: Image 1:\n",
    files[4],
    "\nImage 2:\n",
    files[5],
    "output: Found",
    "input: Image 1:\n",
    files[6],
    "\nImage 2:",
    "output: ",
    "input: ",
    "output: ",
])
print(response.text)
Is it possible to achieve this fine-tuning with 2 images and text as input on Vertex AI? There is limited documentation about using 2 images and text as input.
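For reference, once a JSONL dataset in the two-image format above is in Cloud Storage, I'd expect to launch the tuning job roughly like this. The project, bucket path, and display name are placeholders, and I'm assuming the supervised tuning API in the vertexai SDK would accept such a dataset, which is exactly what I'm unsure about:

import vertexai
from vertexai.tuning import sft

# Placeholder project and region
vertexai.init(project="my-project", location="us-central1")

# Placeholder GCS path to a JSONL dataset with two fileData parts per example
tuning_job = sft.train(
    source_model="gemini-1.5-pro-002",
    train_dataset="gs://my-bucket/two_image_dataset.jsonl",
    tuned_model_display_name="two-image-found-not-found",
)
print(tuning_job.resource_name)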
Any direction or guidance appreciated.
Tom.