Converting images or making them accessible to be fine-tuned on

joela88 · 04-18-2024 10:08 AM

Hi!
Context:

I'm working with a food delivery company that wants to use AI to look at an image and determine whether the "stocking job" (how the items were placed on the shelf) was good or bad (e.g. items facing the wrong way, empty space on the shelf, etc.).
I've confirmed in Gemini that this is possible in a conversation where I prompt the system and give it example images.
However, I now want to expose this externally. I'm trying to use the API, but I can't include images in the system prompt.

Where I need help:
My thought for getting around this is to use fine-tuning, but I'm not sure how to fine-tune on images. I looked at converting the images to text, but that didn't seem scalable. Is there a way I could upload each image somewhere, give it a reference, and then use that reference in the fine-tuning? This may seem like a novice question, and that is because I am a novice 🙂

Any help, guidance, or links would be much appreciated.
