Converting images or making them accessible to be fine-tuned on

Hi!
Context:

  • I'm working with a food delivery company that wants to use AI to look at an image and judge whether the "stocking job" (how the items were placed on the shelf) was good or bad (e.g. items facing the wrong way, empty space on the shelf, etc.)

  • I've confirmed this is possible in a Gemini conversation, where I prompt the system and give it example images

  • However, now that I want to expose this externally, I'm trying to use the API, but I can't include images in the system prompt

Where I need help:
My idea for getting around this is to use fine-tuning, but I'm not sure how to fine-tune on images. I looked at converting each image to text, but that didn't seem scalable. Is there a way I could upload the images somewhere, give each one a reference, and then use those references in the fine-tuning data? This may seem like a novice question, and that's because I am a novice 🙂
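To make the idea more concrete, here is a rough sketch of what I'm picturing: each training example pairs a reference to an uploaded image with a good/bad label, written out as one JSON object per line. Everything in it is an assumption on my part. The bucket path, the field names `image_ref` and `label`, and the helper `build_records` are all made up for illustration, and I have not confirmed that any fine-tuning API accepts this layout.

```python
import json


def build_records(examples):
    """Turn (image_reference, label) pairs into JSON Lines text.

    The record layout is hypothetical, not a documented dataset format.
    """
    lines = []
    for image_ref, label in examples:
        # One JSON object per example: where the image lives and how it was judged.
        record = {"image_ref": image_ref, "label": label}
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Hypothetical references to shelf photos uploaded to some storage location.
examples = [
    ("gs://my-bucket/shelves/shelf_001.jpg", "good"),
    ("gs://my-bucket/shelves/shelf_002.jpg", "bad"),
]

jsonl_text = build_records(examples)
print(jsonl_text)
```

If something like this is workable, I'd just need to know what the real record format looks like and where the images have to be stored.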

Any help, guidance, or links would be much appreciated.
