Hi everyone,
I'm currently fine-tuning Gemini for structured output and have hit a snag regarding consistency between fine-tuning and inference. I'd appreciate some insight on the following:
Background:
I'm running supervised fine-tuning on a Gemini model so that it emits JSON conforming to a fixed schema. At inference time, structured output can be enforced via the response_schema parameter (controlled generation), but the fine-tuning dataset has no equivalent mechanism, so I embed the schema directly in my training prompts.
My Question:
How exactly does the response_schema parameter integrate the JSON schema into the prompt during inference? Is there any documentation, or a way to inspect the exact prompt that gets injected? Since I need to embed the JSON schema directly in my prompts during fine-tuning, I want to make sure those prompts are consistent with the inference-time behavior, where the schema is injected automatically. Any best practices or insights on aligning the two stages would be extremely helpful.
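For concreteness, this is the kind of inference-time call I'm referring to (a minimal sketch with the Vertex AI Python SDK; the project, model name, and schema are placeholders, not my actual setup):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

vertexai.init(project="my-project", location="us-central1")

# Hypothetical schema, for illustration only.
schema = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

model = GenerativeModel("gemini-1.5-pro")
response = model.generate_content(
    "Extract the person's name and age: Alice is 34 years old.",
    generation_config=GenerationConfig(
        response_mime_type="application/json",
        # Controlled generation: the service enforces this schema,
        # but how it is woven into the prompt is opaque to me.
        response_schema=schema,
    ),
)
print(response.text)
```

What I can't see is how the service turns response_schema into prompt text, which is exactly what I'd need to mirror in my tuning data.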
Hi @gagrafio,
Welcome to Google Cloud Community!
As quoted from the documentation on Gemini model fine-tuning:
“Applying controlled generation when submitting inference requests to tuned Gemini models can result in decreased model quality due to data misalignment during tuning and inference time. During tuning, controlled generation isn't applied, so the tuned model isn't able to handle controlled generation well at inference time. Supervised fine-tuning effectively customizes the model to generate structured output. Therefore you don't need to apply controlled generation when making inference requests on tuned models.”
A response schema, i.e. controlled generation that specifies the structure of the model's output, is not advisable at inference time on a fine-tuned Gemini model. Inconsistency due to data misalignment between fine-tuning and inference is expected when you apply a response schema. During inference, the JSON schema is injected automatically by the system, and there is currently no documentation detailing how the response_schema parameter integrates the JSON schema into the prompt. Given that limited visibility, the only way to inspect or understand the injected schema is extensive experimentation.
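In practice, this means that once your model is tuned on schema-bearing examples, you call it without any controlled-generation settings. A minimal sketch (I'm assuming the Vertex AI Python SDK here; the tuned endpoint resource name is a placeholder):

```python
from vertexai.generative_models import GenerativeModel

# Reference the tuned model by its endpoint resource name (placeholder).
tuned_model = GenerativeModel(
    "projects/my-project/locations/us-central1/endpoints/1234567890"
)

# Note: no response_mime_type or response_schema in the request. The
# tuned model already learned the output structure during supervised
# fine-tuning, so controlled generation is unnecessary here.
response = tuned_model.generate_content(
    "Extract the person's name and age: Alice is 34 years old."
)
print(response.text)
```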
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.
Thanks a lot, I had missed that section of the documentation; things make a lot more sense now. My plan moving forward is to manually include the schema structure in the prompt during fine-tuning, and then send the same schema-bearing prompt at inference time, without relying on controlled generation.
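In case it helps anyone else, here's roughly how I plan to build the tuning examples (a sketch; the schema and helper name are hypothetical, and the JSONL layout follows the Vertex AI supervised fine-tuning dataset format as I understand it):

```python
import json

# Hypothetical schema the tuned model should follow.
SCHEMA = {
    "type": "object",
    "properties": {
        "name": {"type": "string"},
        "age": {"type": "integer"},
    },
    "required": ["name", "age"],
}

def build_prompt(document: str) -> str:
    # Embed the schema verbatim in the prompt text, so the model sees
    # identical instructions during tuning and at inference time.
    return (
        "Reply with JSON matching this schema exactly:\n"
        f"{json.dumps(SCHEMA, indent=2)}\n\n"
        f"Document:\n{document}"
    )

# One training example per JSONL line, with "user" and "model" turns.
example = {
    "contents": [
        {"role": "user", "parts": [{"text": build_prompt("Alice is 34 years old.")}]},
        {"role": "model", "parts": [{"text": json.dumps({"name": "Alice", "age": 34})}]},
    ]
}

with open("train.jsonl", "a") as f:
    f.write(json.dumps(example) + "\n")
```

At inference I'll call the tuned model with build_prompt(...) and no schema in the generation config, so tuning and serving stay aligned.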