Hi everyone,
I'm currently fine-tuning Gemini for structured output and have hit a snag regarding consistency between fine-tuning and inference. I'd appreciate some insight on the following points:
Background:
- Fine-Tuning: During fine-tuning, I include the JSON schema directly in the system prompt. For example (pseudo-prompt):
System: You are a helpful AI assistant that always responds with the following JSON schema:
{
"type": "object",
"properties": { "recipe_name": { "type": "string" } },
"required": ["recipe_name"]
}
User: List a few popular cookie recipes.
Model: <Output that adheres to the schema>
- Inference: At inference time, I can use the response_schema parameter. As far as I can tell, the API then injects the JSON schema into the prompt and applies additional processing, likely constrained decoding. However, I don't have visibility into the exact format of the injected prompt.
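For reference, this is roughly how I call the API at inference time. It's a minimal sketch using the google-genai Python SDK; the model name is a placeholder for my tuned model, and the schema is the same one from the fine-tuning example above:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

# Same schema that is embedded in the system prompt during fine-tuning
RECIPE_SCHEMA = {
    "type": "object",
    "properties": {"recipe_name": {"type": "string"}},
    "required": ["recipe_name"],
}

response = client.models.generate_content(
    model="gemini-1.5-flash",  # placeholder; would be my tuned model ID
    contents="List a few popular cookie recipes.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",  # required for structured output
        response_schema=RECIPE_SCHEMA,          # enforced by the API; unclear how/whether it is injected into the prompt
    ),
)
print(response.text)
```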
My Question:
How exactly does the response_schema parameter integrate the JSON schema into the prompt during inference? Is there any documentation, or a way to inspect the exact injected prompt? Since I need to embed the JSON schema directly during fine-tuning, I want to make sure my fine-tuning prompts are consistent with the inference-time behavior where the schema is injected automatically. Any best practices or insights on aligning the two stages would be extremely helpful.
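For context, here is the workaround I'm experimenting with in the meantime (just a sketch built on my example above, not a confirmed answer): keep a single source of truth for the schema, render the system prompt from it so the tuning data and the inference request always see byte-identical text, and optionally pass response_schema on top of that:

```python
import json

from google import genai
from google.genai import types

RECIPE_SCHEMA = {
    "type": "object",
    "properties": {"recipe_name": {"type": "string"}},
    "required": ["recipe_name"],
}


def build_system_prompt(schema: dict) -> str:
    # Render the exact system-prompt text used in the fine-tuning examples
    # from the same schema object, so tuning and inference can't drift apart.
    return (
        "You are a helpful AI assistant that always responds with the "
        "following JSON schema:\n" + json.dumps(schema, indent=2)
    )


client = genai.Client()
response = client.models.generate_content(
    model="gemini-1.5-flash",  # placeholder for the tuned model
    contents="List a few popular cookie recipes.",
    config=types.GenerateContentConfig(
        system_instruction=build_system_prompt(RECIPE_SCHEMA),
        response_mime_type="application/json",
        response_schema=RECIPE_SCHEMA,  # optional: drop this to rely on the tuned prompt alone
    ),
)
print(response.text)
```

If there is an official description of how response_schema is translated into the prompt (if at all), I'd happily match that text instead of my own wording.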