Hi everyone,
I'm currently fine-tuning Gemini for structured output and have hit a snag regarding consistency between fine-tuning and inference. I'd appreciate some insight on the following points:
Background:
- Fine-Tuning: During fine-tuning, I include the JSON schema directly in the system prompt. For example (pseudo-prompt):
System: You are a helpful AI assistant that always responds with the following JSON schema:
{
"type": "object",
"properties": { "recipe_name": { "type": "string" } },
"required": ["recipe_name"]
}
User: List a few popular cookie recipes.
Model: <Output that adheres to the schema>
- Inference: At inference time, I can use the response_schema parameter. As far as I can tell, the API then injects the JSON schema into the prompt and applies additional processing, likely constrained decoding. However, I don't have visibility into the exact format of the injected prompt.
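For reference, this is roughly how I call the API at inference time. It's a minimal sketch using the google-genai Python SDK; the model name is a placeholder for my tuned model, and the schema is the same one from the fine-tuning example above:

```python
from google import genai
from google.genai import types

client = genai.Client()  # assumes GOOGLE_API_KEY is set in the environment

# Same schema that is embedded in the system prompt during fine-tuning
RECIPE_SCHEMA = {
    "type": "object",
    "properties": {"recipe_name": {"type": "string"}},
    "required": ["recipe_name"],
}

response = client.models.generate_content(
    model="gemini-1.5-flash",  # placeholder; would be my tuned model ID
    contents="List a few popular cookie recipes.",
    config=types.GenerateContentConfig(
        response_mime_type="application/json",  # required for structured output
        response_schema=RECIPE_SCHEMA,          # enforced by the API; unclear how/whether it is injected into the prompt
    ),
)
print(response.text)
```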
My Question:
How exactly does the response_schema parameter integrate the JSON schema into the prompt during inference? Is there any documentation, or a way to inspect the exact injected prompt? Since I need to embed the JSON schema directly during fine-tuning, I want to make sure my fine-tuning prompts are consistent with the inference-time behavior where the schema is injected automatically. Any best practices or insights on aligning the two stages would be extremely helpful.
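For context, here is the workaround I'm experimenting with in the meantime (just a sketch built on my example above, not a confirmed answer): keep a single source of truth for the schema, render the system prompt from it so the tuning data and the inference request always see byte-identical text, and optionally pass response_schema on top of that:

```python
import json

from google import genai
from google.genai import types

RECIPE_SCHEMA = {
    "type": "object",
    "properties": {"recipe_name": {"type": "string"}},
    "required": ["recipe_name"],
}


def build_system_prompt(schema: dict) -> str:
    # Render the exact system-prompt text used in the fine-tuning examples
    # from the same schema object, so tuning and inference can't drift apart.
    return (
        "You are a helpful AI assistant that always responds with the "
        "following JSON schema:\n" + json.dumps(schema, indent=2)
    )


client = genai.Client()
response = client.models.generate_content(
    model="gemini-1.5-flash",  # placeholder for the tuned model
    contents="List a few popular cookie recipes.",
    config=types.GenerateContentConfig(
        system_instruction=build_system_prompt(RECIPE_SCHEMA),
        response_mime_type="application/json",
        response_schema=RECIPE_SCHEMA,  # optional: drop this to rely on the tuned prompt alone
    ),
)
print(response.text)
```

If there is an official description of how response_schema is translated into the prompt (if at all), I'd happily match that text instead of my own wording.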