Structured Output in vertexAI BatchPredictionJob

davidfeiz · 01-19-2025 08:45 AM

The title is self-explanatory I guess, but I will try to specify my problem a little bit further.

In my use case I am trying to use batching for an evaluation pipeline, since the output does not need to arrive in real time. Also, because my test set is very large, I run into the rate limits of the regular API (and into higher costs as well).
Following the documentation, I can only specify the model and input/output locations like this
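A call of that shape can be sketched as follows. This is a minimal sketch, assuming the `vertexai.batch_prediction.BatchPredictionJob.submit` interface of the Vertex AI Python SDK; the function name `submit_batch_job` and the bucket paths in the comments are placeholders, not values from this thread.

```python
def submit_batch_job(project_id: str, input_uri: str, output_uri_prefix: str):
    """Submit a Gemini batch prediction job and return the job handle."""
    # Imported lazily so this module can be loaded without the SDK installed.
    import vertexai
    from vertexai.batch_prediction import BatchPredictionJob

    vertexai.init(project=project_id, location="us-central1")

    # Only the model and the input/output locations are passed at this level.
    return BatchPredictionJob.submit(
        source_model="gemini-1.5-pro-002",
        input_dataset=input_uri,              # e.g. "gs://your-bucket/input.jsonl" (placeholder)
        output_uri_prefix=output_uri_prefix,  # e.g. "gs://your-bucket/output/" (placeholder)
    )
```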

Passing any additional parameter, such as generation_config in the regular API, raises errors. Function calling does not seem to be possible either, although it could have served as a workaround, as it did for previous models. The documentation does not mention any of this, and I have not found it discussed anywhere.
I also have to stress that I explicitly do not want to just validate my output afterwards (which is implemented for redundancy), but to implement this into the response generation step to begin with, making sure the evaluation pipeline is configured in the same way as the dev/production pipeline.

If this is not a current feature, how can batch predictions even be used sensibly (for anything beyond a small PoC), considering structured outputs are the only reliable way to make LLM outputs adhere to a specific format?

And as a side note: with OpenAI's API this is possible.

MarvinLlamas

Hi @davidfeiz,

Welcome to Google Cloud Community!

It looks like you are trying to pass a generation configuration or use function calling with a Vertex AI BatchPredictionJob for Gemini models in order to enforce structured output, which is a common issue.

Here are potential ways that might help with your use case:

Experiment with input format: You may try different data formats (such as JSONL) and organize your data in ways that might enable you to pass parameters.
Prompt Engineering: Make sure you precisely construct your prompts to direct your LLM towards the desired structured output. This method depends significantly on the model's ability to consistently understand your instructions.
Validation and Post-processing: You may want to process your raw outputs to extract and structure the relevant information after your batch prediction job completes. This method adds complexity, computational overhead, and increases the risk of errors compared to a more controlled process.
Consider Custom Prediction Routine: You may want to explore using a custom prediction routine. By creating a custom container with your prediction code, you'll gain more control over how your model is called and how your output is formatted.
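
As an illustration of the "Validation and Post-processing" option, here is a minimal sketch that extracts a JSON object from raw model text and checks that the expected keys are present. The helper names `strip_code_fence` and `parse_model_json` are hypothetical, and the assumption that the model sometimes wraps its JSON in a Markdown code fence is mine, not something stated in this thread.

```python
import json

# Built programmatically to avoid a literal fence inside this example.
FENCE = "`" * 3


def strip_code_fence(text: str) -> str:
    """Remove a surrounding Markdown code fence, if there is one."""
    text = text.strip()
    if text.startswith(FENCE) and text.endswith(FENCE):
        body = text[len(FENCE):-len(FENCE)]
        # Drop an optional language tag (e.g. "json") on the opening fence line.
        first_newline = body.find("\n")
        if first_newline != -1:
            body = body[first_newline + 1:]
        return body.strip()
    return text


def parse_model_json(raw_text: str, required_keys: list[str]) -> dict:
    """Parse raw model output as a JSON object and verify required keys.

    Raises ValueError if the text is not valid JSON, is not an object,
    or lacks one of the required keys.
    """
    data = json.loads(strip_code_fence(raw_text))
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    missing = [key for key in required_keys if key not in data]
    if missing:
        raise ValueError(f"missing keys: {missing}")
    return data
```

Records that raise `ValueError` can be collected and retried or reported, which is the extra overhead this option brings compared to constraining the output at generation time.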

You may refer to the documentation below, which offers pertinent information on Google Cloud’s Batch Prediction Job, custom prediction routines, and structured output:

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

davidfeiz

Hi @MarvinLlamas,

Thank you for your reply!

Regarding your answer, I will specify a bit more how the prediction pipeline is set up:
As data we use freeform text stored as files in a gcp bucket, which represents our test-set and each file in this set contains features that we want to extract. The test-set is labelled and used to evaluate the output performance of the model (gemini-1.5-pro-002).
When using Batch predictions the input is structured as a JSONL for each batch, in which we also define a custom_id to identify the input-output pairs. The prompt we pass contains not only the task but also already specifies the desired output. The third point you suggested is also already taken care of, since we validate the response (sanity checking) as well.

What we want to achieve is to have the format-restriction mechanism also on the API level, to ensure the response itself is generated with a "restricted token sampling".
So, following the documentation on controlled generation for the regular API, we first need to translate any Pydantic model or schema into a JSON schema, which can then be passed to the model using the generation_config parameter:

import vertexai

from vertexai.generative_models import GenerationConfig, GenerativeModel

# TODO(developer): Update and un-comment below line
# PROJECT_ID = "your-project-id"
vertexai.init(project=PROJECT_ID, location="us-central1")

response_schema = {
    "type": "array",
    "items": {
        "type": "object",
        "properties": {
            "recipe_name": {
                "type": "string",
            },
        },
        "required": ["recipe_name"],
    },
}

model = GenerativeModel("gemini-1.5-pro-002")

response = model.generate_content(
    "List a few popular cookie recipes",
    generation_config=GenerationConfig(
        response_mime_type="application/json", response_schema=response_schema
    ),
)

This has been copied from https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/control-generated-output

Trying to implement the same during the batch creation for this example looks like this:

Maybe I am not passing the parameter correctly, maybe it is not intended to be passed to begin with, but any way I tried did not work and resulted in errors. (Also tried the approach from the generativeai docs where a pydantic base model is passed inside a list). Am I missing something here?

davidfeiz

So I managed to solve this problem. The solution was to extend the request dict like so

The accepted parameter is "generationConfig", which I just found by accident in one of the example notebooks provided in the generativeai repo 🙂
The different naming conventions for this parameter surely introduced some confusion here...
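
A request line of the kind described above can be sketched as follows. This is a hedged reconstruction, not the original snippet: the field names `responseMimeType` and `responseSchema` and the upper-case type names are assumed from the REST-style spelling of the generation config, the top-level `custom_id` key is a hypothetical name for the per-line identifier mentioned earlier, and the helper names are illustrative.

```python
import json

# Same recipe schema as in the regular-API example, in REST-style spelling
# (upper-case type names are an assumption about what the batch endpoint expects).
RESPONSE_SCHEMA = {
    "type": "ARRAY",
    "items": {
        "type": "OBJECT",
        "properties": {"recipe_name": {"type": "STRING"}},
        "required": ["recipe_name"],
    },
}


def build_request_line(custom_id: str, prompt: str) -> str:
    """Return one JSONL line for a batch input file."""
    record = {
        # Hypothetical identifier used to match outputs back to inputs.
        "custom_id": custom_id,
        "request": {
            "contents": [{"role": "user", "parts": [{"text": prompt}]}],
            # camelCase "generationConfig", not the SDK's generation_config.
            "generationConfig": {
                "responseMimeType": "application/json",
                "responseSchema": RESPONSE_SCHEMA,
            },
        },
    }
    return json.dumps(record)


def write_batch_file(path: str, prompts: dict[str, str]) -> None:
    """Write one request per line; `prompts` maps custom_id to prompt text."""
    with open(path, "w", encoding="utf-8") as handle:
        for custom_id, prompt in prompts.items():
            handle.write(build_request_line(custom_id, prompt) + "\n")
```

The resulting file would then be uploaded to the bucket and referenced as the batch job's input location.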

hibask

Could you link the example notebook you found this in? I am trying to solve a similar issue. Thanks!

sydneyat

Thanks for the code snippet - I've been looking everywhere for how to add an id to each line of a batch request!

iffishells

I am facing a similar issue. To generate structured output we have to pass an output schema, but I don't know how to pass it for batch prediction. The Google documentation on this is hell. Can you please share the reference or notebook for controlled generation with a schema?

iffishells

{"status":"","processed_time":"2025-05-21T07:53:38.797+00:00","request":{"contents":[{"parts":[{"file_data":null,"text":"create the metadata from this file."},{"file_data":{"file_uri":"gs://foley-sound-bcc-sound-effect-wav/BBCSoundEffectsWavs/00008000.wav","mime_type":"audio/wav"},"text":null}],"role":"user"}],"generationConfig":{"temperature":0.4}},"response":{"candidates":[{"avgLogprobs":-0.3753753984478158,"content":{"parts":[{"text":"Here's a possible set of metadata for the audio file, based on the content:\n\n**General Metadata:**\n\n* **Description:** Audio recording of children playing and talking, possibly in an outdoor environment.\n* **Keywords:** Children, play, voices, outdoor, speech, noise, activity.\n\n**Technical Metadata:**\n\n* **File Format:** (Assuming it's a common format like) .mp3, .wav, .aac\n* **Duration:** (Needs to be determined by analyzing the file length)\n* **Channels:** Mono or Stereo (likely mono if recorded with a single microphone)\n* **Sample Rate:** (Needs to be determined by analyzing the file)\n* **Bit Rate:** (Needs to be determined by analyzing the file)\n\n**Contextual Metadata (If available):**\n\n* **Location:** (If known, where the recording was made)\n* **Date and Time:** (If known, when the recording was made)\n* **Recorder:** (If known, who made the recording)\n* **Purpose:** (Why the recording was made, if known)\n\n**Notes:**\n\n* The metadata above is based on the audio content and common practices.\n* To get accurate technical metadata, you need to analyze the audio file using audio editing software or metadata analysis tools."}],"role":"model"},"finishReason":"STOP"}],"createTime":"2025-05-21T07:53:38.869209Z","modelVersion":"gemini-2.0-flash-001@default","responseId":"goYtaNmGNZizgLUPp93-2A4","usageMetadata":{"candidatesTokenCount":284,"candidatesTokensDetails":[{"modality":"TEXT","tokenCount":284}],"promptTokenCount":732,"promptTokensDetails":[{"modality":"AUDIO","tokenCount":725},{"modality":"TEXT","tokenCount":7}],"totalTokenCount":1016,"trafficType":"ON_DEMAND"}}}
{"status":"Bad Request: {\"error\": {\"code\": 400, \"message\": \"Request contains an invalid argument.\", \"status\": \"INVALID_ARGUMENT\"}}","processed_time":"2025-05-21T07:53:38.799+00:00","request":{"contents":[{"parts":[{"file_data":null,"text":"create the metadata from this file."},{"file_data":{"file_uri":"gs://foley-sound-bcc-sound-effect-wav/BBCSoundEffectsWavs/00008001.wav","mime_type":"audio/wav"},"text":null}],"role":"user"}],"generationConfig":{"temperature":0.4}},"response":{}}
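
In the two records above, the first has an empty `status` and a populated `response`, while the second carries a 400 error in `status` and an empty `response`. The `generationConfig` in both requests sets only `temperature`, so nothing constrains the output format. A minimal sketch for separating the two cases when reading a batch output file, assuming each output line is one complete JSON object with the fields shown above (the helper names are illustrative):

```python
import json


def split_results(lines):
    """Separate successful records from failed ones using the `status` field."""
    succeeded, failed = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        record = json.loads(line)
        # A non-empty status string carries the error for that request.
        if record.get("status"):
            failed.append(record)
        else:
            succeeded.append(record)
    return succeeded, failed


def first_candidate_text(record):
    """Return the concatenated text of the first candidate, or None if absent."""
    candidates = record.get("response", {}).get("candidates", [])
    if not candidates:
        return None
    parts = candidates[0].get("content", {}).get("parts", [])
    # Parts without text (e.g. "text": null) contribute nothing.
    return "".join(part.get("text") or "" for part in parts)
```

Failed records keep their original `request`, so they can be inspected or resubmitted individually.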

saadtx

I think this is the notebook: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/use-cases/document-processing/...