
Performance degradation when using Batch prediction

I am using Gemini's batch prediction to estimate bounding boxes from a large number of images. 

I have since discovered that the same prompts (and otherwise identical settings) yield drastically worse results when using the batch API compared to the vanilla Vertex AI chat API.

I have tried `gemini-2.0-flash-001`, `gemini-2.0-flash-lite-001`, and `gemini-2.0-pro-exp-02-05`.
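
For reference, this is roughly the online call path I'm comparing against. A minimal sketch assuming the google-genai SDK; the project ID, image path, and prompt wording are placeholders:

from google import genai
from google.genai import types

# Placeholder project/location; adjust to your environment.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

with open("image.jpg", "rb") as f:
    image_bytes = f.read()

response = client.models.generate_content(
    model="gemini-2.0-flash-001",
    contents=[
        types.Part.from_bytes(data=image_bytes, mime_type="image/jpeg"),
        "Detect the objects and return bounding boxes as [ymin, xmin, ymax, xmax].",
    ],
)
print(response.text)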

Has anyone else run into similar issues?

 


Hi @asifimran

Welcome to the Google Cloud Community!

It seems you are experiencing inconsistencies when estimating bounding boxes using the batch API and the chat API. Here are some potential steps to help you investigate or address the issue:

  1. Configuration Differences - Even with identical prompts and settings, outputs may vary because of differences in how the two APIs handle requests. Review the API documentation for any batch-specific nuances that could impact results.
  2. Pre-processing of Inputs - Batch processing pipelines can sometimes introduce unintended data alterations. Ensure that inputs are pre-processed consistently before being sent to either API.
  3. Testing with Smaller Subsets - Run a small subset of your images through both the batch API and the chat API and compare the results directly (see the sketch after this list). This can help isolate where the performance gap occurs.
  4. API Logs and Metrics - Check the logs and metrics from the batch API for errors, warnings, or other indicators of why quality might be dropping.
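
For item 3, a rough sketch of such a comparison (assuming the google-genai SDK; the project ID, file name, and field paths into the batch output rows are assumptions to verify against your environment):

import json

from google import genai

# Placeholder project/location; adjust to your environment.
client = genai.Client(vertexai=True, project="my-project", location="us-central1")

with open("batch_output.jsonl", encoding="utf-8") as f:
    rows = [json.loads(line) for line in f]

for row in rows[:5]:  # small subset only
    # Field paths assume the Vertex batch output layout; verify against your file.
    req_text = row["request"]["contents"][0]["parts"][0]["text"]
    batch_text = row["response"]["candidates"][0]["content"]["parts"][0]["text"]

    # Re-run the same prompt through the online API with identical settings.
    online = client.models.generate_content(
        model="gemini-2.0-flash-001",
        contents=req_text,
    )

    # Sampling is stochastic, so compare parsed structure or task metrics
    # rather than raw strings.
    print("batch :", batch_text[:120])
    print("online:", (online.text or "")[:120])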

For more information about batch processing, you can read the Vertex AI batch prediction documentation.

Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.

I'm having similar issues and still don't know how to fix them.

Processing requests one by one works fine:

# Online (non-batch) call with a JSON response schema
response = client.models.generate_content(
    model="gemini-2.0-flash",
    contents=prompt,
    config={
        "response_mime_type": "application/json",
        "response_schema": RESPONSE_SCHEMA,
    },
)

but when processing a file with batch predictions, Gemini's output breaks. Some of the output doesn't even conform to the schema, and Gemini uses the Grounding with Google Search tool for some reason.

# THIS IS HOW I PREPARE THE FILE
import json

with open("input.jsonl", "w", encoding="utf-8") as f:
    for judgment in judgments:
        judgment_dict = {
            "id": judgment["id"],
            "request": {
                "contents": [
                    {
                        # "parts" must be a list of part objects
                        "parts": [
                            {"text": prompt.create_prompt(judgment["text_content"])}
                        ],
                        "role": "user",
                    }
                ],
                # generationConfig must be a JSON object with camelCase keys
                "generationConfig": {
                    "responseMimeType": "application/json",
                    "responseSchema": JudgmentAnalysisPrompt.RESPONSE_SCHEMA,
                },
            },
        }
        # Write one request per line
        f.write(json.dumps(judgment_dict) + "\n")


# THIS IS HOW I'M SENDING THE BATCH REQUEST
from google.genai.types import CreateBatchJobConfig

job = client.batches.create(
    model="gemini-2.0-flash-001",
    src="gs://.../test/input/input.jsonl",
    config=CreateBatchJobConfig(
        dest="gs://.../test/output/",
    ),
)
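
For completeness, this is roughly how I poll the job and sanity-check the output. A sketch assuming the google-genai and google-cloud-storage SDKs; the bucket name, object path, and field paths into each output row are placeholders/assumptions:

import json
import time

from google.cloud import storage
from google.genai.types import JobState

# Poll until the job reaches a terminal state.
terminal = {
    JobState.JOB_STATE_SUCCEEDED,
    JobState.JOB_STATE_FAILED,
    JobState.JOB_STATE_CANCELLED,
}
while job.state not in terminal:
    time.sleep(30)
    job = client.batches.get(name=job.name)
print("final state:", job.state)

# Download the predictions file (the exact object name under dest varies)
# and check that each response actually parses as JSON.
bucket = storage.Client().bucket("my-bucket")  # placeholder bucket
blob = bucket.blob("test/output/predictions.jsonl")  # placeholder object path
for line in blob.download_as_text().splitlines():
    row = json.loads(line)
    # Field path is an assumption; verify against your output file.
    text = row["response"]["candidates"][0]["content"]["parts"][0]["text"]
    try:
        json.loads(text)  # should succeed if responseMimeType was honored
    except json.JSONDecodeError:
        print("schema-breaking output:", text[:120])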


I can't find any reasonable documentation for Vertex AI batch usage.