
Turn off thinking for gemini-2.5-flash batch API in Vertex AI

gemini-2.5-flash-preview-05-20 supports batch predictions now (2025-05-21); the doc is here:

Gemini 2.5 Flash  |  Generative AI on Vertex AI  |  Google Cloud

I want to use the gemini-2.5-flash batch API (Python) in Vertex AI with thinking turned off (thinking_budget=0). How do I turn it off in the config?

1 ACCEPTED SOLUTION

The GenerationConfig should go in the input JSONL or the BigQuery table. See examples here: https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/batch-prediction/intro_batch_p...

Note that Gemini 2.5 Flash does not yet support batch prediction.


3 REPLIES

# Turn off thinking

from google import genai
from google.genai.types import GenerateContentConfig, ThinkingConfig

client = genai.Client()

response = client.models.generate_content(
    model="gemini-2.5-flash-preview-05-20",
    contents="What is AI?",
    config=GenerateContentConfig(
        thinking_config=ThinkingConfig(
            thinking_budget=0,  # 0 disables thinking for this request
        )
    ),
)

You can add the config to your batch input.
Please see more details about the thinking config in https://github.com/GoogleCloudPlatform/generative-ai/blob/main/gemini/getting-started/intro_gemini_2...
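To illustrate putting the config in the batch input, here is a sketch of one request line for the input JSONL. The field names (`request`, `generationConfig`, `thinkingConfig`, `thinkingBudget`) are assumed to follow the Vertex AI batch prediction request format and mirror the ThinkingConfig shown above:

```python
import json

# One per-request line of the batch input JSONL.
# thinkingBudget=0 is assumed to disable thinking for this request.
request_line = {
    "request": {
        "contents": [
            {"role": "user", "parts": [{"text": "What is AI?"}]}
        ],
        "generationConfig": {
            "thinkingConfig": {"thinkingBudget": 0}
        },
    }
}

# Write one JSON object per line; upload the file to your GCS bucket
# and pass its gs:// URI as src to client.batches.create.
with open("batch_input.jsonl", "w") as f:
    f.write(json.dumps(request_line) + "\n")
```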

Vertex AI Gemini batch API example

    from google import genai
    from google.genai.types import CreateBatchJobConfig, HttpOptions

    client = genai.Client(http_options=HttpOptions(api_version="v1"))

    # Replace with your own Cloud Storage output location.
    output_uri = "gs://your-bucket/batch-output/"

    # See the documentation: https://googleapis.github.io/python-genai/genai.html#genai.batches.Batches.create
    job = client.batches.create(
        model="gemini-2.5-flash-preview-05-20",
        # Source link: https://storage.cloud.google.com/cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl
        src="gs://cloud-samples-data/batch/prompt_for_batch_gemini_predict.jsonl",
        config=CreateBatchJobConfig(dest=output_uri),
    )

It uses CreateBatchJobConfig; does it support the same thinking_config as GenerateContentConfig?
