Vertex AI: Gemini 2.5 Flash (Node.js) - Clarification on "Thinking vs. No-Thinking" Pricing Tiers

Mizo · 05-14-2025 03:51 PM

I am using the Gemini 2.5 Flash model (preview, model ID: gemini-2.5-flash-preview-04-17) via the Node.js SDK (@google-cloud/vertexai) on Vertex AI.

The pricing page distinguishes between "Text output (no thinking)" ($0.60/1M tokens) and "Text output (thinking - answer and reasoning)" ($3.50/1M tokens). I would like to understand how to control which pricing tier is applied.
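To make the gap between the two tiers concrete, here is a rough sketch of the arithmetic. The two prices are the ones quoted above; the token counts are hypothetical, and the function assumes (based only on the "answer and reasoning" wording of the tier name) that both answer and reasoning tokens are billed at the thinking rate when thinking is enabled.

```javascript
// USD per 1M output tokens, as quoted from the pricing page above.
const OUTPUT_PRICE_PER_MILLION = {
  noThinking: 0.60,
  thinking: 3.50,
};

// Estimate the output cost of a single response.
// Assumption: with thinking enabled, answer tokens and reasoning tokens
// are both billed at the "thinking" rate.
function estimateOutputCostUSD({ answerTokens, thoughtsTokens = 0, thinkingEnabled }) {
  const rate = thinkingEnabled
    ? OUTPUT_PRICE_PER_MILLION.thinking
    : OUTPUT_PRICE_PER_MILLION.noThinking;
  const billedTokens = answerTokens + (thinkingEnabled ? thoughtsTokens : 0);
  return (billedTokens * rate) / 1e6;
}

// Hypothetical token counts, for illustration only.
const withoutThinking = estimateOutputCostUSD({ answerTokens: 1000, thinkingEnabled: false });
const withThinking = estimateOutputCostUSD({ answerTokens: 1000, thoughtsTokens: 2000, thinkingEnabled: true });

console.log(withoutThinking.toFixed(6)); // 0.000600
console.log(withThinking.toFixed(6));    // 0.010500
```

Under these assumptions the same 1000-token answer costs about 17.5 times more with 2000 reasoning tokens attached, which is why knowing which tier applies matters.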

Configuration and Observations:

I'm initializing the model and generating content in Node.js as follows:

import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project: "[YOUR_PROJECT_ID]", location: "[YOUR_REGION]" });
const model = vertexAI.getGenerativeModel({
    model: 'gemini-2.5-flash-preview-04-17',
    generationConfig: {
        maxOutputTokens: 8192,
        temperature: 0, // or 0.2, etc.
        topP: 0.95,
        responseMimeType: 'application/json'
    }
});

My prompts are around 5000 tokens. I haven't found any settings in GenerationConfig equivalent to thinkingConfig or thinkingBudget seen in the Python SDK for the Gemini API.

Questions

Under what specific conditions will the "Text output (no thinking)" pricing tier be applied when using Gemini 2.5 Flash via the Node.js SDK, and when will the "Text output (thinking)" tier be applied?
Is there a specific API parameter or recommended configuration method within the Node.js SDK (e.g., in GenerationConfig) to explicitly specify "no thinking" mode, or to intentionally minimize thoughtsTokenCount so that the "no thinking" pricing applies?
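One way to observe whether reasoning tokens were actually produced is to inspect the usage metadata on each response. The helper below is a minimal sketch: `summarizeUsage` is a hypothetical name, and it assumes the response exposes `usageMetadata` with `candidatesTokenCount` and `thoughtsTokenCount` fields, where a missing `thoughtsTokenCount` is treated as zero.

```javascript
// Summarize output token usage from a response's usageMetadata object.
// Assumption: thoughtsTokenCount is absent or 0 when no reasoning tokens were produced.
function summarizeUsage(usageMetadata) {
  const answerTokens = usageMetadata?.candidatesTokenCount ?? 0;
  const thoughtsTokens = usageMetadata?.thoughtsTokenCount ?? 0;
  return {
    answerTokens,
    thoughtsTokens,
    thinkingObserved: thoughtsTokens > 0,
  };
}

// Hypothetical metadata objects, for illustration only.
console.log(summarizeUsage({ promptTokenCount: 5000, candidatesTokenCount: 800 }));
console.log(summarizeUsage({ promptTokenCount: 5000, candidatesTokenCount: 800, thoughtsTokenCount: 1200 }));
```

With the `@google-cloud/vertexai` setup shown earlier, this would typically be called as `summarizeUsage(result.response.usageMetadata)` after `const result = await model.generateContent(...)`. It reports what was observed, not which tier was billed.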

ericdong

Please see a related reply here: https://www.googlecloudcommunity.com/gc/AI-ML/How-can-I-set-0-quot-thinkingBudget-quot-with-Vertex-A...

Thinking is enabled by default for Gemini 2.5 models. The "no thinking" pricing tier applies when thinking is disabled.
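As a sketch of what disabling thinking might look like, the function below builds a generateContent request body with a thinking budget of 0, which is the setting named in the linked thread's title. `buildNoThinkingRequest` is a hypothetical helper, and placing `thinkingConfig.thinkingBudget` under `generationConfig` is an assumption. Whether a given version of the Node.js SDK accepts and forwards that field is not confirmed here, so check `thoughtsTokenCount` in the response to verify the effect.

```javascript
// Build a generateContent request body with thinking disabled.
// Assumption: thinkingConfig.thinkingBudget = 0 under generationConfig turns thinking off.
// The other generationConfig values mirror the original post.
function buildNoThinkingRequest(promptText) {
  return {
    contents: [{ role: 'user', parts: [{ text: promptText }] }],
    generationConfig: {
      maxOutputTokens: 8192,
      temperature: 0,
      topP: 0.95,
      responseMimeType: 'application/json',
      thinkingConfig: { thinkingBudget: 0 },
    },
  };
}

// Hypothetical prompt, for illustration only.
console.log(JSON.stringify(buildNoThinkingRequest('Summarize this document as JSON.'), null, 2));
```

Keeping the request as a plain object makes it easy to log and to send through whichever client ends up supporting the field.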
