Vertex AI: Gemini 2.5 Flash (Node.js) - Clarification on "Thinking vs. No-Thinking" Pricing Tiers

I am using the Gemini 2.5 Flash model (preview, model ID: gemini-2.5-flash-preview-04-17) via the Node.js SDK (@google-cloud/vertexai) on Vertex AI.

The pricing page distinguishes between "Text output (no thinking)" ($0.60/1M tokens) and "Text output (thinking - answer and reasoning)" ($3.50/1M tokens). I would like to understand how to control which pricing tier is applied.
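To make the difference between the two tiers concrete, here is a rough per-response estimate of output cost. It is only a sketch. It uses the two preview rates quoted above ($0.60 and $3.50 per 1M output tokens). It assumes that, with thinking on, both answer and reasoning tokens are billed at the thinking rate, as the "answer and reasoning" label suggests. It also ignores input-token cost. The helper name `estimateOutputCostUsd` is hypothetical.

```javascript
// Preview rates quoted on the pricing page (USD per 1M output tokens).
// These may change; check the current Vertex AI pricing page.
const RATE_NO_THINKING = 0.60;
const RATE_THINKING = 3.50;

/**
 * Hypothetical helper: estimate output cost for one response.
 * Assumption: with thinking enabled, answer AND reasoning tokens
 * are both billed at the thinking rate.
 */
function estimateOutputCostUsd({ answerTokens, thoughtTokens = 0, thinkingEnabled }) {
  if (!thinkingEnabled) {
    // No reasoning tokens are expected when thinking is disabled.
    return (answerTokens * RATE_NO_THINKING) / 1_000_000;
  }
  return ((answerTokens + thoughtTokens) * RATE_THINKING) / 1_000_000;
}

// Example: a 1,000-token JSON answer, without thinking and with 2,000 reasoning tokens.
const noThinking = estimateOutputCostUsd({ answerTokens: 1000, thinkingEnabled: false });
const withThinking = estimateOutputCostUsd({ answerTokens: 1000, thoughtTokens: 2000, thinkingEnabled: true });
console.log(noThinking.toFixed(4), withThinking.toFixed(4));
```

The per-token rate differs by about 5.8x. Under these assumptions, however, the thinking tier can cost considerably more per response than that, because reasoning tokens are billed as well.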

Configuration and Observations:

I initialize the model in Node.js as follows and then call generateContent on it:

import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project: "[YOUR_PROJECT_ID]", location: "[YOUR_REGION]" });
const model = vertexAI.getGenerativeModel({
    model: 'gemini-2.5-flash-preview-04-17',
    generationConfig: {
        maxOutputTokens: 8192,
        temperature: 0, // or 0.2, etc.
        topP: 0.95,
        responseMimeType: 'application/json'
    }
});

My prompts are around 5,000 tokens. I haven't found any GenerationConfig setting in the Node.js SDK equivalent to the thinkingConfig / thinkingBudget option available in the Python SDK for the Gemini API.

Questions

  • Under what specific conditions will the "Text output (no thinking)" pricing tier be applied when using Gemini 2.5 Flash via the Node.js SDK, and when will the "Text output (thinking)" tier be applied?
  • Is there a specific API parameter or recommended configuration method within the Node.js SDK (e.g., in GenerationConfig) to explicitly specify "no thinking" mode, or to intentionally minimize thoughtsTokenCount so that the "no thinking" pricing applies?
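Whichever SDK is used, the response's usage metadata is the practical place to check whether reasoning tokens were produced. This is a minimal sketch. It assumes the response exposes `usageMetadata` with the REST field names `promptTokenCount`, `candidatesTokenCount` and `thoughtsTokenCount`. The last of these may be absent when no reasoning tokens were produced, and older SDK type definitions may not declare it. The helper `summarizeUsage` and the sample objects are hypothetical.

```javascript
/**
 * Hypothetical helper: summarize token usage from a generateContent
 * response. Field names follow the Vertex AI REST usageMetadata object;
 * a missing thoughtsTokenCount is treated as 0.
 */
function summarizeUsage(response) {
  const usage = response?.usageMetadata ?? {};
  const thoughts = usage.thoughtsTokenCount ?? 0;
  return {
    promptTokens: usage.promptTokenCount ?? 0,
    answerTokens: usage.candidatesTokenCount ?? 0,
    thoughtTokens: thoughts,
    // A non-zero count means the model spent tokens on reasoning for this call.
    thinkingUsed: thoughts > 0,
  };
}

// Illustrative, made-up responses: one with reasoning tokens, one without.
const withThoughts = {
  usageMetadata: { promptTokenCount: 5000, candidatesTokenCount: 800, thoughtsTokenCount: 1200 },
};
const withoutThoughts = {
  usageMetadata: { promptTokenCount: 5000, candidatesTokenCount: 800 },
};

console.log(summarizeUsage(withThoughts));
console.log(summarizeUsage(withoutThoughts));
```

A zero count only shows that no reasoning tokens were reported for that call. It does not by itself prove which pricing tier was billed, so the billing report remains the authoritative source.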

1 REPLY

Please see a related reply here: https://www.googlecloudcommunity.com/gc/AI-ML/How-can-I-set-0-quot-thinkingBudget-quot-with-Vertex-A...

Thinking is enabled by default for Gemini 2.5 models. The "no thinking" pricing tier applies only when thinking is disabled, which for Gemini 2.5 Flash means setting the thinking budget (thinkingBudget) to 0.
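For illustration, here is a sketch of a Vertex AI REST generateContent request with thinking switched off. It assumes the REST field `generationConfig.thinkingConfig.thinkingBudget` exists and that a budget of 0 disables thinking on this Flash model. It only builds the URL and JSON body and does not send the request or handle authentication. The helper name `buildNoThinkingRequest` and the project/location values are hypothetical placeholders, and the other generation settings mirror the question's configuration.

```javascript
/**
 * Hypothetical helper: build the URL and JSON body for a Vertex AI
 * generateContent REST call with thinking disabled (thinkingBudget: 0).
 * Sending it (e.g. with fetch plus an OAuth access token) is left out.
 */
function buildNoThinkingRequest({ project, location, model, prompt }) {
  const url =
    `https://${location}-aiplatform.googleapis.com/v1/projects/${project}` +
    `/locations/${location}/publishers/google/models/${model}:generateContent`;

  const body = {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig: {
      maxOutputTokens: 8192,
      temperature: 0,
      topP: 0.95,
      responseMimeType: "application/json",
      // Assumption: a budget of 0 turns thinking off for Gemini 2.5 Flash.
      thinkingConfig: { thinkingBudget: 0 },
    },
  };
  return { url, body };
}

const req = buildNoThinkingRequest({
  project: "my-project", // placeholder
  location: "us-central1", // placeholder
  model: "gemini-2.5-flash-preview-04-17",
  prompt: "Return a JSON object with a single key 'ok'.",
});
console.log(req.url);
console.log(JSON.stringify(req.body, null, 2));
```

Google's newer Gen AI SDK for JavaScript (`@google/genai`) exposes the same setting as `config.thinkingConfig.thinkingBudget`. Whether a particular version of `@google-cloud/vertexai` forwards `thinkingConfig` is worth checking against its release notes before relying on it.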