I am using the Gemini 2.5 Flash model (preview, model ID: gemini-2.5-flash-preview-04-17) via the Node.js SDK (@google-cloud/vertexai) on Vertex AI.
The pricing page distinguishes between "Text output (no thinking)" ($0.60/1M tokens) and "Text output (thinking - answer and reasoning)" ($3.50/1M tokens). I would like to understand how to control which pricing tier is applied.
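To show how large the gap between the two tiers is, here is a small TypeScript sketch of the output-cost arithmetic using the two rates quoted above. It covers output tokens only, because input pricing is separate and not included here. The function name is hypothetical.

```typescript
// Output prices quoted on the pricing page, in USD per 1M output tokens.
const PRICE_NO_THINKING_PER_M = 0.6;
const PRICE_THINKING_PER_M = 3.5;

// Estimate output cost only (input tokens are priced separately and ignored here).
// answerTokens: visible response tokens; thoughtTokens: reasoning tokens.
function estimateOutputCostUSD(
  answerTokens: number,
  thoughtTokens: number,
  thinkingEnabled: boolean,
): number {
  if (!thinkingEnabled) {
    // With thinking disabled no reasoning tokens are produced,
    // so only the answer is billed, at the lower rate.
    return (answerTokens / 1_000_000) * PRICE_NO_THINKING_PER_M;
  }
  // With thinking enabled, answer and reasoning are both billed at the higher rate.
  return ((answerTokens + thoughtTokens) / 1_000_000) * PRICE_THINKING_PER_M;
}

// Example: a 1,000-token answer, without thinking and with 2,000 reasoning tokens.
const withoutThinking = estimateOutputCostUSD(1_000, 0, false);
const withThinking = estimateOutputCostUSD(1_000, 2_000, true);
console.log(withoutThinking.toFixed(4), withThinking.toFixed(4));
```

With thinking enabled, the reasoning tokens are billed on top of the answer and at a higher rate, so the difference compounds.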
Configuration and Observations:
I'm initializing the model and generating content in Node.js as follows:
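import { VertexAI } from '@google-cloud/vertexai';

const vertexAI = new VertexAI({ project: "[YOUR_PROJECT_ID]", location: "[YOUR_REGION]" });

const model = vertexAI.getGenerativeModel({
  model: 'gemini-2.5-flash-preview-04-17',
  generationConfig: {
    maxOutputTokens: 8192,
    temperature: 0, // or 0.2, etc.
    topP: 0.95,
    responseMimeType: 'application/json',
  },
});

// Send the prompt (around 5000 tokens in my case) and inspect token usage.
const prompt = '...';
const result = await model.generateContent({
  contents: [{ role: 'user', parts: [{ text: prompt }] }],
});
console.log(result.response.usageMetadata);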
My prompts are around 5000 tokens. I haven't found any setting in the Node.js SDK's GenerationConfig equivalent to the thinkingConfig / thinkingBudget options available in the Python SDK for the Gemini API.
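For comparison, the Vertex AI REST API accepts a thinking budget inside generationConfig as thinkingConfig.thinkingBudget. The sketch below builds a generateContent request body equivalent to the configuration above, with the budget set to 0. It assumes the endpoint version you call honors that field. The interfaces and helper names are hypothetical local definitions, not types from @google-cloud/vertexai, and the body is only built here, not sent.

```typescript
// Local types mirroring only the parts of the Vertex AI REST request used here.
interface ThinkingConfig {
  thinkingBudget: number; // 0 is intended to disable thinking on Gemini 2.5 Flash
}

interface GenerationConfigWithThinking {
  maxOutputTokens: number;
  temperature: number;
  topP: number;
  responseMimeType: string;
  thinkingConfig?: ThinkingConfig;
}

interface GenerateContentBody {
  contents: { role: string; parts: { text: string }[] }[];
  generationConfig: GenerationConfigWithThinking;
}

// Build a request body with the same settings as the SDK snippet,
// adding thinkingConfig only when a budget is given.
function buildRequestBody(prompt: string, thinkingBudget?: number): GenerateContentBody {
  const generationConfig: GenerationConfigWithThinking = {
    maxOutputTokens: 8192,
    temperature: 0,
    topP: 0.95,
    responseMimeType: "application/json",
  };
  if (thinkingBudget !== undefined) {
    generationConfig.thinkingConfig = { thinkingBudget };
  }
  return {
    contents: [{ role: "user", parts: [{ text: prompt }] }],
    generationConfig,
  };
}

// Regional generateContent endpoint for a Google publisher model.
function endpointUrl(project: string, region: string, model: string): string {
  return `https://${region}-aiplatform.googleapis.com/v1/projects/${project}/locations/${region}/publishers/google/models/${model}:generateContent`;
}

// To send it: POST JSON.stringify(body) to endpointUrl(...) with an
// "Authorization: Bearer <token>" header (e.g. from `gcloud auth print-access-token`).
const body = buildRequestBody("your prompt here", 0);
console.log(JSON.stringify(body.generationConfig));
```

The newer @google/genai SDK can also target Vertex AI and, as far as I know, exposes a thinkingConfig option in its request config. That may be simpler than calling REST directly, but check its documentation for the exact shape.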
Questions
Please see a related reply here: https://www.googlecloudcommunity.com/gc/AI-ML/How-can-I-set-0-quot-thinkingBudget-quot-with-Vertex-A...
Thinking is enabled by default for Gemini 2.5 models. The "Text output (no thinking)" pricing tier applies only when thinking is disabled, for example by setting thinkingBudget to 0.
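Because thinking is on by default, one way to check which tier a call falls into is to inspect the usageMetadata returned with each generateContent response. For Gemini 2.5 models, reasoning tokens are reported in thoughtsTokenCount. The sketch below treats any nonzero thoughtsTokenCount as a thinking-tier call. This is a heuristic rather than an official billing rule, the interface is a hypothetical local subset of the response, and the example numbers are made up for illustration.

```typescript
// Local subset of the usageMetadata object on a generateContent response.
// thoughtsTokenCount may be absent or 0 when no reasoning tokens were produced.
interface UsageMetadata {
  promptTokenCount?: number;
  candidatesTokenCount?: number;
  thoughtsTokenCount?: number;
  totalTokenCount?: number;
}

// Heuristic: any reported reasoning tokens suggest the thinking rate applied.
function usedThinking(usage: UsageMetadata): boolean {
  return (usage.thoughtsTokenCount ?? 0) > 0;
}

// Illustrative metadata shapes (made-up numbers, not real measurements).
const defaultCall: UsageMetadata = {
  promptTokenCount: 5000,
  candidatesTokenCount: 800,
  thoughtsTokenCount: 1200,
  totalTokenCount: 7000,
};
const budgetZeroCall: UsageMetadata = {
  promptTokenCount: 5000,
  candidatesTokenCount: 800,
  totalTokenCount: 5800,
};
console.log(usedThinking(defaultCall), usedThinking(budgetZeroCall));
```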