We are currently comparing Gemini Flash 2.5 output with Anthropic Claude 3.0 output and trying to work through the prompts to improve the summarization output.
As part of this effort, we noticed that Gemini Flash 2.5 is inconsistent in its output. Sometimes it does a good job, and sometimes it goes off the rails and hallucinates.
Is there any way to guarantee the reliability of the inference engine? Anthropic Claude was consistently accurate once the prompt was improved.
Hi kumarilb,
Welcome to the Google Cloud Community!
While you may not be able to guarantee 100% reliability from any LLM, you can improve the consistency of Gemini Flash by focusing on a few areas:

- Prompt engineering: give the model clear, specific instructions, state what it should not do (e.g., "only use information from the provided text"), and include one or two example summaries so it has a concrete target.
- Controlling randomness: lower the temperature and top_p settings so the model samples from fewer, higher-probability tokens, which reduces run-to-run variance and the chance of the model drifting off-topic.
- Continuous evaluation: keep a fixed set of test documents, compare outputs across prompt versions, and refine your system based on where it fails.
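To make the "controlling randomness" point concrete, here is a minimal sketch of generation settings that favor reproducible output. The helper just builds a settings dictionary, so it runs without an API key; the commented-out section shows roughly how the same settings would be passed with the `google-genai` Python SDK (the call shape and model name there are illustrative, not guaranteed to match your SDK version).

```python
# Sketch: pinning down sampling parameters to reduce run-to-run variance.
# The helper below only builds the config, so it is runnable offline.

def deterministic_config(seed: int = 42) -> dict:
    """Generation settings that favor reproducible summaries."""
    return {
        "temperature": 0.0,       # minimize sampling randomness
        "top_p": 0.1,             # restrict sampling to the most likely tokens
        "seed": seed,             # fixed seed for more repeatable sampling
        "max_output_tokens": 1024,
    }

# With the google-genai SDK, the settings would be passed roughly like this
# (illustrative only -- check the SDK docs for the exact call shape):
#
#   from google import genai
#   client = genai.Client(api_key="YOUR_API_KEY")
#   response = client.models.generate_content(
#       model="gemini-2.5-flash",
#       contents="Summarize the following text: ...",
#       config=genai.types.GenerateContentConfig(**deterministic_config()),
#   )

if __name__ == "__main__":
    print(deterministic_config()["temperature"])  # → 0.0
```

Even with temperature 0 and a fixed seed, identical outputs are not guaranteed across model updates or infrastructure changes, so treat this as reducing variance rather than eliminating it.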
Additionally, Google continuously updates models like Gemini, so it’s worth periodically checking the release notes for new versions or fine-tuned models specifically designed for summarization.
Was this helpful? If so, please accept this answer as “Solution”. If you need additional assistance, reply here within 2 business days and I’ll be happy to help.