We are currently comparing Gemini Flash 2.5 output with Anthropic Claude 3.0 output, and we are iterating on the prompts to improve the summarization quality.
As part of this effort, we noticed that Gemini Flash 2.5 is inconsistent in its output. Sometimes it does a good job, and sometimes it goes off the rails and hallucinates.
Is there any way to guarantee reliable output from the inference engine? For comparison, Anthropic Claude was consistently accurate once the prompt was improved.