
Unexpected Behavior: Gemini-1.0-Pro-002 Returns Different Outputs at Temperature 0

Hello,
I am encountering different LLM responses for the same prompt with the temperature set to 0.

I am using gemini-1.0-pro-002, and I have noticed that, for some reason, setting the temperature to 0 does not always result in the LLM returning the same response. I have verified this through both the Python API and the GCP Vertex AI web interface. This issue does not seem to affect the 001 version. 
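
For reference, here is roughly how I am calling the model through the Python SDK (a minimal sketch; the project, location and prompt are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

# Placeholder project and region.
vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.0-pro-002")
config = GenerationConfig(temperature=0.0)  # temperature pinned to 0

prompt = "Summarise the main causes of the French Revolution in two sentences."

# Two calls with an identical prompt and config; I would expect identical
# outputs, but the returned texts sometimes differ.
first = model.generate_content(prompt, generation_config=config).text
second = model.generate_content(prompt, generation_config=config).text
print(first == second)
```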

I believe this to be a bug. Thank you.

1 ACCEPTED SOLUTION

Hi @dario_bernardo

We appreciate you taking the time to share your observations about Gemini-1.0-Pro-002. It helps us improve!

Every model release builds on the previous one. Gemini-1.0-Pro-002 was trained with a much larger dataset, allowing for more variations in its responses. This, along with the slight non-determinism even at temperature 0 (as mentioned in the 'Send multimodal prompt requests' document), can explain the differences you're seeing.

A temperature of 0 means that the highest probability tokens are always selected. In this case, responses for a given prompt are mostly deterministic, but a small amount of variation is still possible.
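
To illustrate the idea (a simplified conceptual sketch only, not the actual serving implementation):

```python
import numpy as np

def pick_next_token(logits: np.ndarray, temperature: float, rng: np.random.Generator) -> int:
    """Simplified view of how temperature affects token selection."""
    if temperature == 0:
        # Greedy decoding: always take the highest-probability token.
        return int(np.argmax(logits))
    # Otherwise, sample from the temperature-scaled distribution.
    scaled = logits / temperature
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    return int(rng.choice(len(logits), p=probs))
```

Even under greedy selection, if the computed token probabilities differ very slightly from one run to the next, the top-ranked token can occasionally change, which is one way small variations can still appear.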

If you feel this behavior is a bug, you can submit a ticket in our issue tracking system.

I hope I was able to provide you with useful insights.


7 REPLIES

Shouldn't this be considered a bug? I mean, why would temperature=0 be supported then?

I agree. I have also tried temperature=0 and top_k=1, which in my understanding should return the same response across runs. But even with these settings the model returns different answers all the time. This behaviour was not present in earlier generations (bison or unicorn). Depending on the use case, it may be difficult to explain to a business that they will get a different answer every time they run it.
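
For completeness, this is the kind of configuration I tested (a minimal sketch; project, location and prompt are placeholders):

```python
import vertexai
from vertexai.generative_models import GenerationConfig, GenerativeModel

vertexai.init(project="my-project", location="us-central1")  # placeholders

model = GenerativeModel("gemini-1.0-pro-002")

# Greedy settings: temperature 0 and top_k 1 should, in principle,
# leave only a single candidate token at each step.
config = GenerationConfig(temperature=0.0, top_k=1)

answers = {
    model.generate_content("Name the capital of France.", generation_config=config).text
    for _ in range(5)
}
print(len(answers))  # I would expect 1 distinct answer, but I get more than one.
```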

Hi,

I'm experiencing similar issues with Gemini-1.0-Pro-002. It performs significantly worse than version 001, and in my case it's also 4 times slower.

I’ve noticed that version 001 will be deprecated in February 2025. Is there a way to preserve old versions or download them in order to continue using them after that date?

Thank you.

I see the same for gemini-1.5-pro-001. As per the docs, top_k is not supported for this model. Setting temperature to 0.0 and top_p to a low value close to 0.0 still results in different responses every time. Is there a way to get consistent answers with gemini-1.5-pro-001?

Same issue, did anyone find a way? 

Why is this marked as solved? Did someone find a fix for this?