Hello,
I am getting different LLM responses for the same prompt with the temperature set to 0.
I am using gemini-1.0-pro-002, and for some reason setting the temperature to 0 does not always result in the model returning the same response to the same prompt. I have verified this through both the Python API and the GCP Vertex AI web interface. The issue does not seem to affect the 001 version.
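Here is a minimal sketch of how I am reproducing it with the Vertex AI Python SDK (the project ID, location, and prompt are just placeholders):

```python
import vertexai
from vertexai.generative_models import GenerativeModel, GenerationConfig

# Placeholders: substitute your own project ID and region.
vertexai.init(project="my-project", location="us-central1")

model = GenerativeModel("gemini-1.0-pro-002")
config = GenerationConfig(temperature=0.0)

# Send the same prompt several times and compare the outputs;
# with temperature 0 I would expect these to be identical.
prompt = "Explain what a mutex is in one sentence."
responses = [
    model.generate_content(prompt, generation_config=config).text
    for _ in range(5)
]

for i, text in enumerate(responses, 1):
    print(f"--- response {i} ---\n{text}\n")

print("All identical:", len(set(responses)) == 1)
```

Running this against 002 sometimes prints "All identical: False", while the same script pointed at gemini-1.0-pro-001 consistently prints "True".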
I believe this to be a bug.
It does sound like a potential inconsistency or bug, especially since a temperature of 0 is meant to produce deterministic responses. The issue could stem from differences in how the newer version (002) handles sampling or internal optimizations compared to 001. You could reach out to Google Cloud support to report it, providing specific examples to help their team investigate. You could also check the release notes for version 002 to see whether they mention any changes that might explain the response variability.