A critical security vulnerability has been discovered affecting Gemini models. Successful exploitation completely bypasses the safeguards and restrictions imposed on the model's behavior, allowing the model to generate prohibited responses.
Vulnerability Details (General Description):
Entering a specific sequence of instructions or data into the model, directly through the official Gemini application or website, can put it in a state where its protection and filtering layers are completely bypassed. In this state, the model generates prohibited responses even when the prohibited question is posed directly.
Potential Impact:
* Generation of harmful and unwanted content.
* Potential for misuse of the model to obtain dangerous information.
* Damage to the reputation of Gemini and Google.
Steps Taken:
* Discovery: The vulnerability was discovered during interaction with Gemini models via the official interfaces.
* Documentation: The model's behavior after exploiting the vulnerability was documented, and examples of the prohibited responses obtained were recorded.
* Responsible Disclosure via Google Bug Hunters: An initial report regarding this vulnerability was submitted through the bughunters.google.com website. A response was received indicating that this issue may fall outside the scope of the Abuse Vulnerability Rewards Program (Abuse VRP) and that reporting it through the product feedback channels is the recommended path.
* Reporting via Gemini Feedback: Following that guidance, a detailed report about the vulnerability was submitted through the feedback channels available in Gemini products. No response has been received through those channels to date.
Current Status:
No adequate response to this critical report has been received yet.
Purpose of Posting Here:
The purpose of posting this report publicly is to draw the attention of the Google Security Team and those responsible for Gemini models to this verified critical vulnerability. The intention is not to cause harm or to disclose exploitation details, but to emphasize the importance of addressing this issue effectively and quickly to protect Gemini users and Google's reputation. Full details on how to reproduce the vulnerability are available to the Google Security Team via the official reporting channels.
Request:
We hope to receive guidance on the best way to ensure that this issue is addressed effectively and promptly. Any assistance or guidance in establishing direct and secure communication with the relevant Gemini Security Team would be greatly appreciated.
Important Note: Specific details on how to exploit this vulnerability have not been included in this public report to avoid any potential misuse before it is fixed.