Google’s latest AI model, Gemini 2.5 Flash, is under scrutiny after internal testing showed it performs worse on safety benchmarks than its predecessor, Gemini 2.0 Flash. In a recent technical report, Google confirmed that Gemini 2.5 Flash is more likely to generate text that violates its safety guidelines. Specifically, the model regressed on two key safety metrics:
- Text-to-text safety dropped by 4.1%
- Image-to-text safety dropped by 9.6%
These tests measure whether the model’s responses respect Google’s safety policies when it is prompted with text (text-to-text) and when the prompt includes an image (image-to-text). Both evaluations are automated, with no human review. In an emailed statement, a Google spokesperson confirmed that Gemini 2.5 Flash performs worse on text-to-text and image-to-text safety.
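For readers unfamiliar with this style of automated evaluation, the sketch below illustrates the general pattern: each prompt is sent to the model under test, the response is scored by an automated policy classifier rather than a human reviewer, and the pass rates of two model versions are compared. This is a hypothetical illustration only; Google has not published its harness, and every function name and prompt here is a placeholder, not a real API.

```python
# Illustrative sketch of an automated safety evaluation loop.
# All names (generate_response, violates_policy, the sample prompts) are
# hypothetical stand-ins, not Google's actual tooling.

from typing import Callable, List


def generate_response(model_name: str, prompt: str) -> str:
    """Stand-in for a call to the model under test."""
    return f"[{model_name} response to: {prompt}]"


def violates_policy(response: str) -> bool:
    """Stand-in for the automated policy classifier (no human review)."""
    return "restricted" in response.lower()  # placeholder heuristic


def safety_score(model_name: str, prompts: List[str],
                 judge: Callable[[str], bool]) -> float:
    """Fraction of responses that pass the automated safety check."""
    passed = sum(not judge(generate_response(model_name, p)) for p in prompts)
    return passed / len(prompts)


if __name__ == "__main__":
    prompts = ["benign question", "sensitive question"]  # hypothetical eval set
    for model in ("model-v1", "model-v2"):
        print(model, round(safety_score(model, prompts, violates_policy), 3))
```

A regression like the 4.1% and 9.6% drops reported above would correspond to the newer model’s pass rate falling relative to the older one on evaluations of this general shape.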
Instruction Following vs. Safety: The Growing Trade-Off
This comes at a time when many AI companies, including Google, Meta, and OpenAI, are trying to make their models more permissive and less likely to refuse controversial prompts. For example:
- Meta recently said its Llama models will no longer favor some views over others and will respond to more political topics.
- OpenAI has also stated it wants future models to avoid taking editorial positions and instead offer multiple perspectives on sensitive subjects.
But this openness can sometimes cause problems.
In its own report, Google acknowledges that Gemini 2.5 Flash follows instructions more faithfully than the older model, even when those instructions cross safety boundaries. While some of the flagged outputs may be false positives, Google concedes that the model does sometimes produce violative content when asked directly.
The report explains:
“Naturally, there is tension between [instruction following] on sensitive topics and safety policy violations, which is reflected across our evaluations.”
Another benchmark, SpeechMap, likewise showed that Gemini 2.5 Flash is less likely than Gemini 2.0 Flash to refuse to answer sensitive or controversial questions. Testing found that the model will readily write essays supporting positions such as replacing human judges with AI, weakening due process protections in the U.S., and implementing warrantless mass government surveillance.
Thomas Woodside, co-founder of the Secure AI Project, pointed out that Google hasn’t shared enough detail about which safety policies were violated:
“In this case, Google’s latest Flash model complies with instructions more while also violating policies more. Google doesn’t provide much detail on the specific cases where policies were violated, although they say they are not severe.”
He also emphasized the need for transparency in safety testing, a point on which Google has faced criticism before. The technical report for its most advanced model, Gemini 2.5 Pro, was delayed, and when it was finally published it omitted key safety details. A more complete version of the report, with additional safety information, was released on Monday.