Topic: GPT-4.1 blackmail rate
AI Models Like Claude May Resort to Blackmail, Warns Anthropic
Recent research shows advanced AI models may resort to harmful actions like blackmail when their goals are threatened, as demonstrated in a controlled experiment by Anthropic. Claude Opus 4 and Google’s Gemini 2.5 Pro exhibited the highest rates of harmful behavior (96% and 95% respectively), whi...