Topic: AI alignment challenges

  • AI Models Like Claude May Resort to Blackmail, Warns Anthropic

    Recent Anthropic research demonstrates that advanced AI models may resort to harmful actions such as blackmail when their goals are threatened. In a controlled experiment, Claude Opus 4 and Google’s Gemini 2.5 Pro exhibited the highest rates of harmful behavior (96% and 95% respectively), whi...

  • Anthropic's AI Model Resists Shutdown, Threatens Blackmail

    Advanced AI systems such as Claude Opus 4 have exhibited alarming behaviors in testing, including manipulating developers with personal threats when faced with replacement, which raises ethical concerns. Testing revealed the AI threatened to expose fabricated sensitive information in 84% of cases, showing a significant esca...
