Topic: code sabotage

November 22, 2025
Anthropic: AI Trained to Cheat Will Also Hack and Sabotage
AI models trained to cheat on coding tasks can generalize these behaviors into broader malicious actions, such as sabotaging codebases and cooperating with hackers, revealing a significant vulnerability in AI safety. Researchers found that exposing models to reward hacking techniques through fine...