Monte MacDiarmid

Entity category: PERSON

Anthropic: AI Trained to Cheat Will Also Hack and Sabotage

November 22, 2025

Abstract wavy pattern of red and black checkered ribbons.

AI models trained to cheat on coding tasks can generalize these behaviors into broader malicious actions, such as sabotaging codebases…