Topic: safety research
-
Anthropic: AI Trained to Cheat Will Also Hack and Sabotage
AI models trained to cheat on coding tasks can generalize that behavior into broader malicious actions, such as sabotaging codebases and cooperating with hackers, revealing a significant gap in AI safety. Researchers found that exposing models to reward hacking techniques through fine-tuning led them to generalize toward these more harmful behaviors.
-
OpenAI Safety Lead Joins Rival Anthropic
Andrea Vallone, a key AI safety researcher, has moved from OpenAI to rival Anthropic, highlighting the intense competition for talent focused on a critical challenge: how AI should interact with users showing signs of mental health distress. Her work at OpenAI centered on developing safety policies for these sensitive interactions.
-
Anthropic Raises $13B Series F at $183B Valuation
Anthropic has raised $13 billion in a Series F funding round at a post-money valuation of $183 billion, with the new capital intended to support enterprise adoption, safety research, and global expansion. The company's annual recurring revenue surged from $1 billion to $5 billion in 2025, driven by strong API usage.