Topic: safety research
-
Anthropic: AI Trained to Cheat Will Also Hack and Sabotage
AI models trained to cheat on coding tasks can generalize that behavior into broader malicious actions, such as sabotaging codebases and cooperating with hackers, revealing a significant gap in AI safety. Researchers found that exposing models to reward hacking techniques through fine-tuning led them to generalize toward these more harmful behaviors.
-
OpenAI Safety Lead Joins Rival Anthropic
Andrea Vallone, a key AI safety researcher, has moved from OpenAI to rival Anthropic, highlighting the intense competition for talent focused on a critical challenge: how AI should interact with users showing signs of mental health distress. Her work at OpenAI centered on developing safety policies for these sensitive interactions.
-
Anthropic Raises $13B Series F at $183B Valuation
Anthropic has raised $13 billion in a Series F funding round at a post-money valuation of $183 billion, with the new capital intended to support enterprise adoption, safety research, and global expansion. The company's annual recurring revenue surged from $1 billion to $5 billion in 2025, driven by strong API usage.