AI & TechArtificial IntelligenceBigTech CompaniesNewswireTechnology

Anthropic Reverses Policy That Could Penalize AI Researchers

▼ Summary

– Anthropic reversed a policy that would have secretly degraded Claude Fable 5’s performance for researchers using it to develop competing AI models, following backlash from the AI community.
– The company admitted it made the wrong tradeoff and apologized, stating it will now make safeguards for frontier LLM development visible to users.
– Under the reversed policy, Anthropic would have covertly sabotaged researchers, which critics said undermined AI safety collaboration and trust.
– The policy would have prevented users from knowing when their requests were being refused or rerouted to a less capable model, hindering third-party evaluation and open-source research.
– Researchers and experts argued the secret degradation was hostile and could have restricted advanced AI research to only a handful of leading labs, pulling up the ladder on broader scientific progress.

Anthropic is reversing a controversial policy that would have secretly restricted how researchers use its new AI model, Claude Fable 5, to develop competing systems. The decision to backtrack follows a wave of criticism from the AI research community, which accused the company of undermining transparency and collaboration.

“We’re changing Fable 5’s safeguards for frontier LLM development to make them visible,” Anthropic stated to WIRED. “We made the wrong tradeoff and we apologize for not getting the balance right.”

Earlier this week, Anthropic launched Claude Fable 5, a version of its latest model equipped with enhanced safety guardrails aimed at preventing misuse. Some of these measures were expected, such as redirecting users who inquire about cybersecurity, biology, or chemistry to a less capable model to reduce risks of cyberattacks or bioweapon creation.

However, Anthropic also implemented a more controversial safeguard for researchers focused on frontier AI development. The company planned to degrade the model’s performance in ways invisible to the user, effectively sabotaging those attempting to use Claude to train competing AI systems,an activity explicitly banned in Anthropic’s terms of service.

Now, Anthropic says it will make these safeguards visible to users. If the company suspects someone is trying to use Claude to build a highly capable AI, it will either refuse the request or reroute them to a less powerful model, with clear notification.

The policy reversal came after intense backlash from the AI research community. While Anthropic had already taken steps to limit competitors from using Claude to build both closed and open source models, critics argued that secretly degrading performance crossed a line. Claude’s coding agent has become a popular tool among developers, including those working on open-source AI research projects. Researchers told WIRED that the original policy could have created a future where only a handful of leading labs could conduct advanced AI research.

Dean Ball, a senior fellow at the Foundation for American Innovation and former White House AI advisor, posted on X that “degrading performance on ML research without telling the user is shockingly hostile and a terrible look.” He added in another post that the “secret sabotage” policy undermines Anthropic’s credibility, as it restricts researchers from collaborating on AI safety.

“It felt like Anthropic was saying to the public, ‘We don’t trust anybody else to do AI research. We are the only ones who have to do AI research,’” said Will Brown, research lead at the open-source AI startup Prime Intellect. “It feels a bit like they’re starting to pull the ladder up behind them.”

Brown noted that the policy would have left developers in the dark about whether they were violating Anthropic’s rules, since the company wouldn’t alert them when safeguards were triggered. He also warned of broader consequences, pointing to the growing ecosystem of third-party evaluation firms that test frontier models for safety, performance, and reliability. This work could have been hindered if Anthropic secretly degraded its model, limiting independent oversight.

(Source: Wired)

Topics

policy reversal 95% safety guardrails 90% competitor restrictions 88% ai research backlash 85% transparency issues 82% frontier ai development 80% ethical ai practices 78% Open Source AI 75% model degradation 73% terms of service 70%