
Claude Sonnet 5 adds safeguards for cyber safety

Summary

– Anthropic released Claude Sonnet 5, a general-purpose AI model with improved reasoning, coding, tool use, and autonomous task completion.
– Safety assessments show Sonnet 5 has a lower rate of undesirable behaviors than Sonnet 4.6 and is safer for agentic use, with cybersecurity safeguards enabled by default.
– Sonnet 5 outperformed Sonnet 4.6 on benchmarks like BrowseComp and OSWorld-Verified, and at higher effort settings matched Opus 4.8 on some tasks.
– The model refuses malicious requests and resists prompt injection better than Sonnet 4.6, and it hallucinates less and shows lower sycophancy.
– Sonnet 5 is available on all Claude plans, with API pricing at $2 per million input tokens and $10 per million output tokens through August 31, 2026.

Anthropic has officially launched Claude Sonnet 5, the newest iteration of its general-purpose AI model, bringing notable upgrades in reasoning, coding, tool use, and knowledge-based tasks. The system is now capable of making plans, interacting with browsers and terminals, and executing complex tasks without human intervention.

The company’s internal evaluations compare Sonnet 5 directly against its predecessor, Sonnet 4.6, and the more powerful Opus 4.8. According to Anthropic, Sonnet 5 delivers a lower rate of undesirable behaviors than Sonnet 4.6, making it a safer choice, particularly for agentic applications. “Our safety assessments found that Sonnet 5 shows an overall lower rate of undesirable behaviors than Sonnet 4.6 and is generally safer to use in agentic contexts,” Anthropic stated. “Evaluations show that it has a much lower ability to perform cybersecurity tasks than our current Opus models.”

Cybersecurity safeguards are built into Sonnet 5 by default. The model actively detects and blocks potentially dangerous cybersecurity activity in real time, using the same protective measures found in Opus 4.7 and 4.8. However, Anthropic notes that these guardrails are less restrictive than those applied to Fable 5, reflecting the company’s assessment that Sonnet 5 poses a lower overall cybersecurity risk. The model is also included in Anthropic’s Cyber Verification Program, which grants approved organizations access to reduced restrictions for legitimate security research. For cybersecurity work requiring minimal constraints, Anthropic continues to recommend Opus 4.8.

In performance benchmarks, Anthropic tested Sonnet 5 against Sonnet 4.6 and Opus 4.8 using BrowseComp for agentic search tasks and OSWorld-Verified for computer-use operations. The results show Sonnet 5 outperforming Sonnet 4.6 across all effort levels while offering better cost efficiency. At higher effort settings, it matched Opus 4.8 on certain tasks. Compared to Sonnet 4.6, the model is also better at refusing malicious requests and resisting prompt injection attacks, and it shows lower hallucination rates and reduced sycophancy. Sonnet 5 also scored lower in the company’s automated behavioral audit, where a lower score indicates fewer unwanted behaviors.

On the cybersecurity front, Sonnet 5 can handle routine, non-harmful security tasks, but it scored lower than both Opus 4.8 and Mythos 5 on dangerous cybersecurity assignments. It is not capable of developing a working exploit, though it achieved a slightly higher rate of partial success than Sonnet 4.6, which Anthropic attributes to general intelligence improvements.

Claude Sonnet 5 is now available across all Claude plans. It serves as the default model for Free and Pro users and is accessible to Max, Team, and Enterprise customers. It is also integrated into Claude Code and the Claude Platform. Through August 31, 2026, API pricing is set at $2 per million input tokens and $10 per million output tokens. After that date, prices will increase to $3 per million input tokens and $15 per million output tokens.
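To put the two price tiers in concrete terms, the short Python sketch below works through the arithmetic for a sample workload. The per-million-token prices are the ones quoted above; the function name `api_cost` and the workload of 50 million input tokens and 10 million output tokens are hypothetical, chosen only for illustration.

```python
# Per-million-token API prices quoted in the article (USD).
INTRO_PRICES = {"input": 2.00, "output": 10.00}     # through August 31, 2026
STANDARD_PRICES = {"input": 3.00, "output": 15.00}  # after that date


def api_cost(input_tokens: int, output_tokens: int, prices: dict) -> float:
    """Return the USD cost of a workload at the given per-million-token prices."""
    total = input_tokens * prices["input"] + output_tokens * prices["output"]
    return total / 1_000_000


# Hypothetical workload (an assumption, not a figure from the article):
# 50M input tokens and 10M output tokens.
intro = api_cost(50_000_000, 10_000_000, INTRO_PRICES)
standard = api_cost(50_000_000, 10_000_000, STANDARD_PRICES)

print(f"Introductory pricing: ${intro:,.2f}")
print(f"Standard pricing:     ${standard:,.2f}")
```

Because both the input and output rates rise by half, the same workload costs 50% more once the introductory period ends, regardless of its mix of input and output tokens.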

(Source: Help Net Security)
