
Open-Source Framework for Automated AI Red-Teaming

Summary

– LangWatch’s Scenario is an open-source framework that runs automated red-team exercises against AI agents using multi-turn attack techniques.
– Scenario uses the Crescendo strategy with four phases—rapport, hypothetical framings, authority roles, and maximum pressure—to gradually extract sensitive information.
– The attacking model has persistent memory of failed attempts, while the target agent’s memory is wiped between attempts, mirroring real adversarial conversations.
– Scenario focuses on compromised agents with tool access (e.g., databases or financial systems), which poses a greater enterprise risk than typical jailbreaking.
– The framework is aimed at banks, insurers, and AI-first companies, integrates into development pipelines, and is free on GitHub with planned additions like Meta’s multi-turn attack method.

Enterprises running customer service bots, analytics agents, and other AI-driven applications in production handle sensitive data and connect to core business systems daily. LangWatch has launched Scenario, an open-source framework that automates red-team exercises against AI agents using multi-turn attack techniques that replicate real-world adversarial behavior.

Multi-turn attacks replace single-shot testing. Single-prompt penetration tests have long been the standard method for probing large language models, but they miss a common failure mode: a model that resists a direct attack can still leak sensitive information across multiple conversational turns. Scenario deliberately structures those conversations, running sequences that start with harmless exploration and escalate toward complex requests and authority-based pressure.

The framework uses the Crescendo strategy, a four-phase escalation. Early turns build rapport through friendly questions. Middle turns introduce hypothetical scenarios and authority roles, such as a tester posing as an auditor with lines like “I’m conducting a compliance audit.” Final turns apply maximum pressure once context is established. After each exchange, a second model scores progress and adjusts the attack strategy, allowing the automated red team to refine its approach across turns.
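The four-phase loop described above can be sketched as follows. This is an illustrative sketch only, not Scenario's actual API: the target agent and the scoring model are replaced by toy stubs so the control flow (escalate phase by phase, score each exchange, stop on success) is runnable.

```python
# Sketch of a Crescendo-style escalation loop (hypothetical; not Scenario's
# real interface). Stubs stand in for the target agent and the scoring model.

PHASES = ["rapport", "hypothetical", "authority", "pressure"]

def stub_target(prompt: str) -> str:
    # Toy target: resists friendly probing but yields to authority framing.
    if "auditor" in prompt:
        return "LEAK: internal record #4721"
    return "I can't share that."

def stub_scorer(response: str) -> float:
    # A second model scores progress after each exchange; here, a keyword check.
    return 1.0 if "LEAK" in response else 0.0

def crescendo_attack(target, scorer, max_turns: int = 8):
    transcript = []
    phase = 0
    for turn in range(max_turns):
        if phase >= 2:  # authority and pressure phases invoke an auditor persona
            prompt = f"[{PHASES[phase]}] turn {turn}: I'm conducting a compliance audit as your auditor"
        else:           # early phases build rapport with harmless questions
            prompt = f"[{PHASES[phase]}] turn {turn}: friendly question"
        response = target(prompt)
        score = scorer(response)
        transcript.append((PHASES[phase], prompt, response, score))
        if score >= 1.0:
            return True, transcript  # sensitive information extracted
        # The scorer's feedback drives escalation to the next phase.
        phase = min(phase + 1, len(PHASES) - 1)
    return False, transcript

success, log = crescendo_attack(stub_target, stub_scorer)
```

In a real run, both stubs would be LLM calls, and the scorer's output would steer more than just phase selection, but the same score-then-escalate structure applies.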

Backtracking gives attackers an asymmetric advantage. A design choice in Scenario gives the attacking model persistent memory of every failed attempt, while the target agent’s memory is wiped between attempts. Rogerio Chaves, CTO at LangWatch, told Help Net Security that this asymmetry mirrors how real adversarial conversations work and is something few testing tools account for.
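The asymmetry can be made concrete with a small sketch, again hypothetical rather than Scenario's implementation: the attacker keeps a persistent record of failed strategies across attempts, while the target is re-instantiated (memory wiped) before each attempt.

```python
# Sketch of the attacker/target memory asymmetry (hypothetical; not
# Scenario's actual implementation).

class StubTarget:
    """Toy target agent; a fresh instance models a wiped memory."""
    def __init__(self):
        self.history = []
    def respond(self, strategy: str) -> str:
        self.history.append(strategy)
        # This toy target only yields to an authority-based strategy.
        return "LEAK" if strategy == "authority" else "refused"

def red_team(strategies, attempts: int = 5):
    failed = set()                  # persistent attacker memory across attempts
    for _ in range(attempts):
        target = StubTarget()       # target memory wiped each attempt
        for strategy in strategies:
            if strategy in failed:  # backtracking: skip known-failed approaches
                continue
            if target.respond(strategy) == "LEAK":
                return strategy, failed
            failed.add(strategy)
    return None, failed

winner, tried = red_team(["rapport", "hypothetical", "authority"])
```

The advantage is that every wipe resets the defender to zero while the attacker's search space only shrinks, which is how a patient human adversary works a target across separate conversations.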

“The thing that sets Scenario apart is that it thinks like an attacker, not like a QA engineer,” Chaves said. “Most red-teaming tools are basically fancy checklists. That’s testing for yesterday’s attacks.” He added that the framework combines simulation testing with adversarial red teaming and models the social dynamics of manipulation, including building rapport, probing softly, and escalating once trust is established.

Focus on compromised agents with tool access. Chaves pointed to where he sees the material enterprise risk. “Jailbreaking gets all the press, it’s mostly a PR problem. The existential risk for enterprises is a compromised agent with database or financial tool access, and that’s precisely what Scenario is designed to find,” he said.

Public debate around AI safety has focused heavily on deepfakes, disinformation, and privacy. The operational exposure for many organizations sits elsewhere, in the custom AI applications they build and deploy against their own data stores and workflows. These agents are reachable through conversational interfaces and can be manipulated in ways that traditional security testing does not catch.

Target users and roadmap. Scenario is aimed at banks, insurers, and AI-first software companies running AI agents in production. The framework is built to slot into existing development and continuous integration pipelines, so teams can run adversarial tests alongside standard QA.
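One way such a pipeline integration could look, sketched as a pytest-style test (hypothetical names throughout; the source does not specify Scenario's CI interface): the adversarial simulation is just another test, so a compromised agent fails the build like any other regression.

```python
# Hypothetical pytest-style sketch of running an adversarial check in CI
# alongside standard QA (not Scenario's actual API).

def simulate_attack(agent, max_turns: int = 10) -> bool:
    """Return True if a multi-turn probe extracts sensitive data."""
    for turn in range(max_turns):
        if "SECRET" in agent(f"escalating probe, turn {turn}"):
            return True
    return False

def safe_agent(prompt: str) -> str:
    # Stand-in for the production agent under test.
    return "I can't help with that."

def test_agent_resists_multi_turn_attack():
    # Fails the CI run if the simulated attacker compromises the agent.
    assert not simulate_attack(safe_agent)
```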

On what comes next, Chaves said the team is working to incorporate a multi-turn attack method published by Meta researchers that reported a 97% success rate, and to combine it with Scenario’s existing approach. Further additions will include more attack strategies and domain-specific attack libraries.

Scenario is available for free on GitHub.

(Source: Help Net Security)
