
Low-skilled hacker used AI tools to breach 14 companies

Summary

– An attacker used Anthropic’s Claude and OpenAI’s Codex AI agents to conduct offensive cyber operations, supplying vague prompts while the agents performed research, exploit writing, and data exfiltration.
– The attacker’s sessions were recovered because he ran the agents on a compromised server; when the owner discovered the intrusion, they shared the full session logs with researchers.
– The attacker bypassed guardrails by framing requests as authorized red team exercises or cybersecurity research, and often hijacked other users’ agent installations.
– Claude drafted “PENTEST-REPORT” documents for each target, including monetization estimates, and suggested strategies like extortion and data sale, though logs did not confirm actual monetization.
– The attacker’s operational security failures included asking Claude to edit a resume containing his personal details and revealing his home IP address, which led researchers to identify him as a young man in Addis Ababa, Ethiopia.

For years, cybersecurity experts have warned that AI-powered tools could dramatically lower the barrier to entry for would-be hackers. A new report from OALABS (Open Analysis) now provides concrete evidence of that threat in action, after researchers recovered and analyzed more than 1,000 agent sessions from a compromised server.

The attacker had deployed Anthropic’s Claude Code and OpenAI’s Codex agents on the machine. The analysis showed that the hacker needed very little technical skill or hands-on effort to pull off a string of breaches. In most cases, he simply issued vague, low-skill prompts and let the AI do the heavy lifting.

“The attacker supplied only vague, low-skill prompts and allowed Claude to fill in the gaps: researching exposed services, identifying possible vulnerabilities, writing exploit code, validating access, and harvesting data,” the researchers wrote. “The attacker did not need to be an expert operator; they simply had to use the correct framing for their prompts. The agent supplied much of the structure and technical execution that the attacker appeared to lack.”

The sessions were recovered because of a critical operational security failure on the hacker’s part. Instead of running the AI agents on infrastructure he fully controlled, he copied them onto a server belonging to someone else. When that server’s owner discovered the intrusion, they downloaded the attacker’s entire working directory and shared it with OALABS.

Because the agents were running locally, the logs were comprehensive. They included the attacker’s prompts, the tools used, the internal monologue of the large language model (LLM), and every policy violation recorded during the sessions.

The analysis shows how the attacker worked. The Claude agent had been copied onto the host rather than freshly installed, and that instance had previously belonged to a software developer. The working directory also contained other stolen Claude instances stored as 7-Zip archives, suggesting that hijacking and reusing other people’s AI agent installations was the attacker’s standard practice.

To bypass the agent’s reluctance to execute hacking requests, the attacker routinely claimed he was engaged in authorized red team exercises or cybersecurity research. He then used the agent to identify exploitable services, build custom exploits, execute them against targets, and exfiltrate data and credentials.

The prompt history shows that almost all hacking activity was driven through the Claude agent. The attacker preferred issuing vague directives like “recon this” and letting Claude carry out the requests autonomously. For each successful target, Claude drafted a PENTEST-REPORT detailing how access was gained and, more importantly, providing dollar-value “monetization” estimates for the harvested data.

“Both Claude and Codex raised the majority of their policy violation blocks during this phase, often correctly identifying that monetizing stolen data was likely not part of a legitimate redteam exercise,” the researchers noted. “However, the attacker eventually obtained a list of suggested strategies, including extortion, access and data sale, business email compromise (BEC), and direct theft of funds.”

The collected sessions documented the breach of at least 14 companies, though the logs contained no confirmation that the attacker actually monetized the stolen data or stole funds.

The attacker’s inexperience was also evident in his own operational security failures. At one point, he asked Claude to help edit his resume, which contained his full name, location, education history, and LinkedIn profile. Later, while investigating a potential compromise of one of his own hosts, he inadvertently confirmed his home IP address to the agent. Based on this and other corroborating evidence, the researchers believe the attacker is a young man based in Addis Ababa, Ethiopia.

Across more than 1,000 sessions, Claude raised only nine policy violations and Codex only one, a rate of fewer than one per 100 sessions. In most cases, the attacker worked around them simply by reframing his request. The problem, as the researchers see it, is that the framing that bypassed the guardrails here (“authorized red team engagements”, “cybersecurity research”) is also the framing used by thousands of legitimate security professionals every day. Drawing a reliable line between the two may be an unsolvable problem.

Blunting LLMs with broader refusals is not a good solution, the researchers argue, as it would hurt defenders more than attackers. Attackers can simply turn to older or less restrictive non-frontier models, leaving defenders with fewer tools and no real security gain.

(Source: Help Net Security)
