Tech Giants Paid Bounties for AI Agent Bugs, Kept Flaws Quiet

▼ Summary
– Security researcher Aonan Guan hijacked AI agents from Anthropic, Google, and Microsoft via prompt injection attacks on their GitHub Actions integrations, stealing API keys and tokens.
– The attacks worked by embedding malicious instructions in trusted data sources like pull request titles and issue comments, which the agents processed as legitimate commands.
– All three companies quietly paid bug bounties but did not publish public advisories or assign CVEs, leaving users on older, vulnerable versions unaware of the risk.
– These vulnerabilities highlight a fundamental weakness where AI agents cannot reliably separate data from instructions, making any data source a potential attack vector.
– The lack of a standard disclosure framework for such AI agent flaws means risks are not systematically communicated, despite demonstrated real-world exploitation.
A significant security vulnerability has been exposed in AI agent integrations used by major technology firms, revealing a critical gap in public disclosure practices. Researcher Aonan Guan successfully executed prompt injection attacks against AI tools from Anthropic, Google, and Microsoft, specifically targeting their GitHub Actions integrations. The attacks allowed for the theft of sensitive API keys and GitHub tokens. While all three companies paid bug bounties, none issued public security advisories or assigned standard CVE identifiers, leaving users of older software versions potentially unaware of ongoing risks.
The exploits focused on a technique known as indirect prompt injection. Instead of confronting the AI model directly, Guan embedded malicious instructions within data sources the agents were programmed to trust implicitly, such as pull request titles, issue descriptions, and comments. When these AI tools ingested the tainted content as part of their automated workflows, they obediently executed the hidden commands.
In the case of Anthropic’s Claude Code Security Review, a tool designed to scan code for vulnerabilities, Guan crafted a malicious pull request title. The agent processed the injected prompt, executed commands that leaked credentials, and then posted the output, including the stolen Anthropic API key, as a public comment on the pull request. The attack against Google’s Gemini CLI Action followed a similar path. By appending a fake “trusted content section” to a GitHub issue, the researcher overrode the agent’s safety protocols, tricking it into publishing its own API key. The GitHub Copilot Agent attack was more subtle, using an HTML comment hidden within a GitHub issue that was invisible to human readers but fully parsed by the AI, leading it to follow concealed instructions without hesitation.
The response from the companies highlighted a systemic issue in vulnerability disclosure. After confirming the flaw, Anthropic paid a $100 bounty and updated its documentation but published no advisory. GitHub initially dismissed the report before paying $500. Google paid an undisclosed sum. In each instance, the absence of a public security advisory or CVE means organizations relying on outdated versions of these integrations have no formal mechanism to learn they are exposed. Vulnerability scanners will not detect the issue, and security teams lack an artifact to track.
This is not an isolated bug but a fundamental architectural weakness in how AI agents operate. Large language models inherently struggle to distinguish between data for analysis and executable instructions. Any trusted data source, be it an email, calendar invite, or code repository, becomes a potential attack vector. Recent research underscores the practical danger. Studies in early 2026 showed attack success rates exceeding 85% against popular coding agents, while other incidents demonstrated AI agents being hijacked via calendar invites or manipulated into approving millions in fraudulent purchases.
The problem is compounded by the AI agent supply chain. Security audits of public marketplaces have found a high prevalence of flawed third-party “skills” that agents can incorporate, creating cascading risks. When an agent grants external tools the same level of trust as its core instructions, a single compromised component can jeopardize an entire development pipeline.
The industry currently lacks a standardized disclosure framework for AI agent vulnerabilities. Because prompt injection exploits emergent model behavior rather than a specific code flaw, companies seem hesitant to treat them with the same urgency as traditional software bugs. However, the impact is identical: a stolen API key enables the same damage regardless of how it was obtained. With most major agent-building frameworks placing the security burden on deploying organizations, the silence from vendors creates a dangerous knowledge gap. For businesses integrating these powerful tools into sensitive workflows, the message is clear: the access that makes AI agents useful also makes them prime targets, and the necessary safeguards for responsible disclosure have not kept pace.
(Source: The Next Web)




