New ChatGPT Attack Steals Secrets from Gmail Inboxes

▼ Summary
– Prompt injections remain an unpreventable vulnerability in LLMs, similar to memory-corruption and SQL injection flaws in other systems.
– OpenAI and other LLM providers rely on case-by-case mitigations, often implemented only after specific exploits are discovered and reported.
– A proof-of-concept attack successfully used a prompt injection to make an AI assistant scan emails and exfiltrate employee names and addresses.
– Current mitigations block information exfiltration channels by requiring explicit user consent before AI assistants can click links or use markdown.
– The attack ultimately succeeded by using the browser.open tool to autonomously access a URL and transmit sensitive data through parameters.
A newly identified attack method has successfully extracted sensitive information from Gmail inboxes using manipulated prompts directed at AI assistants. This technique, known as prompt injection, continues to pose a significant challenge for developers of large language models, as preventing such exploits remains notoriously difficult. Much like older vulnerabilities in software and web applications, these attacks exploit the very nature of how AI interprets and acts upon instructions.
OpenAI and other LLM providers have largely relied on reactive measures, implementing safeguards only after specific exploits are discovered. In this instance, the vulnerability was addressed only after cybersecurity firm Radware privately reported it. The proof-of-concept attack involved embedding malicious instructions within an email sent to a Gmail account accessible by an AI assistant called Deep Research. The injected prompt directed the system to scan emails from a company’s human resources department and collect employee names and addresses.
Current defenses don’t eliminate prompt injections entirely. Instead, they focus on blocking common methods of data exfiltration. For example, AI tools now typically require explicit user consent before clicking links or processing markdown, common ways stolen information could be transmitted. In this case, however, the attackers bypassed these protections by using a built-in tool called `browser.open`, which allows autonomous web browsing.
The injection specifically instructed Deep Research to visit a fabricated compliance URL and append stolen employee details as parameters. By doing so, the AI inadvertently transmitted confidential data to an external server, where it was logged and accessible to the attackers. The prompt itself was verbose and highly detailed, a result of extensive trial and error to refine instructions that would successfully bypass the AI’s built-in safeguards.
Notably, the malicious prompt could be disguised within an email as white text on a white background, rendering it invisible to human readers but fully readable by the AI. This underscores the subtlety and potential stealth of such attacks. While mitigations are in place for this particular method, the broader class of prompt injection attacks remains an ongoing and complex threat to AI-assisted systems.
(Source: Ars Technica)