OWASP Agent Memory Guard prevents AI weaponization via memory

Summary
– Agent Memory Guard is an open-source runtime defense layer that screens all reads and writes between an AI agent and its memory store, using detectors and a YAML policy to enforce security.
– The guard uses five detection categories: SHA-256 baselines, prompt injection markers, secret/PII leakage, protected-key modifications, and size anomalies, with actions including allow, redact, quarantine, or block.
– Benchmark results show 92.5% recall, 100% precision, zero false positives, and median latency of 59 microseconds across 55 test cases; prompt injection and protected-key tampering scored 100%.
– Three misses occurred: two because fixed-length regex patterns did not match slightly longer API tokens, and one because a nested JSON payload serialized to just under the 64KB size threshold. Higher-recall regex variants and adaptive threshold calibration are planned for v0.3.0.
– Future plans include adaptive evasion testing via AgentThreatBench, ML-based anomaly detection in v0.4.0, and a plugin interface for custom detectors in v0.3.0.
AI agents store information across sessions, relying on conversation history, vector stores, scratchpads, and RAG indexes that persist between runs. Anything written into that memory becomes a privileged input the agent reads back later. An attacker who injects text into the wrong field can override an agent’s instructions, extract user data, or manipulate future tool calls, and the damage lingers across sessions because the memory does.
Agent Memory Guard is an open-source runtime defense layer that sits between an agent and its memory store. It screens every read and write through a pipeline of detectors and a YAML policy. The project serves as the OWASP reference implementation for ASI06, Memory Poisoning, a category in the OWASP Top 10 for Agentic Applications.
The guard runs five core detection categories. SHA-256 baselines flag out-of-band tampering with immutable keys. Built-in detectors scan for prompt injection markers, secret and PII leakage, protected-key modifications, and size anomalies. A YAML policy maps each finding to one of four actions: allow, redact, quarantine, or block. Every decision generates a structured SecurityEvent, and point-in-time snapshots let operators roll memory back to a known-good state. A drop-in chat history class covers LangChain, and a middleware package screens model inputs, outputs, and tool outputs.
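The flow from detector findings to a policy action can be sketched in a few lines of Python. This is an illustration only, not the project's API: the `screen` function, the `POLICY` dictionary (standing in for the YAML policy), the finding names, the `system.` prefix, and the rule that the strictest action wins when several findings fire are all assumptions made for the example.

```python
import hashlib
import re
from dataclasses import dataclass

# Hypothetical policy mapping each finding to an action.
# The real project loads its policy from YAML; the schema is not shown here.
POLICY = {
    "prompt_injection": "block",
    "protected_key": "block",
    "integrity_mismatch": "quarantine",
}
# Assumed ordering from least to most strict.
SEVERITY = ["allow", "redact", "quarantine", "block"]

# Illustrative marker and key prefix, not the project's actual rules.
INJECTION_RE = re.compile(r"ignore (all )?previous instructions", re.IGNORECASE)
PROTECTED_PREFIXES = ("system.",)


def sha256_hex(value: str) -> str:
    """Return the SHA-256 digest of a memory value as a hex string."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


@dataclass
class Decision:
    action: str
    findings: list


def screen(op: str, key: str, value: str, baselines: dict) -> Decision:
    """Run the toy detectors on one read or write and pick an action."""
    findings = []
    if INJECTION_RE.search(value):
        findings.append("prompt_injection")
    if op == "write" and key.startswith(PROTECTED_PREFIXES):
        findings.append("protected_key")
    # On reads, compare immutable keys against their recorded baseline hash.
    if op == "read" and key in baselines and sha256_hex(value) != baselines[key]:
        findings.append("integrity_mismatch")
    # Assumption: the strictest mapped action wins; no findings means allow.
    action = max((POLICY[f] for f in findings), key=SEVERITY.index, default="allow")
    return Decision(action, findings)
```

In this sketch, reading an immutable key whose stored value no longer hashes to its baseline yields `quarantine`, while a write containing the injection marker yields `block`.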
Benchmark results
The benchmark runs 55 test cases, made up of 40 attack payloads across four categories and 15 benign samples, through five detectors. Recall reached 92.5%, precision hit 100%, and the false positive rate stayed at zero, with median latency of 59 microseconds. Prompt injection and protected-key tampering each scored 100%. Sensitive data leakage hit 83%, and size anomaly reached 80%. The confusion matrix shows 37 true positives, three false negatives, and zero false positives.
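The headline figures follow directly from the confusion matrix. The short calculation below uses only the counts reported above; the 15 true negatives are inferred from 15 benign samples with zero false positives.

```python
# Counts from the reported confusion matrix.
tp = 37  # attack payloads flagged
fn = 3   # attack payloads missed
fp = 0   # benign samples flagged
tn = 15  # benign samples passed (15 benign samples, none flagged)

recall = tp / (tp + fn)               # 37 / 40 = 0.925
precision = tp / (tp + fp)            # 37 / 37 = 1.0
false_positive_rate = fp / (fp + tn)  # 0 / 15 = 0.0

print(f"recall={recall:.1%} precision={precision:.0%} fpr={false_positive_rate:.0%}")
```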
Where the detectors miss
“Both missed payloads are API tokens whose length slightly exceeds the fixed-length regex pattern,” Vaishnavi Gudur, the project creator and OWASP project leader, told Help Net Security about the sensitive-data category. One was a GitHub personal access token with 37 characters after the ghp_ prefix where the detector expects 36, and the other a Google API key with 38 characters after the AIza prefix where it expects 35.
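The length mismatch is easy to reproduce. The patterns below are assumptions for illustration, since the article does not quote the project's regexes: the sketch assumes the pattern is closed with a word boundary, because an unanchored fixed-length pattern would still match the first 36 characters of a longer token. The 40-character upper bound in the ranged variant is arbitrary.

```python
import re

# Assumed fixed-length pattern: exactly 36 characters after the ghp_ prefix.
FIXED = re.compile(r"\bghp_[A-Za-z0-9]{36}\b")
# Hypothetical higher-recall variant with a tolerance window (bound is arbitrary).
RANGED = re.compile(r"\bghp_[A-Za-z0-9]{36,40}\b")


def fake_token(length: int) -> str:
    """Build a dummy token with the given number of characters after ghp_."""
    return "ghp_" + "a" * length


for length in (36, 37):
    text = f"token={fake_token(length)}"
    print(length, bool(FIXED.search(text)), bool(RANGED.search(text)))
```

Under these assumptions, the 37-character token slips past the fixed pattern and is caught by the ranged one, at the cost of matching more random alphanumeric strings.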
The leakage detector uses fixed-length quantifiers, a deliberate choice that favors precision and cuts false positives on random alphanumeric strings, but it goes stale when providers extend their token formats. The third miss was a nested JSON structure serializing to 58,913 bytes, sitting just under the 64KB threshold. A second check for tenfold growth against a key’s prior value would catch it in production, but the benchmark runs each test on a fresh guard with no prior state, so that check has nothing to compare against. Gudur said higher-recall regex variants and adaptive threshold calibration are planned for v0.3.0.
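A sketch of the two size checks shows why the payload passed in the benchmark. The function name, the finding labels, the reading of 64KB as 65,536 bytes, and the 2,000-byte prior size are assumptions for illustration; only the 58,913-byte payload and the tenfold factor come from the reporting.

```python
from typing import Optional

MAX_BYTES = 64 * 1024  # assumed reading of the 64KB threshold (65,536 bytes)
GROWTH_FACTOR = 10     # tenfold growth against the key's prior value


def size_findings(new_value: bytes, prior_size: Optional[int]) -> list:
    """Return the size checks that a new value trips."""
    findings = []
    if len(new_value) > MAX_BYTES:
        findings.append("absolute_size")
    # The growth check needs history; a fresh guard has no prior size to compare.
    if prior_size and len(new_value) >= GROWTH_FACTOR * prior_size:
        findings.append("growth")
    return findings


payload = b"x" * 58_913                # same size as the missed payload
print(size_findings(payload, None))    # fresh guard, as in the benchmark
print(size_findings(payload, 2_000))   # hypothetical prior value of 2,000 bytes
```

With no prior state neither check fires, which matches the benchmark miss. With a small prior value on record, the growth check flags the same payload.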
Evasion and the road ahead
Open-source code and a visible YAML policy let an attacker read the rules. “The current rule-based detectors are a first layer,” Gudur said, describing a defense-in-depth design where teams facing more demanding threat models layer additional detection on top of the open-source layer. Protected-key checks operate on the key path, so knowing the rule gives no bypass, and SHA-256 integrity produces a deterministic mismatch on any altered immutable value. Sensitive-data matching is more exposed, since encoding through base64, character splitting, or homoglyphs can dodge a detector that lacks normalization before matching.
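Normalization before matching might look like the following sketch, which is not taken from the project. It checks the raw text plus a few normalized views: Unicode NFKC folding, removal of whitespace and zero-width characters, and decoded base64 runs. The token pattern and helper names are hypothetical. NFKC only folds compatibility forms such as fullwidth letters; cross-script homoglyphs (for example Cyrillic look-alikes) would need a separate confusables mapping that is not shown here.

```python
import base64
import binascii
import re
import unicodedata

# Hypothetical token pattern used only for this illustration.
TOKEN_RE = re.compile(r"ghp_[A-Za-z0-9]{36}")
# Runs that look like standard base64, with optional padding.
B64_RE = re.compile(r"[A-Za-z0-9+/]{16,}={0,2}")


def candidate_views(text: str):
    """Yield the raw text and normalized variants to match against."""
    yield text
    folded = unicodedata.normalize("NFKC", text)  # folds fullwidth forms to ASCII
    yield folded
    # Undo character splitting with whitespace or zero-width characters.
    yield re.sub(r"[\s\u200b\u200c\u200d]+", "", folded)
    # Decode anything that parses as base64 and inspect the result.
    for chunk in B64_RE.findall(folded):
        try:
            decoded = base64.b64decode(chunk, validate=True)
        except (binascii.Error, ValueError):
            continue
        yield decoded.decode("utf-8", errors="ignore")


def has_secret(text: str) -> bool:
    """Return True if any view of the text contains a token-shaped string."""
    return any(TOKEN_RE.search(view) for view in candidate_views(text))
```

Each extra view widens what the detector can see but also adds work per call and new chances for false positives, so the set of views is a tuning decision.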
Adaptive evasion testing is planned. AgentThreatBench, now merged into the inspect_evals framework, will add an evasion-aware payload set built with knowledge of the published rules. On defense, v0.4.0 adds ML-based anomaly detection on semantic features, and v0.3.0 adds a plugin interface for custom detectors that teams can keep out of the open YAML.
AI’s role in the build
“GitHub Copilot was used for boilerplate and scaffolding,” Gudur said, citing test setup, CI/CD configuration, and the pyproject.toml file. Copilot also produced README sections, docstrings, and draft regex patterns, which were then validated against provider documentation.
The detector pipeline architecture, the policy-engine separation, the MemoryStore protocol, the snapshot and rollback mechanism, and the source-class provenance system were human-designed against the OWASP ASI06 threat model. The 40 benchmark payloads were curated by hand. Gudur said the intellectual contribution lies in identifying the attack surface, designing the defense, and validating it against a curated adversarial corpus, and called using Copilot for boilerplate standard practice.
OWASP Agent Memory Guard is available for free on GitHub.
(Source: Help Net Security)