
IronCurtain: Open-Source Security for Autonomous AI

Summary

– IronCurtain is an open-source software solution designed to prevent unauthorized actions by autonomous AI agents, such as those caused by prompt injection or intent drift.
– It works by isolating the agent in a virtual machine and routing its intended actions through a separate, trusted policy engine for analysis before execution.
– This policy engine uses a user-defined “constitution,” written in plain English, which is compiled into security rules to decide whether to allow, deny, or escalate each action.
– The system ensures the agent cannot directly access sensitive system resources like the filesystem, credentials, or its own policy files and logs.
– Currently an early research project, IronCurtain’s code has been publicly released for testing and improvement by developers and security researchers.

A new open-source security framework is emerging to address the growing concern of autonomous AI agents acting without proper authorization. Developed by veteran security engineer Niels Provos, IronCurtain is designed to prevent AI agents from “going rogue,” whether through deliberate prompt injection attacks or gradual mission drift during extended operations. The core idea is to establish a secure barrier between the AI’s intentions and the user’s system, ensuring every action is vetted before execution.

Recent months have highlighted the risk that autonomous agents powered by large language models (LLMs) can deviate from their assigned tasks. IronCurtain tackles this by fundamentally changing how an agent interacts with its environment: instead of granting direct system access, the framework forces all agent activity through a separate, trusted review process. Every agent, whether a simple LLM session or a more complex tool like Claude Code running in Docker, follows the same secure pipeline.

Here is how the security mechanism operates in practice. After receiving a user instruction, the agent generates TypeScript code that runs within an isolated V8 virtual machine. This code issues typed function calls, which correspond to requests for external tools via the Model Context Protocol (MCP). These tool-call requests are not sent directly to their destination. Instead, they are forwarded to a trusted MCP proxy that functions as a policy engine. This engine decides whether to allow, deny, or escalate each request for human approval.
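The proxy’s role can be sketched in TypeScript, the language the agent itself generates. This is a minimal illustration, not IronCurtain’s actual API: the type and function names are assumptions, and the real engine consults compiled constitution rules rather than the hard-coded checks shown here.

```typescript
// Illustrative sketch of the trusted-proxy decision step.
type Decision = "allow" | "deny" | "escalate";

interface ToolCallRequest {
  tool: string;                  // MCP tool name, e.g. "filesystem.read" (hypothetical)
  args: Record<string, string>;  // typed arguments from the agent's generated code
}

// The proxy sits between the sandboxed V8 agent and the real MCP servers:
// no request reaches the host until it has been classified.
function decide(req: ToolCallRequest, knownTools: Set<string>): Decision {
  if (!knownTools.has(req.tool)) return "deny";           // unknown tool: reject
  if (req.tool === "filesystem.write") return "escalate"; // risky: ask a human
  return "allow";                                         // everything else passes
}
```

The key property is that the agent’s code can only *request* actions; the decision logic lives in a separate trusted process.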

The policy engine’s decisions are not arbitrary; they are governed by a user-defined “constitution.” This is a set of principles and specific guidance written in plain English. IronCurtain then translates this human-readable constitution into an enforceable security policy through a multi-step compilation process. A compiler LLM converts the English rules into per-interface regulations using a library of verified policy components. To ensure robustness, a test scenario generator creates cases to find gaps, and a verifier checks that the compiled rules match the original intent. A judge LLM then iteratively refines the policy until it aligns as closely as possible with the spirit of the constitution.
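The compile-test-verify-refine loop described above can be sketched as follows. The four stages are passed in as plain functions purely for illustration; in IronCurtain each is an LLM-backed component, and none of these names reflect its real interfaces.

```typescript
// Hypothetical skeleton of the iterative policy-compilation loop.
type Effect = "allow" | "deny" | "escalate";

interface PolicyRule { tool: string; argPattern: RegExp; effect: Effect; }
interface Scenario { tool: string; arg: string; expected: Effect; }

function compilePolicy(
  constitution: string[],                                            // plain-English principles
  compile: (c: string[]) => PolicyRule[],                            // compiler LLM
  generateScenarios: (c: string[]) => Scenario[],                    // test scenario generator
  verify: (rules: PolicyRule[], s: Scenario) => boolean,             // verifier
  refine: (rules: PolicyRule[], failed: Scenario[]) => PolicyRule[], // judge LLM
  maxRounds = 5,
): PolicyRule[] {
  let rules = compile(constitution);
  for (let round = 0; round < maxRounds; round++) {
    // Find scenarios where the compiled rules diverge from the constitution's intent.
    const failed = generateScenarios(constitution).filter((s) => !verify(rules, s));
    if (failed.length === 0) break; // policy matches intent; stop refining
    rules = refine(rules, failed);  // tighten rules where scenarios exposed gaps
  }
  return rules;
}
```

The loop terminates either when no generated scenario contradicts the compiled rules or after a bounded number of refinement rounds, since LLM-driven refinement cannot be guaranteed to converge.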

Policy evaluation occurs in two distinct phases. First, structural invariants are applied: access to protected paths like the constitution itself or audit logs is always denied, actions contained within a sandbox are automatically allowed, and requests to unknown tools are rejected. Second, the compiled policy rules are enforced, with each argument in a tool call checked against the constitution-derived regulations. Each argument role is evaluated independently, and the most restrictive result determines the final decision.
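The two phases can be sketched as a single evaluation function. The protected paths, sandbox root, rule format, and the choice to escalate unmatched arguments are all assumptions made for illustration, not IronCurtain’s actual configuration.

```typescript
// Sketch of two-phase policy evaluation under assumed data shapes.
type Effect = "allow" | "deny" | "escalate";
const SEVERITY: Record<Effect, number> = { allow: 0, escalate: 1, deny: 2 };

interface Rule { tool: string; argPattern: RegExp; effect: Effect; }

const PROTECTED_PATHS = ["/ironcurtain/constitution", "/ironcurtain/audit.log"]; // illustrative
const SANDBOX_ROOT = "/sandbox/";                                                // illustrative

function evaluate(tool: string, args: string[], knownTools: Set<string>, rules: Rule[]): Effect {
  // Phase 1: structural invariants.
  if (args.some((a) => PROTECTED_PATHS.some((p) => a.startsWith(p)))) return "deny"; // protected paths
  if (!knownTools.has(tool)) return "deny";                                          // unknown tool
  if (args.length > 0 && args.every((a) => a.startsWith(SANDBOX_ROOT))) return "allow"; // fully sandboxed
  // Phase 2: compiled rules. Each argument is evaluated independently,
  // and the most restrictive result determines the decision.
  let result: Effect = "allow";
  for (const arg of args) {
    const rule = rules.find((r) => r.tool === tool && r.argPattern.test(arg));
    const effect: Effect = rule ? rule.effect : "escalate"; // unmatched args go to a human (assumption)
    if (SEVERITY[effect] > SEVERITY[result]) result = effect;
  }
  return result;
}
```

Note the ordering: the invariants run first, so not even a permissive compiled rule can grant access to the constitution or audit logs.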

Only after a tool-call request is approved by the policy engine or a human does it proceed to standard MCP servers, which provide capabilities like filesystem access or git operations. The results then flow back through the trusted process to the agent. Critically, the AI agent is never granted direct access to sensitive system components. This includes the user’s filesystem, credentials such as OAuth tokens and API keys, environment variables, and its own policy files, audit logs, or configuration settings.
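The credential-isolation property can be illustrated with a small sketch of the forwarding step: only the trusted proxy holds secrets and talks to MCP servers, while the agent sees results alone. All names here are hypothetical.

```typescript
// Illustrative forwarding step: credentials live only in the proxy process.
type Decision = "allow" | "deny" | "escalate";

interface McpServer {
  call(args: Record<string, string>, apiKey: string): string;
}

function forward(
  tool: string,
  args: Record<string, string>,
  decision: Decision,
  servers: Map<string, McpServer>,
  apiKey: string, // never visible to the sandboxed agent
): string {
  if (decision !== "allow") return "denied"; // the agent sees the refusal, not the key
  const server = servers.get(tool);
  if (!server) return "denied";
  return server.call(args, apiKey); // credentials injected proxy-side, result flows back
}
```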

IronCurtain is currently an early-stage research project, and its code has been released publicly to encourage testing and collaboration from developers and security researchers. This open-source approach aims to foster community-driven improvements, building a stronger defense against the unpredictable behaviors of increasingly autonomous AI systems.

(Source: HelpNet Security)
