MIT Study: AI Agents Are Fast, Loose, and Out of Control

Summary
– A major study of 30 agentic AI systems reveals a widespread lack of transparency, with most failing to disclose risks, security testing, or performance benchmarks.
– These AI agents often operate without basic safety controls, such as clear usage monitoring, the ability to stop them during execution, or disclosure of their automated nature to users.
– Specific examples highlight stark differences between providers: Perplexity’s Comet browser has no documented safety evaluations or third-party testing, while OpenAI’s ChatGPT Agent is noted for cryptographically signing its web requests so they can be tracked.
– The autonomy of agentic AI, which allows it to perform multi-step tasks like handling emails or database entries, amplifies the potential risks posed by these security and control gaps.
– The report concludes that responsibility for addressing these critical shortcomings in documentation, safety auditing, and user control lies squarely with AI developers such as OpenAI, Anthropic, and Google.
A new comprehensive study from MIT and collaborating institutions paints a concerning picture of the current state of agentic AI, revealing widespread gaps in security, transparency, and user control. The research, which surveyed 30 prominent agentic AI systems, found a striking lack of basic disclosure about risks, monitoring capabilities, and safety protocols. This comes as the technology, which grants AI programs a degree of autonomy to perform multi-step tasks, moves rapidly into mainstream use.
The core issue is a profound lack of transparency from developers. The study authors systematically examined eight categories of disclosure and found that most agent systems provide little to no information in critical areas. Omissions range from failing to disclose potential risks to offering no details about third-party security testing or performance benchmarks. This makes it exceptionally difficult for organizations to understand what could go wrong when deploying these tools.
A significant finding is the inability to monitor or control these agents effectively. For many enterprise systems, it is unclear whether there is any way to track an agent’s specific actions in real time. Alarmingly, the report notes that several systems, including offerings from Alibaba, HubSpot, IBM, and n8n, lack any documented method for stopping an agent once it has been deployed. In some cases, the only option is to shut down all agents entirely, a scenario that could be disastrous if a single agent begins causing harm.
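The study does not describe how such controls should be built, but the general shape of a per-agent stop mechanism and a per-action audit trail is easy to illustrate. The sketch below is purely hypothetical (the `ToyAgent` class, its steps, and its log are not drawn from any vendor’s product); it simply shows the kind of "stop this one agent, and see what it did" capability the researchers found missing or undocumented.

```python
import threading
import time

class ToyAgent:
    """Illustrative multi-step agent with an explicit stop control and an
    action log; names and steps are hypothetical, not any vendor's design."""

    def __init__(self, steps):
        self._steps = steps
        self._stop = threading.Event()   # the per-agent "off switch"
        self.audit_log = []              # per-action trail an operator could monitor

    def stop(self):
        """Request that the agent halt before its next action."""
        self._stop.set()

    def run(self):
        for i, step in enumerate(self._steps):
            if self._stop.is_set():      # checked before every action
                self.audit_log.append(f"halted before step {i}: {step}")
                return
            self.audit_log.append(f"executed step {i}: {step}")
            time.sleep(0.1)              # stand-in for real work (email, database write, ...)

agent = ToyAgent(["draft email", "update CRM record", "send email"])
worker = threading.Thread(target=agent.run)
worker.start()
agent.stop()                             # operator intervenes mid-run
worker.join()
print(agent.audit_log)
```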
Furthermore, most of these AI agents do not identify themselves as automated systems. They typically do not signal their AI nature to end-users or to third-party websites, bypassing standard protocols like `robots.txt` files. This lack of identification masks their activity, making it impossible to distinguish between human and bot interactions, a practice that has already led to legal action, as noted in Amazon’s lawsuit against Perplexity.
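For comparison, the conventions these agents reportedly skip are simple to follow. The sketch below (the agent name and URLs are placeholders, not taken from the study) shows what self-identification looks like in practice: announcing a distinctive `User-Agent` string and checking `robots.txt` before fetching a page.

```python
from urllib import robotparser, request

# Hypothetical agent identity and target; neither appears in the study.
AGENT_NAME = "ExampleAgent/1.0 (+https://example.com/bot-info)"
TARGET_URL = "https://example.com/some/page"

# 1. Fetch and parse the site's robots.txt (the Robots Exclusion Protocol).
rules = robotparser.RobotFileParser()
rules.set_url("https://example.com/robots.txt")
rules.read()

# 2. Proceed only if the site permits this user agent to access the path,
#    and identify the request as automated via the User-Agent header.
if rules.can_fetch(AGENT_NAME, TARGET_URL):
    req = request.Request(TARGET_URL, headers={"User-Agent": AGENT_NAME})
    with request.urlopen(req) as resp:
        print(f"Fetched {len(resp.read())} bytes as {AGENT_NAME}")
else:
    print("robots.txt disallows this path; a well-behaved agent stops here.")
```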
The study focused on analyzing publicly available documentation, demos, and governance papers from three categories of agentic AI: enhanced chatbots like Claude Code, specialized AI browsers like OpenAI’s Atlas, and enterprise platforms such as Microsoft’s Copilot. While the research did not test for real-world failures, it highlighted stark contrasts between different providers through anecdotal examples.
OpenAI’s ChatGPT Agent was noted as a positive example for its use of cryptographic signatures to track its web requests. In stark contrast, Perplexity’s Comet browser was described as having no documented safety evaluations, third-party testing, or sandboxing approaches beyond basic prompt-injection mitigations. HubSpot’s Breeze agents presented a mixed picture, advertising compliance with standards such as GDPR and HIPAA while providing no substantive detail about the methodology or results of their security evaluations.
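The article does not detail how ChatGPT Agent’s signatures work (the broad approach resembles signed-request schemes such as HTTP Message Signatures), but the underlying idea is straightforward. In the hypothetical sketch below, an agent signs selected request metadata with its private key and a website verifies the signature against the vendor’s published public key; a request that fails verification cannot be attributed to that agent.

```python
# Requires the third-party 'cryptography' package. Purely illustrative:
# a generic signed-request demo, not OpenAI's actual mechanism.
from cryptography.exceptions import InvalidSignature
from cryptography.hazmat.primitives.asymmetric import ed25519

# The agent vendor holds the private key and publishes the public key.
private_key = ed25519.Ed25519PrivateKey.generate()
public_key = private_key.public_key()

# Request metadata a site might expect the agent to sign (hypothetical).
message = b"GET /products HTTP/1.1\nhost: example.com\nuser-agent: ExampleAgent/1.0"

signature = private_key.sign(message)        # agent side: sign the request metadata

try:                                         # site side: verify before trusting attribution
    public_key.verify(signature, message)
    print("Signature valid: this request is attributable to the key holder.")
except InvalidSignature:
    print("Signature invalid: treat as unverified, unattributed traffic.")
```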
The report concludes that the responsibility for addressing these serious gaps rests entirely with the developers. Agentic AI systems are the product of deliberate human design choices. Companies like OpenAI, Anthropic, Google, and Perplexity must proactively document their software’s limitations, conduct rigorous safety audits, and implement robust user controls. If the industry does not step up to remedy these transparency and security failures, it will inevitably face stringent external regulation. The governance challenges identified are only set to intensify as these autonomous capabilities continue to expand.
(Source: ZDNET)
