Exclusive: US Government’s Hidden AI Safety Report Revealed

Summary
– AI researchers identified 139 novel ways to make AI systems misbehave, including generating misinformation and leaking data, during a red-teaming exercise in October.
– A NIST report detailing the exercise was not published due to concerns about clashing with the incoming Trump administration, according to anonymous sources.
– The Trump administration has directed NIST to revise its AI Risk Management Framework, removing references to misinformation, DEI, and climate change.
– The red-teaming event tested AI systems like Meta’s Llama and Synthesia’s avatar platform, revealing gaps in NIST’s risk assessment framework.
– Researchers found some NIST risk categories were poorly defined, limiting their practical usefulness in assessing AI vulnerabilities.
A recent cybersecurity conference in Virginia revealed startling gaps in AI safety protocols during an unprecedented stress-testing exercise. Last October, dozens of AI experts gathered in Arlington for a groundbreaking “red teaming” event, where they uncovered 139 previously unknown vulnerabilities across multiple advanced language models and artificial intelligence systems. These flaws ranged from generating false information to leaking sensitive user data, exposing critical weaknesses in newly proposed government safety standards.
The National Institute of Standards and Technology (NIST) conducted this evaluation but never released its findings. Insiders familiar with the matter suggest the report was shelved due to political concerns as leadership changed hands in Washington. One former NIST employee compared the suppression to past controversies around climate and tobacco research, where scientific findings clashed with policy agendas.
The unpublished findings clash with the current administration’s stance on AI regulation. While former President Biden’s executive order emphasized risk assessment and ethical considerations, subsequent policy shifts under new leadership have actively discouraged research into algorithmic fairness, bias, and misinformation. Ironically, despite these rollbacks, the current administration’s AI Action Plan still advocates for the very type of testing documented in the buried report, specifically calling for collaborative hackathons to evaluate AI security flaws.
Organized through NIST’s ARIA program alongside Humane Intelligence, the red-teaming event took place at the Conference on Applied Machine Learning in Information Security (CAMLIS). Teams rigorously tested several high-profile AI systems, including Meta’s open-source Llama model, Anote’s AI fine-tuning platform, Cisco-acquired Robust Intelligence’s defensive tools, and Synthesia’s avatar-generation software. Company representatives participated directly, applying NIST’s AI 600-1 framework to assess risks like misinformation propagation, data leaks, and emotional manipulation by AI.
The results exposed glaring inconsistencies in NIST’s guidelines. Researchers found ways to bypass safeguards and trick the AI systems into harmful behaviors; some risk categories proved useful, while others were too vaguely defined to be practical. The exercise highlighted the urgent need for clearer, more actionable standards to prevent real-world exploitation of AI vulnerabilities. Yet with the report still under wraps, crucial insights remain inaccessible to developers and policymakers alike.
Neither NIST nor the Department of Commerce has commented on why the document was withheld. As AI continues advancing at breakneck speed, the absence of transparent safety evaluations raises pressing questions about who decides what risks are worth addressing, and what gets buried along the way.
(Source: Wired)