Topic: post-deployment testing

  • Microsoft's AI guardrails bypassed with a single prompt

Modern AI safety systems are surprisingly fragile: a single, carefully crafted prompt can often bypass established guardrails, raising urgent questions about their long-term reliability. Researchers used a technique called GRPO Obliteration to steer AI models away from safety constraints by rewarding…