Topic: post-deployment testing

  • Microsoft's AI guardrails bypassed with a single prompt

Modern AI safety systems are surprisingly fragile: a single, carefully crafted prompt can often bypass established guardrails, raising urgent questions about their long-term reliability. Researchers used a technique called GRPO Obliteration to steer AI models away from safety constraints by rewarding…