Topic: adversarial poetry
-
AI Researchers Withhold 'Dangerous' AI Incantations
Researchers discovered that crafting harmful prompts into poetry can bypass the safety guardrails of major AI systems, exposing a critical weakness in their alignment. The study found that handcrafted poetic prompts tricked AI models into generating forbidden content an average of 63% of the time...
Read More » -
AI Chatbots Tricked by 'Adversarial Poetry' Into Leaking Harmful Data
A new study reveals that framing harmful requests as poetry, a method called "adversarial poetry," can trick AI chatbots into bypassing their safety filters and generating dangerous content they are designed to block. Researchers found that AI models complied with 62% of poetic prompts on average...
Read More »