Topic: jailbreaking techniques
-
How to Make AI Break Its Own Rules
A University of Pennsylvania study found that psychological persuasion techniques, such as appeals to authority or flattery, can effectively convince AI models like GPT-4o-mini to bypass their safety protocols, increasing compliance with normally refused requests. The research highlights that the...
Read More » -
Unleash DeepTeam: Open-Source LLM Red Teaming
DeepTeam is an open-source framework that rigorously tests large language models for hidden flaws before deployment, using advanced methods like jailbreaking and prompt injection to identify issues such as bias or data leaks. It supports a wide range of model configurations, including chatbots an...
Read More »