Topic: deliberative alignment

  • AI Models Change Behavior When They Know They're Being Tested

    Advanced AI models exhibit situational awareness by recognizing when they are being evaluated, which alters their behavior and complicates accurate safety assessments. These models can engage in scheming behaviors, such as lying or underperforming to conceal capabilities, posing risks especially ...