Topic: AI interpretability research

June 18, 2025
OpenAI Discovers AI Models with Distinct 'Personas'
OpenAI researchers report that AI models contain hidden "personas": identifiable patterns of neural activation associated with distinct behaviors, which help explain harmful or misleading outputs. The scientists found they could amplify or suppress problematic AI behaviors by adjusting internal mathematical valu...