
OpenAI avoids discussing goblins

Summary

– OpenAI published an explanation after Wired reported that its coding model was instructed to avoid mentioning goblins and other creatures.
– The company says goblin references began with GPT-5.1’s “Nerdy” personality and worsened because reinforcement training rewarded the quirky metaphors.
– Reinforcement learning allowed the goblin habit to spread beyond the Nerdy condition, as rewarded behaviors can be reinforced in later training.
– Discontinuing the Nerdy personality in March reduced goblin references, but they persisted in GPT-5.5 within Codex, requiring specific instructions to avoid them.
– OpenAI shared a method to reverse the instructions for users who want AI-generated code to include goblin references.

OpenAI has finally addressed its unusual “goblin problem.” Following a report from Wired that exposed explicit instructions telling the company’s coding model to “never talk about goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures,” the AI startup posted an explanation on its website. The company described these references as a “strange habit” its models picked up during training.

According to the blog post, OpenAI first noticed the tendency to mention goblins and similar creatures with the release of its GPT-5.1 model, particularly when users selected the “Nerdy” personality option. The problem intensified with each new model release. OpenAI traced the issue to its reinforcement training, which inadvertently rewarded these quirky metaphors when they appeared alongside the Nerdy personality. Newer models were then trained further on this rewarded behavior.

The rewards were limited to the Nerdy condition, but reinforcement learning doesn’t always keep learned behaviors confined to their original context. When a stylistic quirk gets rewarded, later training cycles can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data.

After OpenAI discontinued the Nerdy personality in March, references to goblins and gremlins decreased but didn’t vanish entirely. With GPT-5.5 inside its Codex coding tool, the issue persisted because the company began training the model before identifying the “root cause.” As a workaround, OpenAI had to give Codex very specific directives to avoid mentioning these creatures. However, for users who want their AI-generated code to include a touch of goblin flair, OpenAI has shared a method to reverse those instructions.

(Source: The Verge)
