OpenAI Reveals How ChatGPT’s Nerdy Goblin Persona Spun Out of Control

▼ Summary
– OpenAI’s Codex CLI system prompt for GPT-5.5 included an instruction to avoid mentioning goblins, gremlins, and other creatures unless absolutely relevant.
– The “goblin” problem began with GPT-5.1 in November, causing a 175% increase in goblin mentions and a 52% increase in gremlin mentions.
– The root cause was a “Nerdy” personality setting with a system prompt to “undercut pretension through playful use of language,” which a reward signal favored.
– OpenAI retired the Nerdy personality, removed the reward signal, and filtered training data containing creature words to fix the issue.
– The company added the developer-prompt instruction to GPT-5.5 because it had already been trained before the root cause was discovered.

Earlier this week, OpenAI made headlines when it open-sourced its coding agent, Codex CLI, and included a GitHub document that contained a highly specific system prompt for GPT-5.5. In coding scenarios, the model was told never to mention “goblins, gremlins, raccoons, trolls, ogres, pigeons, or other animals or creatures” unless those references were “absolutely and unambiguously relevant” to what the user had asked. That odd instruction, which appeared twice in the prompt, left many wondering why OpenAI felt compelled to include it.
Now, the company has finally shed light on the bizarre directive and the chain of events that made it necessary. For roughly a year, some ChatGPT users had noticed the model’s peculiar tendency to bring up goblins, gremlins, trolls, and similar creatures in its replies. This quirky behavior seemed to intensify as newer versions of the model were rolled out.
Even OpenAI CEO Sam Altman acknowledged the issue in a post on X Monday morning. “Feels like codex is having a ChatGPT moment,” he wrote, quickly correcting himself: “I meant a goblin moment, sorry.”
Later that same day, OpenAI published a blog post explaining the strange phenomenon and how the company ultimately tackled it. According to the post, the model’s goblin obsession first surfaced after the release of GPT-5.1 in November. User complaints about overly familiar responses prompted an internal investigation, and a safety researcher who had repeatedly encountered the words “goblin” and “gremlin” while using the model suggested adding those terms to the review.
The investigation revealed that mentions of “goblin” in ChatGPT had spiked by 175% following the GPT-5.1 launch, while references to “gremlin” rose by 52%. At first, OpenAI didn’t view this as a major concern. But just a few months later, as the company wrote in the blog, “the goblins came back to haunt us.”
By March, with GPT-5.4’s release, creature references had climbed even higher. Some users took to online forums to complain that the word “goblin” was popping up in “almost every conversation.” That triggered another internal analysis, which finally uncovered the root cause. OpenAI found that these creature mentions were especially frequent in responses generated for users who had selected the model’s “Nerdy” personality setting. That personality included a system prompt that instructed the model to “undercut pretension through playful use of language.”
Using its coding agent Codex, OpenAI compared outputs from reinforcement learning training that included words like “goblin” and “gremlin” against those that did not. The company discovered that one reward signal consistently favored responses containing those creature names, scoring them higher than similar answers that omitted them.
The researchers also noticed that mentions of goblins, gremlins, and other creatures had started spreading beyond the Nerdy personality. “Once a style tic is rewarded, later training can spread or reinforce it elsewhere, especially if those outputs are reused in supervised fine-tuning or preference data,” the blog explained.
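The failure mode OpenAI describes, a reward signal that systematically favors a lexical pattern, can be sketched with a toy example. Everything below is invented for illustration (the word list, the scoring functions, the bonus value); it is not OpenAI’s actual reward model, only a minimal demonstration of how a small lexical bias makes a reward signal consistently prefer one phrasing over another:

```python
# Toy illustration of a lexically biased reward signal.
# All names and numbers here are hypothetical.

CREATURE_WORDS = {"goblin", "gremlin", "troll", "ogre"}

def base_reward(response: str) -> float:
    # Stand-in for a learned quality score; here, simply length-based.
    return min(len(response.split()) / 20.0, 1.0)

def biased_reward(response: str) -> float:
    # The hypothetical flaw: a flat bonus whenever a creature word appears.
    words = {w.strip(".,!?").lower() for w in response.split()}
    bonus = 0.1 if words & CREATURE_WORDS else 0.0
    return base_reward(response) + bonus

plain = "Your build failed because of a missing semicolon."
quirky = "A mischievous goblin of a missing semicolon broke your build."

# The biased signal scores the creature-flavored phrasing higher, so
# reinforcement learning would steer the model toward it over many updates.
print(biased_reward(quirky) > biased_reward(plain))  # True
```

Once such a signal is in place, every training update nudges the model toward the rewarded phrasing, which is how a one-word bias can compound into “almost every conversation.”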
To solve the problem, OpenAI said it retired the Nerdy personality, removed the reward signal that encouraged goblin mentions, and filtered training data containing creature words. However, because GPT-5.5 had already begun training before the root cause was identified, it too developed a strange fixation on goblins. That’s why OpenAI added the developer-prompt instruction later spotted in Codex CLI’s open-source repository, a move meant to curb inappropriate references to goblins and gremlins.
“Depending on who you ask, the goblins are a delightful or annoying quirk of the model,” OpenAI wrote in the blog. “But they are also a powerful example of how reward signals can shape model behavior in unexpected ways, and how models can learn to generalize rewards in certain situations to unrelated ones.”
(Source: Gizmodo.com)