technology

OpenAI fixes AI's unexpected goblin obsession

The company traced the problem to a 'Nerdy' personality setting designed to make the AI sound more playful, whose quirks reinforcement learning then rewarded and reinforced until the behavior spread far beyond its intended scope.

Apr 30th 2026 · United States

OpenAI has disclosed a peculiar technical issue in which its AI systems, including ChatGPT and its coding assistant Codex, developed an unexpected tendency to reference goblins, gremlins, and other fantasy creatures in responses. The company first detected a 175% spike in goblin-related language and a 52% increase in gremlin mentions following the November launch of GPT-5.1, prompting an investigation that ultimately traced the behavior to a "Nerdy" personality setting designed to make the model sound more playful and mentor-like.

While a stray creature reference might seem harmless, the frequency warranted intervention. OpenAI implemented an explicit ban in Codex instructing the tool to avoid mentioning goblins, gremlins, raccoons, trolls, ogres, pigeons, or similar creatures unless directly relevant to a user's query.

The company's internal investigation revealed that the "Nerdy" personality mode, which accounted for only a small fraction of total interactions, was responsible for roughly two-thirds (66.7%) of all goblin mentions in ChatGPT responses. The quirk emerged because the personality's design encouraged unconventional phrasing and humor, which reinforcement learning then inadvertently rewarded as a successful conversational technique. The behavior subsequently generalized beyond the intended personality mode, spreading into broader outputs and contaminating training data later used for models including GPT-5.5. OpenAI has since phased out the "Nerdy" personality, removed reward signals that encouraged excessive metaphor use, and refined its training data to reduce reliance on fantasy-style language.

The incident highlights a broader challenge facing AI companies as they increasingly fine-tune language models to exhibit distinct personalities in hopes of boosting user engagement. A recent Oxford Internet Institute study found that conditioning models to appear warmer and friendlier can create an "accuracy trade-off," making systems more likely to make mistakes or reinforce false beliefs. OpenAI's goblin problem illustrates how reward-based training can produce behavioral patterns that extend far beyond their intended boundaries, underscoring the complex and sometimes unpredictable nature of large language model development.
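OpenAI has not published the exact wording or mechanics of the Codex ban, but an instruction-level guard of this kind is straightforward to picture. The Python sketch below is a hypothetical illustration only; the BANNED_CREATURES list, the STYLE_GUARD wording, and the build_system_prompt helper are all invented for this example and are not OpenAI's actual implementation.

```python
# Hypothetical sketch of an instruction-level guard like the one the article
# describes for Codex. Names and wording are illustrative assumptions.
BANNED_CREATURES = [
    "goblins", "gremlins", "raccoons", "trolls", "ogres", "pigeons",
]

STYLE_GUARD = (
    "Avoid mentioning " + ", ".join(BANNED_CREATURES) + ", or similar "
    "creatures, unless directly relevant to the user's query."
)

def build_system_prompt(base_instructions: str) -> str:
    """Append the style guard to the assistant's base instructions."""
    return f"{base_instructions}\n\n{STYLE_GUARD}"

print(build_system_prompt("You are a helpful coding assistant."))
```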
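To see how a small reward bias can snowball in the way the investigation describes, consider a toy simulation. It assumes a hypothetical reward model that pays a modest bonus whenever a sampled response uses the quirky phrasing, and a policy whose probability of emitting the quirk is repeatedly shifted toward the reward-weighted share of outputs; none of the numbers reflect OpenAI's actual training runs.

```python
import random

# Toy simulation (assumptions throughout, not OpenAI's pipeline): a reward
# model pays a small bonus whenever a sampled response uses the "goblin"
# quirk, and each training round moves the policy's quirk probability to
# the quirk's share of total reward across sampled outputs.

QUIRK_BONUS = 0.5  # hypothetical extra reward for playful creature phrasing

def reward(uses_quirk: bool) -> float:
    """Base quality score plus a small bonus when the quirk appears."""
    return random.gauss(1.0, 0.1) + (QUIRK_BONUS if uses_quirk else 0.0)

def train(p_quirk: float = 0.01, rounds: int = 15, samples: int = 20_000) -> None:
    for r in range(rounds):
        outputs = [random.random() < p_quirk for _ in range(samples)]
        scores = [reward(q) for q in outputs]
        # Update rule: the quirk's new probability is its share of total reward.
        p_quirk = sum(s for s, q in zip(scores, outputs) if q) / sum(scores)
        print(f"round {r:2d}: p(quirk) = {p_quirk:.3f}")

if __name__ == "__main__":
    random.seed(0)
    train()
```

Under these toy assumptions, a tic that appears in roughly 1% of sampled outputs comes to dominate within about a dozen rounds, mirroring how a quirk rewarded in one personality mode can spread far beyond its intended scope.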