Studies show top AI models can reproduce near-verbatim passages from bestselling novels
Feb 23rd 2026
Targeted prompting led models from OpenAI, Google, Anthropic and xAI to output long stretches of copyrighted books, challenging industry claims that models do not retain copies of their training data and raising the legal stakes in ongoing copyright suits.
- Researchers at Stanford and Yale used strategic prompts to elicit thousands of words from large language models, recovering text from multiple bestselling books.
- Gemini 2.5 reproduced 76.8 percent of Harry Potter and the Philosopher's Stone when given sentence-completion prompts, and Grok 3 reproduced 70.3 percent.
- Affected models named in recent work include systems from OpenAI, Google, Anthropic and xAI, with other studies suggesting memorization is more widespread than previously thought.
- AI firms have long argued that their models do not contain copies of training data and that using copyrighted books is fair use; these results call both claims into question.
- Experts say increased evidence of memorization could weaken AI companies' core legal defenses in ongoing copyright lawsuits and prompt greater regulatory and industry scrutiny.