Anthropic says three labs ran large-scale distillation attacks to extract Claude’s capabilities
Feb 23rd 2026
Anthropic says DeepSeek, Moonshot and MiniMax used about 24,000 fraudulent accounts to generate more than 16 million Claude exchanges for training competing models, prompting new defenses and calls for industry and policy coordination.
- Anthropic attributed to DeepSeek, Moonshot and MiniMax industrial-scale distillation campaigns that together produced over 16 million exchanges via roughly 24,000 fraudulent accounts.
- DeepSeek generated about 150,000 exchanges focused on reasoning, on using Claude as a reward model to grade outputs, and on censorship-safe alternatives for policy-sensitive queries.
- Moonshot ran roughly 3.4 million exchanges targeting agentic reasoning, tool use, coding, and computer vision, later trying to reconstruct Claude’s reasoning traces.
- MiniMax accounted for about 13 million exchanges aimed at agentic coding and tool orchestration, and quickly shifted its traffic when Anthropic released a new model.
- Attackers accessed Claude through commercial proxy services running "hydra" cluster architectures that rotate thousands of fraudulent accounts to evade bans.
- Distilled models are likely to lack built-in safeguards, raising national security and export control concerns if capabilities are repurposed for cyber offense, surveillance or disinformation.
- Anthropic says it is deploying classifiers and behavioral fingerprinting, sharing indicators with partners, tightening access verification, and building product and model countermeasures while urging industry and policymakers to act together.
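For readers unfamiliar with the technique at the center of the report: distillation trains a "student" model to imitate a "teacher" purely from the teacher's outputs, with no access to its weights or training data. The toy sketch below illustrates the idea with two small linear classifiers; every name and number in it is hypothetical and has no connection to the systems in the story.

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(z):
    # Numerically stable softmax over the last axis.
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Hidden "teacher": a fixed classifier the attacker can only query.
W_teacher = rng.normal(size=(4, 3))

def query_teacher(x):
    # Returns soft probabilities, analogous to sampled API responses.
    return softmax(x @ W_teacher)

# Step 1: collect input/output "exchanges" by querying the teacher.
X = rng.normal(size=(2000, 4))
soft_targets = query_teacher(X)

# Step 2: train a student by gradient descent on cross-entropy
# against the teacher's soft targets.
W_student = np.zeros((4, 3))
lr = 0.5
for _ in range(2000):
    probs = softmax(X @ W_student)
    # Gradient of mean cross-entropy w.r.t. W for softmax regression.
    grad = X.T @ (probs - soft_targets) / len(X)
    W_student -= lr * grad

# Step 3: the student now imitates the teacher on unseen inputs.
X_new = rng.normal(size=(100, 4))
gap = np.abs(softmax(X_new @ W_student) - query_teacher(X_new)).max()
print(f"max probability gap on held-out inputs: {gap:.3f}")
```

The point of the sketch is the asymmetry the article describes: the student never sees the teacher's parameters, yet enough query/response pairs let it closely reproduce the teacher's behavior, which is why safeguards trained into the original model need not carry over.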