Anti-Distillation Mechanisms

Technical analysis of how fake tools are injected to poison training data from API scrapers, with discussion of how easily these protections could be bypassed by determined actors

Commenters highlight the deep irony of frontier AI companies—built on vast amounts of scraped data—deploying "poisoning" tactics like fake tool injection to prevent competitors from scraping their own outputs. The accidental leak of these mechanisms via source maps is viewed as a major strategic failure that both neutralizes the secrecy required for the defense to work and provides rivals with a detailed roadmap of unreleased models. Beyond the technical bypasses, users expressed frustration that such anti-distillation measures could intentionally degrade the experience for paying customers who are incorrectly flagged as copycats. Ultimately, the community sees these protections as a losing battle, noting that the "secret sauce" is now easily filtered out by the very actors the mechanisms were designed to thwart.

View on HN · Topics

There's a more worrying part: It refers to unreleased versions of Claude in more detail than released versions.

For a company calling chinese companies out for distillation attacks on their models, this very much looks like a distillation attack against human maintainers, especially when combined with the frustration detector.

View on HN · Topics

> "Anti-distillation: injecting fake tools to poison copycats"

Plot twist: Chinese competitors end up developing real, useful versions of Claude's fake tools.

View on HN · Topics

I cannot bring myself to care about distillation, when these companies have built their empires on top of everyone else's stolen data, while at the same time telling the world they're out to replace us all.

View on HN · Topics

Amazing that people on HN can't distinguish between training a model on open source data vs distilling a model's outputs.

View on HN · Topics

Tbh, I think distillation is happening both ways. And at this stage, "quality" is stagnating, the main edge is the tooling. The harness of CC seems to be the best so far, and I wonder if this leak would equalize the usability.

View on HN · Topics

Definitely. We can expect zAI, Qwen, Minimax CCs very soon

View on HN · Topics

more likely, they would parse them out using simple regex, the whole point is they're there but not used. Distillation is becoming less common now however

View on HN · Topics

This was my favorite bit, "We're going to steal countless copy righted works and completely ignore software licenc... wait, what? You aren't allowed to turn around and do it to us! Stop that right now!"

View on HN · Topics

Two things worth separating here: the leak mechanism and the leak contents.

The mechanism is a build pipeline issue. Bun generates source maps by default, and someone didn't exclude the .map file from the npm publish. There's an open Bun issue (oven-sh/bun#28001) about this exact behavior. One missing line in .npmignore or the package.json files field. Same category of error as the Axios compromise earlier this week — npm packaging configuration is becoming a recurring single point of failure across the ecosystem.

The contents are more interesting from a security architecture perspective. The anti-distillation system (injecting fake tool definitions to poison training data scraped from API traffic) is a defensive measure that only works when its existence is secret. Now that it's public, anyone training on Claude Code API traffic knows to filter for it. The strategic value evaporated the moment the .map file hit the CDN.

The undercover mode discussion is being framed as deception, but the actual security question is narrower: should AI-authored contributions to public repositories carry attribution? That's an AI identity disclosure question that the industry hasn't settled. The code shows Anthropic made a specific product decision — strip AI attribution in public commits from employee accounts. Whether that's reasonable depends on whether you think AI authorship is material information for code reviewers.

The frustration regex is the least interesting finding technically but the most revealing culturally. A company with frontier-level NLP capability chose a regex over an inference call for sentiment detection. The engineering reason is obvious (latency and cost), but it tells you something about where even AI companies draw the line on using their own models.

View on HN · Topics

> 250,000 wasted API calls per day

How much approximate savings would this actually be?

View on HN · Topics

I am curious about these fake tools.

They would either need to lie about consuming the tokens at one point to use in another so the token counting was precise.

But that does not make sense because if someone counted the tokens by capturing the session it would certainly not match what was charged.

Unless they would charge for the fake tools anyway so you never know they were there

View on HN · Topics

The irony of an IP scraper on an absolutely breathtaking, epic scale getting its secret sauce "scraped" - because the whole app is vibe coded (and the vibe coders appear to be oblivious to things like code obfuscation cuz move fast!)...

And so now the copy cats can ofc claim this is totally not a copy at all, it's actually Opus. No license violation, no siree!

It's fucking hilarious is what it is, it's just too much.

View on HN · Topics

> Anti-distillation: injecting fake tools to poison copycats

Does this mean `huggingface.co/Jackrong/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled` is unusable? Had anyone seen fake tool calls working with this model?

View on HN · Topics

Very likely Claude was trained on Deepseek, so it's possible that spiderman-pointing-at-spiderman.jpg all models are wrong now https://www.reddit.com/r/DeepSeek/comments/1r9se7p/claude_so...

View on HN · Topics

Assuming Claude Code was used. If OpenCode or some other programmatic method was used, the "fake tool calls" won't be added

View on HN · Topics

I like that if they decide that your usage looks like distillation it just becomes useless, because there’s no way for the end user to distinguish between it just being sort of crappy or sabotaged intentionally. That’s a cool thing to pay for

View on HN · Topics

The feature flag names alone are more revealing than the code. KAIROS, the anti-distillation flags, model codenames those are product strategy decisions that competitors can now plan around. You can refactor code in a week. You can't un-leak a roadmap.

Summarizer