Compute Constraints Speculation

Discussion of whether all issues stem from Anthropic being compute-constrained, unable to meet demand, and forced to make degrading tradeoffs rather than refusing customers

Speculation suggests Anthropic is trapped in a severe compute-supply crisis, forcing the company to prioritize server stability over model intelligence through aggressive token purging and "adaptive reasoning" limits. While some view these changes as cynical cost-cutting designed to polish balance sheets for a future IPO, others argue they are the inevitable result of managing complex, rapidly scaling systems within the physical constraints of global hardware scarcity. This tension leaves users debating whether Anthropic is making necessary engineering trade-offs or is simply "overselling" a product it can no longer afford to run at full capacity. Ultimately, the discussion highlights a fundamental concern that the drive for user growth is beginning to compromise the high-reasoning quality that originally defined the brand.

View on HN · Topics

Because it significantly increases actual costs for Anthropic.

If they ignored this then all users who don’t do this much would have to subsidize the people who do.

View on HN · Topics

Genuine question: is the cost to keep a persistent warmed cache for sessions idling for hours/days not significant when done for hundreds of thousands of users? Wouldn’t it pose a resource constraint on Anthropic at some point?

View on HN · Topics

Sure, it wouldn’t make sense if they only had one customer to serve :)

View on HN · Topics

A strange view. The trade-off has nothing to do with a specific ideology or notable selfishness. It is an intrinsic limitation of the algorithms, which anybody could reasonably learn about.

Sure, the exact choice on the trade-off, changing that choice, and having a pretty product-breaking bug as a result, are much more opaque. But I was responding to somebody who was surprised there's any trade-off at all. Computers don't give you infinite resources, whether or not they're "servers," "in the cloud," or "AI."

View on HN · Topics

what matters isn't that it's a cache; what matter is it's cached _in the GPU/NPU_ memory and taking up space from another user's active session; to keep that cache in the GPU is a nonstarter for an oversold product. Even putting into cold storage means they still have to load it at the cost of the compute, generally speaking because it again, takes up space from an oversold product.

View on HN · Topics

Whats lost on this thread is these caches are in very tight supply - they are literally on the GPUs running inference. the GPUs must load all the tokens in the conversation (expensive) and then continuing the conversation can leverage the GPU cache to avoid re-loading the full context up to that point. but obviously GPUs are in super tight supply, so if a thread has been dead for a while, they need to re-use the GPU for other customers.

View on HN · Topics

As with everything Anthropic recently this is a supply constraint issue. They have not planned for scale adequately.

View on HN · Topics

It astounds me that a company valued in the hundreds-of-billions-of-dollars has written this. One of the following must be true:

1. They actually believed latency reduction was worth compromising output quality for sessions that have already been long idle. Moreover, they thought doing so was better than showing a loading indicator or some other means of communicating to the user that context is being loaded.

2. What I suspect actually happened: they wanted to cost-reduce idle sessions to the bare minimum, and "latency" is a convenient-enough excuse to pass muster in a blog post explaining a resulting bug.

View on HN · Topics

It’s definitely a cost / resource saving strategy on their end.

View on HN · Topics

It's also a bit of a fishy explanation for purging tokens older than an hour. This happens to also be their cache limit. I doubt it is incidental that this change would also dramatically drop their cost.

View on HN · Topics

> All of this points to their priorities not being aligned with their users’.

Framing this as "aligned" or "not aligned" ignores the interesting reality in the middle. It is banal to say an organization isn't perfectly aligned with its customers.

I'm not disagreeing with the commenter's frustration. But I think it can help to try something out: take say the top three companies whose product you interact with on a regular basis. Take stock of (1) how fast that technology is moving; (2) how often things break from your POV; (3) how soon the company acknowledges it; (4) how long it takes for a fix. Then ask "if a friend of yours (competent and hard working) was working there, would I give the company more credit?"

My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

These kind of conversations are a sort of window into people's expectations and their ability to envision the possible explanations of what is happening at Anthropic.

View on HN · Topics

I'd rather not speak too poorly of Anthropic, because - to the extent I can bring myself to like a tech company - I like Anthropic.

That said, the copy uses "we never intentionally degrade our models" to mean something like "we never degrade one facet of our models unless it improves some other facet of our models" . This is a cop out, because it is what users suspected and complained about. What users want - regardless of whether it is realistic to expect - is for Anthropic to buy even more compute than Anthropic already does, so that the models remain equally smart even if the service demand increases.

View on HN · Topics

I'm aiming for intellectual honesty here. I'm not taking a side for a person or an org, but I'm taking a stand for a quality bar.

> They knew they had deliberately made their system worse

Define "they". The teams that made particular changes? In real-world organizations, not all relevant information flows to all the right places at the right time. Mistakes happen because these are complex systems.

Define "worse". There are lot of factors involved. With a given amount of capacity at a given time, some aspect of "quality" has to give. So "quality" is a judgment call. It is easy to use a non-charitable definition to "gotcha" someone. (Some concepts are inherently indefensible. Sometimes you just can't win. "Quality" is one of those things. As soon as I define quality one way, you can attack me by defining it another way. A particular version of this principle is explained in The Alignment Problem by Brian Christian, by the way, regarding predictive policing iirc.)

I'm seeing a lot of moral outrage but not enough intellectual curiosity. It embarrassingly easy to say "they should have done better" ... ok. Until someone demonstrates to me they understand the complexity of a nearly-billion dollar company rapidly scaling with new technology, growing faster than most people comprehend, I think ... they are just complaining and cooking up reasons so they are right in feeling that way. This possible truth: complex systems are hard to do well apparently doesn't scratch that itch for many people. So they reach for blame . This is not the way to learn. Blaming tends to cut off curiosity.

I suggest this instead: redirect if you can to "what makes these things so complicated?" and go learn about that. You'll be happier, smarter, and ... most importantly ... be building a habit that will serve you well in life. Take it from an old guy who is late to the game on this. I've bailed on companies because "I thought I knew better". :/

View on HN · Topics

Same here. I feel like all of these shenanigans could be because Anthropic are compute constrained, forcing then to take reckless risks around reducing it.

View on HN · Topics

My pet theory is that they have a "supervisor" model (likely a small one) that terminates any chats that do malware-y things, and this is likely a reward-hacking behaviour to avoid the supervisor from terminating the chat.

View on HN · Topics

None of these companies have compute to spare. It’s not in their interest to use more tokens that necessary.

View on HN · Topics

Not true - they absolutely want to goose demand as they continue to burn investor dollars and deploy infra at scale.

If that demand evens slows down in the slightest the whole bubble collapses.

Growth + Demand >> efficiency or $ spend at their current stage. Efficiency is a mature company/industry game.

View on HN · Topics

You don’t have to use compute to pad the token count.

View on HN · Topics

They need to keep up with demand, because compute resources are clearly limited. That means they have no choice but to add these features, or things break, or they have to stop taking new customers. All of those options are unacceptable.

View on HN · Topics

No. Pausing development does not make compute (you know, physical machines?) appear out of thin air.

View on HN · Topics

On the other hand, sacrificing your paying customers at the altar of compute and tokens does not make money appear out of thin air.

View on HN · Topics

It was odd that there was no mention of the forced adaptive reasoning in the article. My guess is they don't have enough compute to do anything else here.

View on HN · Topics

Compute is limited worldwide. No amount of money can make these compute platforms appear overnight. They are buying time because the only other option is to stop accepting customers.

View on HN · Topics

They would honestly have been better off refusing customers if compute is so limited. Degrading the quality leads to customers leaving in the short term, and ruins their long term reputation.

But in either case, if compute is so limited, they’ll have to compete with local coding agents. Qwen3.6-27B is good enough to beat having to wait until 5PM for your Claude Code limit to reset.

View on HN · Topics

To have some confidence in consistency of results (p-value), one has to start from cohort of around 30, if I remember correctly. This is 1.5 orders of magnitude increase of computing power needed to find (absence of) consistent changes of agent's behavior.

View on HN · Topics

Claude code is not infra, the model is the infra. They changed settings to make their models faster and probably cheaper to run too. Honestly with adaptive thinking it no longer matters what model it is if you can dynamically make it do less or more work.

View on HN · Topics

An interesting question to wonder is why these optimizations were pushed so aggressively in the first place. Especially given this is the time they were running a 2x promotion, by themselves, without presumably seeing any slowdown in demand.

View on HN · Topics

Apart from Anthropic nobody knows how much the average user costs them. However the consensus is "much more than that".

If they have to raise prices to stop hemorrhaging money, would you be willing to pay 1000 bucks a month for a max plan? Or 100$ per 1M pitput tokens (playing numberWang here, but the point stands).

If I have to guess they are trying to get balance sheet in order for an IPO and they basically have 3 ways of achieving that:

1. Raising prices like you said, but the user drop could be catastrophic for the IPO itself and so they won't do that

2. Dumb the models down (basically decreasing their cost per token)

3. Send less tokens (ie capping thinking budgets aggressively).

2 and 3 are palatable because, even if they annoying the technical crowd, investors still see a big number of active users with a positive margin for each.

View on HN · Topics

https://marginlab.ai/ (no affiliation)

There are a number of projects working on evals that can check how 'smart' a model is, but the methodology is tricky.

One would want to run the exact same prompt, every day, at different times of the day, but if the eval prompt(s) are complex, the frontier lab could have a 'meta-cognitive' layer that looks for repetitive prompts, and either:
a) feeds the model a pre-written output to give to the user
b) dumbs down output for that specific prompt

Both cases defeat the purpose in different ways, and make a consistent gauge difficult. And it would make sense for them to do that since you're 'wasting' compute compared to the new prompts others are writing.

View on HN · Topics

Interesting. All 3 seems like they’re obviously going to impact quality. e.g, reducing the effort from high to medium.

So then, there must have been an explicit internal guidance/policy that allowed this tradeoff to happen.

Did they fix just the bug or the deeper policy issue?

View on HN · Topics

> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode.

Translation: To reduce the load on our servers.

Summarizer