Compute Capacity Problems

Discussion of Anthropic being supply-constrained on compute, theories about inference being degraded to support training, comparison to OpenAI's early compute investments paying off strategically

Anthropic is currently grappling with a severe compute supply crunch that users suspect is forcing the company to aggressively quantize or "lobotomize" existing models like Opus to prioritize the training of frontier systems like Mythos. While some defend Anthropic’s disciplined focus, many now view OpenAI’s massive early compute investments as a strategic masterstroke that provides a critical advantage in uptime and usage limits. This shortage has fueled theories that Anthropic silently degrades performance for heavy users or shortens internal reasoning chains to manage costs, creating a precarious balancing act between maintaining model quality and scaling their infrastructure to meet surging demand.

View on HN · Topics

As far as I can tell, no one has asked this question publicly before, but I wonder if they're tinkering with quantizations in the background.

When I was exploring local inference, one of the early mistakes I made was using lower quantizations for models, and then assuming that the model sucked based on the results. It took me a while to learn that as the number of bits decreases, the fidelity of "simulation" decreases as well. But so does the total size in memory.
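A toy experiment shows the fidelity loss the comment describes. It rounds random, weight-like values to a symmetric integer grid with fewer and fewer bits, then measures the round-trip error. This is a minimal sketch that assumes plain per-tensor symmetric quantization on synthetic Gaussian values. Production schemes (per-group scales, calibration-based methods, mixed precision) are considerably more sophisticated, and the values here stand in for no particular model.

```python
import math
import random

def quantize_dequantize(values, bits):
    """Symmetric per-tensor uniform quantization to `bits` bits, then back to floats."""
    qmax = 2 ** (bits - 1) - 1              # e.g. 127 for 8-bit, 7 for 4-bit
    scale = max(abs(v) for v in values) / qmax
    if scale == 0:
        return list(values)                 # all zeros: nothing to quantize
    # Round each value to the nearest grid point, clamp to the grid, rescale.
    return [max(-qmax, min(qmax, round(v / scale))) * scale for v in values]

def rms_error(a, b):
    """Root-mean-square difference between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))

if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for a weight tensor (assumption: small Gaussian values).
    weights = [random.gauss(0.0, 0.02) for _ in range(10_000)]
    for bits in (8, 6, 4, 3, 2):
        err = rms_error(weights, quantize_dequantize(weights, bits))
        print(f"{bits}-bit  RMS round-trip error = {err:.2e}")
```

Each bit removed roughly doubles the grid spacing, so the error grows quickly at the low end.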

For example, for Kimi K2 (a trillion-parameter open-source model), the 4-bit version is around 580GB in size. The 8-bit version is 1.1TB (yes, TB: 1,090GB). And the full-fidelity 16-bit version is 2.05TB.

So essentially the 4-bit version is roughly (give or take how you're packaging it) one quarter the size of the 16-bit one. And significantly dumber.
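The sizes above follow from simple arithmetic: parameters × bits per weight ÷ 8 gives bytes. This minimal sketch assumes a round 1 trillion parameters and counts only raw weights. It ignores quantization scales, layers kept at higher precision, the KV cache and activations, which plausibly accounts for the gap between the raw figures and the 580GB / 1.1TB / 2.05TB checkpoints quoted.

```python
def weight_footprint_gb(n_params: float, bits_per_weight: float) -> float:
    """Raw weight storage in decimal gigabytes (1 GB = 1e9 bytes).

    Ignores quantization metadata (scales, zero-points), mixed-precision layers,
    KV cache and activations, so real checkpoints come out somewhat larger.
    """
    return n_params * bits_per_weight / 8 / 1e9

if __name__ == "__main__":
    n = 1.0e12  # assumption: a round 1 trillion parameters
    for bits in (16, 8, 4):
        print(f"{bits:>2}-bit: ~{weight_footprint_gb(n, bits):,.0f} GB")
```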

If you have limited resources and don't want to use six GB200s to run the model, but would rather make do with three (comparatively) much cheaper B200 units, a lower quantization lets you do that.
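A rough way to see the hardware trade-off is to compute the minimum number of accelerators whose combined memory can hold the weights. In this sketch the memory per device is a parameter you supply, and no specific GPU capacity is assumed. The `usable_fraction` headroom for KV cache and runtime overhead is an illustrative guess, not a measured value.

```python
import math

def devices_needed(model_gb: float, mem_per_device_gb: float,
                   usable_fraction: float = 0.9) -> int:
    """Lower bound on accelerator count: weights must fit in the usable memory.

    `usable_fraction` reserves headroom for KV cache, activations and runtime
    overhead; 0.9 is an illustrative assumption.
    """
    return math.ceil(model_gb / (mem_per_device_gb * usable_fraction))

if __name__ == "__main__":
    # Hypothetical device with 100 GB, no headroom, to show the scaling only.
    for label, size_gb in (("16-bit", 2000), ("8-bit", 1000), ("4-bit", 500)):
        print(label, devices_needed(size_gb, 100, 1.0))
```

Quartering the weight footprint cuts this floor roughly fourfold. Real deployments also shard for throughput, so they often use more devices than the memory-only minimum.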

I wonder if Opus' inconsistent performance is because - depending on the time of the day - Anthropic routes you to a differently quantized version?
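To make the speculation concrete, the following purely hypothetical sketch shows what load-dependent routing to differently quantized variants could look like. The tier names, thresholds and the mechanism itself are invented for illustration. Nothing in the thread shows that Anthropic does this.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Variant:
    name: str         # hypothetical deployment label
    bits: int         # weight precision served by this variant
    max_load: float   # serve this variant while fleet utilization <= max_load

# Hypothetical tiers, ordered from highest to lowest fidelity.
VARIANTS = [
    Variant("full", 16, 0.60),
    Variant("int8", 8, 0.85),
    Variant("int4", 4, 1.00),
]

def route(utilization: float) -> Variant:
    """Pick the highest-fidelity variant allowed at the current utilization (0.0 to 1.0)."""
    for v in VARIANTS:
        if utilization <= v.max_load:
            return v
    return VARIANTS[-1]  # overloaded: fall back to the cheapest variant
```

Under a policy like this, the same request would get different weights at 3 a.m. and at peak, which is the kind of inconsistency the comment is asking about.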

View on HN · Topics

I agree. Ever since the release of R1, it's like every single American AI company has realized that they actually do not want to show CoT, and then separately that they cannot actually run CoT models profitably. Ever since then, we've seen everyone implement a very bad dynamic-reasoning system that makes you feel like an ass for even daring to ask the model for more than 12 tokens of thought.

View on HN · Topics

They have super sustainable revenue. They are severely supply-constrained on compute, and have a really difficult balancing act over the next year or two: they have to trade off spending that limited compute on model training, so that they can stay ahead, against leaving enough of it available for customers that they can keep growing their customer base.

View on HN · Topics

But do they? When was the last time they declined your subscription because they have no compute?

View on HN · Topics

Just last week. They cut off openclaw. And they added a higher-priced fast mode. And today they announced new features that are not included with Max subscriptions.

They are short 5GW roughly and scrambling to add it.

View on HN · Topics

Most weekdays.

https://status.claude.com/

View on HN · Topics

I don’t think this is a plausible argument, as they’re generally capacity constrained, and everyone would like shorter (= faster) responses.

I’m fairly certain that in a few more releases we’ll have models with shorter CoT chains. Whether they’ll still let us see those is another question, as it seems like Anthropic wants to start hiding their CoT, potentially because it reveals some secret sauce.

View on HN · Topics

Funny because many people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered.

But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers and it seems to be working. I'm seeing a lot of goodwill for Codex and a ton of bad PR for CC.

It seems like 90% of Claude's recent problems are strictly related to a lack of compute.

View on HN · Topics

> people here were so confident that OpenAI is going to collapse because of how much compute they pre-ordered

That's not why. It was and is because they've been incredibly unfocused and have burnt through cash on ill-advised, expensive things like Sora. By comparison Anthropic have been very focused.

View on HN · Topics

I don't think that was the main reason for people thinking OpenAI is going to collapse here.

By far, the biggest argument was that OpenAI bet too much on compute.

Being unfocused is generally an easy fix. Just cut things that don't matter as much, which they seem to be doing.

View on HN · Topics

Nobody was talking about them betting too much on compute; people were saying that their shady compute deals with NVIDIA and Oracle were creating a giant bubble in an attempt to get a Too Big To Fail judgement (in their words, a taxpayer-backed "backstop").

View on HN · Topics

That’s just short-term talk. The main thesis behind their collapse is that they won’t be able to pay their compute bills because they won’t have enough demand to cover them.

View on HN · Topics

To me it seems like they burn so much money that they can do lots of things in parallel. My guess would be that e.g. Codex and Sora are developed very independently. After all, there's quite a hard limit on how many bodies are beneficial to a software project.

View on HN · Topics

They all compete internally over constrained compute resources - for R&D and production.

View on HN · Topics

Seems very short term. Like how cheap Uber was initially. Like Claude was before!

Eventually OpenAI will need to stop burning money.

View on HN · Topics

In hindsight, it is painfully clear that Anthropic’s conservative investment strategy has left them struggling to keep up with demand and caused their profit margin to shrink significantly, as the last buyer of compute.

View on HN · Topics

They've also introduced a lot of caching and token-burn-related bugs, which makes things worse. Any bug that multiplies the token burn also multiplies their infrastructure problems.

View on HN · Topics

That’s more a leadership decision because Anthropic are nerfing the model to cut costs, if they stop doing that then they’ll stay ahead.

View on HN · Topics

Most of the compute OpenAI "preordered" is vapour. And it has nothing to do with why people thought the company -- which is still in extremely rocky rapids -- was headed to bankruptcy.

Anthropic has been very disciplined and focused (overwhelmingly on coding, fwiw), while OpenAI has been bleeding money trying to be the everything AI company with no real specialty as everyone else beat them in random domains. If I had to characterize OpenAI's primary focus, it has been glazing users and making a generation of malignant narcissists.

But yes, Anthropic has been growing by leaps and bounds and has capacity issues. That's a very healthy position to be in, despite the fact that it yields the inevitable foot-stomping "I'm moving to competitor!" posts constantly.

View on HN · Topics

Possibly due to moving compute from inference to training

View on HN · Topics

My purely unfounded, gut reaction to Opus 4.7 being released today was "Oh, that explains the recent 4.6 performance - they were spinning up inference on 4.7."

Of course, I have no information on how they manage the deployment of their models across their infra.

View on HN · Topics

Before Opus was released, we also saw a huge backlash over it being dumber.

Perhaps they need the compute for the training

View on HN · Topics

Working on some research projects to test Opus 4.7.

The first thing I notice is that it never dives straight into research after the first prompt. It insists on asking follow-up questions. "I'd love to dive into researching this for you. Before I start..." The questions are usually silly, like, "What's your angle on this analysis?" It asks some form of this question as the first follow-up every time.

The second observation is "Adaptive thinking" replaces "Extended thinking" that I had with Opus 4.6. I turned this off, but I wish I had some confidence that the model was working as hard as possible (I don't want it to mysteriously limit its thinking capabilities based on what it assumes requires less thought. I'd rather control the thinking level). I always ran research prompts with extended thinking enabled on Opus 4.6, and it gave me confidence that it was taking time to get the details right.

The third observation is it'll sit in a silent state of "Creating my research plan" for several minutes without starting to burn tokens. At first I thought this was because I had 2 tabs running a research prompt at the same time, but it later happened again when nothing else was running beside it. Perhaps this is due to high demand from several people trying to test the new model.

And fourth, the research output is significantly shorter and less detailed than Opus 4.6. Where before I would get several pages of research findings, now I get a short 2-3 pager.

Overall, I feel a bit confused. It doesn't seem better than 4.6, and from a research standpoint it might be worse. It seems like it got several different "features" that I'm supposed to learn now.

View on HN · Topics

They don't have enough compute for all their customers.

OpenAI bet on more compute early on which prompted people to say they're going to go bankrupt and collapse. But now it seems like it's a major strategic advantage. They're 2x'ing usage limits on Codex plans to steal CC customers and it seems to be working.

It seems like 90% of Claude's recent problems are strictly related to a lack of compute.

View on HN · Topics

Is that why Anthropic recently gave out free credits for use in off-hours? Possibly an attempt to more evenly distribute their compute load throughout the day?

View on HN · Topics

I suspect they get cheap off-peak electricity, and compute is cheaper at those times.

View on HN · Topics

That's not really how datacenter power works. It's usually a bulk buy, billed at 95th-percentile usage.
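The comment doesn't spell out the contract terms. This sketch assumes the same 95th-percentile ("burstable") scheme commonly used for bandwidth billing, applied here to per-interval power samples: sort the samples, discard the top 5%, and bill the whole period at the highest remaining sample.

```python
def billable_rate_95th(samples):
    """Burstable (95th-percentile) billing: sort the per-interval samples, discard
    the top 5%, and bill the whole period at the highest remaining sample."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    n = len(ordered)
    idx = (95 * n + 99) // 100 - 1  # ceil(0.95 * n) - 1, in integer arithmetic
    return ordered[idx]

if __name__ == "__main__":
    busy = [10] * 95 + [100] * 5             # short spikes fall in the discarded 5%
    quiet = [5] * 50 + [10] * 45 + [100] * 5  # lower off-peak usage, same near-peak level
    print(billable_rate_95th(busy), billable_rate_95th(quiet))
```

Under this model the price doesn't vary by time of day; the bill is set by the near-peak level. Filling off-peak troughs raises utilization without raising the bill, which is consistent with the "GPUs are idle at low volume" explanation.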

View on HN · Topics

I think it's a lot simpler than that. At peak, GPUs are all running hot. During low volume, they aren't.

View on HN · Topics

Hard for me to reconcile the idea that they don't have enough compute with the idea that they are also losing money to subsidies.

View on HN · Topics

They're saying Anthropic doesn't have enough compute, not OpenAI. They said OpenAI specifically invested early in compute at a loss.

View on HN · Topics

Model inference compute over model lifetime is ~10x of model training compute now for major providers. Expected to climb as demand for AI inference rises.
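The ~10x figure is the commenter's. This sketch only shows what such a ratio implies under common dense-transformer rules of thumb: roughly 6·N·D FLOPs to train N parameters on D tokens, and roughly 2·N FLOPs per token served. N cancels, so lifetime inference equals `ratio` × training compute once about 3 × ratio × D tokens have been served. These approximations ignore mixture-of-experts sparsity, long-context attention cost and utilization differences between training and serving.

```python
def training_flops(n_params: float, train_tokens: float) -> float:
    # Rule of thumb for dense transformers: ~6 FLOPs per parameter per training token.
    return 6 * n_params * train_tokens

def inference_flops(n_params: float, served_tokens: float) -> float:
    # Rule of thumb: ~2 FLOPs per parameter per token processed (forward pass only).
    return 2 * n_params * served_tokens

def served_tokens_for_ratio(train_tokens: float, ratio: float) -> float:
    """Tokens served at which lifetime inference compute = ratio x training compute.

    From 2*N*T = ratio * 6*N*D, N cancels, giving T = 3 * ratio * D.
    """
    return 3 * ratio * train_tokens

if __name__ == "__main__":
    d = 1e13  # hypothetical training-set size in tokens
    print(f"{served_tokens_for_ratio(d, 10):.1e} served tokens for a 10x ratio")
```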

View on HN · Topics

For sure, and growth also costs money: buying DCs, etc.

View on HN · Topics

It's a hard game to play anyway.

Anthropic's revenue is increasing very fast.

OpenAI, though, made crazy claims; after all, it's responsible for the memory prices.

In parallel, Anthropic announced a partnership with Google and Broadcom for gigawatts of TPU chips, while also announcing its own $50 billion investment in compute.

OpenAI always believed in compute, though, and I'm pretty sure plenty of people want to see what models at 10x, 100x or 1000x scale can do.

View on HN · Topics

You state your hypothesis quite confidently.
Can you tell me how taking down authentication many times is related to GPU capacity?

View on HN · Topics

Usually they're hemorrhaging performance while training.

From that, it's pretty likely they were training Mythos for the last few weeks and then distilling it into Opus 4.7.

Pure speculation, of course, but it would also explain the sudden performance gains for Mythos, and why they're not releasing it to the general public (because it's the undistilled version, which is too expensive to run).

View on HN · Topics

Mythos is speculated to have 10 trillion parameters. Almost certainly they were training it for months.

View on HN · Topics

Exactly. God, it wouldn't be such a problem if they didn't gaslight you and act like it was nothing. Just put up a banner that says Claude is experiencing overloaded capacity right now, so your responses might be whatever.

View on HN · Topics

What a waste of tokens. No wonder Anthropic can't serve their customers. It's not just a lack of compute, it's a ridiculous waste of the limited compute they have. I think (hope?) we look back at the insanity of all this theatre, the same way we do about GPT-2 [1].

1. https://techcrunch.com/2019/02/17/openai-text-generator-dang...

View on HN · Topics

It wouldn't be so irritating if thinking didn't start to take a lot longer for tasks of similar complexity (or maybe it's taking longer to even start to think behind the scenes due to queueing).

View on HN · Topics

They don't have the compute to make Mythos generally available: that's all there is to it. The exclusivity is also nice from a marketing pov.

View on HN · Topics

They don't have demand for the price it would require for inference.

They are definitely distilling it into a much smaller model that's ~98% as good, like everybody does.
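For readers unfamiliar with the term, here is a minimal sketch of the textbook knowledge-distillation objective (Hinton et al.'s soft-target formulation) for a single token position. The student is trained to match the teacher's temperature-softened output distribution. The thread gives no details on how, or whether, Anthropic distills its models, and large-scale LLM distillation setups vary. The logits below are made-up numbers.

```python
import math

def softmax(logits, temperature=1.0):
    """Temperature-scaled softmax; higher temperature gives a softer distribution."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def distillation_loss(teacher_logits, student_logits, temperature=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2 so gradient
    magnitudes stay comparable across temperatures."""
    p = softmax(teacher_logits, temperature)
    q = softmax(student_logits, temperature)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return kl * temperature ** 2

if __name__ == "__main__":
    teacher = [3.0, 1.0, 0.0]  # hypothetical teacher logits over a 3-token vocabulary
    print(distillation_loss(teacher, [0.0, 1.0, 3.0]))  # student disagrees: large loss
    print(distillation_loss(teacher, [2.5, 1.0, 0.0]))  # student close: small loss
```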

View on HN · Topics

Some people are speculating that Opus 4.7 is distilled from Mythos due to the new tokenizer (it means Opus 4.7 is a new base model, not just an improved Opus 4.6)

View on HN · Topics

Yes, I was thinking that. But it could as well be the other way around. Using the pretrained 4.7 (1T?) to speed up ~70% Mythos (10T?) pretraining.

It's just speculative decoding, but for training. If they did it at this scale, it's quite an achievement, because training is very fragile when doing these kinds of tricks.
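For reference, this is the inference-time technique the analogy points to, in its simplest greedy form. A cheap draft model proposes k tokens, the target model checks them, and the longest agreeing prefix is kept plus one token from the target. The "models" here are toy functions from context to next token, assumed for illustration. The training-time variant the comment describes is not shown.

```python
from typing import Callable, List

Model = Callable[[List[int]], int]  # toy "model": context -> greedy next token

def speculative_decode(target: Model, draft: Model, prompt: List[int],
                       max_new: int, k: int = 4) -> List[int]:
    """Greedy speculative decoding. Output matches greedy decoding with `target` alone."""
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The draft proposes k tokens autoregressively.
        proposal, ctx = [], list(out)
        for _ in range(k):
            tok = draft(ctx)
            proposal.append(tok)
            ctx.append(tok)
        # 2. The target verifies each position (one batched pass on real hardware).
        accepted: List[int] = []
        for tok in proposal:
            expected = target(out + accepted)
            accepted.append(expected)  # equals tok on a match, the correction otherwise
            if tok != expected:
                break  # first mismatch ends this round
        else:
            accepted.append(target(out + accepted))  # bonus token when all k match
        out.extend(accepted)
    return out[: len(prompt) + max_new]
```

The speedup comes from the target scoring all k drafted positions in one pass instead of k sequential passes. Correctness never depends on the draft being right.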

View on HN · Topics

Reverse distillation. Using small models to bootstrap large models. Get richer signal early in the run when gradients are hectic, get the large model past the early training instability hell. Mad but it does work somewhat.

Not really similar to speculative decoding?

I don't think that's what they've done here though. It's still black magic, I'm not sure if any lab does it for frontier runs, let alone 10T scale runs.

View on HN · Topics

> They don't have demand for the price it would require for inference.

Citation needed. I find it hard to believe; I think there are more than enough people willing to spend $100/Mtok for frontier capabilities to dedicate a couple of racks or aisles.

View on HN · Topics

My guess is that it is just too expensive to make generally available. Sounds similar to GPT-4.5, which was too expensive to be practical.

View on HN · Topics

Because it was good until January 2026, then it deteriorated into an "Opus 3.1". Probably it was given much smaller context windows or less RAM.

View on HN · Topics

I've seen a similar psychological phenomenon where people like something a lot, and then they get unreasonably angry and vocal about changes to that thing.

Usage limits are necessary but I guess people expect more subsidized inference than the company can afford. So they make very angry comments online.

For example, there is no evidence that 4.6 ever degraded in quality: https://marginlab.ai/trackers/claude-code-historical-perform...

View on HN · Topics

> Usage limits are necessary but I guess people expect more subsidized inference than the company can afford. So they make very angry comments online

This is reductive. You're both calling people unreasonably angry but then acknowledging there's a limit in compute that is a practical reality for Anthropic. This isn't that hard. They have two choices, rate limit, or silently degrade to save compute.

I have never hit a rate limit, but I have seen it get noticeably stupider. It doesn't make me angry, but comments like these are a bit annoying to read, because you are trying to make people sound delusional while, at the same time, confirming everything they're saying.

I don't think they have turned a big knob that makes it stupider for everyone. I think they can see when a user is overtapping their $20 plan and silently degrade them. Because there's no alert for that. Which is why AI benchmark sites are irrelevant.

View on HN · Topics

If Claude AI is so good at coding, why can't Anthropic use it to improve Claude's uptime and fix the constant token quota issues?

View on HN · Topics

Because they just don’t have enough capacity to serve their demand?

View on HN · Topics

It's interesting to see Opus 4.7 follow so soon after the announcement of Mythos, especially given that Anthropic are apparently capacity constrained.

Capacity is shared between model training (pre & post) and inference, so it's hard to see Anthropic deciding that it made sense, while capacity constrained, to train two frontier models at the same time...

I'm guessing that this means that Mythos is not a whole new model separate from Opus 4.6 and 4.7, but is rather based on one of these with additional RL post-training for hacking (security vulnerability exploitation).

The alternative would be that perhaps Mythos is based on an early snapshot of their next major base model, and then presumably that Opus 4.7 is just Opus 4.6 with some additional post-training (as may anyways be the case).

View on HN · Topics

So this is the norm: quantized version of the SOTA model is previous model. Full model becomes latest model. Rinse and repeat.

View on HN · Topics

> So my question is: what is the actual reason Anthropic lobotomizes the model when the new one is about to be dropped?

You can only fit one version of a model in VRAM at a time. When you have a fixed compute capacity for staging and production, you can put all of that towards production most of the time. When you need to deploy to staging to run all the benchmarks and make sure everything works before deploying to prod, you have to take some machines off the prod stack and onto the staging stack, but since you haven't yet deployed the new model to prod, all your users are now flooding that smaller prod stack.

So what everyone assumes is that they keep the same throughput with less compute by aggressively quantizing or other optimizations. When that isn't enough, you start getting first longer delays, then sporadic 500 errors, and then downtime.
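The capacity squeeze described above reduces to a ratio. If the full fleet was just keeping up with traffic, each machine left in production must handle total ÷ remaining times its previous load. This minimal sketch assumes traffic stays constant and load spreads evenly. Whether quantization or other optimizations actually deliver that multiplier is hardware- and workload-dependent.

```python
def required_speedup(total_machines: int, staging_machines: int) -> float:
    """Per-machine throughput multiplier the remaining prod machines need to keep
    serving the same traffic after some machines move to staging.

    Assumes the whole fleet was saturated and load is spread evenly.
    """
    prod = total_machines - staging_machines
    if prod <= 0:
        raise ValueError("no production machines left")
    return total_machines / prod

if __name__ == "__main__":
    for staged in (10, 20, 50):
        print(f"{staged}% moved to staging -> {required_speedup(100, staged):.2f}x needed")
```

When the required multiplier exceeds what the optimizations provide, the excess shows up as the queueing delays, errors and downtime the comment lists.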

View on HN · Topics

So if I understand it right: to free up VRAM for a new model, a model string in the API like `opus-4.6-YYYYMMDD` is not actually an identifier of the exact weights being served, but more like the ID of a group of weights, ranging from heavily quantized to the real deal, that all cost me the same?

How is this even legal?

View on HN · Topics

If Opus 4.7 or Mythos are so good, how come Claude has some of the worst uptime of any online service?

View on HN · Topics

Looks completely broken on AWS Bedrock

"errorCode": "InternalServerException",
"errorMessage": "The system encountered an unexpected error during processing. Try your request again.",

View on HN · Topics

Uh oh:

> The new /ultrareview slash command produces a dedicated review session that reads through changes and flags bugs and design issues that a careful reviewer would catch. We’re giving Pro and Max Claude Code users three free ultrareviews to try it out.

More monetization a tier above Max subscriptions. I just pointed openclaw at Codex after a daily Opus bill of $250.

As Anthropic keeps pushing the pricing envelope wider it makes room for differentiation, which is good. But I wish OpenAI would get a capable agentic model out the door that pushes back on pricing.

P.S. I know that Anthropic underbought compute, so we are facing at least a year of this differentiated pricing from them, but still... ouch.

View on HN · Topics

Well this explains the outages over the last few days

View on HN · Topics

Here’s the problem. The distribution of query difficulty / task complexity is probably heavily right-skewed, which drives up the average cost dramatically. The logical thing for Anthropic to do, in order to keep costs under control, is to throttle high-cost queries. Claude can only approximate the true token cost of a given query prior to execution. That means anything near the top percentile will need to get throttled as well.

By definition this means that you’re going to get subpar results for difficult queries. Anything too complicated will get a lightweight model response to save on capacity. Or an outright refusal which is also becoming more common.

New models are meaningless in this context because by definition the most impressive examples from the marketing material will not be consistently reproducible by users. The more users who try to get these fantastically complex outputs the more those outputs get throttled.
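The skew argument can be illustrated numerically. This sketch assumes per-request token cost is log-normally distributed, with parameters picked arbitrarily since the thread gives no data. It also treats the cost estimate as exact, whereas the comment notes that real estimates are only approximate. It computes a 95th-percentile throttle cutoff and the share of total cost coming from requests above it.

```python
import random
import statistics

def throttle_cutoff(estimated_costs, pct=95):
    """Estimated cost at the pct-th percentile; requests above it would be throttled."""
    # statistics.quantiles(..., n=100) returns the 99 percentile cut points.
    return statistics.quantiles(estimated_costs, n=100)[pct - 1]

def share_above(costs, cutoff):
    """Fraction of total cost contributed by requests above `cutoff`."""
    return sum(c for c in costs if c > cutoff) / sum(costs)

if __name__ == "__main__":
    random.seed(0)
    # Assumption: right-skewed (log-normal) per-request token costs.
    costs = [random.lognormvariate(8.0, 1.5) for _ in range(100_000)]
    cutoff = throttle_cutoff(costs)
    print(f"median {statistics.median(costs):,.0f}  mean {statistics.mean(costs):,.0f}")
    print(f"requests above the cutoff account for {share_above(costs, cutoff):.0%} of cost")
```

With a heavy right tail, the mean sits well above the median, and a small slice of requests carries a large share of the cost. That is why throttling the tail is the cheapest lever, and also why the hardest queries are the ones that suffer.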

View on HN · Topics

Tbf I don't think that it's just this one reason. While I'm not a subscriber to any LLM provider, the general feeling I get from reading comments online is that the models have a long history of getting worse over time. Of course, we don't know why, but presumably they're quantizing models or downgrading you to a weaker model transparently.

Now as for why, I imagine that it's just money. Anthropic presumably just got done training Mythos and Opus 4.7. That must have cost a lot of cash. They have a lot of subscribers and users, but not enough hardware.

What's a little further tweaking of the model when you've already had to dumb it down due to constraints.
