Cost vs Quality Tradeoffs

Debate about Anthropic prioritizing cost reduction over quality, with arguments they should raise prices instead of degrading service, and suspicions that explanations are covering for cost-cutting

Frustrated users are increasingly accusing Anthropic of "silently" degrading model quality to prioritize fiscal prudence and cost-cutting over performance, leading many to mockingly relabel the service as a "lazy" version of its former self. While some defenders argue that complex optimizations like KV caching and adaptive reasoning are necessary for scaling a billion-dollar company toward an IPO, critics view these moves as deceptive "rug-pulls" that sacrifice intelligence for server efficiency. A vocal segment of the community suggests that the current flat-rate subscription model is fundamentally broken, expressing a surprising willingness to pay significantly higher prices—potentially hundreds or even thousands of dollars—for a guaranteed, un-nerfed version of the model. Ultimately, the debate highlights a deep rift between users who demand predictable high-IQ performance and a company struggling to balance the astronomical costs of frontier AI with sustainable growth.

View on HN · Topics

I appreciate the reply, but I was never under the impression that gaps in conversations would increase costs nor reduce quality. Both are surprising and disappointing.

I feel like that is a choice best left up to users.

i.e. "Resuming this conversation with full context will consume X% of your 5-hour usage bucket, but that can be reduced by Y% by dropping old thinking logs"

View on HN · Topics

Another way to think about it might be that caching is part of Anthropic's strategy to reduce costs for its users, but they are now trying to be more mindful of their costs (probably partly due to significant recent user growth as well as plans to IPO which demand fiscal prudence).

Perhaps if we were willing to pay more for our subscriptions Anthropic would be able to have longer cache windows but IDK one hour seems like a reasonable amount of time given the context and is a limitation I'm happy to work around (it's not that hard to work around) to pay just $100 or $200 a month for the industry-leading LLM.

Full disclosure: I've recently signed up for ChatGPT Pro as well in addition to my Claude Max sub so not really biased one way or the other. I just want a quality LLM that's affordable.

View on HN · Topics

I might be willing to pay more, maybe a lot more, for a higher subscription than claude max 20x, but the only thing higher is pay per token and i really dont like products that make me have to be that minutely aware of my usage, especially when it has unpredictability to it. I think there's a reason most telecoms went away from per minute or especially per MB charging. Even per GB, as they often now offer X GB, and im ok with that on phone but much less so on computer because of the unpredictability of a software update size.

Kinda like when restaurants make me pay for ketchup or a takeaway box, i get annoyed, just increase the compiled price.

View on HN · Topics

Because it significantly increases actual costs for Anthropic.

If they ignored this then all users who don’t do this much would have to subsidize the people who do.

View on HN · Topics

Nit: It doesn’t have to live in GPU memory. The system will use multiple levels of caching and will evict older cached data to CPU RAM or to disk if a request hasn’t recently come in that used that prefix. The problem is, the KV caches are huge (many GB) and so moving them back onto the GPU is expensive: GPU memory bandwidth is the main resource constraint in inference. It’s also slow.

The larger point stands: the cache is expensive. It still saves you money but Anthropic must charge for it.

Edit: there are a lot of comments here where people don't understand LLM prefix caching, aka the KV cache. That's understandable: it is a complex topic and the usual intuitions about caching you might have from e.g. web development don't apply: a single cache blob for a single request is in the 10s of GB at least for a big model, and a lot of the key details turn on the problems of moving it in and out of GPU memory. The contents of the cache are internal model state; it's not your context or prompt or anything like that. Furthermore, this isn't some Anthropic-specific thing; all LLM inference with a stable context prefix will use it because it makes inference faster and cheaper. If you want to read up on this subject, be careful as a lot of blogs will tell you about the KV cache as it is used within inference for a single request (a critical concept in how LLMs work) but they will gloss over how the KV cache is persisted between requests, which is what we're all talking about here. I would recommend Philip Kiely's new book Inference Engineering for a detailed discussion of that stuff, including the multiple caching levels.
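
To make the "10s of GB" figure concrete, here is a rough back-of-the-envelope sketch of how a KV cache scales with context length in a standard transformer. The model dimensions below are hypothetical placeholders, not the configuration of any Anthropic model (those are not public), and the formula ignores optimizations such as cache quantization or paging.

```python
def kv_cache_bytes(num_layers: int, num_kv_heads: int, head_dim: int,
                   seq_len: int, bytes_per_value: int = 2) -> int:
    """Approximate KV cache size for one sequence.

    The leading factor of 2 accounts for storing one key vector and one
    value vector per token, per KV head, per layer.
    """
    return 2 * num_layers * num_kv_heads * head_dim * seq_len * bytes_per_value


# Hypothetical large model: 80 layers, 8 KV heads (grouped-query attention),
# head dimension 128, 16-bit values, 100k-token conversation.
size = kv_cache_bytes(num_layers=80, num_kv_heads=8, head_dim=128, seq_len=100_000)
print(f"{size / 1e9:.1f} GB")  # 32.8 GB
```

Under these assumed dimensions, each token of context costs about 328 KB of cache, which is why a long session is expensive to keep resident and expensive to move between memory tiers.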

View on HN · Topics

That might be an absurd comparison, but we can fix that.

If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs, then:

You wouldn’t “need” to understand. The prints would complete regardless. But you might want to. Personal preference.

Which is true of this issue too.

View on HN · Topics

Okay, sure. There's a dollar/intelligence tradeoff. Let me decide to make it, don't silently make Claude dumber because I forgot about a terminal tab for an hour. Just because a project isn't urgent doesn't mean it's not important. If I thought it didn't need intelligence I would use Sonnet or Haiku.

View on HN · Topics

A strange view. The trade-off has nothing to do with a specific ideology or notable selfishness. It is an intrinsic limitation of the algorithms, which anybody could reasonably learn about.

Sure, the exact choice on the trade-off, changing that choice, and having a pretty product-breaking bug as a result, are much more opaque. But I was responding to somebody who was surprised there's any trade-off at all. Computers don't give you infinite resources, whether or not they're "servers," "in the cloud," or "AI."

View on HN · Topics

These controversies erupt regularly, and I hope that you will see a common thing with most of them: you make a decision for your users without informing them.

Please fight this hubris. Your users matter. Many of us use your tools for everyday work and do not appreciate having the rug pulled from under them on a regular basis, much less so in an underhanded and undisclosed way.

I don't mind the bugs, these will happen. What I do not appreciate is secretly changing things that are likely to decrease performance.

View on HN · Topics

That is not what I wrote. The phrases "without informing them", "in an underhanded and undisclosed way" and "secretly changing things" were important. I'm all for product evolution, but users should be informed when the product is changed, especially when the change can be for the worse (like dumbing down the model).

View on HN · Topics

>and doing that will cause a huge one-time hit against your token limit if the session has grown large.

Anthropic already profited from generating those tokens. They can afford to subsidize reloading context.

View on HN · Topics

Compaction wont save you, in fact calling compaction will eat about 3-5x the cold cache cost in usage ive found.

View on HN · Topics

I saw that too, but that's actually even worse on cache - the entire conversation is then a cache miss and needs to be loaded in in order to do the compaction. Then the resulting compacted conversation is also a cache miss.

You ideally want to compact before the conversation is evicted from cache. If you knew you were going to use the conversation again later after cache expiry, you might do this deliberately before leaving a session.

Anthropic could do this automatically before cache expiry, though it would be hard to get right - they'd be wasting a lot of compute compacting conversations that were never going to be resumed anyway.
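
The timing argument above can be sketched as a toy cost model. The rate multipliers are illustrative assumptions (cached input much cheaper than uncached input, output more expensive than input); they are not Anthropic's actual pricing, nor a description of how subscription usage buckets are metered. All function and field names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rates:
    # Relative cost per token, in units of one uncached input token (assumed values).
    uncached_input: float = 1.0
    cached_input: float = 0.1
    output: float = 5.0


def resume_cold(context_tokens: int, rates: Rates = Rates()) -> float:
    """Cost of resuming the full conversation after the cache has expired."""
    return context_tokens * rates.uncached_input


def compact_then_resume(context_tokens: int, summary_tokens: int,
                        cache_warm: bool, rates: Rates = Rates()) -> float:
    """Cost of compacting the conversation, then resuming from the summary.

    Compaction reads the whole context (cheaply if the cache is still warm),
    writes a summary, and the later resume is a cache miss on that summary.
    """
    read_rate = rates.cached_input if cache_warm else rates.uncached_input
    compaction = context_tokens * read_rate + summary_tokens * rates.output
    resume = summary_tokens * rates.uncached_input
    return compaction + resume


ctx, summary = 200_000, 10_000
print("resume cold, no compaction:", resume_cold(ctx))
print("compact while cache is warm:", compact_then_resume(ctx, summary, cache_warm=True))
print("compact after cache expiry:", compact_then_resume(ctx, summary, cache_warm=False))
```

With these assumed rates, compacting before expiry is the cheapest path and compacting after expiry is the most expensive, which matches the ordering described in the comment. Different rates or summary sizes can change the gap, so treat this as a way to reason about the trade-off rather than a measurement.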

View on HN · Topics

Right, and reloading that context is the same cost as refilling the cache, so really, they're charging the same, and making it hard.

View on HN · Topics

This points to a fairly fundamental mismatch between the realities of running an LLM and the expectations of users. As a user, I _expect_ the cost of resuming X hours/days later to be no different to resuming seconds or minutes later. The fact that there is a difference, means it's now being compensated for in fairly awkward ways -- none of the solutions seem good, just varying degrees of bad.

Is there a more fundamental issue of trying to tie something with such nuanced costs to an interaction model which has decades of prior expectation of every message essentially being free?

View on HN · Topics

> As a user, I _expect_ the cost of resuming X hours/days later to be no different to resuming seconds or minutes later.

As an informed user who understands his tools, I of course expect large uncached conversations to massively eat into my token budget, since that's how all of the big LLM providers work. I also understand these providers are businesses trying to make money and they aren't going to hold every conversation in their caches indefinitely.

View on HN · Topics

Don't forget "our investigation concluded you are to blame for using the product exactly as advertised" https://x.com/lydiahallie/status/2039800718371307603 including gems like "Sonnet 4.6 is the better default on Pro. Opus burns roughly twice as fast. Switch at session start"

View on HN · Topics

So if they fuck it up again and now they have, let’s say, “db problems” instead of “caching problems”, you would happily simply pay more? Wtf

View on HN · Topics

So is it for latency or is it for cost?

Why did you lie 11 days ago, 3 days after the fix went in, about the cause of excess token usage?

View on HN · Topics

Boris, wait, wait, wait,

Why not use tiered cache?

Obviously storage is waaay cheaper than recalculation of embeddings all the way from the very beginning of the session.

No matter how to put this explanation — it still sounds strange. Hell — you can even store the cache on the client if you must.

Please, tell me I’m not understanding what is going on..

otherwise you really need to hire someone to look at this!)

View on HN · Topics

Yes — encryption is the solution for client side caching.

But even if it’s not — I can’t build a scenario in my head where recalculating it on real GPUs is cheaper/faster than retrieving it from some kind of slower cache tier

View on HN · Topics

Isn't that exactly what people had been accusing Anthropic of doing, silently making Claude dumber on purpose to cut costs? There should be, at minimum, a warning on the UI saying that parts of the context were removed due to inactivity.

View on HN · Topics

It is too surprising. Time passed should not matter for using AI.

Either swallow the cost or be transparent to the user and offer both options each time.

View on HN · Topics

For idle sessions I would MUCH rather pay the cost in tokens than reduced quality. Frankly, it's shocking to me that you would make that trade-off for users without their knowledge or consent.

View on HN · Topics

as a variation:

how does this help me as a customer? if i have to redo the context from scratch, i will pay both the high token cost again, but also pay my own time to fill it.

the cost of reloading the window didnt go away, it just went up even more

View on HN · Topics

It astounds me that a company valued in the hundreds-of-billions-of-dollars has written this. One of the following must be true:

1. They actually believed latency reduction was worth compromising output quality for sessions that have already been long idle. Moreover, they thought doing so was better than showing a loading indicator or some other means of communicating to the user that context is being loaded.

2. What I suspect actually happened: they wanted to cost-reduce idle sessions to the bare minimum, and "latency" is a convenient-enough excuse to pass muster in a blog post explaining a resulting bug.

View on HN · Topics

It’s certainly #2. They have shown over dozens of decisions they move very quickly, break stuff, then have to both figure out what broke and how to explain it.

View on HN · Topics

It’s definitely a cost / resource saving strategy on their end.

View on HN · Topics

It's very weird that they frame caching as "latency reduction" when it comes to a cloud service. I mean, yes, technically it reduces latency, but more importantly it reduces cost. Sometimes it's more than 80% of the total cost.

I'm sure most companies and customers will consider compromising quality for 80% cost reduction. If they're just honest, they'll be fine.

View on HN · Topics

It's also a bit of a fishy explanation for purging tokens older than an hour. This happens to also be their cache limit. I doubt it is incidental that this change would also dramatically drop their cost.

View on HN · Topics

yes. if instagram started performing intensive JPEG compression that made photos choppy and unpleasant, I would consider that an intentional degradation of the software.

View on HN · Topics

Is default enabling JPEG compression to your software's output because the compression saves you money “intentional degradation” of the software?

I would say it is, and I'd be loath to use anything made by people who'd couch that change to defaults as "providing a selectable option to use a faster, cheaper version".

Yuck.

View on HN · Topics

I'd rather not speak too poorly of Anthropic, because - to the extent I can bring myself to like a tech company - I like Anthropic.

That said, the copy uses "we never intentionally degrade our models" to mean something like "we never degrade one facet of our models unless it improves some other facet of our models". This is a cop-out, because it is what users suspected and complained about. What users want - regardless of whether it is realistic to expect - is for Anthropic to buy even more compute than Anthropic already does, so that the models remain equally smart even if the service demand increases.

View on HN · Topics

I'm aiming for intellectual honesty here. I'm not taking a side for a person or an org, but I'm taking a stand for a quality bar.

> They knew they had deliberately made their system worse

Define "they". The teams that made particular changes? In real-world organizations, not all relevant information flows to all the right places at the right time. Mistakes happen because these are complex systems.

Define "worse". There are a lot of factors involved. With a given amount of capacity at a given time, some aspect of "quality" has to give. So "quality" is a judgment call. It is easy to use a non-charitable definition to "gotcha" someone. (Some concepts are inherently indefensible. Sometimes you just can't win. "Quality" is one of those things. As soon as I define quality one way, you can attack me by defining it another way. A particular version of this principle is explained in The Alignment Problem by Brian Christian, by the way, regarding predictive policing iirc.)

I'm seeing a lot of moral outrage but not enough intellectual curiosity. It is embarrassingly easy to say "they should have done better" ... ok. Until someone demonstrates to me they understand the complexity of a nearly-billion dollar company rapidly scaling with new technology, growing faster than most people comprehend, I think ... they are just complaining and cooking up reasons so they are right in feeling that way. This possible truth, that complex systems are hard to do well, apparently doesn't scratch that itch for many people. So they reach for blame. This is not the way to learn. Blaming tends to cut off curiosity.

I suggest this instead: redirect if you can to "what makes these things so complicated?" and go learn about that. You'll be happier, smarter, and ... most importantly ... be building a habit that will serve you well in life. Take it from an old guy who is late to the game on this. I've bailed on companies because "I thought I knew better". :/

View on HN · Topics

I noticed the difference, but coming from Gemini and xAI models it wasn’t that glaring. I still find that Opus makes much better plans than anything else I’ve tried, and it’s been very good at catching my mistakes in using public-key cryptography, also finding out why my crsqlite queries were failing despite no official documentation on the topic.

I’d never use such an expensive model for coding, so that might explain why I have little to complain about.

View on HN · Topics

I went back to 4.5. No regrets and it’s a bit cheaper.

View on HN · Topics

Sure it is. They're well aware their product is a money furnace and they'd have to charge users a few orders of magnitude more just to break even, which is obviously not an option. So all that's left is.. convince users to burn tokens harder, so graphs go up, so they can bamboozle more investors into keeping the ship afloat for a bit longer.

View on HN · Topics

Not true - they absolutely want to goose demand as they continue to burn investor dollars and deploy infra at scale.

If that demand even slows down in the slightest the whole bubble collapses.

Growth + Demand >> efficiency or $ spend at their current stage. Efficiency is a mature company/industry game.

View on HN · Topics

Are you saying these companies don't want to sell more product to us? Because that's the logical extension of your argument.

View on HN · Topics

No, the argument is they want to sell more product to more people, not just more product (to the same people). Given that a lot of their income is from flat-rate subscriptions, they make money with more people burning tokens rather than just burning more tokens.

After all, "the first hit's free" model doesn't apply to repeat customers ;-)

View on HN · Topics

All the labs are in a cut throat race, with zero customer loyalty. As if they would intentionally degrade quality/speed for a petty cash grab.

View on HN · Topics

I think they are routing to cheaper models that present themselves as e.g. Opus. I add to prompts now stuff to ensure that I am not dealing with an impostor. If it answers incorrectly, I terminate the session and start again. Anthropic should be audited for this.

View on HN · Topics

That implies it's broken. Juicing revenue and slashing opex at the expense of brand and customer retention is the feature.

View on HN · Topics

A suggestion to Anthropic, just start charging the real price for your software. Of course you have to dumb it down, when the $200 tier in reality produces 5-10 thousand dollars in monthly costs when used by people who know how to max it out.
So then you come up with creative nonsense like "adaptive thinking" when your tool is sometimes working and sometimes outright not - the irony of "intelligent tools" not "thinking" aside. Of course this would kind of ruin your current value proposition as charging the actual price would make your core idea of making large swaths of skilled population un-employed, unfeasible but I am sure if you feed it into the Claude, it will find some points for and against, just like how Karpathy uses his LLM of choice to excrement his blog posts.

View on HN · Topics

They're losing customers because of quality concerns. Pausing development and focusing 100% on quality is how you fix that.

That said, that may not have been obvious at all in the Jan/Feb time frame when they got a wave of customers due to ethical concerns.

View on HN · Topics

On the other hand, sacrificing your paying customers at the altar of compute and tokens does not make money appear out of thin air.

View on HN · Topics

The Claude UI still only has "adaptive" reasoning for Opus 4.7, making it functionally useless for scientific/coding work compared to older models (as Opus 4.7 will randomly stop reasoning after a few turns, even when prompted otherwise). There's no way this is just a bug and not a choice to save tokens.

View on HN · Topics

It's cheaper than retraining the model.

View on HN · Topics

Are they also going to refund all the extra usage api $$$ people spent in the last month?

Also I don’t know how “improving our Code Review tool” is going to improve things going forward, two of the major issues were intentional choices. No code review is going to tell them to stop making poor and compromising decisions.

View on HN · Topics

this is one reason i will not pay for extra usage - it is an incentive for them to be inefficient, or at least to not spend any effort on improving my token usage efficiency.

View on HN · Topics

I stopped using it for nearly a month because of the performance degradation. I paid for the whole month. Wasted money.

View on HN · Topics

Some people seem to be suggesting these are coverups for quantization...

Those who work on agent harnesses for a living realize how sensitive models can be to even minor changes in the prompt.

I would not suspect quantization before I would suspect harness changes.

View on HN · Topics

> investment in polish, quality, and reliability

For there to be any trust in the above, the tool needs to behave predictably day to day. It shouldn't be possible to open your laptop and find that Claude suddenly has an IQ 50 points lower than yesterday. I'm not sure how you can achieve predictability while keeping inference costs in check and messing with quantization, prompts, etc on the backend.

Maybe a better approach might be to version both the models and the system prompts, but frequently adjust the pricing of a given combination based on token efficiency, to encourage users to switch to cheaper modes on their own. Let users choose how much they pay for given quality of output though.

View on HN · Topics

Sure, I've cancelled my Max 20 subscription because you guys prioritize cutting your costs/increasing token efficiency over model performance.
I use expensive frontier labs to get the absolute best performance, else I'd use an Open Source/Chinese one.

Frontier LLMs still suck a lot, you can't afford planned degradation yet.

View on HN · Topics

I am considering providing my feedback by not providing my money any longer.

View on HN · Topics

Because then they lose vertical integration and the extra ability it grants to tune settings to reduce costs / token use / response time for subscription users.

Or improve performance and efficiency, if we’re generous and give them the benefit of the doubt.

It makes sense, in a way. It means the subscription deal is something along the lines of fixed / predictable price in exchange for Anthropic controlling usage patterns, scheduling, throttling (quotas consumptions), defaults, and effective workload shape (system prompt, caching) in whatever way best optimises the system for them (or us if, again, we’re feeling generous) / makes the deal sustainable for them.

It’s a trade-off

View on HN · Topics

They gained that ability to tune settings and then promptly used it in a poor way and degraded customer experience.

View on HN · Topics

That’s what we see.

It may be (but I wouldn’t know) that some of other changes not covered here reduced costs on their side without impacting users, improving the viability of their subscription model. Or maybe even improved things for users.

I’d really appreciate more transparency on this, and not just when things fail.

But I’ve learned my lesson. I’ve been weaning off Claude for a few weeks, cancelled my subscription three weeks ago, let it expire yesterday, and moved to both another provider and a third-party open source harness.

View on HN · Topics

Evidently, all these things you just dismissed matter, else all the changes I quoted from the original post wouldn’t have affected anyone, or half as many people, or half as much. Anthropic wouldn’t have had any complaints to investigate, the article prompting this entire thread wouldn’t exist, and we wouldn’t be having this very conversation.

Defaults matter. A large share of people never change them (status quo bias, psychological inertia). Having control over them (and usage quotas) means Anthropic can control and fine-tune what this fixed subscription costs them.

And evidently (re, the original article), they tried to do so.

View on HN · Topics

Given the price I don't really think they're the best option. They're sloppy and competitors are catching up. I'm having the same results with other models, and very close with Kimi, which is waaay cheaper.

View on HN · Topics

I guess it's a bit of desperation to find a sustainable business model.

The AI hype is dying, at least outside the silicon valley bubble which hackernews is very much a part of.

That and all the dogfooding by slop coding their user facing application(s).

View on HN · Topics

Are you saying dropping cache after 1 hour is not intentionally degrading performance?

View on HN · Topics

Yes. Caching is a cost optimization not a response quality metric.

View on HN · Topics

But it still degrades performance.

View on HN · Topics

Useful update. Would be useful to me to switch to a nightly / release cycle but I can see why they don't: they want to be able to move fast and it's not like I'm going to churn over these errors. I can only imagine that the benchmark runs are prohibitively expensive or slow or not using their standard harness because that would be a good smoke test on a weekly cadence. At the least, they'd know the trade-offs they're making.

Many of these things have bitten me too. Firing off a request that is slow because it's kicked out of cache and having zero cache hits (causes everything to be way more expensive) so it makes sense they would do this. I tried skipping tool calls and thinking as well and it made the agent much stupider. These all seem like natural things to try. Pity.

View on HN · Topics

absolutely agree: non-1M Opus 4.6 on x20 max was peak AGI

now it's back to regular slop and just to check otherwise i have to spend at least $100

View on HN · Topics

Appreciate the honesty from the team.

At the same time, personally I find prioritizing quality over quantity of output to be a better personal strategy. Ten partially buggy features really aren't as good as three quality ones.

View on HN · Topics

An interesting question is why these optimizations were pushed so aggressively in the first place, especially given that this was when they were running a 2x promotion of their own accord, presumably without seeing any slowdown in demand.

View on HN · Topics

It’s incredible how forgiving you guys are with Anthropic and their errors. Especially considering you pay a high price for their service and receive lower quality than expected.

View on HN · Topics

At least personally, it feels like the choices are the one that's okay with being used for mass surveillance and autonomous weapons targeting, the one that's on track to get acquired by the AI company that dragged its feet in getting around to stopping people from making child porn with it, the one that nobody seems to use from Google, and the one that everyone complains about but also still seems to be using because it at least sometimes works well. At this point I've opted out of personal LLM coding by canceling my subscription (although my employer still has subscriptions and wants us to keep using them, so I'll presumably keep using Claude there) but if I had to pick one to spend my own money on I'd still go with Claude.

View on HN · Topics

The consumer surplus is quite high. Even with the regressions in this postmortem, performance was above the models last fall, when I was gladly paying for my subscription and thought it was net saving me time.

That said, there is now much better competition with Codex, so there's only so much rope they have now.

View on HN · Topics

Anthropic actually not so bad. Anthropic models code good, usually. Price not so high compared to time to do it by self.

View on HN · Topics

Exactly. They've done now like 6 rug-pulls.

Idiots keep throwing money at real-time enshittification and 'I am changing the terms. Pray I do not change them further'.

And yes, I am absolutely calling people who keep getting screwed and paying for more 'service' as idiots.

And Anthropic has proved that they will pay for less and less. So, why not fuck them over and make more company money?

View on HN · Topics

If anthropic is doing this as a result of "optimizations" they need to stop doing that and raise the price.
The other thing, there should be a way to test a model and validate that the model is answering exactly the same each time.
I have experienced twice... when a new model is going to come out... the quality of the top dog one starts going down... and bam.. the new model is so good.... like the previous one 3 months ago.

The other thing, when anthropic turns on lazy claude... (I want to coin here the term Claudez for the version of claude that's lazy.. Claude zzZZzz = Claudez) that thing is terrible... you ask the model for something... and it's like... oh yes, that will probably depend on memory bandwidth... do you want me to search that?...

YES... DO IT... FRICKING MACHINE..

View on HN · Topics

Apart from Anthropic nobody knows how much the average user costs them. However the consensus is "much more than that".

If they have to raise prices to stop hemorrhaging money, would you be willing to pay 1000 bucks a month for a max plan? Or 100$ per 1M output tokens (playing numberWang here, but the point stands).

If I have to guess they are trying to get balance sheet in order for an IPO and they basically have 3 ways of achieving that:

1. Raising prices like you said, but the user drop could be catastrophic for the IPO itself and so they won't do that

2. Dumb the models down (basically decreasing their cost per token)

3. Send less tokens (ie capping thinking budgets aggressively).

2 and 3 are palatable because, even if they annoy the technical crowd, investors still see a big number of active users with a positive margin for each.

View on HN · Topics

$1000/mo for guaranteed functionality >= Opus 4.6 at its peak? Yes, I'd probably grumble a bit and then whip out the credit card.

I'm not a heavy LLM user, and I've never come anywhere near the $200/month plan limits I'm already subscribed to. But when I do use it, I want the smartest, most relentless model available, operating at the highest performance level possible.

Charge what it takes to deliver that, and I'll probably pay it. But you can damned well run your A/B tests on somebody else.

View on HN · Topics

https://marginlab.ai/ (no affiliation)

There are a number of projects working on evals that can check how 'smart' a model is, but the methodology is tricky.

One would want to run the exact same prompt, every day, at different times of the day, but if the eval prompt(s) are complex, the frontier lab could have a 'meta-cognitive' layer that looks for repetitive prompts, and either:
a) feeds the model a pre-written output to give to the user
b) dumbs down output for that specific prompt

Both cases defeat the purpose in different ways, and make a consistent gauge difficult. And it would make sense for them to do that since you're 'wasting' compute compared to the new prompts others are writing.
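
As a minimal sketch of the "same prompts, every day" idea, the check below compares recent eval pass rates against a baseline window and flags a drop beyond a tolerance. The function name, the score format (one pass rate per eval run, between 0 and 1), and the 5-point tolerance are all assumptions for illustration; it does nothing about the prompt-detection problem described above and is not how marginlab.ai or any other project is known to work.

```python
from statistics import mean


def detect_regression(baseline_scores: list[float],
                      recent_scores: list[float],
                      tolerance: float = 0.05) -> bool:
    """Return True if the recent mean pass rate fell more than `tolerance`
    below the baseline mean pass rate."""
    if not baseline_scores or not recent_scores:
        raise ValueError("both score lists must be non-empty")
    return mean(baseline_scores) - mean(recent_scores) > tolerance


# Hypothetical pass rates from repeated runs of a fixed eval suite.
baseline = [0.80, 0.82, 0.78]
this_week = [0.60, 0.65]
print(detect_regression(baseline, this_week))
```

A real harness would also need enough runs per window to separate genuine regressions from ordinary sampling noise, since model outputs vary from run to run even when nothing has changed.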

View on HN · Topics

It appears that Opus 4.7 has been nerfed already. Can't get any sensible results since yesterday. It just keeps running in circles. Even mentioning that it is committing fraud by doing superficial work it has been told specifically not to do doesn't help.

View on HN · Topics

something i note from this is that this is not a model weights change, but it is a hidden state change anthropic is doing to the outputs that can tune the quality up and down on the "model" without breaking the "we arent changing the model" promise.

how often do these changes happen?

View on HN · Topics

If you think that you can just silently modify the model without any announcements and only react when it doesn't go through unnoticed, then be 100% sure that your clients will check every possible alternative and will leave you as soon as they find anything similar in quality (and no, not a degraded one).

View on HN · Topics

Effort should not be configurable for Opus, it should be set to a single default that provides the highest level of capability. There are zero instances in which I am willing to accept a lesser result in exchange for a slightly faster response from Opus. If that were the case I would be using Flash or Haiku.

View on HN · Topics

Interesting. All 3 seem like they’re obviously going to impact quality, e.g., reducing the effort from high to medium.

So then, there must have been an explicit internal guidance/policy that allowed this tradeoff to happen.

Did they fix just the bug or the deeper policy issue?

View on HN · Topics

Please for the love of god just put the max price plan up like 4x or 5x in cost and make it actually work.

View on HN · Topics

> On March 4, we changed Claude Code's default reasoning effort from high to medium to reduce the very long latency—enough to make the UI appear frozen—some users were seeing in high mode.

Translation: To reduce the load on our servers.
