Transparency and Communication

Calls for better communication through official channels rather than random employee tweets, requests for changelogs documenting behavior changes, and frustration with support responses

Users are increasingly disillusioned with Anthropic’s reliance on informal social media updates and "underhanded" undocumented changes, arguing that silent experiments—such as lowering reasoning defaults or pruning context—violate the fundamental "pro-tool contract." Many feel systematically "gaslighted" by a pattern where the company denies performance degradation for weeks, only to later issue post-mortems admitting to the very issues users had been reporting. While some defenders maintain that such volatility is an inherent byproduct of scaling bleeding-edge technology, the prevailing sentiment demands a shift toward professional transparency, including stable changelogs and clear UI indicators for cache and reasoning status. Ultimately, the community warns that if the "black box" approach continues to prioritize corporate optimization over user reliability, professional users will inevitably migrate to more stable, self-hosted alternatives.

View on HN · Topics

I’m coming at this as a complete Claude amateur, but caching for any other service is an optimisation for the company and transparent for the user. I don’t think I’ve ever used a service and thought “oh there’s a cache miss. Gotta be careful”.

I completely agree that it’s infeasible for them to cache for long periods of time, but they need to surface that information in the tools so that we can make informed decisions.

View on HN · Topics

How do you do "due diligence" on an API that frequently makes undocumented changes and only publishes acknowledgement of change after users complain?

You're also talking about internal technical implementations of a chat bot. 99.99% of users won't even understand the words that are being used.

View on HN · Topics

>If you were being charged per character, or running down character limits, and printing on printers that were shared and had economic costs for stalled and started print runs,

and the system was being run by some of the planet’s brightest people whose famous creation is well known to disseminate complex information succinctly,

>then:

You would expect to be led to understand, like… a 1997 Prius.

“This feature showed the vehicle operation regarding the interplay between gasoline engine, battery pack, and electric motors and could also show a bar-graph of fuel economy results.” https://en.wikipedia.org/wiki/Toyota_Prius_(XW10)

View on HN · Topics

mmap(2) and all its underlying machinery are open source and well documented besides.

View on HN · Topics

There are open-source and even open-weight models that operate in exactly this way (as it's based off of years of public research), and even if there weren't the way that LLMs generate responses to inputs is superbly documented.

Seems like every month someone writes up a brilliant article on how to build an LLM from scratch or similar that hits the HN page, usually with fancy animated blocks and everything.

It's not at all hard to find documentation on this topic. It could be made more prominent in the U/I but that's true of lots of things, and hammering on "AI 101" topics would clutter the U/I for actual decision points the user may want to take action upon that you can't assume the user already knows about in the way you (should) be able to assume about how LLMs eat up tokens in the first place.

View on HN · Topics

> users should be curious and actively attempting to understand how it works

Have you ever talked with users?

> this is an endless job

Indeed. If we spend all our time learning what changed with all our tooling when it changes without proper documentation then we spend all our working lives keeping up instead of doing our actual jobs.

View on HN · Topics

This sounds like a religious cult priest blaming the common people for not understanding the cult leader's wish, which he never clearly stated.

View on HN · Topics

He was surprised because it was not clearly communicated. There's a lot of theory behind a product that you could (or could not) better understand, but in the end, something like price doesn't have much to do with the theoretical and practical behavior of the actual application.

View on HN · Topics

It'd probably be helpful for power users and transparency to actually show how the cache is being used. If you run local models with llamacpp-server, you can watch how the cache slots fill up with every turn; when subagents spawn, you see another process id spin up and it takes up a cache slot; when the model starts slowing down is when the context grows (amd 395+ around 80-90k) and the cache loads are bigger because you've got all that.

So yeah, it doesn't take much to surface to the user that the speed/value of their session is ephemeral because to keep all that cache active is computationally expensive because ...

You're still just running text through an extremely complex process and adding to that text, and to avoid re-calculating the entire chain, you need the cache.
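
As a rough illustration of the point above, the sketch below models a prompt cache that expires after an idle period and produces the kind of status line a tool could surface to the user. It is a toy model, not how llama.cpp or Claude Code actually work: the `PrefixCache` class, the `status_line` helper, and the 300-second expiry are all hypothetical choices made for the example.

```python
from dataclasses import dataclass


@dataclass
class PrefixCache:
    """Toy prompt cache: remembers how many context tokens were already processed."""

    ttl_seconds: float          # hypothetical expiry; real services choose their own
    cached_tokens: int = 0
    last_used: float = 0.0

    def process_turn(self, now: float, context_tokens: int) -> dict:
        """Return how many tokens are reused from cache vs recomputed this turn."""
        expired = (now - self.last_used) > self.ttl_seconds
        reusable = 0 if expired else min(self.cached_tokens, context_tokens)
        recomputed = context_tokens - reusable
        # After the turn, the whole context is considered cached again.
        self.cached_tokens = context_tokens
        self.last_used = now
        return {"cached": reusable, "recomputed": recomputed}


def status_line(result: dict) -> str:
    """Format a one-line indicator a UI could show before the user sends a message."""
    if result["cached"] == 0 and result["recomputed"] > 0:
        return f"cache miss: reprocessing all {result['recomputed']} context tokens"
    return f"cache hit: {result['cached']} tokens reused, {result['recomputed']} new"


if __name__ == "__main__":
    cache = PrefixCache(ttl_seconds=300)
    # (timestamp in seconds, total context size in tokens)
    for now, tokens in [(0, 10_000), (60, 12_000), (7_260, 13_000)]:
        print(status_line(cache.process_turn(now, tokens)))
```

In this model the third turn arrives two hours after the second, so the entire context is recomputed even though only 1,000 tokens were added. That is the ephemeral cost the comment argues should be visible.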

View on HN · Topics

These controversies erupt regularly, and I hope that you will see a common thing with most of them: you make a decision for your users without informing them.

Please fight this hubris. Your users matter. Many of us use your tools for everyday work and do not appreciate having the rug pulled from under them on a regular basis, much less so in an underhanded and undisclosed way.

I don't mind the bugs, these will happen. What I do not appreciate is secretly changing things that are likely to decrease performance.

View on HN · Topics

A company that needs to anchor every single thing with the users will create a stale product.

View on HN · Topics

That is not what I wrote. The phrases "without informing them", "in an underhanded and undisclosed way" and "secretly changing things" were important. I'm all for product evolution, but users should be informed when the product is changed, especially when the change can be for the worse (like dumbing down the model).

View on HN · Topics

I've spent my entire working career dealing with companies that do the opposite. The product still goes stale. Find a better excuse.

You're acquiring users as a recurring revenue source. Consider stability and transparency of implementation details cost of doing business, or hemorrhage users as a result.

View on HN · Topics

I think it’s crazy that they do this, especially without any notice. I would not have renewed my subscription if I knew that they started doing this.

Especially in the analysis part of my work I don't care about the actual text output itself most of the time but try to make the model "understand" the topic.

In the first phase the actual text output itself is worthless; it just serves as an indicator that the context was processed correctly and that the future actual analysis work can depend on it.
And they're… just throwing most of the relevant stuff out without any notice when I resume my session after a few days?

This is insane, Claude literally became useless to me and I didn’t even know it until now, wasting a lot of my time building up good session context.

There would be nothing lost if they said "If you click yes, we will prune your old thinking, making Claude faster and saving you tons of tokens". Most people would probably say yes, so why not ask them… make it an env variable (one that is announced, not a secretly introduced one to opt out of something new!) or at least write it in a changelog if they really don't want to allow people to use it like before, so there'd be a chance to cancel the subscription in time instead of wasting tons of time on work patterns that no longer work.

View on HN · Topics

Pointing at their terms of service will definitely be the instantly summoned defense (as it would be for most modern companies), but the fact that SaaS can so suddenly shift the quality of product being delivered for their subscription without clear notification or explicit re-enrollment is definitely a legal oversight right now, and Italy actually did recently clamp down on Netflix doing this[1]. It's hard to define what user expectations of a continuous product are and how companies may have violated them, and for a long time social constructs kept this pretty much in check. As obviously inactive and forgotten-about subscriptions have become a more significant revenue source for services, that agreement has been eroded, though, and the legal system has yet to catch up.

1. Specifically, this suit was about price increases without clear consideration for both parties - but the same justifications apply to service restrictions without corresponding price decreases.

https://fortune.com/2026/04/20/italian-court-netflix-refunds...

View on HN · Topics

This violates the principle of least surprise, with nothing to indicate Claude got lobotomized while it napped when so many use prior sessions as "primed context" (even if people don't know that's what they were doing or know why it works).

The purpose of spending 10 to 50 prompts getting Claude to fill the context for you is it effectively "fine tunes" that session into a place your work product or questions are handled well.

(If this notion of sufficient context as fine-tune seems surprising, the research is out there.)

Approaches tried need to deal with both of these:

1) Silent context degradation breaks the Pro-tool contract. I pay compute so I don't pay in my time; if you want to surface the cost, surface it (UI + price tag or choice), don't silently erode quality of outcomes.

2) The workaround (external context files re-primed on return) eats the exact same cache miss, so the "savings" are illusory — you just pushed the cost onto the user's time. If my own time's cheap enough that's the right trade off, I shouldn't be using your machine.

View on HN · Topics

I don't envy you Boris. Getting flak from all sorts of places can't be easy. But thanks for keeping a direct line with us.

I wish Anthropic's leadership would understand that the dev community is such a vital community that they should appreciate it a bit more (i.e. it's not nice sending lawyers after various devs without asking nicely first, banning accounts without notice, etc.). Appreciate it's not easy to scale.

OpenAI seems to be doing a much better job when it comes to developer relations, but I would like to see you guys 'win' since Anthropic shows more integrity and has clear ethical red lines they are not willing to cross unlike OpenAI's leadership.

View on HN · Topics

I'm glad they chose to do that as opposed to hidden behavior changes that only confuse users more.

View on HN · Topics

Really good to know. That should have made it into their update letter in point (2). Empowering the user to choose is the right call.

View on HN · Topics

Thanks for giving more information. Just as a comment on (1), a lot of people don't use X/social. That's never going to be a sustainable path to "improve this UX" since it's...not part of the UX of the product.

It's a little concerning that it's number 1 in your list.

View on HN · Topics

Just wanted to say I appreciate your responses here. Engaging so directly with a highly critical audience is a minefield that you're navigating well.

Thank you.

View on HN · Topics

I agree with this.

I'm writing this message even though I don't have much to add because it's often the case on HN that criticism is vocal and appreciation is silent and I'd like to balance out the sentiment.

Anthropic has fumbled on many fronts lately but engaging honestly like this is the right thing to do. I trust you'll get back on track.

View on HN · Topics

> Engaging so directly with a highly critical audience is a minefield that you're navigating well.

They spent two months literally gaslighting this "critical audience" that this could not be happening and literally blaming users for using their vibe-coded slop exactly as advertised.

All the while all the official channels refused to acknowledge any problems.

Now the dissatisfaction and subscription cancellations have reached a point where they finally had to do something.

View on HN · Topics

Examples of gaslighting on April 15th (the first 2 issues were "fixed" by April 10th according to the story):

https://x.com/bcherny/status/2044291036860874901
https://x.com/bcherny/status/2044299431294759355

No mention of anything like "hey, we just fixed two big issues, one that lasted over a month." Just casual replies to everybody like nothing is wrong and "oh there's an issue? just let us know we had no idea!"

View on HN · Topics

Very easy to do when you stand to make tens of millions when your employer IPOs. Let's not maybe give too much praise and employ some critical thinking here.

View on HN · Topics

What is the purpose of this mindset? Should we encourage typical corporate coldness instead?

View on HN · Topics

No, I wouldn't. I'd like some transparency at least.

View on HN · Topics

Instead of showing actual usage, costs and cache status you spent two months denying the issue even exists, making the product silently worse, and now you're "iterating on this"

View on HN · Topics

To add to this. The new indicator is "New task? /clear to save <X> tokens" even though it affects all tasks, not just new ones.

Mislead, gaslight, misdirect is the name of the game

View on HN · Topics

Then you need to update your documentation and teach claude to read the new documentation because here is what claude code answered:

Question: Hey claude, if we have a conversation, and then I take a break, does it change the expected output of my next answer if there are 2 hours between the previous message and the next one?

Answer: No. A 2-hour gap doesn't change my output. I have no internal clock between messages — I only see the conversation content plus the currentDate context injected each turn. The prompt cache may expire (5 min TTL), which affects cost/latency but not the response itself.

The only things that can change output across a break: new context injected (like updated date), memory files being modified, or files on disk changing.

-- This answer directly contradicts your post. It seems like the biggest problem is a total lack of documentation for expected behavior.

A similar thing happens if I ask claude code for the difference between plan mode, and accept edits on.

Then Claude told me the only difference was that with plan mode it would ask for permission before doing edits. But I really don't think this is true. It seems like plan mode does a lot more work, and presents it in a totally different way. It is not just an "I will ask before applying changes" mode.

View on HN · Topics

Don't be silly, they don't expect you to ask the Ai questions and get the right answers. Obviously if you want to know what's going on you should look at their first solution - check what advice they have posted on X...

View on HN · Topics

It does have a built-in documentation subagent it can invoke, but that doesn't help much if they don't document their shenanigans.

View on HN · Topics

> I think thats a bad idea. It seems like expecting to have a prompt open like this, accumulating context puts a load on the back end

Let's see what Boris Cherny himself and other Anthropic vibe-coders say about this:

https://x.com/bcherny/status/2044847849662505288

Opus 4.7 loves doing complex, long-running tasks like deep research, refactoring code, building complex features, iterating until it hits a performance benchmark.

https://x.com/bcherny/status/2007179858435281082

For very long-running tasks, I will either (a) prompt Claude to verify its work with a background agent when it's done... so Claude can cook without being blocked on me.

https://x.com/trq212/status/2033097354560393727

Opus 4.6 is incredibly reliable at long running tasks

https://x.com/trq212/status/2032518424375734646

The long context window means fewer compactions and longer-running sessions. I've found myself starting new sessions much less frequently with 1 million context.

https://x.com/trq212/status/2032245598754324968

I used to be a religious /clear user, but doing much less now, imo 4.6 is quite good across long context windows

---

I could go on

View on HN · Topics

> We tried a few different approaches to improve this UX

how about acknowledging that you fucked up your own customers’ money and making a full refund for the affected period?

> Educating users on X/social

that is beyond me

You're no Boris; you're a Borka at best.

View on HN · Topics

So is it for latency or is it for cost?

Why did you lie 11 days ago, 3 days after the fix went in, about the cause of excess token usage?

View on HN · Topics

That is understandable, but the issue is the sudden drop in quality and the silent surge in token usage.

It also seems like the warning should be in channel and not on X. If I wanted to find out how broken things are on X, I'd be a Grok user.

View on HN · Topics

It is too surprising. Time passed should not matter for using AI.

Either swallow the cost or be transparent to the user and offer both options each time.

View on HN · Topics

Wow so that's why you did #2? The explanation in the CLI is really not clear. I thought it was just a suggestion to compact, no idea it was way more expensive than if I hadn't left it idle for an hour.

You guys really need to communicate that better in the CLI for people not on social

View on HN · Topics

So you made this change completely invisible to the user, without the user being able to choose between the two behaviors, and without even documenting it in the (extremely verbose) changelog [1]? I can't find it, and the Docs Assistant can't find it (well, it said "I found it!" three times when fed your reply, each time with a non-matching item).

I frequently debug issues while keeping my carefully curated but long context active for days. Losing potentially very important context while in the middle of a debugging session resulting in less optimal answers, is costing me a lot more money than the cache misses would.

In my eyes, Claude Code is mainly a context management tool. I build a foundation of apparent understanding of the problem domain, and then try to work towards a solution in a dialogue. Now you tell me Anthropic has been silently breaking down that foundation without telling me, wasting potentially hours of my time.

It's a clear reminder that these closed-source harnesses cannot be trusted (now or in the future), and I should find proper alternatives for Claude Code as soon as possible.

[1] https://code.claude.com/docs/en/changelog

View on HN · Topics

> We tried a few different approaches to improve this UX:
1. Educating users on X/social

No. You had random developers tweet and reply at random times to random users while all of your official channels were completely silent. Including channels for people who are not terminally online on X.

View on HN · Topics

There's a cultural divide between SV and the 85% of SMB using M365, for example. When everyone you know uses a thing, I mean, who doesn't?*

There's a reason live service games have splash banners at every login. No matter what you pick as an official e-coms channel, most of your users aren't there!

* To be fair, of all these firms, ANTHROP\C tries the hardest to remember, and deliver like, some people aren't the same. Starting with normals doing normals' jobs.

View on HN · Topics

You need to seriously look at your corporate communications and hire some adults to standardise your messaging, comms and signals. The volatility behind your doors is obvious to us and you'd impress us much more if you slowed down, took a moment to think about your customers and sent a consistent message.

You lost huge trust with the A/B sham test. You lost trust with enshittification of the tokenizer on 4.6 to 4.7. Why not just say "hey, due to huge input prices in energy, GPU demand and compute constraints we've had to increase Pro from $20 to $30." You might lose 5% of customers. But the shady A/B thing and dodgy tokenizer increasing burn rate tells everyone inc. enterprise that you don't care about honesty and integrity in your product.

I hope this feedback helps because you still stand to make an awesome product. Just show a little more professionalism.

View on HN · Topics

It's very weird that they frame caching as "latency reduction" when it comes to a cloud service. I mean, yes, technically it reduces latency, but more importantly it reduces cost. Sometimes it's more than 80% of the total cost.

I'm sure most companies and customers will consider compromising quality for an 80% cost reduction. If they're just honest they'll be fine.
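
To make the cost claim concrete, here is a small worked example of how much of a long session's input bill caching can remove. The numbers are illustrative assumptions, not Anthropic's actual pricing: the function `session_input_cost`, the unit price of 1.0 per token, and the assumption that cached tokens are billed at 10% of the normal input rate are all hypothetical. Output tokens and any cache-write surcharge are ignored.

```python
def session_input_cost(turn_context_tokens, price_per_token, cached_discount, cache_enabled):
    """Sum input cost over a session.

    With caching enabled, the prefix already seen on the previous turn is
    billed at `cached_discount` times the normal price; only new tokens are
    billed in full. Without caching, the whole context is billed every turn.
    """
    total = 0.0
    previous = 0
    for tokens in turn_context_tokens:
        reused = min(previous, tokens) if cache_enabled else 0
        fresh = tokens - reused
        total += fresh * price_per_token + reused * price_per_token * cached_discount
        previous = tokens
    return total


if __name__ == "__main__":
    # Hypothetical session: 20 turns, starting at 50k tokens, growing 2k per turn.
    contexts = [50_000 + 2_000 * i for i in range(20)]
    uncached = session_input_cost(contexts, 1.0, 0.1, cache_enabled=False)
    cached = session_input_cost(contexts, 1.0, 0.1, cache_enabled=True)
    print(f"uncached: {uncached:.0f}, cached: {cached:.0f}, "
          f"saving: {1 - cached / uncached:.1%}")
```

Under these assumptions the cached session costs a little under 16% of the uncached one, which is consistent with the "more than 80%" figure in the comment. The exact saving depends heavily on the discount and on how long the session runs.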

View on HN · Topics

what's even more amazing is it took them two weeks to fix what must have been a pretty obvious bug, especially given who they are and what they are selling.

View on HN · Topics

Bit surprised about the amount of flak they're getting here. I found the article seemed clear, honest and definitely plausible.

The deterioration was real and annoying, and shines a light on the problematic lack of transparency of what exactly is going on behind the scenes and the somewhat arbitrary token-cost based billing - too many factors at play, if you wanted to trace that as a user you can just do the work yourself instead.

The fact that waiting for a long time before resuming a convo incurs additional cost and lag seemed clear to me from having worked with LLM APIs directly, but it might be important to make this more obvious in the TUI.

View on HN · Topics

I agree that it’s plausible, and I hope they learn. But trust is earned, and Anthropic’s public responses this past month were dismissive and unhelpful.

Every one of these changes had the same goal: trading the intelligence users rely on for cheaper or faster outputs. Users adapt to how a model behaves, so sudden shifts without transparency are disorienting.

The timing also undercuts their narrative. The fixes landed right before another change with the same underlying intent rolled out. That looks more like they were just reacting to experiments rather than understanding the underlying user pain.

When people pay hundreds or thousands a month, they expect reliability and clear communication, ideally opt-in. Competitors are right there, and unreliability pushes users straight to them.

All of this points to their priorities not being aligned with their users’.

View on HN · Topics

> All of this points to their priorities not being aligned with their users’.

Framing this as "aligned" or "not aligned" ignores the interesting reality in the middle. It is banal to say an organization isn't perfectly aligned with its customers.

I'm not disagreeing with the commenter's frustration. But I think it can help to try something out: take, say, the top three companies whose products you interact with on a regular basis. Take stock of (1) how fast that technology is moving; (2) how often things break from your POV; (3) how soon the company acknowledges it; (4) how long it takes for a fix. Then ask "if a friend of mine (competent and hard working) was working there, would I give the company more credit?"

My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

These kinds of conversations are a sort of window into people's expectations and their ability to envision the possible explanations of what is happening at Anthropic.

View on HN · Topics

>My overall feel is that people underestimate the complexity of the systems at Anthropic and the chaos of the growth.

Making changes like reducing the usage window at peak times ( https://x.com/trq212/status/2037254607001559305 ) without announcing it (until after the backlash) is the sort of thing that's making people lose trust in Anthropic. They completely ignored support tickets and GitHub issues about that for 3 days.

You shouldn't have to rely on finding an individual employee's posts on Reddit or X for policy announcements.

That policy hasn't even been put into their official documentation nearly one month on - https://support.claude.com/en/articles/11647753-how-do-usage...

A company with their resources could easily do better.

View on HN · Topics

Some of the flak is that issues are often only acknowledged once a fix is in place, and the partial fixes are presented as if they solve the whole problem.

The near-instant transition from "there is no problem" to "we already fixed the problem so stop complaining" is basically gaslighting. (Admittedly the second sentiment comes more from the community, but they get that attitude after taking the "we fixed all the problems" posts at face value.)

View on HN · Topics

And still are gaslighting:

We take reports about degradation very seriously. We never intentionally degrade our models [...] On March 4, we changed Claude Code's default reasoning effort from high to medium

Anthropic is the best company of its kind, but that is badly worded PR.

View on HN · Topics

Is adding JPEG compression to your software “intentional degradation” of the software? I wouldn't say providing a selectable option to use a faster, cheaper version of something qualifies as “degradation”.

It is certainly true that they did a poor job communicating this change to users (I did not know that the default was “high” before they introduced it, I assumed they had added an effort level both above and below whatever the only effort choice was there before). On the other hand, I was using Claude Code a fair bit on “medium” during that time period and it seemed to be performing just fine for me (and saving usage/time over “high”), so it doesn't seem clear that that was the wrong default, if only it had been explained better.

View on HN · Topics

It seems to me you dropped the "gaslighting" claim without owning it. I personally find this frustrating. I prefer when people own up to their mistakes. Like many people, to me, "gaslighting" is just not a term you throw around lightly. Then you shifted to "cop out". (This feels like the motte and bailey.) But I don't think "cop out" is a phrase that works either...

Some terms:... The model is the thing that runs inference. Claude Code is not a model, it is a harness. To summarize Anthropic's recent retrospective, their technical mistakes were about the harness.

I'm not here to 'defend' Anthropic's mistakes. They messed up technically. And their communication could have been better. But they didn't gaslight. And on balance, I don't see net evidence that they've "copped out" (by which I mean mischaracterized what happened). I see more evidence of the opposite. I could be wrong about any of this, but I'm here to talk about it in the clearest, best way I can. If anyone wants to point to primary sources, I'll read them.

I want more people to actually spend a few minutes and actually give the explanation offered by Anthropic a try. What if isolating the problems was hard to figure out? We all know hindsight is 20/20 and yet people still armchair quarterback.

At the risk of sounding preachy, I'm here to say "people, we need to do better". Hacker News is a special place, but we lose it a little bit every time we don't put in a quality effort.

View on HN · Topics

I think there are plenty of such replies on GitHub. For example, the one to the AMD AI director's issue.

View on HN · Topics

They didn’t say “your experience is not worse” but they did frequently say “just turn reasoning effort back up and it will be fine”. And that pretty explicitly invalidates all the (correct) feedback which said it’s not just reasoning effort.

They knew they had deliberately made their system worse, despite their lame promise published today that they would never do such a thing. And so they incorrectly assumed that their ham fisted policy blunder was the only problem.

Still plenty I prefer about Claude over GPT but this really stings.

View on HN · Topics

> Define "they". The teams that made particular changes? In real-world organizations, not all relevant information flows to all the right places at the right time. Mistakes happen because these are complex systems.

Accidentally/deliberately making your CS teams ill-informed should not function as a get out of jail free card. Rather the reverse.

View on HN · Topics

I know some people use the word "gaslighting" in connection with Anthropic. I've read some of those threads here, and some on Reddit, but I don't put much stock in them. To step back, hopefully reasonable people can start here:

1. Degraded service sucks.
2. Anthropic saying, e.g., "we're not seeing it" sucks.
3. Not getting a fix when you want it sucks.

Try to understand what I mean when I say none of the above meet the following sense of gaslighting: "Gaslighting is the manipulation of someone into questioning their perception of reality." Emphasis on understand what I mean. This says it well: [1].

If you can point me to an official communication from Anthropic where they say "User <so and so> is not actually seeing degraded performance" when Anthropic knows otherwise that would clearly be gaslighting -- intent matters by my book.

But if their instrumentation was bad and they were genuinely reporting what they could see, that doesn't cross into gaslighting by my book. But I have a tendency to think carefully about ethical definitions. Some people just grab a word off the shelf with a negative valence and run with it: I don't put much stock in what those people say. Words are cheap. Good ethical reasoning is hard and valuable.

It's fine if you have a different definition of "gaslighting". Just remember that some of us have actually been gaslit by people, so we prefer to save the word for situations where the original definition applies. People like us are not opposed to being disappointed, upset, or angry at Anthropic, but we have certain epistemic standards that we don't toss out when an important tool fails to meet our expectations and the company behind it doesn't recognize it soon enough.

[1]: https://www.reddit.com/r/TwoXChromosomes/comments/tep32v/can...

View on HN · Topics

This, so much this!

Paying per token while token usage is totally opaque is a super convenient money-printing machine.

View on HN · Topics

Your argument seems to be that a statistically improbable number of people all experienced ultimately random poor outputs, leading to only a misperception of model degradation… but this is not supported by reality, in which a different cause was found, so I was trying to connect your dots.

View on HN · Topics

Hey, Boris from the team here.

We did both -- we did a number of UI iterations (eg. improving thinking loading states, making it more clear how many tokens are being downloaded, etc.). But we also reduced the default effort level after evals and dogfooding. The latter was not the right decision, so we rolled it back after finding that UX iterations were insufficient (people didn't understand to use /effort to increase intelligence, and often stuck with the default -- we should have anticipated this).

View on HN · Topics

We anticipated the default would be the best option for most people. We were wrong, so we reverted the default.

View on HN · Topics

> Instead of fixing the UI they lowered the default reasoning effort parameter from high to medium? And they "traced this back" because they "take reports about degradation very seriously"? Extremely hard to give them the benefit of doubt here.

They had droves of Claude devs vehemently defending and gaslighting users when this started happening

View on HN · Topics

Everything else aside, their brief "experiment" with removing CC support from the Pro plan got me seriously considering other options. I've been wary of vendor lock-in the whole time, but it was a useful reminder. (opencode+openrouter will probably be my first port of call)

View on HN · Topics

never ever forget theo's gpt 5 hype video and then him having to walk it back.

It's very clear that there's money or influence changing hands behind the scenes with certain content creators, the information, and OpenAI.

View on HN · Topics

Wow, bad enough for them to actually publish something and not cryptic tweets from employees.

Damage is done for me though. Even just one of these things (messing with adaptive thinking) is enough for me to not trust them anymore. And then their A/B testing this week on pricing.

View on HN · Topics

>On April 16, we added a system prompt instruction to reduce verbosity

In practice I understand this would be difficult but I feel like the system prompt should be versioned alongside the model. Changing the system prompt out from underneath users when you've published benchmarks using an older system prompt feels deceptive.

At least tell users when the system prompt has changed.
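A minimal sketch of what versioning the system prompt alongside the model could look like, assuming a client or vendor is willing to publish a short fingerprint of the prompt text. The model name, the prompt strings, and the `prompt_fingerprint` helper are all hypothetical; nothing here reflects how Anthropic actually ships prompts.

```python
import hashlib


def prompt_fingerprint(model: str, system_prompt: str) -> str:
    """Combine a model name with a short hash of the system prompt text.

    Hypothetical scheme: any edit to the prompt yields a different label,
    so users can tell that *something* changed without seeing the prompt.
    """
    digest = hashlib.sha256(system_prompt.encode("utf-8")).hexdigest()
    return f"{model}+prompt.{digest[:12]}"


def prompt_changed(previous_label: str, model: str, system_prompt: str) -> bool:
    """Return True when the current prompt no longer matches a stored label."""
    return previous_label != prompt_fingerprint(model, system_prompt)


# Illustrative values only; neither prompt is a real system prompt.
old_label = prompt_fingerprint("example-model", "You are a coding assistant.")
print(prompt_changed(old_label, "example-model", "You are a coding assistant."))
print(prompt_changed(old_label, "example-model", "You are a coding assistant. Be brief."))
```

Publishing only the label would cover the "at least tell users" request: benchmarks could cite the label they were run against, and a changed label would signal a changed prompt.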

View on HN · Topics

This black box approach that large frontier labs have adopted is going to drive people away. To change fundamental behavior like this without notifying them, and only retroactively explaining what happened, is the reason they will move to self-hosting their own models. You can't build pipelines, workflows and products on a base that is just randomly shifting beneath you.

View on HN · Topics

I presume they don't yet have a cohesive monetization strategy, and this is why there is such huge variability in results on a weekly basis. It appears that Anthropic are skipping from one "experiment" to another. As users we only get to see the visible part (the results). Can't design a UI that indicates the software is thinking vs frozen? Does anyone actually believe that?

View on HN · Topics

It seems like there is no concept of deployment, or even of an A/B test: whatever works on (presumably) a Claude employee's laptop for the hour they spent testing it ships immediately to everyone.

I mean, yes, even testing in production with some of your customers is better than... testing with ALL of your customers?

View on HN · Topics

> A harness is just supportive scaffolding to run something.

Thank you for the perfect explanation.

Last week I was confused about the word: Anthropic was using "test", "eval", and "harness" in the same sentence, so I thought Anthropic had made a test harness. I asked Google, "in computer science what is a harness", and it responded by discussing only test harnesses, which solidified my thinking that that is what it is.

I wish Google had responded as clearly as you did. In my defense, we don't know if we understand something unless we discuss it.

View on HN · Topics

Glad there is finally some ownership. It is a pity that this was mostly because AMD embarrassed them on GitHub. Users have been reporting these issues for weeks, but were mostly ignored.

View on HN · Topics

> On April 16, we added a system prompt instruction to reduce verbosity. In combination with other prompt changes, it hurt coding quality, and was reverted on April 20. This impacted Sonnet 4.6, Opus 4.6, and Opus 4.7.

Claude caveman in the system prompt confirmed?

View on HN · Topics

Anthropic releases used to feel thorough and well done, with the models feeling immaculately polished. It felt like using a premium product, and it never felt like they were racing to keep up with the news cycle, or reply to competitors.

Recently that immaculately polished feel is harder to find. It coincides with the daily releases of CC, Desktop App, unknown/undocumented changes to the various harnesses used in CC/Cowork. I find it an unwelcome shift.

I still think they're the best option on the market, but the delta isn't as high as it was. Sometimes slowing down is the way to move faster.

View on HN · Topics

Boris from the Claude Code team here. We agree, and will be spending the next few weeks increasing our investment in polish, quality, and reliability. Please keep the feedback coming.

View on HN · Topics

> Please keep the feedback coming

if only there were a place with 9,881 pieces of feedback waiting to be triaged...

and maybe not by a duplicate-bot that goes wild and just auto-closes everything;
just blessing some of the stuff there with a "you've been seen" label would go a long way...

View on HN · Topics

Common pattern of checking the Claude Code issue tracker for a bug: land on issue #12587, auto-closed as duplicate of #12043; check #12043, auto-closed as duplicate of #11657; check #11657, auto-closed as duplicate of #10645; check #10645, never got a response, or closed as not planned, or some other bullshit.
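The chase described above is a walk along a chain of "duplicate of" links. A minimal sketch, assuming the links have already been collected into a plain dictionary (the mapping below is hypothetical and uses only the issue numbers quoted in this comment; it does not call any real tracker API):

```python
from typing import Dict, List, Optional


def follow_duplicates(start: int, duplicate_of: Dict[int, Optional[int]]) -> List[int]:
    """Return the issues visited, ending at the first one that is not a duplicate."""
    chain = [start]
    seen = {start}
    current = start
    while duplicate_of.get(current) is not None:
        current = duplicate_of[current]
        if current in seen:  # stop if the duplicate links loop back on themselves
            break
        chain.append(current)
        seen.add(current)
    return chain


# Hypothetical data: None marks an issue that is not closed as a duplicate.
DUPLICATE_OF = {12587: 12043, 12043: 11657, 11657: 10645, 10645: None}

print(follow_duplicates(12587, DUPLICATE_OF))  # [12587, 12043, 11657, 10645]
```

The last element of the returned list is the issue where the reader actually ends up, which in the comment's example is the one that never got a response.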

View on HN · Topics

Except one of the major other wrappers was pi, through OpenClaw. With countless hundreds of thousands of instances running every hour on that heartbeat.

I have no idea what the share of OpenClaw instances running on pi was, or third-party wrappers in general, but it was obviously large enough that Anthropic decided they had to put an end to it.

Conversely, from the latest developments, it would seem they are perfectly fine with people running OpenClaw with Claude models through Claude Code’s programmatic interface using subscriptions.

But in the end, this, my take, your take, is all conjecture. We are both on the outside looking in.

Only the people who work at Anthropic know.

View on HN · Topics

> As of April 23, we’re resetting usage limits for all subscribers.

Wait, didn't they just reset everybody's usage last Thursday, thereby syncing everybody's windows up? (Mine should have reset at 13:00 MDT.) So this is just the normal weekly reset? Except now my reset says it will come Saturday? This is super confusing!

View on HN · Topics

1. They changed the default in March from high to medium, however Claude Code still showed high (took 1 month 3 days to notice and remediate)

2. Old sessions had the thinking tokens stripped, resuming the session made Claude stupid (took 15 days to notice and remediate)

3. A system prompt change to make Claude less verbose reduced coding quality (4 days - better)

All this to say... the experience of suspecting a model is getting worse while Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.

Yes, models are complex and deploying them at scale given their usage uptick is hard. It's clear they are playing with too many independent variables simultaneously.

However you are obligated to communicate honestly to your users to match expectations. Am I being A/B tested? When was the date of the last system prompt change? I don't need to know what changed, just that it did, etc.

Doing this proactively would certainly match expectations for a fast-moving product like this.

View on HN · Topics

Sure, but it gives the impression of degraded model performance. Especially when the interface is still saying the model is operating on "high", the same as it did yesterday, yet it is in "medium" -- it just looks like the model got hobbled.

View on HN · Topics

> Anthropic publicly gaslights their user-base: "we never degrade model performance" is frustrating.

They're not gaslighting anyone here: they're very clear that the model itself, as in Opus 4.7, was not degraded in any way (i.e. if you take them at their word, they do not drop to lower quantisations of Claude during peak load).

However, the infrastructure around it - Claude Code, etc - is very much subject to change, and I agree that they should manage these changes better and ensure that they are well-communicated.

View on HN · Topics

They should really test everything thoroughly and then make it available to the general public to avoid these issues!!

View on HN · Topics

They don’t either.

View on HN · Topics

> It’s incredible how forgiving you guys are with Anthropic and their errors.

Ironically, I was thinking the exact opposite. This is bleeding edge stuff and they keep pushing new models and new features. I would expect issues.

I was surprised at how much complaining there is -- especially coming from people who have probably built and launched a lot of stuff and know how easy it is to make mistakes.

View on HN · Topics

I don't think Anthropic has to inform their customers of every change they make, but they should have with this one.

View on HN · Topics

As an end-user, I feel like they're kind of over-cooking and under-describing the features and behavior of what is a tool at the end of the day. Today the models are in a place where the context management, reasoning effort, etc. all need to be very stable to work well.

The thing about session resumption changing the context of a session by truncating thinking is a surprise to me; I don't think that's even documented behavior anywhere?

It's interesting to look at how many bugs are filed on the various coding agent repos. Hard to say how many are real / unique, but the quantities feel very high, and it's not hard to run into real bugs rapidly as you use various features and slash commands.

View on HN · Topics

To think we'd have known about this in advance if they'd just have open sourced Claude Code, rather than them being forced into this embarrassing post mortem. Sunlight is the best disinfectant.

View on HN · Topics

Hi Boris, random observer here. Would you consider apologizing to the community for mistakenly closing tickets related to this and then wrongly keeping them closed when, internally, you realized they were legitimate?

I think an apology for that incident would go a long way.

View on HN · Topics

not many would believe in the sincerity of it anyway.

View on HN · Topics

This reads like good news! They probably still lost a bunch of users due to the negative public sentiment and not responding quickly enough, but at least they addressed it with a good bit of transparency.

View on HN · Topics

Good on Anthropic for giving an update & token refund, given the recent rumors of an inexplicable drop in quality. I applaud the transparency.

View on HN · Topics

Not the first time. Still not showing thinking, are we?

View on HN · Topics

In other words we did the right things, but we understand feedback, oh and bugs happen.

View on HN · Topics

Something I note from this is that this is not a model weights change, but it is a hidden state change Anthropic is making to the outputs that can tune the quality of the "model" up and down without breaking the "we aren't changing the model" promise.

how often do these changes happen?

View on HN · Topics

Reading the "Going forward" section I see that they have zero understanding of the main complaints.

View on HN · Topics

I agree, but these LLM products are all black-boxes so we need to demand more accountability from them.

View on HN · Topics

The funny thing is, in the last 3 days Claude has gotten substantially worse. So this claim, "All three issues have now been resolved as of April 20 (v2.1.116)" does not land with me at all.

View on HN · Topics

Good on them for resolving all three issues, but is it any good again?

View on HN · Topics

Boris gaslighted us with all the quality related incidents for weeks not acknowledging these problems.

View on HN · Topics

Maybe he didn't know, or they were still figuring it out, which is fine: they're still engineers who can get things wrong sometimes. But the communication felt lackluster, and being on the receiving end sucks when you had a reliable setup which then degrades. There is a reason people don't upgrade software and why people say "if it works, don't fix it", but obviously that's not an option for Anthropic when they want to keep improving the product, so they need good measurement tools and quick rollbacks, even if properly "benchmarking" LLMs could prove difficult.

View on HN · Topics

I agree, but one can admit one's situation instead of rejecting the claims outright. My own mistake is having become so hopelessly dependent on them.

View on HN · Topics

It's really hard to understand. There need to be really loud, Bat-Signal-in-the-sky type signals from some hero third party calling out objective product degradation. Do they use cc internally? If so, do they use a different version? This should've been almost as loud a break as the service just going down altogether, yet it took 2 weeks to fix?!

View on HN · Topics

> we refunded all affected customers

Notably missing from the postmortem

View on HN · Topics

They should do a similar report about their communication team. This was horribly mismanaged.

View on HN · Topics

Corporate bs begins...

View on HN · Topics

Gaslit for months, only to acknowledge.
