Summarizer

Cost of AI Development

Analysis of the financial viability of AI coding, including hitting API rate limits, the high cost of Opus 4.5 tokens, and the potential unsustainability of VC-subsidized pricing.

← Back to Opus 4.5 is not the normal AI agent experience that I have had thus far

46 comments tagged with this topic

View on HN · Topics
Recently the v8 rust library changed it from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with HandleScope's and then says... oh I have a different scope and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. But that said, it is absolutely brilliant with React and Typescript.
View on HN · Topics
Have you ever worried that by programming in this way, you are methodically giving Anthropic all the information it needs to copy your product? If there is any real value in what you are doing, what is to stop Anthropic or OpenAI or whomever from essentially one-shotting Zed? What happens when the model providers 10x their costs and also use the information you've so enthusiastically given them to clone your product and use the money that you paid them to squash you?
View on HN · Topics
We were enjoying cheap second hand rack mount servers, RAM, hard drives, printers, office chairs and so on for a decade after the original dot com crash. Every company that went out of business liquidated their good shit for pennies. I'm hoping after AI comes back down to earth there will be a new glut of cheap second hand GPUs and RAM to get snapped up.
View on HN · Topics
Check out Antigravity+Google AI Pro $20 plan+Opus 4.5. apparently the Opus limits are insanely generous (of course that could change on a dime).
View on HN · Topics
There are several rubs with that operating protocol extending beyond the "you're holding it wrong" claim. 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success. 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1. 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves. 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics.
View on HN · Topics
I've said this multiple times: This is why you use this AI bubble (it IS a bubble) to use the VC-funded AI models for dirt cheap prices and CREATE tools for yourself. Need a very specific linter? AI can do it. Need a complex Roslyn analyser? AI. Any kind of scripting or automation that you run on your own machine. AI. None of that will go away or suddenly stop working when the bubble bursts. Within just the last 6 months I've built so many little utilities to speed up my work (and personal life) it's completely bonkers. Most went from "hmm, might be cool to..." to a good-enough script/program in an evening while doing chores. Even better, start getting the feel for local models. Current gen home hardware is getting good enough and the local models smart enough so you can, with the correct tooling, use them for suprisingly many things.
View on HN · Topics
A risk I see with this approach is that when the bubble pops, you'll be left dependent on a bunch of tools which you don't know how to maintain or replace on your own, and won't have/be able to afford access to LLMs to do it for you.
View on HN · Topics
Open source models hosted by independent providers (or even yourself, which if the bubble pops will be affordable if you manage to pick up hardware on fire sales) are already good enough to explain most code.
View on HN · Topics
> 1) There exists a threshold, only identifiable in retrospect, past which it would have been faster to locate or write the code yourself than to navigate the LLM's correction loop or otherwise ensure one-shot success. I can run multiple agents at once, across multiple code bases (or the same codebase but multiple different branches), doing the same or different things. You absolutely can't keep up with that. Maybe the one singular task you were working on, sure, but the fact that I can work on multiple different things without the same cognitive load will blow you out of the water. > 2) The intuition and motivations of LLMs derive from a latent space that the LLM cannot actually access. I cannot get a reliable answer on why the LLM chose the approaches it did; it can only retroactively confabulate. Unlike human developers who can recall off-hand, or at least review associated tickets and meeting notes to jog their memory. The LLM prompter always documenting sufficiently to bridge this LLM provenance gap hits rub #1. Tell the LLM to document in comments why it did things. Human developers often leave and then people with no knowledge of their codebase or their "whys" are even around to give details. Devs are notoriously terrible about documentation. > 3) Gradually building prompt dependency where one's ability to take over from the LLM declines and one can no longer answer questions or develop at the same velocity themselves. You can't develop at the same velocity, so drop that assumption now. There's all kinds of lower abstractions that you build on top of that you probably can't explain currently. > 4) My development costs increasingly being determined by the AI labs and hardware vendors they partner with. Particularly when the former will need to increase prices dramatically over the coming years to break even with even 2025 economics. You aren't keeping up with the actual economics. This shit is technically profitable, the unprofitable part is the ongoing battle between LLM providers to have the best model. They know software in the past has often been winner takes all so they're all trying to win.
View on HN · Topics
Cursor's planning functionality is very similar and I have found that I can even use "cheap" models like their Composer-1 and get great results in the planning phase, and then turn on Sonnet or Opus to actually produce the plan. 90% of the stuff I need to argue about is during the planning phase, so I save a ton of tokens and rework just making a really good spec. It turns out that Waterfall was always the correct method, it's just really slow ;)
View on HN · Topics
Yeah, if anyone can truly afford the AI empire. Remember all these "leading" companies are running it at a loss, so most companies paying for it are severely underpaying the cost of it all. We would need an insane technological breakthrough of unlimited memory and power before I start to worry, and at that point, I'll just look for a new career.
View on HN · Topics
This sounds like it would take a huge amount of tokens. I've never used agents so could you disclose how much you pay for it?
View on HN · Topics
If they're using Opus then it'll be the $100/month Claude Max 5x plan (could be the more expensive 20x plan depending on how intensive their use is). It does consume a lot of tokens, but I've been using the $100/mo plan and get a lot done without hitting limits. It helps to be mindful of context (regularly amending/pruning your CLAUDE.md instructions, clearing context between tasks, sizing your tasks to stay within the Opus context window). Claude Code plans have token limits that work in 5-hour blocks (that start when you send your first token, so it's often useful to prime it as early in the morning as possible). Claude Code will spawn sub-agents (that often use their cheap Haiki model) for exploration and planning tasks, with only the results imported into the main context. I've found the best results from a more interactive collaboration with Claude Code. As long as you describe the problem clearly, it does a good job on small/moderate tasks. I generally set two instances of Claude Code separate tasks and run them concurrently (the interaction with Claude Code distracts me too much to do my own independent coding simultaneously like with setting a task for a colleague, but I do work on architecture / planning tasks) The one manner of taste that I have had to compromise on is the sheer amount of code - it likes to write a lot of code. I have a better experience if I sweat the low-level code less, and just periodically have it clean up areas where I think it's written too much / too repetitive code. As you give it more freedom it's more prone to failure (and can often get itself stuck in a fruitless spiral) - however as you use it more you get a sense of what it can do independently and what's likely to choke on. A codebase with good human-designed unit & playwright tests is very good. Crucially, you get the best results where your tasks are complex but on the menial side of the spectrum - it can pay attention to a lot of details, but on the whole don't expect it to do great on senior-level tasks. To give you an idea, in a little over a month "npx ccusage" shows that via my Claude Code 5x sub I've used 5M input tokens, 1.5M output, 121M Cache Create, 1.7B Cache Read. Estimated pay-as-you-go API cost equivalent is $1500 (N.B. for the tail end of December they doubled everybody's API limits, so I was using a lot more tokens on more experimental on-the-fly tool construction work)
View on HN · Topics
FYI Opus is available and pretty usable in claude-code on the $20/Mo plan if you are at all judicious. I exclusively use opus for architecture / speccing, and then mostly Sonnet and occasionally Haiku to write the code. If my usage has been light and the code isn't too straightforward, I'll have Opus write code as well.
View on HN · Topics
That's helpful to know, thanks! I gave Max 5x a go and didn't look back. My suspicion is that Opus 4.5 is subsidised, so good to know there's flexibility if prices go up.
View on HN · Topics
The $20 plan for CC is good enough for 10-20 minutes of opus every 5h and you’ll be out of your weekly limit after 4-5 days if you sleep during the night. I wouldn’t be surprised if Anthropic actually makes a profit here. (Yeah probably not, but they aren’t burning cash.)
View on HN · Topics
I'm curious: With that much Claude Code usage, does that put your monthly Anthropic bill above $1000/mo?
View on HN · Topics
Mind sharing the bill for all that?
View on HN · Topics
My company pays for the team Claude code plan which is like $200 a month for each dev. The workflows cost like 10 - 50 cents a PR
View on HN · Topics
It will have to quintuple or more to make business sense for Anthropic. Sure, still cheaper than a full time developer, but don't expect it to stay at $200 for a long time. And then, when you explain to your boss how amazing it is, and can do all this work so easily and quickly, it's when your boss start asking the real question: what am I paying you for?
View on HN · Topics
A programmer, if we use US standards is probably $8000 per month. If you can get 30% more value out of that programmer (trust me, its WAY more then 30%), you gained $2400 of value. If you pay $200, $500, $1000 for that, its still a net positive. Ignoring the salary range of a actual senior... LLMs do not result in bosses firing people, it results in more projects / faster completed projects, what in turn means more $$$ for a company.
View on HN · Topics
More fundamentally: assume a 10 to 30% bump in actual productivity, find a niche (editing software, CRUD frameworks, SharePoint 2.0, stock trading, betting, whatever), and assume you had Anthropics billions or openAIs billions or Microsoft’s billions or Googles billions. Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts. Lockheed-Martin could be, but isn’t, opening lemonade stands outside their offices… they don’t because of how buying a Ferrari works.
View on HN · Topics
> Why on earth would you be hunting $20 a month subscriptions from random assed people? Peanuts. For the same reason Microsoft never has and never will chase people for pirating home Windows or Office licenses When they hit the workforce, or even better, start a company guess which OS and office suite they'll use? Hint: It's not Linux and Openoffice. Same with Claude's $20 package. It lets devs use it at home and then compare it to the Copilot shit their company is pushing on them. Maybe they either grumble enough to get a Claude license or they're in a position to make the call. Cheap advertising pretty much. Worked for me too :) I've paid my own Claude license for over a year at home, grumbled at work and we got a Claude pilot going now - and everyone who's tried it so far isn't going back to Copilot + Sonnet 4.5/GPT5.
View on HN · Topics
Im not sure about this. What they really need is to get rid of the free tier and widespread adoption. Inference on the $200 plan seems to be profitable right now so they just need more users to amortize training costs.
View on HN · Topics
All the evidence suggests that inference is quite profitable actually.
View on HN · Topics
It's $150, not a huge difference but worth noting that it's not the same ast the 20x Max plan.
View on HN · Topics
Cheaper than hiring another developer, probably. My experience: for a few dollars I was able to extensively refactor a Python codebase in half a day. This otherwise would have taken multiple days of very tedious work.
View on HN · Topics
i've never hit a limit with my $200 a month plan
View on HN · Topics
I'm at the point where I say fuck it, let them sleep. The tech industry just went through an insane hiring craze and is now thinning out. This will help to separate the chaff from the wheat. I don't know why any company would want to hire "tech" people who are terrified of tech and completely obstinate when it comes to utilizing it. All the people I see downplaying it take a half-assed approach at using it then disparage it when it's not completely perfect. I started tinkering with LLMs in 2022. First use case, speak in natural english to the llm, give it a json structure, have it decipher the natural language and fill in that json structure (vacation planning app, so you talk to it about where/how you want to vacation and it creates the structured data in the app). Sometimes I'd use it for minor coding fixes (copy and paste a block into chatgpt, fix errors or maybe just ideation). This was all personal project stuff. At my job we got LLM access in mid/late 2023. Not crazy useful, but still was helpful. We got claude code in 2024. These days I only have an IDE open so I can make quick changes (like bumping up a config parameter, changing a config bool, etc.). I almost write ZERO code now. I usually have 3+ claude code sessions open. On my personal projects I'm using Gemini + codex primarily (since I have a google account and chatgpt $20/month account). When I get throttled on those I go to claude and pay per token. I'll often rip through new features, projects, ideas with one agent, then I have another agent come through and clean things up, look for code smells, etc. I don't allow the agents to have full unfettered control, but I'd say 70%+ of the time I just blindly accept their changes. If there are problems I can catch them on the MR/PR. I agree about the low hanging fruit and I'm constantly shocked at the sheer amount of FUD around LLMs. I want to generalize, like I feel like it's just the mid/jr level devs that speak poorly about it, but there's definitely senior/staff level people I see (rarely, mind you) that also don't like LLMs. I do feel like the online sentiment is slowly starting to change though. One thing I've noticed a lot of is that when it's an anonymous post it's more likely to downplay LLMs. But if I go on linkedin and look at actual good engineers I see them praising LLMs. Someone speaking about how powerful the LLMs are - working on sophisticated projects at startups or FAANG. Someone with FUD when it comes to LLM - web dev out of Alabama. I could go on and on but I'm just ranting/venting a little. I guess I can end this by saying that in my professional/personal life 9/10 of the top level best engineers I know are jumping on LLMs any chance they get. Only 1/10 talks about AI slop or bullshit like that.
View on HN · Topics
Opus 4.5 ate through my Copilot quota last month, and it's already halfway through it for this month. I've used it a lot, for really complex code. And my conclusion is: it's still not as smart as a good human programmer. It frequently got stuck, went down wrong paths, ignored what I told it to do to do something wrong, or even repeat a previous mistake I had to correct. Yet in other ways, it's unbelievably good. I can give it a directory full of code to analyze, and it can tell me it's an implementation of Kozo Sugiyama's dagre graph layout algorithm, and immediately identify the file with the error. That's unbelievably impressive. Unfortunately it can't fix the error. The error was one of the many errors it made during previous sessions. So my verdict is that it's great for code analysis, and it's fantastic for injecting some book knowledge on complex topics into your programming, but it can't tackle those complex problems by itself. Yesterday and today I was upgrading a bunch of unit tests because of a dependency upgrade, and while it was occasionally very helpful, it also regularly got stuck. I got a lot more done than usual in the same time, but I do wonder if it wasn't too much. Wasn't there an easier way to do this? I didn't look for it, because every step of the way, Opus's solution seemed obvious and easy, and I had no idea how deep a pit it was getting me into. I should have been more critical of the direction it was pointing to.
View on HN · Topics
Copilot and many coding agents truncates the context window and uses dynamic summarization to keep costs low for them. That's how they are able to provide flat fee plans. You can see some of the context limits here: https://models.dev/ If you want the full capability, use the API and use something like opencode. You will find that a single PR can easily rack up 3 digits of consumption costs.
View on HN · Topics
Gerring off of their plans and prompts is so worth it, I know from experience, I'm paying less and getting more so far, paying by token, heavy gemini-3-flash user, it's a really good model, this is the future (distillations into fast, good enough for 90% of tasks), not mega models like Claude. Those will still be created for distillations and the harder problems
View on HN · Topics
Maybe not, then. I'm afraid I have no idea what those numbers mean, but it looks like Gemini and ChatGPT 4 can handle a much larger context than Opus, and Opus 4.5 is cheaper than older versions. Is that correct? Because I could be misinterpreting that table.
View on HN · Topics
I don't know about GPT4 but the latest one (GPT 5.2) has 200k context window while Gemini has 1m, five times higher. You'll be wanting to stay within the first 100k on all of them to avoid hitting quotas very quickly though (either start a new task or compact when you reach that) so in practice there's no difference. I've been cycling between a couple of $20 accounts to avoid running out of quota and the latest of all of them are great. I'd give GPT 5.2 codex the slight edge but not by a lot. The latest Claude is about the same too but the limits on the $20 plan are too low for me to bother with. The last week has made me realize how close these are to being commodities already. Even the CLI the agents are nearly the same bar some minor quirks (although I've hit more bugs in Gemini CLI but each time I can just save a checkpoint and restart). The real differentiating factor right now is quota and cost.
View on HN · Topics
> Opus 4.5 ate theough my Copilot quota last month Sure, Copilot charges 3x tokens for using Opus 4.5, but, how were you still able to use up half the allocated tokens not even one week into January? I thought using up 50% was mad for me (inline completions + opencode), that's even worse
View on HN · Topics
at this point I would suggest getting a $20 subscription to start, seeing if you can expense it the tooling is almost as important as the model
View on HN · Topics
I don’t think waiting for profitability makes sense. They can be massively disruptive without much profit as long as they spend enough money.
View on HN · Topics
I agree with your observation, but not your conclusion. The 20 times it failed basically don't matter -- they are branches that can just be thrown away, and all that was lost is a few dollars on tokens (ignoring the environmental impact, which is a different conversation). As long as it can do the thing on a faster overall timeline and with less human attention than a human doing it fully manually, it's going to win. And it will only continue to get better. And I don't know why people always jump to self-driving cars as the analogy as a negative. We already have self-driving cars. Try a Waymo if you're in a city that has them. Yes, there are still long-tail problems being solved there, and limitations. But they basically work and they're amazing. I feel similarly about agentic development, plus in most cases the failure modes of SWE agents don't involve sudden life and death, so they can be more readily worked around.
View on HN · Topics
With "art" we're now at a situation where I can get 50 variations of a image prompt within seconds from an LLM. Does it matter that 49 of them "failed"? It cost me fractions of a cent, so not really. If every one of the 50 variants was drawn by a human and iterated over days, there would've been a major cost attached to every image and I most likely wouldn't have asked for 50 variations anyway. It's the same with code. The agent can iterate over dozens of possible solutions in minutes or a few hours. Codex Web even has a 4x mode that gives you 4 alternate solutions to the same issue. Complete waste of time and money with humans, but with LLMs you can just do it.
View on HN · Topics
Refactoring does always cost something and I doubt LLMs will ever change that. The more interesting question is whether the cost to refactor or "rewrite" the software will ever become negligible. Until it isn't, it's short-sighted to write code in the manner you're describing. If software does become that cheap, then you can't meaningfully maintain a business on selling software anyway.
View on HN · Topics
One thing I've been tossing around in my head is: - How quickly is cost of refactor to a new pattern with functional parity going down? - How does that change the calculus around tech debt? If engineering uses 3 different abstractions in inconsistent ways that leak implementation details across components and duplicate functionality in ways that are very hard to reason about, that is, in conventional terms, an existential problem that might kill the entire business, as all dev time will end up consumed by bug fixes and dealing with pointless complexity, velocity will fall to nothing, and the company will stop being able to iterate. But if claude can reliably reorganize code, fix patterns, and write working migrations for state when prompted to do so, it seems like the entire way to reason about tech debt has changed. And it has changed more if you are willing to bet that models within a year will be much better at such tasks. And in my experience, claude is imperfect at refactors and still requires review and a lot of steering, but it's one of the things it's better at, because it has clear requirements and testing workflows already built to work with around the existing behavior. Refactoring is definitely a hell of a lot faster than it used to be, at least on the few I've dealt with recently. In my mind it might be kind of like thinking about financial debt in a world with high inflation, in that the debt seems like it might get cheaper over time rather than more expensive.
View on HN · Topics
Their thesis is that code quality does not matter as it is now a cheap commodity. As long as it passes the tests today it's great. If we need to refactor the whole goddamn app tomorrow, no problem, we will just pay up the credits and do it in a few hours.
View on HN · Topics
The fundamental assumption is completely wrong. Code is not a cheap commodity. It is in fact so disastrously expensive that the entire US economy is about to implode while we're unbolting jet engines from old planes to fire up in the parking lots of datacenters for electricity.
View on HN · Topics
It is massively cheaper than an overseas engineer. A cheap engineer can pump out maybe 1000 lines of low quality code in an hour. So like 10k tokens per hour for $50. So best case scenario $5/1000 tokens. LLMS are charging like $5 per million of tokens. And even if it is subsidized 100x it is still cheaper an order of magnitude than an overseas engineer. Not to mention speed. An LLM will spit out 1000 lines in seconds, not hours.
View on HN · Topics
It matters for all the things you’d be able to justify paying a programmer for. What’s about to change is that there will be tons of these little one-off projects that previously nobody could justify paying $150/hr for. A mass democratization of software development. We’ve yet to see what that really looks like.
View on HN · Topics
I've been using 5.2 a lot lately but hit my quota for the first time (and will probably continue to hit it most weeks) so I shelled out for claude code. What differences do you notice? Any 'metagame' that would be helpful?