Summarizer

Token Cost Considerations

Discussion of workflow being token-heavy and expensive. Comparisons between Claude subscription tiers. Arguments that simpler approaches save money while achieving similar results.

← Back to How I use Claude Code: Separation of planning and execution

While some developers find that upgrading to high-tier, premium subscriptions is the only way to achieve a productive flow, others warn that "token-heavy" workflows often lead to hallucination loops that can drain a budget in minutes. To combat these costs, many experienced users advocate for lean, homegrown orchestration—using tiered documentation like static specs and "working memory" files—to keep context windows focused and minimize unnecessary consumption. Ultimately, the consensus suggests a divide between those who view deep context as essential for complex problem-solving and skeptics who argue that excessive orchestration is a "net negative" that primarily benefits the AI providers' bottom lines.

19 comments tagged with this topic

View on HN · Topics
This is exactly what I do. I assume most people avoid this approach due to cost.
View on HN · Topics
> LLMs don’t usually fail at syntax

Really? My experience has been that it’s incredibly easy to get them stuck in a loop on a hallucinated API and burn through credits before I’ve even noticed what they’ve done. I have a small Rust project that stores stuff on disk that I wanted to add an S3 backend to. Claude Code burned through my $20 in a loop in about 30 minutes, without any awareness of what it was doing, on a very simple syntax issue.
View on HN · Topics
I think it does more harm than good on recent models. The LLM has to override its system prompt to role-play, wasting context and computing cycles instead of working on the task.
View on HN · Topics
I think "understand this directory deeply" just gives more focus for the instruction. So it's like "burn more tokens for this phase than you normally would".
View on HN · Topics
This all looks fine for someone who can't code, but for anyone with even a moderate amount of experience as a developer, all this planning and checking and prompting and orchestrating is far more work than just writing the code yourself. There's no prize for "least amount of code written regardless of productivity outcomes," except maybe Anthropic's bank account.
View on HN · Topics
I go a bit further than this and have had great success with 3 doc types and 2 skills:

- Specs: these are generally static, but updatable as the project evolves. They're broken out into an index file that gives a project overview, a high-level architecture file, and files for all the main modules. Roughly ~1k lines of spec per 10k lines of code, and I try to limit any particular spec file to 300 lines. I'm intimately familiar with every single line in these.
- Plans: these are the output of a planning session with an LLM. They point to the associated specs. These tend to be 100-300 lines and 3 to 5 phases.
- Working memory files: I use both a status.md (3-5 items per phase, roughly 30 lines overall), which points to the latest plan, and a project_status file (100-200 lines), which tracks the current state of the project and is instructed to compact past efforts to keep it lean.
- A planner skill I use with Gemini Pro to generate new plans. It essentially explains the specs/plans dichotomy and the role of the status files, reviews everything in the pertinent areas of code, and gives me a handful of high-level next features to address based on shortfalls in the specs or things noted in the project_status file. Based on what it presents, I select a feature or improvement to generate. It then proceeds to generate a plan, writes a clean status.md that points to the plan, and adjusts project_status based on the state of the prior completed plan.
- An implementer skill in Codex that goes to town on a plan file. It's fairly simple: it just looks at status.md, which points to the plan, and of course the plan points to the relevant specs, so it loads up context pretty efficiently.

I've tried the two main spec-generation libraries, which were way overblown, and then I gave superpowers a shot... which was fine, but still too much. The above is all homegrown, and I've had much better success because it keeps the context lean and focused. And I'm only on the $20 plans for Codex/Gemini, versus spending $100/month on CC for the half year prior, and I move quicker with no stall-outs due to token consumption, which was regularly happening with CC by the 5th day. Codex rarely dips below 70% available context when it puts up a PR after an execution run. Roughly 4/5 PRs land without issue, which is the inverse of what I experienced with CC using only planning mode.
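The pointer chain this commenter describes (status.md → latest plan → relevant specs) is what keeps the implementer's context small. A minimal sketch of how an outside tool could walk that chain and enforce the 300-line spec budget — the `plans/` and `specs/` directory names, the markdown-link convention, and the `load_context` helper are all assumptions for illustration, not the commenter's actual tooling:

```python
import re
from pathlib import Path

SPEC_LINE_LIMIT = 300  # the commenter's rule of thumb: keep each spec file under 300 lines


def referenced_paths(text: str) -> list[str]:
    """Pull out file references like plans/plan-001.md or specs/core.md from markdown text."""
    return re.findall(r"[\w./-]+\.md", text)


def load_context(status_file: str = "status.md") -> dict:
    """Follow the pointer chain: status.md -> latest plan -> the spec files it cites."""
    status = Path(status_file).read_text()
    plans = [p for p in referenced_paths(status) if p.startswith("plans/")]
    context = {"status": status_file, "plan": None, "specs": [], "oversized_specs": []}
    if plans:
        plan_path = Path(plans[0])  # status.md points at exactly one active plan
        context["plan"] = str(plan_path)
        for spec in referenced_paths(plan_path.read_text()):
            if not spec.startswith("specs/"):
                continue
            context["specs"].append(spec)
            # flag specs that have grown past the budget and should be split
            if len(Path(spec).read_text().splitlines()) > SPEC_LINE_LIMIT:
                context["oversized_specs"].append(spec)
    return context
```

Feeding the agent only `context["plan"]` plus `context["specs"]`, rather than the whole repo, is the mechanism behind the "rarely dips below 70% available context" observation.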
View on HN · Topics
This is the way. The practice is:

- simple
- effective
- retains control and quality

Certainly the "unsupervised agent" workflows are getting a lot of attention right now, but they require a specific set of circumstances to be effective:

- a clear validation loop (e.g. compile the kernel; here is a gcc that does so correctly)
- AI-enabled tooling (an MCP/CLI tool that will lint, test, and provide feedback immediately)
- oversight to prevent agents going off the rails (an open area of research)
- an unlimited token budget

That means most people can't use unsupervised agents. Not that they don't work; most people simply don't have an environment and task that is appropriate. By comparison, anyone with Cursor or Claude can immediately start using this approach, or their own variant of it. It doesn't require fancy tooling. It doesn't require an arcane agent framework. It works generally well across models. This is one of those few genuine pieces of good practical advice for people getting into AI coding. Simple. Obviously works once you start using it. No external dependencies. BYO tools to help with it, no "buy my AI startup xxx to help". No "star my github so I can get a job at $AI corp too". Great stuff.
View on HN · Topics
What I've read is that even with all the meticulous planning, the author still needed to intervene. Not at the end but in the middle, otherwise it would continue building out something wrong, and that's even harder to fix once it's done. It costs even more tokens. It's a net negative. You might say a junior might do the same thing, but I'm not worried about that: at least the junior learned something while doing it. They could do it better next time. They know the code and can change it from the middle, where it broke. It's a net positive.
View on HN · Topics
I have tried using this and other workflows for a long time and had never been able to get them to work (see chat history for details). This has changed in the last week, for three reasons:

1. Claude Opus. It's the first model where I haven't had to spend more time correcting things than it would've taken me to just do it myself. The problem is that Opus chews through tokens, which led to...

2. I upgraded my Claude plan. Previously, on the regular plan, I'd get about 20 minutes before running out of tokens for the session and then needing to wait a few hours to use it again. That was fine for little scripts or toy apps, but not feasible for the regular dev work I do. So I upgraded to 5x. That got me 1-2 hours per session before tokens expired, which was better but still a frustration. Wincing at the price, I upgraded again to the 20x plan, and this was the next game changer. I had plenty of spare tokens per session, and at that price it felt like they were being wasted, so I ramped up my usage. Following a similar process to OP's, but with a plans directory with subdirectories for backlog, active, and complete plans, plus skills with strict rules for planning, implementing, and completing plans, I now have 5-6 projects on the go. While I'm planning a feature on one, the others are implementing. The strict plans and controls keep them on track, and I have follow-up skills for auditing quality and performance. I still haven't hit token limits for a session, but I've almost hit my token limit for the week, so I feel like I'm getting my money's worth. In that sense, spending more has forced me to figure out how to use more.

3. The final piece of the puzzle is using opencode over Claude Code. I'm not sure why, but I just don't gel with Claude Code. Maybe it's all the sautéing and flibertygibbering, maybe it's all the permission asking, maybe it's that it doesn't show what it's doing as much as opencode does. Whatever it is, it just doesn't work well for me. opencode, on the other hand, is great. It shows what it's doing and how it's thinking, which makes it easy for me to spot when it's going off track and correct early.

Having a detailed plan, and correcting and iterating on the plan, is essential. Making Claude follow the plan is also essential, but there's a line: too fine-grained and it's not as creative at solving problems; too loose/high-level and it makes bad choices and goes in the wrong direction. Is it actually making me more productive? I think it is, but I'm only a week in. I've decided to give myself a month to see how it all works out. I don't intend to keep paying for the 20x plan unless I can see a path to using it to earn at least as much back.
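The backlog/active/complete plan lifecycle this commenter describes can be sketched as a tiny helper. Only the three subdirectory names come from the comment; the `promote` function, the file-move mechanics, and the single-plan-per-stage assumption are illustrative:

```python
import shutil
from pathlib import Path

# Stage subdirectories under plans/, in lifecycle order (names from the comment above)
STAGES = ("backlog", "active", "complete")


def promote(plan_name: str, root: str = "plans") -> Path:
    """Move a plan file to its next stage: backlog -> active -> complete."""
    root_path = Path(root)
    for stage, next_stage in zip(STAGES, STAGES[1:]):
        candidate = root_path / stage / plan_name
        if candidate.exists():
            dest = root_path / next_stage / plan_name
            dest.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(candidate), str(dest))
            return dest
    raise FileNotFoundError(f"{plan_name} not found in a promotable stage under {root}/")
```

Keeping the stage in the filesystem rather than inside the plan text means each of the 5-6 parallel projects can be scanned for its active plan without spending any model tokens.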
View on HN · Topics
Just don’t use Claude Code. I can use the Codex CLI with just my $20 subscription and never come close to any usage limits
View on HN · Topics
What if it's just slower so that your daily work fits within the paid tier they want?
View on HN · Topics
It isn’t slower. I use my personal ChatGPT subscription with Codex for almost everything at work, and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out. It’s never application code. It’s usually some combination of app code + Docker + AWS issue with my underlying infrastructure, created with whatever IaC I’m using for a client: Terraform, CloudFormation, or the CDK. I burned through $10 on Claude in less than an hour, and I only have $36 a day at $800 a month ($800 / 22 working days).
View on HN · Topics
> and use my $800/month company Claude allowance only for the tricky stuff that Codex can’t figure out.

It doesn’t seem controversial that the model that can solve more complex problems (which you admit the cheaper model can’t) costs more. For the things I use it for, I’ve not found any other model to be worth it.
View on HN · Topics
Not in the last 2 months. The current Claude subscription is a sunk cost for the next month. Maybe I’ll try Codex if Claude doesn’t lead anywhere.
View on HN · Topics
Curious: what are some cases where it'd make sense to not pay for the 20x plan (which is $200/month), and provide a whopping $800/month pay-per-token allowance instead?
View on HN · Topics
Who knows? It’s part of an enterprise plan. I work for a consulting company. There are a number of fallbacks. The first, if we are working on an internal project, is to use our internal AWS account and run Claude Code against the Anthropic models hosted on Bedrock: https://code.claude.com/docs/en/amazon-bedrock The second, if it is for a customer project, is to use their AWS account for development. Given the rate my company charges for me (my level as an American-based staff consultant carries the highest bill rate at the company), they are happy to let us use Claude Code with their AWS credentials. Besides, if we are using Bedrock-hosted Anthropic models, they know none of their secrets are going to Anthropic; they already have the required legal confidentiality/compliance agreements with AWS.
View on HN · Topics
Is it necessary to tell Claude to re-read the code folder when you come back a day later, or should we ask Claude to just pick up from the research.md file, thus saving some tokens?
View on HN · Topics
Sorry, but I don't get the hype around this post; isn't this what most people are doing? I want to see more posts on how to use Claude "smartly" without feeding it the whole codebase and polluting the context window, and more best practices on cost-efficient ways to use it. This workflow is clearly burning millions of tokens per session; for me it's a no.
View on HN · Topics
My workflow is a bit different:

- I ask the LLM for its understanding of a topic or an existing feature in the code. It's not really planning; it's more like building the model first.
- Based on its understanding, I can decide how great or small to scope something for the LLM.
- An LLM showing good understanding can deal with a big task fairly well.
- An LLM showing bad understanding still needs to be prompted to get it right.
- What helps a lot is reference implementations. Either I have existing code that serves as the reference, or I ask for a reference and review it.

A few folks at my work do it OP's way, but my arguments against doing it this way:

- Nobody is measuring the amount of slop within the plan. We only judge the implementation at the end.
- It's still non-deterministic: folks will have different experiences using OP's methods. If Claude updates its model, it outdates OP's suggestions by making them either better or worse. We don't evaluate when things get better; we only focus on things that haven't gone well.
- It's very token-heavy. LLM providers insist that you use many tokens to get the task done; it's in their best interest to get you to do this. For me, LLMs should be powerful enough to understand context with minimal tokens because of the investment in model training.

Both ways get the task done, and it just comes down to preference for now. I treat the LLM as model training + post-processing + input tokens = output tokens. I don't think this is the best way to do non-deterministic software development. We're still trying to shoehorn "old" deterministic programming into a non-deterministic LLM.