Summarizer

LLM Input

llm/065c6e83-d0d5-4aca-be3d-92768a8a3506/topic-6-9d11659c-03a8-4e79-9549-20fce998d677-input.json

Pretty-print

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Context Window Management # Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.
</topic>

<comments_about_topic>
1. > the workflow I’ve settled into is radically different from what most people do with AI coding tools

This looks exactly like what anthropic recommends as the best practice for using Claude Code. Textbook.

It also exposes a major downside of this approach: if you don't plan perfectly, you'll have to start over from scratch if anything goes wrong.

I've found a much better approach in doing a design -> plan -> execute in batches, where the plan is no more than 1,500 lines, used as a proxy for complexity.

My 30,000 LOC app has about 100,000 lines of plan behind it. Can't build something that big as a one-shot.

2. I do this too. Relatively small changes, atomic commits with extensive reasoning in the message (keeps important context around). This is a best practice anyway, but used to be excruciatingly much effort. Now it’s easy!

Except that I’m still struggling with the LLM understanding its audience/context of its utterances. Very often, after a correction, it will focus a lot on the correction itself making for weird-sounding/confusing statements in commit messages and comments.

3. > design -> plan -> execute in batches

This is the way for me as well. Have a high-level master design and plan, but break it apart into phases that are manageable. One-shotting anything beyond a todo list and expecting decent quality is still a pipe dream.

4. You don't start with 100k lines, you work in batches that are digestible. You read it once, then move on. The lines add up pretty quickly considering how fast Claude works. If you think about the difference in how many characters it takes to describe what code is doing in English, it's pretty reasonable.

5. Dunno. My 80k+ LOC personal life planner, with a native android app, eink display view still one shots most features/bugs I encounter. I just open a new instance let it know what I want and 5min later it's done.

6. In 5 min you are one shotting smaller changes to the larger code base right? Not the entire 80k likes which was the other comments point afaict.

7. Most of these AI coding articles seem to be about greenfield development.

That said, if you're on a serious team writing professional software there is still tons of value in always telling AI to plan first, unless it's a small quick task. This post just takes it a few steps further and formalizes it.

I find Cursor works much more reliably using plan mode, reviewing/revising output in markdown, then pressing build. Which isn't a ton of overhead but often leads to lots of context switching as it definitely adds more time.

8. For the last few days I've been working on a personal project that's been on ice for at least 6 years. Back when I first thought of the project and started implementing it, it took maybe a couple weeks to eke out some minimally working code.

This new version that I'm doing (from scratch with ChatGPT web) has a far more ambitious scope and is already at the "usable" point. Now I'm primarily solidifying things and increasing test coverage. And I've tested the key parts with IRL scenarios to validate that it's not just passing tests; the thing actually fulfills its intended function so far. Given the increased scope, I'm guessing it'd take me a few months to get to this point on my own, instead of under a week, and the quality wouldn't be where it is. Not saying I haven't had to wrangle with ChatGPT on a few bugs, but after a decent initial planning phase, my prompts now are primarily "Do it"s and "Continue"s. Would've likely already finished it if I wasn't copying things back and forth between browser and editor, and being forced to pause when I hit the message limit.

9. I go a bit further than this and have had great success with 3 doc types and 2 skills:

- Specs: these are generally static, but updatable as the project evolves. And they're broken out to an index file that gives a project overview, a high-level arch file, and files for all the main modules. Roughly ~1k lines of spec for 10k lines of code, and try to limit any particular spec file to 300 lines. I'm intimately familiar with every single line in these.

- Plans: these are the output of a planning session with an LLM. They point to the associated specs. These tend to be 100-300 lines and 3 to 5 phases.

- Working memory files: I use both a status.md (3-5 items per phase roughly 30 lines overall), which points to a latest plan, and a project_status (100-200 lines), which tracks the current state of the project and is instructed to compact past efforts to keep it lean)

- A planner skill I use w/ Gemini Pro to generate new plans. It essentially explains the specs/plans dichotomy, the role of the status files, and to review everything in the pertinent areas of code and give me a handful of high-level next set of features to address based on shortfalls in the specs or things noted in the project_status file. Based on what it presents, I select a feature or improvement to generate. Then it proceeds to generate a plan, updates a clean status.md that points to the plan, and adjusts project_status based on the state of the prior completed plan.

- An implementer skill in Codex that goes to town on a plan file. It's fairly simple, it just looks at status.md, which points to the plan, and of course the plan points to the relevant specs so it loads up context pretty efficiently.

I've tried the two main spec generation libraries, which were way overblown, and then I gave superpowers a shot... which was fine, but still too much. The above is all homegrown, and I've had much better success because it keeps the context lean and focused.

And I'm only on the $20 plans for Codex/Gemini vs. spending $100/month on CC for half year prior and move quicker w/ no stall outs due to token consumption, which was regularly happening w/ CC by the 5th day. Codex rarely dips below 70% available context when it puts up a PR after an execution run. Roughly 4/5 PRs are without issue, which is flipped against what I experienced with CC and only using planning mode.

10. This is pretty much my approach. I started with some spec files for a project I'm working on right now, based on some academic papers I've written. I ended up going back and forth with Claude, building plans, pushing info back into the specs, expanding that out and I ended up with multiple spec/architecture/module documents. I got to the point where I ended up building my own system (using claude) to capture and generate artifacts, in more of a systems engineering style (e.g. following IEEE standards for conops, requirement documents, software definitions, test plans...). I don't use that for session-level planning; Claude's tools work fine for that. (I like superpowers, so far. It hasn't seemed too much)

I have found it to work very well with Claude by giving it context and guardrails. Basically I just tell it "follow the guidance docs" and it does. Couple that with intense testing and self-feedback mechanisms and you can easily keep Claude on track.

I have had the same experience with Codex and Claude as you in terms of token usage. But I haven't been happy with my Codex usage; Claude just feels like it's doing more of what I want in the way I want.

11. Has anyone found a efficient way to avoid repeating the initial codebase assessment when working with large projects?

There are several projects on GitHub that attempt to tackle context and memory limitations, but I haven’t found one that consistently works well in practice.

My current workaround is to maintain a set of Markdown files, each covering a specific subsystem or area of the application. Depending on the task, I provide only the relevant documents to Claude Code to limit the context scope. It works reasonably well, but it still feels like a manual and fragile solution.
I’m interested in more robust strategies for persistent project context or structured codebase understanding.

12. Whenever I build a new feature with it I end up with several plan files leftover. I ask CC to combine them all, update with what we actually ended up building and name it something sensible, then whenever I want to work on that area again it's a useful reference (including the architecture, decisions and tradeoffs, relevant files etc).

13. For my longer spec files, I grep the subheaders/headers (with line numbers) and show this compact representation to the LLM's context window. I also have a file that describes what each spec files is and where it's located, and I force the LLM to read that and pull the subsections it needs. I also have one entrypoint requirements file (20k tokens) that I force it to read in full before it does anything else, every line I wrote myself. But none of this is a silver bullet.

14. I'm interested in this as well.

Skills almost seem like a solution, but they still need an out-of-band process to keep them updated as the codebase evolves. For now, a structured workflow that includes aggressive updates at the end of the loop is what I use.

15. In Claude Web you can use projects to put files relevant for context there.

16. And then you have to remind it frequently to make use of the files. Happened to me so many times that I added it both to custom instructions as well as to the project memory.

17. +1 to creating tickets by simply asking the agent to. It's worked great and larger tasks can be broken down into smaller subtasks that could reasonably be completed in a single context window, so you rarely every have to deal with compaction. Especially in the last few months since Claude's gotten good at dispatching agents to handle tasks if you ask it to, I can plan large changes that span multilpe tickets and tell claude to dispatch agents as needed to handle them (which it will do in parallel if they mostly touch different files), keeping the main chat relatively clean for orchestration and validation work.

18. The separation of planning and execution resonates strongly. I've been using a similar pattern when building with AI APIs — write the spec/plan in natural language first, then let the model execute against it.

One addition that's worked well for me: keeping a persistent context file that the model reads at the start of each session. Instead of re-explaining the project every time, you maintain a living document of decisions, constraints, and current state. Turns each session into a continuation rather than a cold start.

The biggest productivity gain isn't in the code generation itself — it's in reducing the re-orientation overhead between sessions.

19. > Most developers type a prompt, sometimes use plan mode, fix the errors, repeat.

> ...

> never let Claude write code until you’ve reviewed and approved a written plan

I certainly always work towards an approved plan before I let it lost on changing the code. I just assumed most people did, honestly. Admittedly, sometimes there's "phases" to the implementation (because some parts can be figured out later and it's more important to get the key parts up and running first), but each phase gets a full, reviewed plan before I tell it to go.

In fact, I just finished writing a command and instruction to tell claude that, when it presents a plan for implementation, offer me another option; to write out the current (important parts of the) context and the full plan to individual (ticket specific) md files. That way, if something goes wrong with the implementation I can tell it to read those files and "start from where they left off" in the planning.

20. > I am not seeing the performance degradation everyone talks about after 50% context window.

I pretty much agree with that. I use long sessions and stopped trying to optimize the context size, the compaction happens but the plan keeps the details and it works for me.

21. I agree with most of this, though I'm not sure it's radically different. I think most people who've been using CC in earnest for a while probably have a similar workflow? Prior to Claude 4 it was pretty much mandatory to define requirements and track implementation manually to manage context. It's still good, but since 4.5 release, it feels less important. CC basically works like this by default now, so unless you value the spec docs (still a good reference for Claude, but need to be maintained), you don't have to think too hard about it anymore.

The important thing is to have a conversation with Claude during the planning phase and don't just say "add this feature" and take what you get. Have a back and forth, ask questions about common patterns, best practices, performance implications, security requirements, project alignment, etc. This is a learning opportunity for you and Claude. When you think you're done, request a final review to analyze for gaps or areas of improvement. Claude will always find something, but starts to get into the weeds after a couple passes.

If you're greenfield and you have preferences about structure and style, you need to be explicit about that. Once the scaffolding is there, modern Claude will typically follow whatever examples it finds in the existing code base.

I'm not sure I agree with the "implement it all without stopping" approach and let auto-compact do its thing. I still see Claude get lazy when nearing compaction, though has gotten drastically better over the last year. Even so, I still think it's better to work in a tight loop on each stage of the implementation and preemptively compacting or restarting for the highest quality.

Not sure that the language is that important anymore either. Claude will explore existing codebase on its own at unknown resolution, but if you say "read the file" it works pretty well these days.

My suggestions to enhance this workflow:

- If you use a numbered phase/stage/task approach with checkboxes, it makes it easy to stop/resume as-needed, and discuss particular sections. Each phase should be working/testable software.

- Define a clear numbered list workflow in CLAUDE.md that loops on each task (run checks, fix issues, provide summary, etc).

- Use hooks to ensure the loop is followed.

- Update spec docs at the end of the cycle if you're keeping them. It's not uncommon for there to be some divergence during implementation and testing.

22. Doesn’t Claude code do this by switching between edit mode and plan mode?

FWIW I have had significant improvements by clearing context then implementing the plan. Seems like it stops Claude getting hung up on something.

23. I use amazon kiro.

The AI first works with you to write requirements, then it produces a design, then a task list.

The helps the AI to make smaller chunks to work on, it will work on one task at a time.

I can let it run for an hour or more in this mode. Then there is lots of stuff to fix, but it is mostly correct.

Kiro also supports steering files, they are files that try to lock the AI in for common design decisions.

the price is that a lot of the context is used up with these files and kiro constantly pauses to reset the context.

24. Is it required to tell Claude to re-read the code folder again when you come back some day later or should we ask Claude to just pickup from research.md file thus saving some tokens?

25. Sorry but I didn't get the hype with this post, isnt it what most of the people doing? I want to see more posts on how you use the claude "smart" without feeding the whole codebase polluting the context window and also more best practices on cost efficient ways to use it, this workflow is clearly burning million tokens per session, for me is a No

26. This is great. My workflow is also heading in that direction, so this is a great roadmap. I've already learned that just naively telling Claude what to do and letting it work, is a recipe for disaster and wasted time.

I'm not this structured yet, but I often start with having it analyse and explain a piece of code, so I can correct it before we move on. I also often switch to an LLM that's separate from my IDE because it tends to get confused by sprawling context.

27. Is this not just Ralph with extra steps and the risk of context rot?

28. The plan document and todo are an artifact of context size limits. I use them too because it allows using /reset and then continuing.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Context Window Management # Strategies for handling large codebases and context limits. Maintaining markdown files for subsystems, using skills, aggressive compaction. Concerns about context rot and performance degradation.

commentCount

← Back to job