
Context Management Techniques

Discussions on how to optimize context for AI agents, including the use of CLAUDE.md or AGENTS.md to establish rules, and the technical challenges of context limits and pruning during long sessions.


Effective context management for AI agents is evolving into a disciplined architectural practice where users leverage specialized configuration files like `CLAUDE.md` to document "invisible knowledge" and establish rigid operational constraints. A standout strategy involves designing "AI-friendly" codebases and modular sub-agent hierarchies to bypass context limits and reduce "context anxiety" during complex, long-running sessions. While some contributors emphasize the value of automated verification loops and outcome-weighted learning to prevent recurring errors, others highlight the persistent technical struggle against token consumption and the limitations of current UI tools. Ultimately, the consensus shifts from viewing AI as a simple assistant toward treating it as a high-level collaborator that thrives on well-defined specifications, functional APIs, and aggressive, parallelized review processes.

22 comments tagged with this topic

View on HN · Topics
I hope self-promotion isn't frowned upon, but I've been spending the past months figuring out a workflow [1] that helps tackle the "more complicated problems" and ensure long-term maintainability of projects built purely through Claude Code. Effectively, I try to:

- Not allow the LLM to make any implicit decisions; instead, it must confirm with the user;
- Ensure code is written in such a way that it's easy for LLMs to understand;
- Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.

It's based entirely on Claude Code sub-agents + skills. The skills almost all invoke a Python script that guides the agents through workflows. It's not a fast workflow: the planning phase alone frequently takes more than an hour. Execution is significantly faster, as (typically) most issues have already been discovered during planning (otherwise I'd consider it a bug and improve the workflow accordingly).

I'm under the impression that the Claude Code creator's post is also intended to raise awareness of certain Claude Code features, such as hand-offs to the cloud and back. Their workflow only works for small features. It reads a bit like someone took a "best practices" guide and turned it into a Twitter post: nice, but not nearly detailed enough for an actual workflow.

[1] https://github.com/solatis/claude-config/
View on HN · Topics
> Ensure code is written in such a way that it's easy to understand for LLMs;

> Capture all "invisible knowledge" around decisions and architecture that's difficult to infer from code alone.

I work on projects where people love to create all sorts of complex abstractions, but hate writing ADRs (so they don't), or often any comments at all, and when they do write them, they're not very well written. The expectation is that you should call the person who wrote something, or hold a multi-hour meeting where decisions get made and nothing is written down. That sort of environment is only conducive to manual work. Dear reader, avoid those places, and heed the advice above about documenting things.
View on HN · Topics
> Ensure code is written in such a way that it's easy to understand for LLMs

Over the summer last year, I had the AI (Gemini Pro 2.5) write base libraries from scratch that are easy for it to write code against. Now GPro3 can one-shot (with, at most, a single debug loop at the REPL) 100% of the normal code I need developed (back office/business-type code). Huge productivity booster. There are a few things that are very easy for humans but that AI struggles with; by removing them, the AI has been just fantastic to work with.
View on HN · Topics
How would you characterize code that is easy for AI to write code against? And wouldn't that also be true for humans?
View on HN · Topics
AI is greatly aided by clear usage examples and trigger calls, such as "Use when [xyz]" types of standard comments.
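The trigger-comment idea can be illustrated with a small hypothetical Python helper whose docstring tells the model exactly when (and when not) to reach for it:

```python
import time


def retry_with_backoff(fn, max_attempts=3, base_delay=1.0):
    """Retry a callable with exponential backoff.

    Use when: calling a flaky external service (network, third-party API)
    where transient failures are expected and retrying is safe/idempotent.
    Do NOT use for: non-idempotent writes or latency-critical paths.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts; surface the original error
            time.sleep(base_delay * 2 ** attempt)
```

The "Use when / Do NOT use for" lines cost a few tokens but give the model an unambiguous trigger, so it picks the right helper without reading the implementation.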
View on HN · Topics
All relevant code fits in context. Functional APIs. Standard data structures. Design documents for everything. I'm doing this in a Clojure context, so that helps—the core language/libraries are unusually stable and widely used and so feature-complete there's basically no hallucinations.
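A minimal sketch of what "functional APIs over standard data structures" can look like in practice; names and numbers are illustrative, and Python stands in for the commenter's Clojure:

```python
# A pure function over standard data structures: no classes, no hidden
# state, and inputs/outputs are plain dicts and lists the model has seen
# countless times in training data.
def apply_discount(order, percent):
    """Return a new order dict with `total` set to the discounted sum."""
    items_total = sum(i["price"] * i["qty"] for i in order["items"])
    discounted = round(items_total * (1 - percent / 100), 2)
    return {**order, "total": discounted}  # input dict is never mutated


order = {"items": [{"price": 10.0, "qty": 2}, {"price": 5.0, "qty": 1}]}
```

Because the function is pure and the shapes are ordinary, a model (or a human) can verify it from the signature and one example alone.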
View on HN · Topics
You should try Claude Opus 4.5 then. I haven't had that issue. The key is that you need well-defined specs and detailed instructions for each agent.
View on HN · Topics
Agree. People are stuck applying the "agent" = "employee" analogy and think they are more productive by having a team/company of agents. Unless you've perfectly spec'ed and detailed multiple projects up front, the speed of a single agent shouldn't be the bottleneck.
View on HN · Topics
That’s how it works though. You create a detailed spec up front. That’s the workflow.
View on HN · Topics
> I need 1 agent that successfully solves the most important problem.

If you only have that one problem, that is a reasonable criticism, but you may have 10 different problems and want to focus on the important one while the smaller stuff is AIed away.

> I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work.

I am generally happy with the assumptions it makes when given few requirements. In a lot of cases I just need a feature, and the specifics are fairly open or very obvious given the context. For example, I am adding MFA options to one project. As I already have MFA for another portal on it, I just told Claude to add MFA options for all users: a single sentence with no details. The result seems perfectly serviceable, if in need of some CSS changes.
View on HN · Topics
> Well, that's what the CI is for. :)

Sure, but I'd prefer to catch it before that, not least because it's a simpler feedback loop for ensuring Claude fixes its own messes.

> In any case, it seems like a good idea to also feed the output of compiler errors and warnings and the linter back to your coding agent.

Claude seems to "love" using linters and error messages if it's given the chance and/or the project structure hints at an ecosystem where certain tools are usually available. Often, just listing by name a set of commands it can use to check things in CLAUDE.md is enough to have it run them aggressively. If that's not enough, you can use hooks to either force it, or sternly remind it after every file edit or before it attempts to git commit.
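As a sketch of the hook approach mentioned above, a post-edit lint hook might live in `.claude/settings.json` roughly like this; the event and field names here reflect my reading of the Claude Code hooks feature and should be verified against the current documentation, and the lint script path is a hypothetical placeholder:

```json
{
  "hooks": {
    "PostToolUse": [
      {
        "matcher": "Edit|Write",
        "hooks": [
          { "type": "command", "command": "./scripts/lint-changed.sh" }
        ]
      }
    ]
  }
}
```

The idea is simply that the lint command fires automatically after every file edit, rather than relying on the model remembering to run it.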
View on HN · Topics
> I don't understand how you can generate requirements quickly enough to have 10 parallel agents chewing away at meaningful work.

You use agents to expand the requirements as well, either in plan mode (as OP does) or with a custom scaffold (rules in CLAUDE.md about how to handle requirements; personally I prefer giving Claude the latitude to start when it's ready rather than wait for my go-ahead).

> I don't understand how you can have any meaningful supervising role over 10 things at once given the limits of human working memory.

[This got long. TL;DR, this is what works for me: stop worrying about individual steps; use sub-agents and slash commands to encapsulate units of work so Claude runs longer; use permissions to allow as much as you dare (and/or run in a VM) so Claude can keep going; give Claude tools to verify its work (linters, test suites, sub-agents double-checking the work against the spec) and make it use them; don't sit and read individual parts of the conversation, since it will only infuriate you to see Claude make stupid mistakes, but if well scaffolded it will fix them before it returns the code to you, so stop reading, breathe, and let it work; only verify once Claude has worked for a long time and checked its own work. That way you review far less code, and far more complete and coherent changes.]

You don't. You wait until each agent is done, and you review the PRs.

To make this kind of thing work well you need agents and slash commands, like OP does. Sub-agents in particular help prevent the top-level agent's "context anxiety": Claude Code appears to have knowledge of its context use, and will be prone to stopping before context runs out. Sub-agents use their own context, and the top-level agent only spends context managing the input to and output from them, so the more work is farmed out to sub-agents, the longer Claude Code is willing to run.

When I got up this morning, Claude Code had run all night and produced about 110k words of output. This also requires extensive permissions to use safe tools without asking (what OP does), or --dangerously-skip-permissions (I usually do this; you might want to put this in a container/VM, as it will happily do things like "killall -9 python" without "thinking through" the consequences; I've had it kill the terminal it itself ran in), or it'll stop far too quickly.

You'll also want to explicitly tell it to do things in parallel when possible. E.g., if you want to use it as a "smarter linter" (DO NOT rely on it as the only linter; use a regular one too, but using Claude to apply more complex rules that require some reasoning works great), you can ask it to "run the linter agent in parallel on all TypeScript files", and it will tend to spawn multiple sub-agents running in parallel and metaphorically twiddle its thumbs waiting for them to finish (it's fun seeing it get "bored" and decide to do other things in the meantime, or get impatient and check on progress obsessively).

You'll also want to make Claude use sub-agents to review, verify, and test its work, with instructions to repeat until all the verification sub-agents give its changes a PASS (see 12/ and 13/ in the thread); there is no reason for you to waste your time reviewing code that Claude itself can tell isn't ready.

[Concrete example: "vanilla" Claude "loves" using instance_variable_get() in Ruby when facing a class that is missing an accessor for an instance variable. Whether you know Ruby or not, that should stand out like a sore thumb: it's a horrifically gross code smell, as it basically bypasses encapsulation entirely. But you shouldn't worry about that. If you write Ruby with Claude, you'd want a rule in CLAUDE.md telling it how to address missing accessors, and a sub-agent, and possibly a hook, making sure Claude is told to fix it immediately if it ever uses it.]

Farming work off to sub-agents both makes Claude willing to work longer, especially on "boring" tasks, and avoids the problem where it looks at past work, decides it already "knows" this code is ready, and starts skipping steps.

The key thing is to stop obsessing over every step Claude takes, and treat it like a developer experimenting with something they're not yet sure how to do. If you let it work, its instructions are good, and it has ways of checking its work, it will figure out that its first attempts are broken, fix them, and leave you with output that takes far less of your time to review. When Claude tells you it's done with a change and you spot egregious problems, fix your CLAUDE.md, fix your planning steps, fix your agents.

None of the above will absolve you of reviewing code, and you will need to kick things back and have it fix them, and sometimes that will be tedious. But Claude is good enough that the problems you have it fix should be complex ones, not simple code smells or logic errors, and 9 out of 10 times they should signal that your scaffold is missing important detail about your project, or that your spec is incomplete at a functional/acceptance-criteria level (not in low-level detail).
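The "repeat until the verification sub-agents pass" loop described above can be sketched in miniature. Here `run_agent` and the verifier callables are hypothetical stand-ins for real sub-agent invocations:

```python
# Toy sketch of a fix-and-verify loop: run every verifier, feed the
# failures back to the worker agent, and repeat until all pass.
def fix_and_verify(change, run_agent, verifiers, max_rounds=5):
    """Ask the worker agent to revise `change` until every verifier passes."""
    for _ in range(max_rounds):
        failures = [name for name, check in verifiers if not check(change)]
        if not failures:
            return change  # all verifiers report PASS
        change = run_agent(change, feedback=failures)
    raise RuntimeError(f"still failing after {max_rounds} rounds: {failures}")
```

The human only sees the return value, i.e. a change that has already survived every automated check, which is the point the commenter makes about not reviewing code the agent itself can tell isn't ready.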
View on HN · Topics
Same thing happens to me in long enough sessions in xterm. Anecdotally, it's pretty much guaranteed if I continue a session close to the point of context compaction, or if the context suddenly expands with some tool call.

Edit: for a while I thought this was by design, since it was a very visceral/graphical way to feel that you're hitting the edge of the context and should probably end the session. If I get to the flicker point I generally start a new session. From what I have observed, though, the flicker point always happens eventually.
View on HN · Topics
Claude Code is fairly simple. But Claude Desktop is a freaking mess: it loses chats when I switch tabs, it has no easy way to auto-extend the context, and it's just slow.
View on HN · Topics
What I find surprising is how much human intervention the creator of Claude Code uses. Every time Claude does something bad, we write it in claude.md so it learns from it... Why not create an agent to handle this and learn automatically from previous implementations?

B: Outcome Weighting

```python
# memory/store.py
OUTCOME_WEIGHTS = {
    RunOutcome.SUCCESS: 1.0,    # Full weight
    RunOutcome.PARTIAL: 0.7,    # Some issues but shipped
    RunOutcome.FAILED: 0.3,     # Downweighted but still findable
    RunOutcome.CANCELLED: 0.2,  # Minimal weight
}

# Applied during scoring:
final_score = score * decay_factor * outcome_weight
```

C: Anti-Pattern Retrieval

```python
# Similar features → SUCCESS/PARTIAL only
similar_features = store.search(..., outcome_filter=[SUCCESS, PARTIAL])

# Anti-patterns → FAILED only (separate section)
anti_patterns = store.search(..., outcome_filter=[FAILED])
```

Injected into the agent prompt:

```
## Similar Past Features (Successful)
1. "Add rate limiting with Redis..." (Outcome: success, Score: 0.87)

## Anti-Patterns (What NOT to Do)
_These similar attempts failed - avoid these approaches:_
1. "Add rate limiting with in-memory..." (FAILED, Score: 0.72)

## Watch Out For
- **Redis connection timeout**: Set connection pool size
```

The flow now:

```
Query: "Add rate limiting"
│
├──► Similar successful features (ranked by outcome × decay × similarity)
│
├──► Failed attempts (shown as warnings)
│
└──► Agent sees both "what worked" AND "what didn't"
```
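The scoring rule quoted in that comment (final_score = score × decay_factor × outcome_weight) can be made concrete in a few runnable lines; the exponential decay half-life and the string-keyed outcome table are my own illustrative choices, not the commenter's:

```python
# Hedged sketch of outcome-weighted memory scoring: similarity is
# multiplied by a recency decay and by how well the past run turned out.
OUTCOME_WEIGHTS = {"success": 1.0, "partial": 0.7, "failed": 0.3, "cancelled": 0.2}


def final_score(similarity, age_days, outcome, half_life_days=30.0):
    """Score a stored run for retrieval: newer + more successful ranks higher."""
    decay = 0.5 ** (age_days / half_life_days)  # halves every half_life_days
    return similarity * decay * OUTCOME_WEIGHTS[outcome]
```

With this shape, a month-old partial success scores 0.5 × 0.7 of its raw similarity, so failed attempts stay findable (as anti-patterns) without outranking what actually worked.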
View on HN · Topics
I believe slash commands are all loaded into the initial context and executed when invoked by the user. Skills on the other hand only load the name and description into initial context, and the agent (not user) determines when to invoke them, and only then is the whole skill loaded into context. So skills shift decision making to the agent and use progressive disclosure for context efficiency.
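As a sketch of that progressive disclosure: a skill is a directory containing a SKILL.md whose YAML frontmatter carries the name and description, and only that frontmatter is loaded up front. The skill name below is hypothetical, and the exact file layout should be checked against Anthropic's skills documentation:

```markdown
---
name: db-migrations
description: Use when adding or altering database tables; generates and
  verifies a migration script before it is applied.
---

(Body loaded only when the agent invokes the skill.)
Step-by-step instructions, commands, and examples go here...
```

At startup the agent sees only the name/description pair; the full body enters the context only if the agent decides the skill applies.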
View on HN · Topics
I don't understand how these setups scale long-term, and even more so for the average user. The latter is relevant because, as he points out, his setup isn't that far out of reach of the average person: it's still fairly close to out-of-the-box Claude Code, and Opus. But between the model qualities varying, the pricing, the timing, and the tools constantly changing, I think it's really difficult to build institutional knowledge and a setup that can be used beyond a few weeks.

In the era of AI, I don't think it's good enough to "have" a working product. It's also important to have all the other things that make a project way more productive, like stellar documentation, better abstractions, and clearer architecture. In terms of AI, there's gotta be something better than just a markdown file with random notes. Like, what happens when an agent does something because it's picking it up from some random Slack convo, or some minor note in a 10k-line claude.md file? It just seems like the wild west, where basic ideas like additional surface area being a liability are ignored because we're too early in the cycle.

tl;dr: if it's just pushing around typical mid-level code, then... I just think that's falling behind.
View on HN · Topics
Boris is a power user's power user. I would highly recommend every project maintain a simple, well-written AGENTS.md file. At first it may seem like a nitpickier README, but you will quickly see how much coding agents benefit from the added context. IMO, the two most important things to include in AGENTS.md are frequent commands and verification methods. A third thing I've started adding to my projects is a list of related documentation and libraries that may not be immediately obvious: things like Confluence pages and other repos associated with the project.
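A hedged sketch of what such an AGENTS.md might contain, covering the three items the comment names; every command, path, and repo name below is a hypothetical placeholder:

```markdown
# AGENTS.md

## Frequent commands
- `make dev` — run the app locally
- `make test` — run the full test suite
- `npx eslint src/` — lint the TypeScript sources

## Verifying changes
Run `make test && npx eslint src/` before declaring a change done.

## Related docs and repos (not obvious from this repo)
- Architecture decisions: docs/adr/
- Shared API client lives in the sibling repo `acme-client`
```

The "verifying changes" section is what lets an agent close its own feedback loop instead of declaring victory after the first compile.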
View on HN · Topics
One of my side projects has been to recover a K&R C computer algebra system from the 1980s and port it to modern 64-bit C. I'd have eight tabs at a time assigned files from a task server, making passes at 60 or so files. This nearly worked; I'm paused until I can have an agent with a context window that can look at all the code at once, or I'll attempt a fresh translation based on what I learned.

With a $200 monthly Max subscription, I would regularly stall after completing significant work, but this workflow was feasible. I tried my API key for an hour once; it taught me to laugh at the $200 as quite a deal. I agree that Opus 4.5 is the only reasonable use of my time. We wouldn't hire some guy off the fryer line to be our CTO; coding needs best effort.

Nevertheless, I thought my setup was involved, but if Boris considers his to be vanilla ice cream, then I'm drinking skim milk.
View on HN · Topics
Yeah... I had a fairly in-depth conversation with Claude a couple of days ago about Claude Code and the way it works, usage limits, and how other AI coding tools compare, and the extremely blunt advice from Claude was that Claude Code was not suitable for serious software development due to usage limits! (Props to Anthropic for not sugar-coating it.) Maybe on the Max 20x plan it becomes viable, and no doubt on the Boris Cherny unlimited-usage plan it does, but it seems that without very aggressive, non-stop context pruning you will rapidly hit limits and the 5-hour timeout even working with a single session, let alone 5 Claude Code sessions and another 5-10 web ones!

The key to this is the way that Claude Code (the local part) works and interacts with Claude AI (the actual model, running in the cloud). Basically, Claude Code maintains the context, comprising mostly the session history, the contents of source files it has accessed, and the read/write/edit tools (based on Node.js) it provides to Claude AI. This entire context, including all files that have been read and the tool definitions, is sent to Claude AI (eating into your token usage limit) with EVERY request, so once Claude Code has accessed a few source files, the content of those files will "silently" be sent as part of every subsequent request, regardless of what it is. Claude gave me an example where, with 3 smallish files open (a few thousand lines of code), within 5 requests the token usage might be 80,000 or so, vs. the 40,000 limit of the Pro plan or the 200,000 limit of the Max 5x plan. Once you hit the limit, you have to wait 5 hours for a usage reset, so without Cherny's infinite usage limit this becomes a game of hurry up and wait (make 5 requests, then wait 5 hours and make 5 more).

You can restrict what source files Claude Code has access to, to try to manage context size (e.g. in a C++ project, let it access all the .h module-definition files but block all the .cpp ones), as well as manually inspecting the context to see what is being sent that can be removed. I believe there is some automatic context compaction happening periodically too, but apparently not enough to prevent many/most people hitting usage timeouts when working on larger projects.

Not relevant here, but Claude also explained how Cursor manages to provide fast/cheap autocomplete using its own models, by building a vector index of the code base to pull only relevant chunks of code into the context.
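The retrieval idea in that last paragraph can be illustrated with a toy sketch: chunk the codebase, embed each chunk (here a bag-of-words counter stands in for a real embedding model), and pull only the top-scoring chunks into the prompt instead of resending every open file:

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy 'embedding': a bag-of-words token count (a real system
    would use a learned embedding model instead)."""
    return Counter(re.findall(r"\w+", text.lower()))


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def top_chunks(query, chunks, k=2):
    """Return the k code chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]
```

Only the returned chunks need to be included in the request, which is why an index-based approach keeps per-request token usage roughly constant instead of growing with every file the session has touched.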
View on HN · Topics
Put it in CLAUDE.md because that just gets added to the prompt
View on HN · Topics
A classic Hacker News post that will surely interest coders from all walks of life!

After regular use of an AI coding assistant for some time, I've noticed something unusual: my biggest wins came from neither better prompts nor a smarter model. They came from the way I operated. At first I thought of it as autocomplete; later, as something like a junior developer; in the end, as a collaborator that requires constraints.

Here is the framework I've landed on. Stage one: ask for everything; you get acceleration, but lots of noise. Stage two: add rules; less shock, more trust. Stage three: give it room to act, but don't hesitate to review aggressively.

A few habits made a big difference: specifying what it can touch; asking it to explain diffs before applying them; treating "wrong but confident" answers as a signal to tighten scope.

I'm curious what others see over time. What changed after the second or fourth week? When did your trust increase or decrease? What rules do you wish you had added earlier?