Summarizer

Prompt Engineering and Configuration

Strategies involving `CLAUDE.md`, `AGENTS.md`, and custom system prompts to teach the AI coding conventions, architecture, and specific skills for better output.


52 comments tagged with this topic

View on HN · Topics
Most software engineers are seriously sleeping on how good LLM agents are right now, especially something like Claude Code. Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.

For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box. We also had Claude Code create a bunch of ESLint automation, including custom ESLint rules and lint checks that catch and auto-handle a lot of stuff before it even hits review.

Then we take it further: we have a deep code review agent Claude Code runs after changes are made. And when a PR goes up, we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we’ve written for it. On top of that, we’ve got like five other Claude Code GitHub workflow agents that run on a schedule. One of them reads all commits from the last month and makes sure docs are still aligned. Another checks for gaps in end-to-end coverage. Stuff like that. A ton of maintenance and quality work is just… automated. It runs ridiculously smoothly.

We even use Claude Code for ticket triage. It reads the ticket, digs into the codebase, and leaves a comment with what it thinks should be done. So when an engineer picks it up, they’re basically starting halfway through already. There is so much low-hanging fruit here that it honestly blows my mind people aren’t all over it. 2026 is going to be a wake-up call.

(used voice to text then had claude reword, I am lazy and not gonna hand write it all for yall sorry!)

Edit: made an example repo for ya https://github.com/ChrisWiles/claude-code-showcase
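For a sense of what one of those skills could look like, here is a minimal sketch of a SKILL.md (skills live in .claude/skills/<name>/SKILL.md with YAML frontmatter; the directory name, rules, and package names below are invented for illustration, not taken from the linked repo):

```markdown
<!-- .claude/skills/ui-library/SKILL.md (hypothetical example) -->
---
name: ui-library
description: How to build UI with our in-house component library. Use when creating or modifying React components.
---

# Using the UI library

- Import components from `@acme/ui`, never from `src/components/legacy`.
- Every new component gets a Storybook story next to it (`Component.stories.tsx`).
- Use the `<Stack>` and `<Grid>` primitives for layout; no raw flexbox in app code.
- Theme tokens come from `useTheme()`; hard-coded colors fail lint.
```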
View on HN · Topics
Recently the v8 Rust library changed from mutable handle scopes to pinned scopes. A fairly simple change that I even put in my CLAUDE.md file. But it still generates methods with `HandleScope`s and then says... oh, I have a different scope, and goes on a random walk refactoring completely unrelated parts of the code. All the while Opus 4.5 burns through tokens. Things work great as long as you are testing on the training set. That said, it is absolutely brilliant with React and TypeScript.
View on HN · Topics
This isn't meant as a criticism, or to doubt your experience, but I've talked to a few people who had experiences like this. But I helped them get Claude Code set up, analyze the codebase and document the architecture into markdown (edited as needed after), create an agent for the architecture, and prompt it in an incremental way. Maybe 15-30 minutes of prep. Everyone I helped with this responded with things like "This is amazing", "Wow!", etc. For some things you can fire up Claude and have it generate great code from scratch. But for bigger code bases and more complex architecture, you need to break it down ahead of time so it can just read about the architecture rather than analyze it every time.
View on HN · Topics
Is there any good documentation out there about how to perform this wizardry? I always assumed that if you ran /init in a new code base, Claude would set itself up to maximize its own understanding of the code. If there are extra steps that need to be done, why don't Claude's developers just add those extra steps to /init?
View on HN · Topics
Not that I have seen, which is probably a big part of the disconnect. Mostly it's tribal knowledge. I learned through experimentation, but I've seen tips here and there. Here's my workflow (roughly):

> Create a CLAUDE.md for a C++ application that uses libraries x/y/z

[Then I edit it, adding general information about the architecture]

> Analyze the library in the xxx directory, and produce a xxx_architecture.md describing the major components and design

> /agent

[Let Claude make the agent, but when it asks what you want it to do, explain that you want it to specialize in subsystem xxx, and refer to xxx_architecture.md]

Then repeat until you have the major components covered. Then:

> Using the files named with architecture.md, analyze the entire system and update CLAUDE.md to refer to them and use the specialized agents.

Now, when you need to do something, put it in planning mode and say something like:

> There's a bug in the xxx part of the application, where when I do yyy, it does zzz, but it should do aaa. Analyze the problem and come up with a plan to fix it, and automated tests you can perform if possible.

Then iterate on the plan with it if you need to, or just approve it. One of the most important things you can do when dealing with something complex is let it come up with a test case so it can fix or implement something and then iterate until it's done. I had an image processing problem and I gave it some sample data, then it iterated (looking at the output image) until it fixed it. It spent at least an hour, but I didn't have to touch it while it worked.
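To make that middle step concrete, here is a rough sketch of what one of those xxx_architecture.md files might contain (the subsystem, classes, and file names are invented for illustration):

```markdown
<!-- renderer_architecture.md (hypothetical example) -->
# Renderer subsystem

## Purpose
Converts the scene graph into draw calls. Entry point: `Renderer::render()` in src/renderer/renderer.cpp.

## Major components
- `SceneCompiler` (scene_compiler.cpp): flattens the scene graph into render batches.
- `MaterialCache` (material_cache.cpp): deduplicates shader/material state.
- `CommandEncoder` (command_encoder.cpp): translates batches into GPU commands.

## Design notes
- All GPU state changes go through `CommandEncoder`; nothing else touches the device.
- Batches are immutable once compiled; invalidation happens at the scene-graph layer.
```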
View on HN · Topics
This is some great advice. What I would add is to avoid the internal plan mode and just build your own. The built-in one creates md files outside the project, gives the files random names, and they're hard to reference in the future. It's also hard to steer the plan mode or have it remember some behavior that you want to enforce. It's much better to create a custom command with custom instructions that acts as the plan mode. My system works like this: an /implement command acts as an orchestrator & plan mode, and it is instructed to launch a predefined set of agents based on the problem and have them utilize specific skills. Every time the /implement command is initiated, it has to create a markdown file inside my own project, and then each subagent is also instructed to update the file when it finishes working. This way, the orchestrator can spot that an agent misbehaved, and the reviewer agent can see what the developer agent tried to do and why it was wrong.
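As a sketch of how that might look on disk (Claude Code custom slash commands are markdown files under .claude/commands/, and $ARGUMENTS is substituted with whatever follows the command; the agents, paths, and steps below are invented):

```markdown
<!-- .claude/commands/implement.md (hypothetical example) -->
Act as an orchestrator for: $ARGUMENTS

1. Create docs/plans/<date>-<slug>.md describing the task, the plan, and the agents you will use.
2. Launch the planner agent to break the task into steps; append its output to the plan file.
3. For each step, launch the developer agent with the relevant skill; it must update the plan file when done.
4. Launch the reviewer agent to compare what the developer did against the plan and record findings.
5. If the reviewer flags a step, re-run it; otherwise summarize the outcome at the end of the plan file.
```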
View on HN · Topics
/commands are like macros or mayyybe aliases. You just put in the commands you see yourself repeating often, like "commit the unstaged files in distinct commits, use xxx style for the commit messages..." - then you can iterate on it if you see any gaps or confusion, even give example commands to use in the different steps.

Skills, on the other hand, are commands ON STEROIDS. They can be packaged with actual scripts and executables; the PEP 723 Python style + uv is super useful. I have one skill, for example, that uses Python + Treesitter to check the unit test quality of a Go project. It does some AST magic to check the code for repetition, stupid things like sleeps and relative timestamps, etc. A /command _can_ do it, but it's not as efficient; the scripts for the skill are specifically designed for LLM use and output the result in a hyper-compact form a human could never be arsed to read.
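For flavor, here is a minimal sketch of that kind of skill script: PEP 723 inline metadata so `uv run` can execute it, and tree-sitter to flag `time.Sleep` calls in Go test files. The check and the output format are invented placeholders; a real version would look for much more:

```python
# /// script
# requires-python = ">=3.11"
# dependencies = ["tree-sitter>=0.22", "tree-sitter-go"]
# ///
"""Flag time.Sleep calls in Go test files. Usage: uv run check_go_tests.py <dir>"""
import sys
from pathlib import Path

import tree_sitter_go
from tree_sitter import Language, Parser

parser = Parser(Language(tree_sitter_go.language()))

def walk(node):
    # Depth-first traversal of the syntax tree.
    yield node
    for child in node.children:
        yield from walk(child)

for path in Path(sys.argv[1]).rglob("*_test.go"):
    tree = parser.parse(path.read_bytes())
    for node in walk(tree.root_node):
        if node.type != "call_expression":
            continue
        fn = node.child_by_field_name("function")
        if fn is not None and fn.text == b"time.Sleep":
            # start_point is (row, column), zero-based; one compact line per finding
            print(f"{path}:{node.start_point[0] + 1} sleep-in-test")
```

Run it with `uv run check_go_tests.py ./myproject` and uv resolves the dependencies from the inline metadata, which is what makes this style so handy to bundle inside a skill.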
View on HN · Topics
You don't need to do much, the /agent command is the most useful, and it walks you through it. The main thing though is to give the agent something to work with before you create it. That's why I go through the steps of letting Claude analyze different components and document the design/architecture. The major benefit of agents is that it keeps context clean for the main job. So the agent might have a huge context working through some specific code, but the main process can do something to the effect of "Hey UI library agent, where do I need to put code to change the color of widget xyz", then the agent does all the thinking and can reply with "that's in file 123.js, line 200". The cleaner you keep the main context, the better it works.
View on HN · Topics
It can do this at the level of a function, and that's -useful-, but like the parent reply to the top-level comment, and despite investing the time, using skills & subagents, etc., I haven't gotten it to do well with C++ or Rust projects of sufficient complexity. I'm not going to say they won't some day, but it's not today.
View on HN · Topics
Trying to one-shot large codebases is an exercise in futility. You need to let Claude figure out and document the architecture first, then set up agents for each major part of the project. Doing this keeps the context clean for the main agent, since it doesn't have to go read the code each time. So one agent can fill its entire context understanding part of the code, and then the main agent asks it how to do something and gets a shorter response. It takes more work than one-shotting, but not a lot, and it pays dividends.
View on HN · Topics
Is there a guide for doing that successfully somewhere? I would love to play with this on a large codebase. I would also love to not reinvent the wheel on getting Claude working effectively on a large code base. I don’t even know where to start with, e.g., setting up agents for each part.
View on HN · Topics
> My issues all stem from that it works, but does the wrong thing

It's an opportunity, not a problem. Because it means there's a gap in your specifications and then your tests. I use Aider, not Claude, but I run it with Anthropic models. And what I found is that comprehensively writing up the documentation for a feature, spec-style, before starting eliminates a huge amount of what you're referring to. It serves a triple purpose: (a) you get the documentation, (b) you guide the AI, and (c) it's surprising how often this helps to refine the feature itself. Sometimes I invoke the AI to help me write the spec as well, asking it to prompt for areas where clarification is needed, etc.
View on HN · Topics
This is how Beads works, especially with Claude Code. What I do is tell Claude to always create a Bead when I tell it to add something, or about something that needs to be added. Then I start brainstorming, and even ask it to do market research on what top apps are doing for x, y, or z. Then I ask it to update the Bead (I call them tasks), and finally, when it's got enough detail, I tell it: do all of these in parallel.
View on HN · Topics
If it does the wrong thing you tell it what the right thing is and have it try again. With the latest models if you're clear enough with your requirements you'll usually find it does the right thing on the first try.
View on HN · Topics
I'm working mostly in a web framework that's used by me and almost nobody else (the weird little ASGI wrapper buried in Datasette) and I find the coding agents pick it up pretty fast. One trick I use that might work for you as well:

> Clone GitHub.com/simonw/datasette to /tmp then look at /tmp/datasette/docs for documentation and search the code if you need to

Try that with your own custom framework and it might unblock things. If your framework is missing documentation, tell Claude Code to write itself some documentation based on what it learns from reading the code!
View on HN · Topics
And if you've told it too many times to fix it, tell it someone has a gun to your head, for some reason it almost always gets it right this very next time.
View on HN · Topics
The human still has to manage complexity. A properly modularized and maintainable code base is much easier for the LLM to operate on — but the LLM has difficulty keeping the code base in that state without strong guidance. Putting “Make minimal changes” in my standard prompt helped a lot with the tendency of basically all agents to make too many changes at once. With that addition it became possible to direct the LLM to make something similar to the logical progression of commits I would have made anyway, but now don’t have to work as hard at crafting. Most of the hype merchants avoid the topic of maintainability because they’re playing to non-technical management skeptical of the importance of engineering fundamentals. But everything I’ve experienced so far working with LLMs screams that the fundamentals are more important than ever.
View on HN · Topics
The context window isn't "crippled". Create a markdown document of your task (or use CLAUDE.md), put it in "plan mode", which allows Claude to use tool calls to ask questions before it generates the plan. When it finishes one part of the plan, have it create another markdown document - "progress.md" or whatever - with the whole plan and what is completed at that point. Type /clear (no more context window), tell Claude to read the two documents. Repeat until even a massive project is complete - with those two markdown documents and no context window issues.
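A minimal sketch of what that progress document might look like (task, structure, and names invented for illustration):

```markdown
<!-- progress.md (hypothetical example) -->
# Task: migrate settings storage to SQLite

## Plan
1. [x] Add storage interface + SQLite implementation
2. [x] Port read paths
3. [ ] Port write paths  <- next
4. [ ] Remove legacy JSON storage

## Notes for next session
- Write paths share a lock helper in src/storage/lock.ts; reuse it.
- Tests: npm test -- --filter storage
```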
View on HN · Topics
> Once you’ve got Claude Code set up, you can point it at your codebase, have it learn your conventions, pull in best practices, and refine everything until it’s basically operating like a super-powered teammate. The real unlock is building a solid set of reusable “skills” plus a few agents for the stuff you do all the time.

I agree with this, but I haven't needed to use any advanced features to get good results. I think the simple approach gets you most of the benefits. Broadly, I just have markdown files in the repo written for a human dev audience that the agent can also use. Basically:

- README.md with a quick start section for devs, descriptions of all build targets and tests, etc. Normal stuff.
- AGENTS.md (the only file that's not written for people specifically) that just describes the overall directory structure and has a short set of instructions for the agent: (1) Always read the readme before you start. (2) Always read the relevant design docs before you start. (3) Always run the linter, a build, and tests whenever you make code changes.
- docs/*.md that contain design docs, architecture docs, and user stories, just text.

It's important to have these resources anyway, agent or no. As with human devs, the better the docs/requirements the better the results.
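A bare-bones sketch of an AGENTS.md along those lines (directory layout and commands invented for illustration):

```markdown
<!-- AGENTS.md (hypothetical example) -->
# Agent notes

## Layout
- src/ application code, one directory per feature
- docs/ design docs and user stories
- tests/ mirrors src/

## Rules
1. Read README.md before you start.
2. Read the design doc in docs/ relevant to the feature you're touching.
3. After any code change, run: make lint && make build && make test
```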
View on HN · Topics
You intrigue me.

> have it learn your conventions, pull in best practices

What do you mean by "have it learn your conventions"? Is there a way to somehow automatically extract your conventions and store them within CLAUDE.md?

> For example, we have a custom UI library, and Claude Code has a skill that explains exactly how to use it. Same for how we write Storybooks, how we structure APIs, and basically how we want everything done in our repo. So when it generates code, it already matches our patterns and standards out of the box.

Did you have to develop these skills yourself? How much work was that? Do you have public examples somewhere?
View on HN · Topics
> What do you mean by "have it learn your conventions"?

I'll give you an example: I use ruff to format my Python code, which has an opinionated way of formatting certain things. After an initial formatting, Opus 4.5, without prompting, will write code in this same style, so that the ruff formatter almost never has anything to do on new commits. Sonnet 4.5 is actually pretty good at this too.
View on HN · Topics
The second part is what I'd also like to have. But I think it should be doable. You can tell it how YOU want the state to be managed and then have it write a custom "linter" that makes the check deterministic. I haven't tried this myself, but Claude did create some custom clippy scripts in Rust when I wanted to enforce something that isn't automatically enforced by anything out there.
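As an illustration of the idea, here is a tiny deterministic check script of the kind you could have Claude generate; the banned pattern, the house rule, and the paths are all invented examples:

```python
#!/usr/bin/env python3
"""Fail CI if components manage state directly instead of going through the store."""
import re
import sys
from pathlib import Path

# Hypothetical house rule: components must not call useState; state lives in the store.
BANNED = re.compile(r"\buseState\s*\(")

bad = [
    f"{path}:{lineno}"
    for path in Path("src/components").rglob("*.tsx")
    for lineno, line in enumerate(path.read_text().splitlines(), start=1)
    if BANNED.search(line)
]

if bad:
    print("useState in components (use the store instead):", *bad, sep="\n  ")
    sys.exit(1)
```

Wire something like this into CI or a pre-commit hook and the convention gets enforced without burning any model tokens.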
View on HN · Topics
Since starting to use Opus 4.5, I'm reducing instructions in claude.md and just asking Claude to look at the codebase to understand the patterns already in use. Going from prompts/docs to instead having the code be the "truth". Show, don't tell. I've found this pattern has made a huge leap with Opus 4.5.
View on HN · Topics
I feel like I've been doing this since Sonnet 3.5 or Sonnet 4. I'll clone projects/modules/whatever into the working directory and tell claude to check it out. Voila, now it knows your standards and conventions.
View on HN · Topics
"Claude, clone this repo https://github.com/repo , review the coding conventions, check out any markdown or readme files. This is an example of coding conventions we want to use on this project"
View on HN · Topics
Thanks for the example! There's a lot (of boilerplate?) here that I don't understand. Does anyone have good references for catching up to speed what's the purpose of all of these files in the demo?
View on HN · Topics
Agreed and skills are a huge unlock. codex cli even has a skill to create skills; it's super easy to get up to speed with them https://github.com/openai/skills/blob/main/skills/.system/sk...
View on HN · Topics
> we have another Claude Code agent that does a full PR review, following a detailed markdown checklist we've written for it.

(if you know) how does that compare to CodeRabbit? i'm seriously looking for something better rn...
View on HN · Topics
Never tried CodeRabbit, just because this is already good enough with Claude Code. It helped us catch dozens of important issues we wouldn't have caught. We gave some instructions in the CLAUDE.md doc in the repository - including a nice personalized roast of the engineer that did the review in the intro and conclusion, to make it fun! :) Basically, when you do a "create PR" from your Claude Code, it will help you get your Linear ticket (or create one if missing), ask you some important questions (like: what tests have you done?), create the PR on GitHub, request the reviewers, and post an "Auto Review" message with your credentials. It's not an actual review per se, but this is enough for our small team.
View on HN · Topics
Claude Code seems to package a relatively smart prompt as well: it seems to work better even with one-line prompts than alternatives that just invoke the API. Key word: seems. It's impossible to do a proper qualitative analysis.
View on HN · Topics
Didn't feel like reading all this, so I shortened it for anyone else that might need it. Sorry!

----

Software engineers are sleeping on Claude Code agents. By teaching it your conventions, you can automate your entire workflow:

- Custom Skills: generates code matching your UI library and API patterns.
- Quality Ops: automates ESLint, doc syncing, and E2E coverage audits.
- Agentic Reviews: performs deep PR checks against custom checklists.
- Smart Triage: pre-analyzes tickets to give devs a head start.

Check out the showcase repo to see these patterns in action.
View on HN · Topics
Yeah I love working with Claude Code, I agree that the new models are amazing, but I spend a decent amount of time saying "wait, why are we writing that from scratch, haven't we written a library for that, or don't we have examples of using a third party library for it?". There is probably some effective way to put this direction into the claude.md, but so far it still seems to do unnecessary reimplementation quite a lot.
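One hedged guess at the kind of claude.md directive that might help (wording invented; whether the model reliably honors it is exactly the open question here):

```markdown
<!-- CLAUDE.md excerpt (hypothetical example) -->
## Before writing new code
- Search src/lib/ and our internal packages for an existing helper first.
- If a third-party dependency in package.json already covers the need, use it.
- If you still think new code is needed, say so in the plan and list what you checked.
```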
View on HN · Topics
With the right prompt the LLM will solve it in the first place. But this is an issue of not knowing what you don't know, which makes it difficult to write the right prompt. One way around this is to spawn more agents with specific tasks, or to have an agent that is ONLY focused on finding patterns/code where you're reinventing the wheel. I often have one agent/prompt where I build things, but then I have another agent/prompt whose only job is to find code smells, bad patterns, and outdated libraries, and to file issues or fix these problems.
View on HN · Topics
This will work (if you add more details):

> Have an agent investigate issue X in modules Y and Z. The agent should place a report at ./doc/rework-xyz-overview.md with all locations that need refactoring. Once you have the report, have agents refactor 5 classes each in parallel. Each agent writes a terse report in ./doc/rework-xyz/. When they are all done, have another agent check all the work. When that agent reports everything is okay, perform a final check yourself.
View on HN · Topics
And you can automate all this so that it happens every time. I have an `/implement` command that is basically instructed to launch the agents and then go back and forth between them. Then there's a Claude Code hook that makes sure all the agents, including the orchestrator and the spawned agents, have respected their cycles: it basically runs `claude` with a prompt that tells it to read the plan file and see if the agents have done what was expected of them in this cycle, and it gets executed automatically each time an agent ends.
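Something along these lines in .claude/settings.json could wire that up. Treat it as a rough sketch: check the Claude Code hooks documentation for the exact event names and schema, and the plan-file path here is invented:

```json
{
  "hooks": {
    "SubagentStop": [
      {
        "hooks": [
          {
            "type": "command",
            "command": "claude -p \"Read docs/plans/current.md and verify the agent that just finished completed its cycle; append findings to the file.\""
          }
        ]
      }
    ]
  }
}
```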
View on HN · Topics
You have to think of Opus as a developer whose job at your company lasts somewhere between 30 to 60 minutes before you fire them and hire a new one. Yes, it's absurd, but it's a better metaphor than someone with a chronic long-term memory deficit, since it fits into the project management framework neatly. So this new developer who is starting today is ready to be assigned their first task; they're very eager to get started, and once they start they will work very quickly, but you have to onboard them. This sounds terrible, but they also happen to be extremely fast at reading code and documentation, they know all of the common programming languages and frameworks, and they have an excellent memory for the hour that they're employed.

What do you do to onboard a new developer like this? You give them a well-written description of your project with a clear style guide and some important dos and don'ts, access to any documentation you may have, and a clear description of the task they are to accomplish in less than one hour. The tighter you can make those documents, the better. Don't mince words, just get straight to the point and provide examples where possible. The task description should be well scoped with a clear definition of done; if you can provide automated tests that verify when it's complete, that's even better. If you don't have tests, you can also specify what should be tested and instruct them to write the new tests and run them.

For every new developer after the first, you need a record of what was already accomplished. Personally, I prefer to use one markdown document per working session whose filename is a date stamp with the session number appended. Instruct them to read the last X log files, where X is however many are relevant to the current task. Most of the time X=1 if you did a good job of breaking down the tasks into discrete chunks. You should also have some type of roadmap with milestones; if this file will be larger than 1000 lines, then you should break it up so each milestone is its own document, with a table of contents document that gives a simple overview of the total scope. Instruct them to read the relevant milestone.

Other good practices are to tell them to write a new log file after they have completed their task and record a summary of what they did and anything they discovered along the way, plus any significant decisions they made. Also tell them to commit their work afterwards; Opus will write a very descriptive commit message by default (but you can instruct them to use whatever format you prefer). You basically want them to get everything ready for hand-off to the next 60-minute developer. If they do anything that you don't want them to do again, make sure to record that in CLAUDE.md. Same for any other interventions or guidance that you have to provide: put it in that document and Opus will almost always stick to it, unless they end up overfilling their context window.

I also highly recommend turning off auto-compaction. When the context gets compacted, they basically just write a summary of the current context, which often removes a lot of the important details. When this happens mid-task you will certainly lose parts of the context that are necessary for completing the task. Anthropic seems to be working hard at making this better, but I don't think it's there yet. You might want to experiment with having it on and off and compare the results for yourself.
If your sessions are ending up with >80% of the context window used while still doing active development, then you should re-scope your tasks to make them smaller. The last 20% is fine for menial things like writing the summary, running commands, committing, etc. People have built automated systems around this, like Beads, but I prefer the hands-on approach since I read through the produced docs to make sure things are going OK and use them as a guide for any changes I need to make mid-project.

With this approach I'm 99% sure that Opus 4.5 could handle your refactor without any trouble, as long as your classes aren't so enormous that even working on a single one at a time would cause problems with the context window; and if they are, then you might be able to handle it by cautioning Opus to not read the whole file and to just try making targeted edits to specific methods. They're usually quite good at finding and extracting just the sections that they need, as long as they have some way to know what to look for ahead of time. Hope this helps and happy Clauding!
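For example, a session log in that style might look like this (filename convention and contents invented to match the description above):

```markdown
<!-- 2025-11-28_session-03.md (hypothetical example) -->
# Session 3: port write paths to new storage layer

## Done
- Ported save/delete in src/storage/writer.py; all storage tests pass.

## Discovered
- The legacy JSON writer is also called from the import tool; added to roadmap as milestone 4 cleanup.

## Decisions
- Kept the old file format readable for one release to allow rollback.

## Next
- Milestone 3, task 2: remove legacy reader behind a feature flag.
```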
View on HN · Topics
I had Opus write a whole app for me in 30 seconds the other night. I use a very extensive AGENTS.md to guide AI in how I like my code chiseled. I've been happily running the app without looking at a line of it, but I was discussing the app with someone today, so I popped the code open to see what it looked like. Perfect. 10/10 in every way. I would not have written it that well. It came up with at least one idea I would not have thought of. I'm very lucky that I rarely have to deal with other devs and that I'm writing a lot of code from scratch using whatever is the latest version of the frameworks. I understand that gives me a lot of privileges others don't have.
View on HN · Topics
> What bothers me about posts like this is: mid-level engineers are not tasked with atomic, greenfield projects

They do get those occasionally too, though. Depends on the company. In some software houses it's constant "greenfield projects", one after another. And even in companies with 1-2 pieces of main established software to maintain, there are all kinds of smaller utilities or pipelines needed.

> But day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right".

In some cases that's legit. In other cases it's just "it did it well, but not how I'd have done it", which is often needless stickiness to some particular style (often a contention between two human programmers too). Basically, what FloorEgg says in this thread: "There are two types of right/wrong ways to build: the context specific right/wrong way to build something and an overly generalized engineer specific right/wrong way to build things." And you can always not just tell it "build me this feature", but tell it (in a high-level way) how to do it, and give it generic context about such preferences too.
View on HN · Topics
Yeah. Just like another engineer. When you tell another engineer to build you a feature, it's improbable they'll do it the way that you consider "right." This sounds a lot like the old arguments around using compilers vs hand-writing asm. But now you can tell the LLM how you want to implement the changes you want. This will become more and more relevant as we try to maintain the code it generates. But, for right now, another thing Claude's great at is answering questions about the codebase. It'll do the analysis and bring up reports for you. You can use that information to guide the instructions for changes, or just to help you be more productive.
View on HN · Topics
Another thing these posts assume is a single developer working on the product with a number of AI agents, not a large team. I think we need to rethink how teams work with AI. It's probably not gonna be a single developer typing a prompt, but a team somehow collaborating on a prompt or equivalent. XP on steroids? Programming by committee?
View on HN · Topics
I find Opus 4.5 very, very strong at matching the prevailing conventions/idioms/abstractions in a large, established codebase. But I guess I'm quite sensitive to this kind of thing so I explicitly ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.
View on HN · Topics
> ask Opus 4.5 to read adjacent code which is perhaps why it does it so well. All it takes is a sentence or two, though.

People keep telling me that an LLM is not intelligence, it's simply spitting out statistically relevant tokens. But surely it takes intelligence to understand (and actually execute!) the request to "read adjacent code".
View on HN · Topics
But... you can ask! Ask Claude to use encapsulation, or to write the equivalent of interfaces in the language you're using, and to map out dependencies and duplicate features, or to maintain a dictionary of component responsibilities. AI coding is a multiplier of writing speed, but it doesn't excuse you from planning and mapping out features. You can have reasonably engineered code if you get models to stick to well-designed modules, but you need to tell them.
View on HN · Topics
> day to day, when I ask it "build me this feature" it uses strange abstractions, and often requires several attempts on my part to do it in the way I consider "right"

Then don't ask it to "build me this feature"; instead lay out a software development process with a designated human in the loop where you want it, and guard rails to keep it on track. Create a code review agent to look for and reject strange abstractions. Tell it what you don't like and it's really good at finding it. I find Opus 4.5, properly prompted, to be significantly better at reviewing code than writing it, but you can just put it in a loop until the code it writes matches the review.
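A skeletal sketch of such a reviewer (Claude Code subagents are markdown files with YAML frontmatter under .claude/agents/; the specific checks listed are invented examples):

```markdown
<!-- .claude/agents/abstraction-reviewer.md (hypothetical example) -->
---
name: abstraction-reviewer
description: Reviews diffs for abstractions that don't fit this codebase. Use after any feature change.
---

Review the current diff. Reject and explain if you find:
- A new interface/base class with only one implementation.
- A new layer of indirection where a direct call to an existing module would do.
- Patterns that don't appear anywhere else in the codebase (cite the nearest existing precedent).
Output: verdict (approve/reject) plus a file:line list of findings.
```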
View on HN · Topics
you can definitely just tell it what abstractions you want when adding a feature and do incremental work on existing codebase. but i generally prefer gpt-5.2
View on HN · Topics
I don't see how "two years ago" is incongruous with having been using LLMs for coding, it's exactly the timeline I would expect. Yes, some people do just post "git gud" but there are many people ITT and most of the others on LLM coding articles who are trying to explain their process to anyone who will listen. I'm not sure if it is fully explainable in a single comment though, I'd have to write a multi-part tutorial to cover everything but it's almost entirely just applying the same project management principles that you would in a larger team of developers but customized to the current limitations of LLMs. If you want full tutorials with examples I'm sure they're out there but I'd also just recommend reviewing some project management material and then seeing how you can apply it to a coding agent. You'll only really learn by doing.
View on HN · Topics
I also have >30 years and I've had the same experience. I noticed an immediate improvement with 4.5 and I've been getting great results in general. And yes, I do make sure it's not generating crazy architecture. It might do that... if you let it. So don't let it.
View on HN · Topics
> You have to experience it yourself on your own real problems and over the course of days or weeks.

How do you stop it from over-engineering everything?
View on HN · Topics
Not in my experience - you need to build the fact that you don’t want it to do that into your design and specification.
View on HN · Topics
Recent Claude will just look at your code and copy what you've been doing, mostly, in an existing codebase - without being asked. In a new codebase, you can just ask it to "be concise, keep it simple" or something.
View on HN · Topics
It's very good at following instructions. You can build dedicated agents for different tasks (backend, API design, database design) and make it follow design and coding patterns. It's verbose by default but a few hours of custom instructions and you can make it code just like anyone
View on HN · Topics
I have it propose several approaches, pick and choose from each, and remove what I don't want done. "Use the general structure of A, but use the validation structure of D. Using a view translation layer is too much, just rely on FastAPI/SQLModel's implicit view conversion."