Summarizer

LLMs as Junior Developers

Analogy comparing LLMs to unreliable interns with boundless energy. Discussion of treating AI like junior developers requiring supervision, documentation, and oversight. The shift from coder to software manager role.

← Back to How I use Claude Code: Separation of planning and execution

While many developers have embraced the "unreliable intern" analogy by shifting their roles from hands-on coders to software managers who utilize strict "plan-first" protocols, others find this constant supervision to be a tedious and "boring" form of babysitting. Proponents argue that rigorous documentation and grounding techniques transform the AI's erratic output into a high-velocity workflow, yet skeptics maintain that reviewing AI-generated "slop" can be more exhausting than writing code from scratch. This evolution has sparked a deep philosophical divide: some celebrate a future of high-level orchestration, while others fear the loss of the "senior insight" and the creative "epicness" that defines traditional engineering. Ultimately, the consensus suggests that while LLMs can uncork immense levels of output, they still lack the ability to learn from mistakes or handle complex architecture without heavy human intervention.

27 comments tagged with this topic

View on HN · Topics
The author seems to think they've hit upon something revolutionary... They've actually hit upon something that several of us have evolved to naturally. LLMs are like unreliable interns with boundless energy. They make silly mistakes, wander into annoying structural traps, and have to be unwound if left to their own devices. It's like the genie that almost pathologically misinterprets your wishes.

So how do you solve that? Exactly how an experienced lead or software manager does: you have them write it down before executing, explain things back to you, and ground all of their thinking in the code and documentation, avoiding assumptions made after only a superficial review. When it was early ChatGPT, this meant function-level thinking and clearly described jobs. When it was Cline, it meant cline rules files that forced writing architecture.md files and vibe-code.log histories, demanding grounding in research and code reading.

Maybe nine months ago, another engineer said two things to me, less than a day apart:

- "I don't understand why your clinerules file is so large. You have the LLM jumping through so many hoops and doing so much extra work. It's crazy."
- The next morning: "It's basically like a lottery. I can't get the LLM to generate what I want reliably. I just have to settle for whatever it comes up with and then try again."

These systems have to deal with minimal context, ambiguous guidance, and extreme isolation. Operate with a little empathy for the energetic interns, and they'll uncork levels of output worth fighting for. We're Software Managers now. For some of us, that's working out great.
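To make the comment concrete, here is a minimal sketch of what such a rules file might contain. The directives and file names below are illustrative assumptions, not the commenter's actual .clinerules:

```markdown
# .clinerules (illustrative sketch, not the commenter's actual file)

## Before writing any code
- Read architecture.md and restate the relevant constraints in your own words.
- Ground every claim about existing behavior in code you have actually read;
  never assume behavior from file or function names alone.
- Write the plan down and wait for explicit approval before executing.

## After every work session
- Append a summary of what changed and why to vibe-code.log.
- Update architecture.md if any structural decision was made or revised.
```

The point is not these specific rules but the pattern the commenter describes: forcing the agent to write things down, explain them back, and ground itself in the repository before acting.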
View on HN · Topics
I’ve also found that a bigger focus on expanding my agents.md as the project rolls on has led to fewer headaches overall and more consistency (unsurprisingly). It’s the same as asking juniors to reflect on the work they’ve completed and to document important things that can help them in the future. Software Manager is a good way to put this.
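For illustration, this is the kind of entry an agents.md might accumulate as the project rolls on. The headings, paths, and items here are invented examples, not taken from the comment:

```markdown
# AGENTS.md (invented example entries)

## Lessons learned on this project
- Schema changes go through db/migrations/; never edit the models directly.
- The payment client already retries internally; do not wrap it in retry loops.

## Conventions to follow
- Every new endpoint needs a test under tests/api/ before merge.
- Prefer extending an existing module over creating a parallel one.
```

Having the agent append to a file like this after each task is the machine analogue of the junior-developer reflection the commenter describes.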
View on HN · Topics
It makes an endless stream of assumptions. Some of them brilliant and even instructive to a degree, but most of them are unfounded and inappropriate in my experience.
View on HN · Topics
I really like your analogy of LLMs as 'unreliable interns'. The shift from being a 'coder' to a 'software manager' who enforces documentation and grounding is the only way to scale these tools. Without an architecture.md or similar grounding, the context drift eventually makes the AI-generated code a liability rather than an asset. It's about moving the complexity from the syntax to the specification.
View on HN · Topics
> LLMs are like unreliable interns with boundless energy

This isn’t directed specifically at you but at the general community of SWEs: we need to stop anthropomorphizing a tool. Code agents are not human-capable, and scaling pattern matching will never hit that goal. That’s all hype, and this is coming from someone who runs the range of daily CC usage. I’m using CC to its fullest capability while also being a good shepherd for my prod codebases. Pretending code agents are human-capable is fueling this Kool-Aid-drinking hype craze.
View on HN · Topics
If you have a big rules file you’re headed in the right direction, but you’re still not there. Just as with humans, the key is that your architecture should make it very difficult to break the rules by accident while still being able to compile and run with a correct exit status. My architecture is so beautifully strong that even LLMs and human juniors can’t box their way out of it.
View on HN · Topics
It's also great to describe the full use-case flow in the instructions, so you can be confident the LLM won't do something stupid on its own.
View on HN · Topics
LLMs are really eager to start coding (as interns are eager to start working), so the sentence “don’t implement yet” has to be used very often at the beginning of any project.
View on HN · Topics
> but in doing what I don't know as well.

Comments like these really help ground what I read online about LLMs. This matches how low-performing devs at my work use AI, and their PRs are a net negative on the team. They take on tasks they aren’t equipped to handle and use LLMs to fill the gaps quickly instead of taking time to learn (which LLMs speed up!).
View on HN · Topics
Of course I can't be certain, but I think the "mixture of experts" design plays into it too. Metaphorically, there's a mid-level manager who looks at your prompt and tries to decide which experts it should be sent to. If he thinks you won't notice, he saves money by sending it to the undergraduate intern. Just a theory.
View on HN · Topics
But we can predict the outcomes, though. That's what we're saying, and it's true. Maybe not 100% of the time, but maybe it helps a significant amount of the time and that's what matters. Is it engineering? Maybe not. But neither is knowing how to talk to junior developers so they're productive and don't feel bad. The engineering is at other levels.
View on HN · Topics
If I say “you are our domain expert for X, plan this task out in great detail” to a human engineer when delegating a task, 9 times out of 10 they will do a more thorough job. It’s not that this is voodoo that unlocks some secret part of their brain. It simply establishes my expectations and they act accordingly. To the extent that LLMs mimic human behaviour, it shouldn’t be a surprise that setting clear expectations works there too.
View on HN · Topics
Is parenting making us better at prompt engineering, or is it the other way around?
View on HN · Topics
Better yet, I have Codex, Gemini, and Claude as my kids, running around in my code playground. How do I be a good parent and not play favorites?
View on HN · Topics
> The output is more or less what I'd be writing as a principal engineer.

I certainly hope this is not true, because then you're not competent for that role. Claude Code writes an absolutely incredible amount of unnecessary and superfluous comments, and it makes asinine mistakes like forgetting to update logic in multiple places. It'll gladly drop the entire database when changing column formats, just as an example.
View on HN · Topics
Trust me I'm very impressed at the progress AI has made, and maybe we'll get to the point where everything is 100% correct all the time and better than any human could write. I'm skeptical we can get there with the LLM approach though. The problem is LLMs are great at simple implementation, even large amounts of simple implementation, but I've never seen it develop something more than trivial correctly. The larger problem is it's very often subtly but hugely wrong. It makes bad architecture decisions, it breaks things in pursuit of fixing or implementing other things. You can tell it has no concept of the "right" way to implement something. It very obviously lacks the "senior developer insight". Maybe you can resolve some of these with large amounts of planning or specs, but that's the point of my original comment - at what point is it easier/faster/better to just write the code yourself? You don't get a prize for writing the least amount of code when you're just writing specs instead.
View on HN · Topics
> But the aha moment for me was what’s maintainable by AI vs by me by hand are on different realms

I don't find that LLMs are any more likely than humans to remember to update all of the places they wrote redundant functions. Generally far less likely, actually. So forgive me for treating this claim with a massive grain of salt.
View on HN · Topics
Well it's less mental load. It's like Tesla's FSD. Am I a better driver than the FSD? For sure. But is it nice to just sit back and let it drive for a bit even if it's suboptimal and gets me there 10% slower, and maybe slightly pisses off the guy behind me? Yes, nice enough to shell out $99/mo. Code implementation takes a toll on you in the same way that driving does. I think the method in TFA is overall less stressful for the dev. And you can always fix it up manually in the end; AI coding vs manual coding is not either-or.
View on HN · Topics
“The workflow I’m going to describe has one core principle: never let Claude write code until you’ve reviewed and approved a written plan.”

I’m not sure we need to be this black and white about things. Speaking from the perspective of leading a dev team, I regularly have Claude Code take a chance at code without reviewing a plan. For example, on small issues that I’ve written clear details about, Claude can go to town. I’ve never been on a team that didn’t have too many of these types of issues to address. And a team should have other guards in place that validate code before it gets merged somewhere important.

I don’t have to review every single decision one of my teammates is going to make, even the less experienced teammates, but I do prepare teammates with the proper tools (specs, documentation, etc.) so they can make a best-effort first attempt. This is how I treat Claude Code in a lot of scenarios.
View on HN · Topics
I try these staging-document patterns, but suspect they have two fundamental flaws that stem mostly from our own biases.

First, Claude evolves. The original post's work pattern evolved over nine months, before Claude's recent step changes. It's likely Claude's present plan mode is better than this workaround, but if you stick to the workaround, you'd never know.

Second, the staging docs that represent some context (whether library skills or the current session's design and implementation plans) are not the model Claude works with. At best they shape it, but I've found it does ignore and forget even what's written (even when I shout with emphasis), and the overall session influences the code. (Most often this happens when a peripheral adjustment ends up populating half the context.) Indeed, the biggest benefit from the OP might be squeezing everything within one session, omitting peripheral features and investigations at the plan stage. So the mechanism of action might be the combination of getting our own plan clear and avoiding confusing excursions. (A test for that would be to redo the session with the final plan and implementation, to see whether the iteration process itself is shaping the model.)

Our bias is to believe that we're getting better at managing this thing, and that we can control and direct it. It's uncomfortable to realize you can only really influence it, much like giving direction to a junior who can still go off track. And even if you found a pattern that works, it might work for reasons you're not understanding, and thus fail you eventually. So, yes, try some patterns, but always hang on to the newbie's senses of wonder and terror that keep you curious, alert, and experimental.
View on HN · Topics
What I've read is that even with all the meticulous planning, the author still needed to intervene, not at the end but in the middle; otherwise it will continue building out something wrong, and it's even harder to fix once it's done. It'll cost even more tokens. It's a net negative. You might say a junior might do the same thing, but I'm not worried about that: at least the junior learned something while doing it. They could do better next time. They know the code and can change it from the middle, where it broke. It's a net positive.
View on HN · Topics
This is the flow I've found myself working towards. Essentially maintaining more and more layered documentation for the LLM produces better and more consistent results. What is great here is the emphasis on the use of such documents in the planning phase. I'm feeling much more motivated to write solid documentation recently, because I know someone (the LLM) is actually going to read it! I've noticed my efforts and skill acquisition have moved sharply from app developer towards DevOps and architecture / management, but I think I'll always be grateful for the application engineering experience that I think the next wave of devs might miss out on. I've also noted such a huge gulf between some developers describing 'prompting things into existence' and the approach described in this article. Both types seem to report success, though my experience is that the latter seems more realistic, and much more likely to produce robust code that's likely to be maintainable for long term or project critical goals.
View on HN · Topics
> it's how good senior engineers already work

The other trick all good ones I’ve worked with converged on: it’s quicker to write code than to review it (if we’re being thorough). Agents have some areas where they can really shine (boilerplate you should maybe have automated already being one), but most of their speed comes from passing the quality checking to your users or coworkers. Juniors and other humans are valuable because eventually I trust them enough to not review their work. I don’t know if LLMs can ever get there for serious industries.
View on HN · Topics
I don't deny that AI has use cases, but boy, the workflow described is boring: "Most developers type a prompt, sometimes use plan mode, fix the errors, repeat."

Does anyone think this is as epic as, say, watching the Unix archives https://www.youtube.com/watch?v=tc4ROCJYbm0 where Brian demos how pipes work, or Dennis working on C and UNIX? Or even before those, the older machines? I am not at all saying that AI tools are all useless, but there is no real epicness. It is just autogenerated AI slop and blob. I don't really call this engineering (although I also agree that it is still engineering; I just don't like using the same word here).

> never let Claude write code until you’ve reviewed and approved a written plan.

So the junior-dev analogy is quite apt here. I tried to read the rest of the article, but I just got angrier. I never had that feeling watching oldschool legends; perhaps some of their work may be boring, but this AI-generated code... that's just some mythical random-guessing work. And none of it is "intelligent", even if it may appear to work, and may work to some extent. This is a simulation of intelligence. If it works very well, why would any software engineer still be required? Supervising would only be necessary if AI produces slop.
View on HN · Topics
It's the same as dealing with a human. You convey a spec for a problem and the language you use matters. You can convey the problem in (from your perspective) a clear way and you will get mixed results nonetheless. You will have to continue to refine the solution with them. Genuinely: no one really knows how humans work either.
View on HN · Topics
falling asleep here. when will the babysitting end