Summarizer

Greenfield vs Legacy Projects

Observations that AI coding excels at new projects under 10,000 lines of code but struggles to maintain consistency and avoid regressions in larger, established codebases.


While AI tools are hailed as a "10x" productivity multiplier for greenfield development, allowing developers to rapidly execute once-unjustifiable small projects, many users report hitting a "wall" around the 10,000-line mark where models begin to introduce regressions and convoluted logic. Some argue that this threshold is not absolute, suggesting that modular architecture and constrained scoping can allow AI to remain effective even in massive codebases exceeding 100,000 lines. However, skeptics maintain that "vibe coding" from scratch often leads to a messy "babysitting" phase where the AI fails to grasp intertwined dependencies or the nuanced requirements of legacy hardware. Ultimately, the consensus suggests that while AI excels at documentation and initial scaffolding, it still requires significant human oversight to maintain the structural integrity and long-term stability demanded by enterprise-level software.

19 comments tagged with this topic

View on HN · Topics
Sounds like you only tried it on small projects.
View on HN · Topics
At work I use it on giant projects, but it’s less impressive there. My mold project is around 10k lines of code, still small. But I don’t actually care about whether LLMs are good or bad or whatever. All I care about is that I am completing things that I wasn’t able to even start before. Doesn’t really matter to me if that doesn’t count for some reason.
View on HN · Topics
That’s where it really shines. I have a backlog of small projects (~1-2 kLOC state machines, sensors, loggers) and instead of spending 2-3 days I can usually knock them out in half a day. So they get done. On these projects it is an infinite improvement, because I simply wouldn’t have done them, unable to justify the cost. But on bigger stuff it bogs down, and sometimes I feel like I’m going nowhere. It gets done eventually, though, and I have better-structured, better-documented code. Not because it would be better structured and documented if I left it to its own devices, but because that is the best way to get performance out of LLM assistance in code. The difference now is twofold: first, things like documentation are now effortless. Second, the good advice you learned about meticulously writing maintainable code no longer slows you down; now it speeds you up.
View on HN · Topics
I like it because it lets me shoot off a text about making a plot I think about on the bus, connecting some random data together. It’s nice having Claude Code essentially anywhere. I do think that this is a nice big increment because of that. But it also suffers the large-codebase problems everyone else complains about. Tbh I think if its context window were ten times bigger this would be less of an issue. Compacting usually seems to be when it starts losing the thread and I have to redirect it.
View on HN · Topics
In my opinion, it has always been the “easy” part of development to make a thing work once. The hard thing is to make a thousand things work together over time with constantly changing requirements, budgets, teams, and org structures. For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands off experience that building from scratch can be now.
View on HN · Topics
What I get out of this is that these models are trained on basic coding, not enterprise-level code where you have thousands and thousands of project files all intertwined and linked with dependencies. It didn’t have access to all of that.
View on HN · Topics
I think LLMs have a hard time with large code bases (obviously so do devs). A giant monorepo would be a bad fit for an LLM IMO.
View on HN · Topics
With agentic search, they actually do pretty well with monorepos.
View on HN · Topics
I think the main thing is that these are all greenfield projects. (Note the original author talking about executing ideas for projects.)
View on HN · Topics
> it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right

I used this line for a long time, but you could just as easily say the same thing about a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones, and so long as the scope is constrained it generally just works nowadays.
View on HN · Topics
A lot of more senior coders, when they actively try vibe coding a greenfield project, find that it does actually work. But only for the first ~10 kLOC. After that the AI, no matter how well you try to prompt it, will start to destroy existing features accidentally, will add unnecessarily convoluted logic to the code, will leave behind dead code, will add random traces "for backwards compatibility", will avoid doing the correct thing because "it is too big of a refactor", and doesn't understand that the dev database is not the prod database, so it avoids migrations. And so forth. I've got 10+ years of coding experience and I am an AI advocate, but not for vibe coding. AI is a great tool to help with the boring bits: initializing files, figuring out various approaches, acting as a first-pass code reviewer, helping with configuration. Those things all work well. But full-on replacing coders? It's not there yet. It will require an order of magnitude more improvement.
View on HN · Topics
> only for the first ~10kloc. After that the AI, no matter how well you try to prompt it, will start to destroy existing features accidentally

I am using them in projects with >100 kLOC, and this is not my experience. At the moment I am babysitting at any kLOC, but I am sure they will get better and better.
View on HN · Topics
It's fine at adding features on a non-vibecoded 100kloc codebase that you somewhat understand. It's when you're vibecoding from scratch that things tend to spin out at a certain point. I am sure there are ways to get around this sort of wall, but I do think it's currently a thing.
View on HN · Topics
I’m using it in a >200 kLOC codebase successfully, too. I think a key is to work in a properly modular codebase so it can focus on the correct changes and ignore unrelated stuff. That said, I do catch it doing some of the stuff the OP mentioned, particularly leaving “backwards compatibility” stuff in place. But really, all of the stuff he mentions I’ve experienced when I’ve given it an overly broad mandate.
View on HN · Topics
Where are you getting the 10 kLOC threshold from? Nice round number... Surely it depends on the design. If you have ten 10 kLOC modules with good abstractions, and then a 10 kLOC shell gluing them together, you could build much bigger things, no?
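The modular layout the commenter describes can be sketched in miniature. This is a hypothetical illustration, not from the thread: the module contents (`Reading`, `read_sensors`, `summarize`) are invented stand-ins for what would be independent packages, and the point is only the shape of the boundaries — a thin glue layer composes narrow public interfaces, so a tool (or a person) editing one module never needs the internals of the others in context.

```python
# Hypothetical sketch of "many modules plus a thin glue shell".
# Each function below stands in for what would be its own ~10 kLOC
# package in a real project; the names are invented for illustration.

from dataclasses import dataclass


@dataclass
class Reading:
    """Shared data type at the module boundary."""
    sensor_id: str
    value: float


def read_sensors() -> list[Reading]:
    # Stand-in for a sensor-driver module's single public entry point.
    return [Reading("temp0", 21.5), Reading("temp1", 22.1)]


def summarize(readings: list[Reading]) -> dict[str, float]:
    # Stand-in for a reporting module; it only sees the shared type.
    return {r.sensor_id: r.value for r in readings}


def main() -> dict[str, float]:
    # The glue shell composes public functions and nothing else;
    # no module reaches into another's internals.
    return summarize(read_sensors())


if __name__ == "__main__":
    print(main())
```

With boundaries like these, the prompt for any one change only needs the relevant module plus the shared types, which is one plausible reason the "threshold" would depend on design rather than raw line count.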
View on HN · Topics
I wonder if you can push past the 10 kLOC mark if you have good static analysis of your code (I vibecoded a tool for that in Python) and good tests. Sometimes good tests aren't possible because there are too many different cases, but with other kinds of code you can cover all the cases with something like 50 to 100 tests.
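The "cover all the cases" idea above can be made concrete with a table-driven test. This is a minimal sketch under assumptions: `classify` is an invented toy function, not anything from the thread, and the point is the pattern — enumerate representative inputs with expected outputs so a model (or a human) refactoring the function cannot silently change behavior.

```python
# Hypothetical sketch: a table-driven test that enumerates the whole
# (small) input space of a function. `classify` is an invented example.

def classify(temp_c: float, raining: bool) -> str:
    """Toy function under test."""
    if temp_c < 0:
        return "icy"
    if raining:
        return "wet"
    return "dry"


def test_all_cases() -> None:
    # Every combination of representative inputs, with expected output.
    # A real suite of this style might land at the 50-100 cases the
    # commenter mentions.
    expected = {
        (-5.0, False): "icy",
        (-5.0, True): "icy",   # sub-zero wins regardless of rain
        (10.0, True): "wet",
        (10.0, False): "dry",
    }
    for (temp, rain), want in expected.items():
        assert classify(temp, rain) == want, (temp, rain)


if __name__ == "__main__":
    test_all_cases()
```

A suite like this acts as the safety net the comment is pointing at: if a regeneration of `classify` breaks any case, the table catches it immediately instead of the regression surfacing later.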
View on HN · Topics
I've been experimenting with getting Cursor/ChatGPT to take an old legacy project ( https://github.com/skullspace/Net-Symon-Netbrite ), which is not terribly complex but interacts with hardware via some very specific instructions, and convert it into a Python version. I've tried a few different versions/forks of the code (and other code to resurrect these signs), and each time it just absolutely cannot manage it. Which is quite frustrating, so instead the best thing I've been able to do is get it to comment each line of the code and explain what it is doing, so I can manually implement it.
View on HN · Topics
This euphoria quickly turns into disappointment once you finish scaffolding and actually start the development/refinement phase, and Claude/Codex starts shitting all over the code and you have to babysit it 100% of the time.
View on HN · Topics
You have to be joking. I tried Codex for several hours and it has to be one of the worst models I’ve seen. It was extremely fast at spitting out the worst broken code possible. Claude is fine, but what they said is completely correct. At a certain point, no matter what model you use, LLMs cannot write good working code. This usually occurs after they’ve written thousands of lines of relatively decent code. Then the project gets large enough that if they touch one thing they break ten others.