The following is content for you to summarize. Do not respond to the comments—summarize them. <topic> Greenfield vs Legacy Projects # Observations that AI coding excels at new projects under 10,000 lines of code but struggles maintaining consistency and avoiding regressions in larger, established codebases </topic> <comments_about_topic> 1. Sounds like you only tried it on small projects. 2. At work I use it on giant projects, but it's less impressive there. My mold project is around 10k lines of code, still small. But I don't actually care about whether LLMs are good or bad or whatever. All I care about is that I am completing things that I wasn't able to even start before. Doesn't really matter to me if that doesn't count for some reason. 3. That's where it really shines. I have a backlog of small projects (~1-2kLOC type state machines, sensors, loggers) and instead of spending 2-3 days I can usually knock them out in half a day. So they get done. On these projects, it is an infinite improvement because I simply wouldn't have done them, unable to justify the cost. But on bigger stuff, it bogs down and sometimes I feel like I'm going nowhere. But it gets done eventually, and I have better structured, better documented code. Not because it would be better structured and documented if I left it to its own devices, but rather because that is the best way to get performance out of LLM assistance in code. The difference now is twofold: First, things like documentation are now effortless. Second, the good advice you learned about meticulously writing maintainable code no longer slows you down; now it speeds you up. 4. I like it because it lets me shoot off a text about making a plot I think of on the bus, connecting some random data together. It's nice having Claude Code essentially anywhere. I do think that this is a nice big increment because of that. But it also suffers the large-codebase problems everyone else complains about.
Tbh I think if its context window was ten times bigger this would be less of an issue. Usually compacting seems to be when it starts losing the thread and I have to redirect it. 5. In my opinion, it has always been the "easy" part of development to make a thing work once. The hard thing is to make a thousand things work together over time with constantly changing requirements, budgets, teams, and org structures. For the former, greenfield projects, LLMs are easily a 10x productivity improvement. For the latter, it gets a lot more nuanced. Still amazingly useful in my opinion, just not the hands-off experience that building from scratch can be now. 6. What I get out of this is that these models are trained on basic coding, not enterprise-level code where you have thousands and thousands of project files all intertwined and linked with dependencies. They didn't have access to all of that. 7. I think LLMs have a hard time with large codebases (obviously so do devs). A giant monorepo would be a bad fit for an LLM IMO. 8. With agentic search, they actually do pretty well with monorepos. 9. I think the main thing is, these are all greenfield projects. (Note: the original author is talking about executing ideas for projects.) 10. > it's incredibly obvious that while these tools are surprisingly good at doing repetitive or locally-scoped tasks, they immediately fall apart when faced with the types of things that are actually difficult in software development and require non-trivial amounts of guidance and hand-holding to get things right I used this line for a long time, but you could just as easily say the same thing about a typical engineer. It basically boils down to "Claude likes its tickets to be well thought out". I'm sure there is some size of project where its ability to navigate the codebase starts to break down, but I've fed it sizeable ones, and so long as the scope is constrained it generally just works nowadays. 11.
A lot of more senior coders, when they actively try vibe coding a greenfield project, find that it does actually work. But only for the first ~10kloc. After that the AI, no matter how well you try to prompt it, will start to destroy existing features accidentally, will add unnecessarily convoluted logic to the code, will leave behind dead code, will add random traces "for backwards compatibility", will avoid doing the correct thing because "it is too big of a refactor", and doesn't understand that the dev database is not the prod database, so it avoids migrations. And so forth. I've got 10+ years of coding experience, and I am an AI advocate, but not of vibe coding. AI is a great tool to help with the boring bits: using it to initialize files, to help figure out various approaches, as a first-pass code reviewer, to help with configuration. Those things all work well. But fully replacing coders? It's not there yet. It will require an order of magnitude more improvement. 12. > only for the first ~10kloc. After that the AI, no matter how well you try to prompt it, will start to destroy existing features accidentally I am using them in projects with >100kloc, and this is not my experience. At the moment I am babysitting at any kloc, but I am sure they will get better and better. 13. It's fine at adding features in a non-vibecoded 100kloc codebase that you somewhat understand. It's when you're vibecoding from scratch that things tend to spin out at a certain point. I am sure there are ways to get around this sort of wall, but I do think it's currently a thing. 14. I'm using it in a >200kloc codebase successfully, too. I think a key is to work in a properly modular codebase so it can focus on the correct changes and ignore unrelated stuff. That said, I do catch it doing some of the stuff the OP mentioned, particularly leaving "backwards compatibility" stuff in place. But really, all of the stuff he mentions I've experienced when I've given it an overly broad mandate. 15.
Where are you getting the 10kloc threshold from? Nice round number... Surely it depends on the design. If you have 10 modular 10kloc modules with good abstractions, and then a 10k shell gluing them together, you could build much bigger things, no? 16. I wonder if you can push past 10kloc if you have a good static-analysis tool for your code (I vibecoded one in Python) and good tests. Sometimes good tests aren't possible since there are too many different cases, but with other kinds of code you can cover all the cases with like 50 to 100 tests or so. 17. I've been experimenting with getting Cursor/ChatGPT to take an old legacy project ( https://github.com/skullspace/Net-Symon-Netbrite ), which is not terribly complex but interacts with hardware via some very specific instructions, and convert it into a Python version. I've tried a few different versions/forks of the code (and other code to resurrect these signs), and each time it just absolutely cannot manage it. Which is quite frustrating, so instead the best thing I've been able to do is get it to comment each line of the code and explain what it is doing, so I can manually implement it. 18. This euphoria quickly turns into disappointment once you finish scaffolding and actually start the development/refinement phase and Claude/Codex starts shitting all over the code and you have to babysit it 100% of the time. 19. You have to be joking. I tried Codex for several hours and it has to be one of the worst models I've seen. It was extremely fast at spitting out the worst broken code possible. Claude is fine, but what they said is completely correct. At a certain point, no matter what model you use, LLMs cannot write good working code. This usually occurs after they've written thousands of lines of relatively decent code. Then the project gets large enough that if they touch one thing they break ten others.
</comments_about_topic> Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.