The following is content for you to summarize. Do not respond to the comments; summarize them.

<topic>
Reference Implementation Technique: Using existing code from open source projects as examples for Claude. Questions about licensing implications. Claims this dramatically improves output quality.
</topic>

<comments_about_topic>
1. > If it did work, well, the oldest trick in computer science is writing compilers. I suppose we will just have to write an English-to-pedantry compiler.

   "Add tests to this function" for GPT-3.5-era models was much less effective than "you are a senior engineer. Add tests for this function. As a good engineer, you should follow the patterns used in these other three function+test examples, using this framework and mocking lib." In today's tools, "add tests to this function" results in a bunch of initial steps that look in common places to see whether that additional context already exists, and then pull it in based on what is found. You can see it in the output the tools emit while "thinking." So I'm 90% sure this is already happening on some level.

2. Quoting the article:

   > One trick I use constantly: for well-contained features where I’ve seen a good implementation in an open source repo, I’ll share that code as a reference alongside the plan request. If I want to add sortable IDs, I paste the ID generation code from a project that does it well and say “this is how they do sortable IDs, write a plan.md explaining how we can adopt a similar approach.” Claude works dramatically better when it has a concrete reference implementation to work from rather than designing from scratch.

   Licensing apparently means nothing. Ripped off in the training data, ripped off in the prompt.

3. Concepts are not copyrightable.

4. The article isn’t describing someone who learned the concept of sortable IDs and then wrote their own implementation. It describes copying and pasting actual code from one project into a prompt so a language model can reproduce it in another project. It’s a mechanical transformation of someone else’s copyrighted expression (their code), laundered through a statistical model instead of a human copyist.

5. “Mechanical” is doing some heavy lifting here. If a human does the same thing, reimplementing the code in their own style for their particular context, it doesn’t violate copyright. Having the LLM see the original code doesn’t automatically make its output plagiarism.

6. My workflow is a bit different.
   * I ask the LLM for its understanding of a topic or an existing feature in the code. It's not really planning; it's more like understanding the model first.
   * Then, based on its understanding, I can decide how large or small to scope something for the LLM.
   * An LLM showing good understanding can deal with a big task fairly well.
   * An LLM showing bad understanding still needs to be prompted to get it right.
   * What helps a lot is reference implementations. Either I have existing code that serves as the reference, or I ask for a reference and review it.

   A few folks at my work do it OP's way, but here are my arguments against doing it that way:
   * Nobody measures the amount of slop within the plan; we only judge the implementation at the end.
   * It's still non-deterministic: different folks will have different experiences using OP's methods. If Claude updates its model, that outdates OP's suggestions by making them either better or worse. We don't evaluate when things get better; we only focus on what has not gone well.
   * It's very token-heavy. LLM providers insist that you use many tokens to get the task done; it's in their best interest to get you to do this. For me, LLMs should be powerful enough to understand context with minimal tokens, given the investment in model training.

   Both ways get the task done, and it just comes down to my preference for now. I treat the LLM as model training + post-processing + input tokens = output tokens. I don't think this is the best way to do non-deterministic software development; we're still trying to shoehorn "old" deterministic programming into a non-deterministic LLM.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points; write flowing prose.
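For context on the sortable-IDs example the quoted article mentions, the kind of reference implementation being pasted might look like the following minimal sketch. This is a hypothetical ULID-style generator (the article does not show its actual code): a millisecond timestamp prefix plus random bits, encoded fixed-width in Crockford Base32 so that lexicographic order matches creation order.

```python
import os
import time

# Crockford's Base32 alphabet: omits I, L, O, U to avoid ambiguity.
ALPHABET = "0123456789ABCDEFGHJKMNPQRSTVWXYZ"


def _encode(value: int, length: int) -> str:
    """Encode an integer as fixed-width Base32; fixed width keeps
    lexicographic string order identical to numeric order."""
    chars = []
    for _ in range(length):
        chars.append(ALPHABET[value & 0x1F])  # low 5 bits per character
        value >>= 5
    return "".join(reversed(chars))


def sortable_id() -> str:
    """ULID-style ID: 48-bit millisecond timestamp + 80 random bits,
    26 Base32 characters total. IDs created later sort later."""
    timestamp = int(time.time() * 1000)
    randomness = int.from_bytes(os.urandom(10), "big")
    return _encode(timestamp, 10) + _encode(randomness, 16)
```

Because the timestamp occupies the leading characters, two IDs generated at different milliseconds compare correctly as plain strings, which is the property that makes these IDs useful as database keys.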