llm/7c7e49f1-870c-4915-9398-3b2e1f116c0c/topic-15-b26b4eca-0c83-4c21-9ec6-a8aed6c4d1ff-input.json
You are a comment summarizer. Given a topic and a list of comments tagged with that topic, write a single paragraph summarizing the key points and perspectives expressed in the comments.

TOPIC: Documentation vs community answers

COMMENTS:

1. As long as software is properly documented, and documentation is published in LLM-friendly formats, LLMs may be able to answer most beyond-basic questions even when docs don't explicitly cover a particular scenario. Take an API for searching products, one for getting product details, and then an API for deleting a product. The documentation does not need to cover the detailed scenario of "How to delete a product" where the first step is to search, the second step is to get the details (get the ID), and the third step is to delete. The LLM is capable of answering the question "how to delete the product 'product name'". To some degree, many of the questions on SO were beyond basic, but still possible for a human to answer if only they read documentation. LLMs just happen to be capable of reading A LOT of documentation a LOT faster, and then coming up with an answer A LOT faster.

2. If the LLM is also writing the documentation, because the developers surely don’t want to, I’m not sure how well this will work out. I have some co-workers who have tried to use Copilot for their documentation (because they never write any and I’m constantly asking them questions as a result), and the results were so bad they actually spent the time to write proper documentation. It failed successfully, I suppose.

3. Indeed, how documentation is written is key. Funnily enough, I have been a strong advocate that documentation should always be written in Reference Docs style, optionally with additional Scenario Docs. The former is to be consumed by engineers (and now LLMs), while the latter is to be consumed by humans.
Scenario Docs, or use-case docs, are what millions of blog articles were made of in the early days; then we turned to Stack Overflow questions/answers; then companies started writing documentation in this format too. Lots of Quick Starts for X, Y, and Z scenarios using technology K. Some companies gave up completely on writing reference documentation, which would allow engineers to understand the fundamentals of technology K and then be able to apply it to X, Y, and Z. But now with LLMs, we can certainly go back to writing Reference Docs only, and let LLMs do the extra work on scenario-based docs. Can they still hallucinate? Sure. But they will likely get most beyond-basic-maybe-not-too-advanced scenarios right on the first shot. As for using LLMs to write docs: engineers should be reviewing that as much as they should be reviewing the code generated by AI.

4. "In this imaginary world where everything is perfect and made to be consumed by LLMs, LLMs are the best tool for the job".

5. > world where everything is perfect and made to be consumed by LLMs
I believe the parent poster was clearly and specifically talking about software documentation that was strong and LLM-consumption-friendly, not "everything".

6. Yeah, old news? It's how it is today with humans. You SHOULD be making things in a human/LLM-readable format nowadays anyway if you're in tech; it'll do you well, with AIs resorting to citing what you write, and content aggregators - like search engines - giving it more favorable scores.

7. I think the interesting thing here for those of us who use open-source frameworks is that we can ask the LLM to look at the source to find the answer (e.g. PyTorch or Phoenix in my case). For closed-source libraries I do not know.

8. > SO was by far the leading source of high quality answers to technical questions
We will arrive at most answers by talking to an LLM. Many of us have an idea about what we want. We relied on SO for some details/quirks/gotchas.
Example of a common SO question: how to do x in a library or language or platform? Maybe post on the GitHub for that lib. Or forums: there are quirky systems like Salesforce or Workday which have robust forums, and there the forums are still much more effective than LLMs.

9. > What do LLMs train off of now?
Perhaps they’ll rely on what was used by the people who answered SO questions. So: official docs and maybe source code. Maybe even experience too, i.e. human feedback and human-written code during agentic coding sessions.
> The fact that the LLM doesn't insult you is just the cherry on top.
Arguably it insults you even more, just by existing.

10. I'm hoping increasingly we'll see agents helping with this sort of issue. I would like an agent that would do things like pull the Spark repo into the working area and consult the source code / cross-reference it against what you're trying to do. One technique I've used successfully is to do this 'manually' to ensure Codex/Claude Code can grep around the libraries I'm using.

11. I don't think this is true. People were skeptical of AGI / better-than-human coding, which is not the case. As a matter of fact, I think searching docs was one of the first major uses of LLMs, before code.

12. They still use the official documentation/examples, public GitHub repos, and your own code, which are all more likely to be evergreen. SO was definitely a massive training advantage before LLMs matured, though.

13. I regularly use Claude and friends, where I ask it to use the web to look at specific GitHub repos or documentation to ask about current versions of things. The “LLMs just get their info from Stack Overflow” trope from the GPT-3 days is long dead - they’re pretty good at getting info that is very up to date by using tools to access the web. In some cases I just upload bits and pieces from a library along with my question if it’s particularly obscure or something home-grown, and they do quite well with that too.
Yes, they do get it wrong sometimes - just like Stack Overflow did too.

14. The number of docs that have a “Copy as markdown” or “Copy for AI” button has been noticeably increasing, and it really helps the LLM with proper context.

15. Now they can read the documentation and code in the repo directly and answer based on that.

16. SO had answers that you couldn't find in the documentation and where you can't look in the source code. If everything were well documented, SO wouldn't have been as big as it was in the first place.

17. I felt it became easier with Slack. The culture of using Slack as documentation tooling can become quite annoying. People just @here/@channel without hesitation, and producers also just don't write actual documentation. They only respond to Slack queries, which works in the moment, but is terrible for future team members, who won't even know what questions to search/ask for.

18. > The various admonitions to publish to a personal blog, while encouraging, don't really get at 0xfaded's request
They also completely missed the fact that 0xfaded did write a blog post, and it’s linked in the second sentence of the SO post.
> There is a relatively simple numerical method with better convergence than Newton's Method. I have a blog post about why it works http://wet-robots.ghost.io/simple-method-for-distance-to-ell...

19. > Clearly we need something in between the fauxpen-access of journals and the wild west of the blogosphere, probably.
I think GP's min-distance solution would work well as an arXiv paper that is never submitted for publication. A curated list of never-published papers, with comments by users, makes sense in this context. Not sure that arXiv itself is a good place, but something close to it in design, with user comments and response papers, could be workable. Something like an RFC, but with rich content (not plain text) and focused on the kinds of things GP published (code techniques, tricks, etc.).
Could even call it "circulars on computer programming" or "circulars on software engineering", etc.
PS. I ran an experiment some time back, putting something on arXiv instead of GitHub, and had to field a few comments along the lines of "this is not novel enough to be a paper"; my response was "this is not a publishable paper, and I don't intend to submit it anywhere". IOW, this is not a new or unique problem. (See the thread here - https://news.ycombinator.com/item?id=44290315 )

20. I think most of the relevant data that provides the best answers lives on GitHub. Sometimes in code, sometimes in issues or discussions. Many libs have their docs there as well. But the information is scattered and not easy to find, and often you need multiple sources to come up with a solution to some problem.

21. The second answer cites Lippert's pre-existing blog post on the subject: https://ericlippert.com/2009/11/12/closing-over-the-loop-var... I agree that there will be some degradation here, but I also think that the developers inclined to do this kind of outreach will still find ways to do it.

22. I don't disagree completely by any means (it's an interesting point), but in your SO answer you already point to your blog post explaining it in more detail, so isn't that the answer? You'd just blog about it and not bother with SO. Then AI finding it (as opposed to being already trained well enough on it, I suppose) will still point to it, as your SO answer did.

23. Yes, LLMs are great at answering questions, but providing reasonable answers is another matter. Can you really not think of anything that hasn't already been asked and isn't in any documentation anywhere? I can only assume you haven't been doing this very long. Fairly recently I was confronted with a Postgres problem; LLMs had no idea, it wasn't in the manual, it needed someone with years of experience. I took it to IRC and someone actually helped me figure it out.
Until "AI" gets to the point where it has run software for years and gained experience, or it can figure out everything just by reading the source code of something like Postgres, it won't be useful for stuff that hasn't been asked before.

24. Ideally, you'd train them on the core documentation of the language or tool itself. Hopefully, LLMs lead to more thorough documentation at the start of a new language, framework, or tool. Perhaps to the point of the documentation being specifically tailored to read well for the LLM that will parse and internalize it. Most of what Stack Overflow offered was just a regurgitation of knowledge that people could acquire from documentation or research papers. It obviously became easier to ask on SO than to dig through documentation. LLMs (in theory) should be able to do that digging for you at lightning speed. What ended up happening was that people would turn to the internet and Stack Overflow to get a quick answer and string those answers together to develop a solution, never reading or internalizing documentation. I was definitely guilty of this many times. I think in the long run it's probably good that Stack Overflow dies.

25. SO has been a curse on technology. I've met teams of people who decide whether to adopt some technology based solely on whether they can find SO answers for it. They refuse to read documentation or learn how the technology works; they'll only google for SO answers, and if the answer's not there, they give up. There's an entire generation like this now.

26. I am surprised at the amount of hate for Stack Overflow here. As a developer I can't think of a single website that has helped me as much over the last ten years. It has had a huge benefit for the development community, and I for one will mourn its loss. I do wonder where answers will come from in the future. As others have noted in this thread, documentation is often missing or incorrect. SO collected the experiences of actual users solving real problems.
Will AI share experiences in a similar way? In principle it could, and in practice I think it will need to. The shared knowledge of SO made all developers more productive. In an AI-coded future there will need to be a way for new knowledge to be shared.

27. RTFM

28. TFMs are not a thing anymore. Most of them are merely collections of sparse random dots one might join only by sheer luck, given no other knowledge of the system they attempt to document.

29. I don't know what you are building, but if a thing doesn't have comprehensive docs it doesn't make it into my stack.

30. They would just use documentation. I know there is some synthesis they would lose in the training process, but I’m often sending Claude through the context7 MCP to learn documentation for packages that didn’t exist, and it nearly always solves the problem for me.

31. The brilliance of Stack Overflow was in being the place to find out how to do tricky workarounds for functionality that either wasn't documented or was buggy such that workarounds were needed to make it actually work. Software quality is now generally a bit better than it was in 2010, but that need is ultimately still there.

32. Aren't a lot of projects using LLMs to generate documentation these days?

Write a concise, engaging paragraph (3-5 sentences) that captures the main ideas, notable perspectives, and overall sentiment of these comments regarding the topic. Focus on the most interesting and representative viewpoints. Do not use bullet points or lists - write flowing prose.
Documentation vs community answers
32