Summarizer

LLM Input

llm/5daab79e-f20f-476c-ab87-82c7ff678250/batch-3-ad0cd826-97f3-4b34-91ed-24866ce2419c-input.json

prompt

You are a comment classifier. Given a list of topics and a batch of comments, assign each comment up to 3 of the most relevant topics.

TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Community hostility toward newcomers
5. Question quality standards
6. Knowledge base vs help forum debate
7. Future of LLM training data
8. Reddit and Discord as alternatives
9. Gamification and reputation systems
10. Outdated answers problem
11. SO sale to private equity
12. Google search integration decline
13. Expert knowledge preservation
14. GitHub Discussions adoption
15. Elitist gatekeeping behavior
16. Human interaction loss
17. Question saturation theory
18. Moderator power dynamics
19. AI-generated content concerns
20. Community decline timeline

COMMENTS TO CLASSIFY:
[
  
{
  "id": "46485347",
  "text": "If the LLM is also writing the documentation, because the developers surely don’t want to, I’m not sure how well this will work out.\n\nI have some co-workers who have tried to use Copilot for their documentation (because they never write any and I’m constantly asking them questions as a result), and the results were so bad they actually spent the time to write proper documentation. It failed successfully, I suppose."
}
,
  
{
  "id": "46494736",
  "text": "Indeed, how documentation is written is key. But funny enough, I have been a strong advocate that documentation should always be written in Reference Docs style, and optionally with additional Scenario Docs.\n\nThe former is to be consumed by engineers (and now LLMs), while the later is to be consumed by humans.\n\nScenario Docs, or use case docs, are what millions of blog articles were made of in the early days, then we turned to Stack Overflow questions/answers, then companies started writing documentation in this format too. Lots of Quick Starts for X, Y, and Z scenarios using technology K. Some companies gave away completely on writing reference documentation, which would allow engineers to understand the fundamentals of technology K and then be able to apply to X, Y, and Z.\n\nBut now with LLMs, we can certainly go back to writing Reference docs only, and let LLMs do the extra work on Scenario based docs. Can they hallucinate still? Sure. But they will likely get most beyond-basic-maybe"
}
,
  
{
  "id": "46484701",
  "text": "\"In this imaginary world where everything is perfect and made to be consumed by LLMs, LLMs are the best tool for the job\"."
}
,
  
{
  "id": "46485334",
  "text": "world where everything is perfect and made to be consumed by LLMs\n\nI believe the parent poster was clearly and specifically talking about software documentation that was strong and LLM consumption-friendly, not \"everything\""
}
,
  
{
  "id": "46494079",
  "text": "Yeah, old news? It's how it is today with humans.\n\nYou SHOULD be making things in a human/LLM-readable format nowadays anyway if you're in tech, it'll do you well with AIs resorting to citing what you write, and content aggregators - like search engines - giving it more preferential scores."
}
,
  
{
  "id": "46494913",
  "text": "The LLMs will learn from our interactions with them. That's why they're often free"
}
,
  
{
  "id": "46485188",
  "text": "> I disagree with most comments that the brusque moderation is the cause of SO's problems\n\nThe moderation was precisely the reason I stopped using stackoverflow and started looking for answers and asking questions elsewhere. It was nearly impossible to ask anything without someone replying \"Why would you even want to do that, do <something completely different that does not solve my problem> instead!\". Or someone claiming it's a duplicate and you should use that ancient answer from another question that 1) barely fits and doesnt solve my problem and 2) is so outdated, it's no longer useful.\n\nWhenever I had to ask something, I had to add a justification as to why I have to do it that way and why previous posts do not solve the issue, and that took more space than the question itself.\n\nI certainly won't miss SO."
}
,
  
{
  "id": "46484601",
  "text": "If we're going to diagnose pre-AI Stack Overflow problems I see two obvious ones:\n\n1. The attempt to cut back on the harshness of moderation meant letting through more low-quality questions.\n\n2. More importantly, a lot of the content is just stale. Like you go to some question and the accepted answer with the most votes is for a ten-year-old version of the technology."
}
,
  
{
  "id": "46485365",
  "text": "> Like you go to some question and the accepted answer with the most votes is for a ten-year-old version of the technology.\n\nThis is still a problem with LLMs as a result. The bigger problem is that now the LLM doesn’t show you it was a 10 year old solution, you have to try it, watch it fail, then find out it’s old, and ask for a more up to date example, then watch it flounder around. I’ve experienced this more times than I can count."
}
,
  
{
  "id": "46487315",
  "text": "Then you're doing it wrong?\n\nI'd need to see a few examples, but this is easily solved by giving the llm more context, any really. Give it the version number, give it a url to a doc. Better yet git clone the repo and tell it to reference the source.\n\nApologies for using you as an example, but this is a common theme on people who slam LLMs. They ask it a specific/complex question with little context and then complain when the answer is wrong."
}
,
  
{
  "id": "46494223",
  "text": "I’ve specified many of these things and still had it fall on its face. And at some point, I’m providing so much detail that I may as well do it myself, which is ultimately what ends up happening.\n\nAlso, it seems assuming the latest version would make much more sense than assuming a random version from 10 years ago. If I was handing work off to another person, I would expect to only need to specify the version if it was down level, or when using the latest stable release."
}
,
  
{
  "id": "46489992",
  "text": "This is exactly the issue that most people run into and it's literally the GIGO principle that we should all be familiar with by now. If your design spec amounts to \"fix it\" then don't be surprised at the results. One of the major improvements I've noticed in Claude Code using Opus 4.5 is that it will often read the source of the library we're using so that it fully understands the API as well as the implementation.\n\nYou have to treat LLMs like any other developer that you'd delegate work to and provide them with a well thought out specification of the feature they're building or enough details about how to reproduce a bug for them to diagnose and fix it. If you want their code to conform to the style you prefer then you have to give them a style guide and examples or provide a linter and code formatter and let them know how to run it.\n\nThey're getting better at making up for these human deficits as more and more of these common failure cases are recorded but you can get much better ou"
}
,
  
{
  "id": "46485794",
  "text": "Have you tried using context7 or a similar MCP to have the agent automatically fetch up to date documentation?"
}
,
  
{
  "id": "46484285",
  "text": "> The fundamental value proposition of SO is getting an answer to a question\n\nBut the horrible moderation was in part a reason why many SO questions had no answers.\n\nI am not saying poor moderation caused all of this, but it contributed negatively and many people were pissed at that and stopped using SO. It is not the only reason SO declined, but there are many reasons for SO failure after its peak days."
}
,
  
{
  "id": "46485561",
  "text": "To the extent that moderation ever prevented questions from getting answers, that was by closing them.\n\nWhen a question gets closed before an answer comes in, the OP has nine days to fix it before it gets deleted automatically by the system.\n\nThe value proposition is getting an answer to a question that is useful to a reasonably broad audience . That very often means a question that someone else asked, the answer to which is useful to you. It is not getting an \"answer\" to a \"question\" where an individual dumps some code trying to figure out what's wrong."
}
,
  
{
  "id": "46490422",
  "text": "> When a question gets closed before an answer comes in, the OP has nine days to fix it before it gets deleted automatically by the system.\n\nOne of the bigger problems with the site's moderation systems was that 1) this system was incredibly opaque and unintuitive to new users, 2) the reopen queue was almost useless, leading to a very small percentage of closed questions ever getting reopened, and 3) even if a question did get reopened, it would be buried thousands of posts down the front page and answerers would likely never see it.\n\nThere were many plans and proposals to overhaul this system -- better \"on hold\" UI that would walk users through the process of revising their question, and a revamp of the review queues aimed at making them effective at pushing content towards reopening. These efforts got as far as the \"triage\" queue, which did little to help new users without the several other review queues that were planned to be downstream of it but scrapped as SE abruptly stopped wor"
}
,
  
{
  "id": "46490997",
  "text": "Yes.\n\nThe \"on hold\" change got reversed because new users apparently just found it confusing.\n\nOther attempts to communicate have not worked because the company and the community are separate entities (and the company has more recently shown itself to be downright hostile to the community). We cannot communicate this system better because even moderators do not have access to update the documentation . The best we can really do is write posts on the meta site and hope people find them, and operate the \"customer service desk\" there where people get the bad news.\n\nBut a lot of the time people really just don't read anyway. Especially when they get question-banned; they are sent messages that include links explaining the situation, and they ask on the meta site about things that are clearly explained in those links. (And they sometimes come up with strange theories about it that are directly contradicted by the information given to them. E.g. just the other day we had https://meta.stackov"
}
,
  
{
  "id": "46487130",
  "text": "And that was the core problem with Stack Overflow - they wanted to build a system of core Q&As to be a reference, but everyone treated it as a \"fix my very specific problem now\".\n\n99% of all the junk that got closed was just dumps of code and 'it doesn't work'. Not useful to anyone."
}
,
  
{
  "id": "46492388",
  "text": "And 99% of the other stuff, that wasn't just a code dump and \"it doesn't work\", was also closed."
}
,
  
{
  "id": "46484536",
  "text": "There was, obviously, only one main reason: LLMs. Anything else makes no sense. Even if the moderation was \"horrible\" (which sounds to me like a horrible exaggeration), there was nothing which came close to being as good as SO. There was no replacement. People will use the best available platform, even if you insist in describing it as \"horrible\". It's was not horrible compared to the alternatives, web forums like Reddit and HN, which are poorly optimized for answering questions."
}
,
  
{
  "id": "46484669",
  "text": "Look at the data - it had already been on the downslide for years before LLMs became a meaningful alternative. AI was the killing blow, but there was undoubtedly other factors."
}
,
  
{
  "id": "46486625",
  "text": "The decline was much slower, not the following exponential decline that can only have been caused by LLMs."
}
,
  
{
  "id": "46484732",
  "text": "You overvalue the impact of LLMs in regards to SO. They did have an impact, but it's the moderation that ultimately bent and broke the camel's back. An LLM may give seemingly good answers, but it always lacks in nuance and, most importantly, in being vetted by another person. It's the quality assurance that matters, and anyone with even a bit of technical skill quickly brushes up against that illusion of knowledge an LLM gives and will either try to figure it out on their own or seek out other sources to solve it if it matters. Reddit, for all its many problems, was often still easier to ask on and easier to get answers on without needing an intellectual charade and without some genius not reading the post, closing it and linking to a similar sounding title despite the content being very different. Which is the crux of the issue; you can't ask questions on SO. Or rather, you can't ask questions. No, no, that's not enough. You'll have to engage with the community, answer many other ques"
}
,
  
{
  "id": "46490104",
  "text": "It was bad enough that many people resorted to asking their questions in Discord instead which is a massive boomerang back to trying to get help in IRC and just praying that someone is online and willing to help you on the spot. Having to possibly ask your question multiple times before you get some spotty help in a real time chat where it's next to impossible to find again seems unimaginably worse than using an online forum but the fact of it remains and tells us there was something driving people away from sites like SO."
}
,
  
{
  "id": "46488439",
  "text": "> I disagree with most comments that the brusque moderation is the cause of SO's problems, though it certainly didn't help.\n\nBy the time my generation was ready to start using SO, the gatekeeping was so severe that we never began asking questions. Look at the graph. The number of questions was in decline before 2020. It was already doomed because it lost the plot and killed any valuable culture. LLMs were a welcome replacement for something that was not fun to use. LLMs are an unwelcome replacement for many other things that are a joy to engage with."
}
,
  
{
  "id": "46485360",
  "text": "> I disagree with most comments that the brusque moderation is the cause of SO's problems, though it certainly didn't help. SO has had poor moderation from the beginning.\n\nOverwhelmingly, people consider the moderation poor because they expect to be able to come to the site and ask things that are well outside of the site's mission. (It's also common to attribute community actions to \"moderators\" who in reality have historically done hardly any of it; the site simply didn't scale like that. There have been tens of millions of questions, versus a couple dozen moderators.)\n\nThe kinds of questions that people are getting quick, accurate answers for from an LLM are, overwhelmingly, the sort of thing that SO never wanted. Generally because they are specific to the person asking: either that person's issue won't be relevant to other people, or the work hasn't been done to make it recognizable by others.\n\nAnd then of course you have the duplicates. You would not believe the logic some people "
}
,
  
{
  "id": "46485560",
  "text": "It seems you deny each problem that everyone sees in SO. The fact is SO repulsed people, so there is a gap between your interpretation and reality.\n\n> It is as though people think they are being insulted when they are immediately given a link to where they can get the necessary answer, by volunteers.\n\nThis, for example. Question can be marked as duplicate without an answer. In this case yes, it feels insulting because the other is asked in such a weird way, that no-one will find the old when they search for the new (for example after a library change) and marking it as duplicate of an unanswered answer if a guarantee that the next SEO user won’t see it."
}
,
  
{
  "id": "46485718",
  "text": "> Question can be marked as duplicate without an answer.\n\nNo, they literally cannot. The only valid targets for closure are existing questions that have an upvoted or accepted answer. The system will not permit the closure (or vote to close) otherwise.\n\nIf you mean \"without writing a direct answer to the new question first\", that is the exact point of the system . Literally all you have to do is click the link and read the existing answers.\n\n> it feels insulting because the other is asked in such a weird way, that no-one will find the old when they search for the new\n\nSure. But someone else knew about the old question, found it for you , and directly pointed you at it so that you could get an answer immediately . And did all of this for free .\n\nAnd , by doing this, now everyone else who thinks of your phrasing for the question, will be immediately able to find the old question, without even having to wait for someone to recognize the duplicate."
}
,
  
{
  "id": "46486041",
  "text": "I’m sure I’ve had the experience of being told it’s a duplicate, without resolving my problem.\n\nIn any case, you may be right, and yet if you search this thread for “horrible” and “obnoxious”, you’ll find dozens of occurrence. Maybe defining the rules of engagement so that the user is wrong every time doesn’t work."
}
,
  
{
  "id": "46490038",
  "text": ">> Question can be marked as duplicate without an answer.\n\n> No, they literally cannot.\n\nYou missed that people repeatedly closed question as duplicate when it was not a duplicate.\n\nSo it had answer, just to a different mildly related question.\n\nLLM are having problems but they gaslight me in say 3% of cases, not 60% of cases like SO mods."
}
,
  
{
  "id": "46490298",
  "text": "Please feel free to show examples."
}
,
  
{
  "id": "46489998",
  "text": "> It is as though people think they are being insulted when they are immediately given a link to where they can get the necessary answer, by volunteers.\n\nMultiple times my questions closed as duplicates of question that was answering a different question.\n\nEven when I explicitly linked that QA in my question and described how it differs from mine."
}
,
  
{
  "id": "46483554",
  "text": "That \"Dead Internet\" phrase keeps becoming more likely, and this graph shows that. Human-to-human interactions, LLMs using those interactions, less human-to-human interactions because of that, LLMs using... ?"
}
,
  
{
  "id": "46485714",
  "text": "This doesn't mean that it's over for SO. It just means we'll probably trend towards more quality over quantity. Measuring SO's success by measuring number of questions asked is like measuring code quality by lines of code. Eventually SO would trend down simply by advancements of search technology helping users find existing answers rather than asking new ones. It just so happened that AI advanced made it even better (in terms of not having to need to ask redundant questions)."
}
,
  
{
  "id": "46486070",
  "text": "\"I suspect that the gradual decline, beginning around 2016, is due to growth in a number of other sources of answers.\"\n\nI think at least one other reason is that a lot of the questions were already posted. There are only so many questions of interest, until a popular new technology comes along. And if you look at mathoverflow (which wouldnt have the constant shocks from new technologies) the trend is pretty stable...until right around 2022. And even since then, the dropoff isn't nearly so dramatic.\nhttps://data.stackexchange.com/mathoverflow/query/edit/19272..."
}
,
  
{
  "id": "46487965",
  "text": ">>what happens now?\n\nI'll tell you what happens now: LLMs continue to regurgitate and iterate and hallucinate on the questions and answers they ingested from S.O. - 90% of which are incorrect. LLM output continues to poison itself as more and more websites spring up recycling outdated or incorrect answers, and no new answers are given since no one wants to waste the time to ask a human a question and wait for the response .\n\nThe overall intellectual capacity sinks to the point where everything collaboratively built falls apart.\n\nThe machines don't need AGI to take over, they just need to wait for us to disintegrate out of sheer laziness, sloth and self-righteous.... /okay.\n\nthere was always a needy component to Stack Overflow. \"I have to pass an exam, what is the best way to write this algorithm?\" and shit like that. A lazy component. But to be honest, it was the giving of information which forced you to think, and research, and answer correctly , which made systems like S.O. worthwhil"
}
,
  
{
  "id": "46488339",
  "text": "Labs are spending billions on data set curation and RL from human experts to fill in the areas where they're currently weak. It's higher quality data than SO, the only issue is that it's not public."
}
,
  
{
  "id": "46488374",
  "text": "Can you explain what you're saying in greater depth?\n\nAre you saying that the reason there is no human expertise on the internet anymore is that everyone with knowledge is now under contract to train AIs?"
}
,
  
{
  "id": "46488690",
  "text": "No, I think the reason human expertise on the internet is dying out is because we have a cacophany of voices trying to be heard on the internet, and experts aren't interested in screaming into the void unless they directly need to do it to pay their bills."
}
,
  
{
  "id": "46488808",
  "text": "I would say that going onto Stack Overflow to answer questions made me a better coder - yeah, even with the cacophony of bullshit and repeats. It's almost more offensive for that job to be taken by \"AI\" than the job of writing the stupid code I was trying to help people fix.\n\n[edit] because I kind of get what you're saying... I truly don't care what marginal benefits people are trying to get out of popularity in the high school locker room that is the Social Media internet. I still have a weird habit of giving everyone a full answer to their questions, and trying to teach people what I know when I can. Not for kudos or points, but because the best way to learn is by teaching ."
}
,
  
{
  "id": "46483513",
  "text": "> I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after?\n\nI've wondered this too and I wonder if the existing corpus plus new GitHub/doc site scrapes will be enough to keep things current."
}
,
  
{
  "id": "46483610",
  "text": "Widespread internet adoption created “eternal September”, widespread LLM deployment will create “eternal 2018”"
}
,
  
{
  "id": "46485078",
  "text": "The fundamental value proposition of SO is getting an answer to a question\n\nFor me, the value was writing answers on topics I was interested in…and internet points as feedback on their quality.\n\nWhen SE abandoned their app, it broke my habit."
}
,
  
{
  "id": "46486171",
  "text": "There's another significant forum: GitHub, the rise of which coincided with the start of SO's decline. I bet most niche questions went over to GH repos' issue/discussion forums, and SO was left with more general questions that bored contributors."
}
,
  
{
  "id": "46488799",
  "text": "> - I know I'm beating a dead horse here, but what happens now? Despite stratification I mentioned above, SO was by far the leading source of high quality answers to technical questions. What do LLMs train off of now? I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after? Or will we find new, better ways to find answers to technical questions?\n\nTo me this shows just how limited LLMs are. Hopefully more people realize that LLMs aren't as useful as they seem, and in 10 years they're relegated to sending spam and generating marketting websites."
}
,
  
{
  "id": "46488854",
  "text": "Or we just stagnate, as tech no longer can afford to change."
}
,
  
{
  "id": "46490492",
  "text": "> The fundamental value proposition of SO is getting an answer to a question; if you can the same answer faster, you don't need SO.\n\nPlus they might find the answer on SO without asking a new question - You probably would expect the # of new questions to peak or plateau even if the site wasn't dying, due to the accumulation of already-answered questions."
}
,
  
{
  "id": "46483821",
  "text": "Too bad stack overflow didn't high-quality-LLM itself early. I assume it had the computer-related brainpower.\n\nwith respect to the \"moderation is the cause\" thing... Although I also don't buy moderation as the cause, I wonder if any sort of friction from the \"primary source of data\" can cause acceleration.\n\nfor example, when I'm doing an interenet search for the definition of a word like buggywhip, some search results from the \"primary source\" show:\n\n> buggy whip, n. meanings, etymology and more | Oxford English Dictionary\n\n> Factsheet What does the noun buggy whip mean? There is one meaning in OED's entry for the noun buggy whip. See 'Meaning & use' for definition, usage, and quotation evidence.\n\nwhich are non-answer to keep their traffic.\n\nbut the AI answer is... the answer.\n\nIf SO early on had had some clear AI answer + references, I think that would have kept people on their site."
}
,
  
{
  "id": "46485700",
  "text": "The meta post describing the policy of banning AI-generated answers from the site ( https://meta.stackoverflow.com/questions/421831 ) is the most popular of all time. Company interference with moderator attempts to enforce that policy lead to a moderator strike. The community is vehemently against the company's current repeated attempts to sneak AI into the system, which have repeatedly produced embarrassing results (see for example https://meta.stackoverflow.com/questions/425081 and https://meta.stackoverflow.com/questions/425162 ; https://meta.stackoverflow.com/questions/427807 ; https://meta.stackoverflow.com/questions/425766 etc.).\n\nWhat you propose is a complete non-starter."
}
,
  
{
  "id": "46486398",
  "text": "Your first example is a public announcement of an llm assisted ask question form. A detailed request for feedback on an experiment isn't \"sneaking\" and the replies are a tire fire of stupidity. One of your top complaints about users in this thread is they ask the wrong sort of questions so AI review seems like it should be useful.\n\nThe top voted answer asks why SO is even trying to improve anything when there's a moderator strike on. What is this, the 1930s? It's a voluntary role, if you don't like it just don't do it.\n\nThe second top voted answer says \"I was able to do a prompt injection and make it write me sql with an injection bug\". So? It also complains that the llm might fix people's bad English, meaning they ask the wrong question, lol.\n\nIt seems clear these people started from a belief that ai is always bad, and worked backwards to invent reasons why this specific feature is bad.\n\nIt's crazy that you are defending this group all over this HN thread, telling people that toxicity"
}

]

Return ONLY a JSON array with this exact structure (no other text):
[
  {"id": "comment_id_1", "topics": [1, 3, 5]},
  {"id": "comment_id_2", "topics": [2]},
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array: {"id": "...", "topics": []}
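The rules above can be enforced mechanically on the model's reply. A minimal validation sketch, assuming the reply is the raw JSON string and that every comment in the batch should appear exactly once (the prompt's example schema implies one entry per comment); `validate_classification` and `TOPIC_COUNT` are hypothetical names introduced here for illustration:

```python
import json

TOPIC_COUNT = 20  # topics use 1-based indices 1..20


def validate_classification(raw: str, expected_ids: set[str]) -> list[dict]:
    """Parse the classifier's JSON reply and enforce the batch rules."""
    results = json.loads(raw)
    assert isinstance(results, list), "reply must be a JSON array"
    seen = set()
    for entry in results:
        assert entry["id"] in expected_ids, f"unknown comment id {entry['id']}"
        topics = entry["topics"]
        # Each comment can have 0 to 3 topics, all valid 1-based indices.
        assert 0 <= len(topics) <= 3, "each comment gets 0 to 3 topics"
        assert all(isinstance(t, int) and 1 <= t <= TOPIC_COUNT for t in topics)
        seen.add(entry["id"])
    assert seen == expected_ids, "every comment must appear exactly once"
    return results
```

Rejecting a malformed reply at this stage is cheaper than letting out-of-range topic indices propagate into downstream aggregation.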

commentCount

50
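The batch size of 50 seen here (and the `batch-3` suffix in the input filename) suggests the full comment set is split into fixed-size chunks, one LLM request per chunk. A sketch of that chunking step, assuming comments are held as a Python list; `chunk` is a hypothetical helper name:

```python
def chunk(comments: list, size: int = 50) -> list[list]:
    """Split a comment list into fixed-size batches, one per LLM request."""
    return [comments[i:i + size] for i in range(0, len(comments), size)]
```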
