Summarizer

LLM Input

llm/7c7e49f1-870c-4915-9398-3b2e1f116c0c/batch-5-9a6fc7c8-71e1-45e5-b0ed-636c3888c6f8-input.json

prompt

You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.

TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Knowledge repository vs help desk debate
5. Community decline timeline
6. Discord as alternative platform
7. Future of LLM training data
8. Gamification and reputation systems
9. Expert knowledge preservation
10. Reddit as alternative
11. Question quality standards
12. Moderator power dynamics
13. Google search integration decline
14. Stack Exchange expansion problems
15. Human interaction loss
16. Documentation vs community answers
17. Site mission misalignment
18. New user experience
19. GitHub Discussions alternative
20. Corporate ownership changes

COMMENTS TO CLASSIFY:
[
  
{
  "id": "46493479",
  "text": "> As moderation and curation restricted (changing the close reasons to more and more specific things - \"it's not on that list, so you can't close it\") meant that the content that was not as well thought out but did match the rules became more and more prevalent and overwhelmed the ability for the \"spolskyites\" to close since so many of the atwoodians have left.\n\nJust to make sure: I always got the impression that Atwood was the one who wanted to keep things strictly on mission and Spolsky was the one more interested in growing a community. Yes? I do get the impression that there was a serious ideological conflict there; between the \"library of detailed, high-quality answers\" and the, well, \"to every question\" (without a proper understanding of what should count as a distinct, useful question that can have a high-quality answer). But also, the reputation gamification was incredibly poorly thought out for the \"library\" goal ( https://meta.stackexchange.com/questions/387356/the-stack-ex... ). And I suspect they both shared blame in that.\n\nA lot of it was also ignored for too long because of the assumption that a) the site would just die if it clamped down on everything from the start; b) the site would naturally attract experts with good taste in questions (including maybe even the ability to pose good https://en.wikipedia.org/wiki/Dorothy_Dixer questions) before the beginners ever cleared the barrier of trying to phrase a proper question instead of using a forum.\n\n(Nowadays, there are still small forums all over the place. And many of them try to maintain some standards for the OP. And they're all plagued with neophytes who try to use the forum as if it were a chat room . The old adage about foolproofing rings true.)\n\nAround 2014 is when the conflict really seems to have boiled over (as new question volume was peaking). Notably, that also seems to be when the dupe-hammer was introduced ( https://meta.stackoverflow.com/questions/254589 )."
}
,
  
{
  "id": "46494325",
  "text": "Jeff was the author of https://stackoverflow.blog/2011/06/13/optimizing-for-pearls-... and was more focused on quality than community - his vision was the library.\n\nJoel was indeed more community minded - though part of that community mindedness was also more expectations of community moderation than what the tooling was able to scale for.\n\nAnd yes, they both were to blame for gamification - though part of that was the Web 2.0 ideals of the time and the hook to keep a person coming back to it. It was part of the question that was to be answered \"how do you separate the core group from the general participants on a site?\" ... and that brings me to \"people need to read A Group Is Its Own Worst Enemy\" ( https://news.ycombinator.com/item?id=23723205 ) to understand how it shaped Stack Overflow.\n\nhttps://blog.codinghorror.com/its-clay-shirkys-internet-we-j... (2008)\n\nhttps://web.archive.org/web/20110827205048/https://stackover... (Podcast #23 from 2011)\n\nAtwood: Maybe. But the cool thing about this is this is not just me, because that would be boring. It is actually me and Clay Shirky. You know, Clay Shirky is one of my heroes.\n\nSpolsky: Oh...\n\nAtwood: Yeah I know, it's awesome. So we get to talk about like building communities online and I get to talk about StackOverflow, you know, and all the lessons we've learned and, get to present with Clay. Obviously he's an expert so. That's one of the people that I have emailed actually, because I thought that would be good, because he is from New-York city as well. So we could A) show him the site and B) talk about the thing we are going to do together in March, because he needs to see the site to have some context. I mean I did meet him and talk to him about this earlier a few months ago, I think I mentioned it on the podcasts. But that was before we had sort of even going to beta, so there's really not a lot to show him. But I would love to show him in person. 
So we'll see if I'll hear back from him, I do not know.\n\nhttps://meta.stackexchange.com/questions/105232/clay-shirkys... (2011)\n\n2014 sounds about right for when it peaked... it was also when a lot of things hit the fan one after another. General stress, the decline of community moderation. The dup hammer was a way to try to reduce the amount of close votes needed - but in doing so it became \"everything is a nail\" when the dup hammer. It was used to close poor questions as dups of other questions ... and rather than making it easier to close questions that didn't fit well, corporate allowed the \"everything is a dup\" problem to fester.\n\nThat also then made Stack Overflow's search become worse. Consider https://meta.stackoverflow.com/a/262080 which provides itself as a timestamp of 2014...\n\nHow much traffic do the questions that get duped to something bring? Especially the (currently) 410 questions linked to the Java NPE question.\n\nThat question now has 10,356 questions linked to it... and that's part of the \"why search quality is going down\" - because poor questions were getting linked and not deleted. Search went downhill, dupe hammer was over used because regular close votes took too long because community moderation was going down, which in turn caused people to be grumpy about \"closed as dup\" rather than \"your question looks like it is about X, but lacks an MCVE to be able to verify that... so close it as a dup of X rather than needing 5 votes to get an MCVE close.. which would have been more helpful in guiding a user - but would mean people would start doing FGITW to answer it maybe and you'd get it as a dup of something else instead.\"\n\nAll sorts of problems around that time."
}
,
  
{
  "id": "46498633",
  "text": "Thanks; lots of great information here.\n\nRegarding duplicates and deletion you may be interested in my thoughts: https://meta.stackoverflow.com/questions/426214/when-is-it-a... ; https://meta.stackoverflow.com/questions/434215/where-do-the... ; https://meta.stackoverflow.com/questions/421677/closing-a-qu... seem relevant here, browsing through a search of my saved posts.\n\nHaving duplicates should make the search better, by pointing people who phrase the same problem in different ways to the same place. But low-quality questions often don't produce something searchable for others, and they cover topics relevant to people who lack search skills."
}
,
  
{
  "id": "46485567",
  "text": "Dunno why you are being downvoted - there is a certain type of person who contributes virtually nothing on Wikipedia except peripheral things like categories. BrownHairedGirl was the most toxic person in Wikipedia but she was lauded by her minions - and yet she did virtually no content creation whatsoever. Yet made millions of edits!"
}
,
  
{
  "id": "46485664",
  "text": "Google also played a part. After a while, I noticed that for my programming related questions, almost no SO discussions showed up. When they did appear on the first page, they were usually abysmal and unusable for me.\n\nWhen it started all kinds of very clever people were present and helped even with very deep and complex questions and problems. A few years later these people disappeared. The moderation was ok in the beginning, then they started wooing away a lot of talented people. And then the mods started acting like nazis, killing discussions, proper questions on a whim.\n\nAnd then bots (?) or karma obsessed/farming people started to upvote batshit crazy, ridiculous answers, while the proper solution had like 5 upvotes and no green marker next to it.\n\nIt was already a cesspool before AI took over and they sold all their data. Initial purpose achieved."
}
,
  
{
  "id": "46483709",
  "text": "Moderation got worse over time"
}
,
  
{
  "id": "46488274",
  "text": "> What do LLMs train off of now?\n\nPerhaps they’ll rely on what was used by people who answered SO questions. So: official docs and maybe source code. Maybe even from experience too, i.e. from human feedback and human written code during agentic coding sessions.\n\n> The fact that the LLM doesn't insult you is just the cherry on top.\n\nArguably it does insult even more, just by existing alone."
}
,
  
{
  "id": "46482769",
  "text": "I spent the last 14 days chasing an issue with a Spark transform. Gemini and Claude were exceptionally good at giving me answers that looked perfectly reasonable: none of them worked, they were almost always completely off-road.\n\nEventually I tried with something else, and found a question on stackoverflow, luckily with an answer. That was the game changer and eventually I was able to find the right doc in the Spark (actually Iceberg) website that gave me the final fix.\n\nThis is to say that LLMs might be more friendly. But losing SO means that we're getting an idiot friendly guy with a lot of credible but wrong answers in place of a grumpy and possibly toxic guy which, however, actually answered our questions.\n\nNot sure why someone is thinking this is a good thing."
}
,
  
{
  "id": "46483169",
  "text": "What I always appreciate about SO is the dialogue between commenters. LLMs give one answer, or bullet points around a theme, or just dump a load of code in your IDE. SO gives a debate, in which the finer points of an issue are thrashed out, with the best answers (by and large) floating to the top.\n\nSO, at its best, is numerous highly-experienced and intelligent humans trying to demonstrate how clever they are. A bit like HN, you learn from watching the back and forth. I don't think this is something that LLMs can ever replicate. They don't have the egos and they certainly don't have the experience.\n\nWhatever people's gripes about the site, I learned a hell of a lot from it. I still find solutions there, and think a world without it would be worse."
}
,
  
{
  "id": "46483503",
  "text": "The fundamental difference between asking on SO and asking an LLM is that SO is a public forum, and an LLM will be communicated with in private. This has a lot of implications, most of which surround the ability for people to review and correct bad information."
}
,
  
{
  "id": "46487645",
  "text": "The other major benefit of SO being a public forum is that once a question was wrestled with and eventually answered, other engineers could stumble upon and benefit from it. With SO being replaced by LLMs, engineers are asking LLMs the same questions over and over, likely getting a wide range of different answers (some correct and others not) while also being an incredible waste of resources."
}
,
  
{
  "id": "46484811",
  "text": "Surely the fundamental difference is one asks actual humans who know what's right vs statistical models that are right by accident."
}
,
  
{
  "id": "46487990",
  "text": "Humans do not know what’s right. What’s worse is the phenomenon of people who don’t actually know but want to seem like they know so they ask the person with the question for follow up information that is meaningless and irrelevant to the question.\n\nHey, can you show me the log files?\n\nSure here you go. Please help!\n\nHmm, I don’t really know what I’m looking for in these. Good luck!"
}
,
  
{
  "id": "46486372",
  "text": "Providing context to ask a Stack Overflow question was time-consuming.\n\nIn the time it takes to properly format and ask a question on Stack Overflow, an engineer can iterate through multiple bad LLM responses and eventually get to the right one.\n\nThe stats tell the uncomfortable truth. LLMs are a better overall experience than Stack Overflow, even after accounting for inaccurate answers from the LLM.\n\nDon't forget, human answers on Stack Overflow were also often wrong or delayed by hours or days.\n\nI think we're romanticizing the quality of the average human response on Stack Overflow."
}
,
  
{
  "id": "46489749",
  "text": "The purpose of StackOverflow was never to get askers quick answers to their specific questions. Its purpose is to create a living knowledge repository of problems and solutions which future folk may benefit from. Asking a question on StackOverflow is more like adding an article to Wikipedia than pinging a colleague for help.\n\nIf someone doesn't care about contributing to such a repository then they should ask their question elsewhere (this was true even before the rise of LLMs).\n\nStackOverflow itself attempts to explain this in various ways, but obviously not sufficiently as this is an incredibly common misconception."
}
,
  
{
  "id": "46487779",
  "text": "That's only because of LLMs consuming pre-existing discussions on SO. They aren't creating novel solutions."
}
,
  
{
  "id": "46486912",
  "text": "What I'm appreciating here is the quality of the _best_ human responses on SO.\n\nThere are always a number of ways to solve a problem. A good SO response gives both a path forward, and an explanation why, in the context of other possible options, this is the way to do things.\n\nLLMs do not automatically think of performance, maintainability, edge cases etc when providing a response, in no small part because they do not think.\n\nAn LLM will write you a regex HTML parser.[0]\n\nThe stats look bleak for SO. Perhaps there's a better \"experience\" with LLMs, but my point is that this is to our detriment as a community.\n\n[^0]: He comes, https://stackoverflow.com/questions/1732348/regex-match-open..."
}
,
  
{
  "id": "46485922",
  "text": "> What I always appreciate about SO is the dialogue between commenters.\n\nStack Overflow is explicitly not for \"dialogue\", recent experiments (which are generally not well received by the regulars on the meta site) notwithstanding. The purpose of the comments on questions is to help refine the question and ensure it meets standards, and in some cases serve other meta purposes like pointing at different-but-related questions to help future readers find what they're looking for. Comments are generally subject to deletion at any time and were originally designed to be visually minimal. They are not part of the core experience.\n\nOf course, the new ownership is undoing all of that, because of engagement metrics and such."
}
,
  
{
  "id": "46486213",
  "text": "Heh, OK, dialogue wasn't the right word. I am a better informed person by the power of internet pedantry."
}
,
  
{
  "id": "46483585",
  "text": "SO also isn't afraid to tell you that your question is stupid and you should do it a better way.\n\nSome people take that as a personal attack, but it can be more helpful than a detailed response to the wrong question."
}
,
  
{
  "id": "46491889",
  "text": "The problem is the people who decide which questions are stupid are misaligned with the site's audience."
}
,
  
{
  "id": "46497381",
  "text": "This comment and the parent one make me realize that people who answer probably value the exchange between experts more than the answer.\n\nPerhaps the antidote involves a drop of the poison.\n\nLet an LLM answer first, then let humans collaborate to improve the answer.\n\nBonus: if you can safeguard it, the improved answer can be used to train a proprietary model."
}
,
  
{
  "id": "46498803",
  "text": "> This comment and the parent one make me realize that people who answer probably value the exchange between experts more than the answer.\n\nI'm more amused that ExpertsExchange.com figured out the core of the issue, 30 years ago, down to their site's name."
}
,
  
{
  "id": "46483605",
  "text": "> I don't think this is something that LLMs can ever replicate. They don't have the egos and they certainly don't have the experience\n\nInteresting question - the result is just words so surely a LLM can simulate an ego. Feed it the Linux kernel mailing list?\n\nIsn’t back and forth exactly what the new MoE thinking models attempt to simulate?\n\nAnd if they don’t have the experience that is just a question of tokens?"
}
,
  
{
  "id": "46483991",
  "text": "SO was somewhere people put their hard won experience into words, that an LLM could train on.\n\nThat won't be happening anymore, neither on SO or elsewhere. So all this hard won experience, from actually doing real work, will be inaccessible to the LLMs. For modern technologies and problems I suspect it will be a notably worse experience when using an LLM than working with older technologies.\n\nIt's already true for example, when using the Godot game engine instead of Unity. LLMs constantly confuse what you're trying to do with Unity problems, offer Unity based code solutions etc."
}
,
  
{
  "id": "46488381",
  "text": "> Isn’t back and forth exactly what the new MoE thinking models attempt to simulate?\n\nI think the name \"Mixture of Experts\" might be one of the most misleading labels in our industry. No, that is not at all what MoE models do.\n\nThink of it rather like, instead of having one giant black box, we now have multiple smaller opaque boxes of various colors, and somehow (we don't really know how) we're able to tell if your question is \"yellow\" or \"purple\" and send that to the purple opaque box to get an answer.\n\nThe result is that we're able to use less resources to solve any given question (by activating smaller boxes instead of the original huge one). The problem is we don't know in advance which questions are of which color: it's not like one \"expert\" knows CSS and the other knows car engines.\n\nIt's just more floating point black magic, so \"How do I center a div\" and \"what's the difference between a V6 and V12\" are both \"yellow\" questions sent to the same box/expert, while \"How do I vertically center a div\" is a red question, and \"what's the most powerful between a V6 and V12\" is a green question which activates a completely different set of weights."
}
,
  
{
  "id": "46483629",
  "text": "I don't know if this is still the case but back in the day people would often redirect comments to some stackoverflow chat feature, the links to which would always return 404 not found errors."
}
,
  
{
  "id": "46489415",
  "text": "You can ask an LLM to provide multiple approaches to solutions and explore the pros and cons of each, then you can drill down and elaborate on particular ones. It works very well."
}
,
  
{
  "id": "46483236",
  "text": "There are so many \"great\" answers on StackOverflow. Giving the why and not just the answer."
}
,
  
{
  "id": "46482902",
  "text": "It's flat wrong to suggest SO had the right answer all the time, and in fact in my experience for trickier work it was often wrong or missing entirely.\n\nLLMs have a better hit rate with me."
}
,
  
{
  "id": "46483020",
  "text": "The example wasn't even finding a right answer so I don't see where you got that..\n\nSearching questions/answers on SO can surface correct paths on situations where the LLMs will keep giving you variants of a few wrong solutions, kind of like the toxic duplicate closers.. Ironically, if SO pruned the history to remove all failures to match its community standards then it would have the same problem."
}
,
  
{
  "id": "46483405",
  "text": "\"But losing SO means that we're getting an idiot friendly guy with a lot of credible but wrong answers in place of a grumpy and possibly toxic guy which, however, actually answered our questions.\"\n\n> \"actually answered our questions.\"\n\nRead carefully."
}
,
  
{
  "id": "46492273",
  "text": "Yes, it does answer you question, when the site lets it go through.\n\nNote that \"answers your question\" does not mean \"solving your problem\". Sometimes the answer to a question is \"this is infeasible because XYZ\" and that's good feedback to get to help you re-evaluate a problem. Many LLMs still struggle with this and would rather give a wrong answer than a negative one.\n\nThat said, the \"why don't you use X\" response is practically a stereotype for a reason. So it's certainly not always useful feedback. If people could introspect and think \"can 'because my job doesn't allow me to install Z' be a valid response to this\", we'd be in a true Utopia."
}
,
  
{
  "id": "46483495",
  "text": ">> Eventually I tried with something else, and found a question on stackoverflow, luckily with an answer. That was the game changer and eventually I was able to find the right doc\n\nRead carefully and paraphrase to the generous side. The metaphor that follows that is obviously trying to give an example of what might be somehow lost."
}
,
  
{
  "id": "46483688",
  "text": "This is a fair critique. I am often not generous enough with people."
}
,
  
{
  "id": "46483526",
  "text": "Interpreting that claim as \"SO users always, 100% of the time answer questions correctly\" is uncharitable to the point of being unreasonable.\n\nMost people would interpret the claim as concisely expressing that you get better accuracy from grumpy SO users than friendly LLMs."
}
,
  
{
  "id": "46483823",
  "text": "For the record I was interpreting that as LLMs are useless (which may have been just as uncharitable), which I categorically deny. I would say they're about just as useful without wading through the mire that SO was."
}
,
  
{
  "id": "46486985",
  "text": "It entirely depends on the language you were using. The quality of both questions and answers between e.g. Go and JavaScript is incredible. Even as a relative beginner in JS I could not believe the amount of garbage that I came across, something that rarely happened for Go."
}
,
  
{
  "id": "46483111",
  "text": "No point in arguing with people who bring a snowball into Congress to disprove global warming."
}
,
  
{
  "id": "46488372",
  "text": "I'm hoping increasing we'll see agents helping with this sort of issue. I would like an agent that would do things like pull the spark repo into the working area and consult the source code/cross reference against what you're trying to do.\n\nOnce technique I've used successfully is to do this 'manually' to ensure codex/Claude code can grep around the libraries I'm using"
}
,
  
{
  "id": "46483183",
  "text": "You still get the same thing though?\n\nThat grumpy guy is using an LLM and debugging with it. Solves the problem. AI provider fine tunes their model with this. You now have his input baked into it's response.\n\nHow you think these things work? It's either a human direct input it's remembering or a RL enviroment made by a human to solve the problem you are working on.\n\nNothing in it is \"made up\" it's just a resolution problem which will only get better over time."
}
,
  
{
  "id": "46484825",
  "text": "How does that work if there's no new data for them to train on, only AI slurry?"
}
,
  
{
  "id": "46489410",
  "text": "Because what you’re describing is the exception. Almost always with LLM’s I get a better solution, or helpful pointer in the direction of a solution, and I get it much faster. I honestly don’t understand anyone could prefer Google/SO, and in fact that the numbers show that they don’t. You’re in an extreme minority."
}
,
  
{
  "id": "46485309",
  "text": "> But losing SO means that we're getting an idiot friendly guy with a lot of credible but wrong answers in place of a grumpy and possibly toxic guy which, however, actually answered our questions.\n\nWhich by the way is incredibly ironic to read on the internet after like fifteen years of annoying people left and right about toxic this and toxic that.\n\nExtreme example: Linus Torvalds used to be notoriously toxic.\n\nWould you still defend your position if the “grumpy” guy answered in Linus’ style?"
}
,
  
{
  "id": "46485494",
  "text": "> Would you still defend your position if the “grumpy” guy answered in Linus’ style?\n\nIf they answered correctly, yes.\n\nMy point is that providing _actual knowledge_ is by itself so much more valuable compared to _simulated knowledge_, in particular when that simulated knowledge is hyper realistic and wrong."
}
,
  
{
  "id": "46492332",
  "text": "Sadly, an accountable individual representing an organization is different from a community of semi-anonymous users with a bunch of bureaucracy that can't or doesn't care about every semis anonymous user"
}
,
  
{
  "id": "46483741",
  "text": "Q&A isn't going away. There's still GitHub Discussions."
}
,
  
{
  "id": "46482640",
  "text": "Not a big surprise once LLMs came along: stack overflow developed some pretty unpleasant traits over time. Everything from legitimate questions being closed for no good reason (or being labeled a duplicate even though they often weren’t), out of date answers that never get updated as tech changes, to a generally toxic and condescending culture amongst the top answerers. For all their flaws, LLMs are so much better."
}
,
  
{
  "id": "46482705",
  "text": "Agreed. I personally stopped contributing to StackOverflow before LLMs, because of the toxic moderation.\n\nNow with LLMs, I can't remember the last time I visited StackOverflow."
}
,
  
{
  "id": "46483019",
  "text": "People in this thread are missing another key component in the decline of StackOverflow - the more experienced you become, the less useful it is.\n\nThe harder the problem, the less engagement it gets. People who spend hours working on your issue are rewarded with a single upvote. Meanwhile, \"how do I concat a string\" gets dozens or hundreds of upvotes.\n\nThe incentive/reward structure punished experienced folks with challenging/novel questions.\n\nPair that with the toxic moderation and trigger-happy close-votes, you get a zombie community with little new useful content."
}

]

Return ONLY a JSON array with this exact structure (no other text):
[
  
{
  "id": "comment_id_1",
  "topics": [
    1,
    3,
    5
  ]
}
,
  
{
  "id": "comment_id_2",
  "topics": [
    2
  ]
}
,
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array: 
{
  "id": "...",
  "topics": []
}
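
The output rules above can be checked mechanically before the classifications are used. A minimal Python sketch (an assumption for illustration - this validator, including the `validate_response` name, is not part of the job itself) that parses a model reply and enforces the 0-to-3-topics and 1-based-index rules:

```python
import json

NUM_TOPICS = 20  # size of the topic list in the prompt above


def validate_response(raw: str, expected_ids: set[str]) -> list[dict]:
    """Parse the model's reply and enforce the output rules."""
    data = json.loads(raw)  # must be a bare JSON array, no surrounding text
    assert isinstance(data, list), "top level must be an array"
    for item in data:
        assert set(item) == {"id", "topics"}, "exactly 'id' and 'topics' keys"
        assert item["id"] in expected_ids, f"unknown comment id {item['id']}"
        topics = item["topics"]
        assert 0 <= len(topics) <= 3, "each comment gets 0-3 topics"
        assert all(isinstance(t, int) and 1 <= t <= NUM_TOPICS for t in topics), \
            "topic indices are 1-based integers"
        assert len(set(topics)) == len(topics), "no duplicate topic indices"
    return data


reply = '[{"id": "46483709", "topics": [1, 5]}]'
print(validate_response(reply, {"46483709"}))
```

Anything that fails these checks (stray prose around the array, 0-based indices, more than three topics) would raise immediately instead of silently corrupting downstream aggregation.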

commentCount

50
