llm/5daab79e-f20f-476c-ab87-82c7ff678250/batch-4-46eb05f4-2c8f-4637-8d6c-93c4fc92b3ee-input.json
You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.
TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Community hostility toward newcomers
5. Question quality standards
6. Knowledge base vs help forum debate
7. Future of LLM training data
8. Reddit and Discord as alternatives
9. Gamification and reputation systems
10. Outdated answers problem
11. SO sale to private equity
12. Google search integration decline
13. Expert knowledge preservation
14. GitHub Discussions adoption
15. Elitist gatekeeping behavior
16. Human interaction loss
17. Question saturation theory
18. Moderator power dynamics
19. AI-generated content concerns
20. Community decline timeline
COMMENTS TO CLASSIFY:
[
{
"id": "46485687",
"text": "> What do LLMs train off of now? I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after? Or will we find new, better ways to find answers to technical questions?\n\nThat's a great question. I have no idea how things will play out now - do models become generalized enough to handle \"out of distrubition\" problems or not ? If they don't then I suppose a few years from now we'll get an uptick in Stackoverflow questions; the website will still exist it's not going anywhere."
}
,
{
"id": "46483941",
"text": "The newer questions that LLMs can't answer will be answered in forums - either SO, reddit, or elsewhere. There will be a much higher percentage of relevant content with far fewer new pages regurgitating questions about solved problems. So the LLMs will be able to keep up."
}
,
{
"id": "46483594",
"text": "I think the interesting thing here for those of us who use open source frameworks is that we can ask the LLM to look at the source to find the answer (eg. Pytorch or Phoenix in my case). For closed source libraries I do not know."
}
,
{
"id": "46484982",
"text": "Instead of having chat-interfaces target single developers, moving towards multiplayer interfaces may bring back some of what has been lost--looping in experts or third-party knowledge when a problem is too though to tackle via agentic means.\n\nNow all our interactions are neatly kept in personalised ledgers, bounded and isolated from one another. Whether by design or by technical infeasability, the issue remains that knowledge becomes increasingly bounded too instead of collaborative."
}
,
{
"id": "46486482",
"text": "> will we find new, better ways to find answers to technical questions?\n\nI honestly don't think they need to. As we've seen so far, for most jobs in this world, answers that sound correct are good enough.\n\nIs chasing more accuracy a good use of resources if your audience can't tell the difference anyway?"
}
,
{
"id": "46486415",
"text": "We'll get to the point where we can mass moderate core knowledge eventually. We may need to hand out extra weight for verified experts and some kind of most-votes-win type logic (perhaps even comments?), but live training data updates will be a massive evolution for language models."
}
,
{
"id": "46483820",
"text": "> SO was by far the leading source of high quality answers to technical questions\n\nWe will arrive on most answers by talking to an LLM. Many of us have an idea about we want. We relied on SO for some details/quirks/gotchas.\n\nExample of a common SO question: how to do x in a library or language or platform? Maybe post on the Github for that lib. Or forums.. there are quirky systems like Salesforce or Workday which have robust forums. Where the forums are still much more effective than LLMs."
}
,
{
"id": "46483892",
"text": "I don't think \"good moderation or not\" really touches what was happening with SO.\n\nI joined SO early and it had a \"gamified\" interface that I actually found fun. Putting in effort and such I able to slowly gain karma.\n\nThe problem was as the site scaled, the competition to answer a given question became more and more intense and that made it miserable. I left at that point but I think a lot people stayed with dynamic that was extremely unhealthy. (and the quality of accepted questions declined also).\n\nWith all this, the moderation criteria didn't have to directly change, it just had to fail to deal with the effects that were happening."
}
,
{
"id": "46485709",
"text": "Agreed. The reputation system was extremely ill considered and never revisited. You may be interested in https://meta.stackexchange.com/questions/387356 ."
}
,
{
"id": "46485273",
"text": "> I disagree with most comments that the brusque moderation is the cause of SO's problems\n\nJust to add another personal data point: i started posting in on StackOverflow well before llms were a thing and moderation instantly turned ne off and i immediately stopped posting.\n\nModerators used to edit my posts and reword what i wrote, which is unacceptable. My posts were absolutely peaceful and not inflammatory.\n\nModeration was an incredible problem for stack overflow."
}
,
{
"id": "46485756",
"text": "> Moderators used to edit my posts and reword what i wrote, which is unacceptable. My posts were absolutely peaceful and not inflammatory.\n\n99.9% probability the people who made those edits a) were not moderators ; b) were acting completely in accordance with established policy (please read: \"Why do clear, accurate, appropriately detailed posts still get edited?\" https://meta.stackexchange.com/questions/403176 )\n\nWhy do you think you should be the one who gets to decide whether that's \"acceptable\"? The site existed before you came to it, and it has goals, purposes and cultural norms established beforehand. It's your responsibility, before using any site on the Internet that accepts user-generated content, to try to understand the site's and community's expectations for that content.\n\nOn Stack Overflow, the expectations are:\n\n1. You license the content to the site and to the community, and everyone is allowed to edit it. (This is also explicitly laid out in the TOS.)\n\n2. You are contrib"
}
,
{
"id": "46486911",
"text": "The tone of this answer explains everything why people fled SO as soon as they possibly could."
}
,
{
"id": "46490013",
"text": "What \"tone\"? Why is it unreasonable to say these sorts of things about Stack Overflow, or about any community? How is \"your questions and answers need to meet our standards to be accepted\" any different from \"your pull requests need to meet our standards to be accepted\"?"
}
,
{
"id": "46487160",
"text": "Thank you for being the voice of reason in this comment section!"
}
,
{
"id": "46485187",
"text": "I stopped because of moderators. They literally killed the site for me."
}
,
{
"id": "46485867",
"text": "> I disagree with most comments that the brusque moderation is the cause of SO's problems\n\nQuestions asked on SO that got downvoted by the heavy handed moderation would have been answered by LLMs without any of the flak whatsoever.\n\nThose who had downvoted other's questions on SO for not being good enough, must be asking a lot of such not good enough questions to an LLM today.\n\nSure, the SO system worked, but it was user hostile and I'm glad we all don't have to deal with it anymore."
}
,
{
"id": "46485234",
"text": "As an early user of SO [1], I feel reasonably qualified to discuss this issue. Note that I barely posted after 2011 or so so I can't really speak to the current state.\n\nBut what I can say is that even back in 2010 it was obvious to me that moderation was a problem, specifically a cultural problem. I'm really talking about the rise of the administrative/bureaucratic class that, if left unchecked, can become absolute poison.\n\nI'm constantly reminded of the Leonard Nimoy voiced line from Civ4: \"the bureaucracy is expanding to meet the needs of the expanding bureaucracy\". That sums it up exactly. There is a certain type of person who doesn't become a creator of content but rather a moderator of content. These are people who end up as Reddit mods, for example.\n\nRules and standards are good up to a point but some people forget that those rules and standards serve a purpose and should never become a goal unto themselves. So if the moderators run wild, they'll start creating work for themselve"
}
,
{
"id": "46487175",
"text": "> This manifested as the war of \"closed, non-constructive\" on SO. Some really good questions were killed this way because the moderators decided on their own that a question had to have a provable answer to avoid flame wars.\n\nIt's literally a Q&A site. Questions need actual answers, not just opinions or \"this worked for me\"."
}
,
{
"id": "46485807",
"text": "> This manifested as the war of \"closed, non-constructive\" on SO. Some really good questions were killed this way because the moderators decided on their own that a question had to have a provable answer to avoid flame wars.\n\nPlease point at some of these \"really good\" questions, if you saved any links. (I have privileges to see deleted questions; deletion is normally soft unless there's a legal requirement or something.) I'll be happy to explain why they are not actually what the site wanted and not compatible with the site's goals.\n\nThe idea that the question \"should have provable answers\" wasn't some invention of moderators or the community; it came directly from Atwood ( https://stackoverflow.blog/2011/01/17/real-questions-have-an... ).\n\n> I lost that battle. You can argue taht questions like \"should I use Javascript or Typescript?\" don't belong on SO (as the moderators did). My position was that even though there's no definite answer, somebody can give you a list of strengths and "
}
,
{
"id": "46487995",
"text": "I believe that this tension about what type of questions was baked into the very foundation of StackOverflow.\n\nhttps://www.joelonsoftware.com/2008/09/15/stack-overflow-lau...\n\n> What kind of questions are appropriate? Well, thanks to the tagging system, we can be rather broad with that. As long as questions are appropriately tagged, I think it’s okay to be off topic as long as what you’re asking about is of interest to people who make software. But it does have to be a question. Stack Overflow isn’t a good place for imponderables, or public service announcements, or vague complaints, or storytelling.\n\nvs\n\nhttps://blog.codinghorror.com/introducing-stackoverflow-com/\n\n> Stackoverflow is sort of like the anti-experts-exchange (minus the nausea-inducing sleaze and quasi-legal search engine gaming) meets wikipedia meets programming reddit. It is by programmers, for programmers, with the ultimate intent of collectively increasing the sum total of good programming knowledge in the world. No m"
}
,
{
"id": "46493479",
"text": "> As moderation and curation restricted (changing the close reasons to more and more specific things - \"it's not on that list, so you can't close it\") meant that the content that was not as well thought out but did match the rules became more and more prevalent and overwhelmed the ability for the \"spolskyites\" to close since so many of the atwoodians have left.\n\nJust to make sure: I always got the impression that Atwood was the one who wanted to keep things strictly on mission and Spolsky was the one more interested in growing a community. Yes? I do get the impression that there was a serious ideological conflict there; between the \"library of detailed, high-quality answers\" and the, well, \"to every question\" (without a proper understanding of what should count as a distinct, useful question that can have a high-quality answer). But also, the reputation gamification was incredibly poorly thought out for the \"library\" goal ( https://meta.stackexchange.com/questions/387356/the-stack-ex.."
}
,
{
"id": "46494325",
"text": "Jeff was the author of https://stackoverflow.blog/2011/06/13/optimizing-for-pearls-... and was more focused on quality than community - his vision was the library.\n\nJoel was indeed more community minded - though part of that community mindedness was also more expectations of community moderation than what the tooling was able to scale for.\n\nAnd yes, they both were to blame for gamification - though part of that was the Web 2.0 ideals of the time and the hook to keep a person coming back to it. It was part of the question that was to be answered \"how do you separate the core group from the general participants on a site?\" ... and that brings me to \"people need to read A Group Is Its Own Worst Enemy\" ( https://news.ycombinator.com/item?id=23723205 ) to understand how it shaped Stack Overflow.\n\nhttps://blog.codinghorror.com/its-clay-shirkys-internet-we-j... (2008)\n\nhttps://web.archive.org/web/20110827205048/https://stackover... (Podcast #23 from 2011)\n\nAtwood: Maybe. But the cool thing ab"
}
,
{
"id": "46485567",
"text": "Dunno why you are being downvoted - there is a certain type of person who contributes virtually nothing on Wikipedia except peripheral things like categories. BrownHairedGirl was the most toxic person in Wikipedia but she was lauded by her minions - and yet she did virtually no content creation whatsoever. Yet made millions of edits!"
}
,
{
"id": "46485664",
"text": "Google also played a part. After a while, I noticed that for my programming related questions, almost no SO discussions showed up. When they did appear on the first page, they were usually abysmal and unusable for me.\n\nWhen it started all kinds of very clever people were present and helped even with very deep and complex questions and problems. A few years later these people disappeared. The moderation was ok in the beginning, then they started wooing away a lot of talented people. And then the mods started acting like nazis, killing discussions, proper questions on a whim.\n\nAnd then bots (?) or karma obsessed/farming people started to upvote batshit crazy, ridiculous answers, while the proper solution had like 5 upvotes and no green marker next to it.\n\nIt was already a cesspool before AI took over and they sold all their data. Initial purpose achieved."
}
,
{
"id": "46483709",
"text": "Moderation got worse over time"
}
,
{
"id": "46488274",
"text": "> What do LLMs train off of now?\n\nPerhaps they’ll rely on what was used by people who answered SO questions. So: official docs and maybe source code. Maybe even from experience too, i.e. from human feedback and human written code during agentic coding sessions.\n\n> The fact that the LLM doesn't insult you is just the cherry on top.\n\nArguably it does insult even more, just by existing alone."
}
,
{
"id": "46482769",
"text": "I spent the last 14 days chasing an issue with a Spark transform. Gemini and Claude were exceptionally good at giving me answers that looked perfectly reasonable: none of them worked, they were almost always completely off-road.\n\nEventually I tried with something else, and found a question on stackoverflow, luckily with an answer. That was the game changer and eventually I was able to find the right doc in the Spark (actually Iceberg) website that gave me the final fix.\n\nThis is to say that LLMs might be more friendly. But losing SO means that we're getting an idiot friendly guy with a lot of credible but wrong answers in place of a grumpy and possibly toxic guy which, however, actually answered our questions.\n\nNot sure why someone is thinking this is a good thing."
}
,
{
"id": "46483169",
"text": "What I always appreciate about SO is the dialogue between commenters. LLMs give one answer, or bullet points around a theme, or just dump a load of code in your IDE. SO gives a debate, in which the finer points of an issue are thrashed out, with the best answers (by and large) floating to the top.\n\nSO, at its best, is numerous highly-experienced and intelligent humans trying to demonstrate how clever they are. A bit like HN, you learn from watching the back and forth. I don't think this is something that LLMs can ever replicate. They don't have the egos and they certainly don't have the experience.\n\nWhatever people's gripes about the site, I learned a hell of a lot from it. I still find solutions there, and think a world without it would be worse."
}
,
{
"id": "46485922",
"text": "> What I always appreciate about SO is the dialogue between commenters.\n\nStack Overflow is explicitly not for \"dialogue\", recent experiments (which are generally not well received by the regulars on the meta site) notwithstanding. The purpose of the comments on questions is to help refine the question and ensure it meets standards, and in some cases serve other meta purposes like pointing at different-but-related questions to help future readers find what they're looking for. Comments are generally subject to deletion at any time and were originally designed to be visually minimal. They are not part of the core experience.\n\nOf course, the new ownership is undoing all of that, because of engagement metrics and such."
}
,
{
"id": "46486213",
"text": "Heh, OK, dialogue wasn't the right word. I am a better informed person by the power of internet pedantry."
}
,
{
"id": "46483503",
"text": "The fundamental difference between asking on SO and asking an LLM is that SO is a public forum, and an LLM will be communicated with in private. This has a lot of implications, most of which surround the ability for people to review and correct bad information."
}
,
{
"id": "46487645",
"text": "The other major benefit of SO being a public forum is that once a question was wrestled with and eventually answered, other engineers could stumble upon and benefit from it. With SO being replaced by LLMs, engineers are asking LLMs the same questions over and over, likely getting a wide range of different answers (some correct and others not) while also being an incredible waste of resources."
}
,
{
"id": "46484811",
"text": "Surely the fundamental difference is one asks actual humans who know what's right vs statistical models that are right by accident."
}
,
{
"id": "46487990",
"text": "Humans do not know what’s right. What’s worse is the phenomenon of people who don’t actually know but want to seem like they know so they ask the person with the question for follow up information that is meaningless and irrelevant to the question.\n\nHey, can you show me the log files?\n\nSure here you go. Please help!\n\nHmm, I don’t really know what I’m looking for in these. Good luck!"
}
,
{
"id": "46486372",
"text": "Providing context to ask a Stack Overflow question was time-consuming.\n\nIn the time it takes to properly format and ask a question on Stack Overflow, an engineer can iterate through multiple bad LLM responses and eventually get to the right one.\n\nThe stats tell the uncomfortable truth. LLMs are a better overall experience than Stack Overflow, even after accounting for inaccurate answers from the LLM.\n\nDon't forget, human answers on Stack Overflow were also often wrong or delayed by hours or days.\n\nI think we're romanticizing the quality of the average human response on Stack Overflow."
}
,
{
"id": "46489749",
"text": "The purpose of StackOverflow was never to get askers quick answers to their specific questions. Its purpose is to create a living knowledge repository of problems and solutions which future folk may benefit from. Asking a question on StackOverflow is more like adding an article to Wikipedia than pinging a colleague for help.\n\nIf someone doesn't care about contributing to such a repository then they should ask their question elsewhere (this was true even before the rise of LLMs).\n\nStackOverflow itself attempts to explain this in various ways, but obviously not sufficiently as this is an incredibly common misconception."
}
,
{
"id": "46487779",
"text": "That's only because of LLMs consuming pre-existing discussions on SO. They aren't creating novel solutions."
}
,
{
"id": "46486912",
"text": "What I'm appreciating here is the quality of the _best_ human responses on SO.\n\nThere are always a number of ways to solve a problem. A good SO response gives both a path forward, and an explanation why, in the context of other possible options, this is the way to do things.\n\nLLMs do not automatically think of performance, maintainability, edge cases etc when providing a response, in no small part because they do not think.\n\nAn LLM will write you a regex HTML parser.[0]\n\nThe stats look bleak for SO. Perhaps there's a better \"experience\" with LLMs, but my point is that this is to our detriment as a community.\n\n[^0]: He comes, https://stackoverflow.com/questions/1732348/regex-match-open..."
}
,
{
"id": "46483585",
"text": "SO also isn't afraid to tell you that your question is stupid and you should do it a better way.\n\nSome people take that as a personal attack, but it can be more helpful than a detailed response to the wrong question."
}
,
{
"id": "46491889",
"text": "The problem is the people who decide which questions are stupid are misaligned with the site's audience."
}
,
{
"id": "46483605",
"text": "> I don't think this is something that LLMs can ever replicate. They don't have the egos and they certainly don't have the experience\n\nInteresting question - the result is just words so surely a LLM can simulate an ego. Feed it the Linux kernel mailing list?\n\nIsn’t back and forth exactly what the new MoE thinking models attempt to simulate?\n\nAnd if they don’t have the experience that is just a question of tokens?"
}
,
{
"id": "46483991",
"text": "SO was somewhere people put their hard won experience into words, that an LLM could train on.\n\nThat won't be happening anymore, neither on SO or elsewhere. So all this hard won experience, from actually doing real work, will be inaccessible to the LLMs. For modern technologies and problems I suspect it will be a notably worse experience when using an LLM than working with older technologies.\n\nIt's already true for example, when using the Godot game engine instead of Unity. LLMs constantly confuse what you're trying to do with Unity problems, offer Unity based code solutions etc."
}
,
{
"id": "46488381",
"text": "> Isn’t back and forth exactly what the new MoE thinking models attempt to simulate?\n\nI think the name \"Mixture of Experts\" might be one of the most misleading labels in our industry. No, that is not at all what MoE models do.\n\nThink of it rather like, instead of having one giant black box, we now have multiple smaller opaque boxes of various colors, and somehow (we don't really know how) we're able to tell if your question is \"yellow\" or \"purple\" and send that to the purple opaque box to get an answer.\n\nThe result is that we're able to use less resources to solve any given question (by activating smaller boxes instead of the original huge one). The problem is we don't know in advance which questions are of which color: it's not like one \"expert\" knows CSS and the other knows car engines.\n\nIt's just more floating point black magic, so \"How do I center a div\" and \"what's the difference between a V6 and V12\" are both \"yellow\" questions sent to the same box/expert, while \"How do I vertica"
}
,
{
"id": "46483629",
"text": "I don't know if this is still the case but back in the day people would often redirect comments to some stackoverflow chat feature, the links to which would always return 404 not found errors."
}
,
{
"id": "46489415",
"text": "You can ask an LLM to provide multiple approaches to solutions and explore the pros and cons of each, then you can drill down and elaborate on particular ones. It works very well."
}
,
{
"id": "46483236",
"text": "There are so many \"great\" answers on StackOverflow. Giving the why and not just the answer."
}
,
{
"id": "46482902",
"text": "It's flat wrong to suggest SO had the right answer all the time, and in fact in my experience for trickier work it was often wrong or missing entirely.\n\nLLMs have a better hit rate with me."
}
,
{
"id": "46483020",
"text": "The example wasn't even finding a right answer so I don't see where you got that..\n\nSearching questions/answers on SO can surface correct paths on situations where the LLMs will keep giving you variants of a few wrong solutions, kind of like the toxic duplicate closers.. Ironically, if SO pruned the history to remove all failures to match its community standards then it would have the same problem."
}
,
{
"id": "46483405",
"text": "\"But losing SO means that we're getting an idiot friendly guy with a lot of credible but wrong answers in place of a grumpy and possibly toxic guy which, however, actually answered our questions.\"\n\n> \"actually answered our questions.\"\n\nRead carefully."
}
,
{
"id": "46492273",
"text": "Yes, it does answer you question, when the site lets it go through.\n\nNote that \"answers your question\" does not mean \"solving your problem\". Sometimes the answer to a question is \"this is infeasible because XYZ\" and that's good feedback to get to help you re-evaluate a problem. Many LLMs still struggle with this and would rather give a wrong answer than a negative one.\n\nThat said, the \"why don't you use X\" response is practically a stereotype for a reason. So it's certainly not always useful feedback. If people could introspect and think \"can 'because my job doesn't allow me to install Z' be a valid response to this\", we'd be in a true Utopia."
}
]
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array:
{
"id": "...",
"topics": []
}