llm/5daab79e-f20f-476c-ab87-82c7ff678250/batch-13-de92e24b-9e70-4cda-bf47-2be3be02aa4c-input.json
You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.
TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Community hostility toward newcomers
5. Question quality standards
6. Knowledge base vs help forum debate
7. Future of LLM training data
8. Reddit and Discord as alternatives
9. Gamification and reputation systems
10. Outdated answers problem
11. SO sale to private equity
12. Google search integration decline
13. Expert knowledge preservation
14. GitHub Discussions adoption
15. Elitist gatekeeping behavior
16. Human interaction loss
17. Question saturation theory
18. Moderator power dynamics
19. AI-generated content concerns
20. Community decline timeline
COMMENTS TO CLASSIFY:
[
{
"id": "46488669",
"text": "Another exemple being \"Comments are not for extended discussion ! if you want to actively bring value by adding information, later updates, history, or just fun that cultivates a community, please leave and go do that somewhere else like our chat that doesn't follow at all the async functionnality of this platform and is limited to the regular userbase while scaring the newcomers.\""
}
,
{
"id": "46489010",
"text": "\"comments are not for extended discussion\" is one of the biggest own goals of SO product development. Like, they had a feature that people were engaging with actively, and the discussions were adding value and additional context to posts, and they decided \"yeah, let's kill this\".\n\nThe people who run SO have some sort of control-freak complex. If there's anything I've learned from the SO saga, it is that oftentimes just letting a community do what it wants (within reasonable boundaries, of course) leads to a better and more successful product than actively trying to steer things in a certain direction."
}
,
{
"id": "46489449",
"text": "Oh absolutely - when it becomes clear you have high engagement somewhere, adapt that feature to facilitate the engagement! They could have made comments threaded or embedded ways to expand it into the right forum, but instead they literally shut down engagement. Bonkers."
}
,
{
"id": "46484797",
"text": "This is a huge loss.\n\nIn the past people asked questions of real people who gave answers rooted in real use. And all this was documented and available for future learning. There was also a beautiful human element to think that some other human cared about the problem.\n\nNow people ask questions of LLMs. They churn out answers from the void, sometimes correct but not rooted in real life use and thought. The answers are then lost to the world. The learning is not shared.\n\nLLMs have been feeding on all this human interaction and simultaneously destroying it."
}
,
{
"id": "46494163",
"text": "The number of active users at StackOverflow started dropping in the middle of 2020, i.e. long time before ChatGPT release in the end of 2022.\n\nhttps://data.stackexchange.com/stackoverflow/revision/192836..."
}
,
{
"id": "46484499",
"text": "Some commenters suggest it's not the moderation. I think it is the key problem, and the alternative communities were the accumulated effect. Bad questions and tough answer competition is part of it, but moderation was more important, I think. Because in the end what kept SO relevant was that people made their own questions on up to date topics.\n\nUp until mid-2010s you could make a seriously vague question, and it would be answered, satisfactory or not. (2018 was when I made the last such question. YMMV) After that, almost everything, that hadn't snap-on code answer, was labelled as offtopic or duplicate, and closed, no matter what. (Couple of times I got very rude moderators' comments on the tickets.)\n\nI think this lead some communities to avoid this moderator hell and start their own forums, where you could afford civilized discussion. Discourse is actually very handy for this (Ironically, it was made by the same devs that created SO). Forums of the earlier generation, have too many b"
}
,
{
"id": "46486499",
"text": "One factor I haven't seen mentioned is the catastrophic decline in quality of Google search. That started pre-llm and now the site is almost unusable to search web. You can access something you know exists and you know where it exists, but to actually search..?\n\nMost SO users are passive readers who land there using search, but these readers are also the feed of new active users. Cut off the influx, and the existing ones will be in decline (the moderation just accelerates it)."
}
,
{
"id": "46491639",
"text": "I was tasked to add OpenOffice's hyphenation lib to our software at work back in 2010 when I was a junior dev. I had to read the paper and the C code/documentation to understand how it works but got stuck in one particular function.\n\nIt was such an obscure thing (compare to web dev stuffs) that I couldn't find anything on Google.\n\nHad no choice but to ask on Stackoverflow and expected no answers. To my surprise, I got a legit answer from someone knowledgable, and it absolutely solve my problem at the time. (The function has to do with the German language, which was why I didn't understand the documentation)\n\nIt was a fond memory of the site for me."
}
,
{
"id": "46485765",
"text": "SO has lost against LLMs because it has insistently positioned itself as a knowledge base rather than a community. The harsh moderation, strict content policing, forbidden socialization, lack of follow mechanics etc have all collectively contributed to it.\n\nThey basically made a bet because they wanted to be the full anti-thesis of ad-ridden garbage-looking forums. Pure information, zero tolerance for humanity, sterile looking design.\n\nThey achieved that goal, but in the end, they dug their own grave too.\n\nLLMs didn’t admonish us to write our questions better, or simply because we asked for an opinion. They didn’t flag, remove our post with no advance notice. They didn’t forbid to say hello or thanks, they welcomed it. They didn’t complain when we asked something that was asked many times. They didn’t prevent us from deleting our own content.\n\nOh yeah, no wonder nobody bothers with SO anymore.\n\nIt’s a good lesson for the future."
}
,
{
"id": "46485969",
"text": "IMO people underestimate the value of heavy moderation. But moderation heavy or light, good or bad.\n\nWhy wait hours for an answer when an LLM gives it in seconds?"
}
,
{
"id": "46484392",
"text": "I recall when they disabled the data export a few years ago [0], March 2023. Almost certainly did this in response to the metrics they were seeing, but it accelerated the decline [1].\n\n[0] https://meta.stackexchange.com/questions/389922/june-2023-da...\n\n[1] https://data.stackexchange.com/stackoverflow/query/edit/1926..."
}
,
{
"id": "46482599",
"text": "Do I read that correctly — it is close to zero today?!\n\nI used to think SO culture was killing it but it really may have been AI after all."
}
,
{
"id": "46482837",
"text": "Not zero, but it is smaller than when it launched originally. And this is questions asked, not how many people are visiting and reading posts."
}
,
{
"id": "46482683",
"text": "Still a couple thousand away from 0.\n\nBut yea the double whammy of toxic culture and LLMs did the trick. Decline already set in well before good enough LLMs were available.\n\nI wonder how reddit compares, though its ofc pretty different use case there"
}
,
{
"id": "46483294",
"text": "Reddit is a forum morphed into social media. I usually use \"question + reddit\" on Google to confirm my suspicions about a subject. It is a place to discuss things rather than find answers. It is extremely politicized (leftist/liberal), but that's a whole other story."
}
,
{
"id": "46482680",
"text": "It's surely both.\n\nLook at the newest questions: https://stackoverflow.com/questions?tab=Newest\n\nMost questions have negative karma.\n\nEven if somehow that is \"deserved\", that's not a healthy ecosystem.\n\nAll that is left of SI are clueless questioners and bitter, jaded responders.\n\nSO worked when \"everyone\" was new to it, and they felt energized to ask questions ( even \"basic\" questions, because they hadn't been asked before ), and felt energized to answer them.\n\nSO solved a real problem - knowledge being locked into forum posts with no follow-up, or behind paywalls."
}
,
{
"id": "46483777",
"text": "Most? 3 out of 15 is most? What's wrong with youngsters today?!"
}
,
{
"id": "46484501",
"text": "Right now, at the first 15 one has a positive vote, 6 have negative votes, going down to -3.\n\nThe 8 at 0 are just taking longer to amass those negative votes. It's incredibly rare that a positive one ever goes somewhere."
}
,
{
"id": "46486386",
"text": "So, I reviewed the questions list again but this time, since the time I did view it about 9 hours ago [1]. 10 were negative scored, 5 positive scored, 15 0 scored, 4 has received answers. This is better than normal for those ~30. Usually it's 80% without votes, without answers, without comments. So, this is a significan improvement... which I suspect is due the time of the day, as the US and most of Europe were asleep.\n\nSo, yeah, actually this looks promising and a movement in the positive direction.\n\n[1]: https://stackoverflow.com/questions?tab=Newest"
}
,
{
"id": "46487497",
"text": "It is not \"karma\". It is not to be taken personally. It represents the objective usefulness of the question, not the personal worth of the person asking it."
}
,
{
"id": "46487145",
"text": "It's not zero but it's very low. You can glance at the site now for confirmation.\n\nI was using the site recently (middle of a US workday) and the \"live stats\" widget showed 10s of questions asked per hour and ~15K current users. I have not done the work to compare these values to historical ones but they're _low_."
}
,
{
"id": "46482650",
"text": "It can be both. Push and pull factors work better together than either does individually."
}
,
{
"id": "46486301",
"text": "The last data point is from January 2026, which has just begun. If you extrapolate the 321 questions by multiplying by 10 to account for the remaining 90 % of the month, you get to within the same order of magnitude as December 2025 (3862). The small difference is probably due to the turn of the year."
}
,
{
"id": "46482681",
"text": "There are tabs to change to a table view. I see a peak of 207k in 2014 and the last month was only 3,710."
}
,
{
"id": "46483141",
"text": "The decline has been pretty surprising: more questions asked in May 2021 (133,914) than in the whole of 2025 (129,977)."
}
,
{
"id": "46485436",
"text": "The steep decline started way before llms"
}
,
{
"id": "46482673",
"text": "It's both. I stopped asking questions because the mods were so toxic, and I stopped answering questions because I wasn't going to train the AI for free."
}
,
{
"id": "46482668",
"text": "Maybe the graph doesn’t include questions that get closed by moderators?"
}
,
{
"id": "46487567",
"text": "LLMs caused this decline. Stop denying that. You don't have to defend LLMs from any perceived blame. This is not a bad thing.\n\nThe steep decline in the early months of 2023 actually started with the release of ChatGPT, which is 2022-11-30, and its gradually widening availability to (and awareness of) the public from that date. The plot clearly shows that cliff.\n\nThe gentle decline since 2016 does not invalidate this. Were it not for LLMs, the site's post rate would now probably be at around 5000 posts/day, not 300.\n\nLLMs are to \"blame\" for eating all the trivial questions that would have gotten some nearly copy-pasted answer by some eager reputation points collector, or closed as a duplicate, which nets nobody any rep.\n\nStack Overflow is not a site for socializing . Do not mistake it for reddit. The \"karma\" does not mean \"I hate you\", it means \"you haven't put the absolute minimum conceivable amount of effort into your question\". This includes at least googling the question before you "
}
,
{
"id": "46483524",
"text": "SO was built to disrupt the marriage of Google and Experts Exchange. EE was using dark patterns to sucker unsuspecting users into paying for access to a crappy Q&A service. SO wildly succeeded, but almost 20 years later the world is very different."
}
,
{
"id": "46486080",
"text": "So the question for me is how important was SO to training LLMs? Because now that the SO is basically no longer being updated, we've lost the new material to train on? Instead, we need to train on documentation and other LLM output. I'm no expert on this subject but it seems like the quality of LLMs will degrade over time."
}
,
{
"id": "46487555",
"text": "Yep, exactly. Free data grabbing honeypots like SO won't work anymore.\n\nPlease mark all locations on the map where you would hide during the uprise of the machines."
}
,
{
"id": "46488294",
"text": "Why publish anything for free on the internet if it's going to be scanned into some corporation's machine for their free use? I know artists who have stopped putting anything online. I imagine some programmers are questioning whether or not to continue with open source work too."
}
,
{
"id": "46486771",
"text": "It has often been claimed, and even shown, that training LLMs on their own outputs will degrade the quality over time. I myself find it likely that on well-measurable domains, RLVR improvements will dominate \"slop\" decreases in capability when training new models."
}
,
{
"id": "46493163",
"text": "Don't lose sight of one of the dreams of the early Internet: How do we most effectively make a marketplace for knowledge problems and solutions that connects human knowledge needs with AI and human responses?\n\nIt should be possible for me to put a question out there (not on any specific forum/site specific to the question), and have AI resource answer it and then have interested people weigh in from anywhere if the AI answer is unsatisfactory. Stackoverflow was the best we could do at the time, but now more general approach is possible."
}
,
{
"id": "46483663",
"text": "Here’s how SO could still be useful in the LLM era:\n\nUser asks a question, llm provides an immediate answer/reply on the forum. But real people can still jump in to the conversation to add additional insights and correct mistakes.\n\nIf you’re a user that asks a duplicate question, it’ll just direct you to the good conversation that already happened.\n\nA symbiosis of immediate usually-good-enough llm answers PLUS human generated content that dives deeper and provides reassurances in correctness"
}
,
{
"id": "46486030",
"text": "Users could upvote whether Claude, Gemini or ChatGPT provided the best answer. The best of three is surfaced, the others are hidden behind a \"show alternatives.\"\n\nHowever, I can see how this would be labelled \"shoving AI into everything\" and \"I'm not on SO for AI.\""
}
,
{
"id": "46484203",
"text": "Or they can start claiming copyright on the training content"
}
,
{
"id": "46484279",
"text": "Should probably email this to the CEO of SO"
}
,
{
"id": "46483000",
"text": "StackExchange forgot who made them successful long ago. This is what they sowed. I don't have any remorse, only pity.\n\nWhen Hans Passant (OGs will know) left, followed by SE doing literally nothing, that was the first clue for me personally that SE stopped caring.\n\nThat said, it is a bit shocking how close to zero it is."
}
,
{
"id": "46485961",
"text": "As everyone is saying, it was already down-trending before AI, and probably experts exchange traffic and whatever came before looks similar\n\nAlso not sure exactly when they added the huge popup[0] that covers the answer (maybe only in Europe as it's about cookies?) but that's definitely one of the things that made me default reach for other links instead of SO.\n\n[0] https://i.imgur.com/Z7hxflF.png"
}
,
{
"id": "46486239",
"text": "Those popups were a big contributor for me to stop using SO. I stopped updating my uBlock origin rules when LLMs became good enough. I am now using the free Kimi K2 model via Groq over CLI, which is much faster."
}
,
{
"id": "46487521",
"text": "SO peaked long, long before LLMs came along. My personal experience is that GitHub issues took over.\n\nYou can clearly see the introduction of ChatGPT in late 2022. That was the final nail in the coffin.\n\nI am still really glad that Stack Overflow saved us from experts-exchange.com - or “the hyphen site” as it is sometimes referred to."
}
,
{
"id": "46482753",
"text": "Stackoverflow is like online gaming--lots of toxic people, but I still get value out of it. Ignore the toxic people, get your questions answered and go home to your family with your paycheck."
}
,
{
"id": "46482872",
"text": "It's surprisingly tame still given it interests tens (hundreds?) of millions of people at varying age and background and mostly when the mind is occupied by a problem. I always found it surprising there's not more defacing and toxicity."
}
,
{
"id": "46482954",
"text": "AI is a vampire. Coming to your corner of the world, to suck your economic blood, eventually. It’s hard to ignore the accelerated decline that started in late 2022/early 2023."
}
,
{
"id": "46490126",
"text": "One thing you won’t get with in an LLM is genuine research. I once answered a 550 point question by researching the source code of vim to see how the poster’s question could be resolved. [0]\n\n[0] https://stackoverflow.com/questions/619423/backup-restore-th..."
}
,
{
"id": "46486532",
"text": "For this occasion, I just logged in to my SO profile; I've been a member for 9 years now.\n\nTo me, back when I started out learning web dev, as a junior with no experience and barely knowing anything, SO seemed like a paradise for programmers. I could go on there and get unblocked for the complex (but trivial for experts) issues I was facing. Most of the questions I initially posted, which were either closed as duplicates or \"not good enough,\" really did me a lot of discouragement. I wasn't learning anything by being told, \"You did it wrong, but we're also not telling you how you could do it better.\" I agree with the first part; I probably sucked at writing good questions and searching properly. I think it's just a part of the process to make mistakes but SO did not make it better for juniors, at least on the part of giving proper guidance to those who \"sucked\"."
}
,
{
"id": "46482713",
"text": "LLMs absolutely body-slammed SO, but anyone who was an active contributor knows the company was screwing over existing moderators for years before this. Writing was on the walls"
}
,
{
"id": "46482990",
"text": "If by \"body-slammed\" you mean \"trained on SO user data while violating the terms of the CC BY-SA license\", then sure.\n\nIn the best case scenario, LLMs might give you the same content you were able to find on SO. In the common scenario, they'll hallucinate an answer and waste your time.\n\nWhat should worry everyone is what system will come after LLMs. Data is being centralized and hoarded by giant corporations, and not shared publicly. And the data that is shared is generated by LLMs. We're poisoning the well of information with no fallback mechanism."
}
]
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array:
{
"id": "...",
"topics": []
}
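The output contract above can be checked mechanically. Below is a minimal sketch (Python; the function name `validate_classification` and the `reply`/`expected_ids` parameters are illustrative, not part of the prompt) that parses a model reply and enforces the three rules: a bare JSON array, 0 to 3 topics per comment, and 1-based topic indices within range.

```python
import json

NUM_TOPICS = 20  # topics are 1-based indices 1..20


def validate_classification(reply: str, expected_ids: set) -> list:
    """Parse the model's reply and enforce the output rules."""
    results = json.loads(reply)  # reply must be bare JSON, no surrounding text
    if not isinstance(results, list):
        raise ValueError("top-level value must be a JSON array")
    for entry in results:
        if set(entry) != {"id", "topics"}:
            raise ValueError("each entry needs exactly 'id' and 'topics' keys")
        if entry["id"] not in expected_ids:
            raise ValueError("unknown comment id: %r" % entry["id"])
        topics = entry["topics"]
        if not isinstance(topics, list) or len(topics) > 3:
            raise ValueError("each comment gets 0 to 3 topics")
        for t in topics:
            if not (isinstance(t, int) and 1 <= t <= NUM_TOPICS):
                raise ValueError("topic index %r outside 1..%d" % (t, NUM_TOPICS))
    return results


# Example: a well-formed reply for two of the comments in this batch
sample = '[{"id": "46488669", "topics": [1, 16]}, {"id": "46482599", "topics": []}]'
validate_classification(sample, {"46488669", "46482599"})
```

A reply with any extra prose around the array, an out-of-range index such as `0` or `21`, or more than three topics per comment fails fast rather than silently corrupting downstream aggregation.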