llm/5daab79e-f20f-476c-ab87-82c7ff678250/batch-7-15dc500c-8232-4f07-a779-6d344c7a1ce1-input.json
You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.
TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Community hostility toward newcomers
5. Question quality standards
6. Knowledge base vs help forum debate
7. Future of LLM training data
8. Reddit and Discord as alternatives
9. Gamification and reputation systems
10. Outdated answers problem
11. SO sale to private equity
12. Google search integration decline
13. Expert knowledge preservation
14. GitHub Discussions adoption
15. Elitist gatekeeping behavior
16. Human interaction loss
17. Question saturation theory
18. Moderator power dynamics
19. AI-generated content concerns
20. Community decline timeline
COMMENTS TO CLASSIFY:
[
{
"id": "46482834",
"text": "Do we have any stats for the number of GitHub discussions created each month to compare to this?"
}
,
{
"id": "46485891",
"text": "> legitimate questions being closed for no good reason\n\nThey are closed for good reasons. People just have their own ideas about what the reasons should be. Those reasons make sense according to others' ideas about what they'd like Stack Overflow to be, but they are completely wrong for the site's actual goals and purposes. The close reasons are well documented ( https://meta.stackoverflow.com/questions/417476 ) and well considered, having been exhaustively discussed over many years.\n\n> or being labeled a duplicate even though they often weren’t\n\nI have seen so many people complain about this. It is vanishingly rare that I actually agree with them. In the large majority of cases it is comically obvious to me that the closure was correct. For example, there have been many complaints in the Python tag that were on the level of \"why did you close my question as a duplicate of how to do X with a list? I clearly asked how to do it with a tuple!\" (for values of X where you do it the same way"
}
,
{
"id": "46487260",
"text": "You seem to have filled this thread with a huge number of posts that try to justify SO's actions. Over and over, these justifications are along the lines of \"this is our mission\", \"read our policy\", \"understand us\".\n\nOften, doing what your users want leads to success. Stamping authority over your users, and giving out a constant air of \"we know better than all of you\", drives them away. And when it's continually emphasized publicly (rather than just inside a marketing department) that the \"mission\" and the \"policy\" are infinitely more important than what your users are asking for, that's a pretty quick route to failure.\n\nWhen you're completely embedded in a culture, you don't have the ability to see it through the eyes of the majority on the outside. I would suggest that some of your replies here - trying to deny the toxicity and condescension - are clearly showing this."
}
,
{
"id": "46489978",
"text": "> Often, doing what your users want leads to success.\n\nYou misunderstand.\n\nPeople with accounts on Stack Overflow are not \"our users\".\n\nStack Exchange, Inc. does not pay the moderators, nor high-rep community members (who do the bulk of the work, since it is simply far too much for a handful of moderators) a dime to do any of this.\n\nBuilding that resource was never going to keep the lights on with good will and free user accounts (hence \"Stack Overflow for Teams\" and of course all the ads). Even the company is against us, because the new owners paid a lot of money for this. That doesn't change what we want to accomplish, or why.\n\n> When you're completely embedded in a culture, you don't have the ability to see it through the eyes of the majority on the outside.\n\nI am not \"embedded in\" the culture. I simply understand it and have put a lot of time into its project. I hear the complaints constantly. I just don't care . Because you are trying to say that I shouldn't help make the thing I "
}
,
{
"id": "46493240",
"text": "> \"why did you close my question as a duplicate of how to do X with a list? I clearly asked how to do it with a tuple!\" (for values of X where you do it the same way.)\n\nThis is a great example of a question that should not be closed as a duplicate. Lists are not tuples in Python, regardless of how similar potential answers may be."
}
,
{
"id": "46482756",
"text": "It seemed to me that pre-llm, google had stopped surfacing stackoverflow answers in search results."
}
,
{
"id": "46482851",
"text": "My memory is there were a spate of SO scraping sites that google would surface above SO and google just would not zap.\n\nIt would have been super trivial to fix but google didn’t.\n\nMy pet theory was that google were getting doubleclick revenue from the scrapers so had incentives to let them scrape and to promote them in search results."
}
,
{
"id": "46483873",
"text": "I remember those too! There were seemingly thousands of them!\n\nReminds me of my most black-hat project — a Wikipedia proxy with 2 Adsense ads injected into the page. It made me like $20-25 a month for a year or so but sadly (nah, perfectly fairly) Google got wise to it."
}
,
{
"id": "46484372",
"text": "I'm actually surprised it was only ~$20 a month."
}
,
{
"id": "46482939",
"text": "Because nobody was clicking on them"
}
,
{
"id": "46483349",
"text": "Oh yeah.\n\nMy favorite feature of LLMs, is the only dumb question, is the one I don't ask.\n\nI guess someone could train an LLM to be spiteful and nasty, but that would only be for entertainment."
}
,
{
"id": "46485057",
"text": "If you say the wrong thing to grok, it will go off on you. It's quite entertaining!"
}
,
{
"id": "46483006",
"text": "I suppose all sites that have a voting component run the risk of becoming unpleasant.\n\nHacker News, and we who frequent it, ought to have that in mind."
}
,
{
"id": "46483027",
"text": "dang and the other HN moderators do a heroic job to set the tone, which has second- and third-order effects on behavior."
}
,
{
"id": "46483482",
"text": "I think it has more to do with the fact that when you offer zero salary for moderators, you have to take what you can get, and it ain't good. I don't really see a connection to the voting mechanic."
}
,
{
"id": "46487281",
"text": "Why do you think it makes a difference if they are paid or not? Or more to the point: what are you saying? That people have different standards when paid? That lack of remuneration justifies poor effort? Isn’t that a very transactional view of human interaction? Are we that transactional? Do we want this?\n\nWe’re talking about how communities can become toxic. How we humans sometimes create an environment that is at odds with our intentions. Or at least what we outwardly claim to be our intentions.\n\nI think it is a bit sad when people feel they have to be compensated to not let a community deteriorate."
}
,
{
"id": "46488313",
"text": "> That people have different standards when paid? That lack of remuneration justifies poor effort? Isn’t that a very transactional view of human interaction? Are we that transactional?\n\nThe answer to all of these questions is yes, for the most part. Volunteers are much harder to wrangle than employees and it's much easier for drama and disagreements to flare when there are zero consequences other than losing an unpaid position, particularly if anonymity is in the mix.\n\nVolunteers can be great but on average they're going to be far harder to manage and far more fickle than employees."
}
,
{
"id": "46488839",
"text": "Then you have a much darker view of humanity than I have. What you seem to suggest is that because building a community on volunteers is hard it is not worth doing.\n\nWhat makes a community worthwhile is its ability to resolve differences productively. I think that if you replace individual responsibility with transactionality you have neither community nor long term viability or scalability.\n\nThen again, we live in times when transactional thinking seems to dominate discourse."
}
,
{
"id": "46494122",
"text": "It's because I was involved with a large volunteer-based project that was a literal 24/7/365 operation for several years (dozens of volunteers at any given time and tens of thousands of concurrent users) and can speak first hand as to the differences.\n\nI didn't say it's not worth doing but it will bring challenges that wouldn't exist with employees. Paying people adds a strong motivator to keep toxic behaviour at bay.\n\nYour experiences will heavily depend on the type of project you're running but regardless, you can't hold volunteers, especially online, to the same expectations or standards as employees. The amount of time and effort they can invest will wax and wane and there's nothing you can do about it. Anonymity and lack of repercussions will eventually lead to drama or power struggles when a volunteer steps out of line in a way that they wouldn't in paid employment. There is no fix that'll stop occasional turbulence, it's just the way it is. Not all of your volunteers will be the"
}
,
{
"id": "46485091",
"text": "It's also disconnected incentives. SO users get numbers to go up by taking moderation actions so of course they do that. Also you literally get banned from reviewing questions if you don't flag enough of them to be closed. These are incentives put in place by the SO company intentionally.\n\nIt's not like only slimy people get to use moderator tools like on Reddit, since you need a lot of reputation points you get by having questions and answers voted up. It's more like (1) you select people who write surface-level-good answers since that's what's upvoted, and they moderate with a similar attitude and (2) once you have access to moderator tools you're forced to conform with (1) or your access is revoked, and (3) the company is completely incompetent and doesn't give a shit about any of this."
}
,
{
"id": "46483038",
"text": "That depends on what you mean by \"came along\". If you mean \"once that everyone got around to the idea that LLMs were going to be good at this thing\" then sure, but it was not long ago that the majority of people around here were very skeptical of the idea that LLMs would ever be any good at coding."
}
,
{
"id": "46483113",
"text": "What you're arguing about is the field completely changing over 3 years; it's nothing, as a time for everyone to change their minds.\n\nLLMs were not productified in a meaningful way before ChatGPT in 2022 (companies had sufficiently strong LLMs, but RLHF didn't exist to make them \"PR-safe\"). Then we basically just had to wait for LLM companies to copy Perplexity and add search engines everywhere (RAG already existed, but I guess it was not realistic to RAG the whole internet), and they became useful enough to replace StackOverflow."
}
,
{
"id": "46483062",
"text": "I dont think this is true. People were skeptical of agi / better than human coding which is not the case. As a matter of fact i think searching docs was one of the first manor uses of llms before code."
}
,
{
"id": "46483103",
"text": "That's because there has been rapid improvement by LLMs.\n\nTheir tendency to bullshit is still an issue, but if one maintains a healthy skepticism and uses a bit of logic it can be managed. The problematic uses are where they are used without any real supervision.\n\nEnabling human learning is a natural strength for LLMs and works fine since learning tends to be multifaceted and the information received tends to be put to a test as a part of the process."
}
,
{
"id": "46483041",
"text": "all true, but i still find myself ask questions there after llm gave wrong answers and wasted my time"
}
,
{
"id": "46482859",
"text": "The irony is that the LLMs are trained on stack overflow and should inherit a lot of those traits and errors."
}
,
{
"id": "46483983",
"text": "Yeah, but they don't inherit their rules and attitude.\n\nReally, if we could apply some RLHF to the Stack Overflow community, it would be doing a lot better."
}
,
{
"id": "46482839",
"text": "How can we be sure that LLMs won't start giving stale answers?"
}
,
{
"id": "46482907",
"text": "We can't. I don't think the LLMs themselves can recognize when an answer is stale. They could if contradicting data was available, but their very existence suppresses the contradictory data."
}
,
{
"id": "46485847",
"text": "LLMs don't experience the world, so they have no reason a priori to know what is or isn't truthful in the training data.\n\n(Not to mention the confabulation. Making up API method names is natural when your model of the world is that the method names you've seen are examples and you have no reason to consider them an exhaustive listing.)"
}
,
{
"id": "46483094",
"text": "They will, but model updates and competition help solve the problem. If people find that Claude consistently gives better/more relevant answers over GPT, for example, people will choose the better model.\n\nThe worst thing with Q/A sites isn't they don't work. It's that they there are no alternatives to stackoverflow. Some of the most upvoted answers on stackoverflow prove that it can work well in many cases, but too bad most other times it doesn't."
}
,
{
"id": "46482920",
"text": "They still use the official documentation/examples, public Github Repos, and your own code which are all more likely to be evergreen. SO was definitely a massive training advantage before LLMs matured though."
}
,
{
"id": "46483466",
"text": "LLMs are just statistics, eventually they kill themselves with feedback loop by consuming their own farts (literally)"
}
,
{
"id": "46482926",
"text": ">For all their flaws, LLMs are so much better\n\nBut LLMs get their answers from StackOverflow and similar places being used as the source material. As those start getting outdated because of lack of activity, LLMs won't have the source material to answer questions properly."
}
,
{
"id": "46483272",
"text": "I regularly use Claude and friends where I ask it to use the web to look at specific GitHub repos or documentation to ask about current versions of things. The “LLMs just get their info from stack overflow” trope from the GPT-3 days is long dead - they’re pretty good at getting info that is very up to date by using tools to access the web. In some cases I just upload bits and pieces from a library along with my question if it’s particularly obscure or something home grown, and they do quite well with that too. Yes, they do get it wrong sometimes - just like stack overflow did too."
}
,
{
"id": "46484484",
"text": "The amount of docs that have a “Copy as markdown” or “Copy for AI” button has been noticeably increasing, and really helps the LLM with proper context."
}
,
{
"id": "46485267",
"text": "they’re pretty good at getting info that is very up to date by using tools to access the web\n\nYeah that's a charitable way to phrase \"perform distributed denial of service attacks\". Browsing github as a human with their draconian rate limits that came about as a result of AI bots is fucking great."
}
,
{
"id": "46487890",
"text": "You know DDoS attacks are illegal, right? If you have proof that OpenAI is DDoSing your site, go sue them for millions of dollars."
}
,
{
"id": "46482970",
"text": "StackOverflow answers are outdated. Every time I end up on that site these days, I find myself reading answers from 12 years ago that are no longer relevant."
}
,
{
"id": "46483363",
"text": "I see plenty of old answers that are still very relevant. Suppose it depends on what language/tech tags you follow."
}
,
{
"id": "46485851",
"text": "There have been many times I have seen someone complain on the meta site about answers being old and outdated, and then they give specific examples, and I go check them out and they're actually still perfectly valid."
}
,
{
"id": "46483048",
"text": "Now they can read the documentation and code in the repo directly and answer based on that."
}
,
{
"id": "46485015",
"text": "SO had answers that you couldn't find in the documentation and were you can't look in the source code.\n\nIf everything would be well documentated SO wouldn't have being as big as it was in the first place."
}
,
{
"id": "46483127",
"text": "I think the industry is quickly moving to syntheticly derived knowledge, or custom/systematic knowledge production from humans."
}
,
{
"id": "46484621",
"text": "not only stackoverflow, but also reddit.com/r/aws reddit.com/r/docker reddit.com/r/postgresql all 3 of them have extremely toxic communities. ask a question and get downvoted instantly! Noo!! your job is to actually upvote the question to maximize exposure for the algorithm unless it is a really really stupid question that a google search could fix"
}
,
{
"id": "46482755",
"text": "Yep, LLMs are perfect for the \"quick buy annoying to answer 500 times\" questions about writing a short script, or configuring something, or using the right combination of command line parameters.\n\nQuicker than searching the entirety of Google results and none of the attitude."
}
,
{
"id": "46482761",
"text": "> For all their flaws, LLMs are so much better.\n\nFor now. They still need to be enshitted."
}
,
{
"id": "46482853",
"text": "Models are check-pointed. You can save one you like and use it forever."
}
,
{
"id": "46483841",
"text": "You can save an open source + open weights model, which is frozen in time. That’s still very useful for some things but lacks knowledge of current data.\n\nSo we’ll end up with a choice of low-performing stale models or high-performing enshittified models which know about more current information."
}
,
{
"id": "46483929",
"text": "Open source models get updated all the time. You'd only be a few months behind."
}
]
Return ONLY a JSON array with this exact structure (no other text):
[
  { "id": "comment_id_1", "topics": [1, 3, 5] },
  { "id": "comment_id_2", "topics": [2] },
  ...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array: { "id": "...", "topics": [] }