You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.
TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Knowledge repository vs help desk debate
5. Community decline timeline
6. Discord as alternative platform
7. Future of LLM training data
8. Gamification and reputation systems
9. Expert knowledge preservation
10. Reddit as alternative
11. Question quality standards
12. Moderator power dynamics
13. Google search integration decline
14. Stack Exchange expansion problems
15. Human interaction loss
16. Documentation vs community answers
17. Site mission misalignment
18. New user experience
19. GitHub Discussions alternative
20. Corporate ownership changes
COMMENTS TO CLASSIFY:
[
{
"id": "46482939",
"text": "Because nobody was clicking on them"
}
,
{
"id": "46483006",
"text": "I suppose all sites that have a voting component run the risk of becoming unpleasant.\n\nHacker News, and we who frequent it, ought to have that in mind."
}
,
{
"id": "46483027",
"text": "dang and the other HN moderators do a heroic job to set the tone, which has second- and third-order effects on behavior."
}
,
{
"id": "46483482",
"text": "I think it has more to do with the fact that when you offer zero salary for moderators, you have to take what you can get, and it ain't good. I don't really see a connection to the voting mechanic."
}
,
{
"id": "46487281",
"text": "Why do you think it makes a difference if they are paid or not? Or more to the point: what are you saying? That people have different standards when paid? That lack of remuneration justifies poor effort? Isn’t that a very transactional view of human interaction? Are we that transactional? Do we want this?\n\nWe’re talking about how communities can become toxic. How we humans sometimes create an environment that is at odds with our intentions. Or at least what we outwardly claim to be our intentions.\n\nI think it is a bit sad when people feel they have to be compensated to not let a community deteriorate."
}
,
{
"id": "46488313",
"text": "> That people have different standards when paid? That lack of remuneration justifies poor effort? Isn’t that a very transactional view of human interaction? Are we that transactional?\n\nThe answer to all of these questions is yes, for the most part. Volunteers are much harder to wrangle than employees and it's much easier for drama and disagreements to flare when there are zero consequences other than losing an unpaid position, particularly if anonymity is in the mix.\n\nVolunteers can be great but on average they're going to be far harder to manage and far more fickle than employees."
}
,
{
"id": "46488839",
"text": "Then you have a much darker view of humanity than I have. What you seem to suggest is that because building a community on volunteers is hard it is not worth doing.\n\nWhat makes a community worthwhile is its ability to resolve differences productively. I think that if you replace individual responsibility with transactionality you have neither community nor long term viability or scalability.\n\nThen again, we live in times when transactional thinking seems to dominate discourse."
}
,
{
"id": "46494122",
"text": "It's because I was involved with a large volunteer-based project that was a literal 24/7/365 operation for several years (dozens of volunteers at any given time and tens of thousands of concurrent users) and can speak first hand as to the differences.\n\nI didn't say it's not worth doing but it will bring challenges that wouldn't exist with employees. Paying people adds a strong motivator to keep toxic behaviour at bay.\n\nYour experiences will heavily depend on the type of project you're running but regardless, you can't hold volunteers, especially online, to the same expectations or standards as employees. The amount of time and effort they can invest will wax and wane and there's nothing you can do about it. Anonymity and lack of repercussions will eventually lead to drama or power struggles when a volunteer steps out of line in a way that they wouldn't in paid employment. There is no fix that'll stop occasional turbulence, it's just the way it is. Not all of your volunteers will be there for the greater good of your community.\n\nAgain, that is absolutely not to say that it can't be worth the effort but if you go into it eyes open, you'll have a much better time and be able to do a better job at heading off problems.\n\nI've seen other people express similar opinions to yours and it wasn't until they experienced being in the driver's seat that they understood how difficult it is."
}
,
{
"id": "46497921",
"text": "My argument is that it stops being a community when it becomes a business."
}
,
{
"id": "46485091",
"text": "It's also disconnected incentives. SO users get numbers to go up by taking moderation actions so of course they do that. Also you literally get banned from reviewing questions if you don't flag enough of them to be closed. These are incentives put in place by the SO company intentionally.\n\nIt's not like only slimy people get to use moderator tools like on Reddit, since you need a lot of reputation points you get by having questions and answers voted up. It's more like (1) you select people who write surface-level-good answers since that's what's upvoted, and they moderate with a similar attitude and (2) once you have access to moderator tools you're forced to conform with (1) or your access is revoked, and (3) the company is completely incompetent and doesn't give a shit about any of this."
}
,
{
"id": "46483349",
"text": "Oh yeah.\n\nMy favorite feature of LLMs, is the only dumb question, is the one I don't ask.\n\nI guess someone could train an LLM to be spiteful and nasty, but that would only be for entertainment."
}
,
{
"id": "46485057",
"text": "If you say the wrong thing to grok, it will go off on you. It's quite entertaining!"
}
,
{
"id": "46483038",
"text": "That depends on what you mean by \"came along\". If you mean \"once that everyone got around to the idea that LLMs were going to be good at this thing\" then sure, but it was not long ago that the majority of people around here were very skeptical of the idea that LLMs would ever be any good at coding."
}
,
{
"id": "46483113",
"text": "What you're arguing about is the field completely changing over 3 years; it's nothing, as a time for everyone to change their minds.\n\nLLMs were not productified in a meaningful way before ChatGPT in 2022 (companies had sufficiently strong LLMs, but RLHF didn't exist to make them \"PR-safe\"). Then we basically just had to wait for LLM companies to copy Perplexity and add search engines everywhere (RAG already existed, but I guess it was not realistic to RAG the whole internet), and they became useful enough to replace StackOverflow."
}
,
{
"id": "46483062",
"text": "I dont think this is true. People were skeptical of agi / better than human coding which is not the case. As a matter of fact i think searching docs was one of the first manor uses of llms before code."
}
,
{
"id": "46483103",
"text": "That's because there has been rapid improvement by LLMs.\n\nTheir tendency to bullshit is still an issue, but if one maintains a healthy skepticism and uses a bit of logic it can be managed. The problematic uses are where they are used without any real supervision.\n\nEnabling human learning is a natural strength for LLMs and works fine since learning tends to be multifaceted and the information received tends to be put to a test as a part of the process."
}
,
{
"id": "46483041",
"text": "all true, but i still find myself ask questions there after llm gave wrong answers and wasted my time"
}
,
{
"id": "46482859",
"text": "The irony is that the LLMs are trained on stack overflow and should inherit a lot of those traits and errors."
}
,
{
"id": "46483983",
"text": "Yeah, but they don't inherit their rules and attitude.\n\nReally, if we could apply some RLHF to the Stack Overflow community, it would be doing a lot better."
}
,
{
"id": "46482839",
"text": "How can we be sure that LLMs won't start giving stale answers?"
}
,
{
"id": "46482907",
"text": "We can't. I don't think the LLMs themselves can recognize when an answer is stale. They could if contradicting data was available, but their very existence suppresses the contradictory data."
}
,
{
"id": "46485847",
"text": "LLMs don't experience the world, so they have no reason a priori to know what is or isn't truthful in the training data.\n\n(Not to mention the confabulation. Making up API method names is natural when your model of the world is that the method names you've seen are examples and you have no reason to consider them an exhaustive listing.)"
}
,
{
"id": "46483094",
"text": "They will, but model updates and competition help solve the problem. If people find that Claude consistently gives better/more relevant answers over GPT, for example, people will choose the better model.\n\nThe worst thing with Q/A sites isn't they don't work. It's that they there are no alternatives to stackoverflow. Some of the most upvoted answers on stackoverflow prove that it can work well in many cases, but too bad most other times it doesn't."
}
,
{
"id": "46482920",
"text": "They still use the official documentation/examples, public Github Repos, and your own code which are all more likely to be evergreen. SO was definitely a massive training advantage before LLMs matured though."
}
,
{
"id": "46483466",
"text": "LLMs are just statistics, eventually they kill themselves with feedback loop by consuming their own farts (literally)"
}
,
{
"id": "46482926",
"text": ">For all their flaws, LLMs are so much better\n\nBut LLMs get their answers from StackOverflow and similar places being used as the source material. As those start getting outdated because of lack of activity, LLMs won't have the source material to answer questions properly."
}
,
{
"id": "46483272",
"text": "I regularly use Claude and friends where I ask it to use the web to look at specific GitHub repos or documentation to ask about current versions of things. The “LLMs just get their info from stack overflow” trope from the GPT-3 days is long dead - they’re pretty good at getting info that is very up to date by using tools to access the web. In some cases I just upload bits and pieces from a library along with my question if it’s particularly obscure or something home grown, and they do quite well with that too. Yes, they do get it wrong sometimes - just like stack overflow did too."
}
,
{
"id": "46484484",
"text": "The amount of docs that have a “Copy as markdown” or “Copy for AI” button has been noticeably increasing, and really helps the LLM with proper context."
}
,
{
"id": "46485267",
"text": "they’re pretty good at getting info that is very up to date by using tools to access the web\n\nYeah that's a charitable way to phrase \"perform distributed denial of service attacks\". Browsing github as a human with their draconian rate limits that came about as a result of AI bots is fucking great."
}
,
{
"id": "46487890",
"text": "You know DDoS attacks are illegal, right? If you have proof that OpenAI is DDoSing your site, go sue them for millions of dollars."
}
,
{
"id": "46496257",
"text": "Ah, I see you have a JD from OpenAI.\n\nI don't run personal sites worth millions of dollars. I do, however, use sites like Sourcehut, DigiKey, Github, Mouser, Farnell, etc, etc, etc. that have opted to put everything behind bullshit captchas because of the DDoS (nee AI) bots."
}
,
{
"id": "46482970",
"text": "StackOverflow answers are outdated. Every time I end up on that site these days, I find myself reading answers from 12 years ago that are no longer relevant."
}
,
{
"id": "46483363",
"text": "I see plenty of old answers that are still very relevant. Suppose it depends on what language/tech tags you follow."
}
,
{
"id": "46485851",
"text": "There have been many times I have seen someone complain on the meta site about answers being old and outdated, and then they give specific examples, and I go check them out and they're actually still perfectly valid."
}
,
{
"id": "46483048",
"text": "Now they can read the documentation and code in the repo directly and answer based on that."
}
,
{
"id": "46485015",
"text": "SO had answers that you couldn't find in the documentation and were you can't look in the source code.\n\nIf everything would be well documentated SO wouldn't have being as big as it was in the first place."
}
,
{
"id": "46483127",
"text": "I think the industry is quickly moving to syntheticly derived knowledge, or custom/systematic knowledge production from humans."
}
,
{
"id": "46484621",
"text": "not only stackoverflow, but also reddit.com/r/aws reddit.com/r/docker reddit.com/r/postgresql all 3 of them have extremely toxic communities. ask a question and get downvoted instantly! Noo!! your job is to actually upvote the question to maximize exposure for the algorithm unless it is a really really stupid question that a google search could fix"
}
,
{
"id": "46482755",
"text": "Yep, LLMs are perfect for the \"quick buy annoying to answer 500 times\" questions about writing a short script, or configuring something, or using the right combination of command line parameters.\n\nQuicker than searching the entirety of Google results and none of the attitude."
}
,
{
"id": "46482761",
"text": "> For all their flaws, LLMs are so much better.\n\nFor now. They still need to be enshitted."
}
,
{
"id": "46482853",
"text": "Models are check-pointed. You can save one you like and use it forever."
}
,
{
"id": "46483841",
"text": "You can save an open source + open weights model, which is frozen in time. That’s still very useful for some things but lacks knowledge of current data.\n\nSo we’ll end up with a choice of low-performing stale models or high-performing enshittified models which know about more current information."
}
,
{
"id": "46483929",
"text": "Open source models get updated all the time. You'd only be a few months behind."
}
,
{
"id": "46483951",
"text": "Direct enshittification is intentional and wouldn’t affect open models.\n\nIndirect pollution via AI slop in the input and the same content manipulation mechanisms as SEO hacking is still a threat for open models."
}
,
{
"id": "46483266",
"text": "Doesn't help when the ads are a layer above the model."
}
,
{
"id": "46483764",
"text": "There are open source models you yourself or a trusted third party can run. No ads."
}
,
{
"id": "46484289",
"text": "Yup. Like Claude 3 Opus."
}
,
{
"id": "46483997",
"text": "Really? I thought you could only do that with open source models. Can you teach me how to checkpoint the current version of Claude Code so I can keep it as-is forever?"
}
,
{
"id": "46482788",
"text": "Yeah just wait for the ads"
}
,
{
"id": "46482734",
"text": "Indeed. StackOverflow was by far the most unpleasant website that I have regularly interacted with. Sometimes, just seeing how users were treated there (even in Q&A threads that I wasn’t involved in at all) disturbed me so much it was actually interfering with my work. I’m so, so glad that I can now just ask an AI to get the same (or better) answers, without having to wade through the barely restrained hate on that site."
}
]
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array:
{
"id": "...",
"topics": []
}