Summarizer

LLM Input

llm/7c7e49f1-870c-4915-9398-3b2e1f116c0c/batch-9-5632742f-3035-4480-9b50-7004df66f18d-input.json

prompt

You are a comment classifier. Given a list of topics and a batch of comments, assign each comment to up to 3 of the most relevant topics.

TOPICS (use these 1-based indices):
1. Toxic moderation culture
2. LLMs replacing Stack Overflow
3. Duplicate question closures
4. Knowledge repository vs help desk debate
5. Community decline timeline
6. Discord as alternative platform
7. Future of LLM training data
8. Gamification and reputation systems
9. Expert knowledge preservation
10. Reddit as alternative
11. Question quality standards
12. Moderator power dynamics
13. Google search integration decline
14. Stack Exchange expansion problems
15. Human interaction loss
16. Documentation vs community answers
17. Site mission misalignment
18. New user experience
19. GitHub Discussions alternative
20. Corporate ownership changes

COMMENTS TO CLASSIFY:
[
  
{
  "id": "46482934",
  "text": "This change was happening well before LLMs. People were tired of being yelled at and treated poorly.\n\nA cautionary tale for many of these types of tech platforms, this one included."
}
,
  
{
  "id": "46482624",
  "text": "They will no doubt blame this on AI, somehow (ChatGPT release: late 2022, decline start: mid 2020), instead of the toxicity of the community and the site's goals of being a knowledgebase instead of a QA site despite the design.\n\nPS - This comment is closed as a [duplicate] of this comment: https://news.ycombinator.com/item?id=46482620"
}
,
  
{
  "id": "46482714",
  "text": "Right. I often end up on Stack Exchange when researching various engineering-related topics, and I'm always blown away by how incredibly toxic the threads are. We get small glimpses of that on HN, but it was absolutely out of control on Stack Exchange.\n\nAt the same time, I think there was another factor: at some point, the corpus of answered questions has grown to a point where you no longer needed to ask, because by default, Google would get you to the answer page. LLMs were just a cherry on top."
}
,
  
{
  "id": "46485942",
  "text": "> I'm always blown away by how incredibly toxic the threads are.\n\nThey are not \"threads\" and are not supposed to be \"threads\". Thinking about them as if they were, is what leads to the perception of toxicity."
}
,
  
{
  "id": "46488426",
  "text": "It's funny that people blame the site for this.\n\nThat toxicity is just part of software engineering culture. It's everywhere."
}
,
  
{
  "id": "46488517",
  "text": "Its karma farming. Number must go up regardless of the human cost. Thats why the same problem is seen here, to a lesser extent.\n\nKarma in social media is a technology to produce competitiveness and unhappiness, usually to increase advertising engagement.\n\nCompare how nice the people are on 4chan /g/ board compared to the declining years of SO. Or Reddit for that matter."
}
,
  
{
  "id": "46483519",
  "text": "I agree there was some natural slow down as the corpus grew - the obvious questions were answered. But if the community was healthy, that should not have caused growth to stop. New technologies get created all the time, each starting with zero SO questions. (Or Google releases v2.0 which invalidates all answers written about v1.)\n\nSO just stopped being fun for me. I wish more systems would use their point systems though."
}
,
  
{
  "id": "46486194",
  "text": "I think about better voting systems all the time (one major issue being downvote can mean \"I want fewer people to see this\", \"I disagree\", and \"This is factually wrong\" and you never know which.\n\nBut I am not sure if SO's is actually that good, given it led to this toxic behavior.\n\nI think something like slashdot's metamoderation should work best but I never participated there nor have I seen any other website use anything similar."
}
,
  
{
  "id": "46490808",
  "text": "Arstechnica used to have different kinds of upvotes for \"funny\" vs \"insightful\" - I forget exactly all of them. But I found it awesome. I wanted to and could read the insightful comments, not the funny ones. A couple years back they redid the discussion system and got rid of it. Since then the quality of discussion has IMHO completely tanked."
}
,
  
{
  "id": "46501312",
  "text": "Other tech support forums are terrible in other ways. AI is a godsend.\n\nTypical response:\n\nI am RJ, an Independent Advisor and Microsoft Gold Certified Support Specialist Enthusiast.\n\nI know how your system is not functioning as desired! Rest assured, I am here to help you resolve this today.\n\nPlease follow these steps in order. Do not skip any steps.\n\nStep 1: Reboot your computer\nStep 2: Reinstall windows\nStep 3: Contact Microsoft support\n\nDid this resolve your issue? [ Yes ] [ No ]\n\nIf this helped, please mark this as the Answer and give me a 5-star rating so I can continue providing high-quality, scripted responses to other users!\n\nStandard Disclaimer: I do not work for Microsoft. I am an independent volunteer who enjoys copying and pasting from a manual written in 2014."
}
,
  
{
  "id": "46483043",
  "text": "People overestimate the impact of toxicity on number of monthly questions. The initial growth was due to missing answers. After some time there is a saturation point where all basic questions are already answered and can be found via Google. If you ask them again they are marked as dups."
}
,
  
{
  "id": "46483422",
  "text": "That would be true if no new technologies were created every year (even more often)."
}
,
  
{
  "id": "46488087",
  "text": "There are new technologies, but if you look at the most viewed questions, they will be about Python, JS, Java, C, and C++ without libraries."
}
,
  
{
  "id": "46484034",
  "text": "You do not find the 2009 jQuery answer satisfying?"
}
,
  
{
  "id": "46485938",
  "text": "> the site's goals of being a knowledgebase instead of a QA site despite the design.\n\nA Q&A site is a knowledge base. That's just how the information is presented.\n\nIf you want a forum — a place where you ask the question to get answered one-on-one — you have countless options for that.\n\nStack Overflow pages have a different design from that explicitly to encourage building a knowledge base. That's why there's a question at the top and answers underneath it, and why there are not follow-up questions, \"me too\" posts, discussion of annoyances related to the question, tangential rants, generic socialization etc.\n\nJeff Atwood was quite clear about this from the beginning."
}
,
  
{
  "id": "46483307",
  "text": "The downward trend seems to start ~2017, and was interrupted by a spike during the early months of COVID-19. I'd be interested to know what drove that jump, perhaps people were less hesitant to post when they were working from home?"
}
,
  
{
  "id": "46483393",
  "text": "More people spent lot more time learning new tech skills (at every experience level).\n\nThe excess time available (less commute or career pause etc) and more interest (much more new opportunities) were probably leading reasons why they spent more time I would imagine."
}
,
  
{
  "id": "46485515",
  "text": "I’d guess it’s also because it’s not as easy to ask your random question to a coworker when they’re not sitting next to you in the office."
}
,
  
{
  "id": "46485727",
  "text": "I felt it became easier with slack.\n\nThe culture to use slack as documentation tooling can become quite annoying. People just @here/@channel without hesitation and producers just also don't do actual documentation. They only respond to slack queries, which works in the moment, but terrible for future team members to even know what questions to search/ask for."
}
,
  
{
  "id": "46485956",
  "text": "A huge amount of people were just starting to learn programming, because they were stuck at home and had the time to pick something up.\n\nIf you look at the trends tag by tag, you can see that the languages, libraries, technologies etc. that appeal to beginners and recreational coders grew disproportionately."
}
,
  
{
  "id": "46482693",
  "text": "If you ignore the early pandemic bump, it even looks like the decline started in late 2017, though it's more variable than after the bump"
}
,
  
{
  "id": "46483678",
  "text": "I wonder what is the role of moderating duplicate questions. More time passes - more existing data there is and less need for new questions. If you moderate duplicate questions, will they disappear from these charts? Is this decline actually logical?\n\n2020 there was new CEO and moderator council was formed:\nhttps://stackoverflow.blog/2020/01/21/scripting-the-future-o..."
}
,
  
{
  "id": "46483320",
  "text": "Many people are pointing out the toxicity, but the biggest thing that drove me away, especially for specific quantitative questions, was that SO was flat out wrong (and confidently so) on many issues.\n\nIt was bad enough that I got back in the habit of buying and building a library of serious reference books because they were the only reliable way to answer detailed technical questions."
}
,
  
{
  "id": "46500367",
  "text": "If you do not mind my asking, what sorts of questions were you asking that were resulting in wrong answers?"
}
,
  
{
  "id": "46482822",
  "text": "There is an obvious acceleration of the downwards trend at the time ChatGPT got popular. AI is clearly a part of this, but not the only thing that affects SO activity."
}
,
  
{
  "id": "46483652",
  "text": "I wonder if we can attribute some $billion of the investment in LLMs directly to the toxicity on StackOverflow."
}
,
  
{
  "id": "46482878",
  "text": "Ironically they could probably do some really useful deduplication/normalization/search across questions and answers using AI/embeddings today, if only they’d actually allowed people to ask the same questions infinite different ways, and treated the result of that as a giant knowledge graph.\n\nI was into StackOverflow in the early 2010s but ultimately stopped being an active contributor because of the stupid moderation."
}
,
  
{
  "id": "46488320",
  "text": "Toxic community is mostly a meme myth. I have like 30k points and whatever admins were doing was well deserved as 90% of the questions were utterly impossible to help with. Most people wanted free help and couldn't even bother to put in 5 minutes of work."
}
,
  
{
  "id": "46485206",
  "text": "Use of GPT3 among programmers started 2021 with GitHub Copilot which preceded ChatGPT.\n\nI agree the toxic moderation (and tone-deaf ownership!) initiated the slower decline earlier that then turned into the LLM landslide.\n\nTbf SO also suffered from its own success as a knowledgebase where the easy pickings were long gone by then."
}
,
  
{
  "id": "46482695",
  "text": "It is sort of because of AI - it provided a way of escaping StackOverflow's toxicity!"
}
,
  
{
  "id": "46483093",
  "text": "Could view it as push/pull dynamics: pushed away by toxicity, pulled to good answers from AI."
}
,
  
{
  "id": "46483734",
  "text": "Actual analysts here that have looked at this graph like... a lot, so let me contextualize certain themes that tend to crop up from these:\n\n- The reduction of questions over time is asymptomatic of SO. When you have a library of every question asked, at some point, you asked most of the easy questions. Have a novel question becomes hard.\n- This graph is using the Posts table, not PostsWithDeleted. So, it only tells you of the questions that survived at this point in time, this [0] is the actual graph which while describes a curve that shows the same behavior, it's more \"accurate\" of the actual post creation.\n- This is actually a Good Thing™. For years most of the questions went unanswered, non-voted, non-commented, just because there was too many questions happening all the time. So the general trend is not something that the SO community needs to do anything about. Almost 20% of every question asked is marked as duplicate. If people searched... better™ they wouldn't ask as many questions, and so everyone else had more bandwidth to deal with the rest.\n- There has been a shift in help desk style of request, where people starting to prefer discord and such to get answers. This is actually a bad thing because that means that the knowledge isn't public nor indexed by the world. So, information becomes harder to find, and you need to break it free from silos.\n- The site, or more accurately, the library will never die. All the information is published in complete archives that anyone can replicate and restart if the company goes under or goes evil. So, yeah, such concerns, while appreciated, are easily addressed. At worst, you would be losing a month or two of data.\n\n[0]: https://data.stackexchange.com/stackoverflow/query/edit/1926..."
}
,
  
{
  "id": "46486591",
  "text": "OP here: I had the same thought, but noticed a very similar trend in both [0]; I think this graph is more interesting because you'd expect the number of new users to be growing [1], but this seems to have very little effect on deleted questions or even answers\n\n[0]: https://data.stackexchange.com/stackoverflow/query/1927371#g...\n\n[1]: https://data.stackexchange.com/stackoverflow/query/1927375#g...\n\nThe second graph here ([1]) is especially interesting because the total montly number of new users seems completely unrelated to number of posts, until you filter for a rep > 1 which has a close to identical trend"
}
,
  
{
  "id": "46485674",
  "text": "> When you have a library of every question asked, at some point, you asked most of the easy questions. Have a novel question becomes hard\n\nThis would be true if programming were a static field, but given that new programming languages/frameworks/technologies/techniques/etc. are constantly coming out and evolving, that argument doesn't make sense."
}
,
  
{
  "id": "46486310",
  "text": "Programming is not a static field in the answers side, but it's in the question side. \"How to print characters on a terminal with python?\" is the same problem today as it was 25 years ago. The answer changed but the problem remained. That's what people saying that programming isn't static is missing: the problem space grows significantly slower than the solution space."
}
,
  
{
  "id": "46490025",
  "text": "But you’re supposed to replace Python with a new language that hasn’t been asked about."
}
,
  
{
  "id": "46484014",
  "text": "> which while describes a curve that shows the same behavior, it's more \"accurate\" of the actual post creation.\n\nI would say that this graph looks a lot more extreme, actually!"
}
,
  
{
  "id": "46484094",
  "text": "At my place of work we use an indexing service for discord that creates an index of searchable static pages for all discord interactions.\n\nSo while I agree the help desk style system isn’t really better it also doesn’t necessarily mean that it is lost forever in a silo.\n\nBefore you ask, we use https://www.linen.dev/ but I’m sure there are other similar solutions by now"
}
,
  
{
  "id": "46487263",
  "text": "What exactly do you mean by \"asymptomatic\"? The proper meaning of the word does not fit into what you wrote.\n\n\"Asymptomatic\" means you have a cold but you show none of the symptoms, hence a -symptom-atic, no symptoms."
}
,
  
{
  "id": "46487477",
  "text": "parent is meaning asymptote as in maths where a line is approaching to a value but never touching it.\nmaybe it was a typo or auto correct...\n\nhttps://en.wikipedia.org/wiki/Asymptote"
}
,
  
{
  "id": "46485418",
  "text": "Your post formatting is making this very difficult to read"
}
,
  
{
  "id": "46486317",
  "text": "I am aware, I tried to do some kind of bullet point as I've seen other posts, but I don't understand how to \"activate it\""
}
,
  
{
  "id": "46486992",
  "text": "there is no special list rendering in HN markdown. You just have to put extra spaces between each list item so that they are separate paragraphs"
}
,
  
{
  "id": "46483109",
  "text": "I guess I'm the only one that was a fan of SO's moderation. I never got too deep into it (answered some TypeScript questions). But the intention to reduce duped questions made a lot of sense to me. I like the idea of a \"living document\" where energy is focused on updating and improving answers to old versions of the same question. As a user looking for answers it means I can worry less about finding some other variation of the same question that has a more useful answer\n\nI understand some eggs got cracked along the way to making this omelette but overall I'd say about 90% of the time I clicked on a SO link I was rewarded with the answer I was looking for.\n\nJust my two cents"
}
,
  
{
  "id": "46483358",
  "text": "The problem with duplicate questions is that they weren't duplicates at all, and mods weren't competent enough to tell a difference."
}
,
  
{
  "id": "46483771",
  "text": "Show me one that was closed by a moderator. Just one. And I will tell you exactly what happened."
}
,
  
{
  "id": "46484538",
  "text": "I think the poster you're responding to is correct. I've seen it many times myself. And just so you know, asking for a piece of data and not getting it is not going to be proof that you're right."
}
,
  
{
  "id": "46486253",
  "text": "No, but it will show, as someone else already responded, that they don't understand SO systems and processes at all. The question they linked [0] was closed by the asker themselves. It's literally one of the comments [1] on the question. Most questions aren't even closed by moderators, not even by user voting, but by the askers themselves [2], which can be seen on the table as community user. The community user gets attributed of all automated actions and whenever the user agrees with closure of their own question [3]. (The same user also gets attributed of bunch of other stuff [4]\n\nThis shows that critics of Stack Overflow don't understand how Stack Overflow works and start assigning things that SO users see normal and expected to some kind of malice or cabal. Now, if you learned how it works, and how long it has been working this way, you will see that cases of abuses are not only rare, they usually get resolved once they are known.\n\n[0]: https://stackoverflow.com/questions/32711321/setting-element...\n\n[1]: https://stackoverflow.com/questions/32711321/setting-element...\n\n[2]: https://meta.stackoverflow.com/questions/432658/2024-a-year-...\n\n[3]: https://meta.stackexchange.com/questions/250922/can-we-clari...\n\n[4]: https://meta.stackexchange.com/a/19739/213575"
}
,
  
{
  "id": "46485048",
  "text": "I logged into my old account and found an old question I asked:\n\nhttps://stackoverflow.com/questions/32711321/setting-element..."
}
,
  
{
  "id": "46485197",
  "text": "The linked answer seems like a valid guess for a relevant dupe. Like I said in my comment, \"I understand a few eggs got cracked along the way to making this omelette\" but I really don't think this was as widespread of a problem as people are making it out to be.\n\nThey also have Meta Stack Overflow to appeal if you think your question was unfairly marked as a dupe. From what I read, it seems that most mods back off readily"
}

]

Return ONLY a JSON array with this exact structure (no other text):
[
  
{
  "id": "comment_id_1",
  "topics": [
    1,
    3,
    5
  ]
}
,
  
{
  "id": "comment_id_2",
  "topics": [
    2
  ]
}
,
  ...
]

Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices
- Only assign topics that are genuinely relevant to the comment
- If no topics match, use an empty array: 
{
  "id": "...",
  "topics": []
}
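
The output contract above (a bare JSON array, 0 to 3 topics per comment, 1-based indices into the 20 listed topics) can be checked mechanically before a reply is accepted. The following is a minimal sketch of such a check, not part of the logged job: the function name `validate_response`, the constants, and the decision to also flag unknown, repeated, or missing comment ids are assumptions made for illustration.

```python
import json

NUM_TOPICS = 20  # the prompt lists topics 1..20
MAX_TOPICS = 3   # "Each comment can have 0 to 3 topics"


def validate_response(raw: str, expected_ids: set[str]) -> list[str]:
    """Return a list of problems found in the classifier's raw reply (empty if none)."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"not valid JSON: {exc}"]
    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]

    problems: list[str] = []
    seen: set[str] = set()
    for i, item in enumerate(data):
        if not isinstance(item, dict) or set(item) != {"id", "topics"}:
            problems.append(f"item {i}: expected an object with exactly 'id' and 'topics'")
            continue
        cid, topics = item["id"], item["topics"]

        # Assumption: every id must come from the batch and appear only once.
        if not isinstance(cid, str) or cid not in expected_ids:
            problems.append(f"item {i}: unknown id {cid!r}")
        elif cid in seen:
            problems.append(f"item {i}: duplicate id {cid!r}")
        else:
            seen.add(cid)

        if not isinstance(topics, list):
            problems.append(f"item {i}: 'topics' is not an array")
            continue
        if len(topics) > MAX_TOPICS:
            problems.append(f"item {i}: more than {MAX_TOPICS} topics")
        # type(t) is int rejects booleans, which Python treats as ints.
        invalid = [t for t in topics if type(t) is not int or not 1 <= t <= NUM_TOPICS]
        if invalid:
            problems.append(f"item {i}: topic indices out of range 1..{NUM_TOPICS}: {invalid}")

    missing = expected_ids - seen
    if missing:
        problems.append(f"missing ids: {sorted(missing)}")
    return problems


if __name__ == "__main__":
    # The topic assignments here are illustrative only, not real model output.
    reply = '[{"id": "46482934", "topics": [1, 5]}, {"id": "46485418", "topics": []}]'
    print(validate_response(reply, {"46482934", "46485418"}))
```

Returning a list of problems instead of raising on the first one makes it easier to log everything wrong with a reply before deciding whether to retry the batch.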

commentCount

50
