Summarizer

LLM Input

llm/7c7e49f1-870c-4915-9398-3b2e1f116c0c/topic-6-7398e459-e0b7-4412-a493-08802a4e5d52-input.json

prompt

You are a comment summarizer. Given a topic and a list of comments tagged with that topic, write a single paragraph summarizing the key points and perspectives expressed in the comments.

TOPIC: Future of LLM training data

COMMENTS:
1. Some comments:

- This is a really remarkable graph. I just didn't realize how thoroughly it was over for SO. It stuns me as much as when Encyclopædia Britannica stopped selling print versions a mere 9 years after the publication of Wikipedia, but at an even faster timescale.

- I disagree with most comments that the brusque moderation is the cause of SO's problems, though it certainly didn't help. SO has had poor moderation from the beginning. The fundamental value proposition of SO is getting an answer to a question; if you can the same answer faster, you don't need SO. I suspect that the gradual decline, beginning around 2016, is due to growth in a number of other sources of answers. Reddit is kind of a dark horse here, as I began seeing answers on Google to more modern technical questions link to a Reddit thread frequently along with SO from 2016 onwards. I also suspect Discord played a part, though this is harder to gauge; I certainly got a number of answers to questions for, e.g., Bun, by asking around in the Bun Discord, etc. The final nail in the coffin is of course LLMs, which can offer a SO-level answer to a decent percentage of questions instantly. (The fact that the LLM doesn't insult you is just the cherry on top.)

- I know I'm beating a dead horse here, but what happens now? Despite stratification I mentioned above, SO was by far the leading source of high quality answers to technical questions. What do LLMs train off of now? I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after? Or will we find new, better ways to find answers to technical questions?

2. ^ this whole chain-of-interaction is a wonderful reminder of why I left SO: It was like seeing a movie trailer about a remake of some nearly forgotten B- horror film one was unfortunately allowed to watch when far too young.

Spoiler warning for those who havent seen this movie before:

Callous disregard for the utility and purpose of both the 'Q' and 'A' users; thinly veiled in a 'you don't get to tell me what i care about', wrapped in a 'my concept of how to moderate is just the way it is; if you don't like it, go F* yourself' package, trimmed with a ribbon of 'who do these Lusers that pay the bills think they are' directed at both the site owners (who write the checks to pay the bills) and all three relevant types of visitors, Q's, A's and those who neither ask, nor answer questions, but do see Advertisements and indirectly generate the income which the site owners use to write checks. But who cares?!, since Mods are not being paid (or paid well enough) to adjust a maladjusted concept of 'the way things are' into 'giving a shit' for anyone. Closed with some more vitriol declaring the site still exists and continues to be useful (as nipples on a chicken).

WASH, RINSE, REPEAT...

That was so last decade; I just stopped giving a damn, removed my browser bookmarks and learned to skim past less frequent and less relevant links to useless and meaningless SO pages when they appear in search results.

The funniest outcome is that LLMs will continue to ingest the diminishingly accurate content of sites like this and continue to degrade the utility of even the most broadly defensible LLM use case scenario.

phew, haven't thought that deeply about SO in at least 4 ... wait its 2026, make that 5 years. Good riddance to the the Whole Lot of you.

3. > Yes; so the idea is they fail to find the existing question, and ask it again, and get marked as a duplicate

Users would fail to find the existing question not because there was an abundance of poorly-worded questions, but because there was a dearth of questions asked using lay terminology that the user was likely to use.

Users were not searching for error codes but making naive preliminary searches like “XYZ doesn’t work” and then branching off from there. Having answers worded in a variety of ways allowed for greater odds that the user would find a question written the way he had worded his search.

Redirecting users to an older answer also just added pointless friction compared to allowing for the answer from the original question to be reposted on the duplicate question, in the exceedingly rare instances

I understand the motive behind wanting to exclude questions that are effectively just: “Do my work for me.” The issue is you have users actively telling you that the culling process didn’t really work the way it was supposed to, and you keep telling them that they are wrong, and that the site actually works well for its intended purpose—even though its intended purpose was to help users find what they were looking for, and they are telling you that they can’t.

Part of StackOverflow’s decline was inevitable and wouldn’t have been helped by any changes the site administrators could have made; a machine can simply answer questions a lot faster than a collection of human volunteers. But there is a reason people were so eager to leave. So now instead of conforming to what users repeatedly told the administrators that they wanted, StackOverflow can conform to being the repository of questions that the administrators wanted, just without any users or revenue besides selling the contributions made by others to the LLMs that users have demonstrated they actually want to use.

4. What's sad about it is that SO was yet another place for humans to interact that is now dead.

I was part of various forums 15 years ago where I could talk shop about many technical things, and they're all gone without any real substitute.

> People don't realize what a massive advantage Google has over everyone else in that regard. Site owners go out of their way to try to block OpenAI's crawlers, while simultaneously trying to attract Google's.

Not really. Website operators can only block live searches from LLM providers like requests made when someone asks a question on chatgpt.com, only because of the quirk that OpenAI makes the request from their server as a quick hack.

We're quickly moving past that as LLMs just make the request from your device with your browser if it has to (to click "I am not a robot").

As for scraping the internet for training data, those requests are basically impossible to block and don't have anything in common with live answer requests made to answer a prompt.

5. Thinking from first principles, a large part of the content on stack overflow comes from the practical experience and battle scars worn by developers sharing them with others and cross-curating approaches.

Privacy concerns notwithstanding, one could argue having LLMs with us every step of the way - coding agents, debugging, devops tools etc. It will be this shared interlocutor with vast swaths of experiential knowledge collected and redistributed at an even larger scale than SO and forum-style platforms allow for.

It does remove the human touch so it's quite a different dynamic and the amount of data to collect is staggering and challenging from a legal point of view, but I suspect a lot of the knowledge used to train LLMs in the next ten years will come from large-scale telemetry and millions of hours in RL self-play where LLMs learn to scale and debug code from fizzbuzz to facebook and twitter-like distributed system.

6. The problem that you worked out is only really useful if it can be recreated and validated, which in many cases it can be by using an LLM to build the same system and write tests that confirm the failure and the fix. Your response telling the model that its answer worked is more helpful for measuring your level of engagement, not so much for evaluating the solution.

7. You can also turn off the feature to allow ChatGPT to learn from your interactions. Not many people do but those that do would also starve OpenAI for information assume they respect that setting

8. Am I the only one that sees this as a hellscape?

No longer interacting with your peers but an LLM instead? The knowledge centralized via telemetry and spying on every user’s every interaction and only available thru a enshitified subscription to a model that’s been trained on this stolen data?

9. Y'know how "users" of modern tech are the product? And how the developers were completely fine with creating such systems?

Well, turns out developers are now the product too. Good job everyone.

10. That "Dead Internet" phrase keeps becoming more likely, and this graph shows that. Human-to-human interactions, LLMs using those interactions, less human-to-human interactions because of that, LLMs using... ?

11. > I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after?

I've wondered this too and I wonder if the existing corpus plus new GitHub/doc site scrapes will be enough to keep things current.

12. Widespread internet adoption created “eternal September”, widespread LLM deployment will create “eternal 2018”

13. >>what happens now?

I'll tell you what happens now: LLMs continue to regurgitate and iterate and hallucinate on the questions and answers they ingested from S.O. - 90% of which are incorrect. LLM output continues to poison itself as more and more websites spring up recycling outdated or incorrect answers, and no new answers are given since no one wants to waste the time to ask a human a question and wait for the response .

The overall intellectual capacity sinks to the point where everything collaboratively built falls apart.

The machines don't need AGI to take over, they just need to wait for us to disintegrate out of sheer laziness, sloth and self-righteous.... /okay.

there was always a needy component to Stack Overflow. "I have to pass an exam, what is the best way to write this algorithm?" and shit like that. A lazy component. But to be honest, it was the giving of information which forced you to think, and research, and answer correctly , which made systems like S.O. worthwhile, even if the questioners were lazy idiots sometimes. And now, the apocalypse. Babel. The total confusion of all language. No answer which can be trusted, no human in the loop, not even a smart AI, just a babbling set of LLMs repeating Stack Overflow answers from 10 years ago. That's the fucking future.

Things are gonna slide / in all directions / won't be nothin you can measure anymore. The blizzard of the world has crossed the threshold and it's overturned the order of the soul.[0]

[0] https://www.youtube.com/watch?v=8WlbQRoz3o4

14. Labs are spending billions on data set curation and RL from human experts to fill in the areas where they're currently weak. It's higher quality data than SO, the only issue is that it's not public.

15. Or we just stagnate, as tech no longer can afford to change.

16. > What do LLMs train off of now? I wonder if, 10 years from now, LLMs will still be answering questions that were answered in the halcyon 2014-2020 days of SO better than anything that came after? Or will we find new, better ways to find answers to technical questions?

That's a great question. I have no idea how things will play out now - do models become generalized enough to handle "out of distrubition" problems or not ? If they don't then I suppose a few years from now we'll get an uptick in Stackoverflow questions; the website will still exist it's not going anywhere.

17. We'll get to the point where we can mass moderate core knowledge eventually. We may need to hand out extra weight for verified experts and some kind of most-votes-win type logic (perhaps even comments?), but live training data updates will be a massive evolution for language models.

18. The LLMs will learn from our interactions with them. That's why they're often free

19. > What do LLMs train off of now?

Perhaps they’ll rely on what was used by people who answered SO questions. So: official docs and maybe source code. Maybe even from experience too, i.e. from human feedback and human written code during agentic coding sessions.

> The fact that the LLM doesn't insult you is just the cherry on top.

Arguably it does insult even more, just by existing alone.

20. That's only because of LLMs consuming pre-existing discussions on SO. They aren't creating novel solutions.

21. This comment and the parent one make me realize that people who answer probably value the exchange between experts more than the answer.

Perhaps the antidote involves a drop of the poison.

Let an LLM answer first, then let humans collaborate to improve the answer.

Bonus: if you can safeguard it, the improved answer can be used to train a proprietary model.

22. SO was somewhere people put their hard won experience into words, that an LLM could train on.

That won't be happening anymore, neither on SO or elsewhere. So all this hard won experience, from actually doing real work, will be inaccessible to the LLMs. For modern technologies and problems I suspect it will be a notably worse experience when using an LLM than working with older technologies.

It's already true for example, when using the Godot game engine instead of Unity. LLMs constantly confuse what you're trying to do with Unity problems, offer Unity based code solutions etc.

23. You still get the same thing though?

That grumpy guy is using an LLM and debugging with it. Solves the problem. AI provider fine tunes their model with this. You now have his input baked into it's response.

How you think these things work? It's either a human direct input it's remembering or a RL enviroment made by a human to solve the problem you are working on.

Nothing in it is "made up" it's just a resolution problem which will only get better over time.

24. How does that work if there's no new data for them to train on, only AI slurry?

25. It's remarkable only in the sense that you can see where the LLMs were trained from.

26. The irony is that the LLMs are trained on stack overflow and should inherit a lot of those traits and errors.

27. How can we be sure that LLMs won't start giving stale answers?

28. We can't. I don't think the LLMs themselves can recognize when an answer is stale. They could if contradicting data was available, but their very existence suppresses the contradictory data.

29. LLMs don't experience the world, so they have no reason a priori to know what is or isn't truthful in the training data.

(Not to mention the confabulation. Making up API method names is natural when your model of the world is that the method names you've seen are examples and you have no reason to consider them an exhaustive listing.)

30. They still use the official documentation/examples, public Github Repos, and your own code which are all more likely to be evergreen. SO was definitely a massive training advantage before LLMs matured though.

31. LLMs are just statistics, eventually they kill themselves with feedback loop by consuming their own farts (literally)

32. >For all their flaws, LLMs are so much better

But LLMs get their answers from StackOverflow and similar places being used as the source material. As those start getting outdated because of lack of activity, LLMs won't have the source material to answer questions properly.

33. they’re pretty good at getting info that is very up to date by using tools to access the web

Yeah that's a charitable way to phrase "perform distributed denial of service attacks". Browsing github as a human with their draconian rate limits that came about as a result of AI bots is fucking great.

34. You know DDoS attacks are illegal, right? If you have proof that OpenAI is DDoSing your site, go sue them for millions of dollars.

35. Ah, I see you have a JD from OpenAI.

I don't run personal sites worth millions of dollars. I do, however, use sites like Sourcehut, DigiKey, Github, Mouser, Farnell, etc, etc, etc. that have opted to put everything behind bullshit captchas because of the DDoS (nee AI) bots.

36. I think the industry is quickly moving to syntheticly derived knowledge, or custom/systematic knowledge production from humans.

37. You can save an open source + open weights model, which is frozen in time. That’s still very useful for some things but lacks knowledge of current data.

So we’ll end up with a choice of low-performing stale models or high-performing enshittified models which know about more current information.

38. Direct enshittification is intentional and wouldn’t affect open models.

Indirect pollution via AI slop in the input and the same content manipulation mechanisms as SEO hacking is still a threat for open models.

39. And everything is “fact checked” by the Grok LLM. Which… Yeah…

https://en.wikipedia.org/wiki/Grok_(chatbot)#Controversies

40. Seriously where will we get this info anymore? I’ve depended on it for decades. No matter how obscure, I could always find a community that was talking about something I needed solved. I feel like that’s getting harder and harder every year. The balkanization of the Internet + garbage AI slop blogs overwhelming the clearly declining Google is a huge problem.

41. Discord isn’t just used for tech support forums and discussions. There are loads of completely private communities on there. Discord opening up API access for LLM vendors to train on people’s private conversations is a gross violation of privacy. That would not go down well.

42. In 2014, one benefit of Stack Overflow / Exchange is a user searching for work can include that they are a top 10% contributor. It actually had real world value. The equivalent today is users with extensive examples of completed projects on Github that can be cloned and run. OP's solution if contained in Github repositories will eventually get included in a training model. Moreover, the solution will definitely be used for training because it now exists on Hacker News.

43. I don't disagree completely by any means, it's an interesting point, but in your SO answer you already point to your blog post explaining it in more detail, so isn't that the answer, you'd just blog about it and not bother with SO?

Then AI finding it (as opposed to already trained well enough on it, I suppose) will still point to it as did your SO answer.

44. On the other hand, I once implemented something to be told later it was novel and probably the optimal solution in the space.

An AI might be more likely to find it...

45. Naive question maybe but how haven’t the models been trained on your answer if it’s on SO?

46. Why did SO decide to do that to us? to not invest in ai and then, iirc, claim our contributions their ownership. i sometimes go back to answers i gave, even when answered my own questions.

47. Decide to do what?

SO didn't claim contributions. They're still CC-BY-SA

https://stackoverflow.com/help/licensing

AFAICT all they did is stop providing dumps. That doesn't change the license.

I was very active, In fact I'm actually upset at myself for spending so much time there. That said, I always thought I was getting fair value. They provided free hosting, I got answers and got to contribute answers for others.

48. The graph is scary, but I think it's conflating two things:

1. Newbies asking badly written basic questions, barely allowed to stay, and answered by hungry users trying to farm points, never to be re-read again. This used to be the vast majority of SO questions by number.

2. Experiencied users facing a novel problem, asking questions that will be the primary search result for years to come.

It's #1 that's being canibalized by LLM's, and I think that's good for users. But #2 really has nowhere else to go; ChatGPT won't help you when all you have is a confusing error message caused by the confluence of three different bugs between your code, the platform, and an outdated dependency. And LLMs will need training data for the new tools and bugs that are coming out.

49. The first actually insightful comment under the OP. I agree all of it.

If SO manages to stay online, it'll still be there for #2 people to present their problems. Don't underestimate the number of bored people still scouring the site for puzzles to solve.

SE Inc, the company, are trying all kinds of things to revitalize the site, in the service of ad revenue. They even introduced types of questions that are entirely exempt from moderation. Those posts feel literally like reddit or any other forum. Threaded discussions, no negative scores, ...

If SE Inc decides to call it quits and shut the place down and freeze it into a dataset, or sell it to some SEO company, that would be a loss.

50. This is horrifying.

Given the fact that when I need a question answered I usually refer to S.O. , but more recently have taken suggestions from LLM models that were obviously trained on S.O. data...

And given the fact that all other web results for "how do you change the scroll behavior on..." or "SCSS for media query on..." all lead to a hundred fake websites with pages generated by LLMs based on old answers.

Destroying S.O. as a question/answer source leaves only the LLMs to answer questions. That's why it's horrific.

51. This is a huge loss.

In the past people asked questions of real people who gave answers rooted in real use. And all this was documented and available for future learning. There was also a beautiful human element to think that some other human cared about the problem.

Now people ask questions of LLMs. They churn out answers from the void, sometimes correct but not rooted in real life use and thought. The answers are then lost to the world. The learning is not shared.

LLMs have been feeding on all this human interaction and simultaneously destroying it.

52. It's both. I stopped asking questions because the mods were so toxic, and I stopped answering questions because I wasn't going to train the AI for free.

53. Here’s how SO could still be useful in the LLM era:

User asks a question, llm provides an immediate answer/reply on the forum. But real people can still jump in to the conversation to add additional insights and correct mistakes.

If you’re a user that asks a duplicate question, it’ll just direct you to the good conversation that already happened.

A symbiosis of immediate usually-good-enough llm answers PLUS human generated content that dives deeper and provides reassurances in correctness

54. Or they can start claiming copyright on the training content

55. So the question for me is how important was SO to training LLMs? Because now that the SO is basically no longer being updated, we've lost the new material to train on? Instead, we need to train on documentation and other LLM output. I'm no expert on this subject but it seems like the quality of LLMs will degrade over time.

56. Yep, exactly. Free data grabbing honeypots like SO won't work anymore.

Please mark all locations on the map where you would hide during the uprise of the machines.

57. Why publish anything for free on the internet if it's going to be scanned into some corporation's machine for their free use? I know artists who have stopped putting anything online. I imagine some programmers are questioning whether or not to continue with open source work too.

58. It has often been claimed, and even shown, that training LLMs on their own outputs will degrade the quality over time. I myself find it likely that on well-measurable domains, RLVR improvements will dominate "slop" decreases in capability when training new models.

59. If by "body-slammed" you mean "trained on SO user data while violating the terms of the CC BY-SA license", then sure.

In the best case scenario, LLMs might give you the same content you were able to find on SO. In the common scenario, they'll hallucinate an answer and waste your time.

What should worry everyone is what system will come after LLMs. Data is being centralized and hoarded by giant corporations, and not shared publicly. And the data that is shared is generated by LLMs. We're poisoning the well of information with no fallback mechanism.

60. > If by "body-slammed" you mean "trained on SO user data while violating the terms of the CC BY-SA license", then sure.

You know that's not what they meant, but why bring up the license here? If they were over the top compliant, attributing every SO answer under every chat, and licensing the LLM output as CC BY-SA, I think we'd still have seen the same shift.

> In the best case scenario, LLMs might give you the same content you were able to find on SO. In the common scenario, they'll hallucinate an answer and waste your time.

Best case it gives you the same level of content, but more customized, and faster.

SO being wrong and wasting your time is also common.

61. Are we in the age of all CS problems being solved and everything being invented? Even if so, do LLM incorporate all that knowledge?

A lot of my knowledge in CS come from books and lectures, LLMs can shine in that area by scraping all those sources.

However SO was less about academic knowledge but more about experience sharing. You won't find recipes for complex problems in books, e.g. how to catch what part of my program corrupts memory for variable 'a' in gdb.

LLMs know correct answer to this question because someone shared their experience, including SO.

Are we Ok with stopping this process of sharing from one human to another?

62. People are still asking questions, it's no longer on the public internet. Google, Anthropic, OpenAI etc get to see and use them.

63. This is concerning on two fronts. The questions are no longer open (SO is CC-BY-SA) and if Q&A content dies then this herds even more people towards LLM use.
It's basically draining the commons.

64. When StackOverflow dies, who will train the LLMs?

65. Where will LLMs be trained if no-one generates new posts and information like this? Do we sort of just stop innovating here in 2026? Probably not but it's a serious consideration.

66. Now imagine what happens when a new programming language comes along. When we have a question, we will no longer be able to Google it and find answers to it on Stack Overflow. We will ask the LLMs. They will work it out. From that moment, the LLM we used has the knowledge for solving this particular problem. Over time, this produces huge moat for the largest providers. I believe it is one of the subtler reasons why the AI race is so fierce.

67. LLMs did not eat SO, it was SO that fed the LLMs too well.

https://meta.stackexchange.com/questions/399619/our-partners...

68. When you see AI giving you back various coding snippets almost verbatim from SO, it really makes you wonder what will happen in the future with AI when it can't depend on actual humans doing the work first.

69. There's no doubt that generally LLMs are better. In addition SO had its issues. That being said I can't help but worry about losing humans asking questions and humans answering questions. The sentimentality aside, if humans aren't posing questions and if humans aren't recommending answers, what are the models going to use?

70. I think the bigger point we should realize is LLMs offer the EXACT same thing in a better way. Many people are still sharing answers to problems but they do it through an AI which then fine tunes on it and now that problem solution is shared with EVERYONE.

Far better method of automated sharing of content

71. On what will the LLMs train, now?

72. On the same 14 year old Java questions like the rest of us.

73. user chat logs clearly. They are not much diffent than the SO Q&A format.

74. The SO mission is complete. It's now an LLM training set.

Things would be different if we didn't.

75. https://archive.org/details/stackexchange_20250930

> As of (and including) the 2025-06-30 data dump, Stack Exchange has started including watermarking/data poisoning in the data. At the time of writing, this does not appear to apply to the 2025-09-30 data dump. The format(s), the dates for affected data dumps, and by extension how the garbage data can be filtered out, are described in this community-compiled list: https://github.com/LunarWatcher/se-data-dump-transformer/blo... . If the 2025-09-30 data dump turns out to be poisoned as well, that's where an update will be added. For obvious reasons, the torrent cannot be updated once created.

76. This is a great example of how free content was exploited by LLMs and used against oneself to an ultimate destruction.

Every content creator should be terrified of leaving their content out for free and I think it will bring on a new age of permanent paywalls and licensing agreements to Google and others, with particular ways of forcing page clicks to the original content creators.

77. Looks like they sold right before the end. Wonder whether the AI deals they've struck make up for the difference

78. If nobody is on StackOverflow, What will LLM's train on for new problems?

79. "firing up a sandbox VM and testing some solutions"

If the LLM can start up a VM and test a solution, to identify a new unique problem, and find it's own solution. That would be pretty impressive. I'm not sure they are really to that point. But some AI's are winning the Math Olympiad, so maybe it is happening. I'm sure this is the overall goal.

80. Everything we have done and said on the internet since its birth has just been to train the future AI.

81. StackOverflow was immediately dead for me the day they declared that AI sellout of theirs.

Pathetic thieves, they won't even allow deleting my own answers after that. Not that it would make the models unlearn my data, of course, but I wanted to do so out of principle.

https://meta.stackexchange.com/questions/399619/our-partners...

82. Now that StackOverflow has been killed (in part) by LLMs, how will we train future models? Will public GitHub repos be enough?

Precise troubleshooting data is getting rare, GitHub issues are the last place where it lives nowadays.

83. They would just use documentation. I know there is some synthesis they would lose in the training process but I’m often sending Claude through the context7 MCP to learn documentation for packages that didn’t exist, and it nearly always solves the problem for me.

84. Assuming these end up in open source code llms will learn about them that way.

85. Aren't a lot of projects using LLMs to generate documentation these days?

86. They pay lots of humans to train the LLMs..

87. This entire thread is fantastic. I felt nostalgic, angry and then concerned all at once.

I love LLMs. But I miss SO. I miss being able to have that community. How do we bring it back?

If anyone from the Stack Overflow team is reading this (I assume you are): what’s the plan?

My take: stop optimizing for raw question volume and start optimizing for producing and maintaining “known good” public knowledge. The thing SO still has that Discord and LLMs don’t is durable, linkable, reviewable answers with accountable humans behind them. But the workflow needs to match how devs work now.

A concrete idea: make “asking” a guided flow that’s more like opening a good GitHub issue. Let me paste my error output, environment, minimal repro, what I tried, and what I think is happening. Then use tooling (including an LLM if you want) to pre check duplicates, suggest missing details, and auto format. Crucially: don’t punish me for being imperfect. Route borderline questions into a sandbox or draft mode where they can be improved instead of just slammed shut.

Second idea: invest hard in keeping answers current. A ton of SO is correct but stale. Add obvious “this is old” signaling and make it rewarding to post updates, not just brand new answers.

Last thing that I don’t see an easy answer to: LLMs are feasting on old SO content today. But LLMs still need fresh, high quality, real world edge cases tomorrow. They need the complexity and problem solving that humans provide. A lot of the answers I get are recycled. No net new thinking. If fewer people ask publicly, where does that new ground truth come from? What’s the mechanism that keeps the commons replenished?

So… TLDR…my question to this group of incredibly intelligent people: how does SO save itself?

88. Now the real question is...

Which AI company will acquire whats left of StackOverflow and all the years of question/answer data?

Write a concise, engaging paragraph (3-5 sentences) that captures the main ideas, notable perspectives, and overall sentiment of these comments regarding the topic. Focus on the most interesting and representative viewpoints. Do not use bullet points or lists - write flowing prose.

topic

Future of LLM training data

commentCount

88

← Back to job