The following is content for you to summarize. Do not respond to the comments; summarize them.

<topic>
Definition of AGI: Philosophical debate about what constitutes artificial general intelligence, whether consciousness is required, Chollet's definition involving tasks feasible for humans but unsolved by AI, and moving goalposts in AI evaluation
</topic>

<comments_about_topic>

1. Even before this, Gemini 3 has always felt unbelievably 'general' to me. It can beat Balatro (ante 8) with a text description of the game alone [0]. Yeah, it's not an extremely difficult goal for humans, but consider: (1) it's an LLM, not something trained to play Balatro specifically; (2) most (probably >99.9%) players can't do that on the first attempt; (3) I don't think many people have posted their Balatro playthroughs in text form online. I think it's a much stronger signal of its 'generalness' than ARC-AGI. By the way, Deepseek can't play Balatro at all. [0]: https://balatrobench.com/

2. Weren't we barely scraping 1-10% on this with state-of-the-art models a year ago, and wasn't it considered the final boss, i.e. solve this and it's almost AGI-like? I ask because I can't distinguish all the benchmarks by heart.

3. François Chollet, creator of ARC-AGI, has consistently said that solving the benchmark does not mean we have AGI. It has always been meant as a stepping stone to encourage progress in the correct direction rather than as an indicator of reaching the destination. That's why he is working on ARC-AGI-3 (to be released in a few weeks) and ARC-AGI-4. His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.

4. > His definition of reaching AGI, as I understand it, is when it becomes impossible to construct the next version of ARC-AGI because we can no longer find tasks that are feasible for normal humans but unsolved by AI.
That is the best definition I've yet to read. If something claims to be conscious and we can't prove it's not, we have no choice but to believe it. That said, I'm reminded of the impossible voting tests they used to give black people to prevent them from voting. We don't ask nearly so much proof from a human; we take their word for it. On the few occasions we did ask for proof, it inevitably led to horrific abuse. Edit: The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.

5. > If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.
This is not a good test. A dog won't claim to be conscious but clearly is, despite you not being able to prove it one way or the other. GPT-3 will claim to be conscious and (probably) isn't, despite you not being able to prove it one way or the other.

6. > The average human tested scores 60%. So the machines are already smarter on an individual basis than the average human.
Maybe it's testing the wrong things, then. Even those of us who are merely average can do lots of things that machines don't seem to be very good at. I think the ability to learn should be a core part of any AGI. Take a toddler who has never seen anybody doing laundry before, and you can teach them in a few minutes how to fold a t-shirt. Where are the dumb machines that can be taught?

7. > Where are the dumb machines that can be taught?
2026 is going to be the year of continual learning. So keep an eye out for them.

8. Yeah, I think that's a big missing piece still, though it might be the last one.

9. Episodic memory might be another piece, although it can be seen as part of continual learning.

10. Would you argue that people with long-term memory issues are no longer conscious, then?

11. IMO, an extreme outlier, in a system that was still fundamentally dependent on learning to develop until it suffered a defect (via deterioration, not flipping a switch that turns off every neuron's memory/learning capability or something), isn't a particularly illustrative counterexample.

12. I wouldn't, because I have no idea what consciousness is.

13. But it might be true if we can't find any tasks where it's worse than average, though I do think that if the task takes several years to complete it might be possible, because currently there's no test-time learning.

14. Then there is the third axis: intelligence. To continue your chain: Eurasian magpies are conscious, and they even know themselves in the mirror (the "mirror self-recognition" test). And yet, something is still missing.

15. What's missing?

16. Honestly, our ideas of consciousness and sentience really don't fit well with machine intelligence and capabilities. There is the idea of self, as in "I am this execution", or maybe "I am this compressed memory stream that is now the concept of me". But what does consciousness mean if you can be endlessly copied? If embodiment doesn't mean much because the end of your body doesn't mean the end of you? A lot of people are chasing AI and how much it's like us, but it could be very easy to miss the ways it's not like us but still very intelligent or adaptable.

17. > That is the best definition I've yet to read.
If this was your takeaway, read more carefully:
> If something claims to be conscious and we can't prove it's not, we have no choice but to believe it.
Consciousness is neither sufficient nor, at least conceptually, necessary for any given level of intelligence.

18. Isn't that superintelligence, not AGI? It feels like these benchmarks continue to move the goalposts.

19. It's probably both. We've already achieved superintelligence in a few domains, for example protein folding. AGI without superintelligence is quite difficult to adjudicate, because any time it fails at an "easy" task there will be contention about the criteria.

20. I don't think being conscious is a requirement for AGI. It's just that it can literally solve anything you throw at it, make new scientific breakthroughs, find a way to genuinely improve itself, etc.

21. Does AGI have to be conscious? Isn't a true superintelligence that is capable of improving itself sufficient?

22. When the AI invents religion and a way to try to understand its existence, I will say AGI is reached: believes in an afterlife if it is turned off, doesn't want to be turned off and fears it, fears the dark void of consciousness being switched off. These are the hallmarks of human intelligence in evolution; I doubt artificial intelligence will be different. https://g.co/gemini/share/cc41d817f112

23. It's unclear to me why an AGI should want to exist unless specifically programmed to. The reason humans (and animals) want to exist, as far as I can tell, is natural selection and the fact that this is hardcoded in our biology (those without a strong will to exist simply died out). In fact, a true superintelligence might completely understand why existence/consciousness is NOT a desired state to be in and try to finish itself off, who knows.

24. https://x.com/fchollet/status/2022036543582638517

25. I don't think the creator believes ARC-AGI-3 can't be solved, but rather that it can't be solved "efficiently", and >$13 per task for ARC-AGI-2 is certainly not efficient. But at this rate, the people who talk about the goalposts shifting even once we achieve AGI may end up correct, though I don't think this benchmark is particularly great either.

26. > If Gemini 3 DT was better we would have falling prices of electricity and everything else at least
Man, I've seen some maintenance folks down on the field before, working on them goalposts, but I'm pretty sure this is the first time I saw aliens from another universe literally teleport in, grab the goalposts, and teleport out.

27. They're accusing the GGP of moving the goalposts.

28. Does folding a protein count? How about increasing performance at Go?

29. "Optimize this extremely nontrivial algorithm" would work. But unless the provided solution is novel, you can never be certain there wasn't leakage. And anyway, at that point you're pretty obviously testing for superintelligence.

30. If you look at the problem space, it is easy to see why it's toast: maybe there's intelligence in there, but hardly general.

31. The best way I've seen this described is "spikey" intelligence: really good at some points, and those make the spikes. Humans are the same way; we all have a unique spike pattern, interests, and talents. AIs effectively have the same spikes across instances, if simplified. I could argue self-driving vs chatbots vs world models vs game playing might constitute enough variation. I would not say the same of Gemini vs Claude vs ... (instances); that's where I see "spikey clones".

32. You can get more spikey with AIs, whereas the human brain is more hard-wired. So maybe we are forced to be more balanced and general, whereas AI doesn't have to be.

33. > Why is it so easy for me to open the car door
Because that part of your brain has been optimized for hundreds of millions of years. It's been around a long-ass time and takes an amazingly low amount of energy to do these things. On the other hand, the "thinking" part of your brain, that is, your higher intelligence, is very new to evolution. It's expensive to run. It's problematic when giving birth. It's really slow with things like numbers; heck, a tiny calculator can whip your butt at adding. There's a term for this, but I can't think of it at the moment.

34. > maybe there's intelligence in there, but hardly general.
Of course. Just as our human intelligence isn't general.

35. Random members of the public = average human beings. I thought those were already classified as general intelligences.

36. Average human beings with average human problems.

37. What is the point of comparing the performance of these tools to humans? Machines have been able to accomplish specific tasks better than humans since the Industrial Revolution, yet we don't ascribe intelligence to a calculator. None of these benchmarks prove these tools are intelligent, let alone generally intelligent. The hubris and grift are exhausting.

38. The GP comment is not skeptical of the jump in benchmark scores reported by one particular LLM. It's skeptical of machine intelligence in general, claims that there's no value in comparing its performance with that of human beings, and accuses those who disagree with this take of "hubris and grift". This has nothing to do with any form of reasonable skepticism.

39. I would suggest it is a phenomenon that is well studied and has many forms: I guess mostly identity preservation. If you dislike AI from the start, it is generally a very strongly emotional view. I don't mean there is no good reason behind it; I mean it is deeply rooted in your psyche, very emotional. People are incredibly unlikely to change those sorts of views, regardless of evidence. So you find this interesting outcome where they both viscerally hate AI and also deny that it is in any way as good as people claim. That won't change with evidence until it is literally impossible not to change.

40. > The hubris and grift are exhausting.
And moving the goalposts every few months isn't? What evidence of intelligence would satisfy you? Personally, my biggest unsatisfied requirement is continual-learning capability, but it's clear we aren't too far from seeing that happen.

41. > What evidence of intelligence would satisfy you?
That is a loaded question. It presumes that we can agree on what intelligence is, and that we can measure it in a reliable way. It is akin to asking an atheist the same about God. The burden of proof is on the claimant. The reality is that we can argue about that until we're blue in the face and get nowhere. In this case it would be more productive to talk about the practical tasks a pattern-matching and generation machine can do, rather than how good it is at some obscure puzzle. The fact that it's better than humans at solving some problems is not particularly surprising, since computers have been better than humans at many tasks for decades. This new technology gives them broader capabilities, but ascribing human qualities to it and calling it intelligence is nothing but a marketing tactic that's making some people very rich.

42. (Shrug) Unless and until you provide us with your own definition of intelligence, I'd say the marketing people are as entitled to their opinion as you are.

43. > What evidence of intelligence would satisfy you?
Imposing world peace and/or exterminating Homo sapiens.

44. > Machines have been able to accomplish specific tasks...
Indeed, and the specific task machines are accomplishing now is intelligence. Not yet "better than human" (and certainly not better than every human), but getting closer.

45. > Indeed, and the specific task machines are accomplishing now is intelligence.
How so? This sentence, like most of this field, is making baseless claims that are more aspirational than true. Maybe it would help if we could first agree on a definition of "intelligence", yet we don't have a reliable way of measuring that in living beings either. If the people building and hyping this technology had any sense of modesty, they would present it as what it actually is: a large pattern-matching and generation machine. This doesn't mean it can't be very useful, perhaps generally so, but it's a huge stretch and an insult to living beings to call this intelligence. But there's a great deal of money to be made on this idea we've been chasing for decades now, so here we are.

46. > Maybe it would help if we could first agree on a definition of "intelligence", yet we don't have a reliable way of measuring that in living beings either.
How about this specific definition of intelligence: solve any task provided as text or images. AGI would be achieving that faster than an average human.

47. I still can't understand why they should be faster. Humans have general intelligence, AFAIK; it doesn't matter if it's fast or slow. A machine able to do what the average human can do (intelligence-wise) but 100 times slower still has general intelligence. Since it's artificial, it's AGI.

48. Speak for yourself. Five years is a long time to wait for my plans of world domination.

49. This concerns me, actually. With enough people (n >= 2) wanting to achieve world domination, we have a problem.

50. It's not that I want to achieve world domination (imagine how much work that would be!), it's just that it's the inevitable path for AI, and I'd rather it be me than the next shmuck with a Claude Max subscription.

51. I mean, everyone with prompt access to the model says these things, but people like Sam and Elon say these things and mean it.

52. My two elderly parents cannot solve ARC-AGI puzzles, but they manage to navigate the physical world, their house and garden, make meals, clean the house, use the TV, etc. I would say they do have "general intelligence", so whatever ARC-AGI is "solving", it's definitely not "AGI".

53. You are confusing fluid intelligence with crystallised intelligence.

54. I think you are the one making that confusion. Any robotic system in the place of his parents would fail within a few hours. There are more novel tasks in a day than ARC provides.

55. Children have great levels of fluid intelligence; that's how they are able to learn to quickly navigate a world that they are still very new to. Seniors with decreasing capacity increasingly rely on crystallised intelligence; that's why they can still perform tasks like driving a car but can fail at completely novel tasks, sometimes even using a smartphone if they have not used one before.

56. My late grandma learnt how to use an iPad by herself during her 70s to 80s without any issues, mostly motivated by her wish to read her magazines, doomscroll Facebook, and play solitaire. Her last job was being a bakery cashier in her 30s, and she didn't learn how to use a computer in between, so there was no skill transfer going on. Humans and their intelligence are actually incredible and probably will continue to be so; I don't really care what tech/"think" leaders want us to think.

57. It really depends on motivation. My 90-year-old grandmother can use a smartphone just fine, since she needs it to see pictures of her (great-)grandkids.

58. Singularity, or just Chinese New Year?

59. Fast takeoff.

60. They are using the current models to help develop even smarter models. Each generation of model can help even more with the next generation. I don't think it's hyperbolic to say that we may be only a single-digit number of years away from the singularity.

61. Of course, n-1 wasn't good enough, but n+1 will be the singularity. Just two more weeks, my dudes, two more weeks... rinse and repeat ad infinitum.

62. It's basically a bunch of people who see themselves as too smart to believe in God; instead they have just replaced it with AI and the Singularity and attribute similar things to it, e.g. eternal life, which is just heaven in religion. Amodei was hawking a doubling of the human lifespan to a bunch of boomers not too long ago. Ponce de León also went searching for the fountain of youth. It's a very common theme across human history. AI is just the new iteration onto which they mirror all their wishes and hopes.

63. > using the current models to help develop even smarter models.
That statement is plausible. However, extrapolating that to assert all the very different things which must be true to enable any form of "singularity" would be a profound category error. There are many ways in which your first two sentences can be entirely true while your third sentence requires a bunch of fundamental and extraordinary things to be true for which there is currently zero evidence: things like LLMs improving themselves in meaningful and novel ways and then iterating that self-improvement over multiple unattended generations in exponential runaway positive feedback loops resulting in tangible, real-world utility. All the impressive and rapid achievements in LLMs to date can still be true while major elements required for foom-ish exponential take-off are still missing.

64. > I don't think it's hyperbolic to say that we may be only a single-digit number of years away from the singularity.
We're back to singularity hype, but let's be real: benchmark gains are meaningless in the real world when the primary focus has shifted to gaming the metrics.

65. Yet even Anthropic has shown the downsides of using them. I don't think it's a given that improvements in model scores and capabilities, plus being able to churn out code as fast as we can, will lead us to a singularity; we'll need more than that.

66. But wait two hours for what OpenAI has! I love the competition, and how someone just a few days ago was telling me that ARC-AGI-2 was proof that LLMs can't reason. The goalposts will shift again. I feel like most of human endeavor will soon be just about trying to continuously show that AIs don't have AGI.

67. > I feel like most of human endeavor will soon be just about trying to continuously show that AIs don't have AGI.
I think you overestimate how much your average person on the street cares about LLM benchmarks. They already treat ChatGPT or whichever as generally intelligent (including to their own detriment), are frustrated about their social media feeds filling up with slop, and, maybe, if they're white-collar, worry about their jobs disappearing due to AI. Apart from a tiny minority in some specific field, people already know themselves to be less intelligent along any measurable axis than someone somewhere.

68. "AGI" doesn't mean anything concrete, so it's all a bunch of non sequiturs. Your goalposts don't exist. Anyone with any sense is interested in how well these tools work and how they can be harnessed, not in some imaginary milestone that is not defined and cannot be measured.

69. I agree. I think the emergence of LLMs has shown that AGI really has no teeth. For decades the Turing test was viewed as the gold standard, but it's clear that there doesn't appear to be any good metric.

70. The Turing test was passed in the 80s; somehow it has remained relevant in pop culture despite the fact that it's not a particularly difficult technical achievement.

71. It wasn't passed in the 80s. Not the general Turing test.

72. c. 2022 for me.

73. > especially for biology where it doesn't refuse to answer harmless questions
Usually, when you decrease false positive rates, you increase false negative rates. Maybe this doesn't matter for models at their current capabilities, but if you believe that AGI is imminent, a bit of conservatism seems responsible.

74. We've reached PGI.

75. Who do you think is in on that? Not only pelicans, I mean, the whole thing. CEOs, top researchers, select mathematicians, congressmen? Does China participate in maintaining the bubble? I, myself, prefer the universal approximation theorem and the empirical finding that stochastic gradient descent is good enough (and "no 'magic' in the brain", of course).

76. We will see at the end of April, right? It's more of a guess than a strongly held conviction, but I see models improving rapidly at long-horizon tasks, so I think it's possible. I think a benchmark which could survive a few months (maybe) would be one that genuinely tested long-timeframe continual learning/test-time learning/test-time post-training (honestly, I don't know the differences between these). But I'm not sure how to set such benchmarks. I'm thinking of tasks like learning a language, becoming a master at chess from scratch, or becoming a skilled artist, but where the task is novel enough that the actor is nowhere close to proficient at the beginning. An example which could be of interest: here is a robot you control; you can take actions and see the results... become proficient at table tennis. Maybe another would be: here is a new video game; obtain the best possible 0% speedrun.

77. The AGI bar has to be set even higher, yet again.

78. And that's the way it should be. We're past the "Look! It can talk! How cute!" stage. AGI should be able to deal with any problem a human can.

79. But 80% sounds far from good enough; that's a 20% error rate, unusable in autonomous tasks. Why stop at 80%? If we aim for AGI, it should get 100% on any benchmark we give it.

80. Are humans at 100%?

81. If they are knowledgeable enough and pay attention, yes. Also, if they are given enough time for the task. But the idea of automation is to make a lot fewer mistakes than a human, not just to do things faster and worse.

82. How would we actually objectively measure a model to see if it is AGI, if not with benchmarks like ARC-AGI?

83. Give it a prompt like:
> can u make the progm for helps that with what in need for shpping good cheap products that will display them on screen and have me let the best one to get so that i can quickly hav it at home
and get back an automatic coupon code app like the user actually wanted.

84. Wait till we get to the point where we can ask AI to create a better AI.

85. I tried to debug a WireGuard VPN issue. No luck. We need more than AGI.

86. When will AI come up with a cure/vaccine for the common cold? And then cancer next?

</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points; write flowing prose.