llm/2ad2a7bb-5462-4391-a2da-bf11064993c9/topic-9-39b8bace-e7a9-48b6-86ee-5e22eb7921e4-input.json
The following is content for you to summarize. Do not respond to the comments—summarize them. <topic> Google's Competitive Position # Debate over whether Google is leading or behind in AI, discussion of their data advantages from YouTube and Books, claims they let competitors think they were behind, and analysis of their strengths in visual AI </topic> <comments_about_topic> 1. Arc-AGI-2: 84.6% (vs 68.8% for Opus 4.6). Wow. https://blog.google/innovation-and-ai/models-and-research/ge... 2. Agreed. Gemini 3 Pro for me has always felt like it has a pretraining alpha, if you will, and many data points continue to support that. Even Flash, which was post-trained with different techniques than Pro, is as good or equivalent at tasks which require post-training, occasionally even beating Pro (e.g. in the APEX bench from Mercor, which is basically a tool-calling test, simplifying somewhat, Flash beats Pro). The score on ARC-AGI-2 is another data point in the same direction. Deep Think is sort of parallel test-time compute with some level of distilling and refinement from certain trajectories (guessing based on my usage and understanding), same as gpt-5.2-pro, and can extract more because of pretraining datasets. (I am sort of basing this on papers like the limits of RLVR, and pass@k vs pass@1 differences in RL post-training of models; this score just shows how "skilled" the base model was, or how strong the priors were. I apologize if this is not super clear; happy to expand on what I am thinking.) 3. It's trained on YouTube data. It's going to get Roffle and DrSpectred at the very least. 4. I don't think it'd need Balatro playthroughs to be in text form though. Google owns YouTube and has been doing automatic transcription of vocalized content on most videos these days, so it'd make sense that they used those subtitles, at the very least, as training data. 5. Google has a library of millions of scanned books from their Google Books project that started in 2004.
I think we have reason to believe that there are more than a few books about effectively playing different traditional card games in there, and that an LLM trained with that dataset could generalize to understand how to play Balatro from a text description. Nonetheless I still think it's impressive that we have LLMs that can just do this now. 6. > I don't think there are many people who posted their Balatro playthroughs in text form online There is *tons* of Balatro content on YouTube though, and there is absolutely no doubt that Google is using YouTube content to train their model. 7. Are there any groups or labs in particular that stand out? 8. The statement originates from a DeepMind researcher, but I guess all major AI companies are working on that. 9. The best way I've seen this described is "spikey" intelligence: really good at some points, and those make the spikes. Humans are the same way; we all have a unique spike pattern, interests and talents. AI are effectively the same spikes across instances, if simplified. I could argue self-driving vs chatbots vs world models vs game playing might constitute enough variation. I would not say the same of Gemini vs Claude vs ... (instances); that's where I see "spikey clones". 10. Well, a fair comparison would be with GPT-5.x Pro, which is the same class of model as Gemini Deep Think. 11. I read somewhere that Google will ultimately always produce the best LLMs, since "good AI" relies on massive amounts of data and Google owns the most data. Is that a based assumption? 12. > Also, I don't understand the comments about Google being behind in agentic workflows. It has to do with how the model is RL'd. It's not that Gemini can't be used with various agentic harnesses, like OpenCode or OpenClaw or theoretically even Claude Code. It's just that the model is trained less effectively to work with those harnesses, so it produces worse results. 13. There are hints this is a preview of Gemini 3.1. 14.
They are spending literal trillions. It may even accelerate. 15. Google is absolutely running away with it. The greatest trick they ever pulled was letting people think they were behind. 16. Their models are absolutely not impressive. Not a single person is using it for coding (outside of Google itself). Maybe some people on a very generous free plan. Their model is a fine mid-2025 model, backed by enormous compute resources and an army of GDM engineers to help the “researchers” keep the model on task as it traverses the “tree of thoughts”. But that isn’t “the model”; that’s an old model backed by massive money. 17. Uhh, just false. 18. I don't have any of these issues with Gemini. I use it heavily every day. A few glitches here and there, but it's been enormously productive for me. Far more so than ChatGPT, which I find mostly useless. 19. Peacetime Google is not like wartime Google. Peacetime Google is slow, bumbling, bureaucratic. Wartime Google gets shit done. 20. OpenAI is the best thing that happened to Google, apparently. 21. Competition always is. I think there was a real fear that their core product was going to be replaced. They're already cannibalizing it internally, so it was THE wake-up call. 22. Next they compete on ads... 23. Wartime Google gave us Google+. Wartime Google is still bumbling, and despite OpenAI's numerous missteps, I don't think it has to worry about Google hurting its business yet. 24. I'd personally bet on Google and Meta in the long run since they have access to the most interesting datasets from their other operations. 25. It was obvious to me that they were a top contender 2 years ago ... https://www.reddit.com/r/LocalLLaMA/comments/1c0je6h/google_... 26. Role-playing, but Claude is as bad: same censored garbage, with the CEO wanting to be your dad. Grok is best for everything else by far. 27. I'm leery to use a Google product in light of their history of discontinuing services.
It'd have to be significantly better than a similar product from a committed competitor. 28. Trick? Lol, not a chance. Alphabet is a pure-play tech firm that has to produce products to make the tech accessible. They really lack in the latter, and this is visible when you see the interactions of their VPs. Luckily for them, if you start to create enough of a lead with the tech, you get many chances to sort out the product stuff. 29. Google is still behind the largest models I'd say, in real-world utility. Gemini 3 Pro still has many issues. 30. Gemini's UX (and of course privacy cred, as with anything Google) is the worst of all the AI apps. In the eyes of the Common Man, it's UI that will win out, and ChatGPT's is still the best. 31. > Gemini's UX ... is the worst of all the AI apps Been using Gemini + OpenCode for the past couple weeks. Suddenly, I get a "you need a Gemini Access Code license" error, but when you go to the project page there is no mention of this or how to get the license. You really feel the "We're the phone company and we don't care. Why? Because we don't have to." [0] when you use these Google products. PS for those that don't get the reference: US phone companies in the 1970s had a monopoly on local and long-distance phone service. Similar to Google for search/ads (really a "near" monopoly, but close enough). 0 - https://vimeo.com/355556831 32. Fair enough. I'm always astonished at how different experiences are, because mine is the complete opposite. I almost solely use it for help with Go and JavaScript programming and found Gemini Pro to be more useful than any other model. ChatGPT was the worst offender so far, completely useless, but Claude has also been suboptimal for my use cases. I guess it depends a lot on what you use LLMs for and how they are prompted. For example, Gemini fails the simple "count from 1 to 200 in words" test, whereas Claude does it without further questions.
Another possible explanation would be that processing time is distributed unevenly across the globe and companies stay silent about this. Maybe depending on time zones? 33. They were behind. Way behind. But they caught up. 34. > best-of-N models like Deep Think and GPT Pro Yeah, these are made possible largely by better use at high context lengths. You also need a step that gathers all the Ns and selects the best ideas / parts and compiles the final output. Goog have been SotA at useful long context for a while now (since 2.5 I'd say). Many others have come with "1M context", but their usefulness after 100k-200k is iffy. What's even more interesting than maj@n or best-of-n is pass@n. For a lot of applications you can frame the question and search space such that pass@n is your success rate. Think security exploit finding. Or optimisation problems with quick checks (better algos, kernels, infra routing, etc). It doesn't matter how good your pass@1 or avg@n is; all you care about is that you find more as you spend more time. Literally throwing money at the problem. 35. I just tested it on a very difficult Raven's matrix that the old version of Deep Think, as well as GPT 5.2 Pro, Claude Opus 4.6, and pretty much every other model failed at. This version of Deep Think got it first try. Thinking time was 2 or 3 minutes. The visual reasoning of this class of Gemini models is incredibly impressive. 36. According to benchmarks in the announcement, healthily ahead of Claude 4.6. I guess they didn't test ChatGPT 5.3 though. Google has definitely been pulling ahead in AI over the last few months. I've been using Gemini and finding it's better than the other models (especially for biology, where it doesn't refuse to answer harmless questions). 37. Google is way ahead in visual AI and world modelling. They're lagging hard in agentic AI and autonomous behavior. 38. It's ahead in raw power but not in function. Like it's got the world's fastest engine but one gear!
Trouble is, some benchmarks only measure horsepower. 39. > Trouble is, some benchmarks only measure horsepower. IMO it's the other way around. Benchmarks only measure applied horsepower on a set plane, with no friction, and your elephant is a point sphere. Goog's models have always punched above what benchmarks said, in real-world use at high context. They don't focus on "agentic this" or "specialised that", but the raw models, with good guidance, are workhorses. I don't know any other models where you can throw lots of docs at it and get proper context following and data extraction from wherever it's at to where you'd need it. 40. It's a giant game of leapfrog; shift or stretch time out a bit and they all look equivalent. 41. I strongly suspect there's a major component of this type of experience being that people develop a way of talking to a particular LLM that's very efficient and works well for them with it, but is in many respects non-transferable to rival models. For instance, in my experience, OpenAI models are remarkably worse than Google models in basically any criterion I could imagine; however, I've spent most of my time using the Google ones, and it's only during this time that the differences became apparent and, over time, much more pronounced. I would not be surprised at all to learn that people who chose to primarily use Anthropic or OpenAI models during that time had an exactly analogous experience that convinced them their model was the best. 42. I feel like a Luddite: unless I am running small local models, I use gemini-3-flash for almost everything: great for tool use, embedded use in applications, and Python agentic libraries, broad knowledge, a good built-in web search tool, etc. Oh, and it is fast and cheap. I really only use gemini-3-pro occasionally when researching and trying to better understand something. I guess I am not a good customer for hyperscalers.
That said, when I get home from travel, I will make a point of using Gemini 3 Deep Think for some practical research. I need a business card with the title "Old Luddite." 43. I've heard it posited that the reason the frontier companies are frontier is because they have custom data and evals. This is what I would do too. 44. I can't shake off the feeling that Google's Deep Think models are not really different models, but just the old ones being run with a higher number of parallel subagents, something you can do by yourself with their base model and OpenCode. 45. I don't get it: why is Claude still number 1 while the numbers say different? Let's see that new Gemini in the terminal also. 46. You can, but only via the Gemini Ultra plan, which you can buy, or the Gemini API with early access. 47. I know, and neither of these options is feasible for me. I can't get the early access and I am not willing to drop $250 in order to just try their new model. By the time I can use it, the other two companies have something similar and I lose my interest in Google's models. 48. The most gullible workforce ever (FOSS), but seeing YouTube, half the planet is braindead for handing over their craft on a platter for mere dollars. 49. Do we know what model is used by Google Search to generate the AI summary? I've noticed this week the AI summary now has a loader "Thinking…" (no idea if it was already there a few weeks ago). And after "Thinking…" it says "Searching…" and shows a list of favicons of popular websites (I guess it's generating the list of links on the right side of the AI summary?). 50. Off-topic comment (sorry): when people bash "models that are not their favorite model" I often wonder if they have done the engineering work to properly use the other models. Different models and architectures often require very different engineering to properly use them. Also, I think it is fine and proper that different developers prefer different models. We are in early days and variety is great. 51.
I do like Google models (and I pay for them), but the lack of a competitive agent is a major flaw in Google's offering. It is simply not good enough in comparison to Claude Code. I wish they'd put some effort there (as I don't want to pay two subscriptions, to both Google and Anthropic). 52. I'm really interested in the 3D STL-from-photo process they demo in the video. Not interested enough to pay $250 to try it out though. 53. So what happens if the AI companies can't make money? I see more and more advances and breakthroughs, but they are taking on debt and no revenue in sight. I seem to understand debt is very bad here since they could just sell more shares, but aren't (either valuation is stretched or no buyers). Just a recession? Something else? Aren't they too big to fail? Edit0: Revenue isn't the right word, profit is more correct. Amazon not being profitable fucks with my understanding of business. Not an economist. 54. > taking on debt and no revenue in sight. Which companies don't have revenue? Anthropic is at a run rate of 14 billion (up from 9B in December, which was up from 4B in July). Did you mean profit? They expect to be cash flow positive in 2028. 55. Yes, thank you, mixing my brushes here - I remembered one of the companies having raised over 100B and having about 10B in revenue. 56. AI will kill SaaS moats and thus revenue. Anyone can build new SaaS quickly. Lots of competition will lead to marginal profits. AI will kill advertising. Whatever sits at the top "pane of glass" will be able to filter ads out. Personal agents and bots will filter ads out. AI will kill social media. The internet will fill with spam. AI models will become a commodity. Unless singularity, no frontier model will stay in the lead. There's competition from all angles. They're easy to build, just capital intensive (though this is only because of speed). All this leaves is infrastructure. 57. > AI will kill SaaS moats and thus revenue. Anyone can build new SaaS quickly.
I'm LLM-positive, but for me this is a stretch. Seeing it pop up all over media in the past couple weeks also makes me suspect astroturfing. Like a few years back when there were a zillion articles saying voice search was the future and nobody used regular web search any more. 58. They're using the ride-share app playbook. Subsidize the product to reach market saturation. Once you've found a market segment that depends on your product, you raise the price to break even. One major difference, though, is that ride-shares haven't really changed in capabilities since they launched: it's a map that shows a little car with your driver coming and a pin where you're going. But it's reasonable to believe that AI will have new fundamental capabilities in the 2030s, 2040s, and so on. 59. It seems to be adept at reviewing/editing/critiquing, at least for my use cases. It always has something valuable to contribute from that perspective, but has been comparatively useless otherwise (outside of moats like "exclusive access to things involving YouTube"). 60. Always the same with Google. Gemini has been way behind from the start. They use the firehose of money from search to make it as close to free as possible so that they have some adoption numbers. They use the firehose from search to pay for tons of researchers to hand-hold academics so that their non-economic models and non-economic test-time compute can solve isolated problems. It's all so tiresome. Try making models that are actually competitive, Google. Sell them on the actual market and win on actual work product in millions of people's lives. 61. I'm sorry, but this is an insane take. Flash is leading its category by far. Absolutely destroys Sonnet, 5.2, etc. in both perf and cost. Pro still leads in visual intelligence. The company that most locks away their gold is Anthropic IMO, and for good reason, as Opus 4.6 is expensive AF. 62. I think we highly underestimate the number of "human bots", basically.
Unthinking people programmed by their social media feed who don't notice the OpenAI influence campaign. Even with no social media, it seems obvious to me there was a massive PR campaign by OpenAI after their "code red" to try to convince people Gemini is not all that great. Yea, Gemini sucks, don't use it lol. Leave those resources to fools like myself. 63. Gemini 3 Pro/Flash has been stuck in preview for months now. Google is slow, but they progress like a massive rock giant. 64. I don't agree; Gemini 3 is pretty good, even the Lite version. 65. I use Gemini Pro for basically everything. I just started learning systems biology, as I didn't even know this was a subject until it came up in a conversation. Biology is a subject I am quite lacking in, but it is unbelievable to me what I have learned in the last few weeks. Not even in what Gemini says exactly, but in the text and papers it has led me to. One major reason is that it had never cut me off until last night. I ran several deep researches yesterday and then finally got cut off in a sprawling 2-hour conversation. For me it is the first model where something new is coming out before I have extracted all the value from the old model or gotten bored with it. I still haven't tried Opus 4.5, let alone 4.6, because I know I will get cut off right when things get rolling. I don't think I have even logged into ChatGPT in a month now. </comments_about_topic> Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.