The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Gemini vs Claude for Coding # Strong consensus that Claude dominates agentic coding workflows while Gemini lags behind, discussion of tool calling failures, instruction following issues, and hallucinations when using Gemini for development tasks
</topic>

<comments_about_topic>
1. My experience also shows that Gemini has unique strength in “generalized” (read: not coding) tasks. Gemini 2.5 Pro and 3 Pro seem stronger at math and science for me, and their Deep Research usually works the hardest, as long as I run it during off-hours. Opus seems to beat Gemini almost “with one hand tied behind its back” in coding, but Gemini is so cheap that it’s usually my first stop for anything that I think is likely to be relatively simple. I never worry about my quota on Gemini like I do with Opus or ChatGPT. Comparisons generally seem to change much faster than I can keep my mental model updated, but Gemini's performance lead on more ‘academic’ explorations of science, math, engineering, etc. has been pretty stable for the past 4 months or so, which makes it one of the longer-lasting trends for me in comparing foundation models. I do wish I could more easily get timely access to the “super” models like Deep Think or o3 pro. I never seem to get a response to requesting access, and have to wait for public access models to catch up, at which point I’m never sure if their capabilities have gotten diluted since the initial buzz died down. They all still suck at writing an actually good essay/article/literary or research review, or other long-form things which require a lot of experienced judgement to come up with a truly cohesive narrative. I imagine this relates to their low performance in humor - there’s just so much nuance, and these tasks represent the pinnacle of human intelligence. Few humans can reliably perform these tasks to a high degree of performance either.
I myself am only successful some percentage of the time.
2. Yes, agentic-wise, Claude Opus is best. Complex coding is GPT-5.x. But for smartness, I always felt Gemini 3 Pro is best.
3. Strange, because I could not for the life of me get Gemini 3 to follow my instructions the other day to work through an example with a table; Claude got it first try.
4. Claude is king for agentic workflows right now because it’s amazing at tool calling and following instructions well (among other things).
5. Codex ranks higher for instruction following.
6. Am I the only one that can’t find Gemini useful except if you want something cheap? I don’t get what the whole code red was about, or all that PR. I see no reason to use Gemini instead of a GPT and Anthropic combo. I should add that I’ve tried it as a chat bot, coding through Copilot, and also as part of a multi-model prompt generation. Gemini was always the worst by a big margin. I see some people saying it is smarter, but it doesn’t seem smart at all.
7. Maybe it depends on the usage, but in my experience, most of the time Gemini produces much better results for coding, especially for optimization. The results produced by Claude weren't even near those of Gemini. But again, it depends on the task, I think.
8. I find the quality is not consistent at all, and of all the LLMs I use, Gemini is the one most likely to just veer off and ignore my instructions.
9. Same; as far as I am concerned, Gemini is optimized for benchmarks. I mean, last week it insisted suddenly on two consecutive prompts that my code was in Python. It was in Rust.
10. At $13.62 per task, it's practically unusable for agent tasks due to the cost. I found that anything over $2/task on ARC-AGI-2 ends up being way too much for use in coding agents.
11. I'm having trouble just keeping track of all these different types of models. Is "Gemini 3 Deep Think" even technically a model?
From what I've gathered, it is built on top of Gemini 3 Pro and appears to add specific thinking capabilities, more akin to adding subagents than a truly new foundational model like Opus 4.6. Also, I don't understand the comments about Google being behind in agentic workflows. I know that the typical use of, say, Claude Code feels agentic, but a lot of folks are using separate agent harnesses like OpenClaw anyway. You could just as easily plug Gemini 3 Pro into OpenClaw as you can Opus, right? Can someone help me understand these distinctions? Very confused, especially regarding the agent terminology. Much appreciated!
12. > Also, I don't understand the comments about Google being behind in agentic workflows.
It has to do with how the model is RL'd. It's not that Gemini can't be used with various agentic harnesses, like open code or open claw or theoretically even Claude Code. It's just that the model is trained less effectively to work with those harnesses, so it produces worse results.
13. Let's come back in 12 months and discuss your singularity then. Meanwhile, I spent like $30 on a few models as a test yesterday; none of them could tell me why my goroutine system was failing, even though it was painfully obvious (I purposefully added one too many wg.Done). Gemini, Codex, MiniMax 2.5, they all shat the bed on a very obvious problem, but I am to believe they're 98% conscious and better at logic and math than 99% of the population. With every new model release, neckbeards come out of their basements to tell us the singularity will be here in two more weeks.
14. On the flip side, twice I put about 800K tokens of code into Gemini and asked it to find why my code was misbehaving, and it found it. The logic related to the bug wasn't all contained in one file, but spread across several files. This was Gemini 2.5 Pro. A whole generation old.
15. Out of curiosity, did you give them a test to validate the code?
I had a test failing because I introduced a silly comparison bug (> instead of <), and Claude 4.6 Opus figured out that the problem wasn't the test but the code, and fixed the bug (which I had missed).
16. I'm talking about Gemini in the app and on the web, as well as AI Studio. At work we go through Copilot, but there the agentic mode with Gemini isn't the best either.
17. Antigravity is an embarrassment. The models feel terrible, somehow, like they're being fed terrible system prompts. Plus the damn thing kept crashing and asking me to "restart it". What?! At least Kiro does what it says on the tin.
18. My experience with Antigravity is the opposite. It's the first time in over 10 years that an IDE has managed to pull me a bit out of the JetBrains suite. I did not think that was possible, as I am a hardcore JetBrains user/lover.
19. It's literally just VS Code? I tried it the other day and I couldn't tell it apart from Windsurf besides the icon in my dock.
20. Agreed on the product. I can't make Gemini read my emails in Gmail. One day it says it doesn't have access, the other day it says "Query unsuccessful". Claude Desktop has no problem reaching Gmail, on the other hand :)
21. Their models are absolutely not impressive. Not a single person is using them for coding (outside of Google itself). Maybe some people on a very generous free plan. Their model is a fine mid-2025 model, backed by enormous compute resources and an army of GDM engineers to help the “researchers” keep the model on task as it traverses the “tree of thoughts”. But that isn’t “the model”; that’s an old model backed by massive money.
22. These benchmarks are super impressive. That said, Gemini 3 Pro benchmarked well on coding tasks, and yet I found it abysmal. A distant third behind Codex and Claude. Tool calling failures, hallucinations, bad code output. It felt like using a coding model from a year ago.
Even just as a general-use model, somehow ChatGPT has a smoother integration with web search (than Google!!), knowing when to use it and not needing me to prompt it directly multiple times to search. Not sure what happened there. They have all the ingredients in theory, but they've really fallen behind on actual usability. Their image models are kicking ass, though.
23. Not in my experience with Gemini Pro and coding. It hallucinates APIs that aren't there; Claude does not do that. Gemini has flashes of brilliance, but I regard it as unpolished: some things work amazingly, some basics don't work.
24. Don't let the benchmarks fool you. Gemini models are completely useless no matter how smart they are. Google still hasn't figured out tool calling and making the model follow instructions. They seem to only care about benchmarking and being the most intelligent model on paper. This has been a problem with Gemini since 1.0, and they still haven't fixed it. It's also the worst model in terms of hallucinations.
25. Disagree. Claude Code is great for coding; Gemini is better than everything else for everything else.
26. What is "everything else" in your view? Just curious -- I really only seriously use models for coding, so I am curious what I am missing.
27. I find Gemini's web page much snappier to use than ChatGPT's - I've largely swapped to it for most things except more agentic tasks.
28. The lack of "projects" alone makes their chat interface really unpleasant compared to ChatGPT and Claude.
29. Fair enough. I'm always astonished how different experiences are, because mine is the complete opposite. I almost solely use it for help with Go and JavaScript programming and found Gemini Pro to be more useful than any other model. ChatGPT was the worst offender so far, completely useless, but Claude has also been suboptimal for my use cases. I guess it depends a lot on what you use LLMs for and how they are prompted.
For example, Gemini fails the simple "count from 1 to 200 in words" test, whereas Claude does it without further questions. Another possible explanation would be that processing time is distributed unevenly across the globe and companies stay silent about this. Maybe it depends on time zones?
30. Gemini is completely unusable in VS Code. It's rated 2/5 stars, pathetic: https://marketplace.visualstudio.com/items?itemName=Google.g... Requests regularly time out, the whole window freezes, it gets stuck in schizophrenic loops, edits cannot be reverted, and more. It doesn't even come close to Claude or ChatGPT.
31. It is interesting that the video demo is generating a .stl model. I run a lot of tests of LLMs generating OpenSCAD code (as I recently launched https://modelrift.com, a text-to-CAD AI editor), and the Gemini 3 family LLMs are actually giving the best price-to-performance ratio now. But they are very, VERY far from being able to spit out a complex OpenSCAD model in one shot. So I had to implement a full-fledged "screenshot-vibe-coding" workflow where you draw arrows on a 3D model snapshot to explain to the LLM what is wrong with the geometry. Without a human in the loop, all top-tier LLMs hallucinate at debugging 3D geometry in agentic mode - and fail spectacularly.
32. According to benchmarks in the announcement, it's healthily ahead of Claude 4.6. I guess they didn't test ChatGPT 5.3, though. Google has definitely been pulling ahead in AI over the last few months. I've been using Gemini and finding it better than the other models (especially for biology, where it doesn't refuse to answer harmless questions).
33. Google is way ahead in visual AI and world modelling. They're lagging hard in agentic AI and autonomous behavior.
34. Google's models and CLI harness feel behind in agentic coding compared to OpenAI's and Anthropic's.
35. I gather that 4.6's strengths are in long-context agentic workflows? At least over the Gemini 3 Pro preview, Opus 4.6 seems to have a lot of advantages.
36.
I have some very difficult-to-debug bugs that Opus 4.6 is failing at. I'm planning to pay $250 to see if it can solve those.
37. I'd rather say it has a mind of its own; it does things its way. But I have not tested this model, so they might have improved its instruction following.
38. Well, one thing I know for sure: it reliably misplaces parentheses in Lisps.
39. Clearly, the AI is trying to steer you towards the ML family of languages for its better type system, performance, and concurrency ;)
40. I made offmetaedh.com with it. Feels pretty great to me.
41. It found a small but nice little optimization in Stockfish: https://github.com/official-stockfish/Stockfish/pull/6613 Previous models, including Claude Opus 4.6, have generally produced a lot of noise/things that the compiler already reliably optimizes out.
42. Claude Cowork does this by default, and you can see exactly how it is coordinating them, etc.
43. Isn’t there? I mean, Claude Code has been my biggest use case, and it basically one-shots everything now.
44. Yes, LLMs have become extremely good at coding (not software engineering, though). But try using them for anything original that cannot be adapted from GitHub and Stack Overflow. I haven't seen much improvement at all at such tasks.
45. I don't get it. Why is Claude still number 1 while the numbers say different? Let's see that new Gemini in the terminal also.
46. Too bad we can’t use it. Whenever Google releases something, I can never seem to use it in their coding CLI product.
47. So last week I tried Gemini 3 Pro, Opus 4.6, GLM 5, and Kimi 2.5; so far, using Kimi 2.5 yielded the best results (in terms of cost/performance) for me in a mid-size Go project. Curious to know what others think?
48. I predict Gemini Flash will dominate when you try it. If you're going for cost/performance balance, choosing Gemini Pro is bewildering. Gemini Flash _outperforms_ Pro in some coding benchmarks and is the clear Pareto frontier leader for intelligence/cost.
It's even cheaper than Kimi 2.5. https://artificialanalysis.ai/?media-leaderboards=text-to-im...
49. We're getting to the point where we can ask AI to invent new programming languages.
50. Not trained for agentic workflows yet, unfortunately - this looks like it will be fantastic when they have an agent-friendly one. Super exciting.
51. Off-topic comment (sorry): when people bash "models that are not their favorite model", I often wonder if they have done the engineering work to properly use the other models. Different models and architectures often require very different engineering to use them properly. Also, I think it is fine and proper that different developers prefer different models. We are in early days, and variety is great.
52. I do like Google models (and I pay for them), but the lack of a competitive agent is a major flaw in Google's offering. It is simply not good enough in comparison to Claude Code. I wish they put some effort there (as I don't want to pay two subscriptions, to both Google and Anthropic).
53. Gemini was awesome, and now it’s garbage. It’s impossible for it to do anything but cut code down, drop features, lose stuff, and give you less than the code you put in. It’s puzzling because it spent months at the head of the pack; now I don’t use it at all, because why do I want any of those things when I’m doing development? I’m a paid subscriber, but there’s no point any more; I’ll spend the money on Claude 4.6 instead.
54. I never found it useful for code. It produced garbage littered with gigantic comments.
Me: Remove comments
Gemini, literally: // Comments were removed
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.