The following is content for you to classify. Do not respond to the comments—classify them.
<topics>
1. ARC-AGI Benchmark Validity
Related: Debate over whether ARC-AGI measures general intelligence or just spatial reasoning puzzles, concerns about benchmarkmaxxing, semi-private vs private test sets, cost per task at $13.62, and whether solving it indicates anything meaningful about AGI capabilities
2. Gemini vs Claude for Coding
Related: Strong consensus that Claude dominates agentic coding workflows while Gemini lags behind, discussion of tool calling failures, instruction following issues, and hallucinations when using Gemini for development tasks
3. Benchmarkmaxxing Concerns
Related: Skepticism that high benchmark scores reflect real-world performance, suspicions that labs optimize specifically for popular tests, concerns about training data leakage, and debate over whether improvements are genuine or gamed
4. Definition of AGI
Related: Philosophical debate about what constitutes artificial general intelligence, whether consciousness is required, Chollet's definition involving tasks feasible for humans but unsolved by AI, and moving goalposts in AI evaluation
5. Google Product Quality Issues
Related: Complaints about Gemini app UX problems including context loss, Russian propaganda sources, switching languages mid-sentence, document upload failures, and poor integration compared to ChatGPT
6. Balatro Gaming Benchmark
Related: Discussion of Gemini 3's ability to play the card game Balatro from text descriptions alone, debate over whether this demonstrates generalization, and comparisons showing other models like DeepSeek failing at the task
7. Model Release Acceleration
Related: Observation that AI model releases are accelerating dramatically, multiple frontier models released within days, connection to Chinese New Year timing, and competition between US and Chinese labs
8. Cost vs Performance Tradeoffs
Related: Analysis of inference costs versus capabilities, Gemini Flash praised for cost-performance ratio, concerns about $13.62 per ARC-AGI task, and debate over what price makes models practical for real applications
9. Deep Research Reliability
Related: Mixed experiences with AI deep research capabilities, complaints about garbage citations, hallucinated sources, contradictory information, and questions about whether it saves time when sources must be verified
10. Google's Competitive Position
Related: Debate over whether Google is leading or behind in AI, discussion of their data advantages from YouTube and Books, claims they let competitors think they were behind, and analysis of their strengths in visual AI
11. Pelican on Bicycle Benchmark
Related: Simon Willison's informal SVG generation test, discussion of whether it's being trained on specifically, quality improvements in latest models, and debate over its validity as a casual benchmark
12. AI Consciousness Claims
Related: Pushback against suggestions that passing tests indicates consciousness, comparisons to simple programs claiming consciousness, discussion of self-awareness research, and skepticism about anthropomorphizing AI capabilities
13. Test Time Compute Approaches
Related: Analysis of thinking vs non-thinking models, best-of-N approaches like Deep Think, computational complexity differences, and questions about whether sufficiently large non-thinking models can match smaller thinking ones
14. Real World Task Performance
Related: Frustration that benchmark gains don't translate to practical improvements, examples of models failing simple debugging tasks, and arguments that actual work product matters more than test scores
15. AI Job Displacement Fears
Related: Concerns about software engineers being replaced, comparisons to factory worker displacement, debate over whether AI creates or destroys jobs, and skepticism about optimistic narratives from AI company executives
16. Spatial Reasoning Limitations
Related: Discussion of LLMs struggling with spatial tasks, image orientation affecting OCR accuracy, and whether ARC-AGI improvements indicate genuine spatial reasoning advances or benchmark-specific solutions
17. Model Architecture Secrecy
Related: Observation that frontier labs no longer share architecture details like parameter counts, shift from technical discussions to capability-focused marketing, and desire for more transparency
18. Academic vs Practical Intelligence
Related: Distinction between Gemini excelling at academic benchmarks while feeling less useful for practical tasks, discussion of book smart vs street smart analogies for AI capabilities
19. First Proof Mathematical Challenge
Related: Discussion of newly released unsolved math problems designed to test frontier models, predictions about whether current models can solve genuine research-level mathematics
20. Subscription Pricing Frustration
Related: Complaints about $250/month Google AI Ultra subscription required for Deep Think access, desire to test new models without platform lock-in, and calls for OpenRouter availability
0. Does not fit well in any category
</topics>
<comments_to_classify>
[
{
"id": "46993583",
"text": "It would make more sense to me if it had never been awesome."
}
,
{
"id": "46995124",
"text": "They may quantize the models after release to save money."
}
,
{
"id": "46993936",
"text": "It seems to be adept at reviewing/editing/critiquing, at least for my use cases. It always has something valuable to contribute from that perspective, but has been comparatively useless otherwise (outside of moats like \"exclusive access to things involving YouTube\")."
}
,
{
"id": "46999363",
"text": "Dr., please tell me are we cooked? :crying-emoji"
}
,
{
"id": "46999507",
"text": "I wish they would unleash it on the Google Cloud console. Whatever version of Gemini they offer in the sidebar when I log in is terrible ."
}
,
{
"id": "46992952",
"text": "I need to test the sketch creation a s a p. I need this in my life because learning to use Freecad is too difficult for a busy person like me (and frankly, also quite lazy)"
}
,
{
"id": "46992976",
"text": "FWIW, the FreeCAD 1.1 nightlies are much easier and more intuitive to use due to the addition of many on-canvas gizmos."
}
,
{
"id": "46991634",
"text": "Why a Twitter post and not the official Google blog post… https://blog.google/innovation-and-ai/models-and-research/ge..."
}
,
{
"id": "46992729",
"text": "Just normal randomness I suppose. I've put that URL at the top now, and included the submitted URL in the top text."
}
,
{
"id": "46991854",
"text": "The official blog post was submitted earlier ( https://news.ycombinator.com/item?id=46990637 ), but somehow this story ranked up quickly on the homepage."
}
,
{
"id": "46992442",
"text": "@dang will often replace the post url & merge comments\n\nHN guidelines prefer the original source over social posts linking to it."
}
,
{
"id": "46992458",
"text": "Agreed - blog post is more appropriate than a twitter post"
}
,
{
"id": "46997698",
"text": "Fuck google. Boycott Google."
}
,
{
"id": "46997928",
"text": "Israel is not one of the boots. Deplorable as their domestic policy may be, they're not wagging the dog of capitalist imperialism. To imply otherwise is to reveal yourself as biased, warped in a way that keeps you from going after much bigger, and more real systems of political economy holding back our civilization from universal human dignity and opportunity."
}
,
{
"id": "46998235",
"text": "Lol what? Not sure if you are defending Israel or google because your communication style is awful. But if you are defending Israel then you're an idiot who is excusing genocide. If you're defending google then you're just a corporate bootlicker who means nothing."
}
,
{
"id": "46999160",
"text": "As opposed to Hamas who actually committed the genocide"
}
,
{
"id": "46993932",
"text": "Always the same with Google.\n\nGemini has been way behind from the start.\n\nThey use the firehose of money from search to make it as close to free as possible so that they have some adoption numbers.\n\nThey use the firehose from search to pay for tons of researchers to hand hold academics so that their non-economic models and non-economic test-time-compute can solve isolated problems.\n\nIt's all so tiresome.\n\nTry making models that are actually competitive, Google.\n\nSell them on the actual market and win on actual work product in millions of people lives."
}
,
{
"id": "47001359",
"text": "I'm sorry but this is an insane take. Flash is leading its category by far. Absolutely destroys sonnet, 5.2 etc in both perf and cost.\n\nPro still leads in visual intelligence.\n\nThe company that most locks away their gold is Anthropic IMO and for good reason, as Opus 4.6 is expensive AF"
}
,
{
"id": "47001891",
"text": "I think we highly underestimate the amount of \"human bots\" basically.\n\nUnthinking people programmed by their social media feed who don't notice the OpenAI influence campaign.\n\nWith no social media, it seems obvious to me there was a massive PR campaign by OpenAI after their \"code red\" to try to convince people Gemini is not all that great.\n\nYea, Gemini sucks, don't use it lol. Leave those resources to fools like myself."
}
,
{
"id": "46993289",
"text": "Gemini 3 Pro/Flash is stuck in preview for months now. Google is slow but they progress like a massive rock giant."
}
,
{
"id": "46993270",
"text": "Does anyone actually use Gemini 3 now? I cant stand its sleek salesy way of introduction, and it doesnt hold to instructions hard – makes it unapplicable for MECE breakdowns or for writing."
}
,
{
"id": "46997148",
"text": "I use it often. Occasionally for quick questions, but mostly for deep research."
}
,
{
"id": "46993467",
"text": "I do. It's excellent when paired with an MCP like context7."
}
,
{
"id": "46993297",
"text": "I dont agree, Gemini 3 is pretty good, even the Lite version."
}
,
{
"id": "46993363",
"text": "What do you use it for and why? Genuinely curious"
}
,
{
"id": "47001778",
"text": "I use Gemini Pro for basically everything. I just started learning systems biology as I didn't even know this was a subject until it came up in a conversation.\n\nBiology is subject I am quite lacking in but it is unbelievable to me what I have learned in the last few weeks. Not even in what Gemini says exactly but in the text and papers it has led me to.\n\nOne major reason is that it has never cut me off until last night. I ran several deep researches yesterday and then finally got cut off in a sprawling 2 hour conversation.\n\nFor me it is the first model now that has something new coming out but I haven't extracted all the value from the old model that I am bored with it. I still haven't tried Opus 4.5 let alone 4.6 because I know I will get cut off right when things get rolling.\n\nI don't think I have even logged into ChatGPT in a month now."
}
,
{
"id": "46993715",
"text": "It indeed departs from instructions pretty regularly. But I find it very useful and for the price it beats the world.\n\n\"The price\" is the marginal price I am paying on top of my existing Google 1, YouTube Premium, and Google Fi subs, so basically nothing on the margin."
}
]
</comments_to_classify>
Based on the comments above, assign each to up to 3 relevant topics.
Return ONLY a JSON array with this exact structure (no other text):
[
{
"id": "comment_id_1",
"topics": [
1,
3,
5
]
}
,
{
"id": "comment_id_2",
"topics": [
2
]
}
,
{
"id": "comment_id_3",
"topics": [
0
]
}
,
...
]
Rules:
- Each comment can have 0 to 3 topics
- Use 1-based topic indices for matches
- Use index 0 if the comment does not fit well in any category
- Only assign topics that are genuinely relevant to the comment
Remember: Output ONLY the JSON array, no other text.