Summarizer

LLM Input

llm/2ad2a7bb-5462-4391-a2da-bf11064993c9/topic-6-20bc02d6-085e-410b-a97e-b722de88e4f0-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Model Release Acceleration # Observation that AI model releases are accelerating dramatically, multiple frontier models released within days, connection to Chinese New Year timing, and competition between US and Chinese labs
</topic>

<comments_about_topic>
1. DeepSeek hasn't been SotA in at least 12 calendar months, which might as well be a decade in LLM years

2. What about Kimi and GLM?

3. These are well behind the general state of the art (1yr or so), though they're arguably the best openly-available models.

4. Idk man, GLM 5 in my tests matches Opus 4.5, which is what, two months old?

5. 4.5 was never sota

6. According to the Artificial Analysis ranking, GLM-5 is at #4 after Claude Opus 4.5, GPT-5.2-xhigh, and Claude Opus 4.6.

7. But... there's Deepseek v3.2 in your link (rank 7)

8. Could it also be that the models are just a lot better than a year ago?

9. What's the point of denying or downplaying that we are seeing amazing and accelerating advancements in areas that many of us thought were impossible?

10. Is it me or is the rate of model releases accelerating to an absurd degree? Today we have Gemini 3 Deep Think and GPT 5.3 Codex Spark. Yesterday we had GLM5 and MiniMax M2.5. Five days before that we had Opus 4.6 and GPT 5.3. Then maybe two weeks before that, I think, we had Kimi K2.5.

11. I think it is because of the Chinese New Year.
The Chinese labs like to publish their models around the Chinese New Year, and the US labs do not want to let a DeepSeek R1 (20 January 2025) impact event happen again, so I guess they publish models that are more capable than what they imagine Chinese labs are yet capable of producing.

12. Singularity or just Chinese New Year?

13. I guess. DeepSeek V3 was released on Boxing Day a month prior:

https://api-docs.deepseek.com/news/news1226

14. And it made almost zero impact; it was just a bigger version of DeepSeek V2 and went mostly unnoticed because its performance wasn't particularly notable, especially for its size.

It was R1 with its RL training that made the news and crashed the stock market.

15. Aren't we saying "lunar new year" now?

16. I don't think so; there are different lunar calendars.

17. There are hints this is a preview of Gemini 3.1.

18. More focus has been put on post-training recently. Where a full model training run can take a month and often requires multiple tries because it can collapse and fail, post-training is done on the order of 5 or 6 days.

My assumption is that they're all either pretty happy with their base models or unwilling to do those larger runs, and post-training is turning out good results that they release quickly.

19. So, yes, for the past couple weeks it has felt that way to me. But it seems to come in fits and starts. Maybe that will stop being the case, but that's how it's felt to me for a while.

20. It's because of a chain of events.

Next week is Chinese New Year -> Chinese labs release all the models at once before it starts -> US labs respond with what they have already prepared.

Also note that even in US labs a large proportion of researchers and engineers are Chinese, and many celebrate Chinese New Year too.

TLDR: Chinese New Year. Happy Horse Year everybody!

21. Fast takeoff.

22. They are spending literal trillions. It may even accelerate

23. There's more compute now than before.

24. They are using the current models to help develop even smarter models. Each generation of model can help even more with the next generation.

I don’t think it’s hyperbolic to say that we may be only a single-digit number of years away from the singularity.

25. Of course, n-1 wasn't good enough but n+1 will be the singularity; just two more weeks my dudes, two more weeks... rinse and repeat ad infinitum.

26. Interestingly, the title of that PDF calls it "Gemini 3.1 Pro". Guess that's dropping soon.

27. I looked at the file name but not the document title (specifically because I was wondering if this is 3.1). Good spot.

edit: they just removed the reference to "3.1" from the pdf

28. I think this is 3.1 (3.0 Pro with the RL improvements of 3.0 Flash).
But they probably decided to market it as Deep Think because why not charge more for it.

29. The Deep Think moniker is for parallel compute models though, not long CoT like pro models.

It's possible though that Deep Think 3 is running 3.1 models under the hood.

30. That's odd considering 3.0 is still labeled a "preview" release.

31. I think it'll be 3.1 by the time it's labelled GA - they said after 3.0 launch that they figured out new RL methods for Flash that the Pro model hasn't benefitted from.

32. The rumor was that 3.1 was today's drop

33. Where are these rumors floating around?

34. One of many https://x.com/synthwavedd/status/2021983382314660075

35. Huh, so if a China-based lab takes ARC-AGI-2 on the new year, then they can say they were just shy of a solution anyway.

36. The general-purpose ChatGPT 5.3 hasn’t been released yet, just 5.3-codex.

37. It's a giant game of leapfrog; shift or stretch time out a bit and they all look equivalent.

38. It’s incredible how fast these models are getting better. I thought for sure a wall would be hit, but these numbers smash previous benchmarks. Anyone have any idea what the big unlock is that people are finding now?

39. I unironically believe that ARC-AGI-3 will have an introduction-to-solved time of 1 month.

40. We will see at the end of April, right? It's more of a guess than a strongly held conviction, but I see models improving rapidly at long-horizon tasks so I think it's possible. I think a benchmark which can survive a few months (maybe) would be one that genuinely tested long time-frame continual learning/test-time learning/test-time post-training (idk honestly the differences b/t these).

But I'm not sure how to construct such benchmarks. I'm thinking of tasks like learning a language/becoming a master at chess from scratch/becoming a skilled artist, but where the task is novel enough for the actor to not be anywhere close to proficient at the beginning. An example which could be of interest: here is a robot you control, you can make actions, see results... become proficient at table tennis. Maybe another would be: here is a new video game, obtain the best possible 0% speedrun.

41. Everyone is already at 80% for that one. Crazy that we were just at 50% with GPT-4o not that long ago.

42. I think I'm finally realizing that my job probably won't exist in 3-5 years. Things are moving so fast now that the LLMs are basically writing themselves. I think the earlier iterations moved slower because they were limited by human ability and productivity.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Model Release Acceleration # Observation that AI model releases are accelerating dramatically, multiple frontier models released within days, connection to Chinese New Year timing, and competition between US and Chinese labs

commentCount

42
