Summarizer

LLM Input

llm/2ad2a7bb-5462-4391-a2da-bf11064993c9/topic-16-3d24260c-5370-480e-b243-d6c17ca449a4-input.json

prompt

The following is content for you to summarize. Do not respond to the comments—summarize them.

<topic>
Model Architecture Secrecy # Observation that frontier labs no longer share architecture details like parameter counts, shift from technical discussions to capability-focused marketing, and desire for more transparency
</topic>

<comments_about_topic>
1. I'm having trouble just keeping track of all these different types of models.

Is "Gemini 3 Deep Think" even technically a model? From what I've gathered, it is built on top of Gemini 3 Pro, and appears to be adding specific thinking capabilities, more akin to adding subagents than a truly new foundational model like Opus 4.6.

Also, I don't understand the comments about Google being behind in agentic workflows. I know that the typical use of, say, Claude Code feels agentic, but also a lot of folks are using separate agent harnesses like OpenClaw anyway. You could just as easily plug Gemini 3 Pro into OpenClaw as you can Opus, right?

Can someone help me understand these distinctions? Very confused, especially regarding the agent terminology. Much appreciated!

2. The term “model” is one of those super overloaded terms. Depending on the conversation it can mean:

- a product (most accurate here imo)

- a specific set of weights in a neural net

- a general architecture or family of architectures (BERT models)

So while you could argue this is a “model” in the broadest sense of the term, it’s probably more descriptive to call it a product. Similarly we call LLMs “language” models even if they can do a lot more than that, for example draw images.

3. I'm pretty sure only the second is properly called a model, and "BERT models" are simply models with the BERT architecture.

4. It depends on time. 5 years ago it was quite well defined that it's the last one, maybe the second one in some contexts. Especially when the distinction was important, it was always the last one. In our case it was. We trained models to have weights. We even stored models and weights separately, because models change more slowly than weights. You could choose a model and a set of weights, and run them. You could change weights any time.

Then marketing, and huge amount of capital came.

5. It seems unlikely "model" was ever equivalent in meaning to "architecture". Otherwise there would be just one "CNN model" or just one "transformer model" insofar as there is a single architecture involved.

6. More focus has been put on post-training recently. Where a full model training run can take a month and often requires multiple tries because it can collapse and fail, post-training is done on the order of 5 or 6 days.

My assumption is that they're all either pretty happy with their base models or unwilling to do those larger runs, and post-training is turning out good results that they release quickly.

7. Look what they need to mimic a fraction of [the power of having the logit probabilities exposed so you can actually see where the model is uncertain]

8. Do we get any model architecture details like parameter size etc.? A few months back, we used to talk more about this; now it's mostly about model capabilities.

9. I'm honestly not sure what you mean? The frontier labs have kept arch as secrets since gpt3.5

10. At the very least gemini 3's flyer claims 1T parameters.
</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points—write flowing prose.

topic

Model Architecture Secrecy # Observation that frontier labs no longer share architecture details like parameter counts, shift from technical discussions to capability-focused marketing, and desire for more transparency

commentCount

10
