Summarizer

Model Architecture Secrecy

Observation that frontier labs no longer share architecture details like parameter counts, shift from technical discussions to capability-focused marketing, and desire for more transparency


The technical clarity of AI is dissolving as the term "model" shifts from a specific architectural definition to an overloaded marketing label for diverse products and agentic workflows. Commenters note a distinct decline in transparency, with frontier labs now prioritizing capability-focused hype over sharing vital technical specs like parameter counts or logit probabilities. This trend is exacerbated by a pivot toward rapid post-training cycles, making it increasingly difficult for observers to distinguish between truly new foundational breakthroughs and refined versions of existing architectures.

10 comments tagged with this topic

View on HN · Topics
I'm having trouble just keeping track of all these different types of models. Is "Gemini 3 Deep Think" even technically a model? From what I've gathered, it is built on top of Gemini 3 Pro, and appears to be adding specific thinking capabilities, more akin to adding subagents than a truly new foundational model like Opus 4.6. Also, I don't understand the comments about Google being behind in agentic workflows. I know that the typical use of, say, Claude Code feels agentic, but also a lot of folks are using separate agent harnesses like OpenClaw anyway. You could just as easily plug Gemini 3 Pro into OpenClaw as you can Opus, right? Can someone help me understand these distinctions? Very confused, especially regarding the agent terminology. Much appreciated!
View on HN · Topics
The term “model” is one of those super overloaded terms. Depending on the conversation it can mean:
- a product (most accurate here imo)
- a specific set of weights in a neural net
- a general architecture or family of architectures (BERT models)
So while you could argue this is a “model” in the broadest sense of the term, it’s probably more descriptive to call it a product. Similarly we call LLMs “language” models even though they can do a lot more than that, for example draw images.
View on HN · Topics
I'm pretty sure only the second is properly called a model, and "BERT models" are simply models with the BERT architecture.
View on HN · Topics
It depends on the time. Five years ago it was quite well defined as the last one, maybe the second one in some contexts. Especially when the distinction was important, it was always the last one. In our case it was. We trained models to have weights. We even stored models and weights separately, because models change more slowly than weights. You could choose a model and a set of weights, and run them. You could change the weights any time. Then marketing, and a huge amount of capital, came along.
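The model/weights separation the commenter describes can be sketched in a few lines: the architecture is code, the weights are data, and the two are stored and versioned independently. This is a minimal illustrative example, not any particular lab's pipeline; the class and weight names are made up.

```python
import json

class TinyLinearModel:
    """Architecture: a fixed computation graph (here, y = w*x + b).
    This definition changes slowly, like the commenter's 'model'."""
    def __init__(self, weights=None):
        # Weights default to an untrained state; they can be swapped
        # out at any time without touching the architecture.
        self.weights = weights or {"w": 0.0, "b": 0.0}

    def forward(self, x):
        return self.weights["w"] * x + self.weights["b"]

# Weights are serialized separately from the model definition, as one
# might do with checkpoint files: one architecture, many weight sets.
trained = {"w": 2.0, "b": 1.0}
blob = json.dumps(trained)

model = TinyLinearModel(weights=json.loads(blob))
print(model.forward(3.0))  # 2.0 * 3.0 + 1.0 = 7.0
```

Under this older usage, "changing the model" meant editing the class, while "changing the weights" meant loading a different blob into the same class.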
View on HN · Topics
It seems unlikely "model" was ever equivalent in meaning to "architecture". Otherwise there would be just one "CNN model" or just one "transformer model" insofar as a single architecture is involved.
View on HN · Topics
More focus has been put on post-training recently. Whereas a full model training run can take a month and often requires multiple tries because it can collapse and fail, post-training is done on the order of 5 or 6 days. My assumption is that they're all either pretty happy with their base models or unwilling to do those larger runs, and post-training is turning out good results that they can release quickly.
View on HN · Topics
Look what they need to mimic a fraction of [the power of having the logit probabilities exposed so you can actually see where the model is uncertain]
View on HN · Topics
Do we get any model architecture details, like parameter size, etc.? A few months back we used to talk more about this; now it's mostly about model capabilities.
View on HN · Topics
I'm honestly not sure what you mean? The frontier labs have kept their architectures secret since GPT-3.5.
View on HN · Topics
At the very least, Gemini 3's flyer claims 1T parameters.