Summarizer

Missing Benchmarks and Weights

Criticism that no model weights or compiler tools were released, and that the absence of performance benchmarks against baseline approaches limits reproducibility and evaluation.

Topic from: Executing programs inside transformers with exponentially faster inference

Critics argue that the project is virtually unusable without released weights or compiler tools, hindering the very low-budget experimentation the system claims to support. A significant portion of the debate centers on the report's presentation, with some commenters dismissing it as "repetitive AI fluff" that uses a salesman-like tone to mask a lack of empirical data. Others contend that blaming AI for the missing benchmarks is a distraction, suggesting that the omission of results is a deliberate choice by the authors rather than a byproduct of their writing tools. Ultimately, while the neurosymbolic approach holds some interest, the community finds it difficult to evaluate the project's merits without reproducible benchmarks.

3 comments tagged with this topic

View on HN · Topics
This seems like it has some potential, but is pretty much useless as it is. It's a shame there are no weights released, let alone the "compiler" tool they used to actually synthesize computational primitives into model weights. It seems like a "small model" system that's amenable to low-budget experiments, and I would love to see how far this approach can be pushed. I disagree with the core premise (it's basically the old neurosymbolic garbage restated), but embedding predefined computational primitives into LLMs could have some uses nonetheless.
View on HN · Topics
> "This shows the downside of using AI to write up your project."

I just find phrases like this a bit obnoxious at times.

> You would not have had a problem with calling out a badly composed rambling article 5 years ago.

Then why not just say that? It's rambling, and so on. What's so hard about that? Why invent a reason for the issues, as if rambling articles didn't get written 5 years ago? No, being written by an LLM or not is not the reason the article has no benchmarks or interpretability results. Those things would be there regardless, if the author were interested in them, so again, there seems to be little point in making such assertions.
View on HN · Topics
It's very hard to discuss this. To some people it's obvious, to some it isn't. To me, every single paragraph is obvious fluff AI writing. One problem with it is the repetitiveness and the schmoozing salesman feel. The other is the lack of benchmarks and related material. It's both. The two are connected, because the AI has to lean into its bullshitter persona when it's not given enough raw material to write up something strong. But whenever an AI writes in its default voice like this, it also indicates that the context was not well curated. But anyway, yes, I can also just move on to the next article. Most of the time I indeed do that.