Jevons Paradox in Computing

Efficiency gains in quantization or compression are immediately consumed by larger models or contexts; memory throughput improvements do not reduce prices but enable more demand.

The AI landscape is increasingly defined by Jevons Paradox, as technical breakthroughs in quantization and memory efficiency are instantly consumed by the pursuit of larger context windows and deeper reasoning steps. Rather than lowering hardware costs, these optimizations appear to fuel an "infinite demand" for VRAM and HBM, where any freed-up headroom is immediately filled with more complex model architectures or increased token usage. While some skeptics question whether massive context windows offer diminishing returns in performance, the consensus suggests that companies will prioritize expanded capabilities over financial relief, effectively entering a "Reverse-Moore's Law" era. Ultimately, this cycle ensures that as AI becomes more efficient, its footprint expands to fill every available resource, keeping hardware markets under constant pressure.

View on HN · Topics

But wouldn’t you rather HBM prices come down first? Memory makers will be fine; there is practically infinite demand.
Unless you get China-style rationing of compute per person worldwide.

The real issue is everyone wanting to upgrade to HBM, DDR5, and Gen5 NVMe at the same time.

View on HN · Topics

Everyone's betting on Jevons paradox

The hope is that AI is "the next semiconductor" and "the next internet".

View on HN · Topics

> This demand for RAM is built on a foundation of sand

Not exactly.

LLMs are already quite useful today if you use them as a tool, so they are here to stay. The remaining problem is scalability, a.k.a. how to make LLMs cheap to use.

But scalability is not really a requirement when you look at the bigger picture. If smaller software companies and projects can't afford to use AI, the bigger ones still might. Eventually they will discover viable use cases for such tech, even if it only serves big firms, e.g. defense, resource extraction, war, finance, etc.

On the other hand, if scalability is achieved, LLM products will be cheaper too, so smaller projects can also use them. But of course, if LLM usage is too cheap, then many would-be consumers will just build software projects themselves at home.

View on HN · Topics

BTW, a number of corrections. The TurboQuant paper was submitted to arXiv back in April 2025: https://arxiv.org/abs/2504.19874

Current "TurboQuant" implementations achieve about 3.8X-4.9X compression (with the higher end taking some significant hits to GSM8K performance) and run at about 80-100% of baseline speed (i.e., no speedup, or a regression): https://github.com/vllm-project/vllm/pull/38479

For those not paying attention, it's probably worth running this and the ongoing discussions for vLLM (https://github.com/vllm-project/vllm/issues/38171) and llama.cpp through your summarizer of choice. TurboQuant is fine, but not a magic bullet. Personally, I've been experimenting with DMS, and I think it has a lot more promise and can be stacked with various quantization schemes.

The biggest KV-cache savings, though, come from improved model architecture. Gemma 4's SWA/global hybrid saves up to 10X on KV cache, MLA/DSA (the latter helps solve global-attention compute) does as well, and using linear or SSM layers saves even more.
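To put the architectural savings in rough numbers, the sketch below uses the usual per-token KV-cache size (2 for K and V × KV heads × head dimension × bytes per element, summed over layers) and assumes sliding-window layers never cache more than `window` tokens. The model shape (48 layers, 8 KV heads, head_dim 128, bf16, a 1024-token window, one global layer in every six) is hypothetical and is not Gemma 4's actual configuration; `AttnConfig` and `kv_cache_bytes` are illustrative names, and MLA/DSA and linear/SSM layers are not modeled.

```python
from dataclasses import dataclass
from typing import Optional

GIB = 1024 ** 3


@dataclass
class AttnConfig:
    """Hypothetical decoder shape; numbers are illustrative, not any real model's."""
    num_layers: int
    num_kv_heads: int
    head_dim: int
    bytes_per_elem: int = 2          # bf16/fp16
    window: Optional[int] = None     # sliding-window size; None means every layer is global
    global_every: int = 1            # every Nth layer uses global attention


def kv_cache_bytes(cfg: AttnConfig, context_tokens: int) -> int:
    """Rough KV-cache size for one sequence of `context_tokens` tokens."""
    # K and V each hold num_kv_heads * head_dim elements per token per layer.
    per_token_per_layer = 2 * cfg.num_kv_heads * cfg.head_dim * cfg.bytes_per_elem
    total = 0
    for layer in range(cfg.num_layers):
        is_global = cfg.window is None or (layer + 1) % cfg.global_every == 0
        # Local (sliding-window) layers never keep more than `window` tokens.
        cached = context_tokens if is_global else min(context_tokens, cfg.window)
        total += per_token_per_layer * cached
    return total


ctx = 128 * 1024
full = AttnConfig(num_layers=48, num_kv_heads=8, head_dim=128)
hybrid = AttnConfig(num_layers=48, num_kv_heads=8, head_dim=128, window=1024, global_every=6)
f, h = kv_cache_bytes(full, ctx), kv_cache_bytes(hybrid, ctx)
print(f"full: {f / GIB:.2f} GiB, hybrid: {h / GIB:.2f} GiB, ratio: {f / h:.1f}x")
```

For this made-up shape at a 128K-token context, full attention needs 24 GiB per sequence versus about 4.16 GiB for the hybrid, roughly 5.8x less. The saving is capped by the ratio of total layers to global layers (6x here), so figures like the "up to 10X" quoted above depend on the exact layer mix and window size.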

None of these reduce memory demand, though (Jevons paradox, etc.). Looking at my coding tools, I'm currently using about 10-15B cached tokens/month, up from 5-8B a couple of months ago. I think I'm probably above average on the curve, but I don't consider myself to be doing anything especially crazy. This year, between mainstream developers and more and more agents, I don't think there's really any limit to the number of tokens people will want to consume.

View on HN · Topics

The net effect won’t be a memory use reduction to achieve the same thing. We’ll do more with the same amount of memory. Companies will increase the context windows of their offerings and people will use it.

That is the sad reality of the future of memory.

View on HN · Topics

I am not convinced that more context will be useful: practical use of current models with 1M-token context windows shows they get less effective as the window grows. Given that model progress is slowing as well, perhaps we will reach a balance between context size and competency sooner than expected.

View on HN · Topics

Stuff in more code. Stuff in more system prompt. Stuff in raw UTF-8 characters instead of tokens to fix "strawberry". Stuff in WAY more reasoning steps.

Given the current tech, I also doubt there will be practical uses, and I hope we’ll see the opposite of what I wrote. But given the current industry, I fully trust them to somehow fill their hardware.

Market history shows us that when the cost of something goes down, we do more with the same amount, not the same thing with less. But I deeply hope to be wrong here and that the memory market will relax.
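The pattern described here is usually formalized as the rebound effect. A common textbook sketch (an illustrative assumption, not something stated in the thread) uses constant-elasticity demand: if efficiency e lowers the effective price of a service to p/e, demand for the service is S = A·(p/e)^(-ε), and total resource use is R = S/e = A·p^(-ε)·e^(ε-1). When ε > 1, efficiency gains increase total resource use (Jevons paradox); ε = 1 leaves it unchanged; ε < 1 yields real savings. The hypothetical `resource_use` function below just evaluates that formula, and the elasticity values are arbitrary, not estimates for memory or AI demand.

```python
def resource_use(efficiency: float, elasticity: float,
                 resource_price: float = 1.0, scale: float = 1.0) -> float:
    """Total resource consumed under a constant-elasticity demand curve (illustrative model).

    efficiency: units of service delivered per unit of resource (e.g. tokens per GB of memory)
    elasticity: price elasticity of demand for the service, as a positive number
    """
    service_price = resource_price / efficiency              # service gets cheaper as efficiency rises
    service_demand = scale * service_price ** (-elasticity)  # demand responds to the lower price
    return service_demand / efficiency                       # resource needed to serve that demand


for eps in (0.5, 1.0, 1.5):
    change = resource_use(2.0, eps) / resource_use(1.0, eps)
    print(f"elasticity {eps}: doubling efficiency multiplies resource use by {change:.2f}")
```

Doubling efficiency multiplies resource use by about 0.71, 1.00 and 1.41 for the three elasticities, so whether compression relieves the memory market comes down to how elastic demand for tokens and context really is.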

View on HN · Topics

Your skepticism is well placed. Every time a new quantization or compression technique drops, the immediate response is to just scale up context length or run a bigger model to fill whatever headroom was freed up. It's Jevons paradox applied to VRAM - efficiency gains get eaten by increased usage almost immediately.
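A back-of-the-envelope calculation makes the "fill the freed headroom" point concrete. It assumes a hypothetical 80 GiB accelerator, 40 GiB of weights and 192 KiB of KV cache per token, and borrows the 3.8x and 4.9x compression ratios quoted for TurboQuant earlier in the thread; `max_context_tokens` is an illustrative helper that ignores activations, allocator overhead and any metadata a real KV-cache quantizer stores.

```python
GIB = 1024 ** 3
KIB = 1024


def max_context_tokens(memory_bytes: int, weight_bytes: int,
                       kv_bytes_per_token: int, compression_ratio: float = 1.0) -> int:
    """Tokens of KV cache that fit in the memory left over after loading weights.

    Simplified model (assumption): ignores activations, allocator overhead,
    and per-block metadata that a real KV-cache quantizer would keep.
    """
    free = memory_bytes - weight_bytes
    if free <= 0:
        return 0
    compressed_per_token = kv_bytes_per_token / compression_ratio
    return int(free // compressed_per_token)


# Hypothetical setup: 80 GiB of memory, 40 GiB of weights, 192 KiB of KV per token.
for ratio in (1.0, 3.8, 4.9):
    tokens = max_context_tokens(80 * GIB, 40 * GIB, 192 * KIB, ratio)
    print(f"{ratio}x KV compression -> {tokens:,} tokens of context")
```

With these made-up numbers the uncompressed cache holds about 218k tokens, while 3.8x and 4.9x compression raise that to roughly 830k and 1.07M tokens. The same ratio could instead shrink the memory needed for the original 218k-token workload; the comment's point is that the freed memory tends to be spent on the longer context.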

View on HN · Topics

You can still use as much memory, but fit more things into it, so I don’t think the current market hogs will let go easily.

View on HN · Topics

That will only increase the demand for RAM, as models will now be usable in scenarios that weren't feasible before, and the ceiling for model and context size is not even visible at this point.

I hate to mention Jevons paradox, as it has become a cliché by now, but this is a textbook example of it.

View on HN · Topics

The demand is being driven by inference though. I really don't think there will be much motivation.

View on HN · Topics

Are we entering the Reverse-Moore's Law era?

View on HN · Topics

> something that's better than the current state of the art for half the price. And what's that going to do to the trillions in AI DC investment?

They'll just spend whatever they were planning to spend and get more performance.
