Moving Collectors vs Malloc

Technical debate about GC versus manual memory management, CPU vs RAM tradeoffs, efficiency of different allocation strategies in resource-constrained environments

The debate centers on the complex interplay between RAM footprint and CPU efficiency, questioning whether reducing memory usage actually improves performance or merely shifts the burden to the processor. Some contributors argue that moving garbage collectors are superior because they allow developers to trade abundant RAM for lower CPU overhead, theoretically achieving higher efficiency than manual "malloc/free" strategies by minimizing the frequency of memory management tasks. Conversely, critics emphasize that cache locality and memory throughput are the true bottlenecks, suggesting that smaller working sets and tighter data packing are more effective than simply inflating the heap to save cycles. Ultimately, the discussion highlights that software efficiency is a delicate balancing act; because a machine is effectively exhausted once either resource hits its limit, the goal is to find the optimal RAM-to-CPU ratio for a specific workload rather than minimizing one at the expense of the other.

View on HN · Topics

Using a lot less RAM often implies using more CPU, so even with inflated RAM prices, it's not a good tradeoff (at least not in general).

View on HN · Topics

In practice, you generally see the opposite. The "CPU" is in fact limited by memory throughput. (The exception is intense number crunching or similar compute-heavy code, where thermal and power limits come into play. But much of that code can be shifted to the GPU.)

View on HN · Topics

RAM throughput and RAM footprint are only weakly related. The throughput is governed by the cache locality of access patterns. A program with a 50MB footprint could put more pressure on the RAM bus than one with a 5GB footprint.

View on HN · Topics

You're absolutely right? I don't really disagree with anything you're saying there, that's why I said "generally" and "in practice".

View on HN · Topics

Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point. It could be effective in some specific situations, but I would definitely not say that those situations are more common than the other ones.

View on HN · Topics

I don't understand how this connects to your original claim, which was about trading ram usage for CPU cycles. Could you elaborate?

From what I understand, increasing cache locality is orthogonal to how much RAM an app is using. It just lets the CPU get cache hits more often, so it only relates to throughout.

That might technically offload work to the CPU, but that's work the CPU is actually good at. We want to offload that.

In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU

View on HN · Topics

> increasing cache locality is orthogonal to how much RAM an app is using. It just lets the CPU get cache hits more often, so it only relates to throughout.

Cache misses mean CPU stalls, which mean wasted CPU (i.e. the CPU accomplises less than it could have in some amount of time).

> In the case of Electron apps, they use a lot of RAM and that's not to spare the CPU

The question isn't why apps use a lot of RAM, but what the effects of reducing it are. Redcuing memory consumption by a little can be cheap, but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices.

To get a sense for why you use more CPU when you want to reduce your RAM consumption by a lot, using much less RAM while allowing the program to use the same data means that you're reusing the same memory more frequently, and that takes computational work.

But I agree that on consumer devices you tend to see software that uses a significant portion of RAM and a tiny portion of CPU and that's not a good balance, just as the opposite isn't. The reason is that CPU and RAM are related, and your machine is "spent" when one of them runs out. If a program consumes a lot of CPU, few other programs can run on the machine no matter how much free RAM it has, and if a program consumes a lot of RAM, few other programs can run no matter how much free CPU you have. So programs need to aim for some reasonable balance of the RAM and CPU they're using. Some are inefficient by using too little RAM (compared to the CPU they're using), and some are inefficient by using too little CPU (compared to the RAM they're using).

View on HN · Topics

> Cache misses mean CPU stalls, which mean wasted CPU (i.e. the CPU accomplises less than it could have in some amount of time).

Yeah, I was saying CPU cache hits would result in better performance. The creator of Zig has argued that the easiest way to improve cache locality is by having smaller working sets of memory to begin with. No, it's not a given this will always work in every case. You can reduce working memory and not have better cache locality. But in a general sense, I understand why he argues for it.

> So programs need to aim for some reasonable balance of the RAM and CPU they're using

I agree with this, but

> but if you want to do it by a lot, development and maintenance costs rise and/or CPU costs rise, and both are more expensive than RAM, even at inflated prices

I would like you to clarify further, because saying CPU costs are more expensive than RAM costs is a bit misleading. A CPU might literally cost more than RAM, but a CPU is remarkably faster, and for work done, much cheaper and more efficient, especially with cache hits.

You had originally said

> It could be effective in some specific situations, but I would definitely not say that those situations are more common than the other ones

This is what I'm confused on. Why do you think most cases wouldn't benefit from this? Almost every app I've used is way on one end of the spectrum with regards to memory consumption vs CPU cycles. Don't you think there are actually a lot of cases where we could reduce memory usage AND increase cache locality, fitting more data into cache lines, avoiding GC pressure, avoiding paging and allocations, and the software would 100% be faster?

View on HN · Topics

> But in a general sense, I understand why he argues for it.

Andrew is not wrong, but he's talking about optimisations with relatively little impact compared to others and is addressing people who already write software that's otherwise optimised. More concretely, keeping data packed tighter and reducing RAM footprint are not the same. The former does help CPU utilisation but doesn't make as big of an impact on the latter as things that are detrimental to the CPU (such as switching from moving collectors to malloc/free).

> Why do you think most cases wouldn't benefit from this?

The context to which "this" is referring to was "Reducing your RAM consumption is not the best approach to reducing your RAM throughput is my point." For data-packing, Andy Kelley style, to reduce the RAM bandwidth, the access patterns must be very regular, such as processing some large data structure in bulk (where prefetching helps). This is something you could see in batch applications (such as compilers), but not in most programs, which are interactive. If your data access patterns are random, packing it more tightly will not significantly reduce your RAM bandwidth.

View on HN · Topics

That's fine, but I was responding to a comment that said that RAM prices would put pressure to optimise footprint. Optimising footprint could often lead to wasting more CPU, even if your starting point was optimising for neither.

View on HN · Topics

My response was that I disagree with this conclusion that something like "pressure to optimize RAM implies another hardware tradeoff" is the primary thing which will give, not that I'm changing the premise.

Pressure to optimize can more often imply just setting aside work to make the program be nearer to being limited by algorithmic bounds rather than doing what was quickest to implement and not caring about any of it. Having the same amount of time, replacing bloated abstractions with something more lightweight overall usually nets more memory gains than trying to tune something heavy to use less RAM at the expense of more CPU.

View on HN · Topics

You're thinking an algorithmic tradeoff, but this is an abstraction tradeoff.

View on HN · Topics

Some of the algorithms are built deep into the runtime. E.g. languages that rely on malloc/free allocators (which require maintaining free lists) are making a pretty significnant tradoff of wasting CPU to save on RAM as opposed to languages using moving collectors.

View on HN · Topics

Free lists aren't expensive for most usage patterns. For cases where they are we've got stuff like arena allocators. Meanwhile GC is hardly cheap.

Of course memory safety has a quality all its own.

View on HN · Topics

> Free lists aren't expensive for most usage patterns.

Whatever little CPU they waste is often worth more than the RAM they save.

> For cases where they are we've got stuff like arena allocators.

... that work by using more RAM to save on CPU.

View on HN · Topics

GC burns far more CPU cycles. Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM. Most tasks stall on IO. Those that don't typically stall on either memory bandwidth or latency. Meanwhile CPU bound tasks typically don't perform allocations and if forced avoid the heap like the plague.

View on HN · Topics

> GC burns far more CPU cycles

Far less for moving collectors. That's why they're used: to reduce the overhead of malloc/free based memory management. The whole point of moving collectors is that they can make the CPU cost of memory management arbitrarily low , even lower than stack allocation. In practice it's more complicated, but the principle stands.

The reason some programs "avoid the heap like the plague" is because their memory management is CPU-inefficient (as in the case of malloc/free allocators).

> Meanwhile I'm not sure where you got this idea about the value of CPU cycles relative to RAM

There is a fundamental relationship between CPU and RAM. As we learn in basic complexity theory, the power of what can be computed depends on how much memory an algorithm can use. On the flip side, using memory and managing memory requires CPU.

To get the most basic intuition, let's look at an extreme example. Consider a machine with 1 GB of free RAM and two programs that compute the same thing and consume 100% CPU for their duration. One uses 80MB of RAM and runs for 100s; the other uses 800MB of RAM and runs for 99s (perhaps thanks to a moving collector). Which is more efficient? It may seem that we need to compare the value of 1% CPU reduction vs a 10x increase in RAM consumption, but that's not necessary. The second program is more efficient. Why? Because when a program consumes 100% of the CPU, no other program can make use of any RAM, and so both programs effectively capture all 1GB, only the second program captures it for one second less.

This scales even to cases when the CPU consumption is less than 100% CPU, as the important thing to realise is that the two resources are coupled. The thing that needs to be optimised isn't CPU and RAM separately, but the RAM/CPU ratio. A program can be less efficient by using too little RAM if using more RAM can reduce its CPU consumption to get the right ratio (e.g. by using a moving collector) and vice versa.

View on HN · Topics

Moving collectors as generally used are a huge waste of memory throughput, and this shows up consistently in the performance measurements. Moving data is very expensive! The whole point of ownership tracking in programming languages is so that large chunks of "owned" data can just stay put until freed, and only the owning handle (which is tiny) needs to move around. Most GC programming languages do a terrible job of supporting that pattern.

View on HN · Topics

That's just not true. To give you a few pieces of the picture, moving collectors move little memory and do so rarely (relative to the allocation rate):

In the young generation, few objects survive and so few are moved (the very few that survive longer are moved into the old gen); in the old generation, most objects survive, but the allocation rate is so low that moving them is rare (although the memory management technique in the old gen doesn't matter as much precisely because the allocation rate is so low, so whether you want a moving algorithm or not in the old gen is less about speed and more about other concerns).

On top of that, the general principle of moving collectors (and why in theory they're cheaper than stack allocation) is that the cost of the overall work of moving memory is roughly constant for a specific workload, but its frequency can be made as low as you want by using more RAM.

The reason moving collectors are used in the first place is to reduce the high overhead of malloc/free allocators.

Anyway, the general point I was making above is that a machine is exhausted not when both CPU and RAM are exhausted, but when one of them is. Efficient hardware utilisation is when the program strikes some good balance between them. There's not much point to reducing RAM footprint when CPU utilisation is high or reducing CPU consumption when RAM consumption is high. Using much of one and little of the other is wasteful when you can reduce the higher one by increasing the other. Moving collectors give you a convenient knob to do that: if a program consumes a lot of CPU and little RAM, you can increase the heap and turn some RAM into CPU and vice versa.

View on HN · Topics

hopefully not implying needing a gc for memory safety...

View on HN · Topics

Yeah, there's always Fil-C (Rust isn't memory safe in practice).

Summarizer