Caveman Token Optimization

Discussion of tools like caveman for reducing token usage through terse language, skepticism about effectiveness, concerns about confusing models with unusual speech patterns

As LLM tokenizers change in ways that can inflate token counts, developers are turning to tools like "caveman" and "RTK" to aggressively compress language and command output and reduce overhead. While some proponents see this as a necessary defense against rising costs and a way to speed up agentic loops, skeptics warn that forcing models to "larp" as less articulate personas can severely degrade reasoning quality and lead to "lobotomized" performance. Despite the debate over whether these tools are practical or merely "coding voodoo," emerging research into compressed chain-of-thought suggests that stripping away linguistic filler could eventually offer a path to high-efficiency computing without sacrificing cognitive resolution.

View on HN · Topics

> Opus 4.7 uses an updated tokenizer that improves how the model processes text. The tradeoff is that the same input can map to more tokens—roughly 1.0–1.35× depending on the content type.

caveman[0] is becoming more relevant by the day. I already enjoy reading its output more than vanilla's, so it suits me well.

[0] https://github.com/JuliusBrussee/caveman/tree/main
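The quoted 1.0–1.35× range can be turned into concrete bounds on a prompt's size. A minimal Python sketch, assuming the ratio applies uniformly to a whole prompt (the release note says it depends on content type); `retokenized_range` and `ceil_div` are hypothetical helpers:

```python
# Hypothetical helpers: bound a prompt's token count after a tokenizer change,
# assuming the quoted 1.0x-1.35x range applies uniformly to the whole prompt.
LOW_PCT = 100   # 1.00x
HIGH_PCT = 135  # 1.35x


def ceil_div(a: int, b: int) -> int:
    """Integer ceiling division (avoids float rounding on ratios like 1.35)."""
    return -(-a // b)


def retokenized_range(old_tokens: int) -> tuple[int, int]:
    """Return (min, max) token counts for the same input under the new tokenizer."""
    return ceil_div(old_tokens * LOW_PCT, 100), ceil_div(old_tokens * HIGH_PCT, 100)


if __name__ == "__main__":
    low, high = retokenized_range(100_000)
    print(f"100,000 old tokens -> {low:,} to {high:,} new tokens")
```

At a flat per-token price the bill scales by the same factor, so under this assumption a prompt that used to be 100,000 tokens could cost up to 35% more.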

View on HN · Topics

I hope people realize that tools like caveman are mostly joke/prank projects. Almost all of the context is spent on file reads (input) and reasoning (output), so you will barely save even 1% with such a tool, and it might actually confuse the model more or make it reason for more tokens, because it has to formulate its response in a way that satisfies the requirements.

View on HN · Topics

Why? It doesn't have jokey copy. Any thoughts on claude-mem[0] + context-mode[1]?

[0] https://github.com/thedotmack/claude-mem

[1] https://github.com/mksglu/context-mode

View on HN · Topics

While the caveman stuff is obviously not serious, there is a lot of legit research in this area.

Which means yes, you can actually influence this quite a bit. Read the paper “Compressed Chain of Thought” for example, it shows it’s really easy to make significant reductions in reasoning tokens without affecting output quality.

There isn't much research into this yet (about five papers in total), but with those techniques it's possible to reduce output tokens by about 60%. Given that output is a very significant part of total costs, this matters.

https://arxiv.org/abs/2412.13171
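How much a cut in output tokens saves overall depends on how much of the bill is output. As a back-of-the-envelope formula (an illustration, not taken from the paper), let $I$ and $O$ be input and output tokens, $p_{\text{in}}$ and $p_{\text{out}}$ the per-token prices, and $r$ the fraction of output tokens removed. The relative saving on total cost is then:

```latex
S = \frac{r\,O\,p_{\text{out}}}{I\,p_{\text{in}} + O\,p_{\text{out}}}
  = r \cdot \frac{O\,p_{\text{out}}}{I\,p_{\text{in}} + O\,p_{\text{out}}}
```

For example, if output accounts for half of the spend, $r = 0.6$ gives $S = 0.3$; if output is only a sliver of the spend, the saving shrinks proportionally, which is one way to read the disagreement above about savings of barely 1%.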

View on HN · Topics

Some labs do it internally because RLVR is very token-expensive. But it degrades CoT readability even more than normal RL pressure does.

It isn't free either - by default, models learn to offload some of their internal computation into the "filler" tokens. So reducing raw token count always cuts into reasoning capacity somewhat. Getting closer to "compute optimal" while reducing token use isn't an easy task.

View on HN · Topics

Yeah, the readability suffers, but as long as the actual output (i.e., the non-CoT part) stays unaffected, it's reasonably fine.

I work on a few agentic open source tools and the interesting thing is that once I implemented these things, the overall feedback was a performance improvement rather than performance reduction, as the LLM would spend much less time on generating tokens.

I didn’t implement it fully, just a few basic things like “reduce prose while thinking, don’t repeat your thoughts” etc would already yield massive improvements.

View on HN · Topics

Yeah, you could easily imagine stenography-like inputs and outputs for rapid iteration loops. It's also true that on social media people already want faster-to-read snippets that drop grammar, so the desire for density is already there for human authors and readers.

View on HN · Topics

All LLMs also effectively work by "larping" a role. You steer it towards larping a caveman and, well... let's just say they weren't known for their high IQ.

View on HN · Topics

Fun fact: Neanderthals actually had larger brains than Homo sapiens! Modern humans are thought to have outcompeted them by working better together in larger groups, but in terms of actual individual intelligence, Neanderthals may have had us beat. Similarly, humans have been undergoing a process of self-domestication over the last couple of millennia that has resulted in physiological changes, including a smaller brain size. Again, our advantage over our wilder forebears is that we're better in larger social groups than they were and better at shared symbolic reasoning and synchronized activity, not necessarily that our brains are more capable.

(No, none of this changes that if you make an LLM larp a caveman it's gonna act stupid, you're right about that.)

View on HN · Topics

This is why ancient Chinese scholar mode (also extremely terse) is better.

View on HN · Topics

Exactly. The model is exquisitely sensitive to language. The idea that you would encourage it to think like a caveman to save a few tokens is hilarious but extremely counter-productive if you care about the quality of its reasoning.

View on HN · Topics

This specific form may be a joke, but token-conscious work is becoming more and more relevant. Look at:

https://github.com/AgusRdz/chop

https://github.com/toon-format/toon

View on HN · Topics

Also https://github.com/rtk-ai/rtk, though some people find that changing how commands format their output can confuse some models.

View on HN · Topics

I was 100% hesitant when I saw caveman gaining steam. Changing something like this absolutely changes the behaviour of the model's responses; simply including something casual like "lmao" in a reply will shift the tone entirely into a relaxed, "ya whatever" mode.

I think a lot of people share my criticism. I would also assume that the major LLM providers are the real winners of that repo getting popular, for the same reason you stated.

> you will barely save even 1% with such a tool

For the end user, this doesn't make a huge impact; in fact, it potentially hurts if it means you are getting less serious replies from the model. However, as with any minor change across a huge number of users, it adds up to significant savings for the providers.

I still think the best current method to save tokens is keeping the project easy for the model to navigate, so it can find what it needs without combing through a lot of files for no reason. That may cost some upfront tokens if you delegate keeping those navigation files up to date to the agent, but it pays dividends in future sessions: your context window stays smaller, and only the relevant portions of the project need to be loaded into it.

View on HN · Topics

We started out with oobabooga, so caveman is the next logical evolution on the road to AGI.

View on HN · Topics

They are indeed impractical in agentic coding.

However, in deep-research-like products you can run an LLM pass that compresses web page text into caveman speak, hugely reducing the token count.

View on HN · Topics

I don't understand how this would work without a huge loss in resolution or "cognitive" ability.

Prediction works based on the attention mechanism, and modern humans don't speak like cavemen, so how could you expect a useful token chain from a model that wasn't trained on speech like that?

I get the concept of transformers, but this isn't a 1:1 transform from English to French or whatever; you're fundamentally unable to represent certain concepts effectively in caveman speak... or am I missing something?

View on HN · Topics

Good catch actually.

Okay maybe not exactly caveman dialect, but text compression using LLM is definitely possible to save on tokens in deep research.

View on HN · Topics

I wonder if you can have it reason in caveman

View on HN · Topics

would you be surprised if this is what happens when you ask it to write like one?

folks could have just asked for _austere reasoning notes_ instead of "write like you suffer from arrested development"

View on HN · Topics

Caveman is fun, but the real tool you want to reduce token usage is headroom

https://github.com/gglucass/headroom-desktop (mac app)

https://github.com/chopratejas/headroom (cli)

View on HN · Topics

I tried to use rtk for the same thing, and my agent session would just loop the same tool call over and over again. Does headroom work better?

View on HN · Topics

Way better. You don’t notice it’s there.

View on HN · Topics

I was doing some experiments with removing top 100-1000 most common English words from my prompts. My hypothesis was that common words are effectively noise to agents. Based on the first few trials I attempted, there was no discernible difference in output. Would love to compare results with caveman.

Caveat: I didn't do enough testing to find the edge cases (e.g., negation).
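The experiment above, with a guard for the negation caveat, can be sketched in a few lines of Python. This is a hypothetical illustration: the tiny `COMMON_WORDS` set stands in for a real top-100 or top-1000 frequency list, and `NEVER_DROP` is an assumed list of meaning-critical words.

```python
import re

# Hypothetical stand-in for a "most common English words" list; a real run of
# the experiment would load a top-100 or top-1000 frequency list instead.
COMMON_WORDS = {
    "the", "a", "an", "of", "to", "and", "is", "in", "that", "it",
    "for", "on", "be", "with", "as", "this", "not", "no", "do",
}

# Negation guard: these are kept even though some appear in COMMON_WORDS.
NEVER_DROP = {"not", "no", "never", "nor", "without"}


def normalize(word: str) -> str:
    """Lowercase and strip surrounding punctuation for list lookups."""
    return word.lower().strip(".,!?;:\"'()")


def strip_common_words(prompt: str, keep_negations: bool = True) -> str:
    """Drop common words from a prompt, optionally keeping negations."""
    kept = []
    for word in re.findall(r"\S+", prompt):
        key = normalize(word)
        if key not in COMMON_WORDS or (keep_negations and key in NEVER_DROP):
            kept.append(word)
    return " ".join(kept)


prompt = "Do not delete the tests in this file."
print(strip_common_words(prompt))                        # not delete tests file.
print(strip_common_words(prompt, keep_negations=False))  # delete tests file.
```

Without the guard, the compressed instruction flips its meaning, which is exactly the kind of edge case a few trials can miss.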

View on HN · Topics

Yeah, when I'm writing code I try to avoid zeros and ones, since those are the most common bits, making them essentially noise

View on HN · Topics

I literally just posted a blog on this. Some seemingly insignificant words are actually highly structural to the model. https://www.ruairidh.dev/blog/compressing-prompts-with-an-au...

View on HN · Topics

I suspect even typos have an impact on how the model functions.

I wonder if there’s a pre-processor that runs to remove typos before processing. If not, that feels like a space that could be worked on more thoroughly.

View on HN · Topics

The ability for audio processing to figure out spelling from context, especially with regards to acronyms that are pronounced as words, leads me to believe there’s potential for a more intelligent spell check preprocess using a cheaper model.

View on HN · Topics

There is no pre-processor; I've had typos go through, with Claude asking to make sure I meant one thing and not the other.

View on HN · Topics

I strongly suspected that there was some pre/postprocessing going on when trying to get it to output rot13("uryyb, jbyeq"), but it's probably just due to massively biased token probabilities. Still, it creates some hilarious output, even when you clearly point out the error:

Hmm, but wait — the original you gave was jbyeq not jbeyq:
j→w, b→o, y→l, e→r, q→d = world
So the final answer is still hello, world. You're right that I was misreading the input. The result stands.
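Decoding the prompt deterministically shows what the model got wrong: in jbyeq the y comes before the e, so it decodes to wolrd, not world. A minimal Python sketch using `str.translate` (the helper name `rot13` is just for illustration), cross-checked against the standard library's `rot13` text codec:

```python
import codecs
import string

LOWER = string.ascii_lowercase
UPPER = string.ascii_uppercase
# Map each letter to the one 13 places later, wrapping around the alphabet.
ROT13_TABLE = str.maketrans(
    LOWER + UPPER,
    LOWER[13:] + LOWER[:13] + UPPER[13:] + UPPER[:13],
)


def rot13(text: str) -> str:
    """Apply ROT13 to ASCII letters; other characters pass through unchanged."""
    return text.translate(ROT13_TABLE)


print(rot13("uryyb, jbyeq"))  # hello, wolrd  (the input the model was given)
print(rot13("uryyb, jbeyq"))  # hello, world  (what the model "read" instead)
# Cross-check against the standard library's built-in rot13 codec.
print(rot13("uryyb, jbyeq") == codecs.encode("uryyb, jbyeq", "rot13"))  # True
```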

View on HN · Topics

Oh wow, I love this idea even if it's relatively insignificant in savings.

I am finding that my prompt-writing style is naturally getting lazier, shorter, and more caveman-like, just like this. To be honest, it has made writing emails harder.

While messing around, I built a proof of concept of this for HTML to save tokens. It worked surprisingly well, but it was only an experiment. Something like:

> <h1 class="bg-red-500 text-green-300"><span>Hello</span></h1>

AI compressed to:

> h1 c bgrd5 tg3 sp hello sp h1

Or something like that.
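The exact scheme isn't given ("or something like that"), so here is a hypothetical sketch of the idea using Python's standard `html.parser`. The `ABBREV` table is invented to reproduce the example above, and the output is lossy (start and end tags collapse to the same token, text is lowercased), which may be acceptable as prompt context but not for round-tripping back to HTML.

```python
from html.parser import HTMLParser

# Hypothetical abbreviation table, chosen to reproduce the example above.
ABBREV = {
    "class": "c",
    "span": "sp",
    "bg-red-500": "bgrd5",
    "text-green-300": "tg3",
}


class TerseHTML(HTMLParser):
    """Flatten HTML into space-separated terse tokens (lossy; for prompts only)."""

    def __init__(self) -> None:
        super().__init__()
        self.tokens: list[str] = []

    def _emit(self, word: str) -> None:
        self.tokens.append(ABBREV.get(word, word))

    def handle_starttag(self, tag, attrs):
        self._emit(tag)
        for name, value in attrs:
            self._emit(name)
            # Attribute values such as class lists are split on whitespace.
            for part in (value or "").split():
                self._emit(part)

    def handle_endtag(self, tag):
        self._emit(tag)

    def handle_data(self, data):
        text = data.strip()
        if text:
            # Lowercased to mirror the example above; this loses casing.
            self.tokens.append(text.lower())


def compress_html(html: str) -> str:
    parser = TerseHTML()
    parser.feed(html)
    parser.close()
    return " ".join(parser.tokens)


print(compress_html('<h1 class="bg-red-500 text-green-300"><span>Hello</span></h1>'))
# h1 c bgrd5 tg3 sp hello sp h1
```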

View on HN · Topics

Combine that with emmet / zen coding: https://en.wikipedia.org/wiki/Emmet_%28software%29?wprov=sfl...

View on HN · Topics

To reduce token count on command outputs you can also use RTK [0]

[0]: https://github.com/rtk-ai/rtk

View on HN · Topics

Caveman hurts model performance. If you need a dumber model with less token output, just use sonnet-4-6 or another non-reasoning model.

View on HN · Topics

I find grep and common CLI command spam to be the primary issue. I enjoy Rust Token Killer https://github.com/rtk-ai/rtk, and agents know how to get around it when it truncates too hard.
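The general idea behind command-output reducers can be sketched as head-and-tail truncation of noisy output such as grep results. This is a hypothetical Python sketch, not how RTK actually works; `truncate_output` and its default limits are illustrative assumptions.

```python
def truncate_output(text: str, max_lines: int = 40, keep_tail: int = 10) -> str:
    """Keep the head and tail of long command output, noting how much was cut.

    A hypothetical sketch of the general idea, not RTK's actual algorithm.
    """
    if not 0 <= keep_tail < max_lines:
        raise ValueError("keep_tail must be in [0, max_lines)")
    lines = text.splitlines()
    if len(lines) <= max_lines:
        return text
    head = lines[: max_lines - keep_tail]
    tail = lines[len(lines) - keep_tail:]
    omitted = len(lines) - len(head) - len(tail)
    # An explicit marker tells the agent that output was cut, so it can
    # rerun a narrower command if it needs the missing lines.
    marker = f"[... {omitted} lines omitted; narrow the command to see them ...]"
    return "\n".join(head + [marker] + tail)


noisy = "\n".join(f"src/file_{i}.py:12: match" for i in range(200))
print(truncate_output(noisy).splitlines()[30])
# [... 160 lines omitted; narrow the command to see them ...]
```

Truncating too aggressively hides lines the agent actually needs, so the limits trade token savings against extra follow-up calls.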

View on HN · Topics

I really enjoy the party game "Neanderthal Poetry", in which you can only speak using monosyllabic words. I bet you would too.

View on HN · Topics

Caveman stops being a style tool and starts being self-defense once prompts come in up to 1.35x fatter. They've basically moved visibility and control entirely into their black box.

View on HN · Topics

me feel that it needs some tweaking - it's a little annoyingly cute (and could be even terser).

View on HN · Topics

Another supply chain attack waiting to happen?

Have you tried just adding an instruction to be terse?

Don't get me wrong, I've tried out caveman as well, but these days I wonder whether something this popular will be hijacked.

View on HN · Topics

People are really trigger-happy when it comes to throwing magic tools on top of AI that claim to "fix" the weak parts (often placeboing themselves because anthropic just fixed some issue on their end).

Then, the next month, 90% of this gets replaced with a new batch of supply-chain-attack-friendly gimmicks.

Reddit especially seems to be full of such coding voodoo.

View on HN · Topics

What a joke Opus 4.7 at max is.

I gave it an agentic software project to critically review.

It claimed gemini-3.1-pro-preview was the wrong model name and that the current one is 2.5. I said that was an unverified claim.

It offered to create a memory. I said it should have a better procedure to avoid poisoning the process with unverified claims, since it would most likely ignore the memories anyway.

It agreed. It said it doesn't have another procedure, and it then discovered three more poisonous items in the critical review.

I said that this is a fabrication defect, it should not have been in production at all as a model.

It agreed and said it could help, but I would need to verify its work. I said it was sticking me with both the bill and the audit.

We amicably parted ways.

I would have accepted a caveman-style vocabulary but not a lobotomized model.

I'm looking forward to LobotoClaw. Not really.
