Summarizer

Incremental Indexing Performance

Discussion of hashing content for incremental re-embedding of changed chunks only, achieving 10-second updates versus 4-minute full reindexes

← Back to MCP server that reduces Claude Code context consumption by 98%

To optimize retrieval for mixed structured and natural language data, contributors advocate for a hybrid approach that combines keyword-based BM25 with vector search through Reciprocal Rank Fusion. This strategy is enhanced by incremental indexing via content hashing, which slashes update times from minutes to seconds and ensures prompt caching remains stable by generating deterministic outputs. However, critics remain skeptical of the project’s broader performance claims, arguing that optimizations like faster JS execution are irrelevant compared to LLM latency and demanding rigorous benchmarks to validate the reported context savings.

2 comments tagged with this topic

The FTS5 index approach here is right, but I'd push further: pure BM25 underperforms on tool outputs because they're a mix of structured data (JSON, tables, config) and natural language (comments, error messages, docstrings). Keyword matching falls apart on the structured half.

I built a hybrid retriever for a similar problem, compressing a 15,800-file Obsidian vault into a searchable index for Claude Code. The stack is Model2Vec (potion-base-8M, 256-dimensional embeddings) + sqlite-vec for vector search + FTS5 for BM25, combined via Reciprocal Rank Fusion. The database holds 49,746 chunks in 83MB.

RRF is the important piece: it merges ranked lists from both retrieval methods without needing score calibration, so you get BM25's exact-match precision on identifiers and function names plus vector search's semantic matching on descriptions and error context.

The incremental indexing matters too. If you're indexing tool outputs per-session, the corpus grows fast. My indexer has a --incremental flag that hashes content and only re-embeds changed chunks. A full reindex of 15,800 files takes ~4 minutes; an incremental pass on a typical day's changes is under 10 seconds.

On the caching question raised upthread: this approach actually helps prompt caching, because the compressed output is deterministic for the same query. The raw tool output would differ every time (timestamps, ordering), but the retrieved summary is stable if the underlying data hasn't changed.

One thing I'd add to Context Mode's architecture: the same retriever could run as a PostToolUse hook, compressing outputs before they enter the conversation. That way it's transparent to the agent: it never sees the raw dump, just the relevant subset.
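The two mechanisms the comment describes, RRF fusion over BM25 and vector result lists, and content-hash gating for incremental re-embedding, can be sketched roughly as follows. This is a minimal illustration, not the commenter's actual implementation: the function names, the k=60 constant (a common RRF default), and the in-memory hash store are all assumptions.

```python
import hashlib

def rrf_merge(ranked_lists, k=60):
    """Reciprocal Rank Fusion: merge ranked result lists without
    score calibration. Each list is an ordered sequence of doc IDs;
    a document scores 1/(k + rank + 1) per list it appears in."""
    scores = {}
    for results in ranked_lists:
        for rank, doc_id in enumerate(results):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

def needs_reembedding(chunk_text, stored_hashes, chunk_id):
    """Incremental indexing: re-embed a chunk only when its content
    hash differs from the hash recorded at last index time."""
    digest = hashlib.sha256(chunk_text.encode("utf-8")).hexdigest()
    if stored_hashes.get(chunk_id) == digest:
        return False  # unchanged: skip the embedding call
    stored_hashes[chunk_id] = digest  # record new hash, then re-embed
    return True

# Fuse a BM25 list and a vector-search list: documents appearing in
# both lists (here "parse_cfg" and "f_open") rise to the top.
bm25_hits = ["f_open", "parse_cfg", "err_log"]
vector_hits = ["parse_cfg", "timeout_doc", "f_open"]
fused = rrf_merge([bm25_hits, vector_hits])
```

The point of the 1/(k + rank) form is that only ranks matter, so BM25 scores and cosine similarities never need to be put on a common scale; documents that both retrievers rank highly dominate the fused list.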
On here: https://cc-context-mode.mksg.lu/#/3/0/3

> Bun auto-detected for 3–5x faster JS/TS execution

This is quite a claim, and even if true, it doesn't matter: the bottleneck is the LLM, not the JS interpreter. It's a nit, but little things like this make the project look bad overall. It feels like nobody took the time to read the copy before publishing it.

More importantly, the claimed 98% context savings are noise without benchmarks of harness performance with and without "context mode". I'm glad someone is working on this, but I just don't feel this is a serious solution to the problem.