The following is content for you to summarize. Do not respond to the comments; summarize them.

<topic> Incremental Indexing Performance # Discussion of hashing content for incremental re-embedding of changed chunks only, achieving 10-second updates versus 4-minute full reindexes </topic>

<comments_about_topic>

1. The FTS5 index approach here is right, but I'd push further: pure BM25 underperforms on tool outputs because they're a mix of structured data (JSON, tables, config) and natural language (comments, error messages, docstrings). Keyword matching falls apart on the structured half.

I built a hybrid retriever for a similar problem, compressing a 15,800-file Obsidian vault into a searchable index for Claude Code. The stack is Model2Vec (potion-base-8M, 256-dimensional embeddings) + sqlite-vec for vector search + FTS5 for BM25, combined via Reciprocal Rank Fusion. The database is 49,746 chunks in 83MB.

RRF is the important piece: it merges ranked lists from both retrieval methods without needing score calibration, so you get BM25's exact-match precision on identifiers and function names plus vector search's semantic matching on descriptions and error context.

The incremental indexing matters too. If you're indexing tool outputs per-session, the corpus grows fast. My indexer has a --incremental flag that hashes content and only re-embeds changed chunks. A full reindex of 15,800 files takes ~4 minutes; an incremental run on a typical day's changes is under 10 seconds.

On the caching question raised upthread: this approach actually helps prompt caching because the compressed output is deterministic for the same query. The raw tool output would be different every time (timestamps, ordering), but the retrieved summary is stable if the underlying data hasn't changed.

One thing I'd add to Context Mode's architecture: the same retriever could run as a PostToolUse hook, compressing outputs before they enter the conversation.
That way it's transparent to the agent: it never sees the raw dump, just the relevant subset.

2. On here: https://cc-context-mode.mksg.lu/#/3/0/3

> Bun auto-detected for 3–5x faster JS/TS execution

This is quite a claim, and even if true, it doesn't matter, since the bottleneck is the LLM and not the JS interpreter. It's a nit, but little things like this make the project look bad overall. It feels like nobody took the time to read the copy before publishing it. More importantly, the claimed 98% context savings are noise without benchmarks of harness performance with and without "context mode". I'm glad someone is working on this, but I just feel this is not a serious solution to the problem.

</comments_about_topic>

Write a concise, engaging paragraph (3-5 sentences) summarizing the key points and perspectives in these comments about the topic. Focus on the most interesting viewpoints. Do not use bullet points; write flowing prose.
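The Reciprocal Rank Fusion step described in comment 1 can be sketched as follows. This is a minimal illustration of the standard RRF formula, not the project's actual code; the function name, the `k=60` constant (a common default from the RRF literature), and the chunk ids are all illustrative.

```python
def rrf_merge(ranked_lists, k=60):
    """Fuse ranked lists of doc ids via RRF: score(d) = sum over lists of 1/(k + rank).

    No score calibration is needed because only ranks are used, which is why
    BM25 and vector-search results can be combined directly.
    """
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

bm25_hits = ["chunk_a", "chunk_b", "chunk_c"]    # exact matches on identifiers
vector_hits = ["chunk_b", "chunk_d", "chunk_a"]  # semantic matches on descriptions
fused = rrf_merge([bm25_hits, vector_hits])
# chunk_b ranks first: it appears near the top of both lists.
```

Because RRF only consumes ranks, a chunk found by just one retriever still makes the fused list, but a chunk ranked moderately by both outranks it, which matches the comment's point about getting exact-match precision and semantic recall at once.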
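The content-hashing scheme behind the incremental reindex can be sketched roughly like this. It is an assumption-laden illustration, not the indexer's actual API: the dict shapes, function name, and chunk ids are hypothetical, and SHA-256 stands in for whatever hash the real tool uses.

```python
import hashlib

def chunks_to_reembed(chunks, stored_hashes):
    """Return (changed_chunk_ids, new_hashes).

    Hash every chunk's content; a chunk is re-embedded only if its hash is
    missing from, or differs from, the stored hash table. Unchanged chunks
    skip the (expensive) embedding step entirely.
    """
    new_hashes = {}
    changed = []
    for chunk_id, text in chunks.items():
        h = hashlib.sha256(text.encode("utf-8")).hexdigest()
        new_hashes[chunk_id] = h
        if stored_hashes.get(chunk_id) != h:
            changed.append(chunk_id)  # new or modified content
    return changed, new_hashes

# Hypothetical state: "f1" was indexed before and is unchanged; "f2" is new.
prev = {"f1": hashlib.sha256(b"old body").hexdigest()}
docs = {"f1": "old body", "f2": "new file"}
changed, new_hashes = chunks_to_reembed(docs, prev)  # only "f2" needs embedding
```

This is why the gap between a ~4-minute full reindex and a sub-10-second incremental pass is so large: on a typical day only a small fraction of 15,800 files changes, and hashing is orders of magnitude cheaper than embedding.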