1 Comment
JP

Evaluation is the right question. Anthropic recently published how they evaluated RAG against plain grep for Claude Code's codebase understanding. Grep won. The model made better use of context it found itself than of pre-retrieved chunks. Makes you wonder how many RAG evaluations are measuring the wrong things entirely. Covered it here: https://reading.sh/anthropic-revealed-how-they-build-claude-codes-brain-11e48e75fd01?sk=6662727c70ed637cd1692a81f33139e2