Discussion about this post

JP

Evaluation is the right question. Anthropic recently published how they evaluated RAG against plain grep for Claude Code's codebase understanding. Grep won. The model understood context it found itself better than pre-retrieved chunks. Makes you wonder how many RAG evaluations are measuring the wrong things entirely. Covered it here: https://reading.sh/anthropic-revealed-how-they-build-claude-codes-brain-11e48e75fd01?sk=6662727c70ed637cd1692a81f33139e2
