2026-04-20 · 4 min read · #RAG #LLMs

Why RAG Isn't a Silver Bullet

RAG, retrieval-augmented generation, became the answer to "how do I stop my LLM from hallucinating?" in late 2023. Two years later, I've watched enough RAG systems flop in production to think the framing was wrong. Retrieval is a feature, not a fix. Three failure modes survive it.

Failure 1: Right document, wrong chunk

You retrieve the correct PDF. Inside it, the answer is on page 47. Your chunker split that paragraph in half. The LLM gets the first half, confidently fills in the second, and the wrong answer reads as well-cited because there really is a citation in the snippet.

The fix is structural. Chunk on semantic boundaries (sections, lists, tables) rather than fixed token windows. Test your retrieval on adversarial pages where the answer straddles a chunk break.
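One way to picture boundary-aware chunking is the minimal sketch below. It assumes paragraphs are separated by blank lines and uses word count as a rough stand-in for tokens; the function name `chunk_by_paragraph` and the `max_words` budget are illustrative, not taken from any particular library. A real pipeline would also need to respect headings, lists, and tables.

```python
import re


def chunk_by_paragraph(text: str, max_words: int = 200) -> list[str]:
    """Pack whole paragraphs into chunks; never split inside a paragraph."""
    # Assumption: paragraphs are separated by one or more blank lines.
    paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]

    chunks: list[str] = []
    current: list[str] = []
    current_words = 0

    for para in paragraphs:
        n_words = len(para.split())
        # Close the current chunk if this paragraph would overflow the budget.
        if current and current_words + n_words > max_words:
            chunks.append("\n\n".join(current))
            current, current_words = [], 0
        # A paragraph longer than max_words still stays whole, as its own chunk.
        current.append(para)
        current_words += n_words

    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

The trade-off is uneven chunk sizes: a long paragraph yields an oversized chunk rather than a split one, which is the point when the answer lives inside that paragraph.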

Failure 2: Stale, perfectly indexed

Your retriever returns the policy document from 2022 that has since been superseded. Both documents exist in the index. The newer one ranks lower because it is shorter and shares fewer terms with the user's query.

RAG systems treat the corpus as static. The world isn't. Build in a freshness signal: store the index date as metadata, decay the scores of older results, and surface the version conflict to the user when two documents disagree.
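A freshness signal can be as simple as multiplying the retriever's score by an exponential decay on document age. The sketch below is one assumed design, not a standard: the `Hit` type, the `indexed_on` field, and the one-year half-life are all hypothetical and would need tuning against your own corpus.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class Hit:
    doc_id: str
    score: float      # retriever relevance score; assumed higher is better
    indexed_on: date  # stored as metadata at index time


def rerank_by_freshness(
    hits: list[Hit], today: date, half_life_days: float = 365.0
) -> list[Hit]:
    """Re-rank hits so a document's score halves every `half_life_days`."""

    def adjusted(hit: Hit) -> float:
        age_days = max((today - hit.indexed_on).days, 0)
        return hit.score * 0.5 ** (age_days / half_life_days)

    return sorted(hits, key=adjusted, reverse=True)
```

With a one-year half-life, a three-year-old document keeps roughly an eighth of its score, so a newer document with a weaker lexical match can still outrank it. Decay alone does not tell the user that two versions disagree; that conflict still needs to be surfaced separately.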

Failure 3: The user's actual question is unanswerable

"What will Q3 look like?" cannot be answered by retrieval. There is no document with the answer. Yet the LLM, given a few related quarterly reports, will gladly synthesize a plausible projection.

This is the most dangerous failure because the system feels like it's grounded. The fix is to detect unretrievable questions before generation and decline them explicitly. Even a short check ("is the user asking about the future?") catches many of these.
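As a rough illustration of such a pre-generation check, here is a keyword heuristic. The pattern list and the function name `looks_unretrievable` are assumptions made for this sketch. It will misfire on phrases like "last will and testament", so in practice it is a first filter in front of a proper classifier rather than a replacement for one.

```python
import re

# Hypothetical cue list for forward-looking questions; extend for your domain.
FUTURE_CUES = re.compile(
    r"\b(will|going to|forecast|predict|projection|next (?:quarter|year|month))\b",
    re.IGNORECASE,
)


def looks_unretrievable(question: str) -> bool:
    """Return True if the question appears to ask about the future."""
    return bool(FUTURE_CUES.search(question))


def answer_or_decline(question: str) -> str:
    if looks_unretrievable(question):
        return "I can't answer that from the documents I have."
    return "RETRIEVE"  # placeholder: hand off to the normal retrieval path
```

Run against the example above, `looks_unretrievable("What will Q3 look like?")` is true, while a backward-looking question such as "What was Q2 revenue?" passes through to retrieval.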

What RAG is actually good for

Retrieval shines when:

The answer exists, verbatim or near-verbatim, in a known document.
The user values being able to click through to the source.
The corpus is internal and the model wasn't trained on it.

It struggles when:

The answer requires synthesis across many documents.
The corpus has version drift you don't track.
The question is fundamentally about novel reasoning, not lookup.

The grown-up version

In production, "RAG" usually evolves into something more honest: a retrieval layer, a query-classifier layer that decides whether retrieval is even appropriate, a synthesizer that knows which retrieved chunks it actually used, and a guardrail that strips claims with no source. Each piece is more boring than "we use RAG." All together, they're the thing that actually works.
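The guardrail piece is the easiest to sketch. The version below assumes the synthesizer emits each claim together with the IDs of the chunks it relied on; the `Claim` type and `strip_unsourced` function are hypothetical names for illustration. It only checks that a cited chunk was actually retrieved, not that the chunk supports the claim, which is a harder problem.

```python
from dataclasses import dataclass, field


@dataclass
class Claim:
    text: str
    # IDs of the retrieved chunks the synthesizer says it used for this claim.
    source_ids: list[str] = field(default_factory=list)


def strip_unsourced(claims: list[Claim], retrieved_ids: set[str]) -> list[Claim]:
    """Keep only claims that cite at least one chunk that was really retrieved."""
    return [
        claim
        for claim in claims
        if any(source_id in retrieved_ids for source_id in claim.source_ids)
    ]
```

A claim with an empty source list, or one citing a chunk ID that never came back from the retriever, is dropped before the answer reaches the user.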
