Prasanth Janardhanan

Why Semantic Search Alone Fails on Legal Text (And How Hybrid Search Fixed It)

I was building a RAG pipeline to classify AI systems under the EU AI Act — feed in a plain-English description of your AI system, get back the risk tier, the relevant articles, and a compliance checklist with citations. Classification was working perfectly. All 8 test scenarios nailed the correct risk tier.

But the confidence scores were stuck at 36%.

The problem wasn’t the LLM. It wasn’t the prompts. It was the retrieval. And the fix taught me something worth sharing about when semantic search falls short — and when adding keyword search actually makes things worse.
