Retrieval-augmented generation done properly.
An experimental RAG system exploring how chunking strategy, embedding model choice, and retrieval architecture affect answer quality on long-form technical documents.
Most RAG tutorials produce systems that work on toy datasets but degrade on real documents. Needed to understand the actual failure modes at production-scale content volumes.
Systematically tested chunking strategies (fixed-size, semantic, hierarchical), embedding models, and hybrid search approaches. Built an evaluation harness to measure retrieval quality independently from generation quality.
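The source doesn't show the harness itself; a minimal sketch of retrieval-only evaluation, scoring recall@k against labelled gold chunks so retrieval quality is measured without running generation, might look like this (`recall_at_k`, `toy_retrieve`, and the corpus are illustrative, not from the project):

```python
from typing import Callable

def recall_at_k(
    retrieve: Callable[[str, int], list[str]],
    eval_set: list[tuple[str, set[str]]],
    k: int = 5,
) -> float:
    """Fraction of queries where at least one gold chunk appears in the top-k.

    `retrieve(query, k)` returns chunk IDs; `eval_set` pairs each query with
    the IDs of chunks known to contain the answer.
    """
    hits = 0
    for query, gold_ids in eval_set:
        if gold_ids & set(retrieve(query, k)):
            hits += 1
    return hits / len(eval_set)

# Hypothetical labelled corpus and a trivial word-overlap retriever,
# standing in for the real index.
corpus = {
    "c1": "reverse proxy configuration for nginx",
    "c2": "database connection pooling in postgres",
    "c3": "tls certificate renewal with certbot",
}

def toy_retrieve(query: str, k: int) -> list[str]:
    scores = {cid: len(set(query.split()) & set(text.split()))
              for cid, text in corpus.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

evals = [("nginx reverse proxy", {"c1"}), ("postgres pooling", {"c2"})]
print(recall_at_k(toy_retrieve, evals, k=1))  # → 1.0 on this toy set
```

Swapping `retrieve` lets the same harness compare chunking strategies and retrievers on identical eval sets, which is the point of decoupling retrieval from generation.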
Documented findings on which approaches hold up at scale. Hierarchical chunking + hybrid BM25/dense retrieval consistently outperformed simpler baselines. Published internal benchmark results.
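Hybrid BM25/dense retrieval can be sketched as two independent rankings fused with reciprocal rank fusion (RRF), a common fusion method, though the source doesn't say which one the project used. The snippet below implements a bare-bones BM25 and, as a stand-in for a real embedding model, a term-count cosine similarity; the query and documents are invented for illustration:

```python
import math
from collections import Counter

def bm25_scores(query: str, docs: list[str], k1: float = 1.5, b: float = 0.75) -> list[float]:
    """Plain BM25 over whitespace-tokenised docs (no stemming or stopwords)."""
    tokenised = [d.split() for d in docs]
    avgdl = sum(len(t) for t in tokenised) / len(tokenised)
    n = len(docs)
    df = Counter()                       # document frequency per term
    for toks in tokenised:
        df.update(set(toks))
    scores = []
    for toks in tokenised:
        tf = Counter(toks)
        s = 0.0
        for term in query.split():
            if term not in tf:
                continue
            idf = math.log((n - df[term] + 0.5) / (df[term] + 0.5) + 1)
            s += idf * tf[term] * (k1 + 1) / (
                tf[term] + k1 * (1 - b + b * len(toks) / avgdl))
        scores.append(s)
    return scores

def cosine(a: str, b: str) -> float:
    """Term-count cosine similarity; a placeholder for embedding similarity."""
    ca, cb = Counter(a.split()), Counter(b.split())
    num = sum(ca[t] * cb[t] for t in ca)
    den = math.sqrt(sum(v * v for v in ca.values())) * \
          math.sqrt(sum(v * v for v in cb.values()))
    return num / den if den else 0.0

def rrf_fuse(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Reciprocal rank fusion: score(d) = sum over rankings of 1/(k + rank)."""
    fused = Counter()
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            fused[doc_id] += 1.0 / (k + rank)
    return [doc_id for doc_id, _ in fused.most_common()]

query = "tls certificate renewal"
docs = [
    "automated tls certificate renewal with certbot",
    "nginx reverse proxy configuration",
    "certificate pinning and tls handshakes explained",
]
sparse = bm25_scores(query, docs)
dense = [cosine(query, d) for d in docs]
sparse_rank = sorted(range(len(docs)), key=lambda i: -sparse[i])
dense_rank = sorted(range(len(docs)), key=lambda i: -dense[i])
print(rrf_fuse([sparse_rank, dense_rank]))  # → [0, 2, 1]
```

RRF is attractive here because it needs no score normalisation: BM25 scores and cosine similarities live on different scales, and fusing by rank sidesteps that entirely.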