Building a RAG System: From Zero to Production in 2 Weeks

Apr 8, 2025 · 12 min read · By Onesnzeros Team

A step-by-step walkthrough of how we built a production-grade Retrieval-Augmented Generation system for a client.

Our client — a legal services firm in Pune — had 15 years of contract templates, case notes, and compliance documents locked inside PDFs and Word files. Associates spent 2–3 hours daily just searching for the right precedent. We built them a RAG (Retrieval-Augmented Generation) system in two weeks. Here's exactly how we did it.

What is RAG and Why Does It Matter?

Large Language Models like GPT-4 and Claude are trained on general knowledge up to a cutoff date. They don't know about your internal documents, your specific products, or your proprietary processes. RAG solves this by retrieving relevant chunks of your own documents at query time and feeding them to the LLM as context. The result: an AI that answers questions about your business accurately — not generically.
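To make "feeding them to the LLM as context" concrete, here is a minimal sketch of how retrieved chunks might be assembled into a prompt. The function name `build_prompt`, the instruction wording, and the sample chunks are our illustrative assumptions, not the client system's actual prompt.

```python
def build_prompt(question: str, chunks: list[str]) -> str:
    """Assemble a prompt that asks the model to answer only from the supplied context."""
    # Number each chunk so the model (and the reader) can refer back to it.
    context = "\n\n".join(f"[{i}] {chunk}" for i, chunk in enumerate(chunks, start=1))
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say you don't know.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )


# Hypothetical chunks standing in for whatever the retrieval layer returned.
prompt = build_prompt(
    "What is the notice period?",
    [
        "Either party may terminate with 30 days' written notice.",
        "Fees are payable quarterly.",
    ],
)
print(prompt)
```

The resulting string is what gets sent to the LLM in place of the bare question, which is the whole trick: the model reads your documents at query time rather than relying on what it memorised in training.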

Our Architecture

  • Document ingestion pipeline: PDF, DOCX, and text files → chunked text
  • Embedding model: OpenAI text-embedding-3-small for vector representations
  • Vector store: Supabase pgvector for storage and similarity search
  • Retrieval layer: top-5 most relevant chunks per query
  • LLM: Claude Sonnet for generation with retrieved context
  • Frontend: Next.js with streaming responses for real-time output
  • Auth: Supabase Auth with row-level security on document access
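The retrieval layer above boils down to "score every chunk against the query vector, keep the best five". The sketch below shows that idea in plain Python with cosine similarity. It is a stand-in for what the vector store does inside the database, assuming toy 3-dimensional vectors in place of real embeddings; the names `cosine_similarity` and `top_k` are hypothetical.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], indexed_chunks: list[tuple[str, list[float]]], k: int = 5):
    """Return the k (score, text) pairs most similar to the query, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in indexed_chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Toy vectors standing in for real embeddings.
index = [
    ("termination clause", [0.9, 0.1, 0.0]),
    ("payment terms", [0.1, 0.9, 0.0]),
    ("governing law", [0.0, 0.2, 0.9]),
]
results = top_k([1.0, 0.0, 0.0], index, k=2)
for score, text in results:
    print(f"{score:.3f}  {text}")
```

In production this brute-force loop is replaced by an indexed similarity query in the vector store, but the ranking logic is the same.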

Week 1: Building the Foundation

  1. Set up the document ingestion pipeline: handle PDF, DOCX, and scanned documents (OCR via Tesseract)
  2. Implement the chunking strategy: 512 tokens with a 64-token overlap to preserve context across boundaries
  3. Build the embedding pipeline: batch-process all documents through OpenAI's embedding API
  4. Set up Supabase with the pgvector extension and create vector similarity search functions
  5. Build the retrieval API: given a query, return the top-k most relevant document chunks
  6. Create a basic chat interface and connect it end-to-end with streaming
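The chunking step is the easiest one to get subtly wrong, so here is a minimal sketch of a sliding window with the 512/64 settings mentioned above. It assumes the text has already been tokenised into a list; the placeholder tokens below are synthetic, and real token counts depend on the tokenizer you use. The function name `chunk_tokens` is hypothetical.

```python
def chunk_tokens(tokens: list[str], size: int = 512, overlap: int = 64) -> list[list[str]]:
    """Split tokens into windows of `size`, each sharing `overlap` tokens with the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be non-negative and smaller than size")
    step = size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + size])
        if start + size >= len(tokens):
            break  # this window already reached the end of the document
    return chunks


# Synthetic tokens standing in for a tokenised document.
tokens = [f"t{i}" for i in range(1000)]
chunks = chunk_tokens(tokens)
print([len(c) for c in chunks])
```

Each chunk starts 448 tokens after the previous one, so a sentence that straddles a boundary appears whole in at least one chunk as long as it is shorter than the overlap.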

Week 2: Production Hardening

  1. Implement hybrid search: combine vector similarity with keyword (BM25) search for better recall
  2. Add a re-ranking layer to sort retrieved chunks by true relevance before sending them to the LLM
  3. Build document source citations: every AI answer shows which document it came from
  4. Implement access control: staff can search only the documents they're authorised to see
  5. Add query caching for frequently asked questions to reduce API costs
  6. Set up monitoring: track query latency, retrieval quality, and LLM token usage
  7. Load test with 50 concurrent users and optimise database indexes
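Hybrid search needs some way to merge the vector ranking and the keyword ranking into one list. One widely used option is reciprocal rank fusion (RRF), sketched below. We are naming RRF as an example technique, not as the specific method used in this project; the document IDs are made up, and `k = 60` is a commonly used default rather than a tuned value.

```python
def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Merge several best-first rankings; each appearance adds 1 / (k + rank) to a document's score."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


# Hypothetical results from the two retrievers, best match first.
vector_hits = ["doc_a", "doc_b", "doc_c"]
keyword_hits = ["doc_b", "doc_d", "doc_a"]
fused = reciprocal_rank_fusion([vector_hits, keyword_hits])
print(fused)
```

Because RRF uses only rank positions, it sidesteps the problem that cosine similarities and BM25 scores live on different scales. Documents that both retrievers agree on rise to the top.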

Key Challenges We Faced

  • Scanned PDFs with poor quality — solved by preprocessing with image enhancement before OCR
  • Chunking strategy: chunks that are too small lose context, while chunks that are too large eat into the model's context window and dilute relevance
  • Hallucination when retrieved context was insufficient — solved with confidence thresholds and graceful 'I don't know' responses
  • Latency — vector search + LLM inference adds up; streaming responses masked the wait time significantly
  • Embedding costs — batch processing during ingestion is cheap, but re-embedding updated documents needs careful management
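The "confidence threshold" idea for avoiding hallucination can be sketched as a simple gate in front of the LLM call. Everything specific here is an assumption for illustration: the 0.75 cutoff, the fallback wording, and the `generate` callable standing in for the real model call.

```python
from typing import Callable

FALLBACK = "I don't know. I couldn't find this in the documents I can access."


def select_context(scored_chunks: list[tuple[float, str]], min_score: float = 0.75) -> list[str]:
    """Keep only chunks whose similarity score clears the (hypothetical) threshold."""
    return [text for score, text in scored_chunks if score >= min_score]


def answer(
    question: str,
    scored_chunks: list[tuple[float, str]],
    generate: Callable[[str, list[str]], str],
) -> str:
    """Call the model only when retrieval found something relevant enough."""
    context = select_context(scored_chunks)
    if not context:
        return FALLBACK  # refuse gracefully instead of letting the model guess
    return generate(question, context)


def stub_generate(question: str, context: list[str]) -> str:
    """Stand-in for the real LLM call."""
    return f"answered from {len(context)} chunk(s)"


print(answer("What is the notice period?", [(0.91, "30 days' written notice")], stub_generate))
print(answer("Who won the match?", [(0.32, "Fees are payable quarterly.")], stub_generate))
```

The right threshold depends on the embedding model and the corpus, so it is worth calibrating against a set of questions you know the documents can and cannot answer.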

The firm's associates now find the right document in under 30 seconds on average, where searching used to take up 2–3 hours of each associate's day. That's roughly 500 hours of associate time saved per month across the team.

Should You Build or Buy?

Off-the-shelf RAG products (Notion AI, SharePoint Copilot) work well for generic use cases. But if you have specific access control requirements, proprietary document formats, or need to integrate with your existing systems, building custom gives you the control you need. The two-week timeline is achievable with the right team — the components are well-understood, the tooling has matured, and the hard problems have known solutions.
