Building a Production RAG System: Part 1 - Understanding RAG and System Architecture

Series: Building a Production-Ready Textbook Q&A System with RAG
Part: 1 of 7 | Read Time: 15 minutes | Level: Beginner to Intermediate

The Problem: When AI Gets It Wrong

Imagine you're studying for an exam and you ask ChatGPT: "What's the capital of France in 2024?"

ChatGPT confidently responds: "Paris."

Great! Now ask: "According to my textbook on page 45, what does it say about closures in JavaScript?"

ChatGPT responds: "I don't have access to your specific textbook, but generally, closures are..."

This is the fundamental limitation of Large Language Models (LLMs): they can only draw on what they were trained on, so they cannot see or cite your own documents.

For a student studying from textbooks, this is a dealbreaker.

The Solution: RAG (Retrieval-Augmented Generation)

RAG combines the best of two worlds:

  1. Retrieval: Search your documents for relevant information
  2. Generation: Use an LLM to generate answers based only on what was found

Think of it like this:

How RAG Works (Simplified)

User Question: "What is a closure in JavaScript?"
       ↓
1. Convert question to vector embedding
       ↓
2. Search textbook for similar content (vector similarity)
       ↓
3. Retrieve top 5 most relevant chunks
       ↓
4. Send chunks + question to LLM
       ↓
5. LLM generates answer with citations
       ↓
Answer: "A closure is a function that has access to variables
         in its outer scope... (See page 45, paragraph 3)"
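
Step 4 of the flow above ("Send chunks + question to LLM") is mostly string assembly. The following is a minimal sketch of that step, not the series' actual implementation: the `RetrievedChunk` shape, the `buildPrompt` name, and the instruction wording are all assumptions made for illustration.

```typescript
// Hypothetical shape of a retrieved chunk; the field names are assumptions.
interface RetrievedChunk {
  content: string;
  pageNumber: number;
  similarity: number;
}

// Combine the retrieved chunks and the user's question into a single prompt.
// Each excerpt is numbered and tagged with its page so the model can cite it.
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const excerpts = chunks
    .map((c, i) => `[${i + 1}] (page ${c.pageNumber}) ${c.content}`)
    .join("\n");

  return [
    "Answer the question using only the excerpts below.",
    "Cite the page number for every claim.",
    "If the excerpts do not contain the answer, say so.",
    "",
    "Excerpts:",
    excerpts,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const prompt = buildPrompt("What is a closure in JavaScript?", [
  {
    content: "A closure is a function that has access to variables in its outer scope.",
    pageNumber: 45,
    similarity: 0.82,
  },
]);
console.log(prompt);
```

Keeping the page number next to each excerpt is what lets the answer include a citation like the one shown in the flow.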
            

What We're Building

By the end of this series, you'll have built a production-ready SaaS application that:

✅ Core Features

✅ Production Features

✅ Technical Highlights

The Tech Stack (And Why)

Let me explain our technology choices:

Frontend: Next.js 14 + React 19 + TailwindCSS

Why?

Database: Supabase (PostgreSQL + pgvector)

Why?

AI/ML Stack

Why this combination?

1. OpenAI text-embedding-ada-002 (Embeddings)

2. Anthropic Claude 3.5 Sonnet (Answer Generation)

Payments: Stripe

Why?

System Architecture Overview

Here's the complete architecture of what we'll build:

┌─────────────────────────────────────────────────────────────┐
│                          USER                               │
│                    (Web Browser)                            │
└─────────────────────────────────────────────────────────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│                   NEXT.JS APPLICATION                        │
│  ┌──────────┐  ┌──────────┐  ┌──────────┐  ┌──────────┐   │
│  │ Landing  │  │Dashboard │  │ Pricing  │  │   Chat   │   │
│  │   Page   │  │   Page   │  │   Page   │  │   Page   │   │
│  └──────────┘  └──────────┘  └──────────┘  └──────────┘   │
└─────────────────────────────────────────────────────────────┘
                            │
                ┌───────────┼───────────┐
                ▼           ▼           ▼
        ┌──────────┐  ┌──────────┐  ┌──────────┐
        │  OpenAI  │  │Anthropic │  │  Stripe  │
        │Embeddings│  │  Claude  │  │ Payments │
        └──────────┘  └──────────┘  └──────────┘
                            │
                            ▼
┌─────────────────────────────────────────────────────────────┐
│              SUPABASE (PostgreSQL + Auth)                   │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ TABLES:                                              │  │
│  │  • profiles (users, subscription tiers)             │  │
│  │  • documents (textbook metadata)                    │  │
│  │  • document_chunks (text + embeddings)              │  │
│  │  • document_images (image descriptions + embeddings)│  │
│  │  • usage_logs (rate limiting & analytics)           │  │
│  └──────────────────────────────────────────────────────┘  │
│                                                              │
│  ┌──────────────────────────────────────────────────────┐  │
│  │ pgvector EXTENSION:                                  │  │
│  │  • Vector similarity search                          │  │
│  │  • IVFFlat indexing (100x faster)                   │  │
│  │  • Cosine distance operator (<=>)                   │  │
│  └──────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
            

The RAG Pipeline (Detailed)

Let's break down exactly what happens when a user asks a question:

Step 1: User Authentication & Rate Limiting

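// getUser and checkUsageLimit are app-level helpers defined elsewhere;
// guardRequest is an illustrative wrapper name
async function guardRequest() {
  // Check if user is authenticated
  const user = await getUser()
  if (!user) {
    return "Please sign in to ask a question."
  }

  // Check subscription tier and daily usage
  const { queries_today, limit_reached } = await checkUsageLimit(user.id)

  if (limit_reached) {
    return "Daily limit reached. Upgrade to Pro!"
  }
}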

Why this matters: Free users get 50 questions/day, Pro gets 500, Unlimited gets... unlimited. This is how we monetize.
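
The tier limits above reduce to a small lookup. This sketch shows one way the check could work if the daily count is already known; the `Tier` type, the `usageStatus` name, and the returned shape are assumptions, and the real `checkUsageLimit` presumably reads the count from the database.

```typescript
type Tier = "free" | "pro" | "unlimited";

// Daily question limits per tier, taken from the text above.
const DAILY_LIMITS: Record<Tier, number> = {
  free: 50,
  pro: 500,
  unlimited: Infinity,
};

interface UsageStatus {
  queriesToday: number;
  limit: number;
  limitReached: boolean;
}

// Pure helper: decide whether a user on `tier` may ask another question today.
function usageStatus(tier: Tier, queriesToday: number): UsageStatus {
  const limit = DAILY_LIMITS[tier];
  return { queriesToday, limit, limitReached: queriesToday >= limit };
}

console.log(usageStatus("free", 50).limitReached); // true: the 51st question is blocked
console.log(usageStatus("pro", 50).limitReached);  // false
```

Using `Infinity` for the Unlimited tier avoids a special case: no finite count ever reaches it.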

Step 2: Generate Query Embedding

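import OpenAI from 'openai'

const openai = new OpenAI() // reads OPENAI_API_KEY from the environment

// Convert user's question into a 1536-dimensional vector
const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: "What is a closure in JavaScript?"
})

// The vector itself is on the first item of the response
const embedding = response.data[0].embedding

// Result: [0.234, -0.567, 0.891, ..., 0.123] (1536 numbers)

What's an embedding? It's a mathematical representation of meaning. Similar concepts have similar vectors. For example: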
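
To make "similar concepts have similar vectors" concrete, here is a small sketch of cosine similarity, the measure used in the search step. The three-dimensional vectors are invented for illustration only; real embeddings from this model have 1536 dimensions.

```typescript
// Cosine similarity: 1 means same direction, 0 means unrelated (orthogonal).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Made-up toy "embeddings"; the values are assumptions, not model output.
const closure = [0.9, 0.1, 0.0];
const lexicalScope = [0.8, 0.3, 0.1];
const photosynthesis = [0.0, 0.2, 0.9];

console.log(cosineSimilarity(closure, lexicalScope));   // close to 1
console.log(cosineSimilarity(closure, photosynthesis)); // close to 0
```

pgvector's `<=>` operator returns cosine distance, which is why the query in the next step computes similarity as `1 - (embedding <=> query_embedding)`.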

Step 3: Vector Similarity Search

-- Find chunks where cosine similarity > 0.5
SELECT
  content,
  page_number,
  1 - (embedding <=> query_embedding) AS similarity
FROM document_chunks
WHERE 1 - (embedding <=> query_embedding) > 0.5
ORDER BY embedding <=> query_embedding
LIMIT 5;

The magic: pgvector uses IVFFlat indexing to search millions of vectors in milliseconds:
  1. Divide all vectors into ~100 clusters
  2. Find which clusters are closest to the query
  3. Search only within those clusters (not the entire database!)

Result: 100-1000x faster than brute-force search, in exchange for approximate rather than exact nearest neighbors.
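
The three steps above can be sketched in a few lines. This is a toy illustration of the inverted-file idea, not pgvector's implementation: the centroids are hard-coded here (a real index learns them with k-means), and the function names are assumptions. The `probes` parameter mirrors the idea of how many clusters to scan.

```typescript
type Vec = number[];

function cosineDistance(a: Vec, b: Vec): number {
  let dot = 0;
  let na = 0;
  let nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return 1 - dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// Index of the centroid closest to v.
function nearestCentroid(v: Vec, centroids: Vec[]): number {
  let best = 0;
  for (let i = 1; i < centroids.length; i++) {
    if (cosineDistance(v, centroids[i]) < cosineDistance(v, centroids[best])) {
      best = i;
    }
  }
  return best;
}

// Step 1: assign every vector id to the list of its closest centroid.
function buildLists(vectors: Vec[], centroids: Vec[]): number[][] {
  const lists: number[][] = centroids.map(() => []);
  vectors.forEach((v, id) => lists[nearestCentroid(v, centroids)].push(id));
  return lists;
}

// Steps 2 and 3: pick the `probes` closest clusters, then scan only their members.
function search(
  query: Vec, vectors: Vec[], centroids: Vec[],
  lists: number[][], probes: number, k: number
): number[] {
  const clusters = centroids
    .map((c, i) => ({ i, d: cosineDistance(query, c) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, probes);

  const candidates: number[] = [];
  for (const cluster of clusters) {
    for (const id of lists[cluster.i]) candidates.push(id);
  }

  return candidates
    .map((id) => ({ id, d: cosineDistance(query, vectors[id]) }))
    .sort((x, y) => x.d - y.d)
    .slice(0, k)
    .map((x) => x.id);
}

const centroids: Vec[] = [[1, 0], [0, 1]];
const vectors: Vec[] = [[0.9, 0.1], [0.8, 0.3], [0.1, 0.9], [0.2, 0.7]];
const lists = buildLists(vectors, centroids);

console.log(lists);                                             // [[0, 1], [2, 3]]
console.log(search([1, 0.2], vectors, centroids, lists, 1, 2)); // [0, 1]
```

With `probes = 1` the query only compares against two of the four vectors. That is the source of both the speedup and the approximation: a true neighbor sitting in an unprobed cluster would be missed.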

Cost Analysis (Real Numbers)

Let's talk money. Here's what it costs to run this system:

Ingestion (One-Time Per Textbook)

Component          | Cost per Textbook | Details
Text Embeddings    | $0.10             | 1000 chunks × $0.0001
Image Descriptions | $0.75             | 50 pages × $0.015
Total              | $0.85             | Per 300-page textbook

Query (Per Question)

Component       | Cost per Query | Details
Query Embedding | $0.0001        | 1 embedding
Claude Response | $0.015         | ~5K tokens @ $3/1M
Total           | $0.015         | ~1.5¢ per question
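
The arithmetic behind both tables is simple enough to put in code, which makes it easy to plug in your own volumes. The prices below are copied from the tables and should be treated as this article's estimates rather than current list prices; the function names and the `totalCost` helper are illustrative assumptions.

```typescript
// Unit prices taken from the tables above (USD).
const COST_PER_CHUNK_EMBEDDING = 0.0001;
const COST_PER_IMAGE_DESCRIPTION = 0.015;
const COST_PER_QUERY_EMBEDDING = 0.0001;
const CLAUDE_COST_PER_MILLION_TOKENS = 3;

// One-time cost of ingesting a textbook.
function ingestionCost(chunks: number, imagePages: number): number {
  return chunks * COST_PER_CHUNK_EMBEDDING + imagePages * COST_PER_IMAGE_DESCRIPTION;
}

// Cost of answering one question that uses `tokens` Claude tokens.
function queryCost(tokens: number): number {
  return COST_PER_QUERY_EMBEDDING + (tokens / 1000000) * CLAUDE_COST_PER_MILLION_TOKENS;
}

// Hypothetical helper: one 300-page textbook plus a number of questions.
function totalCost(questions: number): number {
  return ingestionCost(1000, 50) + questions * queryCost(5000);
}

console.log(ingestionCost(1000, 50).toFixed(2)); // "0.85"
console.log(queryCost(5000).toFixed(4));         // "0.0151"
console.log(totalCost(500).toFixed(2));          // "8.40"
```

Note that the per-query total is $0.0151 before rounding, and that this sketch follows the table in pricing all ~5K tokens at a single rate.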

What Makes This System Production-Ready?

Many RAG tutorials show you how to build a proof-of-concept. This series builds a real SaaS product:

Tags:
#RAG #AI #NextJS #PostgreSQL #Supabase #OpenAI #Claude #BuildInPublic