Imagine you're studying for an exam and you ask ChatGPT: "What's the capital of France in 2024?"
ChatGPT confidently responds: "Paris."
Great! Now ask: "According to my textbook on page 45, what does it say about closures in JavaScript?"
ChatGPT responds: "I don't have access to your specific textbook, but generally, closures are..."
For a student studying from textbooks, this is a dealbreaker.
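RAG (Retrieval-Augmented Generation) combines the best of two worlds: search over your own documents, which finds the exact passages that matter, and a large language model, which turns those passages into a readable answer.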
Think of it like this:
User Question: "What is a closure in JavaScript?"
↓
1. Convert question to vector embedding
↓
2. Search textbook for similar content (vector similarity)
↓
3. Retrieve top 5 most relevant chunks
↓
4. Send chunks + question to LLM
↓
5. LLM generates answer with citations
↓
Answer: "A closure is a function that has access to variables
in its outer scope... (See page 45, paragraph 3)"
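
Steps 4 and 5 of the flow above hinge on how the retrieved chunks and the question are packed into a single prompt. The following is a minimal sketch of that assembly step, not the series' actual implementation: the `RetrievedChunk` shape, the `buildPrompt` name, and the instruction wording are all assumptions made for illustration.

```typescript
// Hypothetical shape of a retrieved chunk; the field names are assumptions.
interface RetrievedChunk {
  content: string;
  pageNumber: number;
  similarity: number;
}

// Step 4: combine the retrieved chunks and the question into one prompt.
// Each excerpt carries its page number so the model can cite it (step 5).
function buildPrompt(question: string, chunks: RetrievedChunk[]): string {
  const context = chunks
    .map((c, i) => `[${i + 1}] (page ${c.pageNumber}) ${c.content}`)
    .join("\n");
  return [
    "Answer the question using only the textbook excerpts below.",
    "Cite the page number of every excerpt you rely on.",
    "",
    "Excerpts:",
    context,
    "",
    `Question: ${question}`,
  ].join("\n");
}

const examplePrompt = buildPrompt("What is a closure in JavaScript?", [
  {
    content: "A closure is a function that has access to variables in its outer scope.",
    pageNumber: 45,
    similarity: 0.91,
  },
]);
console.log(examplePrompt);
```

Keeping the page number next to each excerpt is what lets the model produce a citation such as "See page 45" instead of guessing where the text came from.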
By the end of this series, you'll have built a production-ready SaaS application that:
Let me explain our technology choices:
Why?
Why?
Why this combination?
1. OpenAI text-embedding-ada-002 (Embeddings)
2. Anthropic Claude 3.5 Sonnet (Answer Generation)
Why?
Here's the complete architecture of what we'll build:
┌─────────────────────────────────────────────────────────────┐
│                            USER                             │
│                        (Web Browser)                        │
└─────────────────────────────────────────────────────────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                     NEXT.JS APPLICATION                     │
│  ┌──────────┐   ┌──────────┐   ┌──────────┐   ┌──────────┐  │
│  │ Landing  │   │Dashboard │   │ Pricing  │   │   Chat   │  │
│  │   Page   │   │   Page   │   │   Page   │   │   Page   │  │
│  └──────────┘   └──────────┘   └──────────┘   └──────────┘  │
└─────────────────────────────────────────────────────────────┘
                               │
                   ┌───────────┼───────────┐
                   ▼           ▼           ▼
             ┌──────────┐ ┌──────────┐ ┌──────────┐
             │  OpenAI  │ │Anthropic │ │  Stripe  │
             │Embeddings│ │  Claude  │ │ Payments │
             └──────────┘ └──────────┘ └──────────┘
                               │
                               ▼
┌─────────────────────────────────────────────────────────────┐
│                SUPABASE (PostgreSQL + Auth)                 │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  TABLES:                                              │  │
│  │  • profiles (users, subscription tiers)               │  │
│  │  • documents (textbook metadata)                      │  │
│  │  • document_chunks (text + embeddings)                │  │
│  │  • document_images (image descriptions + embeddings)  │  │
│  │  • usage_logs (rate limiting & analytics)             │  │
│  └───────────────────────────────────────────────────────┘  │
│                                                             │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  pgvector EXTENSION:                                  │  │
│  │  • Vector similarity search                           │  │
│  │  • IVFFlat indexing (100x faster)                     │  │
│  │  • Cosine distance operator (<=>)                     │  │
│  └───────────────────────────────────────────────────────┘  │
└─────────────────────────────────────────────────────────────┘
Let's break down exactly what happens when a user asks a question:
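// Runs inside the request handler, before any retrieval work.
// Check if user is authenticated (assuming getUser() returns null when nobody is signed in)
const user = await getUser()
if (!user) {
  return "Please sign in to ask a question."
}
// Check subscription tier and daily usage
const { queries_today, limit_reached } = await checkUsageLimit(user.id)
if (limit_reached) {
  return "Daily limit reached. Upgrade to Pro!"
}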
Why this matters: Free users get 50 questions/day, Pro gets 500, Unlimited gets... unlimited. This is how we monetize.
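
To make the tier limits concrete, here is a small sketch of the comparison a helper like `checkUsageLimit` might perform once it knows the user's tier and today's query count. The `Tier` type, the `DAILY_LIMITS` table, and the `evaluateUsage` function are hypothetical names; only the 50 / 500 / unlimited figures come from the text, and the real helper would presumably read the count from the database first.

```typescript
type Tier = "free" | "pro" | "unlimited";

// Daily question limits from the text: Free 50, Pro 500, Unlimited has no cap.
const DAILY_LIMITS: Record<Tier, number> = {
  free: 50,
  pro: 500,
  unlimited: Infinity,
};

interface UsageStatus {
  queries_today: number;
  limit_reached: boolean;
}

// Pure helper: given a tier and today's count, decide whether the cap is hit.
// A user who has already asked exactly `limit` questions is considered capped.
function evaluateUsage(tier: Tier, queriesToday: number): UsageStatus {
  return {
    queries_today: queriesToday,
    limit_reached: queriesToday >= DAILY_LIMITS[tier],
  };
}

console.log(evaluateUsage("free", 50)); // limit_reached is true
console.log(evaluateUsage("pro", 50));  // limit_reached is false
```

Keeping the comparison in a pure function makes the limits easy to unit-test separately from the database query that counts today's usage.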
// Convert user's question into a 1536-dimensional vector
const response = await openai.embeddings.create({
  model: 'text-embedding-ada-002',
  input: "What is a closure in JavaScript?"
})
// The vector itself is nested inside the API response
const embedding = response.data[0].embedding
// Result: [0.234, -0.567, 0.891, ..., 0.123] (1536 numbers)
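-- Find chunks where cosine similarity > 0.5
-- (<=> is pgvector's cosine distance, so 1 - distance is the cosine similarity;
--  query_embedding is the question's vector, passed in as a parameter)
SELECT
  content,
  page_number,
  1 - (embedding <=> query_embedding) AS similarity
FROM document_chunks
WHERE 1 - (embedding <=> query_embedding) > 0.5
ORDER BY embedding <=> query_embedding
LIMIT 5;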
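Result: with an IVFFlat index, the search can be roughly 100-1000x faster than a brute-force scan on large tables, in exchange for approximate rather than exact nearest-neighbor results.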
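
To see what the similarity query computes, the sketch below reproduces it as a brute-force scan in memory: score every chunk by cosine similarity, keep those above 0.5, sort, and take the top 5. It is only an illustration of the math, not how pgvector is implemented; the `Chunk` type, the function names, and the toy 3-dimensional vectors are assumptions standing in for real 1536-dimensional embeddings.

```typescript
// Cosine similarity between two equal-length vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("Vectors must have the same length");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

interface Chunk {
  content: string;
  page_number: number;
  embedding: number[];
}

// Brute-force equivalent of the SQL: filter by threshold, sort by similarity, take top k.
function searchChunks(chunks: Chunk[], query: number[], threshold = 0.5, k = 5) {
  return chunks
    .map((c) => ({
      content: c.content,
      page_number: c.page_number,
      similarity: cosineSimilarity(c.embedding, query),
    }))
    .filter((r) => r.similarity > threshold)
    .sort((x, y) => y.similarity - x.similarity)
    .slice(0, k);
}

// Toy 3-dimensional vectors stand in for real 1536-dimensional embeddings.
const toyChunks: Chunk[] = [
  { content: "Closures capture outer variables.", page_number: 45, embedding: [1, 0, 0] },
  { content: "Arrays are ordered lists.", page_number: 12, embedding: [0, 1, 0] },
  { content: "Functions remember their scope.", page_number: 46, embedding: [0.8, 0.6, 0] },
];
const results = searchChunks(toyChunks, [1, 0, 0]);
console.log(results.map((r) => r.page_number)); // pages 45 and 46; page 12 is filtered out
```

This scan touches every row, which is exactly the work an index avoids by only examining the clusters closest to the query vector.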
Let's talk money. Here's what it costs to run this system:
| Component | Cost per Textbook | Details |
|---|---|---|
| Text Embeddings | $0.10 | 1000 chunks × $0.0001 |
| Image Descriptions | $0.75 | 50 pages × $0.015 |
| Total | $0.85 | Per 300-page textbook |

Processing a textbook is a one-time cost. Each question then has its own running cost:

| Component | Cost per Query | Details |
|---|---|---|
| Query Embedding | $0.0001 | 1 embedding |
| Claude Response | $0.015 | ~5K tokens @ $3/1M |
| Total | ~$0.015 | ~1.5¢ per question |
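
The arithmetic behind both tables can be checked in a few lines. This is a sketch using only the unit prices quoted above; the constant and function names are made up for illustration, and actual provider pricing may differ from these figures.

```typescript
// Unit prices taken from the tables above (USD).
const COST_PER_EMBEDDING = 0.0001;
const COST_PER_IMAGE_PAGE = 0.015;
const CLAUDE_COST_PER_MILLION_TOKENS = 3;

// One-time cost of processing a textbook: chunk embeddings plus image descriptions.
function textbookCost(chunks: number, imagePages: number): number {
  return chunks * COST_PER_EMBEDDING + imagePages * COST_PER_IMAGE_PAGE;
}

// Running cost of one question: one query embedding plus the Claude response.
function queryCost(tokens: number): number {
  return COST_PER_EMBEDDING + (tokens / 1e6) * CLAUDE_COST_PER_MILLION_TOKENS;
}

const perTextbook = textbookCost(1000, 50);
const perQuery = queryCost(5000);
console.log(perTextbook.toFixed(2)); // "0.85"
console.log(perQuery.toFixed(4));    // "0.0151"
```

The query embedding adds only a hundredth of a cent, so the per-question cost is dominated almost entirely by the Claude response.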
Many RAG tutorials show you how to build a proof-of-concept. This series builds a real SaaS product: