What is RAG?

RAG stands for "Retrieval-Augmented Generation." It's a technique that combines AI language models with external knowledge sources to provide more accurate, up-to-date information.

What Is RAG?

────────────────────────────────────────

RAG works by:

▸[Retrieval]: Finding relevant information from a knowledge base (documents, databases, etc.)
▸[Augmentation]: Adding that information to the AI's prompt
▸[Generation]: Generating a response from both the model's training and the retrieved information

Instead of relying only on what the AI was trained on, RAG lets the AI access current, specific information.

Why Use RAG?

────────────────────────────────────────

[Up-to-date information]: AI models have training data cutoffs. RAG lets you provide current information.

[Specific knowledge]: Add domain-specific information the model wasn't trained on.

[Accuracy]: Reduce hallucinations by grounding responses in actual documents.

[Transparency]: You can see which sources were retrieved and supplied to the AI for its answer.

How RAG Works

────────────────────────────────────────

▸[Store knowledge]: Save documents, articles, or data in a searchable format
▸[User asks question]: User provides a query
▸[Retrieve relevant info]: Search the knowledge base for relevant information
▸[Augment prompt]: Add retrieved information to the user's question
▸[Generate response]: AI creates answer using both its knowledge and retrieved info
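
The five steps above can be sketched in a few lines of Python. This is a minimal illustration, not a production design: the knowledge base text is invented, retrieval is plain word overlap rather than real search, and `generate` is a hypothetical placeholder where a real language model call would go.

```python
import re

def tokenize(text: str) -> set[str]:
    """Lowercase a string and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

# Step 1: store knowledge. Invented sample passages, kept in memory.
KNOWLEDGE_BASE = [
    "Items can be returned within 30 days of delivery with a receipt.",
    "Standard shipping takes 3 to 5 business days.",
    "Support is available by email on weekdays.",
]

def retrieve(query: str, documents: list[str], top_k: int = 1) -> list[str]:
    """Step 3: rank passages by word overlap with the query (toy stand-in for search)."""
    query_tokens = tokenize(query)
    ranked = sorted(
        documents,
        key=lambda doc: len(query_tokens & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def augment(query: str, passages: list[str]) -> str:
    """Step 4: add the retrieved passages to the user's question."""
    context = "\n".join(f"- {p}" for p in passages)
    return f"Use the context to answer.\n\nContext:\n{context}\n\nQuestion: {query}"

def generate(prompt: str) -> str:
    """Step 5: hypothetical placeholder; swap in a real model client here."""
    return f"[model response to a {len(prompt)}-character prompt]"

def answer(query: str) -> str:
    """Step 2 onward: take the user's query and run the full pipeline."""
    passages = retrieve(query, KNOWLEDGE_BASE)
    return generate(augment(query, passages))
```

Calling `answer("How long does shipping take?")` retrieves the shipping passage, places it in the prompt, and passes that prompt to the placeholder model.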

Example

────────────────────────────────────────

[User question]: "What's our return policy?"

[RAG process]:

▸Search company documents for "return policy"
▸Find relevant policy document
▸Add policy text to prompt: "Based on this policy: [policy text], what's our return policy?"
▸AI generates answer using the actual policy document
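
As a rough sketch of this example, the snippet below searches a small set of documents for the phrase and builds the prompt shown above. The document names and policy text are invented for illustration, and `find_document` and `build_prompt` are hypothetical helper names.

```python
from typing import Optional

# Hypothetical company documents; names and text are invented for illustration.
COMPANY_DOCS = {
    "shipping.txt": "Shipping policy: orders ship within 2 business days.",
    "returns.txt": "Return policy: unused items may be returned within 30 days.",
}

def find_document(phrase: str, docs: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (name, text) of the first document containing the phrase, ignoring case."""
    for name, text in docs.items():
        if phrase.lower() in text.lower():
            return name, text
    return None

def build_prompt(question: str, policy_text: str) -> str:
    """Place the retrieved policy text ahead of the user's question."""
    return f"Based on this policy: {policy_text}\n\nQuestion: {question}"

match = find_document("return policy", COMPANY_DOCS)
if match is not None:
    source_name, policy_text = match
    prompt = build_prompt("What's our return policy?", policy_text)
```

The resulting `prompt` is what would be sent to the language model, and `source_name` records which document the answer is grounded in.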

Components

────────────────────────────────────────

[Embedding model]: Converts text into numerical vectors (embeddings) that capture semantic meaning. Popular choices include OpenAI's text-embedding-3 models, Cohere's Embed, Google's embedding models, and open-source options like BGE and E5. The quality of your embeddings directly impacts retrieval quality.
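
To make "capture semantic meaning" concrete, here is a small sketch of cosine similarity, a common way to compare two embeddings. The three-dimensional vectors are made up for illustration; real embedding models return vectors with hundreds or thousands of dimensions.

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as similar to nothing
    return dot / (norm_a * norm_b)

# Made-up vectors standing in for real embeddings of three short texts.
refund_request = [0.9, 0.1, 0.0]
return_an_item = [0.8, 0.2, 0.1]
weather_report = [0.0, 0.1, 0.9]

related = cosine_similarity(refund_request, return_an_item)
unrelated = cosine_similarity(refund_request, weather_report)
```

With these vectors, `related` is much higher than `unrelated`, which is the behavior a good embedding model aims for with texts of similar meaning.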

[Vector database]: Stores and indexes document embeddings for fast similarity search. Options range from dedicated vector databases like Pinecone, Weaviate, and Qdrant to the pgvector extension for PostgreSQL and lightweight embedded stores like Chroma for local development.
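
The core job of a vector database can be sketched as a brute-force in-memory store. The class below is a hypothetical illustration and does not mirror the API of any product named above; real systems use approximate nearest-neighbor indexes rather than scanning every vector.

```python
import math
from dataclasses import dataclass, field

@dataclass
class InMemoryVectorStore:
    """Toy store: keeps unit-length vectors and scans all of them on search."""
    _texts: list[str] = field(default_factory=list)
    _vectors: list[list[float]] = field(default_factory=list)

    @staticmethod
    def _normalize(vector: list[float]) -> list[float]:
        norm = math.sqrt(sum(x * x for x in vector))
        if norm == 0:
            raise ValueError("cannot index or query with a zero vector")
        return [x / norm for x in vector]

    def add(self, text: str, vector: list[float]) -> None:
        """Store a text alongside its normalized embedding."""
        self._texts.append(text)
        self._vectors.append(self._normalize(vector))

    def search(self, query_vector: list[float], top_k: int = 3) -> list[tuple[str, float]]:
        """Return the top_k (text, cosine similarity) pairs, best first."""
        query = self._normalize(query_vector)
        scored = [
            (text, sum(q * v for q, v in zip(query, vector)))
            for text, vector in zip(self._texts, self._vectors)
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        return scored[:top_k]

store = InMemoryVectorStore()
store.add("returns", [1.0, 0.0])    # made-up 2-dimensional embeddings
store.add("shipping", [0.0, 1.0])
store.add("refunds", [0.9, 0.1])
hits = store.search([1.0, 0.0], top_k=2)
```

Because the vectors are normalized when stored, the dot product in `search` equals cosine similarity.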

[Retrieval system]: Finds the most relevant documents for a given query using semantic similarity search, keyword matching, or hybrid approaches that combine both.
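
One common way to build a hybrid approach is reciprocal rank fusion, which merges ranked lists from separate searches without needing their scores to be comparable. The sketch below assumes you already have one ranking from keyword search and one from semantic search; the document IDs are placeholders, and `k = 60` is a commonly used default rather than a required value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse several ranked lists of document IDs into one ranking.

    Each document earns 1 / (k + rank) from every list it appears in,
    so documents ranked well by several retrievers rise to the top.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

keyword_ranking = ["doc_a", "doc_b", "doc_c"]   # placeholder keyword search results
semantic_ranking = ["doc_b", "doc_c", "doc_a"]  # placeholder semantic search results
fused = reciprocal_rank_fusion([keyword_ranking, semantic_ranking])
```

Here `doc_b` comes out first because it ranks near the top of both lists, even though it is not first in the keyword results.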

[Language model]: Generates the final response using the retrieved context plus the original question. Any capable model works here: GPT-4, Claude, Gemini, Mistral, or open-source alternatives like Llama.

[Chunking strategy]: How you split documents into smaller pieces matters enormously. Too large and you waste context; too small and you lose meaning. Common approaches include fixed-size chunks with overlap, sentence-based splitting, and semantic chunking.
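
Fixed-size chunking with overlap, the first approach mentioned above, can be sketched as follows. This version counts characters for simplicity; real pipelines often count tokens instead, and the default sizes here are arbitrary assumptions.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into chunks of up to chunk_size characters.

    Consecutive chunks share `overlap` characters so that a sentence cut
    at a boundary still appears whole in at least one neighbor's context.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be at least 0 and smaller than chunk_size")

    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the last chunk already reaches the end of the text
    return chunks

pieces = chunk_text("abcdefghij", chunk_size=4, overlap=2)
# pieces == ["abcd", "cdef", "efgh", "ghij"]
```

Each chunk repeats the last two characters of the previous one, which is the overlap at work on a deliberately tiny input.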

Use Cases

────────────────────────────────────────

[Customer support]: Answer questions using company documentation

[Knowledge bases]: Create Q&A systems from internal documents

[Research assistants]: Help researchers find and synthesize information

[Legal applications]: Answer questions using case law or regulations

[Medical applications]: Provide information from medical literature

RAG vs Fine-Tuning

────────────────────────────────────────

[RAG]: Add knowledge through context. Flexible, can update knowledge easily.

[Fine-tuning]: Train model on knowledge. More permanent, requires retraining to update.

[Often used together]: RAG for current/specific info, fine-tuning for behavior/style.

Best Practices

────────────────────────────────────────

[Quality documents]: Better source documents produce better RAG systems

[Good retrieval]: Invest in good search/retrieval to find relevant information

[Clear prompts]: Structure prompts to effectively use retrieved information

[Source attribution]: Show users where information came from

[Update knowledge]: Keep knowledge bases current and accurate
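
Source attribution and clear prompts can go together: if each retrieved passage is numbered and labeled with its origin in the prompt, the model can cite passages and the application can list them for the user. The sketch below is one possible layout; the `Passage` type, the helper names, and the file names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Passage:
    source: str  # where the text came from, e.g. a file name or URL
    text: str

def format_context(passages: list[Passage]) -> str:
    """Number each passage so the model can cite it as [1], [2], and so on."""
    return "\n".join(
        f"[{i}] ({p.source}) {p.text}" for i, p in enumerate(passages, start=1)
    )

def format_sources(passages: list[Passage]) -> str:
    """Build a footer listing each source once, in first-seen order."""
    unique_sources = dict.fromkeys(p.source for p in passages)
    return "Sources: " + ", ".join(unique_sources)

retrieved = [
    Passage("returns.txt", "Unused items may be returned."),   # invented sample text
    Passage("faq.txt", "Refunds go to the original payment method."),
    Passage("returns.txt", "A receipt is required."),
]
context_block = format_context(retrieved)
footer = format_sources(retrieved)
```

`context_block` goes into the prompt, while `footer` can be shown beneath the generated answer.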

Limitations

────────────────────────────────────────

▸[Retrieval quality]: System is only as good as what it retrieves
▸[Context limits]: Can only include so much retrieved information
▸[Complexity]: More complex to set up than simple prompting
▸[Cost]: Requires additional infrastructure (vector databases, etc.)
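
The context limit in particular shapes how retrieval results are used: only the best-ranked passages that fit can be sent to the model. A minimal sketch, assuming passages arrive sorted from most to least relevant and using word count as a crude stand-in for a real token count:

```python
def fit_to_budget(ranked_passages: list[str], max_words: int) -> list[str]:
    """Keep the highest-ranked passages whose combined length fits the budget.

    Stops at the first passage that would overflow, so the result is always
    a prefix of the ranking and never skips a better passage for a worse one.
    """
    selected = []
    used = 0
    for passage in ranked_passages:
        cost = len(passage.split())  # rough proxy; real systems count tokens
        if used + cost > max_words:
            break
        selected.append(passage)
        used += cost
    return selected

kept = fit_to_budget(["a b c", "d e", "f"], max_words=4)
# kept == ["a b c"]
```

Stopping at the first overflow is a design choice; skipping the oversized passage and trying smaller ones would fill the budget more fully at the cost of strict rank order.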

RAG is a powerful technique for building AI systems that need access to specific, current information.
