RAG: Retrieval-Augmented Generation
Overview
Retrieval-Augmented Generation (RAG) is a technique that enhances LLM outputs by retrieving relevant information from external knowledge bases before generating a response [18].
Traditional RAG Pipeline
Components
| Component | Function |
|---|---|
| Document Loader | Ingests documents from various sources |
| Text Splitter | Chunks documents into manageable pieces |
| Embedding Model | Converts text to vector representations |
| Vector Store | Stores and indexes embeddings for retrieval |
| Retriever | Finds relevant documents based on query |
| Generator | LLM that produces final response |
Pipeline Flow
- Indexing: Documents are loaded, chunked, embedded, and stored
- Retrieval: The user query is embedded and the most similar chunks are retrieved
- Generation: The retrieved context is combined with the query in the prompt to the LLM
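The sketch below walks these three stages end to end without a framework, using sentence-transformers for embeddings and brute-force cosine similarity in NumPy; the corpus, query, and model name are illustrative assumptions, and the final prompt would be handed to whatever LLM serves as the generator.

```python
# Minimal sketch of the three RAG stages (corpus, query, and model are illustrative).
import numpy as np
from sentence_transformers import SentenceTransformer

# 1. Indexing: embed the chunks and keep them in an in-memory "store".
chunks = [
    "RAG retrieves relevant documents before generation.",
    "Vector stores index embeddings for fast similarity search.",
    "Chunk overlap preserves context across boundaries.",
]
model = SentenceTransformer("all-MiniLM-L6-v2")
index = model.encode(chunks, normalize_embeddings=True)  # shape: (n_chunks, dim)

# 2. Retrieval: embed the query and take the top-k chunks by cosine similarity.
query = "Why do vector stores matter for RAG?"
q = model.encode([query], normalize_embeddings=True)[0]
scores = index @ q                    # dot product = cosine (unit-norm vectors)
top_k = np.argsort(scores)[::-1][:2]  # indices of the two best chunks

# 3. Generation: splice the retrieved context into the prompt.
context = "\n".join(chunks[i] for i in top_k)
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {query}"
# `prompt` would now go to the generator LLM.
```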
Agentic RAG
Agentic RAG extends traditional RAG by giving an LLM agent control over the retrieval process, enabling more sophisticated information-gathering strategies.
Key Differences
| Aspect | Traditional RAG | Agentic RAG |
|---|---|---|
| Retrieval Control | Fixed pipeline | Agent decides when/what to retrieve |
| Query Formulation | Direct user query | Agent reformulates queries |
| Iteration | Single retrieval | Multiple retrieval rounds |
| Source Selection | Predefined sources | Dynamic source selection |
Agentic RAG Capabilities
- Query Decomposition: Break complex queries into sub-queries
- Source Routing: Select appropriate knowledge bases
- Iterative Refinement: Retrieve additional information as needed
- Result Validation: Verify retrieved information quality
- Multi-hop Reasoning: Chain multiple retrievals for complex questions
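A minimal sketch of that control loop is shown below, assuming two hypothetical helpers: `llm_decide` stands in for the agent's planning call and `search` for querying a knowledge base; both are stubbed with dummy logic here, not a real library API.

```python
# Agentic retrieval loop; llm_decide and search are hypothetical stand-ins.
def llm_decide(question, evidence):
    # In a real system this is an LLM call that inspects the evidence gathered
    # so far and plans the next step. Stub: stop once any evidence exists.
    if evidence:
        return {"done": True}
    return {"done": False, "source": "docs", "query": f"rephrased: {question}"}

def search(source, query):
    # Stand-in for querying the selected knowledge base.
    return [f"[{source}] result for: {query}"]

def agentic_rag(question, max_rounds=3):
    evidence = []
    for _ in range(max_rounds):                # iterative refinement
        plan = llm_decide(question, evidence)  # agent decides when/what to retrieve
        if plan["done"]:                       # agent judges evidence sufficient
            break
        # dynamic source selection + query reformulation
        evidence += search(plan["source"], plan["query"])
    return evidence

print(agentic_rag("What changed between API v1 and v2?"))
```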
Vector Stores
Vector stores are specialized databases optimized for storing and querying vector embeddings.
Popular Options
| Vector Store | Type | Key Features |
|---|---|---|
| Pinecone | Managed | Fully managed, scalable, fast |
| Weaviate | Open-source | GraphQL API, hybrid search |
| Chroma | Open-source | Lightweight, easy to use |
| Milvus | Open-source | Highly scalable, GPU support |
| Qdrant | Open-source | Rust-based, filtering support |
| pgvector | PostgreSQL extension | SQL integration, familiar tooling |
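For a concrete feel of the interface these stores expose, here is a minimal sketch using Chroma's native in-memory client (the documents and query are made up); most of the options above offer a similar add/query surface.

```python
# Minimal Chroma usage; documents and query are illustrative.
import chromadb

client = chromadb.Client()  # ephemeral, in-memory store
collection = client.create_collection(name="docs")

# Chroma embeds the documents with its default embedding function.
collection.add(
    ids=["a", "b", "c"],
    documents=[
        "Pinecone is a managed vector database.",
        "pgvector adds vector search to PostgreSQL.",
        "Chunk overlap preserves boundary context.",
    ],
)

# Nearest-neighbor query over the stored embeddings.
results = collection.query(query_texts=["vector search in SQL"], n_results=1)
print(results["documents"])  # -> the pgvector document
```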
Chunking Strategies
How documents are split significantly impacts retrieval quality.
| Strategy | Description | Best For |
|---|---|---|
| Fixed Size | Split by character/token count | Simple documents |
| Recursive | Split by separators hierarchically | Structured text |
| Semantic | Split by meaning/topic | Complex documents |
| Document-Aware | Respect document structure | PDFs, HTML, Markdown |
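To make the first two strategies concrete, the sketch below runs LangChain's fixed-size and recursive splitters over the same text (the chunk size is deliberately tiny for illustration); the recursive splitter tries separators in order (paragraphs, then lines, then spaces), so its chunks tend to end at natural boundaries.

```python
# Fixed-size vs. recursive splitting; chunk sizes are illustrative.
from langchain.text_splitter import (
    CharacterTextSplitter,
    RecursiveCharacterTextSplitter,
)

text = "First paragraph about RAG.\n\nSecond paragraph about chunking.\n\nThird one."

# Fixed size: cuts purely by character count, may break mid-word.
fixed = CharacterTextSplitter(separator="", chunk_size=40, chunk_overlap=0)
# Recursive: tries "\n\n", then "\n", then " " before cutting mid-word.
recursive = RecursiveCharacterTextSplitter(chunk_size=40, chunk_overlap=0)

print(fixed.split_text(text))      # arbitrary 40-character cuts
print(recursive.split_text(text))  # one chunk per paragraph
```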
Retrieval Strategies
Basic Retrieval
- Similarity Search: Find most similar vectors
- MMR (Maximal Marginal Relevance): Balance relevance and diversity
- Threshold-based: Only return results above similarity threshold
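With a LangChain vector store (such as the Chroma instance built in the implementation example below), these three modes map onto `as_retriever` arguments; the parameter values here are illustrative, not recommendations.

```python
# Basic retrieval modes as LangChain retriever configs (values illustrative).
# Assumes `vectorstore` is an existing vector store, e.g. Chroma.

# Plain similarity search: the top-k nearest vectors.
sim = vectorstore.as_retriever(search_kwargs={"k": 4})

# MMR: fetch a larger candidate pool, then trade relevance against diversity.
mmr = vectorstore.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 4, "fetch_k": 20, "lambda_mult": 0.5},
)

# Threshold-based: only return hits scoring above the cutoff.
thresh = vectorstore.as_retriever(
    search_type="similarity_score_threshold",
    search_kwargs={"score_threshold": 0.8},
)
```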
Advanced Retrieval
- Hybrid Search: Combine vector and keyword search
- Re-ranking: Use cross-encoder to re-rank initial results
- Query Expansion: Generate multiple query variants
- Hypothetical Document Embeddings (HyDE): Generate hypothetical answer, then search
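Re-ranking is the easiest of these to show in isolation: the sketch below rescores an initial candidate set with a cross-encoder from sentence-transformers and reorders it (the query, candidates, and model choice are illustrative).

```python
# Cross-encoder re-ranking of first-stage results; data is illustrative.
from sentence_transformers import CrossEncoder

query = "How does chunk overlap help retrieval?"
candidates = [  # e.g. the top-k documents from a vector search
    "Overlap repeats text at chunk boundaries so context is not lost.",
    "Pinecone is a managed vector database.",
    "MMR balances relevance and diversity.",
]

# A cross-encoder scores each (query, document) pair jointly, which is
# slower but more accurate than comparing precomputed embeddings.
reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
scores = reranker.predict([(query, doc) for doc in candidates])

# Reorder candidates by descending relevance score.
reranked = [doc for _, doc in sorted(zip(scores, candidates), reverse=True)]
print(reranked[0])
```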
Implementation Example
```python
# Basic RAG with LangChain
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load and split documents
loader = TextLoader("knowledge_base.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_kwargs={"k": 4}
    )
)

# Query
response = qa_chain.run("What are the key features?")
print(response)
```
Best Practices
- Chunk Size Optimization: Test different sizes for your use case
- Overlap: Include overlap to preserve context at boundaries
- Metadata: Store rich metadata for filtering
- Evaluation: Measure retrieval quality with metrics like recall@k (a sketch follows this list)
- Hybrid Approaches: Combine multiple retrieval strategies
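Recall@k from the evaluation point above is simple to compute once each query has a labeled set of relevant documents; a minimal sketch with made-up IDs:

```python
# recall@k: fraction of relevant documents that appear in the top-k results.
def recall_at_k(retrieved_ids, relevant_ids, k):
    if not relevant_ids:
        return 0.0
    hits = len(set(retrieved_ids[:k]) & set(relevant_ids))
    return hits / len(relevant_ids)

# Made-up example: 2 of the 3 relevant docs show up in the top 4.
print(recall_at_k(["d3", "d7", "d1", "d9"], ["d1", "d3", "d5"], k=4))  # ~0.667
```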