RAG: Retrieval Augmented Generation

Overview

Retrieval Augmented Generation (RAG) is a technique that enhances LLM outputs by retrieving relevant information from external knowledge bases before generating responses [18].

Traditional RAG Pipeline

Components

Component | Function
--------- | --------
Document Loader | Ingests documents from various sources
Text Splitter | Chunks documents into manageable pieces
Embedding Model | Converts text to vector representations
Vector Store | Stores and indexes embeddings for retrieval
Retriever | Finds relevant documents based on query
Generator | LLM that produces the final response

Pipeline Flow

  1. Indexing: Documents are loaded, chunked, embedded, and stored
  2. Retrieval: User query is embedded and similar documents are retrieved
  3. Generation: Retrieved context is combined with query for LLM response
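
The three stages above can be sketched in plain Python. The bag-of-words "embedding" and the two hard-coded documents are stand-ins for a real embedding model and corpus, chosen only to keep the sketch self-contained:

```python
# Toy sketch of the RAG pipeline: indexing, retrieval, generation.
from collections import Counter
import math
import re

def embed(text):
    # Stand-in "embedding": a word-count vector (a real system uses a neural model)
    return Counter(re.findall(r"\w+", text.lower()))

def cosine(a, b):
    dot = sum(a[w] * b[w] for w in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

# 1. Indexing: load, chunk, embed, and store documents
docs = ["RAG retrieves external knowledge.", "Cats sleep most of the day."]
index = [(doc, embed(doc)) for doc in docs]

# 2. Retrieval: embed the user query and find the most similar chunk
query = "how does RAG use external knowledge"
best_doc, _ = max(index, key=lambda pair: cosine(embed(query), pair[1]))

# 3. Generation: combine retrieved context with the query in a prompt for the LLM
prompt = f"Context: {best_doc}\n\nQuestion: {query}\nAnswer:"
```

A production pipeline replaces `embed` with a real embedding model and the list with a vector store, but the data flow is the same.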

Agentic RAG

Agentic RAG extends traditional RAG by giving the agent control over the retrieval process, enabling more sophisticated information gathering strategies.

Key Differences

Aspect | Traditional RAG | Agentic RAG
------ | --------------- | -----------
Retrieval Control | Fixed pipeline | Agent decides when/what to retrieve
Query Formulation | Direct user query | Agent reformulates queries
Iteration | Single retrieval | Multiple retrieval rounds
Source Selection | Predefined sources | Dynamic source selection

Agentic RAG Capabilities

  • Query Decomposition: Break complex queries into sub-queries
  • Source Routing: Select appropriate knowledge bases
  • Iterative Refinement: Retrieve additional information as needed
  • Result Validation: Verify retrieved information quality
  • Multi-hop Reasoning: Chain multiple retrievals for complex questions
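
As an illustration, the loop below wires a few of these capabilities together. The rule-based `decompose` and `route` functions and the `KNOWLEDGE` dictionary are hypothetical stand-ins for decisions an LLM agent and real knowledge bases would make:

```python
# Hypothetical sketch of an agentic retrieval loop with hand-written rules.

def decompose(query):
    # Query Decomposition: split a compound question into sub-queries
    # (an LLM would do this; splitting on " and " is a toy heuristic).
    return [part.strip() for part in query.split(" and ")]

KNOWLEDGE = {
    "pricing": "Plan A costs $10/month.",
    "limits": "Plan A allows 5 users.",
}

def route(sub_query):
    # Source Routing: pick a knowledge base per sub-query (keyword rule here).
    return "pricing" if "cost" in sub_query else "limits"

def agentic_rag(query):
    answers = []
    for sub in decompose(query):        # Iterative Refinement: one round per sub-query
        fact = KNOWLEDGE[route(sub)]
        if fact:                        # Result Validation (trivial non-empty check)
            answers.append(fact)
    return " ".join(answers)

print(agentic_rag("what does Plan A cost and how many users are allowed"))
# → Plan A costs $10/month. Plan A allows 5 users.
```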

Vector Stores

Vector stores are specialized databases optimized for storing and querying vector embeddings.

Popular Options

Vector Store | Type | Key Features
------------ | ---- | ------------
Pinecone | Managed | Fully managed, scalable, fast
Weaviate | Open-source | GraphQL API, hybrid search
Chroma | Open-source | Lightweight, easy to use
Milvus | Open-source | Highly scalable, GPU support
Qdrant | Open-source | Rust-based, filtering support
pgvector | PostgreSQL extension | SQL integration, familiar tooling
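
All of these stores expose the same core operations: add a vector with a payload, then search by similarity. A minimal in-memory sketch (omitting the approximate-nearest-neighbor indexes, persistence, and metadata filtering that real stores provide):

```python
# Minimal in-memory vector store sketch using exact cosine similarity.
import math

class TinyVectorStore:
    def __init__(self):
        self.items = []  # list of (id, vector, payload)

    def add(self, item_id, vector, payload):
        self.items.append((item_id, vector, payload))

    def search(self, query_vec, k=1):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(x * x for x in b))
            return dot / (na * nb) if na and nb else 0.0
        # Brute-force scan; real stores use ANN indexes (HNSW, IVF, ...)
        ranked = sorted(self.items, key=lambda it: cos(query_vec, it[1]), reverse=True)
        return [(item_id, payload) for item_id, _, payload in ranked[:k]]

store = TinyVectorStore()
store.add("a", [1.0, 0.0], "about cats")
store.add("b", [0.0, 1.0], "about dogs")
print(store.search([0.9, 0.1], k=1))  # → [('a', 'about cats')]
```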

Chunking Strategies

How documents are split significantly impacts retrieval quality.

Strategy | Description | Best For
-------- | ----------- | --------
Fixed Size | Split by character/token count | Simple documents
Recursive | Split by separators hierarchically | Structured text
Semantic | Split by meaning/topic | Complex documents
Document-Aware | Respect document structure | PDFs, HTML, Markdown
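
The simplest strategy, fixed-size chunking with overlap, fits in a few lines (sizes here are in characters; token-based splitting works the same way):

```python
# Fixed-size chunking with overlap: consecutive chunks share `overlap` characters.

def chunk_text(text, chunk_size=20, overlap=5):
    step = chunk_size - overlap  # must be positive, i.e. overlap < chunk_size
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]

text = "".join(str(i % 10) for i in range(50))
chunks = chunk_text(text, chunk_size=20, overlap=5)
# Each chunk's last 5 characters repeat as the next chunk's first 5,
# preserving context across chunk boundaries.
```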

Retrieval Strategies

Basic Retrieval

  • Similarity Search: Find most similar vectors
  • MMR (Maximal Marginal Relevance): Balance relevance and diversity
  • Threshold-based: Only return results above similarity threshold
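
MMR can be sketched over precomputed similarity scores; `lam` (the usual λ parameter) trades query relevance against redundancy with already-selected documents:

```python
# Maximal Marginal Relevance over precomputed similarities.

def mmr(query_sims, doc_sims, k=2, lam=0.5):
    """query_sims[i]: sim(query, doc i); doc_sims[i][j]: sim(doc i, doc j)."""
    selected = []
    candidates = list(range(len(query_sims)))
    while candidates and len(selected) < k:
        def score(i):
            # Penalize similarity to the most similar already-selected doc
            redundancy = max((doc_sims[i][j] for j in selected), default=0.0)
            return lam * query_sims[i] - (1 - lam) * redundancy
        best = max(candidates, key=score)
        selected.append(best)
        candidates.remove(best)
    return selected

# Docs 0 and 1 are near-duplicates; MMR picks 0, then the diverse doc 2.
query_sims = [0.9, 0.85, 0.7]
doc_sims = [[1.0, 0.95, 0.1], [0.95, 1.0, 0.1], [0.1, 0.1, 1.0]]
print(mmr(query_sims, doc_sims, k=2))  # → [0, 2]
```

Plain similarity search would return docs 0 and 1 here, wasting a slot on near-duplicate content.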

Advanced Retrieval

  • Hybrid Search: Combine vector and keyword search
  • Re-ranking: Use cross-encoder to re-rank initial results
  • Query Expansion: Generate multiple query variants
  • Hypothetical Document Embeddings (HyDE): Generate hypothetical answer, then search
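
For hybrid search, one common way to fuse a keyword ranking and a vector ranking is Reciprocal Rank Fusion (RRF), which works on ranks alone and so avoids comparing incompatible score scales (it is one option among several, such as weighted score fusion):

```python
# Reciprocal Rank Fusion: combine ranked lists without comparable scores.

def rrf(rankings, k=60):
    """rankings: list of ranked doc-id lists; k=60 is the conventional constant."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank + 1)
    return sorted(scores, key=scores.get, reverse=True)

keyword_ranking = ["d1", "d2", "d3"]   # e.g. from BM25
vector_ranking = ["d1", "d3", "d4"]    # e.g. from similarity search
fused = rrf([keyword_ranking, vector_ranking])
# d1 tops both lists, so it leads; d3 appears in both, so it beats d2 and d4.
```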

Implementation Example

# Basic RAG with LangChain (legacy pre-0.1 import paths; newer releases
# move these to langchain_community and langchain_openai)
from langchain.document_loaders import TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma
from langchain.chains import RetrievalQA
from langchain.llms import OpenAI

# Load and split documents
loader = TextLoader("knowledge_base.txt")
documents = loader.load()
text_splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,
    chunk_overlap=200
)
chunks = text_splitter.split_documents(documents)

# Create vector store
embeddings = OpenAIEmbeddings()
vectorstore = Chroma.from_documents(chunks, embeddings)

# Create RAG chain
qa_chain = RetrievalQA.from_chain_type(
    llm=OpenAI(),
    chain_type="stuff",
    retriever=vectorstore.as_retriever(
        search_kwargs={"k": 4}
    )
)

# Query
response = qa_chain.run("What are the key features?")
print(response)

Best Practices

  1. Chunk Size Optimization: Test different sizes for your use case
  2. Overlap: Include overlap to preserve context at boundaries
  3. Metadata: Store rich metadata for filtering
  4. Evaluation: Measure retrieval quality with metrics like recall@k
  5. Hybrid Approaches: Combine multiple retrieval strategies
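
The recall@k metric from point 4 can be computed directly from ranked results and per-query relevance labels:

```python
# recall@k: fraction of queries whose relevant document appears in the top-k results.

def recall_at_k(results, relevant, k):
    """results: per-query ranked id lists; relevant: per-query relevant id."""
    hits = sum(1 for ranked, rel in zip(results, relevant) if rel in ranked[:k])
    return hits / len(relevant)

results = [["a", "b", "c"], ["x", "y", "z"], ["m", "n", "o"]]
relevant = ["b", "z", "q"]
print(recall_at_k(results, relevant, k=3))  # 2 of 3 queries hit → 0.666...
```

This simplified version assumes one relevant document per query; with multiple relevant documents, the hit count becomes the fraction of each query's relevant set found in the top k.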

References

  1. IBM - What is Agentic RAG?
  2. LangChain RAG Documentation