What is LlamaIndex?
LlamaIndex (formerly GPT Index) is a data framework for connecting large language models (LLMs) with your private or custom data. It provides tools to ingest, structure, and query data from various sources so you can build RAG (Retrieval-Augmented Generation) applications.
LLMs are trained mostly on public data. LlamaIndex gives them access to your documents, databases, APIs, and knowledge bases, which makes them useful for your specific use case.
Why Use LlamaIndex?
LlamaIndex solves the "data connection" problem:
- LLMs don't know your data: They can't access your company docs, databases, or files
- Context window limits: You can't paste entire knowledge bases into a prompt
- Data is messy: PDFs, Word docs, APIs, databases all need different handling
- Quality retrieval is hard: Finding the right information requires more than keyword search
LlamaIndex's Mission
Make it easy to build applications that leverage both the power of LLMs and your unique data.
Core Concepts
1. Documents & Nodes
Documents are your raw data (PDFs, text files, etc.). Nodes are chunks of documents that get indexed and searched:
from llama_index.core import Document
# A document is a container for text
doc = Document(text="LlamaIndex is a data framework for LLMs...")
# Documents get split into nodes for indexing
# Each node is a searchable chunk of the original document
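To make the document/node relationship concrete, here is a library-free sketch. The names `SimpleDocument`, `SimpleNode`, and `split_into_nodes` are hypothetical and are not LlamaIndex classes. The sketch only illustrates that each node is a chunk that keeps a pointer back to its source document and inherits its metadata.

```python
from dataclasses import dataclass, field
import uuid


@dataclass
class SimpleDocument:
    """Hypothetical stand-in for a document: raw text plus metadata."""
    text: str
    metadata: dict = field(default_factory=dict)
    doc_id: str = field(default_factory=lambda: str(uuid.uuid4()))


@dataclass
class SimpleNode:
    """Hypothetical stand-in for a node: one chunk of a document."""
    text: str
    ref_doc_id: str  # points back to the source document
    metadata: dict


def split_into_nodes(doc: SimpleDocument, sentences_per_node: int = 2) -> list[SimpleNode]:
    """Split on periods and group sentences into fixed-size chunks."""
    sentences = [s.strip() + "." for s in doc.text.split(".") if s.strip()]
    nodes = []
    for i in range(0, len(sentences), sentences_per_node):
        chunk = " ".join(sentences[i:i + sentences_per_node])
        nodes.append(SimpleNode(text=chunk, ref_doc_id=doc.doc_id, metadata=dict(doc.metadata)))
    return nodes


doc = SimpleDocument(
    text="Refunds take 5 days. Contact support first. Shipping is free. "
         "Returns need a receipt. Gift cards are final.",
    metadata={"source": "policy.txt"},
)
nodes = split_into_nodes(doc)
```

Because every node carries `ref_doc_id` and a copy of the metadata, a retrieved chunk can always be traced back to the file it came from.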
2. Indexes
Indexes organize your data for efficient retrieval:
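- VectorStoreIndex: Most common; embeds each node and retrieves the most similar nodes for semantic search
- SummaryIndex: Stores nodes as a sequential list and, by default, reads through all of them at query time (useful for summarization)
- TreeIndex: Builds a hierarchical tree of summaries over the nodes, suited to complex documents
- KeywordTableIndex: Maps keywords extracted from each node to the nodes that contain them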
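The idea behind a vector index can be sketched without any library: turn each chunk into a vector, then rank chunks by cosine similarity to the query vector. The bag-of-words `embed` function below is a deliberately crude assumption standing in for a real embedding model (it matches shared words, not meaning), and `ToyVectorIndex` is a hypothetical name, not the `VectorStoreIndex` implementation.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Crude stand-in for an embedding model: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


class ToyVectorIndex:
    """Hypothetical in-memory index that stores (text, vector) pairs."""

    def __init__(self, chunks: list[str]):
        self.entries = [(chunk, embed(chunk)) for chunk in chunks]

    def retrieve(self, query: str, top_k: int = 2) -> list[tuple[float, str]]:
        """Return the top_k (score, text) pairs, best match first."""
        q = embed(query)
        scored = [(cosine(q, vec), text) for text, vec in self.entries]
        return sorted(scored, key=lambda pair: pair[0], reverse=True)[:top_k]


toy_index = ToyVectorIndex([
    "Refunds are issued within 5 business days.",
    "Our office is closed on public holidays.",
    "Shipping is free for orders over 50 dollars.",
])
results = toy_index.retrieve("How long do refunds take?", top_k=1)
```

A real embedding model would also match paraphrases such as "money back", which this word-count version cannot do.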
3. Query Engines
Query engines let you ask questions about your data:
# Assumes `index` is an index you have already built
query_engine = index.as_query_engine()
response = query_engine.query("What is the company's refund policy?")
print(response)
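Conceptually, a query engine retrieves relevant chunks, packs them into a prompt, and asks an LLM to answer from that context. The sketch below illustrates this flow only. `keyword_retrieve`, `build_prompt`, and `fake_llm` are hypothetical helpers, and the stub LLM simply echoes the first context line rather than calling a model.

```python
from typing import Callable


def keyword_retrieve(query: str, chunks: list[str], top_k: int = 2) -> list[str]:
    """Toy retriever: rank chunks by how many query words they share."""
    words = set(query.lower().replace("?", "").split())

    def overlap(chunk: str) -> int:
        return len(words & set(chunk.lower().replace(".", "").split()))

    return sorted(chunks, key=overlap, reverse=True)[:top_k]


def build_prompt(query: str, context: list[str]) -> str:
    """Assemble the retrieved context and the question into one prompt."""
    context_block = "\n".join(f"- {c}" for c in context)
    return f"Context:\n{context_block}\n\nQuestion: {query}\nAnswer:"


def fake_llm(prompt: str) -> str:
    """Stub LLM (assumption): returns the first context line verbatim."""
    first_context_line = prompt.splitlines()[1]
    return first_context_line[2:]  # drop the leading "- "


def answer(query: str, chunks: list[str], llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to the LLM."""
    context = keyword_retrieve(query, chunks)
    return llm(build_prompt(query, context))


chunks = [
    "Employees get 20 vacation days per year.",
    "The refund policy allows returns within 30 days.",
    "Expenses must be filed monthly.",
]
reply = answer("What is the refund policy?", chunks, fake_llm)
```

Swapping `fake_llm` for a real model call and `keyword_retrieve` for embedding-based retrieval gives the basic shape of a RAG query.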
4. Chat Engines
For conversational interactions with memory:
chat_engine = index.as_chat_engine()
response = chat_engine.chat("Tell me about the product")
response = chat_engine.chat("What about pricing?") # Remembers context
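What "memory" means here can be sketched as a buffer of prior turns that is prepended to each new message, so a follow-up like "What about pricing?" is read in context. `ToyChatEngine` and its stub `counting_llm` are hypothetical. Real chat engines also retrieve from the index and may condense the history, which this sketch omits.

```python
from typing import Callable


class ToyChatEngine:
    """Hypothetical chat loop that keeps a rolling message history."""

    def __init__(self, llm: Callable[[str], str], max_turns: int = 5):
        self.llm = llm
        self.max_turns = max_turns  # number of user/assistant pairs to keep
        self.history: list[tuple[str, str]] = []

    def chat(self, message: str) -> str:
        # Prepend earlier turns so the model sees the conversation so far.
        lines = [f"{role}: {text}" for role, text in self.history]
        lines.append(f"user: {message}")
        reply = self.llm("\n".join(lines))
        self.history += [("user", message), ("assistant", reply)]
        # Trim to the most recent max_turns pairs.
        self.history = self.history[-2 * self.max_turns:]
        return reply


def counting_llm(prompt: str) -> str:
    """Stub LLM (assumption): reports how many lines of context it saw."""
    return f"saw {len(prompt.splitlines())} lines"


engine = ToyChatEngine(counting_llm)
first = engine.chat("Tell me about the product")
second = engine.chat("What about pricing?")
```

The second call sees three lines (the first question, the first reply, and the follow-up), which is what lets a model resolve "pricing" to the product discussed earlier.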
Getting Started
Installation
pip install llama-index
Basic RAG in 5 Lines
from llama_index.core import VectorStoreIndex, SimpleDirectoryReader
# 1. Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()
# 2. Create an index (automatically chunks and embeds; uses OpenAI models by default, so OPENAI_API_KEY must be set)
index = VectorStoreIndex.from_documents(documents)
# 3. Query your data
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these documents?")
print(response)
Data Loaders
LlamaIndex supports many data sources through "readers":
Local Files
from llama_index.core import SimpleDirectoryReader
# Load all files from a directory
documents = SimpleDirectoryReader(
    input_dir="./documents",
    recursive=True,  # Include subdirectories
).load_data()
# Supports: PDF, DOCX, TXT, MD, CSV, and more
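What a directory reader does can be approximated with the standard library: walk a folder, optionally recurse, and wrap each file's text with its path as metadata. `load_text_files` is a hypothetical helper that only handles plain-text formats. It does not parse PDF or DOCX the way `SimpleDirectoryReader` does.

```python
from pathlib import Path
import tempfile


def load_text_files(input_dir: str, recursive: bool = False, exts=(".txt", ".md")) -> list[dict]:
    """Read matching files and return dicts of text plus source metadata."""
    root = Path(input_dir)
    paths = root.rglob("*") if recursive else root.glob("*")
    docs = []
    for path in sorted(paths):
        if path.is_file() and path.suffix.lower() in exts:
            docs.append({
                "text": path.read_text(encoding="utf-8"),
                "metadata": {"file_name": path.name, "file_path": str(path)},
            })
    return docs


# Demo on a throwaway directory so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("top-level note", encoding="utf-8")
    Path(tmp, "sub").mkdir()
    Path(tmp, "sub", "b.md").write_text("nested note", encoding="utf-8")
    Path(tmp, "skip.bin").write_bytes(b"\x00\x01")
    flat = load_text_files(tmp)
    nested = load_text_files(tmp, recursive=True)
```

With `recursive=False` only the top-level file is loaded. Turning recursion on also picks up the file in the subdirectory, while the unsupported `.bin` file is skipped in both cases.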
Web Pages
# Requires: pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader
documents = SimpleWebPageReader().load_data([
    "https://example.com/page1",
    "https://example.com/page2",
])
Databases
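# Requires: pip install llama-index-readers-database
from llama_index.readers.database import DatabaseReader
# Replace the placeholder credentials with your own connection string
reader = DatabaseReader(uri="postgresql://user:pass@localhost/db")
documents = reader.load_data(query="SELECT * FROM articles")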
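The row-to-document step can be illustrated with the standard library's `sqlite3` module. This is a sketch of the idea, not `DatabaseReader`'s implementation. The `articles` table, its columns, and the `rows_to_documents` helper are assumptions for illustration.

```python
import sqlite3


def rows_to_documents(conn: sqlite3.Connection, query: str) -> list[str]:
    """Run a query and render each row as 'column: value' text."""
    cursor = conn.execute(query)
    columns = [col[0] for col in cursor.description]
    return [
        ", ".join(f"{name}: {value}" for name, value in zip(columns, row))
        for row in cursor.fetchall()
    ]


# In-memory database with a hypothetical 'articles' table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE articles (id INTEGER, title TEXT)")
conn.executemany(
    "INSERT INTO articles VALUES (?, ?)",
    [(1, "Intro to RAG"), (2, "Chunking tips")],
)
db_docs = rows_to_documents(conn, "SELECT id, title FROM articles ORDER BY id")
conn.close()
```

Selecting only the columns you need, rather than `SELECT *`, keeps each generated document focused on text that is worth embedding.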
APIs & More
LlamaHub has 100+ data loaders for Notion, Slack, Google Docs, GitHub, and more.
Customizing Your Pipeline
Custom Chunking
from llama_index.core.node_parser import SentenceSplitter
# Control how documents are split
splitter = SentenceSplitter(
    chunk_size=512,  # Maximum tokens per chunk
    chunk_overlap=50,  # Tokens shared between adjacent chunks
)
nodes = splitter.get_nodes_from_documents(documents)
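The interaction between `chunk_size` and `chunk_overlap` is easy to see in a simplified sliding-window chunker. This sketch treats whitespace-separated words as tokens, which is an assumption. `SentenceSplitter` uses a real tokenizer and also tries to respect sentence boundaries.

```python
def chunk_tokens(text: str, chunk_size: int, chunk_overlap: int) -> list[str]:
    """Split text into windows of chunk_size words that share chunk_overlap words."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and smaller than chunk_size")
    tokens = text.split()
    step = chunk_size - chunk_overlap  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(" ".join(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


sample = " ".join(f"w{i}" for i in range(10))  # "w0 w1 ... w9"
windows = chunk_tokens(sample, chunk_size=4, chunk_overlap=1)
```

Each window starts `chunk_size - chunk_overlap` words after the previous one, so the last word of one chunk reappears as the first word of the next.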
Custom Embeddings
from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
# Requires: pip install llama-index-embeddings-huggingface
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
# Option A: OpenAI embeddings (needs OPENAI_API_KEY)
embed_model = OpenAIEmbedding(model="text-embedding-3-small")
# Option B: free local embeddings (this assignment replaces Option A; keep only one)
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")
# Use in index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model,
)
Custom LLM
from llama_index.llms.openai import OpenAI
# Requires: pip install llama-index-llms-anthropic
from llama_index.llms.anthropic import Anthropic
# Option A: GPT-4
llm = OpenAI(model="gpt-4", temperature=0)
# Option B: Claude (this assignment replaces Option A; keep only one)
llm = Anthropic(model="claude-3-sonnet-20240229")
# Use in query engine
query_engine = index.as_query_engine(llm=llm)
Common Use Cases
Document Q&A
Ask questions about PDFs, contracts, manuals, and reports.
Knowledge Bases
Build searchable knowledge bases from company documentation.
Customer Support
Chatbots that answer based on help docs and FAQs.
Research Assistants
Query and synthesize information from research papers.
Code Documentation
Query codebases and technical documentation.
Legal & Compliance
Search contracts, regulations, and legal documents.
Advanced Features
Agents
LlamaIndex supports building agents that can reason over your data:
from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool
# Assumes `query_engine` and `llm` were created as in the earlier sections
# Create tools from query engines
tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_docs",
    description="Search company documentation",
)
# Create agent (ReActAgent.from_tools is the older agent API; newer releases may differ, so check the docs for your installed version)
agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)
response = agent.chat("Find the vacation policy and summarize it")
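The reason-act-observe loop behind a ReAct-style agent can be sketched in plain Python. Everything here is illustrative: `scripted_llm` replays fixed outputs instead of calling a model, the `Action: tool[input]` text format is an assumption of this sketch, and `run_agent` is not LlamaIndex's agent implementation.

```python
import re
from typing import Callable


def run_agent(llm: Callable[[str], str], tools: dict[str, Callable[[str], str]],
              task: str, max_steps: int = 5) -> str:
    """Loop: ask the LLM, run any requested tool, feed the observation back."""
    transcript = f"Task: {task}"
    for _ in range(max_steps):
        output = llm(transcript)
        if output.startswith("Final Answer:"):
            return output[len("Final Answer:"):].strip()
        match = re.match(r"Action: (\w+)\[(.*)\]", output)
        if match is None or match.group(1) not in tools:
            transcript += f"\n{output}\nObservation: unknown action"
            continue
        observation = tools[match.group(1)](match.group(2))
        transcript += f"\n{output}\nObservation: {observation}"
    return "Stopped: step limit reached"


def company_docs(query: str) -> str:
    """Hypothetical tool standing in for a query engine."""
    return "Employees receive 20 vacation days per year."


def scripted_llm(transcript: str) -> str:
    """Stub LLM (assumption): acts once, then answers from the observation."""
    if "Observation:" not in transcript:
        return "Action: company_docs[vacation policy]"
    observation = transcript.rsplit("Observation: ", 1)[1]
    return f"Final Answer: {observation}"


result = run_agent(scripted_llm, {"company_docs": company_docs}, "Summarize the vacation policy")
```

The `max_steps` cap matters in practice: without it, a model that never emits a final answer would loop indefinitely.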
Multi-Document Queries
Query across multiple document collections:
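from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool
# Assumes hr_query_engine and finance_query_engine were built from separate indexes
# Combine multiple query engines (descriptions help route each sub-question to the right source)
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool.from_defaults(hr_query_engine, name="hr_docs", description="HR policies"),
        QueryEngineTool.from_defaults(finance_query_engine, name="finance_docs", description="Finance policies"),
    ]
)
# Questions are broken into sub-questions for each source
response = query_engine.query(
    "Compare the vacation policy with the expense policy"
)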
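The flow can be pictured as: generate one sub-question per relevant source, answer each with that source's engine, then combine the answers. In the sketch below the decomposition is hard-coded where a real engine would ask an LLM, and the two engine functions are hypothetical stubs.

```python
from typing import Callable

Engine = Callable[[str], str]


def hr_engine(question: str) -> str:
    """Hypothetical HR query engine stub."""
    return "Vacation: 20 days per year."


def finance_engine(question: str) -> str:
    """Hypothetical finance query engine stub."""
    return "Expenses: file within 30 days."


def decompose(question: str, tool_names: list[str]) -> list[tuple[str, str]]:
    """Stand-in for LLM decomposition: one sub-question per tool."""
    return [(name, f"[{name}] {question}") for name in tool_names]


def sub_question_query(question: str, engines: dict[str, Engine]) -> str:
    """Route each sub-question to its engine and join the answers."""
    sub_questions = decompose(question, list(engines))
    answers = [engines[name](sub_q) for name, sub_q in sub_questions]
    return " ".join(answers)  # a real engine would synthesize with an LLM


combined = sub_question_query(
    "Compare the vacation policy with the expense policy",
    {"hr_docs": hr_engine, "finance_docs": finance_engine},
)
```

The final join is the weakest part of this sketch: comparing two policies needs a synthesis step over the collected answers, which is where an LLM call would go.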
LlamaIndex vs LangChain
| Aspect | LlamaIndex | LangChain |
|---|---|---|
| Focus | Data connection & RAG | General LLM orchestration |
| Best for | Document Q&A, knowledge bases | Agents, chains, diverse tasks |
| Data ingestion | Excellent (100+ loaders) | Good |
| Indexing | Very sophisticated | Basic |
| Learning curve | Moderate | Moderate |
Tip: Many projects use both: LlamaIndex for data handling and LangChain for agent orchestration.
Best Practices
- Chunk size matters: Experiment with 256-1024 tokens; smaller for precise retrieval, larger for context
- Use overlap: 10-20% overlap prevents cutting important information
- Persist your index: Save to disk or a vector database for production
- Add metadata: Include source, date, category for better filtering
- Evaluate retrieval: Test that the right chunks are being retrieved
- Use appropriate models: Fast embeddings for indexing, powerful LLMs for answering
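Retrieval quality can be checked with a small labeled set of queries and the chunk each one should retrieve. The sketch below computes hit rate and mean reciprocal rank (MRR). The chunk ids and rankings are made-up illustration data, not measurements, and `hit_rate_and_mrr` is a hypothetical helper.

```python
def hit_rate_and_mrr(retrieved: dict[str, list[str]], expected: dict[str, str]) -> tuple[float, float]:
    """Return (hit rate, MRR) from ranked chunk ids per query and the expected id."""
    if not expected:
        return 0.0, 0.0
    hits, reciprocal_ranks = 0, 0.0
    for query, expected_id in expected.items():
        ranked = retrieved.get(query, [])
        if expected_id in ranked:
            hits += 1
            reciprocal_ranks += 1.0 / (ranked.index(expected_id) + 1)
    n = len(expected)
    return hits / n, reciprocal_ranks / n


# Made-up example: ranked chunk ids returned for each test query.
retrieved = {
    "refund policy?": ["c2", "c7", "c1"],  # expected chunk ranked 1st
    "vacation days?": ["c4", "c3", "c9"],  # expected chunk ranked 2nd
    "office hours?": ["c5", "c6", "c8"],   # expected chunk missing
}
expected = {"refund policy?": "c2", "vacation days?": "c3", "office hours?": "c0"}
hit_rate, mrr = hit_rate_and_mrr(retrieved, expected)
```

Hit rate says whether the right chunk shows up at all, while MRR also rewards ranking it near the top. Tracking both while varying chunk size and overlap gives a concrete basis for tuning.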