What is LlamaIndex?

LlamaIndex (formerly GPT Index) is a data framework that connects Large Language Models (LLMs) to your private or custom data. It provides tools to ingest, structure, and query data from many sources, so you can build RAG (Retrieval-Augmented Generation) applications on top of it.

While LLMs are trained on public data, LlamaIndex helps you give them access to your documents, databases, APIs, and knowledge bases - making them useful for your specific use case.

Why Use LlamaIndex?

LlamaIndex solves the "data connection" problem:

  • LLMs don't know your data: They can't access your company docs, databases, or files
  • Context window limits: You can't paste entire knowledge bases into a prompt
  • Data is messy: PDFs, Word docs, APIs, databases all need different handling
  • Quality retrieval is hard: Finding the right information requires more than keyword search

LlamaIndex's Mission

Make it easy to build applications that leverage both the power of LLMs and your unique data.

Core Concepts

1. Documents & Nodes

Documents are your raw data (PDFs, text files, etc.). Nodes are chunks of documents that get indexed and searched:

from llama_index.core import Document

# A document is a container for text
doc = Document(text="LlamaIndex is a data framework for LLMs...")

# Documents get split into nodes for indexing
# Each node is a searchable chunk of the original document
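
To make the document-to-node relationship concrete, here is a minimal pure-Python sketch. The SketchDocument and SketchNode classes and the blank-line splitting rule are hypothetical stand-ins chosen for illustration; they are not LlamaIndex's own classes or its actual splitting logic.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class SketchDocument:
    """Hypothetical stand-in for a document: raw text plus metadata."""
    doc_id: str
    text: str
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class SketchNode:
    """Hypothetical stand-in for a node: one chunk that remembers its source."""
    text: str
    ref_doc_id: str
    metadata: Dict[str, str]


def split_into_nodes(doc: SketchDocument) -> List[SketchNode]:
    """Split on blank lines; each non-empty paragraph becomes one node."""
    paragraphs = [p.strip() for p in doc.text.split("\n\n") if p.strip()]
    return [
        SketchNode(text=p, ref_doc_id=doc.doc_id, metadata=dict(doc.metadata))
        for p in paragraphs
    ]


# Made-up sample text, used only for the demo.
doc = SketchDocument(
    doc_id="handbook",
    text="Refunds are issued within 30 days.\n\nShipping takes 3 to 5 days.",
    metadata={"source": "handbook.txt"},
)
nodes = split_into_nodes(doc)
for node in nodes:
    print(node.ref_doc_id, "|", node.text)
```

Each node keeps a reference to its parent document and a copy of its metadata, which is what lets a retrieved chunk be traced back to its source.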

2. Indexes

Indexes organize your data for efficient retrieval:

  • VectorStoreIndex: Most common - uses embeddings for semantic search
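  • SummaryIndex: Stores nodes as a simple list and reads through all of them at query time; suited to summarization
  • TreeIndex: Builds a hierarchy of summaries over the nodes; suited to long or complex documents
  • KeywordTableIndex: Maps extracted keywords to the nodes that contain them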
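
The idea behind a vector index can be sketched in a few lines of plain Python: turn every chunk and the query into a vector, then rank chunks by cosine similarity. The toy_embed function below is an assumption made for illustration. It only counts words, whereas a real embedding model produces dense vectors that capture meaning, so treat this as a sketch of the ranking step rather than of semantic search itself.

```python
import math
import re
from collections import Counter
from typing import Dict, List, Tuple


def toy_embed(text: str) -> Dict[str, int]:
    """Toy 'embedding': lowercase word counts (a real model returns dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Dict[str, int], b: Dict[str, int]) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and return the k best."""
    query_vec = toy_embed(query)
    scored = [(cosine(query_vec, toy_embed(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


# Made-up sample chunks, used only for the demo.
chunks = [
    "Refunds are issued within 30 days of purchase.",
    "Shipping takes 3 to 5 business days.",
    "Our office is closed on public holidays.",
]
results = top_k("How many days until refunds are issued?", chunks, k=2)
for score, chunk in results:
    print(f"{score:.2f}  {chunk}")
```

The refund chunk ranks first because it shares the most terms with the query. With a real embedding model the same ranking step also works for paraphrases that share no words at all.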

3. Query Engines

Query engines let you ask questions about your data:

query_engine = index.as_query_engine()
response = query_engine.query("What is the company's refund policy?")
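
Conceptually, a query engine retrieves relevant chunks, places them in a prompt, and asks an LLM to answer from that context. The sketch below illustrates that flow under stated assumptions: retrieve, build_prompt, sketch_query, and echo_llm are hypothetical helpers, the retriever is a crude word-overlap ranker, and the "LLM" is a fake that simply returns the first context line. It is not how LlamaIndex implements its query engines.

```python
from typing import Callable, List


def retrieve(question: str, chunks: List[str], k: int = 1) -> List[str]:
    """Rank chunks by shared words with the question (stand-in for a real retriever)."""
    words = set(question.lower().replace("?", "").split())
    ranked = sorted(
        chunks,
        key=lambda c: len(words & set(c.lower().replace(".", "").split())),
        reverse=True,
    )
    return ranked[:k]


def build_prompt(question: str, context: List[str]) -> str:
    """Place the retrieved chunks in front of the question."""
    context_block = "\n".join(f"- {c}" for c in context)
    return (
        "Answer using only the context below.\n"
        f"Context:\n{context_block}\n"
        f"Question: {question}\n"
        "Answer:"
    )


def sketch_query(question: str, chunks: List[str], llm: Callable[[str], str]) -> str:
    """Retrieve, build the prompt, and hand it to the LLM callable."""
    return llm(build_prompt(question, retrieve(question, chunks)))


def echo_llm(prompt: str) -> str:
    """Fake LLM for the demo: returns the first context line instead of generating text."""
    for line in prompt.splitlines():
        if line.startswith("- "):
            return line[2:]
    return "I don't know."


# Made-up sample chunks, used only for the demo.
chunks = [
    "Refunds are available within 30 days of purchase.",
    "Support is open Monday to Friday.",
]
answer = sketch_query("When are refunds available?", chunks, echo_llm)
print(answer)
```

Swapping echo_llm for a function that calls a real model is the only change needed to turn the sketch into a working, if naive, RAG loop.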

4. Chat Engines

For conversational interactions with memory:

chat_engine = index.as_chat_engine()
response = chat_engine.chat("Tell me about the product")
response = chat_engine.chat("What about pricing?")  # Remembers context
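
What "remembers context" means can be shown with a small sketch: earlier turns are kept in a history buffer and prepended to each new message. SketchChatEngine, fake_answer, and the max_turns parameter are hypothetical names chosen for this illustration, and the fake model only counts the user turns it can see. LlamaIndex's own chat engines are more involved.

```python
from typing import Callable, List, Tuple


class SketchChatEngine:
    """Hypothetical illustration of chat memory; not LlamaIndex's implementation."""

    def __init__(self, answer_fn: Callable[[str], str], max_turns: int = 5):
        self.answer_fn = answer_fn   # stand-in for retrieval plus an LLM call
        self.max_turns = max_turns   # keep only the most recent turns
        self.history: List[Tuple[str, str]] = []

    def chat(self, message: str) -> str:
        # Prepend earlier turns so a follow-up is answered in context.
        transcript = "".join(
            f"User: {q}\nAssistant: {a}\n" for q, a in self.history
        )
        reply = self.answer_fn(transcript + f"User: {message}\n")
        self.history.append((message, reply))
        self.history = self.history[-self.max_turns:]
        return reply


def fake_answer(prompt: str) -> str:
    """Fake model: reports how many user turns it can see in the prompt."""
    return f"I can see {prompt.count('User:')} user message(s)."


engine = SketchChatEngine(fake_answer, max_turns=5)
print(engine.chat("Tell me about the product"))
print(engine.chat("What about pricing?"))
```

Capping the history at max_turns is one simple way to keep the prompt within the context window; summarizing older turns is a common alternative.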

Getting Started

Installation

pip install llama-index

Basic RAG in 5 Lines

from llama_index.core import VectorStoreIndex, SimpleDirectoryReader

# 1. Load documents from a directory
documents = SimpleDirectoryReader("./data").load_data()

# 2. Create an index (automatically chunks and embeds)
index = VectorStoreIndex.from_documents(documents)

# 3. Query your data
query_engine = index.as_query_engine()
response = query_engine.query("What is the main topic of these documents?")

print(response)

Data Loaders

LlamaIndex supports many data sources through "readers":

Local Files

from llama_index.core import SimpleDirectoryReader

# Load all files from a directory
documents = SimpleDirectoryReader(
    input_dir="./documents",
    recursive=True  # Include subdirectories
).load_data()

# Supports: PDF, DOCX, TXT, MD, CSV, and more

Web Pages

# Requires the separate reader package: pip install llama-index-readers-web
from llama_index.readers.web import SimpleWebPageReader

documents = SimpleWebPageReader().load_data([
    "https://example.com/page1",
    "https://example.com/page2"
])

Databases

# Requires the separate reader package: pip install llama-index-readers-database
from llama_index.readers.database import DatabaseReader

reader = DatabaseReader(uri="postgresql://user:pass@localhost/db")
documents = reader.load_data(query="SELECT * FROM articles")

APIs & More

LlamaHub, the LlamaIndex integration registry, has 100+ data loaders for Notion, Slack, Google Docs, GitHub, and more.

Customizing Your Pipeline

Custom Chunking

from llama_index.core.node_parser import SentenceSplitter

# Control how documents are split
splitter = SentenceSplitter(
    chunk_size=512,      # Tokens per chunk
    chunk_overlap=50     # Overlap between chunks
)

nodes = splitter.get_nodes_from_documents(documents)
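
The effect of chunk_size and chunk_overlap is easiest to see with a bare sliding window. The function below is a simplified sketch, assuming whitespace-separated words as "tokens". SentenceSplitter additionally tries to respect sentence boundaries and uses a real tokenizer, so its output will differ.

```python
from typing import List


def chunk_with_overlap(
    tokens: List[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Slide a window of chunk_size tokens, stepping by chunk_size - chunk_overlap."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be at least 0 and less than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        if start + chunk_size >= len(tokens):
            break  # this window already reached the end of the text
    return chunks


tokens = "one two three four five six seven eight nine ten".split()
windows = chunk_with_overlap(tokens, chunk_size=4, chunk_overlap=1)
for window in windows:
    print(window)
```

With an overlap of 1, the last token of each chunk is repeated as the first token of the next, so a fact that straddles a boundary still appears whole in at least one chunk when the overlap is large enough.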

Custom Embeddings

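from llama_index.core import VectorStoreIndex
from llama_index.embeddings.openai import OpenAIEmbedding
# Requires: pip install llama-index-embeddings-huggingface
from llama_index.embeddings.huggingface import HuggingFaceEmbedding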

# OpenAI embeddings
embed_model = OpenAIEmbedding(model="text-embedding-3-small")

# Or free local embeddings
embed_model = HuggingFaceEmbedding(model_name="BAAI/bge-small-en-v1.5")

# Use in index
index = VectorStoreIndex.from_documents(
    documents,
    embed_model=embed_model
)

Custom LLM

from llama_index.llms.openai import OpenAI
# Requires: pip install llama-index-llms-anthropic
from llama_index.llms.anthropic import Anthropic

# Use GPT-4
llm = OpenAI(model="gpt-4", temperature=0)

# Or Claude
llm = Anthropic(model="claude-3-sonnet-20240229")

# Use in query engine
query_engine = index.as_query_engine(llm=llm)

Common Use Cases

Document Q&A

Ask questions about PDFs, contracts, manuals, and reports.

Knowledge Bases

Build searchable knowledge bases from company documentation.

Customer Support

Chatbots that answer based on help docs and FAQs.

Research Assistants

Query and synthesize information from research papers.

Code Documentation

Query codebases and technical documentation.

Legal & Compliance

Search contracts, regulations, and legal documents.

Advanced Features

Agents

LlamaIndex supports building agents that can reason over your data:

from llama_index.core.agent import ReActAgent
from llama_index.core.tools import QueryEngineTool

# Create tools from query engines
tool = QueryEngineTool.from_defaults(
    query_engine=query_engine,
    name="company_docs",
    description="Search company documentation"
)

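# Create agent
# Note: ReActAgent.from_tools is the classic agent interface. Newer releases
# favor the workflow-based agents in llama_index.core.agent.workflow, so check
# the documentation for your installed version.
agent = ReActAgent.from_tools([tool], llm=llm, verbose=True)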

response = agent.chat("Find the vacation policy and summarize it")

Multi-Document Queries

Query across multiple document collections:

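from llama_index.core.query_engine import SubQuestionQueryEngine
from llama_index.core.tools import QueryEngineTool

# Combine multiple query engines
# (hr_query_engine and finance_query_engine are query engines built earlier,
# one per document collection; descriptions help route each sub-question)
query_engine = SubQuestionQueryEngine.from_defaults(
    query_engine_tools=[
        QueryEngineTool.from_defaults(
            query_engine=hr_query_engine,
            name="hr_docs",
            description="HR policies and procedures",
        ),
        QueryEngineTool.from_defaults(
            query_engine=finance_query_engine,
            name="finance_docs",
            description="Finance and expense policies",
        ),
    ]
)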

# Questions are broken into sub-questions for each source
response = query_engine.query(
    "Compare the vacation policy with the expense policy"
)
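
The routing step behind sub-question querying can be sketched without any LLM. In the illustration below, the sub-questions are hard-coded and the two engines are fake callables returning made-up answers. In a real system an LLM generates the sub-questions and a final LLM call synthesizes the partial answers. All names here are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Each "engine" is a stand-in callable: question in, answer out.
Engine = Callable[[str], str]


def answer_sub_questions(
    sub_questions: List[Tuple[str, str]], engines: Dict[str, Engine]
) -> List[str]:
    """Route each (tool_name, sub_question) pair to the matching engine."""
    answers = []
    for tool_name, sub_question in sub_questions:
        answers.append(f"[{tool_name}] {engines[tool_name](sub_question)}")
    return answers


# Fake engines with made-up answers, used only for the demo.
engines: Dict[str, Engine] = {
    "hr_docs": lambda q: "Employees get 20 vacation days.",
    "finance_docs": lambda q: "Expenses need receipts over $25.",
}

# Hard-coded here; a real system would ask an LLM to produce these.
sub_questions = [
    ("hr_docs", "What is the vacation policy?"),
    ("finance_docs", "What is the expense policy?"),
]

partial_answers = answer_sub_questions(sub_questions, engines)
print("\n".join(partial_answers))
```

The labeled partial answers are what a synthesis step would receive as context when composing the final comparison.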

LlamaIndex vs LangChain

Aspect         | LlamaIndex                    | LangChain
Focus          | Data connection & RAG         | General LLM orchestration
Best for       | Document Q&A, knowledge bases | Agents, chains, diverse tasks
Data ingestion | Excellent (100+ loaders)      | Good
Indexing       | Very sophisticated            | Basic
Learning curve | Moderate                      | Moderate

Tip: Many projects use both! LlamaIndex for data handling, LangChain for agent orchestration.

Best Practices

  • Chunk size matters: Experiment with 256-1024 tokens; smaller for precise retrieval, larger for context
  • Use overlap: 10-20% overlap reduces the chance of splitting important information across a chunk boundary
  • Persist your index: Save to disk or a vector database for production
  • Add metadata: Include source, date, category for better filtering
  • Evaluate retrieval: Test that the right chunks are being retrieved
  • Use appropriate models: Fast embeddings for indexing, powerful LLMs for answering
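
For the "evaluate retrieval" practice, two common measures are hit rate (was the expected chunk retrieved at all?) and mean reciprocal rank (how high did it rank?). The sketch below computes both in plain Python. The chunk IDs and questions are made up, and the sketch assumes you have already collected the retrieved IDs for each test question from your own retriever.

```python
from typing import Dict, List


def hit_rate(retrieved: Dict[str, List[str]], expected: Dict[str, str]) -> float:
    """Fraction of questions whose expected chunk appears in the retrieved list."""
    hits = sum(1 for q, gold in expected.items() if gold in retrieved.get(q, []))
    return hits / len(expected)


def mean_reciprocal_rank(
    retrieved: Dict[str, List[str]], expected: Dict[str, str]
) -> float:
    """Average of 1/rank of the expected chunk (0 when it was not retrieved)."""
    total = 0.0
    for q, gold in expected.items():
        ids = retrieved.get(q, [])
        if gold in ids:
            total += 1.0 / (ids.index(gold) + 1)
    return total / len(expected)


# Made-up evaluation set: question -> ID of the chunk that should be found.
expected = {
    "refund window?": "chunk_refunds",
    "shipping time?": "chunk_shipping",
}
# Made-up retriever output: question -> ranked chunk IDs.
retrieved = {
    "refund window?": ["chunk_refunds", "chunk_hours"],
    "shipping time?": ["chunk_hours", "chunk_shipping"],
}

print(hit_rate(retrieved, expected))              # 1.0
print(mean_reciprocal_rank(retrieved, expected))  # 0.75
```

Tracking these numbers while varying chunk size, overlap, or the embedding model gives a concrete basis for the tuning advice above.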
