What are Embeddings?
Embeddings are numerical representations of text (or other data) as vectors - lists of numbers that capture the meaning and context of the content. Think of them as a way to convert human language into a format that computers can understand and compare mathematically.
For example, the sentence "I love programming" might become a vector like [0.23, -0.45, 0.78, 0.12, ...] with hundreds or thousands of dimensions. Sentences with similar meanings map to vectors that lie close together, which lets software measure how related two pieces of text are.
Why Do Embeddings Exist?
Computers work with numbers, not words. Traditional approaches treated words as isolated symbols - "cat" and "kitten" had no mathematical relationship. Embeddings solve this by placing related concepts close together in a high-dimensional space.
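A quick way to see the difference is to compare one-hot vectors (one dimension per word, a classic "isolated symbol" encoding) with dense vectors. The dense values below are hand-picked toy numbers for illustration, not output from a real model.

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# One-hot encoding: each word gets its own dimension, so distinct words never overlap
vocab = ["cat", "kitten", "car"]
one_hot = {word: np.eye(len(vocab))[i] for i, word in enumerate(vocab)}

# Hand-picked toy dense vectors (illustrative values, not from a real model)
dense = {
    "cat": np.array([0.9, 0.8, 0.1]),
    "kitten": np.array([0.85, 0.9, 0.15]),
    "car": np.array([0.1, 0.05, 0.95]),
}

for name, vectors in [("one-hot", one_hot), ("dense", dense)]:
    cat_kitten = cosine_similarity(vectors["cat"], vectors["kitten"])
    cat_car = cosine_similarity(vectors["cat"], vectors["car"])
    print(f"{name}: cat~kitten={cat_kitten:.2f}, cat~car={cat_car:.2f}")
```

With one-hot vectors every pair of different words scores exactly 0, so "cat" is no closer to "kitten" than to "car". The dense vectors can express that relationship.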
The Key Insight
Words with similar meanings have similar embeddings. This enables semantic search - finding content by meaning rather than exact keyword matches.
Purely keyword-based search finds documents that contain the query's words (sometimes after stemming or synonym expansion). With embeddings, a search for "how to fix a bug" can also surface documents about "debugging code" or "troubleshooting errors", because their meanings are similar even though they share few words.
How Do Embeddings Work?
Embedding models are neural networks trained on massive amounts of text. They learn patterns like:
- Synonyms: "happy" and "joyful" are close together
- Relationships: king - man + woman ≈ queen (an effect famously observed in word-level embeddings such as word2vec)
- Context: "bank" (financial) vs "bank" (river) get different embeddings depending on the surrounding words
- Topics: programming-related terms tend to cluster together
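The analogy pattern can be sketched as vector arithmetic followed by a nearest-neighbour lookup. The 3-dimensional vectors here are constructed by hand so the effect is easy to see; real models learn hundreds of dimensions, and the analogy only holds approximately.

```python
import numpy as np

# Hand-picked 3-D toy vectors (illustrative only, not from a trained model).
# Axes loosely mean: [royalty, maleness, femaleness]
vectors = {
    "king": np.array([0.9, 0.9, 0.1]),
    "queen": np.array([0.9, 0.1, 0.9]),
    "man": np.array([0.1, 0.9, 0.1]),
    "woman": np.array([0.1, 0.1, 0.9]),
    "prince": np.array([0.7, 0.9, 0.1]),
    "apple": np.array([0.05, 0.3, 0.3]),
}

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

def analogy(a, b, c, vectors):
    """Return the word closest to vec(a) - vec(b) + vec(c), excluding the input words."""
    target = vectors[a] - vectors[b] + vectors[c]
    candidates = {w: v for w, v in vectors.items() if w not in (a, b, c)}
    return max(candidates, key=lambda w: cosine_similarity(candidates[w], target))

print(analogy("king", "man", "woman", vectors))  # queen
```

Excluding the input words from the candidates is a common convention in analogy evaluations, since the result vector is often still closest to one of them.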
The Process
# 1. Input text
text = "Machine learning is fascinating"
# 2. Tokenize (break into pieces; real tokenizers often split words into subword pieces)
tokens = ["Machine", "learning", "is", "fascinating"]  # simplified illustration
# 3. Pass through embedding model
# The model processes all tokens and their relationships, producing one vector per token
# 4. Output: the token vectors are pooled into a single vector representing the meaning
embedding = [0.023, -0.156, 0.892, 0.045, ...]  # 1536 dimensions for OpenAI's text-embedding-3-small
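Step 4 hides an important detail. A transformer produces one vector per token, and a sentence-embedding model then combines them into one vector. Mean pooling (averaging the token vectors while skipping padding) is one common strategy; the all-MiniLM-L6-v2 model used later in this article works this way, while other models may use a different method. The token vectors below are toy numbers.

```python
import numpy as np

def mean_pool(token_vectors, attention_mask):
    """Average the token vectors, ignoring padding positions (mask == 0)."""
    mask = attention_mask[:, None].astype(float)   # shape (num_tokens, 1)
    summed = (token_vectors * mask).sum(axis=0)     # sum of real-token vectors only
    count = max(mask.sum(), 1e-9)                   # avoid division by zero
    return summed / count

# Toy per-token outputs: 4 real tokens + 1 padding token, 3 dimensions each
token_vectors = np.array([
    [0.2, 0.4, 0.0],
    [0.6, 0.0, 0.2],
    [0.0, 0.2, 0.4],
    [0.4, 0.2, 0.2],
    [9.9, 9.9, 9.9],   # padding position; must not affect the result
])
attention_mask = np.array([1, 1, 1, 1, 0])

sentence_vector = mean_pool(token_vectors, attention_mask)
print(sentence_vector)
```

The attention mask matters when sentences are batched: shorter inputs are padded, and averaging those padding positions in would distort the embedding.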
When to Use Embeddings
Semantic Search
Find documents by meaning, not just keywords. Essential for knowledge bases and documentation search.
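At its core, semantic search ranks stored document vectors by their similarity to the query vector. This sketch uses hypothetical 3-D vectors in place of real model output, and brute-force scoring in place of a vector database.

```python
import numpy as np

def top_k(query_vec, doc_vecs, k=2):
    """Return (index, score) pairs for the k documents most similar to the query."""
    doc_vecs = np.asarray(doc_vecs, dtype=float)
    query_vec = np.asarray(query_vec, dtype=float)
    # Cosine similarity of the query against every document at once
    scores = doc_vecs @ query_vec / (np.linalg.norm(doc_vecs, axis=1) * np.linalg.norm(query_vec))
    order = np.argsort(-scores)[:k]  # highest scores first
    return [(int(i), float(scores[i])) for i in order]

# Toy vectors standing in for real document embeddings (illustrative values)
docs = ["Debugging code step by step", "Troubleshooting runtime errors", "Best pasta recipes"]
doc_vecs = [[0.9, 0.1, 0.0], [0.8, 0.3, 0.1], [0.0, 0.1, 0.95]]
query_vec = [0.85, 0.2, 0.05]  # pretend embedding of "how to fix a bug"

results = top_k(query_vec, doc_vecs, k=2)
for idx, score in results:
    print(f"{score:.2f}  {docs[idx]}")
```

Vector databases do the same ranking but use approximate nearest-neighbour indexes so it stays fast with millions of vectors.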
RAG Systems
Retrieve relevant context for LLMs to generate accurate, grounded responses.
Recommendation Systems
Find similar products, articles, or content based on semantic similarity.
Clustering & Classification
Group similar documents or categorize content automatically.
Duplicate Detection
Find near-duplicate content even when wording differs significantly.
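One simple way to flag near-duplicates is to compute all pairwise cosine similarities and keep pairs above a threshold. The vectors and the 0.95 threshold below are illustrative assumptions; a useful threshold depends on the model and should be tuned on labelled examples.

```python
from itertools import combinations

import numpy as np

def find_near_duplicates(vectors, threshold=0.9):
    """Return (i, j, similarity) for every pair at or above `threshold`."""
    vecs = np.asarray(vectors, dtype=float)
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)  # L2-normalize rows
    sims = unit @ unit.T                                       # all pairwise cosines
    return [(i, j, float(sims[i, j]))
            for i, j in combinations(range(len(vecs)), 2)
            if sims[i, j] >= threshold]

# Toy vectors (illustrative): items 0 and 1 are paraphrases, item 2 is unrelated
vectors = [[0.7, 0.7, 0.1], [0.68, 0.72, 0.12], [0.1, 0.2, 0.97]]
pairs = find_near_duplicates(vectors, threshold=0.95)
print(pairs)
```

The all-pairs matrix grows quadratically, so for large collections you would typically query a vector index for each item's nearest neighbours instead.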
Anomaly Detection
Identify unusual patterns or outliers in text data.
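A basic anomaly score is how far each item sits from the "typical" item, measured here as 1 minus the cosine similarity to the mean (centroid) vector. This is one simple technique among many, sketched with toy vectors.

```python
import numpy as np

def anomaly_scores(vectors):
    """Score each vector by 1 - cosine similarity to the normalized centroid."""
    vecs = np.asarray(vectors, dtype=float)
    unit = vecs / np.linalg.norm(vecs, axis=1, keepdims=True)
    centroid = unit.mean(axis=0)
    centroid /= np.linalg.norm(centroid)
    return 1.0 - unit @ centroid  # higher score = further from the typical item

# Toy vectors (illustrative): four similar support tickets and one off-topic message
vectors = [
    [0.9, 0.1, 0.1],
    [0.85, 0.15, 0.1],
    [0.88, 0.12, 0.05],
    [0.92, 0.08, 0.12],
    [0.05, 0.1, 0.99],
]
scores = anomaly_scores(vectors)
print(int(np.argmax(scores)))  # 4
```

Because outliers also pull the centroid toward themselves, a more robust variant computes the centroid from a trusted reference set rather than from the data being scored.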
Popular Embedding Models
| Model | Provider | Dimensions | Best For |
|---|---|---|---|
| text-embedding-3-small | OpenAI | 1536 | Cost-effective, general purpose |
| text-embedding-3-large | OpenAI | 3072 | Highest quality, complex tasks |
| embed-english-v3.0 | Cohere | 1024 | English text, good quality |
| all-MiniLM-L6-v2 | Sentence Transformers | 384 | Free, runs locally, fast |
| all-mpnet-base-v2 | Sentence Transformers | 768 | Free, high quality, local |
Getting Started with Embeddings
Using OpenAI Embeddings
from openai import OpenAI
client = OpenAI()
# Generate embedding for a single text
response = client.embeddings.create(
model="text-embedding-3-small",
input="Machine learning is transforming industries"
)
embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}") # 1536
Using Sentence Transformers (Free, Local)
from sentence_transformers import SentenceTransformer
# Load model (downloads once, runs locally)
model = SentenceTransformer('all-MiniLM-L6-v2')
# Generate embeddings
sentences = [
"Machine learning is fascinating",
"I love artificial intelligence",
"The weather is nice today"
]
embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}") # (3, 384)
Using LangChain (Unified Interface)
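from langchain_openai import OpenAIEmbeddings
from langchain_huggingface import HuggingFaceEmbeddings  # older versions: langchain_community.embeddings
# OpenAI (reads OPENAI_API_KEY from the environment)
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
# Or use a free local model
local_embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# Same interface for both
vector = openai_embeddings.embed_query("Hello world")       # 1536 floats
local_vector = local_embeddings.embed_query("Hello world")  # 384 floats
doc_vectors = local_embeddings.embed_documents(["First doc", "Second doc"])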
Comparing Embeddings (Similarity)
To find similar content, we compare embeddings using a similarity or distance metric. The most common is cosine similarity: it measures the cosine of the angle between two vectors, ignoring their magnitude.
import numpy as np
from numpy.linalg import norm
from sentence_transformers import SentenceTransformer
def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    return np.dot(a, b) / (norm(a) * norm(b))
# Example (same local model as above)
model = SentenceTransformer('all-MiniLM-L6-v2')
embedding1 = model.encode("I love programming")
embedding2 = model.encode("Coding is my passion")
embedding3 = model.encode("The sky is blue")
sim_1_2 = cosine_similarity(embedding1, embedding2)  # relatively high (related meaning)
sim_1_3 = cosine_similarity(embedding1, embedding3)  # much lower (unrelated); exact values depend on the model
print(f"Programming vs Coding: {sim_1_2:.2f}")
print(f"Programming vs Sky: {sim_1_3:.2f}")
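Cosine similarity ranges from -1 to 1. As a rough guide (score distributions vary by model, so calibrate thresholds on your own data):
- 1.0: Same direction (identical or near-identical meaning)
- 0.7-0.9: Very similar
- 0.3-0.7: Somewhat related
- Below 0.3: Different topics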
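The endpoints of that range, and the claim that cosine similarity ignores magnitude, can be checked with small hand-picked 2-D vectors. The last line also shows why normalization matters: for unit-length vectors, the plain dot product already equals cosine similarity.

```python
import numpy as np

def cosine_similarity(a, b):
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([3.0, 4.0])

print(cosine_similarity(a, 2 * a))                  # 1.0  (same direction, different length)
print(cosine_similarity(a, -a))                     # -1.0 (opposite direction)
print(cosine_similarity(a, np.array([-4.0, 3.0])))  # 0.0  (perpendicular)

# For unit-length (L2-normalized) vectors, the dot product equals cosine similarity
b = np.array([1.0, 2.0])
a_unit, b_unit = a / np.linalg.norm(a), b / np.linalg.norm(b)
print(np.isclose(np.dot(a_unit, b_unit), cosine_similarity(a, b)))  # True
```

This is why many vector stores normalize vectors once at indexing time and then use the cheaper dot product for every query.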
Embeddings in RAG Systems
Embeddings are the backbone of Retrieval-Augmented Generation (RAG). Here's how they fit:
- Index your documents: Convert all documents to embeddings and store them in a vector database
- User asks a question: Convert the question to an embedding
- Find similar content: Search for documents whose embeddings are closest to the question's embedding
- Generate answer: Pass retrieved documents to LLM as context
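from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings, ChatOpenAI
from langchain_community.vectorstores import Chroma
# Example documents (in practice: your loaded and chunked files)
documents = [
    Document(page_content="Machine learning lets models learn patterns from data."),
    Document(page_content="Embeddings turn text into vectors that capture meaning."),
    Document(page_content="RAG retrieves relevant text and passes it to an LLM."),
]
# 1. Create embeddings and store documents
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(documents, embeddings)
# 2. Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})
# 3. Query - the question is embedded and compared against the stored vectors
question = "What is machine learning?"
relevant_docs = retriever.invoke(question)
# 4. Use retrieved docs with LLM (they become context for a grounded answer)
context = "\n\n".join(doc.page_content for doc in relevant_docs)
llm = ChatOpenAI(model="gpt-4o-mini")
answer = llm.invoke(f"Answer using only this context:\n{context}\n\nQuestion: {question}")
print(answer.content)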
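To see what the vector store does under the hood, here is a framework-free sketch of the same four steps. It uses a hypothetical `toy_embed` function (normalized word counts over a fixed vocabulary) so it runs without API keys; that is keyword matching, not a semantic model, and a real system would call an embedding model instead. Step 4 only builds the prompt rather than calling an LLM.

```python
import re
from collections import Counter

import numpy as np

def toy_embed(text, vocab):
    """Hypothetical stand-in for a real embedding model: unit-length word counts."""
    counts = Counter(re.findall(r"[a-z]+", text.lower()))
    vec = np.array([counts[w] for w in vocab], dtype=float)
    norm = np.linalg.norm(vec)
    return vec / norm if norm else vec

documents = [
    "Machine learning lets models learn patterns from data.",
    "Embeddings turn text into vectors that capture meaning.",
    "Paris is the capital of France.",
]
vocab = sorted({w for d in documents for w in re.findall(r"[a-z]+", d.lower())})

# 1. Index: embed every document once
doc_vectors = np.stack([toy_embed(d, vocab) for d in documents])

# 2. Embed the question with the same function used for the documents
question = "What is machine learning?"
q_vec = toy_embed(question, vocab)

# 3. Retrieve: vectors are unit length, so the dot product is the cosine similarity
k = 1
top = np.argsort(-(doc_vectors @ q_vec))[:k]
context = "\n".join(documents[i] for i in top)

# 4. Build the prompt that would be sent to an LLM
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
print(prompt)
```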
Best Practices
- Choose the right model: Start with OpenAI's small model for prototyping; consider local models for cost-sensitive production
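- Chunk appropriately: Split long documents into meaningful chunks before embedding; 200-500 tokens is a common starting point
- Use the same model: Always use the same embedding model for both indexing and querying
- Consider dimensions: More dimensions usually mean better quality, but also more storage and compute
- Normalize if needed: For L2-normalized (unit-length) vectors, the dot product equals cosine similarity. OpenAI embeddings are returned normalized; with Sentence Transformers, pass normalize_embeddings=True to encode() to guarantee it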
- Batch for efficiency: When embedding many texts, batch them together
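Chunking and batching can be sketched as two small helpers. Assumptions: chunk size is counted in whitespace-separated words as a rough proxy for tokens (use the model's tokenizer for exact limits), and `embed_batch` is a hypothetical callable that embeds a list of strings in one request. With OpenAI it could wrap `client.embeddings.create(model=..., input=batch)`, since `input` accepts a list; Sentence Transformers' `model.encode` already batches internally via its `batch_size` argument.

```python
from typing import Callable, Iterator, List, Sequence

def chunk_words(text: str, chunk_size: int = 300, overlap: int = 50) -> List[str]:
    """Split text into chunks of up to `chunk_size` words, overlapping by `overlap` words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(words):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk reached the end of the text
        start += step
    return chunks

def batched(items: Sequence[str], batch_size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of at most `batch_size` items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

def embed_all(texts: Sequence[str],
              embed_batch: Callable[[List[str]], List[List[float]]],
              batch_size: int = 100) -> List[List[float]]:
    """Embed texts with one `embed_batch` call per batch, keeping input order."""
    vectors: List[List[float]] = []
    for batch in batched(texts, batch_size):
        vectors.extend(embed_batch(list(batch)))
    return vectors

# Hypothetical stand-in for a real embedding call, so the example runs offline
def fake_embed_batch(batch: List[str]) -> List[List[float]]:
    return [[float(len(t))] for t in batch]

text = " ".join(f"word{i}" for i in range(700))
chunks = chunk_words(text, chunk_size=300, overlap=50)
vectors = embed_all(chunks, fake_embed_batch, batch_size=2)
print(len(chunks), len(vectors))
```

The overlap keeps a sentence that straddles a boundary fully inside at least one chunk, at the cost of embedding some text twice.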
Common Pitfalls
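- Mixing embedding models: Vectors from different models are not comparable, even when they have the same number of dimensions
- Ignoring context length: Every model has a maximum input length; local models such as Sentence Transformers silently truncate longer text, while APIs may reject it, so chunk first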
- Not considering costs: API-based embeddings add up; calculate costs early
- Embedding everything: Only embed what's searchable; metadata can be filtered separately
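A cheap safeguard against mixing models is to record which model built an index and reject vectors tagged with any other model. Checking dimensions alone is not enough, because two models can produce vectors of the same length. `VectorIndex` below is a hypothetical class for illustration, not part of any library.

```python
from dataclasses import dataclass, field
from typing import List

import numpy as np

@dataclass
class VectorIndex:
    """Hypothetical minimal index that remembers which embedding model produced its vectors."""
    model_name: str
    vectors: List[np.ndarray] = field(default_factory=list)

    def add(self, vector, model_name: str) -> None:
        self._check(model_name)
        self.vectors.append(np.asarray(vector, dtype=float))

    def search(self, query, model_name: str) -> int:
        """Return the position of the stored vector most similar to the query (cosine)."""
        self._check(model_name)
        q = np.asarray(query, dtype=float)
        scores = [v @ q / (np.linalg.norm(v) * np.linalg.norm(q)) for v in self.vectors]
        return int(np.argmax(scores))

    def _check(self, model_name: str) -> None:
        if model_name != self.model_name:
            raise ValueError(
                f"index built with {self.model_name!r}, got vectors from {model_name!r}"
            )

index = VectorIndex(model_name="all-MiniLM-L6-v2")
index.add([0.1, 0.9], model_name="all-MiniLM-L6-v2")
index.add([0.9, 0.1], model_name="all-MiniLM-L6-v2")
print(index.search([0.8, 0.2], model_name="all-MiniLM-L6-v2"))  # 1

try:
    index.search([0.8, 0.2], model_name="text-embedding-3-small")
except ValueError as err:
    print(err)  # explains the model mismatch instead of returning meaningless results
```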
Master Embeddings with Expert Guidance
Our Agentic AI program covers embeddings in-depth, from theory to production. Learn to build semantic search systems, RAG applications, and more with personalized mentorship.