What are Embeddings?

Embeddings are numerical representations of text (or other data) as vectors - lists of numbers that capture the meaning and context of the content. Think of them as a way to convert human language into a format that computers can understand and compare mathematically.

For example, the sentence "I love programming" might become a vector like [0.23, -0.45, 0.78, 0.12, ...] with hundreds or thousands of dimensions. Similar sentences get similar vectors, so software can measure how closely two pieces of text are related.

Why Do Embeddings Exist?

Computers work with numbers, not words. Traditional approaches treated words as isolated symbols - "cat" and "kitten" had no mathematical relationship. Embeddings solve this by placing related concepts close together in a high-dimensional space.
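
To make the "isolated symbols" problem concrete, here is a minimal sketch comparing one-hot vectors (one axis per word) with dense vectors. The three-word vocabulary and the dense values are hand-picked for illustration and are not output from a real model.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# One-hot encoding: every word gets its own axis, so no two words overlap.
vocab = ["cat", "kitten", "car"]
one_hot = {
    word: [1.0 if i == j else 0.0 for j in range(len(vocab))]
    for i, word in enumerate(vocab)
}

# Hypothetical dense embeddings (made-up numbers, for illustration only).
dense = {
    "cat": [0.9, 0.8, 0.1],
    "kitten": [0.85, 0.9, 0.05],
    "car": [0.1, 0.05, 0.95],
}

print(cosine(one_hot["cat"], one_hot["kitten"]))  # 0.0 - no relationship at all
print(cosine(dense["cat"], dense["kitten"]) > cosine(dense["cat"], dense["car"]))  # True
```

With one-hot vectors every pair of distinct words scores exactly 0, while the dense vectors can express that "cat" is closer to "kitten" than to "car".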

The Key Insight

Words with similar meanings have similar embeddings. This enables semantic search - finding content by meaning rather than exact keyword matches.

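Before embeddings, search engines relied mostly on matching keywords (sometimes helped by stemming or hand-built synonym lists). With embeddings, searching for "how to fix a bug" can find documents about "debugging code" or "troubleshooting errors" - because they're semantically similar, even though they share no words with the query.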
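
The difference can be sketched in a few lines. The document and query vectors below are hypothetical 3-dimensional values chosen by hand; in a real system an embedding model would produce them.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical document embeddings (hand-picked values, not real model output).
docs = {
    "debugging code": [0.9, 0.1, 0.0],
    "troubleshooting errors": [0.8, 0.3, 0.1],
    "baking sourdough bread": [0.0, 0.1, 0.9],
}
query_text = "how to fix a bug"
query_vec = [0.85, 0.2, 0.05]  # hypothetical embedding of the query

# Keyword search: a document matches only if it shares a word with the query.
query_words = set(query_text.split())
keyword_hits = [title for title in docs if query_words & set(title.split())]

# Semantic search: rank documents by cosine similarity to the query vector.
ranked = sorted(docs, key=lambda title: cosine(query_vec, docs[title]), reverse=True)

print(keyword_hits)  # [] - no shared words, so keyword search finds nothing
print(ranked[:2])    # the two debugging-related documents
```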

How Do Embeddings Work?

Embedding models are neural networks trained on massive amounts of text. They learn patterns like:

  • Synonyms: "happy" and "joyful" are close together
  • Relationships: king - man + woman ≈ queen
  • Context: "bank" (financial) vs "bank" (river) get different embeddings, because modern models read the surrounding words
  • Topics: Programming-related terms tend to cluster together
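
The "king - man + woman ≈ queen" pattern can be illustrated with toy vectors. This is a sketch under strong assumptions: the 2-dimensional values are invented so that one axis loosely stands for "royalty" and the other for "gender". Real embeddings have many more dimensions and only approximate this behavior.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical 2-D embeddings: axis 0 ~ "royalty", axis 1 ~ "gender".
vectors = {
    "king": [0.9, 0.8],
    "queen": [0.9, -0.8],
    "man": [0.1, 0.8],
    "woman": [0.1, -0.8],
    "apple": [-0.7, 0.0],
}

# king - man + woman, computed element by element.
target = [k - m + w for k, m, w in zip(vectors["king"], vectors["man"], vectors["woman"])]

# Look for the nearest word, excluding the three words used in the arithmetic.
candidates = {w: v for w, v in vectors.items() if w not in ("king", "man", "woman")}
best = max(candidates, key=lambda w: cosine(target, candidates[w]))

print(best)  # queen
```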

The Process

# 1. Input text
text = "Machine learning is fascinating"

# 2. Tokenize (break into pieces; real tokenizers often split into subword units)
tokens = ["Machine", "learning", "is", "fascinating"]

# 3. Pass through embedding model
# The model processes all tokens and their relationships

# 4. Output: A single vector representing the meaning of the whole text
embedding = [0.023, -0.156, 0.892, 0.045, ...]  # 1536 dimensions for OpenAI's text-embedding-3-small
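
Step 4 collapses many per-token vectors into one. One common way to do this is mean pooling (averaging the token vectors), which many Sentence Transformers models use; hosted APIs do not necessarily document their approach, so treat this as an illustrative sketch. The token vectors below are made up.

```python
def mean_pool(token_vectors):
    """Average per-token vectors into one fixed-size vector."""
    count = len(token_vectors)
    dims = len(token_vectors[0])
    return [sum(vec[d] for vec in token_vectors) / count for d in range(dims)]

# Hypothetical contextual token vectors: 4 tokens x 3 dimensions.
token_vectors = [
    [0.2, 0.4, 0.0],  # "Machine"
    [0.6, 0.0, 0.2],  # "learning"
    [0.0, 0.2, 0.4],  # "is"
    [0.4, 0.2, 0.2],  # "fascinating"
]

sentence_vector = mean_pool(token_vectors)
print(len(sentence_vector))  # 3 - one vector, regardless of how many tokens went in
```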

When to Use Embeddings

Semantic Search

Find documents by meaning, not just keywords. Essential for knowledge bases and documentation search.

RAG Systems

Retrieve relevant context for LLMs to generate accurate, grounded responses.

Recommendation Systems

Find similar products, articles, or content based on semantic similarity.

Clustering & Classification

Group similar documents or categorize content automatically.
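
One simple way to categorize content is nearest-centroid classification: average the embeddings of a few labeled examples per category, then assign new text to the closest average. The sketch below uses hypothetical 2-dimensional vectors; the labels and values are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

def centroid(vectors):
    """Element-wise average of a list of vectors."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

# Hypothetical embeddings of a few labeled example texts per category.
labeled = {
    "programming": [[0.9, 0.1], [0.8, 0.2]],
    "weather": [[0.1, 0.9], [0.2, 0.8]],
}
centroids = {label: centroid(vecs) for label, vecs in labeled.items()}

def classify(vec):
    """Return the label whose centroid is most similar to vec."""
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

print(classify([0.7, 0.3]))  # programming
```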

Duplicate Detection

Find near-duplicate content even when wording differs significantly.
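
A minimal sketch of near-duplicate detection: compare every pair of embeddings and flag pairs above a similarity threshold. The vectors are hand-picked and the 0.95 threshold is an assumption; a suitable cut-off depends on the model and the data.

```python
import math
from itertools import combinations

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings; the first two texts say the same thing in different words.
items = {
    "Reset your password from the settings page": [0.8, 0.6, 0.0],
    "You can change your password under settings": [0.78, 0.62, 0.05],
    "Our office is closed on public holidays": [0.0, 0.1, 0.99],
}

def find_near_duplicates(items, threshold=0.95):
    """Return all pairs of texts whose embeddings are at least `threshold` similar."""
    return [
        (a, b)
        for (a, vec_a), (b, vec_b) in combinations(items.items(), 2)
        if cosine(vec_a, vec_b) >= threshold
    ]

duplicates = find_near_duplicates(items)
print(len(duplicates))  # 1 - only the two password texts are flagged
```

Comparing every pair grows quadratically with the number of texts, so larger collections usually rely on a vector index instead.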

Anomaly Detection

Identify unusual patterns or outliers in text data.
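
One simple approach to outlier detection is to measure how similar each embedding is to the average of the whole collection and flag the ones that fall below a cut-off. The vectors and the 0.8 threshold below are assumptions for illustration.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))

# Hypothetical embeddings: three similar texts and one that is off-topic.
embeddings = {
    "ticket_a": [0.9, 0.1],
    "ticket_b": [0.85, 0.15],
    "ticket_c": [0.95, 0.05],
    "ticket_d": [0.1, 0.9],
}

# Centroid: element-wise average of all vectors.
center = [sum(column) / len(embeddings) for column in zip(*embeddings.values())]

# Flag items whose similarity to the centroid falls below the threshold.
threshold = 0.8
outliers = [name for name, vec in embeddings.items() if cosine(vec, center) < threshold]

print(outliers)  # ['ticket_d']
```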

Popular Embedding Models

Model | Provider | Dimensions | Best For
text-embedding-3-small | OpenAI | 1536 | Cost-effective, general purpose
text-embedding-3-large | OpenAI | 3072 | Highest quality, complex tasks
embed-english-v3.0 | Cohere | 1024 | English text, good quality
all-MiniLM-L6-v2 | Sentence Transformers | 384 | Free, runs locally, fast
all-mpnet-base-v2 | Sentence Transformers | 768 | Free, high quality, local
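
The dimension counts translate directly into storage. As a rough sketch, assuming each value is stored as a 4-byte float32 and ignoring index overhead and metadata, the raw size of one million vectors works out as follows:

```python
def raw_storage_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw vector storage, assuming float32 values and no index overhead."""
    return num_vectors * dimensions * bytes_per_value

models = [
    ("all-MiniLM-L6-v2", 384),
    ("text-embedding-3-small", 1536),
    ("text-embedding-3-large", 3072),
]

for name, dims in models:
    gigabytes = raw_storage_bytes(1_000_000, dims) / 1e9
    print(f"{name}: {gigabytes:.2f} GB")
# all-MiniLM-L6-v2: 1.54 GB
# text-embedding-3-small: 6.14 GB
# text-embedding-3-large: 12.29 GB
```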

Getting Started with Embeddings

Using OpenAI Embeddings

from openai import OpenAI

client = OpenAI()

# Generate embedding for a single text
response = client.embeddings.create(
    model="text-embedding-3-small",
    input="Machine learning is transforming industries"
)

embedding = response.data[0].embedding
print(f"Dimensions: {len(embedding)}")  # 1536

Using Sentence Transformers (Free, Local)

from sentence_transformers import SentenceTransformer

# Load model (downloads once, runs locally)
model = SentenceTransformer('all-MiniLM-L6-v2')

# Generate embeddings
sentences = [
    "Machine learning is fascinating",
    "I love artificial intelligence",
    "The weather is nice today"
]

embeddings = model.encode(sentences)
print(f"Shape: {embeddings.shape}")  # (3, 384)

Using LangChain (Unified Interface)

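from langchain_openai import OpenAIEmbeddings
# Requires the langchain-huggingface package; older LangChain releases
# exposed this class from langchain_community.embeddings instead
from langchain_huggingface import HuggingFaceEmbeddings

# OpenAI
openai_embeddings = OpenAIEmbeddings(model="text-embedding-3-small")

# Or use free local model
local_embeddings = HuggingFaceEmbeddings(
    model_name="all-MiniLM-L6-v2"
)

# Same interface for both
vector = openai_embeddings.embed_query("Hello world")
local_vector = local_embeddings.embed_query("Hello world")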

Comparing Embeddings (Similarity)

To find similar content, we compare embeddings using distance metrics. The most common is cosine similarity - it measures the angle between vectors, ignoring their magnitude.

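import numpy as np
from numpy.linalg import norm
from sentence_transformers import SentenceTransformer

# Same local model as in the Sentence Transformers example
model = SentenceTransformer('all-MiniLM-L6-v2')

def cosine_similarity(a, b):
    # Dot product divided by the product of the vector lengths
    return np.dot(a, b) / (norm(a) * norm(b))

# Example
embedding1 = model.encode("I love programming")
embedding2 = model.encode("Coding is my passion")
embedding3 = model.encode("The sky is blue")

sim_1_2 = cosine_similarity(embedding1, embedding2)  # expected to be high (related meaning)
sim_1_3 = cosine_similarity(embedding1, embedding3)  # expected to be much lower (unrelated)

print(f"Programming vs Coding: {sim_1_2:.2f}")
print(f"Programming vs Sky: {sim_1_3:.2f}")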

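Cosine similarity ranges from -1 to 1. Negative scores mean the vectors point in opposing directions, which is uncommon for text embeddings, so most scores you see will fall between 0 and 1. As a rough guide (the exact cut-offs vary by model, so calibrate them on your own data):

  • 1.0: Same direction (typically identical or near-identical text)
  • 0.7-0.9: Very similar
  • 0.3-0.7: Somewhat related
  • 0.0-0.3: Different topics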
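
Because cosine similarity ignores magnitude, it equals the plain dot product once both vectors are scaled to unit length. The small sketch below checks this on hand-picked 2-dimensional vectors and also shows the -1 end of the range.

```python
import math

def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def l2_normalize(v):
    """Scale a vector to unit length."""
    length = math.sqrt(dot(v, v))
    return [x / length for x in v]

def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))

a = [3.0, 4.0]
b = [4.0, 3.0]

cos_ab = cosine(a, b)
dot_unit = dot(l2_normalize(a), l2_normalize(b))

print(math.isclose(cos_ab, dot_unit))  # True - same value once vectors are unit length
print(cosine(a, [-3.0, -4.0]))         # -1.0 - exactly opposite direction
```

This is why many vector databases can use the cheaper dot product when the stored embeddings are already normalized.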

Embeddings in RAG Systems

Embeddings are the backbone of Retrieval-Augmented Generation (RAG). Here's how they fit:

  1. Index your documents: Convert all documents to embeddings and store in a vector database
  2. User asks a question: Convert the question to an embedding
  3. Find similar content: Search for documents with similar embeddings
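  4. Generate answer: Pass retrieved documents to LLM as context

from langchain_core.documents import Document
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Example documents (placeholder content; replace with your own)
documents = [
    Document(page_content="Machine learning lets computers learn patterns from data."),
    Document(page_content="Vector databases store embeddings for similarity search."),
    Document(page_content="RAG retrieves relevant context before the LLM answers."),
]

# 1. Create embeddings and store documents
embeddings = OpenAIEmbeddings(model="text-embedding-3-small")
vectorstore = Chroma.from_documents(documents, embeddings)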

# 2. Create retriever
retriever = vectorstore.as_retriever(search_kwargs={"k": 3})

# 3. Query - embeddings handle the similarity search
relevant_docs = retriever.invoke("What is machine learning?")

# 4. Use retrieved docs with LLM
# (The retrieved docs become context for accurate answers)
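
Step 4 usually comes down to placing the retrieved text into the prompt. The sketch below shows one possible way to do that; `build_prompt` and the prompt wording are hypothetical, not part of LangChain. With a LangChain retriever, the texts would typically come from each returned document's `page_content`.

```python
def build_prompt(question, retrieved_texts):
    """Combine retrieved passages and the user's question into one prompt string."""
    context = "\n\n".join(
        f"[{i}] {text}" for i, text in enumerate(retrieved_texts, start=1)
    )
    return (
        "Answer the question using only the context below. "
        "If the context is insufficient, say so.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Stand-in for retrieved passages (placeholder text).
retrieved_texts = [
    "Machine learning lets computers learn patterns from data.",
    "RAG retrieves relevant context before the LLM answers.",
]

prompt = build_prompt("What is machine learning?", retrieved_texts)
print(prompt)
```

Numbering the passages makes it easier to ask the model to cite which passage supports its answer.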

Best Practices

  • Choose the right model: Start with OpenAI's small model for prototyping; consider local models for cost-sensitive production
  • Chunk appropriately: For documents, split into meaningful chunks (200-500 tokens) before embedding
  • Use the same model: Always use the same embedding model for both indexing and querying
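  • Consider dimensions: Higher-dimensional embeddings can capture more nuance but need more storage and compute; the quality gain is not guaranteed, so measure it on your own data
  • Normalize if needed: Some models output unit-length vectors and others don't; normalize before ranking by dot product or Euclidean distance (cosine similarity is unaffected by vector length)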
  • Batch for efficiency: When embedding many texts, batch them together
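
The chunking and batching advice can be sketched with two small helpers. Both function names are hypothetical, and the chunker counts whitespace-separated words as a rough stand-in for tokens; a real pipeline would count tokens with the model's own tokenizer. The 50-word overlap is an assumption, added so that a sentence cut at a chunk boundary still appears whole in one chunk.

```python
def chunk_words(text, chunk_size=300, overlap=50):
    """Split text into overlapping chunks of roughly chunk_size words."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    words = text.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break  # the last chunk already reaches the end of the text
    return chunks

def batched(items, batch_size):
    """Yield consecutive slices of at most batch_size items."""
    for i in range(0, len(items), batch_size):
        yield items[i:i + batch_size]

# A 700-word placeholder document.
text = " ".join(f"word{i}" for i in range(700))

chunks = chunk_words(text)
batches = list(batched(chunks, batch_size=2))

print(len(chunks))   # 3
print(len(batches))  # 2
```

Each batch can then be sent to the embedding model in a single call instead of one call per chunk.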

Common Pitfalls

  • Mixing embedding models: Vectors from different models are incompatible
  • Ignoring context length: Every model has a maximum input length, and longer text is truncated or rejected; chunk first
  • Not considering costs: API-based embeddings add up; calculate costs early
  • Embedding everything: Only embed what's searchable; metadata can be filtered separately
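
One way to guard against mixing models is to record the model name and dimension count alongside the index and reject anything that does not match. `EmbeddingIndex` below is a hypothetical in-memory class written for illustration, not part of any library.

```python
class EmbeddingIndex:
    """Toy in-memory index that refuses vectors from a different model."""

    def __init__(self, model_name, dimensions):
        self.model_name = model_name
        self.dimensions = dimensions
        self.vectors = {}

    def _check(self, vector, model_name):
        if model_name != self.model_name:
            raise ValueError(
                f"index was built with {self.model_name!r}, got {model_name!r}"
            )
        if len(vector) != self.dimensions:
            raise ValueError(
                f"expected {self.dimensions} dimensions, got {len(vector)}"
            )

    def add(self, doc_id, vector, model_name):
        """Store a vector after verifying it came from the index's model."""
        self._check(vector, model_name)
        self.vectors[doc_id] = vector

index = EmbeddingIndex(model_name="all-MiniLM-L6-v2", dimensions=384)
index.add("doc-1", [0.0] * 384, model_name="all-MiniLM-L6-v2")

try:
    index.add("doc-2", [0.0] * 1536, model_name="text-embedding-3-small")
except ValueError as error:
    print(f"Rejected: {error}")
```

Checking the model name matters because two models can share a dimension count and still produce incompatible vectors.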
