Agent Memory Systems: Building AI That Remembers

Why Memory Matters for AI Agents

Without memory, every interaction with an AI agent starts from zero. The agent has no idea who you are, what you discussed before, or what preferences you've expressed. Memory transforms a stateless AI into a contextual, personalized assistant.

Memory is critical for:

Continuity: Maintaining context across conversation turns
Personalization: Remembering user preferences and history
Learning: Improving responses based on past interactions
Complex tasks: Tracking multi-step workflows and state

Types of Agent Memory

AI agent memory can be categorized into different types, each serving a specific purpose:

1. Short-Term (Working) Memory

Holds the current conversation context. This is what most chatbots use - the recent message history that fits within the LLM's context window.

from langchain.memory import ConversationBufferMemory

# Simple buffer - keeps all messages
memory = ConversationBufferMemory()
memory.save_context(
    {"input": "My name is Alice"},
    {"output": "Hello Alice! How can I help you?"}
)

# Retrieve the conversation
print(memory.load_memory_variables({}))

2. Long-Term Memory

Persists information across sessions. Uses external storage (databases, vector stores) to remember user information, preferences, and important facts indefinitely.

from langchain.memory import VectorStoreRetrieverMemory
from langchain_openai import OpenAIEmbeddings
from langchain_community.vectorstores import Chroma

# Create a vector store for long-term memory
embeddings = OpenAIEmbeddings()
vectorstore = Chroma(
    collection_name="long_term_memory",
    embedding_function=embeddings,
    persist_directory="./memory_db"
)

# Create retriever-based memory
retriever = vectorstore.as_retriever(search_kwargs={"k": 5})
memory = VectorStoreRetrieverMemory(retriever=retriever)

# Save important information
memory.save_context(
    {"input": "I'm allergic to peanuts"},
    {"output": "I've noted that you have a peanut allergy."}
)

3. Episodic Memory

Stores specific experiences or events with temporal context. Useful for remembering what happened in past sessions, including the sequence of events.

from datetime import datetime

class EpisodicMemory:
    def __init__(self):
        self.episodes = []

    def save_episode(self, event, context=None):
        episode = {
            "timestamp": datetime.now().isoformat(),
            "event": event,
            "context": context or {}
        }
        self.episodes.append(episode)

    def recall_recent(self, n=5):
        return self.episodes[-n:]

    def search_episodes(self, query):
        # In practice, use semantic search
        return [ep for ep in self.episodes
                if query.lower() in ep["event"].lower()]

# Usage
memory = EpisodicMemory()
memory.save_episode(
    "User completed onboarding",
    {"preferences": ["dark_mode", "email_notifications"]}
)
memory.save_episode("User asked about pricing plans")

4. Semantic Memory

Stores facts, concepts, and general knowledge about the user or domain. Unlike episodic memory, it's not tied to specific events but represents learned information.

from langchain_community.graphs import Neo4jGraph

# Semantic memory using knowledge graphs
class SemanticMemory:
    def __init__(self):
        self.facts = {}  # Entity -> facts mapping

    def learn(self, entity, relation, value):
        if entity not in self.facts:
            self.facts[entity] = {}
        self.facts[entity][relation] = value

    def recall(self, entity, relation=None):
        if entity not in self.facts:
            return None
        if relation:
            return self.facts[entity].get(relation)
        return self.facts[entity]

# Usage
memory = SemanticMemory()
memory.learn("user", "name", "Alice")
memory.learn("user", "role", "software engineer")
memory.learn("user", "expertise", ["Python", "JavaScript"])

print(memory.recall("user"))
# {'name': 'Alice', 'role': 'software engineer', 'expertise': ['Python', 'JavaScript']}

LangChain Memory Types

LangChain provides several built-in memory implementations:

ConversationBufferMemory

Stores the entire conversation history. Simple but can exceed token limits.

from langchain.memory import ConversationBufferMemory

memory = ConversationBufferMemory(return_messages=True)
# Keeps all messages - good for short conversations

ConversationBufferWindowMemory

Keeps only the last K conversation turns. Prevents context overflow.

from langchain.memory import ConversationBufferWindowMemory

memory = ConversationBufferWindowMemory(k=5, return_messages=True)
# Only keeps the last 5 exchanges

ConversationSummaryMemory

Uses an LLM to summarize the conversation, keeping a condensed version.

from langchain.memory import ConversationSummaryMemory
from langchain_openai import ChatOpenAI

llm = ChatOpenAI(model="gpt-4")
memory = ConversationSummaryMemory(llm=llm)

# Instead of storing all messages, stores a running summary
# "The user introduced themselves as Alice, a software engineer..."

ConversationSummaryBufferMemory

Hybrid approach: keeps recent messages in full, summarizes older ones.

from langchain.memory import ConversationSummaryBufferMemory

memory = ConversationSummaryBufferMemory(
    llm=llm,
    max_token_limit=1000  # Summarize when exceeding this limit
)
# Best of both worlds: recent context + summarized history

ConversationEntityMemory

Extracts and remembers information about entities (people, places, things).

from langchain.memory import ConversationEntityMemory

memory = ConversationEntityMemory(llm=llm)

# After conversation, the memory might contain:
# {
#   "Alice": "Software engineer, interested in AI",
#   "Project X": "A machine learning project Alice is working on"
# }

Building a Production Memory System

Real-world agents often need a combination of memory types. Here's a comprehensive example:

from langchain_openai import ChatOpenAI, OpenAIEmbeddings
from langchain_community.vectorstores import Chroma
from langchain.memory import ConversationBufferWindowMemory
from datetime import datetime
import json

class AgentMemorySystem:
    def __init__(self, user_id: str):
        self.user_id = user_id
        self.llm = ChatOpenAI(model="gpt-4")
        self.embeddings = OpenAIEmbeddings()

        # Short-term: Recent conversation
        self.short_term = ConversationBufferWindowMemory(
            k=10,
            return_messages=True
        )

        # Long-term: Vector store for semantic search
        self.long_term = Chroma(
            collection_name=f"user_{user_id}_memory",
            embedding_function=self.embeddings,
            persist_directory=f"./memory/{user_id}"
        )

        # User profile: Structured facts
        self.profile = self._load_profile()

    def _load_profile(self):
        try:
            with open(f"./profiles/{self.user_id}.json") as f:
                return json.load(f)
        except FileNotFoundError:
            return {"facts": {}, "preferences": {}}

    def save_profile(self):
        with open(f"./profiles/{self.user_id}.json", "w") as f:
            json.dump(self.profile, f)

    def add_message(self, role: str, content: str):
        # Add to short-term
        if role == "user":
            self.short_term.save_context(
                {"input": content},
                {"output": ""}  # Will be filled later
            )

        # Check for important information to persist
        self._extract_and_store_facts(content)

    def _extract_and_store_facts(self, content: str):
        # Use LLM to extract important facts
        extraction_prompt = f"""
        Extract any important facts about the user from this message.
        Return as JSON with keys: name, preferences, important_info
        Message: {content}
        """
        # In practice, call LLM here and update profile

        # Also store in long-term memory for semantic search
        self.long_term.add_texts(
            [content],
            metadatas=[{"timestamp": datetime.now().isoformat()}]
        )

    def get_relevant_context(self, query: str) -> str:
        # Get recent conversation
        recent = self.short_term.load_memory_variables({})

        # Search long-term memory
        relevant_docs = self.long_term.similarity_search(query, k=3)

        # Combine with user profile
        context = f"""
        User Profile: {json.dumps(self.profile)}

        Recent Conversation: {recent}

        Relevant History: {[doc.page_content for doc in relevant_docs]}
        """
        return context

# Usage
memory = AgentMemorySystem(user_id="user_123")
memory.add_message("user", "I prefer Python over JavaScript")
context = memory.get_relevant_context("What programming language should I use?")

Memory Persistence Strategies

Database Storage

import sqlite3
from datetime import datetime

class SQLiteMemory:
    def __init__(self, db_path: str):
        self.conn = sqlite3.connect(db_path)
        self._create_tables()

    def _create_tables(self):
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS conversations (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                role TEXT,
                content TEXT,
                timestamp DATETIME DEFAULT CURRENT_TIMESTAMP
            )
        """)
        self.conn.execute("""
            CREATE TABLE IF NOT EXISTS user_facts (
                id INTEGER PRIMARY KEY,
                user_id TEXT,
                key TEXT,
                value TEXT,
                UNIQUE(user_id, key)
            )
        """)
        self.conn.commit()

    def save_message(self, user_id: str, role: str, content: str):
        self.conn.execute(
            "INSERT INTO conversations (user_id, role, content) VALUES (?, ?, ?)",
            (user_id, role, content)
        )
        self.conn.commit()

    def get_recent_messages(self, user_id: str, limit: int = 10):
        cursor = self.conn.execute(
            """SELECT role, content FROM conversations
               WHERE user_id = ? ORDER BY timestamp DESC LIMIT ?""",
            (user_id, limit)
        )
        return list(reversed(cursor.fetchall()))

Redis for Fast Access

import redis
import json

class RedisMemory:
    def __init__(self, host="localhost", port=6379):
        self.client = redis.Redis(host=host, port=port)

    def save_context(self, user_id: str, messages: list, ttl: int = 3600):
        key = f"chat:{user_id}"
        self.client.setex(key, ttl, json.dumps(messages))

    def load_context(self, user_id: str) -> list:
        key = f"chat:{user_id}"
        data = self.client.get(key)
        return json.loads(data) if data else []

    def save_user_fact(self, user_id: str, key: str, value: str):
        self.client.hset(f"user:{user_id}", key, value)

    def get_user_facts(self, user_id: str) -> dict:
        return self.client.hgetall(f"user:{user_id}")

Best Practices

Layer your memory: Combine short-term buffer with long-term storage for optimal context
Be selective: Don't store everything - extract and persist only important information
Manage token limits: Use summarization when conversation history grows too long
Privacy first: Allow users to view and delete their stored data
Handle stale data: Implement TTL or periodic cleanup for outdated information
Test retrieval: Ensure your semantic search actually retrieves relevant context
Consider latency: Memory operations shouldn't slow down response times significantly

Common Patterns

Conversation + Summary

Keep recent messages verbatim, summarize older context. Best for chat applications.

Entity + Vector Store

Track named entities and use vector search for relevant history. Good for CRM-style agents.

Graph-Based Memory

Use knowledge graphs to store relationships. Best for complex domain knowledge.

Hierarchical Memory

Multiple layers from working memory to long-term storage. Mimics human memory.

Master Agent Memory with Expert Guidance

Our Agentic AI program covers memory systems in-depth, from simple conversation buffers to sophisticated multi-layer architectures. Build agents that truly remember and learn.

Explore Agentic AI Program

Agent Memory Systems