OpenAI Assistants API: Building Stateful AI Agents

What is the Assistants API?

The OpenAI Assistants API is a managed service for building AI agents. Unlike the basic Chat Completions API, Assistants provides:

Persistent Threads: Conversation history managed by OpenAI
Built-in Tools: Code Interpreter, File Search, and Function Calling
File Handling: Upload and process documents, images, and code
Stateful Conversations: Multi-turn interactions without manual context management

This makes it easier to build production-grade AI assistants without managing conversation state, file processing, or tool execution yourself.

Core Concepts

Assistants

An Assistant is a configured AI agent with specific instructions, tools, and a model.

from openai import OpenAI

client = OpenAI()

# Create an assistant
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="""You are a data analyst assistant. Help users analyze data,
    create visualizations, and extract insights. Use the code interpreter
    to run Python code when needed.""",
    model="gpt-4-turbo",
    tools=[
        {"type": "code_interpreter"},
        {"type": "file_search"}
    ]
)

print(f"Created assistant: {assistant.id}")

Threads

Threads represent conversations. OpenAI manages the message history for you.

# Create a new thread (conversation)
thread = client.beta.threads.create()

# Add a message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze the sales data I uploaded and find the top-performing products."
)

print(f"Thread ID: {thread.id}")

Runs

A Run executes the assistant on a thread, generating a response.

import time

# Create a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)

# Wait for completion
while run.status in ["queued", "in_progress"]:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )

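# Get the assistant's response (messages are listed newest first)
if run.status == "completed":
    messages = client.beta.threads.messages.list(thread_id=thread.id)
    for msg in messages.data:
        if msg.role == "assistant":
            print(msg.content[0].text.value)
            break
else:
    print(f"Run ended with status: {run.status}")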
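
The loop above polls forever if a run never leaves the queue. One way to bound the wait is sketched below. The helper name `wait_for_run`, its parameters, and the 60-second default are assumptions for illustration, not part of the OpenAI SDK. It takes any zero-argument callable that returns an object with a `status` attribute, so it can be tried offline with a fake run, as the last lines do.

```python
import time
from types import SimpleNamespace

# Statuses after which a run will not change any more.
TERMINAL_STATUSES = {"completed", "failed", "cancelled", "expired", "incomplete"}


def wait_for_run(retrieve, timeout=60.0, interval=1.0,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll `retrieve()` until the run stops progressing or `timeout` elapses.

    `retrieve` is a hypothetical zero-argument callable, for example a lambda
    wrapping client.beta.threads.runs.retrieve(thread_id=..., run_id=...).
    """
    deadline = clock() + timeout
    while True:
        run = retrieve()
        # Return on terminal states and on requires_action, which needs the
        # caller to submit tool outputs before the run can continue.
        if run.status in TERMINAL_STATUSES or run.status == "requires_action":
            return run
        if clock() >= deadline:
            raise TimeoutError(f"Run still '{run.status}' after {timeout} seconds")
        sleep(interval)


# Offline demonstration with a fake run that completes on the third poll.
_statuses = iter(["queued", "in_progress", "completed"])
final_run = wait_for_run(
    lambda: SimpleNamespace(status=next(_statuses)),
    sleep=lambda _seconds: None,  # skip real sleeping in the demo
)
print(final_run.status)
```

With a real client, the call might look like `wait_for_run(lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id))`.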

Built-in Tools

Code Interpreter

Executes Python code in a sandboxed environment. Perfect for data analysis, calculations, and file processing.

# Upload a file for code interpreter (the with-block closes the file handle)
with open("sales_data.csv", "rb") as f:
    file = client.files.create(
        file=f,
        purpose="assistants"
    )

# Create assistant with code interpreter
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="Analyze data files using Python. Create visualizations when helpful.",
    model="gpt-4-turbo",
    tools=[{"type": "code_interpreter"}],
    tool_resources={
        "code_interpreter": {
            "file_ids": [file.id]
        }
    }
)

# The assistant can now read and analyze the CSV file

File Search (Retrieval)

Automatically chunks, embeds, and searches through uploaded documents.

# Create a vector store for documents
# (recent openai SDK releases expose vector stores as client.vector_stores, without .beta)
vector_store = client.beta.vector_stores.create(
    name="Company Knowledge Base"
)

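# Upload files to the vector store, closing each file handle afterwards
from contextlib import ExitStack

paths = ["employee_handbook.pdf", "product_catalog.pdf", "faq.pdf"]
with ExitStack() as stack:
    file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
        vector_store_id=vector_store.id,
        files=[stack.enter_context(open(path, "rb")) for path in paths]
    )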

# Create assistant with file search
assistant = client.beta.assistants.create(
    name="HR Assistant",
    instructions="Answer questions about company policies using the knowledge base.",
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)

Function Calling

Define custom functions that the assistant can call, extending its capabilities.

import json

# Define custom functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]

assistant = client.beta.assistants.create(
    name="Personal Assistant",
    instructions="Help users with tasks. Use available functions when needed.",
    model="gpt-4-turbo",
    tools=tools
)
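
The model produces function arguments as a JSON string, and that string is not guaranteed to match the declared schema. A lightweight check before executing anything might look like the sketch below. `check_arguments` is a hypothetical helper, not an SDK function, and it only looks at `required` and `enum`. It is not a full JSON Schema validator.

```python
import json


def check_arguments(parameters: dict, raw_arguments: str) -> list[str]:
    """Return a list of problems found in a tool call's JSON arguments."""
    try:
        arguments = json.loads(raw_arguments)
    except json.JSONDecodeError as exc:
        return [f"arguments are not valid JSON: {exc.msg}"]
    if not isinstance(arguments, dict):
        return ["arguments must be a JSON object"]

    problems = []
    # Every name listed under "required" must be present.
    for name in parameters.get("required", []):
        if name not in arguments:
            problems.append(f"missing required argument: {name}")
    # Values for properties that declare an "enum" must be one of its members.
    for name, value in arguments.items():
        allowed = parameters.get("properties", {}).get(name, {}).get("enum")
        if allowed is not None and value not in allowed:
            problems.append(f"{name} must be one of {allowed}, got {value!r}")
    return problems


# The get_weather parameter schema, repeated here so the sketch runs on its own.
weather_parameters = {
    "type": "object",
    "properties": {
        "city": {"type": "string"},
        "unit": {"type": "string", "enum": ["celsius", "fahrenheit"]},
    },
    "required": ["city"],
}

print(check_arguments(weather_parameters, '{"city": "Oslo", "unit": "kelvin"}'))
```

Returning the problems as a tool output, rather than raising, gives the assistant a chance to correct its call.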

Handling Function Calls

When the assistant wants to use a function, you need to execute it and submit the result:

def handle_run(client, thread_id, assistant_id):
    """Execute a run and handle any required actions."""

    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )

    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )

        if run.status == "completed":
            break
        elif run.status == "requires_action":
            # Handle function calls
            tool_outputs = []

            for tool_call in run.required_action.submit_tool_outputs.tool_calls:
                function_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)

                # Execute the function
                if function_name == "get_weather":
                    result = get_weather(arguments["city"])
                elif function_name == "send_email":
                    result = send_email(
                        arguments["to"],
                        arguments["subject"],
                        arguments["body"]
                    )
                else:
                    result = {"error": f"Unknown function: {function_name}"}

                tool_outputs.append({
                    "tool_call_id": tool_call.id,
                    "output": json.dumps(result)
                })

            # Submit results back to the assistant
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs
            )
        elif run.status in ["failed", "cancelled", "expired", "incomplete"]:
            # "incomplete" is also terminal; without it the loop would never exit
            raise Exception(f"Run ended with status: {run.status}")
        else:
            time.sleep(1)

    return run

# Example function implementations
def get_weather(city):
    # In practice, call a weather API
    return {"city": city, "temperature": 22, "condition": "sunny"}

def send_email(to, subject, body):
    # In practice, use an email service
    return {"status": "sent", "to": to}
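
As more functions are added, the if/elif chain in handle_run grows with them. A lookup table is one alternative, sketched below under a few assumptions: `FUNCTIONS` and `build_tool_outputs` are illustrative names, the `get_weather` stand-in accepts the optional `unit` argument from the schema, and the fake tool call built with `SimpleNamespace` only mimics the attributes the loop reads (`id`, `function.name`, `function.arguments`).

```python
import json
from types import SimpleNamespace


def get_weather(city, unit="celsius"):
    # Placeholder result; a real implementation would call a weather API.
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}


# Map each tool name to the Python callable that implements it.
FUNCTIONS = {"get_weather": get_weather}


def build_tool_outputs(tool_calls, functions=FUNCTIONS):
    """Run every requested function and collect results for submit_tool_outputs."""
    outputs = []
    for call in tool_calls:
        handler = functions.get(call.function.name)
        if handler is None:
            result = {"error": f"Unknown function: {call.function.name}"}
        else:
            try:
                result = handler(**json.loads(call.function.arguments))
            except (TypeError, ValueError) as exc:
                # Invalid JSON, or arguments that do not fit the signature.
                result = {"error": str(exc)}
        outputs.append({"tool_call_id": call.id, "output": json.dumps(result)})
    return outputs


# Offline demonstration with a fake tool call.
fake_call = SimpleNamespace(
    id="call_1",
    function=SimpleNamespace(name="get_weather", arguments='{"city": "Paris"}'),
)
print(build_tool_outputs([fake_call]))
```

Errors are reported back as tool output instead of being raised, so one bad call does not abort the whole run.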

Streaming Responses

Get real-time responses as the assistant generates them:

from openai import AssistantEventHandler

class MyEventHandler(AssistantEventHandler):
    def on_text_created(self, text):
        print("\nAssistant: ", end="", flush=True)

    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

    def on_tool_call_created(self, tool_call):
        print(f"\n\nUsing tool: {tool_call.type}\n", flush=True)

    def on_tool_call_delta(self, delta, snapshot):
        if delta.type == "code_interpreter":
            if delta.code_interpreter.input:
                print(delta.code_interpreter.input, end="", flush=True)
            if delta.code_interpreter.outputs:
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"\n{output.logs}", flush=True)

# Use streaming
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=MyEventHandler()
) as stream:
    stream.until_done()

Complete Example: Document Q&A Assistant

from openai import OpenAI
import time

class DocumentAssistant:
    def __init__(self):
        self.client = OpenAI()
        self.assistant = None
        self.thread = None
        self.vector_store = None

    def setup(self, documents: list[str]):
        """Initialize the assistant with documents."""

        # Create vector store
        self.vector_store = self.client.beta.vector_stores.create(
            name="Document Store"
        )

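        # Upload documents, closing the file handles once the upload finishes
        file_streams = [open(doc, "rb") for doc in documents]
        try:
            self.client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=self.vector_store.id,
                files=file_streams
            )
        finally:
            for stream in file_streams:
                stream.close()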

        # Create assistant
        self.assistant = self.client.beta.assistants.create(
            name="Document Q&A",
            instructions="""You are a helpful assistant that answers questions
            based on the provided documents. Always cite your sources.
            If you cannot find relevant information, say so clearly.""",
            model="gpt-4-turbo",
            tools=[{"type": "file_search"}],
            tool_resources={
                "file_search": {
                    "vector_store_ids": [self.vector_store.id]
                }
            }
        )

        # Create thread
        self.thread = self.client.beta.threads.create()

        return self

    def ask(self, question: str) -> str:
        """Ask a question about the documents."""

        # Add user message
        self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=question
        )

        # Run assistant
        run = self.client.beta.threads.runs.create(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id
        )

        # Wait for completion
        while run.status in ["queued", "in_progress"]:
            time.sleep(1)
            run = self.client.beta.threads.runs.retrieve(
                thread_id=self.thread.id,
                run_id=run.id
            )

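        # Any status other than "completed" means there is no new answer to read
        if run.status != "completed":
            detail = run.last_error.message if run.last_error else "no error details"
            return f"Error: run ended with status '{run.status}' ({detail})"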

        # Get response
        messages = self.client.beta.threads.messages.list(
            thread_id=self.thread.id,
            order="desc",
            limit=1
        )

        return messages.data[0].content[0].text.value

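    def cleanup(self):
        """Clean up resources."""
        if self.thread:
            self.client.beta.threads.delete(self.thread.id)
        if self.assistant:
            self.client.beta.assistants.delete(self.assistant.id)
        if self.vector_store:
            # Deleting the vector store does not delete the uploaded files themselves
            self.client.beta.vector_stores.delete(self.vector_store.id)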

# Usage
assistant = DocumentAssistant()
assistant.setup([
    "company_policies.pdf",
    "employee_handbook.pdf",
    "benefits_guide.pdf"
])

answer = assistant.ask("What is the vacation policy for new employees?")
print(answer)

# Continue the conversation
follow_up = assistant.ask("How do I request time off?")
print(follow_up)

# Clean up when done
assistant.cleanup()
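
The ask method returns only the first content part and leaves file-search citation markers in the text as-is. A possible post-processing step is sketched below. `render_with_citations` and `filename_for` are hypothetical names, and the demo message is a hand-built stand-in that mirrors the attributes read here (`content`, `type`, `text.value`, `text.annotations`, `file_citation.file_id`). The marker text and file ID are made-up placeholders.

```python
from types import SimpleNamespace


def render_with_citations(message, filename_for=lambda file_id: file_id):
    """Join a message's text parts and turn citation markers into footnotes.

    `filename_for` maps a file ID to a display name. By default the ID itself
    is shown; with a real client it could wrap client.files.retrieve(...).filename.
    """
    body_parts, footnotes = [], []
    for part in message.content:
        if part.type != "text":
            continue  # skip image parts and other non-text content
        text = part.text.value
        for annotation in part.text.annotations:
            citation = getattr(annotation, "file_citation", None)
            if citation is None:
                continue  # not a file citation (e.g. a generated file path)
            number = len(footnotes) + 1
            text = text.replace(annotation.text, f" [{number}]")
            footnotes.append(f"[{number}] {filename_for(citation.file_id)}")
        body_parts.append(text)
    return "\n".join(body_parts + footnotes)


# Stand-in message with one made-up citation marker.
demo_message = SimpleNamespace(content=[
    SimpleNamespace(
        type="text",
        text=SimpleNamespace(
            value="Vacation requests go through the HR portal[[cite-0]].",
            annotations=[
                SimpleNamespace(
                    text="[[cite-0]]",
                    file_citation=SimpleNamespace(file_id="file-abc123"),
                )
            ],
        ),
    )
])
print(render_with_citations(demo_message))
```

Inside DocumentAssistant, the last line of ask could then return `render_with_citations(messages.data[0])` instead of the raw text value.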

Assistants API vs Other Approaches

Assistants API

Pros: Managed state, built-in RAG, code execution
Cons: OpenAI lock-in, cost per stored file

Chat Completions + Custom

Pros: Full control, portable, flexible
Cons: Build everything yourself

LangChain Agents

Pros: Many integrations, model-agnostic
Cons: More complex, learning curve

When to Use Assistants

Use the Assistants API for quick prototypes, document Q&A, and code analysis, provided OpenAI lock-in is acceptable.

Best Practices

Clear Instructions: Write detailed system prompts that define behavior, constraints, and output format
Manage Costs: Monitor file storage and run usage - both are billed separately
Handle Errors: Always check run status and handle failures gracefully
Clean Up Resources: Delete assistants, threads, files, and vector stores when no longer needed
Use Streaming: For better UX, stream responses rather than waiting for completion
Version Your Assistants: Store assistant configurations in code for reproducibility
Set Timeouts: Runs can take time - implement appropriate timeouts
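
To illustrate the "Version Your Assistants" point, the sketch below keeps an assistant's configuration in a small dataclass and derives a short hash from it, so a deployment script can tell whether the stored assistant is out of date. `AssistantConfig`, `create_kwargs`, and `fingerprint` are illustrative names, not SDK features.

```python
import hashlib
import json
from dataclasses import asdict, dataclass, field


@dataclass(frozen=True)
class AssistantConfig:
    name: str
    instructions: str
    model: str
    tools: tuple = field(default_factory=tuple)

    def create_kwargs(self) -> dict:
        """Keyword arguments in the shape client.beta.assistants.create() accepts."""
        kwargs = asdict(self)
        kwargs["tools"] = [dict(tool) for tool in self.tools]
        return kwargs

    def fingerprint(self) -> str:
        """Short, stable hash that changes whenever any field changes."""
        canonical = json.dumps(self.create_kwargs(), sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


config = AssistantConfig(
    name="HR Assistant",
    instructions="Answer questions about company policies using the knowledge base.",
    model="gpt-4-turbo",
    tools=({"type": "file_search"},),
)
print(config.fingerprint())
```

One option is to pass the fingerprint in the assistant's `metadata` when creating it, for example `client.beta.assistants.create(**config.create_kwargs(), metadata={"config_hash": config.fingerprint()})`, and compare it on the next deployment.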

Pricing Considerations

The Assistants API has multiple cost components:

Model Usage: Standard token pricing for input/output
Code Interpreter: $0.03 per session (a session stays active for one hour by default)
File Search: $0.10 per GB of vector storage per day
File Storage: $0.20 per GB per day for uploaded files

For cost-sensitive applications, consider using the Chat Completions API with your own RAG implementation.
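
To see how these components add up, here is a rough estimator. The rates are copied from the list above and may be out of date, token spend is passed in by the caller because it depends on the model, and the usage volumes in the example are made up.

```python
# Rates copied from the list above; verify against current pricing before use.
CODE_INTERPRETER_PER_SESSION = 0.03  # USD per session
VECTOR_STORAGE_PER_GB_DAY = 0.10     # USD per GB per day
FILE_STORAGE_PER_GB_DAY = 0.20       # USD per GB per day


def estimate_monthly_cost(sessions_per_day, vector_gb, file_gb,
                          token_cost_per_day=0.0, days=30):
    """Rough estimate in USD over `days` days, excluding any free allowances."""
    daily = (
        sessions_per_day * CODE_INTERPRETER_PER_SESSION
        + vector_gb * VECTOR_STORAGE_PER_GB_DAY
        + file_gb * FILE_STORAGE_PER_GB_DAY
        + token_cost_per_day
    )
    return round(daily * days, 2)


# Made-up volumes: 100 sessions a day, 2 GB of vectors, 2 GB of files.
print(estimate_monthly_cost(sessions_per_day=100, vector_gb=2, file_gb=2))
```

With these example volumes the estimate is 108.0 USD over 30 days, of which 90 USD comes from Code Interpreter sessions. At this scale, session count matters more than storage.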
