What is the Assistants API?
The OpenAI Assistants API is a managed service for building AI agents. Unlike the basic Chat Completions API, Assistants provides:
- Persistent Threads: Conversation history managed by OpenAI
- Built-in Tools: Code Interpreter, File Search, and Function Calling
- File Handling: Upload and process documents, images, and code
- Stateful Conversations: Multi-turn interactions without manual context management
This makes it easier to build production-grade AI assistants without managing conversation state, file processing, or tool execution yourself.
Core Concepts
Assistants
An Assistant is a configured AI agent with specific instructions, tools, and a model.
from openai import OpenAI
client = OpenAI()
# Create an assistant
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="""You are a data analyst assistant. Help users analyze data,
    create visualizations, and extract insights. Use the code interpreter
    to run Python code when needed.""",
    model="gpt-4-turbo",
    tools=[
        {"type": "code_interpreter"},
        {"type": "file_search"}
    ]
)
print(f"Created assistant: {assistant.id}")
Threads
Threads represent conversations. OpenAI manages the message history for you.
# Create a new thread (conversation)
thread = client.beta.threads.create()
# Add a message to the thread
message = client.beta.threads.messages.create(
    thread_id=thread.id,
    role="user",
    content="Analyze the sales data I uploaded and find the top-performing products."
)
print(f"Thread ID: {thread.id}")
Runs
A Run executes the assistant on a thread, generating a response.
import time
# Create a run
run = client.beta.threads.runs.create(
    thread_id=thread.id,
    assistant_id=assistant.id
)
# Wait for completion
while run.status in ["queued", "in_progress"]:
    time.sleep(1)
    run = client.beta.threads.runs.retrieve(
        thread_id=thread.id,
        run_id=run.id
    )
# Get the assistant's response
messages = client.beta.threads.messages.list(thread_id=thread.id)
for msg in messages.data:
    if msg.role == "assistant":
        print(msg.content[0].text.value)
        break
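The polling loop above has no upper bound on how long it waits. A minimal sketch of a bounded version follows. The helper name `wait_for_run`, the `STOP_STATUSES` set, and the `retrieve` callable are illustrative assumptions, not part of the OpenAI SDK; in real use `retrieve` would wrap `client.beta.threads.runs.retrieve(...)`.

```python
import time

# Statuses at which further polling will not help without caller action.
# This set is an assumption for the sketch, based on the run statuses
# used elsewhere in this article.
STOP_STATUSES = {
    "completed", "failed", "cancelled", "expired", "incomplete", "requires_action",
}


def wait_for_run(retrieve, timeout=120.0, interval=1.0, sleep=time.sleep):
    """Poll `retrieve()` until the run stops progressing or `timeout` seconds pass.

    `retrieve` is any zero-argument callable that returns an object with a
    `.status` attribute, for example:
        lambda: client.beta.threads.runs.retrieve(thread_id=thread.id, run_id=run.id)
    """
    deadline = time.monotonic() + timeout
    while True:
        run = retrieve()
        if run.status in STOP_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Run still '{run.status}' after {timeout} seconds")
        sleep(interval)
```

Passing the retrieval step in as a callable keeps the helper independent of the client object, which also makes it easy to exercise with a fake in tests.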
Built-in Tools
Code Interpreter
Executes Python code in a sandboxed environment. Perfect for data analysis, calculations, and file processing.
# Upload a file for code interpreter
with open("sales_data.csv", "rb") as f:
    # The context manager closes the file handle once the upload finishes
    file = client.files.create(
        file=f,
        purpose="assistants"
    )
# Create assistant with code interpreter
assistant = client.beta.assistants.create(
    name="Data Analyst",
    instructions="Analyze data files using Python. Create visualizations when helpful.",
    model="gpt-4-turbo",
    tools=[{"type": "code_interpreter"}],
    tool_resources={
        "code_interpreter": {
            "file_ids": [file.id]
        }
    }
)
# The assistant can now read and analyze the CSV file
File Search (Retrieval)
Automatically chunks, embeds, and searches through uploaded documents.
# Create a vector store for documents
vector_store = client.beta.vector_stores.create(
    name="Company Knowledge Base"
)
# Upload files to the vector store
file_batch = client.beta.vector_stores.file_batches.upload_and_poll(
    vector_store_id=vector_store.id,
    files=[
        open("employee_handbook.pdf", "rb"),
        open("product_catalog.pdf", "rb"),
        open("faq.pdf", "rb")
    ]
)
# Create assistant with file search
assistant = client.beta.assistants.create(
    name="HR Assistant",
    instructions="Answer questions about company policies using the knowledge base.",
    model="gpt-4-turbo",
    tools=[{"type": "file_search"}],
    tool_resources={
        "file_search": {
            "vector_store_ids": [vector_store.id]
        }
    }
)
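The upload step above opens several files and never closes them. One way to guarantee the handles are released is `contextlib.ExitStack`, sketched below. The helper name `upload_with_cleanup` and its `upload` parameter are hypothetical; `upload` stands in for a call such as `upload_and_poll(vector_store_id=..., files=handles)`.

```python
from contextlib import ExitStack


def upload_with_cleanup(paths, upload):
    """Open every path, pass the open handles to `upload`, then close them all.

    `upload` is a hypothetical callable that receives the list of open
    binary file handles and returns whatever the upload call returns.
    """
    with ExitStack() as stack:
        # Each handle is registered with the stack, so all of them are
        # closed on exit even if a later open() or the upload itself fails.
        handles = [stack.enter_context(open(path, "rb")) for path in paths]
        return upload(handles)
```

With the client from this article, usage might look like `upload_with_cleanup(paths, lambda fs: client.beta.vector_stores.file_batches.upload_and_poll(vector_store_id=vector_store.id, files=fs))`.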
Function Calling
Define custom functions that the assistant can call, extending its capabilities.
import json
# Define custom functions
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_weather",
            "description": "Get current weather for a city",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {
                        "type": "string",
                        "description": "City name"
                    },
                    "unit": {
                        "type": "string",
                        "enum": ["celsius", "fahrenheit"],
                        "description": "Temperature unit"
                    }
                },
                "required": ["city"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "send_email",
            "description": "Send an email",
            "parameters": {
                "type": "object",
                "properties": {
                    "to": {"type": "string"},
                    "subject": {"type": "string"},
                    "body": {"type": "string"}
                },
                "required": ["to", "subject", "body"]
            }
        }
    }
]
assistant = client.beta.assistants.create(
    name="Personal Assistant",
    instructions="Help users with tasks. Use available functions when needed.",
    model="gpt-4-turbo",
    tools=tools
)
Handling Function Calls
When the assistant decides to call a function, the run pauses with the status requires_action. Your code executes the function and submits the result so the run can continue:
import json
import time

def handle_run(client, thread_id, assistant_id):
    """Execute a run and handle any required actions."""
    run = client.beta.threads.runs.create(
        thread_id=thread_id,
        assistant_id=assistant_id
    )
    while True:
        run = client.beta.threads.runs.retrieve(
            thread_id=thread_id,
            run_id=run.id
        )
        if run.status == "completed":
            break
        elif run.status == "requires_action":
            # Handle function calls
            tool_outputs = []
            for tool_call in run.required_action.submit_tool_outputs.tool_calls:
                function_name = tool_call.function.name
                arguments = json.loads(tool_call.function.arguments)
                # Execute the function
                if function_name == "get_weather":
                    result = get_weather(arguments["city"])
                elif function_name == "send_email":
                    result = send_email(
                        arguments["to"],
                        arguments["subject"],
                        arguments["body"]
                    )
                else:
                    result = {"error": f"Unknown function: {function_name}"}
                tool_outputs.append({
                    "tool_call_id": tool_call.id,
                    "output": json.dumps(result)
                })
            # Submit results back to the assistant
            run = client.beta.threads.runs.submit_tool_outputs(
                thread_id=thread_id,
                run_id=run.id,
                tool_outputs=tool_outputs
            )
        elif run.status in ["failed", "cancelled", "expired", "incomplete"]:
            # Terminal states other than "completed"; "incomplete" is listed
            # so the loop cannot poll forever on a run that will never finish
            raise RuntimeError(f"Run ended with status: {run.status}")
        else:
            # Still queued or in progress: wait before polling again
            time.sleep(1)
    return run

# Example function implementations
def get_weather(city):
    # In practice, call a weather API
    return {"city": city, "temperature": 22, "condition": "sunny"}

def send_email(to, subject, body):
    # In practice, use an email service
    return {"status": "sent", "to": to}
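As the number of functions grows, the if/elif chain in the handler becomes harder to maintain. A minimal sketch of a registry-based alternative follows. The names `TOOL_REGISTRY`, `run_tool`, and `weather_stub` are hypothetical and not part of the OpenAI SDK; the stub mirrors the placeholder weather function above.

```python
import json


def weather_stub(city, unit="celsius"):
    # Placeholder implementation; in practice, call a weather API.
    return {"city": city, "temperature": 22, "unit": unit, "condition": "sunny"}


# Hypothetical registry mapping tool names (as declared to the assistant)
# to the Python callables that implement them.
TOOL_REGISTRY = {"get_weather": weather_stub}


def run_tool(name, arguments_json, registry=TOOL_REGISTRY):
    """Return the JSON string to submit as the output of one tool call."""
    func = registry.get(name)
    if func is None:
        return json.dumps({"error": f"Unknown function: {name}"})
    try:
        kwargs = json.loads(arguments_json)
        return json.dumps(func(**kwargs))
    except (json.JSONDecodeError, TypeError) as exc:
        # Malformed JSON, or arguments that do not match the function
        # signature. A TypeError raised inside the function itself is
        # reported the same way.
        return json.dumps({"error": f"Bad arguments for {name}: {exc}"})
```

Inside the handler loop, each tool call would then reduce to `run_tool(tool_call.function.name, tool_call.function.arguments)`. Returning errors as JSON rather than raising lets the model see the failure and react to it.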
Streaming Responses
Get real-time responses as the assistant generates them:
from openai import AssistantEventHandler

class MyEventHandler(AssistantEventHandler):
    def on_text_created(self, text):
        print("\nAssistant: ", end="", flush=True)

    def on_text_delta(self, delta, snapshot):
        print(delta.value, end="", flush=True)

    def on_tool_call_created(self, tool_call):
        print(f"\n\nUsing tool: {tool_call.type}\n", flush=True)

    def on_tool_call_delta(self, delta, snapshot):
        if delta.type == "code_interpreter":
            if delta.code_interpreter.input:
                print(delta.code_interpreter.input, end="", flush=True)
            if delta.code_interpreter.outputs:
                for output in delta.code_interpreter.outputs:
                    if output.type == "logs":
                        print(f"\n{output.logs}", flush=True)

# Use streaming
with client.beta.threads.runs.stream(
    thread_id=thread.id,
    assistant_id=assistant.id,
    event_handler=MyEventHandler()
) as stream:
    stream.until_done()
Complete Example: Document Q&A Assistant
from openai import OpenAI
import time

class DocumentAssistant:
    def __init__(self):
        self.client = OpenAI()
        self.assistant = None
        self.thread = None
        self.vector_store = None

    def setup(self, documents: list[str]):
        """Initialize the assistant with documents."""
        # Create vector store
        self.vector_store = self.client.beta.vector_stores.create(
            name="Document Store"
        )
        # Upload documents, closing the file handles once the upload finishes
        file_streams = [open(doc, "rb") for doc in documents]
        try:
            self.client.beta.vector_stores.file_batches.upload_and_poll(
                vector_store_id=self.vector_store.id,
                files=file_streams
            )
        finally:
            for stream in file_streams:
                stream.close()
        # Create assistant
        self.assistant = self.client.beta.assistants.create(
            name="Document Q&A",
            instructions="""You are a helpful assistant that answers questions
            based on the provided documents. Always cite your sources.
            If you cannot find relevant information, say so clearly.""",
            model="gpt-4-turbo",
            tools=[{"type": "file_search"}],
            tool_resources={
                "file_search": {
                    "vector_store_ids": [self.vector_store.id]
                }
            }
        )
        # Create thread
        self.thread = self.client.beta.threads.create()
        return self

    def ask(self, question: str) -> str:
        """Ask a question about the documents."""
        # Add user message
        self.client.beta.threads.messages.create(
            thread_id=self.thread.id,
            role="user",
            content=question
        )
        # Run assistant
        run = self.client.beta.threads.runs.create(
            thread_id=self.thread.id,
            assistant_id=self.assistant.id
        )
        # Wait for completion
        while run.status in ["queued", "in_progress"]:
            time.sleep(1)
            run = self.client.beta.threads.runs.retrieve(
                thread_id=self.thread.id,
                run_id=run.id
            )
        # Any status other than "completed" means there is no new answer;
        # without this check the latest message would be the user's question
        if run.status != "completed":
            detail = run.last_error.message if run.last_error else "no error details"
            return f"Error: run ended with status {run.status} ({detail})"
        # Get response
        messages = self.client.beta.threads.messages.list(
            thread_id=self.thread.id,
            order="desc",
            limit=1
        )
        return messages.data[0].content[0].text.value

    def cleanup(self):
        """Clean up resources."""
        if self.thread:
            self.client.beta.threads.delete(self.thread.id)
        if self.assistant:
            self.client.beta.assistants.delete(self.assistant.id)
        if self.vector_store:
            self.client.beta.vector_stores.delete(self.vector_store.id)

# Usage
assistant = DocumentAssistant()
assistant.setup([
    "company_policies.pdf",
    "employee_handbook.pdf",
    "benefits_guide.pdf"
])
answer = assistant.ask("What is the vacation policy for new employees?")
print(answer)
# Continue the conversation
follow_up = assistant.ask("How do I request time off?")
print(follow_up)
# Clean up when done
assistant.cleanup()
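The example reads the reply with `content[0].text.value`, which assumes the first content part is text. A message can also carry non-text parts (for instance an image produced by Code Interpreter), so a more defensive reader may be useful. The sketch below is one possible approach; `message_text` is a hypothetical helper, and the `SimpleNamespace` objects only imitate the shape of an API message for illustration.

```python
from types import SimpleNamespace


def message_text(message):
    """Join the text parts of a message, skipping non-text content parts."""
    parts = []
    for part in message.content:
        if getattr(part, "type", None) == "text":
            parts.append(part.text.value)
    return "\n".join(parts)


# Stand-in objects shaped like an API message, for illustration only.
demo = SimpleNamespace(content=[
    SimpleNamespace(type="image_file"),
    SimpleNamespace(type="text", text=SimpleNamespace(value="Top product: Widget A")),
])
print(message_text(demo))
```

In `ask`, the final line could then become `return message_text(messages.data[0])`.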
Assistants API vs Other Approaches
- Assistants API. Pros: managed state, built-in RAG, code execution. Cons: OpenAI lock-in, cost per stored file.
- Chat Completions + Custom. Pros: full control, portable, flexible. Cons: you build everything yourself.
- LangChain Agents. Pros: many integrations, model-agnostic. Cons: more complex, steeper learning curve.
When to Use Assistants
Use Assistants for quick prototypes, document Q&A, and code analysis, provided OpenAI lock-in is acceptable.
Best Practices
- Clear Instructions: Write detailed system prompts that define behavior, constraints, and output format
- Manage Costs: Monitor file storage and run usage - both are billed separately
- Handle Errors: Always check run status and handle failures gracefully
- Clean Up Resources: Delete assistants, threads, files, and vector stores when no longer needed
- Use Streaming: For better UX, stream responses rather than waiting for completion
- Version Your Assistants: Store assistant configurations in code for reproducibility
- Set Timeouts: Runs can take time - implement appropriate timeouts
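For the "Version Your Assistants" practice, one possible approach is to keep the configuration as data in the repository and store a fingerprint of it in the assistant's `metadata` field, so a deploy script can tell whether the remote assistant is out of date. This is a sketch under that assumption; `ASSISTANT_CONFIG`, `config_fingerprint`, `needs_update`, and the `config_fingerprint` metadata key are all hypothetical names.

```python
import hashlib
import json

# Hypothetical configuration kept under version control.
ASSISTANT_CONFIG = {
    "name": "Data Analyst",
    "instructions": "Analyze data files using Python.",
    "model": "gpt-4-turbo",
    "tools": [{"type": "code_interpreter"}],
}


def config_fingerprint(config):
    """Return a short hash that is stable regardless of key order."""
    canonical = json.dumps(config, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:16]


def needs_update(remote_metadata, config):
    """True when the stored fingerprint is missing or differs from `config`."""
    stored = (remote_metadata or {}).get("config_fingerprint")
    return stored != config_fingerprint(config)
```

A deploy script could create or update the assistant with `metadata={"config_fingerprint": config_fingerprint(ASSISTANT_CONFIG)}` and skip the update when `needs_update(assistant.metadata, ASSISTANT_CONFIG)` is false.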
Pricing Considerations
The Assistants API has multiple cost components:
- Model Usage: Standard token pricing for input/output
- Code Interpreter: $0.03 per session (a session stays active for one hour by default)
- File Search: $0.10 per GB of vector storage per day
- File Storage: $0.20 per GB per day for uploaded files
For cost-sensitive applications, consider using the Chat Completions API with your own RAG implementation.
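To get a rough feel for the non-token costs, the figures listed above can be combined in a small calculator. This is a back-of-the-envelope sketch: the rates are copied from this section and should be treated as assumptions to check against current pricing, and the function name and parameters are illustrative.

```python
# Rates copied from the list above; treat them as assumptions.
CODE_INTERPRETER_PER_SESSION = 0.03  # USD per session
VECTOR_STORAGE_PER_GB_DAY = 0.10     # USD per GB per day
FILE_STORAGE_PER_GB_DAY = 0.20       # USD per GB per day


def estimate_tool_cost(sessions_per_day, vector_gb, file_gb, days=30):
    """Estimate non-token costs in USD over `days` days (model usage excluded)."""
    daily = (
        sessions_per_day * CODE_INTERPRETER_PER_SESSION
        + vector_gb * VECTOR_STORAGE_PER_GB_DAY
        + file_gb * FILE_STORAGE_PER_GB_DAY
    )
    return round(daily * days, 2)
```

For example, 100 sessions per day with 2 GB of vector storage and 1 GB of uploaded files comes to (3.00 + 0.20 + 0.20) × 30 = 102.00 USD over 30 days under these assumed rates, before any token charges.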