OpenAI Agents SDK Complete Guide 2026: Building Multi-Agent Systems

Introduction

The landscape of AI application development has been transformed by the introduction of agent frameworks. OpenAI’s Agents SDK represents a significant leap forward in building autonomous AI systems that can reason, use tools, and collaborate to accomplish complex tasks. This comprehensive guide covers everything you need to know to build production-ready AI agents using the OpenAI Agents SDK.

What is OpenAI Agents SDK?

Overview

The OpenAI Agents SDK is a lightweight but powerful framework designed for building AI-powered agent applications. It represents a significant evolution from OpenAI’s experimental Swarm project, providing production-ready primitives for creating sophisticated multi-agent systems.

The SDK enables developers to create agents that can:

Use tools to interact with external systems
Hand off tasks between specialized agents
Maintain conversation context across interactions
Execute complex workflows with built-in safety guardrails

Key Components

The SDK is built around three core primitives:

Component	Description
Agents	LLMs configured with specific instructions, tools, and behaviors
Tools	Functions that agents can call to perform actions
Handoffs	Mechanisms for agents to transfer control to other agents

Getting Started

Installation

pip install openai-agents

Basic Agent Creation

from agents import Agent, function_tool

@function_tool
def get_weather(location: str) -> str:
    """Get the weather for a specific location."""
    # Implement weather API call
    return f"The weather in {location} is sunny, 72°F"

weather_agent = Agent(
    name="Weather Agent",
    instructions="You are a helpful weather assistant. Use the get_weather tool to answer questions about weather.",
    tools=[get_weather]
)

Agent Architecture

Defining Agents

Agents are the core building blocks of the SDK. Each agent has:

from agents import Agent, ModelSettings

agent = Agent(
    name="Research Assistant",
    instructions="""You are a research assistant helping users find information.
    
    Guidelines:
    - Always cite your sources
    - Provide balanced perspectives on controversial topics
    - Admit when you don't know something""",
    tools=[search_web, fetch_url, get_weather],
    tool_use_frequency="auto",  # or "always" or "never"
    model="gpt-4o",
    model_settings=ModelSettings(
        temperature=0.7,
        max_tokens=4000
    )
)

Model Settings

Fine-tune agent behavior with model settings:

from agents import ModelSettings

settings = ModelSettings(
    temperature=0.7,          # Controls randomness (0.0 - 2.0)
    max_tokens=4096,          # Maximum response length
    top_p=0.9,               # Nucleus sampling
    parallel_tool_calls=True  # Allow parallel tool execution
)

Tool Integration

Creating Tools

Tools extend agent capabilities by enabling interaction with external systems:

from agents import function_tool
import requests

@function_tool
def search_web(query: str, max_results: int = 5) -> list:
    """Search the web for information.
    
    Args:
        query: The search query
        max_results: Maximum number of results to return
    
    Returns:
        List of search results with title, url, and snippet
    """
    # Implement search logic
    response = requests.get(
        "https://api.search.example.com/search",
        params={"q": query, "limit": max_results}
    )
    return response.json()["results"]

@function_tool  
def send_email(to: str, subject: str, body: str) -> dict:
    """Send an email message.
    
    Args:
        to: Recipient email address
        subject: Email subject line
        body: Email body content
    """
    # Implement email sending
    return {"status": "sent", "message_id": "msg_123"}

Tool Parameters with Pydantic

Define complex tool parameters using Pydantic models:

from pydantic import BaseModel
from typing import List
from agents import function_tool

class CalendarEvent(BaseModel):
    title: str
    description: str = ""
    start_time: str  # ISO 8601 format
    end_time: str
    attendees: List[str] = []

@function_tool
def create_calendar_event(event: CalendarEvent) -> dict:
    """Create a calendar event."""
    # Implement calendar API call
    return {
        "status": "created",
        "event_id": "evt_456",
        "event": event.dict()
    }

Async Tools

For I/O-bound operations, use async tools:

import asyncio
from agents import afunction_tool

@afunction_tool
async def fetch_multiple_urls(urls: List[str]) -> List[dict]:
    """Fetch content from multiple URLs concurrently."""
    async with asyncio.ClientSession() as session:
        tasks = [
            session.get(url) 
            for url in urls
        ]
        responses = await asyncio.gather(*tasks)
        return [
            {"url": url, "status": r.status}
            for url, r in zip(urls, responses)
        ]

Multi-Agent Systems

Handoffs

The handoff mechanism allows agents to delegate tasks to specialized agents:

from agents import Agent, handoff

# Create specialized agents
triage_agent = Agent(
    name="Triage Agent",
    instructions="Route customer requests to the appropriate specialist.",
    handoffs=[
        handoff(
            agent=technical_support_agent,
            condition=lambda context: "technical" in context.user_input.lower()
        ),
        handoff(
            agent=billing_agent,
            condition=lambda context: "bill" in context.user_input.lower() or "payment" in context.user_input.lower()
        ),
        handoff(
            agent=general_support_agent,
            condition=lambda context: True  # Default fallback
        )
    ]
)

# Alternative: direct handoff
sales_agent = Agent(
    name="Sales Agent",
    instructions="Handle sales inquiries and product questions."
)

support_agent = Agent(
    name="Support Agent", 
    instructions="Handle technical support and troubleshooting."
)

# Agent can explicitly hand off
agent = Agent(
    name="Main Agent",
    instructions="""You are the main customer service agent.
    For sales inquiries, handoff to the sales agent.
    For technical issues, handoff to the support agent.""",
    handoffs=[sales_agent, support_agent]
)

Agent Pools

Create pools of agents for parallel processing:

from agents import Agent
import asyncio

# Create multiple worker agents
worker_agents = [
    Agent(
        name=f"Worker {i}",
        instructions="Process tasks efficiently and accurately."
    )
    for i in range(5)
]

async def process_batch(tasks: list):
    """Process multiple tasks in parallel."""
    async with asyncio.TaskGroup() as tg:
        results = [
            tg.create_task(agent.run(task))
            for task, agent in zip(tasks, worker_agents)
        ]
    return [r.result() for r in results]

Guardrails

Input Guardrails

Validate and filter user input before processing:

from agents import Agent, input_guardrail
import re

@input_guardrail
def validate_email_input(context):
    """Ensure user input doesn't contain email addresses."""
    user_input = context.user_input
    
    email_pattern = r'[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}'
    if re.search(email_pattern, user_input):
        return {
            "valid": False,
            "reason": "Please don't include email addresses in your query."
        }
    
    return {"valid": True}

agent = Agent(
    name="Customer Service Agent",
    instructions="You are a helpful customer service agent.",
    input_guardrails=[validate_email_input]
)

Output Guardrails

Validate agent responses before returning to users:

from agents import output_guardrail

@output_guardrail
def sanitize_output(context):
    """Ensure output doesn't contain sensitive information."""
    output = context.agent_response
    
    # Check for potential sensitive data
    sensitive_patterns = [
        r'\b\d{3}-\d{2}-\d{4}\b',  # SSN
        r'\b\d{16}\b',               # Credit card
    ]
    
    for pattern in sensitive_patterns:
        if re.search(pattern, output):
            return {
                "valid": False,
                "reason": "Response contained sensitive information and was filtered."
            }
    
    return {"valid": True}

agent = Agent(
    name="Data Processing Agent",
    instructions="Process user data according to privacy guidelines.",
    output_guardrails=[sanitize_output]
)

Context Guardrails

Implement rate limiting and abuse prevention:

from datetime import datetime, timedelta
from collections import defaultdict

class RateLimiter:
    def __init__(self, max_requests_per_minute: int = 10):
        self.max_requests = max_requests_per_minute
        self.requests = defaultdict(list)
    
    def check_rate_limit(self, user_id: str) -> bool:
        now = datetime.utcnow()
        cutoff = now - timedelta(minutes=1)
        
        # Clean old requests
        self.requests[user_id] = [
            req_time for req_time in self.requests[user_id]
            if req_time > cutoff
        ]
        
        if len(self.requests[user_id]) >= self.max_requests:
            return False
        
        self.requests[user_id].append(now)
        return True

rate_limiter = RateLimiter(max_requests_per_minute=10)

@input_guardrail
def rate_limit_check(context):
    """Apply rate limiting per user."""
    user_id = context.user_id
    
    if not rate_limiter.check_rate_limit(user_id):
        return {
            "valid": False,
            "reason": "Rate limit exceeded. Please try again later."
        }
    
    return {"valid": True}

Production Patterns

Streaming Responses

async def stream_agent_response(agent: Agent, user_input: str):
    """Stream agent responses for better UX."""
    from agents import Runner
    
    result = Runner.run_streaming(
        agent=agent,
        input=user_input
    )
    
    async for event in result.stream_events():
        if event.type == "agent_message":
            print(event.message.content, end="", flush=True)
        elif event.type == "tool_call":
            print(f"\n[Using tool: {event.tool_call.name}]")

Error Handling

from agents import Agent, RunResult
from enum import Enum

class AgentError(Exception):
    def __init__(self, message: str, recoverable: bool = False):
        self.message = message
        self.recoverable = recoverable
        super().__init__(message)

async def safe_agent_run(agent: Agent, user_input: str, max_retries: int = 3):
    """Run agent with error handling and retries."""
    from agents import Runner
    
    last_error = None
    
    for attempt in range(max_retries):
        try:
            result = await Runner.run(agent, user_input)
            return result
        except Exception as e:
            last_error = e
            if not is_recoverable_error(e):
                raise AgentError(str(e), recoverable=False)
            
            # Exponential backoff
            await asyncio.sleep(2 ** attempt)
    
    raise AgentError(
        f"Agent failed after {max_retries} attempts: {last_error}",
        recoverable=True
    )

def is_recoverable_error(error: Exception) -> bool:
    """Determine if an error is recoverable."""
    recoverable_messages = [
        "rate limit",
        "timeout",
        "temporary failure"
    ]
    
    error_str = str(error).lower()
    return any(msg in error_str for msg in recoverable_messages)

Memory and Context

from agents import Agent
from typing import List, Dict

class ConversationMemory:
    def __init__(self, max_history: int = 10):
        self.max_history = max_history
        self.history: List[Dict] = []
    
    def add_message(self, role: str, content: str):
        """Add a message to conversation history."""
        self.history.append({"role": role, "content": content})
        
        # Trim if needed
        if len(self.history) > self.max_history:
            self.history = self.history[-self.max_history:]
    
    def get_context(self) -> str:
        """Get formatted context for agent."""
        return "\n".join(
            f"{msg['role']}: {msg['content']}"
            for msg in self.history
        )

# Usage
memory = ConversationMemory(max_history=10)

agent = Agent(
    name="Conversational Agent",
    instructions="""You are a helpful assistant with memory of our conversation.
    
    Previous conversation:
    {context}"""
)

# Build context before each run
context = memory.get_context()
result = await Runner.run(agent, user_input, context=context)

# Store interaction
memory.add_message("user", user_input)
memory.add_message("assistant", result.final_output)

Monitoring and Observability

Tracing Agent Executions

from agents import Agent
from opentelemetry import trace

tracer = trace.get_tracer(__name__)

class AgentTracer:
    def __init__(self, service_name: str):
        self.service_name = service_name
    
    async def trace_agent_run(self, agent: Agent, user_input: str):
        with tracer.start_as_current_span(
            f"agent.{agent.name}",
            attributes={
                "service.name": self.service_name,
                "agent.name": agent.name,
                "user.input.length": len(user_input)
            }
        ) as span:
            from agents import Runner
            result = await Runner.run(agent, user_input)
            
            span.set_attribute(
                "agent.output.length",
                len(result.final_output)
            )
            span.set_attribute(
                "agent.tool_calls",
                len(result.tool_calls) if result.tool_calls else 0
            )
            
            return result

Logging

import structlog
from agents import Agent

logger = structlog.get_logger()

@function_tool
def logged_api_call(url: str) -> dict:
    """API call with logging."""
    logger.info("api_call_start", url=url, tool="logged_api_call")
    
    try:
        result = make_api_call(url)
        logger.info(
            "api_call_success",
            url=url,
            status_code=result.status_code
        )
        return result.json()
    except Exception as e:
        logger.error(
            "api_call_failed",
            url=url,
            error=str(e)
        )
        raise

Best Practices

1. Keep Instructions Focused

# Bad: Vague instructions
agent = Agent(
    instructions="Be helpful."
)

# Good: Specific instructions
agent = Agent(
    instructions="""You are a technical support agent for a SaaS product.
    
    Your responsibilities:
    1. Understand user technical issues
    2. Provide troubleshooting steps
    3. Escalate to human support when needed
    
    Never:
    - Provide legal advice
    - Access user accounts without permission
    - Share internal system information"""
)

2. Use Descriptive Tool Names

# Bad: Generic names
def do_something(x):
    pass

# Good: Descriptive names
def create_calendar_event(event: CalendarEvent) -> dict:
    """Create a new event in the user's calendar."""
    pass

def searchKnowledgeBase(query: str, category: str = None) -> list:
    """Search the knowledge base for relevant articles.
    
    Args:
        query: Search query string
        category: Optional category filter
    """
    pass

3. Implement Proper Error Handling

@function_tool
def robust_api_call(url: str, timeout: int = 30) -> dict:
    """Make API call with proper error handling."""
    import httpx
    
    try:
        with httpx.Timeout(timeout):
            response = httpx.get(url)
            response.raise_for_status()
            return response.json()
    except httpx.TimeoutException:
        return {"error": "Request timed out", "retryable": True}
    except httpx.HTTPStatusError as e:
        return {"error": f"HTTP error: {e.response.status_code}", "retryable": False}
    except Exception as e:
        return {"error": f"Unexpected error: {str(e)}", "retryable": False}

4. Test Agent Behavior

import pytest
from agents import Agent, Runner

def test_agent_routes_technical_queries():
    """Test that technical queries route to correct agent."""
    technical_agent = Agent(name="Technical", instructions="Handle technical issues.")
    billing_agent = Agent(name="Billing", instructions="Handle billing questions.")
    
    triage = Agent(
        name="Triage",
        instructions="Route queries appropriately.",
        handoffs=[technical_agent, billing_agent]
    )
    
    result = Runner.run_sync(
        triage,
        "My application is throwing a 500 error"
    )
    
    # Verify handoff occurred
    assert result.final_output  # Check routing logic worked

def test_guardrail_blocks_sensitive_data():
    """Test that guardrails filter sensitive data."""
    agent = Agent(
        name="Test Agent",
        instructions="Process user requests.",
        output_guardrails=[sanitize_output]
    )
    
    result = Runner.run_sync(
        agent,
        "Tell me about your services"
    )
    
    # Verify sensitive data was filtered
    assert "123-45-6789" not in result.final_output

Common Pitfalls

1. Overcomplicating Agent Instructions

# Bad: Too many rules
agent = Agent(
    instructions="""First, greet the user. Then, ask what they need help with.
    If it's about X, do Y. If it's about A, do B.
    Remember to be polite. Don't forget to say goodbye.
    Also, check the time. If it's morning, say good morning.
    If it's afternoon... [continues for 200 more lines]"""
)

# Good: Focused, modular instructions
agent = Agent(
    instructions="""You are a customer service agent.
    
    Core responsibilities:
    - Answer customer questions
    - Troubleshoot issues
    - Escalate when needed
    
    Guidelines: [brief list of key rules]"""
)

2. Not Handling Tool Failures

# Bad: No error handling
@function_tool
def get_data(url: str):
    return requests.get(url).json()

# Good: Graceful error handling
@function_tool  
def get_data(url: str):
    try:
        return {"data": requests.get(url).json()}
    except requests.RequestException as e:
        return {"error": str(e), "fallback_available": True}

3. Ignoring Context Limits

Be mindful of context window limits when building conversations:

Include only relevant history
Summarize older interactions
Use structured data instead of verbose formats

External Resources

Conclusion

The OpenAI Agents SDK provides a powerful framework for building sophisticated AI agents. By understanding its core components—agents, tools, and handoffs—you can create multi-agent systems capable of handling complex workflows.

Key takeaways:

Start with simple, focused agents before building complex systems
Use descriptive tool names and comprehensive parameter definitions
Implement guardrails for security and reliability
Add comprehensive logging and monitoring
Test agent behavior thoroughly

As AI agents become more prevalent, mastering frameworks like the OpenAI Agents SDK will be essential for building production-ready AI applications that can reliably handle real-world tasks.