⚡ Calmops

Function Calling and Tool Use: Building Agentic LLM Systems

Introduction

Function calling and tool use represent fundamental capabilities that transform language models from passive text generators into active agents capable of interacting with the real world. An LLM without tools is a chatbot; an LLM with tools becomes an agent that can take actions, retrieve information, and complete complex tasks. The ReAct pattern (Reason + Act + Observe) has become the foundation for building these agentic systems, enabling models to think step-by-step, perform external actions, and interpret results in a loop.

The evolution from simple function calling to sophisticated agentic systems has accelerated dramatically. Modern implementations support native function calling across providers like OpenAI, Anthropic, and Mistral, with structured outputs that return JSON specifying which function to call and what arguments to pass. This standardization has enabled a new generation of AI applications that combine language understanding with real-world action.

Understanding function calling and tool use is essential for building production AI systems. From simple API integrations to complex multi-step agents, these capabilities enable applications that were previously impossible. This article explores the foundations of function calling, the ReAct pattern for agentic reasoning, tool orchestration strategies, and practical implementation guidance.

Function Calling Foundations

Function calling enables language models to interact with external systems through structured interfaces. Rather than generating natural language responses, models can output structured function calls that trigger external actions.

Structured Output Formats

Modern function calling uses structured output formats that specify function names and arguments. OpenAI’s function calling returns the function name together with its arguments as a JSON-encoded object. Anthropic and Mistral provide similar capabilities with their own specifications. This standardization enables consistent tool integration across different model providers.

The function calling workflow involves several steps. First, the system provides the model with a description of available functions, including their names, parameters, and return types. When a user query requires a function, the model outputs a structured call that the system executes. The function result is then provided back to the model, which generates a natural language response incorporating the result.
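The four steps above can be sketched end-to-end with the provider call mocked out. The tool definition follows the JSON Schema style used by the major providers; the tool name, fields, and stub values here are illustrative, not any provider's exact wire format.

```python
import json

# Step 1: a tool description the model receives alongside the user query.
weather_tool = {
    "name": "get_weather",
    "description": "Get the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {
            "city": {"type": "string", "description": "City name, e.g. 'Tokyo'"}
        },
        "required": ["city"],
    },
}

def get_weather(city: str) -> dict:
    # Stub standing in for a real weather API call.
    return {"city": city, "temperature": 22, "condition": "sunny"}

# Step 2: the model responds with a structured call (mocked here).
model_output = {"name": "get_weather", "arguments": json.dumps({"city": "Tokyo"})}

# Step 3: the host application executes the call.
result = get_weather(**json.loads(model_output["arguments"]))

# Step 4: the result is serialized back into the conversation for the model.
tool_message = {"role": "tool", "name": model_output["name"],
                "content": json.dumps(result)}
```

Note that the model only ever emits and consumes JSON; the host application owns the actual execution.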

Function Schema Design

Effective function calling requires careful schema design. Functions should have clear, descriptive names and comprehensive parameter descriptions. The schema should specify required and optional parameters, parameter types, and any constraints. Well-designed schemas help the model understand when and how to use each function.

Parameter descriptions should include not just types but also the semantic meaning of each parameter. For example, a date parameter should specify the expected format and what the date represents. These descriptions help the model generate correct arguments even for complex functions.
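To make this concrete, here is a hypothetical schema with semantically rich parameter descriptions, plus a minimal argument checker. The `book_meeting` function, its parameters, and the validation rules are all invented for illustration; a production system would use a full JSON Schema validator.

```python
import re

# Hypothetical schema: each description states format and meaning, not just type.
schema = {
    "name": "book_meeting",
    "description": "Schedule a meeting on the user's calendar.",
    "parameters": {
        "type": "object",
        "properties": {
            "date": {
                "type": "string",
                "description": "Meeting date in ISO 8601 format (YYYY-MM-DD), "
                               "interpreted in the user's local timezone.",
            },
            "duration_minutes": {
                "type": "integer",
                "description": "Meeting length in minutes.",
            },
        },
        "required": ["date"],
    },
}

def check_arguments(schema: dict, arguments: dict) -> list:
    """Return a list of validation errors (empty if the call is well-formed)."""
    errors = []
    for name in schema["parameters"].get("required", []):
        if name not in arguments:
            errors.append(f"missing required parameter: {name}")
    if "date" in arguments and not re.fullmatch(r"\d{4}-\d{2}-\d{2}", arguments["date"]):
        errors.append("date must be YYYY-MM-DD")
    return errors
```

Validating model-generated arguments before execution catches malformed calls early and produces error messages the model can act on.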

from typing import Any, Callable, Dict, List, Optional
from dataclasses import dataclass
import json

class FunctionResult:
    """Result of a function call."""
    
    def __init__(self, success: bool, data: Any = None, error: Optional[str] = None):
        self.success = success
        self.data = data
        self.error = error
        
    def to_dict(self) -> Dict:
        return {
            "success": self.success,
            "data": self.data,
            "error": self.error
        }

@dataclass
class FunctionSchema:
    """Schema for a callable function."""
    name: str
    description: str
    parameters: Dict  # JSON Schema format
    required_params: List[str]
    
    def to_dict(self) -> Dict:
        return {
            "name": self.name,
            "description": self.description,
            "parameters": self.parameters,
            "required": self.required_params
        }

class FunctionRegistry:
    """Registry of available functions for LLM tool use."""
    
    def __init__(self):
        self.functions: Dict[str, Callable] = {}
        self.schemas: Dict[str, FunctionSchema] = {}
        
    def register(self, func: Callable, schema: FunctionSchema):
        """Register a function with its schema."""
        self.functions[schema.name] = func
        self.schemas[schema.name] = schema
        
    def get_schema(self, name: str) -> Optional[FunctionSchema]:
        """Get the schema for a registered function, if any."""
        return self.schemas.get(name)
    
    def list_schemas(self) -> List[Dict]:
        """List all function schemas."""
        return [schema.to_dict() for schema in self.schemas.values()]
    
    def call(self, name: str, arguments: Dict) -> FunctionResult:
        """Call a function by name with keyword arguments."""
        if name not in self.functions:
            return FunctionResult(success=False, error=f"Function {name} not found")
        
        func = self.functions[name]
        try:
            result = func(**arguments)
            return FunctionResult(success=True, data=result)
        except Exception as e:
            return FunctionResult(success=False, error=str(e))
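In use, the registry pairs each implementation with its schema and returns structured results for both success and failure. The condensed, dict-based stand-in below mirrors the register/call behavior so the snippet runs on its own; the `add` tool is purely illustrative.

```python
# Minimal stand-in mirroring FunctionRegistry.register / .call above.
functions = {}

def register(name, func, description):
    functions[name] = {"func": func, "description": description}

def call(name, arguments):
    if name not in functions:
        return {"success": False, "error": f"Function {name} not found"}
    try:
        return {"success": True, "data": functions[name]["func"](**arguments)}
    except Exception as e:
        # Tool exceptions become structured errors the model can read and react to.
        return {"success": False, "error": str(e)}

register("add", lambda a, b: a + b, "Add two numbers.")
```

Returning errors as data rather than raising keeps the agent loop alive: the model sees the failure message and can retry with corrected arguments.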


class FunctionCallingModel:
    """Language model with function calling capabilities."""
    
    def __init__(self, model, registry: FunctionRegistry):
        self.model = model
        self.registry = registry
        
    def generate_with_functions(self, prompt: str, max_iterations: int = 5) -> str:
        """Generate response with function calling support."""
        messages = [{"role": "user", "content": prompt}]
        
        for iteration in range(max_iterations):
            # Generate model response
            response = self.model.generate(
                messages,
                functions=self.registry.list_schemas()
            )
            
            # Check if model wants to call a function
            if response.function_call:
                function_name = response.function_call.name
                arguments = response.function_call.arguments
                
                # Execute function
                result = self.registry.call(function_name, arguments)
                
                # Add function result to messages
                messages.append({
                    "role": "assistant",
                    "content": None,
                    "function_call": {
                        "name": function_name,
                        "arguments": arguments
                    }
                })
                
                if result.success:
                    messages.append({
                        "role": "function",
                        "name": function_name,
                        "content": json.dumps(result.to_dict())
                    })
                else:
                    messages.append({
                        "role": "function",
                        "name": function_name,
                        "content": f"Error: {result.error}"
                    })
            else:
                # No function call, return the response
                return response.content
                
        return "Maximum iterations reached"

The ReAct Pattern

ReAct (Reason + Act + Observe) provides a structured approach for agentic reasoning. The pattern forces models to articulate their reasoning before taking action, making the entire process interpretable and verifiable.

ReAct Workflow

The ReAct workflow alternates between reasoning and action. The model first thinks about what to do (Thought), then takes an action (Action), receives the result (Observation), and uses that to inform the next reasoning step. This loop continues until the task is complete.

The strict format (Thought → Action → PAUSE → Observation) enables programmatic parsing and validation of each step. This structure is essential for production systems where each step must be logged, verified, and potentially corrected.
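One way to enforce the strict format is a regex validator run on every model turn. The sketch below assumes the Thought / Action / Action Input layout used in the agent prompt later in this article; a step that fails validation can be rejected and regenerated.

```python
import re

# Validator for one ReAct step in the assumed three-line layout.
STEP_PATTERN = re.compile(
    r"^Thought: (?P<thought>.+)\n"
    r"Action: (?P<action>.+)\n"
    r"Action Input: (?P<input>.+)$",
    re.MULTILINE,
)

def validate_step(text: str):
    """Return the parsed step fields, or None if the format is violated."""
    match = STEP_PATTERN.search(text.strip())
    return match.groupdict() if match else None

step = ('Thought: I need the weather.\n'
        'Action: get_weather\n'
        'Action Input: {"location": "Tokyo"}')
```

Rejecting malformed steps at parse time, before any tool runs, is what makes each step loggable and verifiable.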

Implementing ReAct

Implementing ReAct requires careful prompt design and state management. The prompt should specify the expected format and provide examples of the Thought-Action-Observation loop. State must be maintained across iterations to track the conversation history and accumulated knowledge.

class ReActAgent:
    """ReAct-style agent for reasoning and acting."""
    
    def __init__(self, model, tools: FunctionRegistry, max_steps: int = 10):
        self.model = model
        self.tools = tools
        self.max_steps = max_steps
        
    def run(self, task: str) -> Dict:
        """Run ReAct agent on a task."""
        # Initialize state
        state = {
            "task": task,
            "thoughts": [],
            "actions": [],
            "observations": [],
            "final_answer": None
        }
        
        # Build system prompt
        system_prompt = self._build_system_prompt()
        
        for step in range(self.max_steps):
            # Generate next step
            response = self.model.generate([
                {"role": "system", "content": system_prompt},
                {"role": "user", "content": f"Task: {task}\n\nProgress so far:\n" + 
                 self._format_progress(state)}
            ])
            
            # Parse response
            parsed = self._parse_response(response)
            
            if "thought" in parsed:
                state["thoughts"].append(parsed["thought"])
                
                if "action" in parsed:
                    state["actions"].append(parsed["action"])
                    
                    # Execute action
                    result = self._execute_action(parsed["action"])
                    state["observations"].append(result)
                    
                    # Check if task is complete
                    if self._is_complete(parsed["action"], result):
                        state["final_answer"] = result
                        break
                else:
                    # No action, generate final answer
                    state["final_answer"] = parsed.get("answer", "No answer generated")
                    break
            else:
                state["final_answer"] = response
                break
                
        return state
    
    def _build_system_prompt(self) -> str:
        """Build the ReAct system prompt."""
        tool_descriptions = "\n".join([
            f"- {name}: {schema.description}"
            for name, schema in self.tools.schemas.items()
        ])
        
        return f"""You are a ReAct agent that can reason and take actions to complete tasks.

Available tools:
{tool_descriptions}

For each step, you must output in the following format:

Thought: <your reasoning about what to do next>
Action: <the action to take, or "FINAL ANSWER" if done>
Action Input: <JSON object with arguments, or null>

After taking an action, you will receive an Observation with the result.

Examples:

Thought: I need to find the current weather in Tokyo.
Action: get_weather
Action Input: {{"location": "Tokyo"}}

Observation: {{"temperature": 22, "condition": "sunny"}}

Thought: The weather in Tokyo is sunny and 22°C. I can now provide the final answer.
Action: FINAL ANSWER
Action Input: {{"answer": "The current weather in Tokyo is sunny with a temperature of 22°C."}}

Begin!
"""
    
    def _format_progress(self, state: Dict) -> str:
        """Format the current progress for the model."""
        lines = []
        for i, (thought, action, obs) in enumerate(zip(
            state["thoughts"], state["actions"], state["observations"]
        )):
            lines.append(f"Step {i+1}:")
            lines.append(f"Thought: {thought}")
            lines.append(f"Action: {action}")
            lines.append(f"Observation: {obs}")
            lines.append("")
        return "\n".join(lines)
    
    def _parse_response(self, response: str) -> Dict:
        """Parse the model's response into thought, action, and input fields."""
        result = {}
        
        if "Thought:" in response:
            result["thought"] = response.split("Thought:", 1)[1].split("\n")[0].strip()
            
        if "Action:" in response:
            result["action"] = response.split("Action:", 1)[1].split("\n")[0].strip()
            
        if "Action Input:" in response:
            input_str = response.split("Action Input:", 1)[1].split("\n")[0].strip()
            try:
                result["action_input"] = json.loads(input_str)
            except json.JSONDecodeError:
                # Fall back to the raw string if the input is not valid JSON
                result["action_input"] = input_str
                
        if "FINAL ANSWER" in response:
            result["action"] = "FINAL ANSWER"
            
        return result
    
    def _execute_action(self, action: str, arguments: Optional[Dict] = None) -> str:
        """Execute an action using the tool registry."""
        if action == "FINAL ANSWER":
            return "Task complete"
            
        if action in self.tools.functions:
            # Arguments come from the parsed "Action Input" field when available
            result = self.tools.call(action, arguments or {})
            return json.dumps(result.to_dict())
        return f"Unknown action: {action}"
    
    def _is_complete(self, action: str, observation: str) -> bool:
        """Check if the task is complete."""
        return action == "FINAL ANSWER" or "complete" in observation.lower()

Tool Orchestration Patterns

Building sophisticated agents requires orchestrating multiple tools in coordinated workflows. Several patterns have emerged as effective approaches to tool orchestration.

Sequential Workflows

Sequential workflows execute tools one after another, with each tool’s output feeding into the next. This pattern is suitable for tasks with clear step-by-step procedures, such as research workflows that first search, then filter, then synthesize.

The key challenge in sequential workflows is error handling. If one tool fails, the workflow must decide whether to retry, skip, or abort. Clear error messages and recovery strategies are essential for robust sequential execution.
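The retry/skip/abort decision can be made explicit as a per-step error policy. The sketch below is a minimal pipeline runner under assumed semantics (one retry, skip keeps the previous step's output); the `search`, `filter_results`, and `synthesize` steps are hypothetical stand-ins for real tools.

```python
def run_pipeline(steps, initial_input):
    """steps: list of (func, on_error) where on_error is 'retry', 'skip', or 'abort'."""
    data = initial_input
    for func, on_error in steps:
        attempts = 2 if on_error == "retry" else 1
        for attempt in range(attempts):
            try:
                data = func(data)  # each step consumes the previous step's output
                break
            except Exception as e:
                if attempt + 1 < attempts:
                    continue  # one more try
                if on_error == "skip":
                    break  # keep previous data and move to the next step
                raise RuntimeError(f"step {func.__name__} failed: {e}")
    return data

# Hypothetical research workflow: search, then filter, then synthesize.
def search(q): return [f"result for {q}"]
def filter_results(results): return [r for r in results if "result" in r]
def synthesize(results): return " | ".join(results)

output = run_pipeline(
    [(search, "retry"), (filter_results, "skip"), (synthesize, "abort")],
    "tool use",
)
```

Encoding the policy per step keeps recovery behavior visible in the workflow definition instead of buried inside each tool.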

Parallel Execution

Parallel execution runs multiple tools simultaneously, then combines results. This pattern is effective when tools operate on independent information sources. For example, a research agent might search multiple databases in parallel before synthesizing results.

Parallel execution requires careful result aggregation. Results must be combined in ways that preserve important information while avoiding redundancy. Ranking and deduplication strategies help manage the combined output.
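A fan-out over independent sources followed by order-preserving deduplication might look like the following sketch, using a thread pool since the bottleneck in tool calls is usually I/O. The two search functions are stand-ins for real backends.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical independent sources returning overlapping results.
def search_web(query):    return [f"{query}: web hit", f"{query}: shared hit"]
def search_papers(query): return [f"{query}: paper hit", f"{query}: shared hit"]

def parallel_search(query, sources):
    # Fan out: one worker per source; pool.map preserves source order.
    with ThreadPoolExecutor(max_workers=len(sources)) as pool:
        result_lists = list(pool.map(lambda s: s(query), sources))
    # Aggregate: flatten, dropping duplicates while keeping first-seen order.
    seen, merged = set(), []
    for results in result_lists:
        for r in results:
            if r not in seen:
                seen.add(r)
                merged.append(r)
    return merged
```

Exact-match deduplication is the simplest strategy; real aggregators often add semantic near-duplicate detection and relevance ranking on top.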

Hierarchical Orchestration

Hierarchical orchestration uses manager agents to coordinate specialist agents. A high-level agent decomposes tasks and delegates to specialized agents, then synthesizes their outputs. This pattern scales to complex tasks that require diverse capabilities.

The hierarchical approach mirrors organizational structures, with generalists coordinating specialists. Effective implementation requires clear interfaces between levels and mechanisms for handling conflicts or inconsistencies.
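A minimal sketch of this structure: a manager decomposes a task, routes each subtask to a specialist by a capability rule, and synthesizes the results. The specialists and the routing rule are invented for illustration; in practice each would be an agent and the router would likely be an LLM classifier.

```python
# Hypothetical specialist agents, keyed by capability.
SPECIALISTS = {
    "math":    lambda subtask: f"math agent solved: {subtask}",
    "writing": lambda subtask: f"writing agent drafted: {subtask}",
}

def route(subtask: str) -> str:
    # Toy routing rule; a production manager might use an LLM classifier here.
    return "math" if any(c.isdigit() for c in subtask) else "writing"

def manager(task: str, subtasks: list) -> dict:
    # Delegate each subtask to its specialist, then synthesize the outputs.
    results = [SPECIALISTS[route(s)](s) for s in subtasks]
    return {"task": task, "synthesis": " / ".join(results)}

report = manager(
    "quarterly report",
    ["compute 2024 revenue growth", "summarize findings"],
)
```

The clean interface here is the string-in/string-out contract between levels; conflicts between specialist outputs must be resolved in the synthesis step.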

Agent Memory and State

Effective agents maintain memory and state across interactions. Without memory, agents repeat failed approaches and forget accumulated knowledge.

Short-Term Memory

Short-term memory maintains context within a single task session. This includes the conversation history, accumulated observations, and current task state. The ReAct pattern’s explicit state tracking provides a foundation for short-term memory.

Implementation typically uses message lists or dedicated state objects that persist across iterations. The state must be carefully managed to avoid unbounded growth while preserving relevant context.
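A common way to bound growth is a trimming policy over the message list: pin the system prompt, keep only the most recent turns. This sketch uses a simple drop-oldest rule; a real implementation might summarize dropped turns instead of discarding them.

```python
def trim_history(messages, max_turns=4):
    """Keep the leading system message plus the last max_turns messages."""
    system = [m for m in messages[:1] if m["role"] == "system"]
    rest = messages[len(system):]
    return system + rest[-max_turns:]

# Simulate a session that accumulates ten observations.
history = [{"role": "system", "content": "You are an agent."}]
for i in range(10):
    history.append({"role": "user", "content": f"observation {i}"})

trimmed = trim_history(history)  # system prompt + observations 6..9
```

The key invariant is that the system prompt (and any other pinned context) survives trimming, while the window over recent turns stays a fixed size.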

Long-Term Memory

Long-term memory persists across sessions, enabling agents to learn from past experiences. This might include successful strategies, failed approaches, and accumulated knowledge about the world.

Implementation approaches include vector databases for semantic memory, graph structures for relationship memory, and structured databases for factual knowledge. The choice depends on what information needs to be preserved and how it will be used.

Production Considerations

Deploying agentic systems in production requires attention to reliability, security, and monitoring.

Error Handling

Robust error handling is essential for production agents. Tools may fail, models may produce invalid outputs, and loops may run indefinitely. Each failure mode requires specific handling strategies.

Timeouts prevent infinite loops by limiting the time or iterations for any single operation. Retry logic handles transient failures. Fallback mechanisms provide graceful degradation when tools are unavailable.
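These two mechanisms compose naturally: retries with exponential backoff for transient tool failures, bounded by an overall deadline so the loop cannot run indefinitely. The flaky tool below is a stand-in for a real external call; the attempt counts and delays are illustrative.

```python
import time

def call_with_retries(func, max_attempts=3, base_delay=0.01, deadline_s=5.0):
    """Retry func with exponential backoff, bounded by an overall deadline."""
    start = time.monotonic()
    for attempt in range(max_attempts):
        if time.monotonic() - start > deadline_s:
            raise TimeoutError("deadline exceeded")
        try:
            return func()
        except Exception:
            if attempt + 1 == max_attempts:
                raise  # exhausted retries: surface the last error
            time.sleep(base_delay * (2 ** attempt))  # 10ms, 20ms, ...

# Stand-in for a tool that fails twice, then succeeds.
calls = {"n": 0}
def flaky_tool():
    calls["n"] += 1
    if calls["n"] < 3:
        raise ConnectionError("transient failure")
    return "ok"

result = call_with_retries(flaky_tool)
```

The deadline check runs before each attempt, so a slow tool cannot extend the total budget past `deadline_s` plus one call duration.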

Security

Agent security involves preventing misuse and protecting sensitive resources. Tool access must be controlled to prevent unauthorized actions. Input validation prevents injection attacks. Audit logging tracks agent actions for accountability.

Sandboxing restricts the impact of agent actions, preventing agents from causing unintended harm. Resource limits prevent agents from consuming excessive compute or making excessive API calls.
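Two of the lightest-weight guardrails, an allowlist restricting which tools an agent may invoke and a per-session call budget, can be combined in a wrapper around the tool registry. The tool names and limits below are illustrative.

```python
class GuardedToolbox:
    """Tool access with an allowlist and a per-session call budget."""

    def __init__(self, tools, allowed, max_calls=10):
        self.tools = tools            # name -> callable
        self.allowed = set(allowed)   # names this agent may use
        self.calls_remaining = max_calls

    def call(self, name, **kwargs):
        if name not in self.allowed:
            return {"error": f"tool '{name}' is not permitted"}
        if self.calls_remaining <= 0:
            return {"error": "call budget exhausted"}
        self.calls_remaining -= 1
        return {"data": self.tools[name](**kwargs)}

# Hypothetical setup: the agent may read files but never delete them,
# and gets at most two tool calls per session.
box = GuardedToolbox(
    tools={"read_file": lambda path: f"<contents of {path}>",
           "delete_file": lambda path: "deleted"},
    allowed=["read_file"],
    max_calls=2,
)

blocked = box.call("delete_file", path="notes.txt")  # refused by the allowlist
allowed = box.call("read_file", path="notes.txt")    # counts against the budget
```

Denied calls return structured errors rather than raising, so the agent loop can observe the refusal and adjust instead of crashing.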

Monitoring

Production agents require comprehensive monitoring. Metrics should track task completion rates, latency, tool usage, and error rates. Logging captures detailed traces of agent behavior for debugging. Alerting notifies operators of issues requiring attention.

Challenges and Limitations

Agentic systems face several challenges that limit their applicability.

Reliability remains a significant concern. Agents can fail in unexpected ways, producing incorrect results or getting stuck in loops. The probabilistic nature of language models makes consistent behavior challenging.

Coordination complexity grows with the number of tools and agents. Managing interactions, resolving conflicts, and ensuring coherent behavior becomes increasingly difficult as systems grow.

Evaluation is challenging for agentic systems. Traditional metrics don’t capture the quality of agent behavior. Task completion is a useful proxy but may miss important aspects of agent performance.

Future Directions

Research on agentic systems continues to advance.

Learning-based agent design uses machine learning to optimize agent behavior. Rather than hand-crafting prompts and workflows, systems learn effective strategies from experience.

Multi-agent collaboration enables multiple agents to work together on complex tasks. Agents with different specializations can complement each other’s capabilities, achieving results beyond what any single agent could accomplish.

Autonomous improvement allows agents to improve themselves, modifying their prompts, tools, or strategies based on experience. This self-improvement capability could lead to rapidly advancing agent capabilities.

Conclusion

Function calling and tool use have transformed language models from text generators into active agents capable of interacting with the world. The ReAct pattern provides a structured approach for reasoning and acting, enabling interpretable and verifiable agent behavior.

The key to effective agentic systems is careful design of tools, workflows, and orchestration patterns. Sequential, parallel, and hierarchical patterns each have strengths suited to different tasks. Memory and state management enable agents to learn from experience and maintain context across interactions.

For practitioners, building agentic systems requires attention to reliability, security, and monitoring. The challenges are significant, but the potential for creating AI systems that can actually accomplish tasks, rather than just generate text, makes this one of the most exciting areas of AI development.
