What is Agentic AI?
Agentic AI refers to AI systems that act autonomously to achieve goals over time: perceiving their environment, making decisions, taking actions, and adapting from feedback. Unlike a single-step prediction model (classify this image, complete this sentence), an agentic system behaves like an actor with persistence, objectives, and the ability to plan and execute multi-step tasks.
The key characteristics that make an AI system “agentic”:
- Autonomy: operates with limited human input for parts of the task loop
- Goal-directed behavior: optimizes for outcomes over short or long horizons
- Sequential decision-making: composes multi-step plans and adapts based on results
- Stateful memory: retains context across steps (short-term + long-term)
- Tool use: calls external APIs, searches the web, writes and executes code
Agentic AI vs Traditional AI
| Aspect | Traditional ML | Agentic AI |
|---|---|---|
| Task type | Single-step prediction | Multi-step planning and execution |
| State | Stateless | Stateful (memory across steps) |
| Role | Passive (predicts) | Active (acts, queries, changes environment) |
| Feedback | Training-time only | Runtime adaptation |
| Scope | Narrow (one task) | Broad (orchestrates multiple tasks) |
A traditional sentiment classifier takes text and returns a label. An agentic system might search for recent news about a company, read the articles, analyze sentiment across sources, compare it to historical data, and write a report, all autonomously.
The Agent Loop
Every agentic system follows some variation of this loop:
while not goal_reached(state):
    observation = perceive_environment(state)
    context = memory.retrieve(observation)
    plan = planner.generate(context, goal)
    action = select_action(plan)
    result = execute(action)
    memory.store(result)
    state = update_state(state, result)
In practice with an LLM-based agent:
from openai import OpenAI
client = OpenAI()
messages = [{"role": "system", "content": "You are a helpful research assistant."}]
def agent_loop(user_goal: str, tools: list, max_steps: int = 10):
    messages.append({"role": "user", "content": user_goal})
    for step in range(max_steps):
        response = client.chat.completions.create(
            model="gpt-4o",
            messages=messages,
            tools=tools,
            tool_choice="auto"
        )
        message = response.choices[0].message
        messages.append(message)
        # If there are no tool calls, the agent is done
        if not message.tool_calls:
            return message.content
        # Execute each requested tool call and feed the result back
        for tool_call in message.tool_calls:
            result = execute_tool(tool_call.function.name, tool_call.function.arguments)
            messages.append({
                "role": "tool",
                "tool_call_id": tool_call.id,
                "content": str(result)
            })
    return "Max steps reached"
Core Components
1. Planner / Reasoning Engine
The component that decides what to do next. In modern LLM-based agents, this is typically the LLM itself using chain-of-thought reasoning:
# ReAct pattern: Reason + Act
system_prompt = """
You are an agent. For each step:
1. THOUGHT: Reason about what to do next
2. ACTION: Choose a tool to call
3. OBSERVATION: Read the tool result
4. Repeat until you have the answer
"""
2. Memory
Agents need memory to maintain context across steps:
from typing import List

class AgentMemory:
    def __init__(self, max_short_term: int = 20):
        self.short_term: List[dict] = []   # recent conversation turns
        self.long_term = VectorStore()     # placeholder: semantic search over past interactions
        self.max_short_term = max_short_term

    def add(self, role: str, content: str):
        self.short_term.append({"role": role, "content": content})
        if len(self.short_term) > self.max_short_term:
            # Summarize the oldest messages and move them to long-term memory
            summary = summarize(self.short_term[:10])  # summarize() is a placeholder
            self.long_term.add(summary)
            self.short_term = self.short_term[10:]

    def retrieve(self, query: str, k: int = 5) -> List[str]:
        return self.long_term.search(query, k=k)
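`VectorStore` above is a placeholder. Purely to illustrate the interface `AgentMemory` expects (`add` and `search`), here is a toy version that ranks stored texts by word overlap with the query; a real implementation would use embeddings via a vector database such as FAISS or Chroma:

```python
from typing import List

class VectorStore:
    """Toy stand-in for the semantic store assumed by AgentMemory.

    Ranks stored texts by how many words they share with the query.
    Illustrative only: real stores embed texts and use vector similarity.
    """
    def __init__(self):
        self.docs: List[str] = []

    def add(self, text: str) -> None:
        self.docs.append(text)

    def search(self, query: str, k: int = 5) -> List[str]:
        query_words = set(query.lower().split())
        ranked = sorted(
            self.docs,
            key=lambda d: len(query_words & set(d.lower().split())),
            reverse=True,
        )
        return ranked[:k]
```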
3. Tools
Tools are functions the agent can call to interact with the world:
tools = [
    {
        "type": "function",
        "function": {
            "name": "web_search",
            "description": "Search the web for current information",
            "parameters": {
                "type": "object",
                "properties": {
                    "query": {"type": "string", "description": "Search query"}
                },
                "required": ["query"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "execute_python",
            "description": "Execute Python code and return the output",
            "parameters": {
                "type": "object",
                "properties": {
                    "code": {"type": "string", "description": "Python code to execute"}
                },
                "required": ["code"]
            }
        }
    },
    {
        "type": "function",
        "function": {
            "name": "read_file",
            "description": "Read the contents of a file",
            "parameters": {
                "type": "object",
                "properties": {
                    "path": {"type": "string", "description": "File path to read"}
                },
                "required": ["path"]
            }
        }
    }
]
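Writing these schemas by hand gets tedious as the tool list grows. One common trick is to derive them from function signatures; a simplified sketch (the `function_to_tool` helper is illustrative and assumes every parameter is a string, using the docstring as the description):

```python
import inspect

def function_to_tool(fn) -> dict:
    """Build an OpenAI-style tool schema from a Python function.

    Sketch only: treats every parameter as a string and takes the
    description from the function's docstring.
    """
    params = inspect.signature(fn).parameters
    return {
        "type": "function",
        "function": {
            "name": fn.__name__,
            "description": (fn.__doc__ or "").strip(),
            "parameters": {
                "type": "object",
                "properties": {name: {"type": "string"} for name in params},
                "required": list(params),
            },
        },
    }

def web_search(query):
    """Search the web for current information"""
    ...
```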
4. Executor
Safely executes tool calls with validation and sandboxing:
import subprocess
import json

def execute_tool(name: str, arguments: str) -> str:
    args = json.loads(arguments)
    if name == "web_search":
        return search_web(args["query"])  # placeholder for a real search API
    elif name == "execute_python":
        # IMPORTANT: sandbox code execution
        # In production: use Docker or a proper sandbox, not the host interpreter
        try:
            result = subprocess.run(
                ["python3", "-c", args["code"]],
                capture_output=True,
                text=True,
                timeout=10,  # prevent runaway code
            )
        except subprocess.TimeoutExpired:
            return "Error: execution timed out"
        return result.stdout or result.stderr
    elif name == "read_file":
        # Validate the path to prevent directory traversal
        path = args["path"]
        if ".." in path or path.startswith("/"):
            return "Error: Invalid path"
        with open(path) as f:
            return f.read()
    else:
        return f"Unknown tool: {name}"
Agent Patterns
ReAct (Reason + Act)
The most common pattern: the agent alternates between reasoning and acting.
Thought: I need to find the current price of AAPL stock.
Action: web_search("AAPL stock price today")
Observation: AAPL is trading at $178.50 as of March 30, 2026.
Thought: Now I have the price. I should also check the P/E ratio.
Action: web_search("AAPL P/E ratio 2026")
...
Final Answer: AAPL is trading at $178.50 with a P/E ratio of 28.
Plan-and-Execute
The agent first creates a full plan, then executes each step:
# Step 1: Generate a plan
plan = llm.generate(f"Create a step-by-step plan to: {goal}")
steps = parse_steps(plan)

# Step 2: Execute each step
results = []
for step in steps:
    result = execute_step(step)
    results.append(result)

# Step 3: Synthesize the results
final_answer = llm.generate(f"Based on these results: {results}\nAnswer: {goal}")
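`parse_steps` is left undefined above. One way to implement it, assuming the LLM was prompted to return a numbered list with one step per line (LLM output formats are not guaranteed, so real code should validate and retry on malformed plans):

```python
import re
from typing import List

def parse_steps(plan: str) -> List[str]:
    """Extract steps from a numbered-list plan.

    Assumption: the model returns lines like '1. Search for news'
    or '2) Summarize findings'; other lines are ignored.
    """
    steps = []
    for line in plan.splitlines():
        match = re.match(r"\s*\d+[.)]\s*(.+)", line)
        if match:
            steps.append(match.group(1).strip())
    return steps
```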
Multi-Agent Systems
Multiple specialized agents collaborate:
class ResearchTeam:
    def __init__(self):
        self.researcher = Agent("You are a research specialist. Find relevant information.")
        self.analyst = Agent("You are a data analyst. Analyze and interpret data.")
        self.writer = Agent("You are a technical writer. Write clear summaries.")

    def research(self, topic: str) -> str:
        # Researcher gathers information
        raw_data = self.researcher.run(f"Research: {topic}")
        # Analyst processes it
        analysis = self.analyst.run(f"Analyze this data: {raw_data}")
        # Writer creates the final output
        report = self.writer.run(f"Write a report based on: {analysis}")
        return report
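The `Agent` class used here is assumed rather than defined. A minimal sketch of the interface `ResearchTeam` relies on (construct with a system prompt, call `run(task)`); the LLM backend is injected as a callable so the sketch runs without API keys, and `echo_backend` is a stand-in, not a real model call:

```python
from typing import Callable

def echo_backend(prompt: str) -> str:
    """Stand-in LLM call; swap in a wrapper around a real chat API."""
    return f"[model reply to: {prompt[:40]}...]"

class Agent:
    """Minimal Agent with the interface ResearchTeam assumes."""
    def __init__(self, system_prompt: str, complete: Callable[[str], str] = echo_backend):
        self.system_prompt = system_prompt
        self.complete = complete  # maps a full prompt to the model's reply

    def run(self, task: str) -> str:
        return self.complete(f"{self.system_prompt}\n\n{task}")
```

In practice `complete` would wrap a chat-completions call; keeping it injectable also makes multi-agent code easy to unit-test with canned replies.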
Popular Frameworks
| Framework | Language | Best For |
|---|---|---|
| LangChain | Python/JS | General-purpose agent building |
| LlamaIndex | Python | RAG and document agents |
| AutoGen | Python | Multi-agent conversations |
| CrewAI | Python | Role-based multi-agent teams |
| Semantic Kernel | C#/Python | Enterprise, Microsoft ecosystem |
| Haystack | Python | NLP pipelines and agents |
Real-World Use Cases
Code Assistant Agent
# Agent that writes, tests, and fixes code
def coding_agent(task: str) -> str:
    agent = Agent(tools=["write_file", "execute_python", "read_file"])
    return agent.run(f"""
    Task: {task}
    1. Write the code
    2. Test it by running it
    3. Fix any errors
    4. Return the working code
    """)
Research Agent
Searches the web, reads papers, synthesizes findings.
Data Analysis Agent
Loads data, writes and executes analysis code, generates visualizations and reports.
Customer Support Agent
Answers questions, looks up order status, and processes refunds, escalating to humans when needed.
Safety and Governance
Agentic systems can cause real harm if not properly constrained:
Principle of Least Privilege
# Don't give agents more tools than they need
research_agent = Agent(tools=["web_search", "read_file"]) # read-only
# NOT: Agent(tools=["web_search", "write_file", "execute_shell", "send_email"])
Human-in-the-Loop for Critical Actions
import json

def execute_with_approval(action: str, args: dict) -> str:
    REQUIRES_APPROVAL = {"send_email", "delete_file", "make_payment"}
    if action in REQUIRES_APPROVAL:
        print(f"Agent wants to: {action}({args})")
        approval = input("Approve? (y/n): ")
        if approval.lower() != "y":
            return "Action cancelled by user"
    return execute_tool(action, json.dumps(args))  # execute_tool expects JSON-encoded arguments
Sandboxing Code Execution
Never execute LLM-generated code directly on your host system. Use:
- Docker containers with limited resources
- E2B (e2b.dev): cloud sandboxes for AI agents
- Pyodide: Python in WebAssembly (browser-safe)
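As an illustration of the Docker option, a helper that builds (but does not run) a `docker run` command with networking disabled and resource caps; the image name and limits are examples, not recommendations:

```python
def docker_sandbox_cmd(code: str) -> list:
    """Build a docker command line for running untrusted Python code.

    --rm removes the container afterwards, --network none blocks all
    network access, --memory/--cpus cap resources. Image and limits
    are illustrative.
    """
    return [
        "docker", "run", "--rm",
        "--network", "none",
        "--memory", "256m",
        "--cpus", "0.5",
        "python:3.12-slim",
        "python3", "-c", code,
    ]
```

Pass the returned list to `subprocess.run(cmd, capture_output=True, text=True, timeout=...)` to actually execute it, keeping the timeout so a hung container does not block the agent loop.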
Audit Logging
import json
import logging
import time

logger = logging.getLogger("agent")

def logged_execute(action: str, args: dict, session_id: str) -> str:
    logger.info("AGENT_ACTION", extra={
        "tool": action,
        "tool_args": args,  # note: the key "args" would clash with a built-in LogRecord attribute
        "timestamp": time.time(),
        "session_id": session_id
    })
    result = execute_tool(action, json.dumps(args))
    logger.info("AGENT_RESULT", extra={"result": result[:500]})
    return result
Evaluation Metrics
| Metric | Description |
|---|---|
| Task success rate | % of tasks completed correctly |
| Steps to completion | Average number of tool calls per task |
| Cost per task | API tokens + compute cost |
| Human override rate | How often humans must intervene |
| Safety violations | Attempts to perform unauthorized actions |
| Latency | Time from request to completion |
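To make these metrics concrete, a small aggregation sketch over a batch of recorded runs (the `TaskRecord` fields are illustrative; map them to whatever your logging actually captures):

```python
from dataclasses import dataclass
from typing import List

@dataclass
class TaskRecord:
    """One agent run, mirroring the metrics table above."""
    success: bool          # task completed correctly
    tool_calls: int        # steps to completion
    cost_usd: float        # tokens + compute
    human_override: bool   # did a human have to intervene

def summarize_runs(runs: List[TaskRecord]) -> dict:
    """Aggregate per-run records into the evaluation metrics."""
    n = len(runs)
    return {
        "task_success_rate": sum(r.success for r in runs) / n,
        "avg_steps": sum(r.tool_calls for r in runs) / n,
        "avg_cost_usd": sum(r.cost_usd for r in runs) / n,
        "human_override_rate": sum(r.human_override for r in runs) / n,
    }
```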
Resources
- LangChain Documentation
- AutoGen (Microsoft)
- CrewAI
- OpenAI Assistants API
- Anthropic: Building Effective Agents
- E2B: Code Execution Sandbox