Introduction
The AI agent landscape has evolved dramatically from simple prompt-response systems to sophisticated autonomous agents capable of complex task execution. As organizations deploy AI agents across various business functions, a critical challenge emerges: how do we create agents that can handle specialized tasks without rebuilding capabilities from scratch for every new use case?
Enter AI agent skills: an architectural pattern that enables the creation of modular, reusable, and composable capabilities that AI agents load on demand. Think of skills as specialized toolkits that transform general-purpose AI agents into domain experts capable of handling specific tasks with precision and consistency.
In this comprehensive guide, we’ll explore the skills architecture pattern from fundamentals to production implementation. You’ll learn what skills are, how they differ from tools and plugins, practical implementation approaches, and strategies for building a scalable skill ecosystem for your AI agents in 2026.
Understanding AI Agent Skills
What Are Agent Skills?
AI agent skills are structured capability packages that extend an AI agent’s ability to perform specific tasks. Unlike generic tools that provide single functions, skills encapsulate the complete knowledge and logic required to handle a specialized domain. A skill might include the ability to analyze financial data, generate specific document types, interact with particular APIs, or follow industry-specific workflows.
The key distinction between skills and tools lies in complexity and autonomy. Tools are typically single-purpose functions that the agent calls with specific inputs—the agent retains full control over when and how to use them. Skills, conversely, represent higher-level capabilities where the skill itself may contain multiple tools, decision logic, specialized prompts, and even its own micro-workflow. When an agent invokes a skill, it delegates significant control to the skill’s internal logic.
Consider the difference: a tool might be “calculate compound interest” or “format currency”, both simple, atomic operations. A skill like “financial analysis” would encompass numerous tools, understand financial domain concepts, know how to interpret results, and guide the agent through a complete financial analysis workflow with minimal external guidance.
Skills vs Tools vs Plugins
Understanding the relationship between skills, tools, and plugins is crucial for architects designing AI agent systems:
Tools represent the lowest level of capability extension. They are atomic functions that perform specific operations—making API calls, executing code, querying databases, or transforming data. Tools are invoked by the agent with specific parameters and return structured results. The agent maintains full control over tool selection and execution flow.
Skills build upon tools to create higher-level capabilities. A skill typically contains multiple tools organized around a domain, along with the prompt templates, decision trees, and execution logic needed to apply those tools appropriately. Skills reduce the cognitive load on the agent by encapsulating domain expertise.
Plugins represent a deployment-level concept—packaged integrations that add capabilities to an agent platform. Plugins often bundle multiple related tools or skills together for easy installation and management. Where a skill is an architectural pattern, a plugin is a distribution mechanism.
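The three levels can be sketched in a few lines of Python. This is a minimal illustration rather than any framework's API: the `Skill` and `Plugin` classes and the `savings_projection` example are hypothetical names chosen for this sketch.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


def compound_interest(principal: float, rate: float, years: int) -> float:
    """A tool: one atomic operation, fully controlled by the caller."""
    return principal * (1 + rate) ** years


def format_currency(amount: float) -> str:
    """Another tool: formats a number as a USD string."""
    return f"${amount:,.2f}"


@dataclass
class Skill:
    """A skill: several tools plus the workflow logic that sequences them."""
    name: str
    tools: Dict[str, Callable]

    def run(self, principal: float, rate: float, years: int) -> str:
        # The skill, not the agent, decides which tools to call and in what order.
        future_value = self.tools["compound_interest"](principal, rate, years)
        return self.tools["format_currency"](future_value)


@dataclass
class Plugin:
    """A plugin: a distribution unit that bundles skills for installation."""
    package_name: str
    skills: List[Skill]


projection = Skill(
    name="savings_projection",
    tools={"compound_interest": compound_interest, "format_currency": format_currency},
)
finance_plugin = Plugin(package_name="finance-pack", skills=[projection])
print(projection.run(1000.0, 0.05, 2))
```

The agent calling `projection.run(...)` hands over the sequencing decision, which is the delegation of control described above; calling `compound_interest` directly keeps that decision with the agent.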
The Agentic Stack: Skills, MCP, and A2A
By 2026, the AI agent ecosystem has converged on three complementary protocol layers that together define the agentic infrastructure. Understanding how they compose is essential for building production agent systems.
flowchart TB
subgraph Skills["Skills Layer (What to Do)"]
S1[Domain Skills]
S2[Workflow Skills]
S3[Tool-Use Skills]
end
subgraph MCP["MCP Layer (How to Connect)"]
M1[MCP Server\nTools / Resources / Prompts]
M2[MCP Server\nAPIs / Databases / Files]
M3[MCP Server\nSaaS / Cloud Services]
end
subgraph A2A["A2A Layer (Who to Talk To)"]
A1[Specialist Agent]
A2[Orchestrator Agent]
A3[External Agent]
end
Agent[AI Agent] -->|Loads| Skills
Agent -->|Discovers| MCP
Agent -->|Coordinates with| A2A
Skills -->|Uses| MCP
A2A -->|May invoke| Skills
A2A -->|May access| MCP
- Skills provide procedural knowledge — “what to do” in a given domain
- MCP (Model Context Protocol) provides tool connectivity — “how to connect” to external systems
- A2A (Agent-to-Agent) provides agent coordination — “who to talk to” for multi-agent workflows
This three-layer stack took shape during 2025. Both open protocols in it are now hosted under the Linux Foundation: A2A since June 2025, and MCP through the Agentic AI Foundation since December 2025. Together, the layers let organizations build agents that are both capable and composable. A single agent might use an MCP server to query its database, load a domain skill to interpret the results, and coordinate with another agent via A2A to trigger downstream actions.
The Skills Architecture Pattern
The skills architecture pattern addresses several key requirements in modern AI agent systems:
Modularity: Skills are self-contained packages that can be developed, tested, and maintained independently. This separation allows teams to specialize in specific domains without understanding the entire agent system.
Composability: Multiple skills can work together, with agents selecting and combining skills based on task requirements. A single agent might use a “data analysis” skill for one task and a “document generation” skill for another.
On-Demand Loading: Skills can be loaded when needed rather than all at once. This approach conserves resources and prevents context window pollution by keeping only active skills loaded.
Versioning and Updates: Skills can be updated independently without affecting the core agent or other skills. This enables rapid iteration on specialized capabilities.
Access Control: Different skills can have different permission levels, allowing fine-grained control over what capabilities each agent deployment can access.
The Model Context Protocol (MCP)
MCP, released by Anthropic in November 2024 and donated to the Linux Foundation’s Agentic AI Foundation in December 2025, has become the de facto standard for connecting AI agents to external tools and data. By early 2026, it had surpassed 97 million downloads and was supported by the major AI platforms (Anthropic, OpenAI, Google DeepMind).
MCP replaces the “N x M problem” — N tools multiplied by M clients requiring custom integrations — with a single, standardized interface. Think of it as USB-C for AI: one connector that works everywhere.
MCP Architecture
MCP has three components:
- MCP Host: The AI application (Claude Code, VS Code, ChatGPT, Goose)
- MCP Client: Maintains a 1:1 connection to an MCP server
- MCP Server: Exposes tools, resources, and prompts via JSON-RPC 2.0
flowchart LR
Host[MCP Host\nClaude Code / VS Code / ChatGPT]
Client[MCP Client]
Server[MCP Server]
Tools[Tools\nsearch, create, deploy]
Resources[Resources\nfiles, DB records, API]
Prompts[Prompts\nreusable templates]
Host --> Client
Client -->|tools/list| Server
Client -->|tools/call| Server
Server --> Tools
Server --> Resources
Server --> Prompts
When an MCP client connects to a server, it calls tools/list to discover available tools and loads their schemas into the LLM’s context window. The LLM then decides when to call tools/call to invoke specific tools based on those descriptions.
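To make the exchange concrete, here is a minimal sketch of the two JSON-RPC 2.0 requests a client sends. The helper `jsonrpc_request` is a hypothetical name for this illustration, and the `search_tickets` arguments are invented sample values; a real client would also perform the protocol's initialization handshake and send these messages over a transport.

```python
import json
from itertools import count
from typing import Any, Dict, Optional

_request_ids = count(1)  # each JSON-RPC request carries its own id


def jsonrpc_request(method: str, params: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Build a JSON-RPC 2.0 request envelope."""
    message: Dict[str, Any] = {"jsonrpc": "2.0", "id": next(_request_ids), "method": method}
    if params is not None:
        message["params"] = params
    return message


# Discovery: ask the server which tools it exposes.
list_request = jsonrpc_request("tools/list")

# Invocation: call one tool by name with structured arguments.
call_request = jsonrpc_request(
    "tools/call",
    {"name": "search_tickets", "arguments": {"query": "login bug", "status": "open"}},
)
print(json.dumps(call_request, indent=2))
```

The response to `tools/list` is what ends up in the model's context, which is why the number of tools a server exposes matters for the discussion that follows.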
The Context Window Problem
This discovery model creates a challenge that skills directly address. Each tool schema consumes approximately 200 tokens, so a moderately capable MCP server with 40 tools consumes around 8,000 tokens before the agent has done anything. Connect two or three such servers and you have spent 16,000-24,000 tokens on tool descriptions alone.
Progressive discovery, pioneered by Anthropic’s Agent Skills, solves this by loading only the capabilities an agent actually needs for a given task. Instead of dumping every tool schema into context upfront, skills use intent detection to determine which subset of tools to load. This reduces token overhead by up to 85% and improves reasoning quality by keeping the context focused on actual task data.
MCP-Only Approach:
    Load all 40 tool schemas (8,000 tokens) → Agent decides which to use
    Context: 8K tokens for tools + remaining for data and reasoning
Skills with MCP:
    Detect intent → Load matching skill → Skill lazily loads 3-5 relevant MCP tools
    Context: ~600-1,000 tokens for tools + majority for data and reasoning
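The arithmetic behind this comparison is simple enough to check directly. The sketch below assumes the rough figure of 200 tokens per tool schema used in this section and counts schema tokens only; it ignores the tokens a skill's own metadata and instructions occupy, so real savings will be somewhat lower than the raw numbers suggest.

```python
TOKENS_PER_TOOL_SCHEMA = 200  # approximate figure used in this section


def schema_tokens(tool_count: int) -> int:
    """Tokens spent on tool schemas for a given number of loaded tools."""
    return tool_count * TOKENS_PER_TOOL_SCHEMA


def reduction(before: int, after: int) -> float:
    """Fraction of schema tokens saved by loading fewer tools."""
    return 1 - after / before


all_tools = schema_tokens(40)                        # one 40-tool server, loaded eagerly
lazy_low, lazy_high = schema_tokens(3), schema_tokens(5)  # a skill loading 3-5 tools

print(all_tools, lazy_low, lazy_high)
print(f"{reduction(all_tools, lazy_high):.1%} to {reduction(all_tools, lazy_low):.1%}")
print(schema_tokens(40) * 3)                         # three such servers, loaded eagerly
```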
Skills Over MCP
The most powerful 2026 pattern is running skills on top of MCP. An MCP server grants the agent access to external systems (Jira, databases, cloud APIs), and a skill teaches the agent how to use those tools effectively within a specific organizational context.
# Example: MCP server exposing tools + skill instructing their use
# MCP Server (tools/ticket_system.py):
from typing import List, Optional

# Assumes the official MCP Python SDK; the server name is illustrative
from mcp.server.fastmcp import FastMCP

mcp = FastMCP("ticket-system")

# Ticket is assumed to be a data model defined elsewhere in the project


@mcp.tool()
async def search_tickets(query: str, status: Optional[str] = None) -> List[Ticket]:
    """Search Jira tickets by query string and optional status filter."""
    pass


@mcp.tool()
async def create_ticket(summary: str, description: str, priority: str) -> Ticket:
    """Create a new ticket in the project management system."""
    pass

# Skill (sprint_planning.skill.md):
# Skill instructs the agent on how to compose these MCP tools
# into a sprint planning workflow: gather backlog → estimate →
# assign → create sprint → notify team
The companion paper “Agent Skills for Large Language Models” (arXiv, February 2026) defines the relationship: skills supply the “what to do” and MCP supplies the “how to connect.”
Core Components of a Skill
Skill Manifest
The skill manifest defines the skill’s interface and capabilities. It typically includes:
{
"skill_id": "financial_analysis",
"version": "1.2.0",
"name": "Financial Analysis",
"description": "Comprehensive financial analysis capabilities including ratio analysis, trend analysis, and forecasting",
"capabilities": [
"ratio_calculation",
"trend_analysis",
"forecasting",
"benchmarking"
],
"required_tools": [
"calculator",
"data_query",
"visualization_generator"
],
"parameters": {
"required": ["financial_data"],
"optional": ["benchmark_data", "forecast_period"]
},
"constraints": {
"max_execution_time": 300,
"requires_approval_for": ["large_transactions"]
},
"dependencies": ["data_processing"]
}
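A registry would normally check a manifest before accepting it. The following is a minimal sketch of such a check; the choice of required keys and the strict `MAJOR.MINOR.PATCH` version rule are assumptions made for this illustration, not part of any published manifest specification.

```python
import re
from typing import Any, Dict, List

# Assumed minimum set of keys; adjust to your own manifest schema.
REQUIRED_KEYS = ("skill_id", "version", "name", "description", "capabilities")
SEMVER = re.compile(r"^\d+\.\d+\.\d+$")


def manifest_problems(manifest: Dict[str, Any]) -> List[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in manifest]
    version = manifest.get("version")
    if version is not None and not SEMVER.match(str(version)):
        problems.append(f"version is not MAJOR.MINOR.PATCH: {version}")
    if "capabilities" in manifest and not manifest["capabilities"]:
        problems.append("capabilities must be a non-empty list")
    return problems


good = {
    "skill_id": "financial_analysis",
    "version": "1.2.0",
    "name": "Financial Analysis",
    "description": "Ratio analysis, trend analysis, and forecasting",
    "capabilities": ["ratio_calculation", "trend_analysis"],
}
print(manifest_problems(good))
print(manifest_problems({"skill_id": "broken", "version": "1.2", "capabilities": []}))
```

Returning a list of problems rather than raising on the first one lets a skill author fix everything in a single pass.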
Skill Logic Layer
The skill logic layer contains the decision-making and execution flow:
class FinancialAnalysisSkill:
    """Financial analysis skill with built-in domain logic."""

    def __init__(self, tools: ToolRegistry):
        self.tools = tools
        self.analysis_pipeline = self._build_pipeline()

    def _build_pipeline(self):
        return [
            DataValidationStep(),
            RatioCalculationStep(),
            TrendAnalysisStep(),
            BenchmarkingStep(),
            ReportGenerationStep()
        ]

    async def execute(self, context: ExecutionContext) -> SkillResult:
        """Execute the complete financial analysis workflow."""
        # Validate input data
        validated_data = await self._validate(context.input_data)
        # Run analysis pipeline
        results = {}
        for step in self.analysis_pipeline:
            step_result = await step.execute(
                validated_data,
                context,
                results  # Pass previous results
            )
            results[step.name] = step_result
            # Check for critical issues
            if step_result.has_critical_issues:
                return SkillResult(
                    status="failed",
                    error=f"Critical issue in {step.name}: {step_result.issue}"
                )
        # Generate final report
        report = await self._generate_report(results, context)
        return SkillResult(
            status="success",
            data=results,
            report=report,
            visualizations=results.get("visualizations", [])
        )
Prompt Templates
Skills include specialized prompts that guide the LLM when using the skill:
skill_prompts:
  intent_detection:
    template: |
      You are analyzing a user request for financial analysis.
      Based on the request, determine:
      1. The type of analysis needed (ratio, trend, forecasting, benchmarking)
      2. The data required
      3. Any constraints or preferences

      Request: {user_request}
      Available data: {available_data}

      Output a structured analysis plan.
  result_interpretation:
    template: |
      You are interpreting financial analysis results for a non-expert.
      Analysis performed: {analysis_type}
      Results: {results}

      Explain the findings in clear language:
      - What do the numbers mean?
      - What actions are recommended?
      - What are the key risks?

      Use analogies where helpful.
  error_handling:
    template: |
      A financial analysis encountered an issue:
      - Error: {error}
      - Partial results: {partial_results}

      Determine:
      1. Can we proceed with partial results?
      2. What alternative approaches exist?
      3. What information would resolve the issue?
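Templates like these need a rendering step that fails loudly when a placeholder has no value. Below is a minimal sketch using Python's built-in `str.format` placeholders; `template_fields` and `render_prompt` are hypothetical helper names, and the sketch assumes templates contain no literal braces (a template that embeds JSON would need them escaped as `{{` and `}}`).

```python
import string
from typing import Dict, Set


def template_fields(template: str) -> Set[str]:
    """Return the placeholder names used in a str.format-style template."""
    return {name for _, name, _, _ in string.Formatter().parse(template) if name}


def render_prompt(template: str, values: Dict[str, str]) -> str:
    """Fill a template, refusing to render if any placeholder is unfilled."""
    missing = template_fields(template) - set(values)
    if missing:
        raise KeyError(f"missing template values: {sorted(missing)}")
    return template.format(**values)


intent_template = "Request: {user_request}\nAvailable data: {available_data}\n"
prompt = render_prompt(
    intent_template,
    {"user_request": "Compare Q3 margins to Q2", "available_data": "quarterly_pnl.csv"},
)
print(prompt)
```

Checking for missing fields before formatting produces one clear error listing every gap, instead of a bare `KeyError` for whichever placeholder happens to come first.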
Configuration and Parameters
Skills define configurable parameters that customize behavior:
from pydantic import BaseModel, Field
from typing import Optional, List


class SkillConfiguration(BaseModel):
    """Base configuration for skill behavior."""
    # Execution settings
    timeout_seconds: int = Field(default=300, ge=1, le=3600)
    max_retries: int = Field(default=3, ge=0, le=10)
    # Output settings
    include_raw_data: bool = True
    generate_visualizations: bool = True
    visualization_format: str = "png"  # png, svg, html
    # Safety settings
    require_human_approval: bool = False
    approval_threshold: float = 10000.0
    blocked_operations: List[str] = Field(default_factory=list)
    # Data handling
    cache_results: bool = True
    cache_ttl_seconds: int = 3600
    max_data_size_mb: int = 100


class FinancialAnalysisConfig(SkillConfiguration):
    """Configuration specific to financial analysis skill."""
    # Analysis-specific settings
    default_benchmark_source: str = "industry_avg"
    currency: str = "USD"
    # Ratio calculations
    include_liquidity_ratios: bool = True
    include_profitability_ratios: bool = True
    include_leverage_ratios: bool = True
    include_efficiency_ratios: bool = True
    # Forecasting
    default_forecast_periods: int = 4
    forecasting_model: str = "linear"  # linear, exponential, arima
    # Reporting
    report_format: str = "detailed"  # summary, detailed, executive
    include_recommendations: bool = True
    risk_threshold: float = 0.3
Building Skills: Implementation Patterns
Basic Skill Implementation
Let’s implement a practical skill for document generation:
from abc import ABC, abstractmethod
from typing import Any, Dict, List, Optional
from dataclasses import dataclass, field
from enum import Enum
import asyncio


class SkillStatus(Enum):
    """Skill execution status."""
    PENDING = "pending"
    RUNNING = "running"
    SUCCESS = "success"
    FAILED = "failed"
    CANCELLED = "cancelled"


@dataclass
class SkillInput:
    """Input to a skill execution."""
    parameters: Dict[str, Any]
    context: Dict[str, Any] = field(default_factory=dict)
    attachments: List[bytes] = field(default_factory=list)


@dataclass
class SkillOutput:
    """Output from a skill execution."""
    status: SkillStatus
    result: Optional[Any] = None
    error: Optional[str] = None
    metadata: Dict[str, Any] = field(default_factory=dict)
    artifacts: Dict[str, Any] = field(default_factory=dict)


class BaseSkill(ABC):
    """Base class for all skills."""

    def __init__(self, skill_id: str, version: str):
        self.skill_id = skill_id
        self.version = version
        self.tools = {}
        self.configuration = {}

    @abstractmethod
    async def initialize(self, config: Dict[str, Any]) -> None:
        """Initialize the skill with configuration."""
        pass

    @abstractmethod
    async def execute(self, input_data: SkillInput) -> SkillOutput:
        """Execute the skill's primary function."""
        pass

    @abstractmethod
    async def validate_input(self, input_data: SkillInput) -> bool:
        """Validate input data before execution."""
        pass

    async def cleanup(self) -> None:
        """Cleanup resources after execution."""
        pass
class DocumentGenerationSkill(BaseSkill):
    """Skill for generating various document types."""

    def __init__(self):
        super().__init__("document_generation", "1.0.0")
        self.supported_formats = ["pdf", "docx", "html", "markdown"]
        # Defaults so validate_input() works even before initialize() runs
        self.templates = {}
        self.default_format = "markdown"
        self.branding = {}

    async def initialize(self, config: Dict[str, Any]) -> None:
        """Initialize with templates and formatting settings."""
        self.templates = config.get("templates", {})
        self.default_format = config.get("default_format", "markdown")
        self.branding = config.get("branding", {})
        # Initialize required tools
        self.tools = {
            "formatter": await self._get_tool("document_formatter"),
            "template_engine": await self._get_tool("template_engine"),
            "image_processor": await self._get_tool("image_processor")
        }

    async def validate_input(self, input_data: SkillInput) -> bool:
        """Validate document generation request."""
        required_fields = ["content", "template"]
        # Named field_name to avoid shadowing dataclasses.field
        for field_name in required_fields:
            if field_name not in input_data.parameters:
                return False
        # Validate format
        output_format = input_data.parameters.get("format", self.default_format)
        if output_format not in self.supported_formats:
            return False
        return True

    async def execute(self, input_data: SkillInput) -> SkillOutput:
        """Execute document generation."""
        try:
            # Parse parameters
            content = input_data.parameters["content"]
            template_name = input_data.parameters["template"]
            output_format = input_data.parameters.get("format", self.default_format)
            # Load template
            template = self.templates.get(template_name)
            if not template:
                return SkillOutput(
                    status=SkillStatus.FAILED,
                    error=f"Template '{template_name}' not found"
                )
            # Apply template
            rendered = await self._render_template(
                template,
                content,
                input_data.context
            )
            # Format output
            formatted = await self._format_document(
                rendered,
                output_format
            )
            return SkillOutput(
                status=SkillStatus.SUCCESS,
                result=formatted,
                metadata={
                    "format": output_format,
                    "template": template_name,
                    "pages": self._estimate_pages(formatted)
                }
            )
        except Exception as e:
            return SkillOutput(
                status=SkillStatus.FAILED,
                error=str(e)
            )

    async def _render_template(self, template: str, content: Dict, context: Dict) -> str:
        """Render template with content."""
        # Template rendering logic
        pass

    async def _format_document(self, content: str, output_format: str) -> bytes:
        """Format document to target format."""
        pass

    async def _get_tool(self, tool_name: str):
        """Retrieve tool from registry."""
        pass

    def _estimate_pages(self, content: bytes) -> int:
        """Estimate page count from the formatted document's size in bytes."""
        return max(1, len(content) // 3000)
Skill Composition
Skills can be composed to handle complex workflows:
class SkillComposer:
    """Composes multiple skills into cohesive workflows."""

    def __init__(self, skill_registry: "SkillRegistry"):
        self.skill_registry = skill_registry
        self.workflows = {}

    def create_workflow(
        self,
        workflow_id: str,
        steps: List[Dict[str, Any]]
    ) -> "Workflow":
        """Create a composed workflow from multiple skills."""
        workflow_steps = []
        for step_config in steps:
            skill = self.skill_registry.get(step_config["skill_id"])
            step = WorkflowStep(
                skill=skill,
                input_mapping=step_config.get("input_mapping", {}),
                output_mapping=step_config.get("output_mapping", {}),
                condition=step_config.get("condition"),
                on_error=step_config.get("on_error", "stop")
            )
            workflow_steps.append(step)
        workflow = Workflow(
            workflow_id=workflow_id,
            steps=workflow_steps
        )
        self.workflows[workflow_id] = workflow
        return workflow

    async def execute_workflow(
        self,
        workflow_id: str,
        initial_input: SkillInput
    ) -> List[SkillOutput]:
        """Execute a composed workflow."""
        workflow = self.workflows.get(workflow_id)
        if not workflow:
            raise ValueError(f"Workflow '{workflow_id}' not found")
        return await workflow.execute(initial_input)
class Workflow:
    """Represents a multi-skill workflow."""

    def __init__(self, workflow_id: str, steps: List["WorkflowStep"]):
        self.workflow_id = workflow_id
        self.steps = steps

    async def execute(self, initial_input: SkillInput) -> List[SkillOutput]:
        """Execute all steps in sequence."""
        outputs = []
        current_input = initial_input
        for step in self.steps:
            # Map input from previous step if needed
            if outputs:
                current_input = step.map_input(outputs[-1])
            # Check condition; a skipped step produces no output
            if step.condition and not step.condition.evaluate(current_input):
                continue
            # Execute step, turning exceptions into failed outputs so the
            # caller can always see why a workflow stopped
            try:
                output = await step.skill.execute(current_input)
            except Exception as e:
                output = SkillOutput(
                    status=SkillStatus.FAILED,
                    error=str(e)
                )
            outputs.append(output)
            # Handle errors: "stop" aborts, "continue" moves to the next step
            if output.status == SkillStatus.FAILED and step.on_error == "stop":
                break
        return outputs
# Example: Creating a report generation workflow
composer = SkillComposer(skill_registry)
report_workflow = composer.create_workflow(
    "business_report_generation",
    steps=[
        {
            "skill_id": "data_collection",
            "input_mapping": {"query": "input.query"},
            "output_mapping": {"data": "step.output"}
        },
        {
            "skill_id": "data_analysis",
            "input_mapping": {"data": "previous.data"},
            "output_mapping": {"analysis": "step.output"}
        },
        {
            "skill_id": "document_generation",
            "input_mapping": {
                "content.analysis": "previous.analysis",
                "content.summary": "previous.summary"
            },
            "output_mapping": {"document": "step.output"}
        },
        {
            "skill_id": "document_formatter",
            "input_mapping": {"document": "previous.document"}
        }
    ]
)
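The `input_mapping` entries above use dotted paths on both sides, for example `"content.analysis": "previous.analysis"`. One way such a mapping could be resolved is sketched below. The functions `get_path`, `set_path`, and `resolve_mapping` are hypothetical helpers, and the idea of a `scopes` dictionary holding `input` and `previous` is an assumption about how a `WorkflowStep` might expose earlier results.

```python
from typing import Any, Dict


def get_path(scopes: Dict[str, Any], path: str) -> Any:
    """Follow a dotted path such as 'previous.data' through nested dicts."""
    value: Any = scopes
    for part in path.split("."):
        value = value[part]
    return value


def set_path(target: Dict[str, Any], path: str, value: Any) -> None:
    """Assign into nested dicts, creating intermediate levels as needed."""
    *parents, leaf = path.split(".")
    for part in parents:
        target = target.setdefault(part, {})
    target[leaf] = value


def resolve_mapping(mapping: Dict[str, str], scopes: Dict[str, Any]) -> Dict[str, Any]:
    """Build a step's parameters from a {target_path: source_path} mapping."""
    resolved: Dict[str, Any] = {}
    for target_path, source_path in mapping.items():
        set_path(resolved, target_path, get_path(scopes, source_path))
    return resolved


scopes = {
    "input": {"query": "Q3 revenue"},
    "previous": {"analysis": "up 4%", "summary": "steady growth"},
}
params = resolve_mapping(
    {"content.analysis": "previous.analysis", "content.summary": "previous.summary"},
    scopes,
)
print(params)
```

A missing source path raises `KeyError` here, which surfaces wiring mistakes at the step where they occur rather than as a confusing failure inside the next skill.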
Skill Registry and Discovery
A registry manages skill lifecycle and discovery:
class SkillRegistry:
    """Central registry for managing skills."""

    def __init__(self):
        self._skills: Dict[str, BaseSkill] = {}
        self._metadata: Dict[str, Dict] = {}
        self._versions: Dict[str, List[str]] = {}

    async def register(
        self,
        skill: BaseSkill,
        metadata: Dict[str, Any]
    ) -> None:
        """Register a new skill."""
        skill_id = skill.skill_id
        if skill_id in self._skills:
            # Handle version conflict
            existing_version = self._metadata[skill_id].get("version", "1.0.0")
            new_version = metadata.get("version", "1.0.0")
            if self._compare_versions(new_version, existing_version) <= 0:
                raise ValueError(
                    f"Version {new_version} is not newer than existing {existing_version}"
                )
        # Initialize skill
        await skill.initialize(metadata.get("config", {}))
        # Store skill
        self._skills[skill_id] = skill
        self._metadata[skill_id] = metadata
        # Track versions
        if skill_id not in self._versions:
            self._versions[skill_id] = []
        self._versions[skill_id].append(metadata.get("version", "1.0.0"))

    def get(self, skill_id: str, version: Optional[str] = None) -> BaseSkill:
        """Retrieve a skill by ID and optional version."""
        if skill_id not in self._skills:
            raise KeyError(f"Skill '{skill_id}' not found")
        if version:
            # Only the latest instance is stored; serving older versions
            # would require keeping versioned instances. Fail loudly rather
            # than silently return a different version.
            current = self._metadata[skill_id].get("version", "1.0.0")
            if version != current:
                raise KeyError(
                    f"Skill '{skill_id}' version {version} is not loaded (current: {current})"
                )
        return self._skills[skill_id]

    def discover(
        self,
        capability: Optional[str] = None,
        tags: Optional[List[str]] = None
    ) -> List[Dict[str, Any]]:
        """Discover skills by capability or tags."""
        results = []
        for skill_id, metadata in self._metadata.items():
            # Filter by capability
            if capability and capability not in metadata.get("capabilities", []):
                continue
            # Filter by tags
            if tags:
                skill_tags = set(metadata.get("tags", []))
                if not skill_tags.intersection(tags):
                    continue
            results.append({
                "skill_id": skill_id,
                **metadata
            })
        return results

    def _compare_versions(self, v1: str, v2: str) -> int:
        """Compare semantic versions. Returns -1, 0, or 1."""
        # Simplified comparison: numeric components only, no pre-release tags
        parts1 = [int(x) for x in v1.split(".")]
        parts2 = [int(x) for x in v2.split(".")]
        # Pad the shorter version with zeros so "1.2" compares equal to "1.2.0"
        length = max(len(parts1), len(parts2))
        parts1 += [0] * (length - len(parts1))
        parts2 += [0] * (length - len(parts2))
        for p1, p2 in zip(parts1, parts2):
            if p1 < p2:
                return -1
            elif p1 > p2:
                return 1
        return 0
Advanced Patterns
Dynamic Skill Loading
Skills can be loaded dynamically to optimize resource usage:
class DynamicSkillLoader:
    """Handles on-demand skill loading."""

    def __init__(self, skill_registry: SkillRegistry):
        self.registry = skill_registry
        self.loaded_skills: Dict[str, BaseSkill] = {}
        self.loading_tasks: Dict[str, asyncio.Task] = {}
        self.priorities: Dict[str, int] = {}  # cache_key -> priority, for eviction

    async def load_skill(
        self,
        skill_id: str,
        version: Optional[str] = None,
        priority: int = 0
    ) -> BaseSkill:
        """Load a skill with priority-based resource management."""
        # Check if already loaded
        cache_key = f"{skill_id}:{version or 'latest'}"
        if cache_key in self.loaded_skills:
            return self.loaded_skills[cache_key]
        # Check if loading in progress; share the existing task
        if cache_key in self.loading_tasks:
            return await self.loading_tasks[cache_key]
        # Start loading
        load_task = asyncio.create_task(
            self._load_skill_async(skill_id, version)
        )
        self.loading_tasks[cache_key] = load_task
        try:
            skill = await load_task
            self.loaded_skills[cache_key] = skill
            self.priorities[cache_key] = priority
            return skill
        finally:
            del self.loading_tasks[cache_key]

    async def _load_skill_async(self, skill_id: str, version: Optional[str]) -> BaseSkill:
        """Async skill loading with initialization."""
        # Check resource limits
        await self._check_resource_limits()
        # Get skill from registry
        skill = self.registry.get(skill_id, version)
        # Initialize if needed. BaseSkill defines no `initialized` flag and
        # the registry already initializes skills on register(), so a
        # missing flag is treated as "already initialized".
        if not getattr(skill, "initialized", True):
            await skill.initialize({})
        return skill

    async def unload_skill(self, skill_id: str, version: Optional[str] = None) -> None:
        """Unload a skill to free resources."""
        cache_key = f"{skill_id}:{version or 'latest'}"
        if cache_key in self.loaded_skills:
            skill = self.loaded_skills[cache_key]
            await skill.cleanup()
            del self.loaded_skills[cache_key]
            self.priorities.pop(cache_key, None)

    async def _check_resource_limits(self) -> None:
        """Check and enforce resource limits."""
        max_loaded = 10  # Maximum concurrent skills
        max_memory_mb = 2048  # Maximum memory usage (not enforced in this sketch)
        if len(self.loaded_skills) >= max_loaded:
            # Evict lowest priority skill (helper assumed; not shown here)
            await self._evict_lowest_priority()
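The loader calls `_evict_lowest_priority` without showing it. One possible policy is sketched below as a standalone class so it can be run and tested in isolation. `PriorityEvictionCache` is a hypothetical name, and the tie-breaking rule (evict the oldest entry among equal priorities) is an assumption rather than something the loader above specifies.

```python
from typing import Dict, Optional


class PriorityEvictionCache:
    """Tracks loaded skills and picks an eviction victim when full."""

    def __init__(self, max_loaded: int = 10):
        self.max_loaded = max_loaded
        self._priorities: Dict[str, int] = {}  # insertion-ordered

    def add(self, cache_key: str, priority: int = 0) -> Optional[str]:
        """Record a loaded skill; return the key that was evicted, if any."""
        evicted = None
        if cache_key not in self._priorities and len(self._priorities) >= self.max_loaded:
            # min() returns the first minimal entry, so ties evict the
            # entry that was inserted earliest.
            evicted = min(self._priorities, key=self._priorities.get)
            del self._priorities[evicted]
        self._priorities[cache_key] = priority
        return evicted

    def keys(self):
        return list(self._priorities)


cache = PriorityEvictionCache(max_loaded=2)
cache.add("financial_analysis:latest", priority=5)
cache.add("document_generation:latest", priority=1)
victim = cache.add("data_collection:latest", priority=3)
print(victim)
```

In a real loader the caller would await the evicted skill's `cleanup()` before dropping it, so that the skill can release connections or temporary files.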
Skill Chaining and Routing
Advanced agents can chain skills based on context:
class SkillRouter:
    """Routes requests to appropriate skills."""

    def __init__(self, skill_registry: SkillRegistry):
        self.registry = skill_registry
        self.routing_rules: List["RoutingRule"] = []

    # RoutingRule is defined below, so the annotation is quoted
    def add_rule(self, rule: "RoutingRule") -> None:
        """Add a routing rule."""
        self.routing_rules.append(rule)

    async def route(self, request: "AgentRequest") -> List[BaseSkill]:
        """Route request to appropriate skills."""
        # Match routing rules
        matched_skills = []
        for rule in self.routing_rules:
            if rule.matches(request):
                skill = self.registry.get(rule.skill_id)
                matched_skills.append(skill)
        # If no explicit rules matched, use capability matching
        if not matched_skills:
            matched_skills = await self._capability_match(request)
        return matched_skills

    async def _capability_match(self, request: "AgentRequest") -> List[BaseSkill]:
        """Match skills based on required capabilities."""
        required = request.required_capabilities
        if not required:
            return []  # Nothing to match against
        # Narrow by the first capability, then check the rest
        discovered = self.registry.discover(capability=required[0])
        skills = []
        for meta in discovered:
            # Check if skill has all required capabilities
            if all(cap in meta.get("capabilities", []) for cap in required):
                skill = self.registry.get(meta["skill_id"])
                skills.append(skill)
        return skills


class RoutingRule:
    """Defines conditions for skill routing."""

    def __init__(
        self,
        skill_id: str,
        condition: "RoutingCondition"
    ):
        self.skill_id = skill_id
        self.condition = condition

    def matches(self, request: "AgentRequest") -> bool:
        return self.condition.evaluate(request)


class RoutingCondition:
    """Base class for routing conditions."""

    def evaluate(self, request: "AgentRequest") -> bool:
        raise NotImplementedError


class IntentRoutingCondition(RoutingCondition):
    """Route based on detected user intent."""

    def __init__(self, intents: List[str]):
        self.intents = intents

    def evaluate(self, request: "AgentRequest") -> bool:
        return request.detected_intent in self.intents


class ContextRoutingCondition(RoutingCondition):
    """Route based on request context."""

    def __init__(self, context_key: str, expected_value: Any):
        self.context_key = context_key
        self.expected_value = expected_value

    def evaluate(self, request: "AgentRequest") -> bool:
        return request.context.get(self.context_key) == self.expected_value
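A compact, self-contained version of the same idea shows how intent-based and context-based rules combine on a single request. The `AgentRequest` shape here is a guess at the minimum the conditions need (`detected_intent` and `context`), and `Rule` and `route` are simplified stand-ins for `RoutingRule` and `SkillRouter.route`, returning skill ids instead of skill instances.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List


@dataclass
class AgentRequest:
    """Hypothetical request shape: only the fields the conditions read."""
    detected_intent: str
    context: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Rule:
    skill_id: str
    condition: Callable[[AgentRequest], bool]


def route(rules: List[Rule], request: AgentRequest) -> List[str]:
    """Return the skill ids of every rule whose condition accepts the request."""
    return [rule.skill_id for rule in rules if rule.condition(request)]


rules = [
    # Intent-based rule
    Rule("financial_analysis", lambda r: r.detected_intent in {"analyze_financials", "forecast"}),
    # Context-based rule
    Rule("document_generation", lambda r: r.context.get("output") == "report"),
]
request = AgentRequest("forecast", {"output": "report"})
print(route(rules, request))
```

Because every matching rule contributes a skill, one request can fan out to several skills, which is what makes chaining possible downstream.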
A2A: Agent-to-Agent Skill Coordination
Google’s Agent2Agent (A2A) protocol, released in April 2025 and donated to the Linux Foundation in June 2025, provides the agent coordination layer. While MCP connects agents to tools and skills guide their use, A2A enables agents to discover and communicate with each other.
A2A defines how agents advertise their capabilities through a published Agent Card (HTTP-based discovery), negotiate tasks, and exchange results using JSON-RPC 2.0 over HTTP/SSE. For skill architecture, A2A enables:
- Skill delegation: An orchestrator agent can delegate sub-tasks to specialist agents that have their own skills
- Cross-platform skill invocation: Skills built for one agent framework (LangChain) can be triggered by agents on another (CrewAI, Semantic Kernel)
- Federated skill discovery: Agents can find other agents with complementary skills across organizational boundaries
Orchestrator ──A2A──→ Specialist Agent (Data Analysis Skill)
                        └── Uses MCP to query databases
                        └── Uses local skill for statistical modeling
                        └── Returns result via A2A
Orchestrator ──A2A──→ Specialist Agent (Report Generation Skill)
                        └── Uses MCP to access templates
                        └── Uses local skill for formatting
                        └── Returns final document via A2A
The key insight is that MCP and A2A are complementary, not competing. Production agent systems use both: MCP for tool access, A2A for agent coordination, and skills for domain-specific procedural knowledge.
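The delegation flow can be sketched in-process to show the control logic without any networking. Everything here is illustrative: the `AgentCard` class is a simplified stand-in and does not follow the A2A Agent Card schema, and `handler` replaces what would be a remote A2A call in a real deployment.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional


@dataclass
class AgentCard:
    """Simplified stand-in for the capability card an agent publishes."""
    name: str
    skills: List[str]
    handler: Callable[[str], str]  # stands in for a remote A2A call


def find_agent(cards: List[AgentCard], skill: str) -> Optional[AgentCard]:
    """Return the first agent advertising the requested skill, if any."""
    return next((card for card in cards if skill in card.skills), None)


def orchestrate(cards: List[AgentCard], plan: List[Dict[str, str]]) -> List[str]:
    """Delegate each step of a plan to an agent that advertises the skill."""
    results = []
    for step in plan:
        agent = find_agent(cards, step["skill"])
        if agent is None:
            raise LookupError(f"no agent advertises skill {step['skill']!r}")
        results.append(agent.handler(step["task"]))
    return results


cards = [
    AgentCard("analyst", ["data_analysis"], lambda task: f"analysis of {task}"),
    AgentCard("writer", ["report_generation"], lambda task: f"report on {task}"),
]
plan = [
    {"skill": "data_analysis", "task": "Q3 sales"},
    {"skill": "report_generation", "task": "Q3 sales"},
]
print(orchestrate(cards, plan))
```

The orchestrator only needs to know which skills each agent advertises; how a specialist uses MCP or local skills to fulfil the task stays hidden behind the call.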
Progressive Discovery and Context Optimization
The context window bottleneck described earlier is the primary driver of the skills abstraction. Skills solve this through progressive discovery — loading capability metadata before committing to full tool schemas.
class ProgressiveSkillLoader:
    """Loads skill capabilities progressively to minimize context usage."""

    # SkillManifest and _load_skill() are assumed to be defined elsewhere

    def __init__(self):
        self.skill_index: Dict[str, "SkillManifest"] = {}
        self.active_skills: Dict[str, "BaseSkill"] = {}

    async def register_skills(self, skills: List["BaseSkill"]) -> None:
        """Register skills with their manifests only (no tool schemas yet)."""
        for skill in skills:
            self.skill_index[skill.skill_id] = SkillManifest(
                skill_id=skill.skill_id,
                capabilities=skill.get_capabilities(),
                estimated_tools=skill.tool_count(),
                description=skill.get_description(),
                version=skill.version
            )

    async def resolve_for_intent(self, intent: str) -> List[str]:
        """Resolve which skills match an intent using lightweight manifests."""
        matched = []
        text = intent.lower()
        for skill_id, manifest in self.skill_index.items():
            # Capabilities are identifiers such as "trend_analysis"; read
            # underscores as spaces so they can match natural-language text
            if any(cap.lower().replace("_", " ") in text for cap in manifest.capabilities):
                matched.append(skill_id)
        return matched

    async def activate_skill(self, skill_id: str) -> "BaseSkill":
        """Fully load and activate a skill (pulls in tool schemas)."""
        if skill_id not in self.active_skills:
            skill = await self._load_skill(skill_id)
            self.active_skills[skill_id] = skill
        return self.active_skills[skill_id]
This two-phase approach — resolve intents against lightweight manifests, then load only the tools needed — is the core optimization that makes large skill ecosystems practical. Anthropic reported that Tool Search Tool (a mechanism for programmatic tool discovery) reduces token overhead by up to 85% in production deployments.
Skill Security and Governance
Enterprise deployments require security controls:
from typing import Dict, List, Optional

# AccessControlList, SkillInput, ValidationResult, and ValidationRule are
# assumed to be defined elsewhere in the codebase.


class SkillSecurityManager:
    """Manages security for skill execution."""

    def __init__(self):
        # SecurityPolicy is defined below, so it is referenced by name here.
        self.policies: Dict[str, "SecurityPolicy"] = {}
        self.access_control = AccessControlList()

    def register_policy(self, skill_id: str, policy: "SecurityPolicy") -> None:
        """Register security policy for a skill."""
        self.policies[skill_id] = policy

    async def validate_execution(
        self,
        skill_id: str,
        user: "User",
        input_data: SkillInput
    ) -> ValidationResult:
        """Validate skill execution request."""
        policy = self.policies.get(skill_id)
        if not policy:
            # Default deny if no policy
            return ValidationResult(allowed=False, reason="No policy defined")

        # Check permissions
        if not self.access_control.can_access(user, skill_id):
            return ValidationResult(
                allowed=False,
                reason="User not authorized for this skill"
            )

        # Check input validation
        if not policy.validate_input(input_data):
            return ValidationResult(
                allowed=False,
                reason="Input validation failed"
            )

        # Check data access
        required_resources = policy.get_required_resources(input_data)
        for resource in required_resources:
            if not self.access_control.can_access_data(user, resource):
                return ValidationResult(
                    allowed=False,
                    reason=f"Access denied to required resource: {resource}"
                )

        return ValidationResult(allowed=True)


class SecurityPolicy:
    """Defines security requirements for a skill."""

    def __init__(
        self,
        skill_id: str,
        required_permissions: Optional[List[str]] = None,
        data_classification: str = "internal",
        requires_audit: bool = True,
        input_validation_rules: Optional[List[ValidationRule]] = None
    ):
        self.skill_id = skill_id
        self.required_permissions = required_permissions or []
        self.data_classification = data_classification
        self.requires_audit = requires_audit
        self.input_validation_rules = input_validation_rules or []

    def validate_input(self, input_data: SkillInput) -> bool:
        """Validate input data against rules."""
        for rule in self.input_validation_rules:
            if not rule.validate(input_data):
                return False
        return True

    def get_required_resources(self, input_data: SkillInput) -> List[str]:
        """Get list of data resources required by this skill."""
        # Implementation would extract resource references from input
        return []
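The check order in `validate_execution` (missing policy, then authorization, then input rules) can be tried in isolation with a reduced sketch. Everything here is a hypothetical stand-in rather than part of the classes above: `MaxLengthRule`, `SimplePolicy`, and `check` are illustrative names, and a plain role string replaces the `User` and `AccessControlList` types.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Set, Tuple


@dataclass
class MaxLengthRule:
    """Hypothetical rule: a string parameter must not exceed max_len characters."""
    param: str
    max_len: int

    def validate(self, parameters: Dict[str, str]) -> bool:
        value = parameters.get(self.param)
        return isinstance(value, str) and len(value) <= self.max_len


@dataclass
class SimplePolicy:
    """Hypothetical stand-in for SecurityPolicy: allowed roles plus input rules."""
    allowed_roles: Set[str]
    rules: List[MaxLengthRule] = field(default_factory=list)


def check(
    policies: Dict[str, SimplePolicy],
    skill_id: str,
    role: str,
    parameters: Dict[str, str],
) -> Tuple[bool, str]:
    """Return (allowed, reason), applying the checks in deny-first order."""
    policy: Optional[SimplePolicy] = policies.get(skill_id)
    if policy is None:
        return False, "No policy defined"  # default deny
    if role not in policy.allowed_roles:
        return False, "User not authorized for this skill"
    if not all(rule.validate(parameters) for rule in policy.rules):
        return False, "Input validation failed"
    return True, "ok"


policies = {
    "web-research": SimplePolicy(
        allowed_roles={"analyst"},
        rules=[MaxLengthRule(param="query", max_len=200)],
    )
}

print(check(policies, "web-research", "analyst", {"query": "agent skills"}))
print(check(policies, "web-research", "guest", {"query": "agent skills"}))
print(check(policies, "unknown-skill", "analyst", {"query": "agent skills"}))
```

The third call shows why default deny matters: a skill that was never registered is rejected before any other check runs, so forgetting to write a policy fails closed rather than open.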
Skill Development Best Practices
Designing Effective Skills
When designing skills, follow these principles:
Single Responsibility: Each skill should handle one domain or capability. A “financial analysis” skill focuses on finance, while a “document generation” skill handles documents. Mixing responsibilities leads to complexity.
Clear Interfaces: Define clear input/output contracts. Users of a skill should know exactly what data to provide and what to expect in return.
Comprehensive Error Handling: Skills should handle errors gracefully and provide meaningful error messages. Include recovery strategies for common failure modes.
Versioning: Always version skills and maintain backward compatibility when possible. Use semantic versioning to communicate changes.
Documentation: Document capabilities, limitations, requirements, and usage examples. This helps both human developers and AI agents understand when and how to use the skill.
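Two of these principles, clear interfaces and versioning, translate directly into code. The sketch below is illustrative only: `SummarizeInput`, `SummarizeOutput`, and `is_compatible` are hypothetical names, and the compatibility check is a simplified reading of semantic versioning (same major version, installed version at least the required one) that ignores pre-release tags and the special rules for `0.x` releases.

```python
from dataclasses import dataclass
from typing import Tuple


def parse_version(version: str) -> Tuple[int, int, int]:
    """Parse a 'MAJOR.MINOR.PATCH' string into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_compatible(installed: str, required: str) -> bool:
    """Simplified semver check: same major version and installed >= required."""
    have, need = parse_version(installed), parse_version(required)
    return have[0] == need[0] and have >= need


@dataclass(frozen=True)
class SummarizeInput:
    """Hypothetical input contract for a summarization skill."""
    text: str
    max_sentences: int = 3

    def __post_init__(self) -> None:
        # Reject bad input at the boundary so callers get an immediate error.
        if not self.text.strip():
            raise ValueError("text must not be empty")
        if self.max_sentences < 1:
            raise ValueError("max_sentences must be at least 1")


@dataclass(frozen=True)
class SummarizeOutput:
    """Hypothetical output contract: the summary plus the skill version used."""
    summary: str
    skill_version: str


print(is_compatible("1.2.0", "1.1.0"))  # same major version, newer minor
print(is_compatible("2.0.0", "1.1.0"))  # a major bump may break callers
```

Declaring the contract as frozen dataclasses keeps validation in one place and makes the expected fields visible to both developers and any agent that inspects the skill's interface.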
Testing Skills
Rigorous testing ensures skill reliability:
# BaseSkill, SkillInput, TestResult, and TestResults are assumed to be
# defined elsewhere in the codebase.


class SkillTestSuite:
    """Comprehensive testing for skills."""

    def __init__(self, skill: BaseSkill):
        self.skill = skill

    async def run_tests(self) -> TestResults:
        """Run complete test suite."""
        results = TestResults()

        # Unit tests
        results.add(await self._test_input_validation())
        results.add(await self._test_output_format())
        results.add(await self._test_error_handling())

        # Integration tests
        results.add(await self._test_tool_integration())
        results.add(await self._test_workflow_integration())

        # Performance tests
        results.add(await self._test_performance())

        return results

    async def _test_input_validation(self) -> TestResult:
        """Test input validation logic."""
        test_cases = [
            # Valid inputs
            (SkillInput(parameters={"valid": "data"}), True),
            # Invalid inputs
            (SkillInput(parameters={}), False),
            (SkillInput(parameters={"missing": "required"}), False),
        ]

        passed = 0
        failed = 0
        for input_data, should_pass in test_cases:
            try:
                result = await self.skill.validate_input(input_data)
                if result == should_pass:
                    passed += 1
                else:
                    failed += 1
            except Exception:
                # An unexpected exception counts as a failed case.
                failed += 1

        return TestResult(
            name="input_validation",
            passed=passed,
            failed=failed
        )

    # The remaining _test_* methods called in run_tests are not shown here.
Monitoring and Observability
Production skills require monitoring:
from typing import Any, Dict, Optional

# SkillStatus is assumed to be an enum defined elsewhere in the codebase.


class SkillMetrics:
    """Metrics collection for skills."""

    def __init__(self, skill_id: str):
        self.skill_id = skill_id
        self.execution_count = 0
        self.success_count = 0
        self.failure_count = 0
        self.total_duration = 0.0
        self.error_types: Dict[str, int] = {}

    def record_execution(
        self,
        status: SkillStatus,
        duration: float,
        error: Optional[str] = None
    ) -> None:
        """Record execution metrics."""
        self.execution_count += 1
        self.total_duration += duration

        if status == SkillStatus.SUCCESS:
            self.success_count += 1
        else:
            self.failure_count += 1
            if error:
                self.error_types[error] = self.error_types.get(error, 0) + 1

    def get_metrics(self) -> Dict[str, Any]:
        """Get current metrics."""
        # Guard against division by zero before any execution is recorded.
        success_rate = (
            self.success_count / self.execution_count
            if self.execution_count > 0 else 0
        )
        avg_duration = (
            self.total_duration / self.execution_count
            if self.execution_count > 0 else 0
        )

        return {
            "skill_id": self.skill_id,
            "execution_count": self.execution_count,
            "success_rate": success_rate,
            "avg_duration_seconds": avg_duration,
            "error_distribution": self.error_types
        }
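A collector like this still needs something to feed it durations and errors. One way to do that, sketched here under assumptions, is a small wrapper that times each call and records the exception type on failure. `MiniMetrics` is a hypothetical, reduced stand-in for `SkillMetrics`, and `timed_call` is an illustrative helper, not part of the class above.

```python
import time
from collections import Counter
from typing import Any, Callable, List, Optional


class MiniMetrics:
    """Hypothetical, reduced stand-in for SkillMetrics."""

    def __init__(self) -> None:
        self.durations: List[float] = []
        self.errors: Counter = Counter()

    def record(self, duration: float, error: Optional[str] = None) -> None:
        self.durations.append(duration)
        if error:
            self.errors[error] += 1


def timed_call(metrics: MiniMetrics, func: Callable[..., Any], *args: Any) -> Any:
    """Run func, record its wall-clock duration and error type, then re-raise."""
    start = time.perf_counter()
    try:
        result = func(*args)
    except Exception as exc:
        # Key errors by exception class, not message, to keep the set small.
        metrics.record(time.perf_counter() - start, error=type(exc).__name__)
        raise
    metrics.record(time.perf_counter() - start)
    return result


metrics = MiniMetrics()
timed_call(metrics, int, "42")
try:
    timed_call(metrics, int, "not a number")
except ValueError:
    pass
print(len(metrics.durations), dict(metrics.errors))
```

Recording the exception class name rather than the full message keeps the error distribution bounded; free-text messages that embed IDs or timestamps would otherwise create a new key for nearly every failure.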
Skill Standardization and the Ecosystem
In December 2025, Anthropic’s Agent Skills specification was opened and adopted as an industry standard under the Linux Foundation’s Agentic AI Foundation. This specification defines the SKILL.md format that has become the de facto packaging standard for portable skills.
The SKILL.md Format
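A portable skill is a directory containing a SKILL.md file: a YAML frontmatter block with metadata, followed by Markdown instructions that describe the skill's capabilities, tools, prompts, and configuration. An illustrative example: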
---
skill_id: web-research
name: Web Research Skill
version: 1.2.0
capabilities:
- web_search
- content_extraction
- source_verification
- summarization
requires:
- mcp-server-web-search
- mcp-server-content-extraction
install:
- npx skills add org/web-research
---
# Web Research Skill
## Description
Conducts multi-source web research with source verification
and structured output generation.
## Capabilities
- **web_search**: Queries multiple search engines via MCP
- **content_extraction**: Fetches and parses page content
- **source_verification**: Cross-references claims across sources
- **summarization**: Generates structured research summaries
## Usage
This skill is invoked when the user asks to research a topic.
The agent should:
1. Decompose the research question into sub-queries
2. Execute searches via the web-search MCP server
3. Extract content from top results
4. Cross-reference findings for consistency
5. Generate a structured summary with citations
## MCP Dependencies
- `mcp-server-web-search`: Provides web search tool access
- `mcp-server-content-extraction`: Provides page fetching
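Because the metadata sits in a delimited frontmatter block, a loader can read it without processing the full instruction body. The following is a minimal sketch that uses only the standard library and handles just the subset of YAML shown in the example (`key: value` pairs and `- item` lists); `split_skill_md` is a hypothetical helper, and a real loader would more likely use a full YAML parser.

```python
from typing import Dict, List, Tuple, Union

FrontmatterValue = Union[str, List[str]]


def split_skill_md(text: str) -> Tuple[Dict[str, FrontmatterValue], str]:
    """Split SKILL.md text into (frontmatter dict, Markdown body).

    Handles only 'key: value' pairs and '- item' lists under a bare
    'key:' line. This is not a full YAML parser.
    """
    lines = text.strip().splitlines()
    if not lines or lines[0].strip() != "---":
        raise ValueError("SKILL.md must start with a '---' frontmatter line")
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        raise ValueError("frontmatter is missing its closing '---'") from None

    meta: Dict[str, FrontmatterValue] = {}
    current_key = ""
    for line in lines[1:end]:
        stripped = line.strip()
        if not stripped:
            continue
        if stripped.startswith("- "):
            # List item: belongs to the most recent bare 'key:' line.
            items = meta.get(current_key)
            if not isinstance(items, list):
                raise ValueError(f"list item without a list key: {stripped!r}")
            items.append(stripped[2:].strip())
        else:
            key, _, value = stripped.partition(":")
            current_key = key.strip()
            meta[current_key] = value.strip() if value.strip() else []
    body = "\n".join(lines[end + 1:]).strip()
    return meta, body


sample = """---
skill_id: web-research
version: 1.2.0
capabilities:
- web_search
- summarization
---
# Web Research Skill
"""
meta, body = split_skill_md(sample)
print(meta["skill_id"], meta["capabilities"])
```

Keeping the frontmatter small is what makes manifest-first discovery cheap: an agent can index many skills by metadata alone and read the Markdown body only for the skill it decides to use.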
Skills packaged this way can be installed with a single command across all major agent platforms:
# Install a skill (works on Claude Code, Cursor, Copilot, Codex, Gemini CLI)
npx skills add anthropic/web-research
npx skills add anthropic/skill-creator
npx skills add github/agentic-eval
Emerging Skill Ecosystem
By early 2026, a vibrant skill ecosystem has emerged with hundreds of reusable skills:
| Skill | Provider | Purpose |
|---|---|---|
| skill-creator | Anthropic | Iterative skill building and evaluation with variance analysis |
| mcp-builder | Anthropic | Full MCP server dev cycle (Python and TypeScript) |
| agentic-eval | GitHub | LLM-as-judge evaluation pipelines, evaluator-optimizer patterns |
| web-research | Community | Multi-source web research with verification |
| accessibility-checker | Community | WCAG compliance auditing for web applications |
| openai-docs | OpenAI | Live OpenAI documentation via MCP (no stale training data) |
Advanced Tool Use
Anthropic’s November 2025 release of Advanced Tool Use features introduced three mechanisms that deepen the skill-tool integration:
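- Tool Search Tool: On-demand discovery of relevant tools from large registries instead of loading every definition up front. Anthropic reports token overhead reductions of up to 85% and an accuracy gain from 79.5% to 88.1% on its MCP evaluations with Claude Opus 4.5
- Programmatic Tool Calling: The model writes code that orchestrates tool calls in a code execution environment rather than issuing one structured JSON call at a time, which keeps intermediate results out of the context window
- Tool Use Examples: Sample invocations attached to a tool definition that show the model how the tool is used in practice, beyond what the JSON schema alone expresses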
These features address a practical bottleneck: as skill libraries grow, the agent needs efficient mechanisms to discover and invoke the right tools within a skill’s workflow.
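The discovery half of that bottleneck can be illustrated with a toy registry search. This is not Anthropic's Tool Search Tool or its API; it is a hypothetical keyword-overlap ranker (`ToolEntry`, `tokenize`, and `search_tools` are invented names) that shows the general idea of returning a few relevant tool names instead of every definition.

```python
import re
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class ToolEntry:
    """Hypothetical registry entry: only a name and a one-line description."""
    name: str
    description: str


def tokenize(text: str) -> Set[str]:
    """Lowercase a string and split it into alphanumeric word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search_tools(registry: List[ToolEntry], query: str, limit: int = 3) -> List[str]:
    """Return up to `limit` tool names ranked by word overlap with the query."""
    query_tokens = tokenize(query)
    scored = []
    for tool in registry:
        overlap = len(query_tokens & tokenize(f"{tool.name} {tool.description}"))
        if overlap:
            scored.append((overlap, tool.name))
    # Highest overlap first; ties broken alphabetically for stable output.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [name for _, name in scored[:limit]]


registry = [
    ToolEntry("web_search", "Search the web for pages matching a query"),
    ToolEntry("fetch_page", "Fetch and parse the content of a web page"),
    ToolEntry("send_invoice", "Create and email an invoice to a customer"),
]
print(search_tools(registry, "fetch the content of this page"))
```

Only the names of matching tools come back, so full schemas for unrelated tools such as `send_invoice` never need to enter the context. A production system would likely replace word overlap with embedding or BM25-style retrieval.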
Conclusion
AI agent skills represent a crucial architectural pattern for building scalable, maintainable agent systems. By encapsulating domain expertise into reusable, composable packages, organizations can rapidly deploy specialized capabilities without reinventing the wheel for each use case.
The skills pattern enables teams to specialize in their domains—financial analysts create financial skills, legal experts build legal skills, and these skills can be composed by AI agents to handle complex, multi-domain tasks. This separation of concerns accelerates development and improves reliability.
As AI agents become more prevalent in enterprise settings, the skills architecture will evolve to support more sophisticated scenarios such as cross-skill coordination, dynamic skill discovery, and federated skill sharing. Organizations investing in skills infrastructure today will be well-positioned for the agent-driven future of work.
Start small with a few core skills, establish good patterns, and expand as your agent capabilities grow. The initial investment in building robust skills will pay dividends as your AI agent ecosystem scales.
Resources
- Anthropic Claude Skills Documentation — Official skills documentation and SKILL.md specification
- Model Context Protocol (MCP) Specification — Open standard for agent-to-tool connectivity (Linux Foundation)
- Agent2Agent (A2A) Protocol — Open protocol for agent-to-agent communication (Linux Foundation)
- OpenAI Agent SDK — Agent development platform
- LangChain Agent Framework — Agent and tool integration patterns
- Agent Skills for Large Language Models (arXiv, Feb 2026) — Academic survey of skill architecture, acquisition, and security
- AWS Bedrock Agents — Managed agent services