The Linux kernel traditionally required kernel modules to access internals, a high-risk approach in which a single bug could crash the entire system. eBPF (extended Berkeley Packet Filter) changes this by letting developers run sandboxed programs in the kernel without modifying kernel source code or loading modules. Programs are verified for safety before execution, JIT-compiled to near-native speed, and can be updated without rebooting.
In 2026, eBPF powers production infrastructure at Meta (Strobelight reduced CPU load by 20%), Google, Netflix, Datadog (35% CPU reduction in network observability), and Alibaba Cloud (19% cost reduction in L7 load balancing). Cilium, the leading eBPF-based Kubernetes CNI, runs in clusters spanning tens of thousands of nodes. The eBPF Foundation’s 2025 year-in-review documented enterprise adoption across finance, telecom, cloud, and edge computing.
This guide covers eBPF fundamentals, the main program types and hooks (including TCX and NetKit), CO-RE portability with BTF, practical bpftool debugging, and production use cases from Cilium service mesh to Meta’s Katran load balancer.
What is eBPF?
Origins and Evolution
eBPF evolved from the original BPF (Berkeley Packet Filter), created in 1992 for efficient packet filtering in tools like tcpdump. The original BPF allowed userspace programs to filter packets in the kernel, avoiding unnecessary copies.
eBPF, introduced in Linux 3.18 (2014), extended this dramatically:
- General-purpose execution: Run arbitrary programs in the kernel at safe hook points
- JIT compilation: Just-in-time compilation to native machine code for near-native performance
- Safety guarantees: A verifier rejects any program that cannot be proven safe (no unbounded loops, no out-of-bounds access)
- Rich map types: Hash tables, arrays, ring buffers, sockmaps, and more for sharing data between kernel and userspace
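```c
// Minimal eBPF program: counts every packet seen on the interface
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} packet_count SEC(".maps");
SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&packet_count, &key);
    if (count)
        __sync_fetch_and_add(count, 1);  // atomic: one slot shared by all CPUs
    return XDP_PASS;
}
char LICENSE[] SEC("license") = "GPL";
```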
How eBPF Works — The Pipeline
```mermaid
flowchart LR
    A["C / Rust code"] --> B["Clang/LLVM<br/>Compile to BPF bytecode"]
    B --> C["BPF Verifier<br/>Safety checks"]
    C --> D{"Passes<br/>verification?"}
    D -->|"Yes"| E["JIT Compiler<br/>Native code"]
    D -->|"No"| F["Rejected<br/>with error"]
    E --> G["Attach to hook<br/>(XDP, TC, kprobe, ...)"]
    G --> H["Execution on<br/>kernel event"]
```
Key Components:
- eBPF Programs: Restricted C or Rust code compiled to BPF bytecode
- Maps: Key-value data structures for state sharing between kernel and userspace
- Hooks: Predefined attachment points in the kernel (XDP, TC, tracepoints, kprobes, cgroup, etc.)
- Verifier: Static analysis that guarantees termination, memory safety, and type correctness
- JIT Compiler: Converts BPF bytecode to native CPU instructions for near-native speed
BPF CO-RE and BTF
BPF CO-RE (Compile Once, Run Everywhere) solves the portability problem. Traditionally, tools built with BCC compiled their eBPF programs on the target machine at runtime, because kernel structure layouts differ across versions; that meant shipping Clang/LLVM and kernel headers to every host. CO-RE lets you distribute a single compiled .o file that works across kernels.
BTF (BPF Type Format) encodes kernel type information in a compact format. libbpf uses BTF to relocate field offsets at load time, adapting the compiled program to the running kernel.
```bash
# Check BTF availability on the current system
ls /sys/kernel/btf/vmlinux
# Dump the running kernel's BTF as C type definitions (vmlinux.h) for CO-RE programs
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h
```
CO-RE macros handle field existence and type differences gracefully:
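```c
// Assumes skb is a struct sk_buff * (kernel type from vmlinux.h), e.g. in a kprobe
// CO-RE: check that a field exists in the running kernel before reading it
if (bpf_core_field_exists(skb->mark)) {
    value = BPF_CORE_READ(skb, mark);
}
// CO-RE: follow skb->dev->ifindex with offset relocation at each step
int ifindex = BPF_CORE_READ(skb, dev, ifindex);
```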
Without CO-RE, a program compiled with hard-coded offsets on kernel 5.10 would silently read the wrong fields on kernel 6.8 if struct sk_buff field offsets changed. With CO-RE and BTF, libbpf rewrites those offsets at load time, so the same binary works on both.
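To make the load-time relocation step concrete, here is a toy Python model of it. It is an illustration under stated assumptions, not libbpf's algorithm or data format: the struct layouts are hypothetical dictionaries standing in for BTF, and each "instruction" is reduced to the byte offset it reads. The loader looks up every recorded (struct, field) access in the target kernel's layout and patches the offset, and a missing field is reported the way `bpf_core_field_exists()` would see it.

```python
"""Toy model of CO-RE field-offset relocation (illustrative, not libbpf's real data format)."""
from dataclasses import dataclass

# Hypothetical struct layouts, as BTF would describe them: struct -> {field: byte offset}
BUILD_BTF = {"sk_buff": {"len": 112, "mark": 168}}
TARGET_BTF = {"sk_buff": {"len": 120, "mark": 180}}


@dataclass
class FieldReloc:
    insn_index: int  # instruction whose offset immediate must be patched
    struct: str
    field: str


def relocate(insn_offsets, relocs, target_btf):
    """Patch each recorded field access with the running kernel's offset.

    Returns (patched offsets, {(struct, field): exists}); the second value is
    what bpf_core_field_exists() would evaluate to on this kernel.
    """
    patched = list(insn_offsets)
    exists = {}
    for r in relocs:
        layout = target_btf.get(r.struct, {})
        present = r.field in layout
        exists[(r.struct, r.field)] = present
        # A missing field has no valid offset; libbpf instead poisons the
        # instruction so the verifier rejects it if reachable (modeled as None).
        patched[r.insn_index] = layout[r.field] if present else None
    return patched, exists


compiled = [BUILD_BTF["sk_buff"]["len"], BUILD_BTF["sk_buff"]["mark"]]
relocs = [FieldReloc(0, "sk_buff", "len"), FieldReloc(1, "sk_buff", "mark")]
patched, exists = relocate(compiled, relocs, TARGET_BTF)
print(patched)  # [120, 180]
```

The same compiled offsets (112, 168) end up as whatever the target kernel uses, which is why one object file can serve many kernels.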
eBPF Program Types and Hook Points
Networking Program Types
| Program Type | Hook | Use Case |
|---|---|---|
| XDP | Driver-level, before SKB allocation | DDoS mitigation, packet filtering, load balancing |
| TC (cls_bpf) | Traffic control ingress/egress | Traffic shaping, packet scheduling, NAT |
| TCX | Modern TC with link-based attach (K 6.6+) | Composable multi-program TC chains |
| NetKit | Primary/peer virtual device pair (K 6.7+) | Container networking passthrough |
| SK_SKB | Socket-level stream parser | Protocol parsing, sockmap redirection |
| SK_MSG | Socket message verdict | Message-level policy enforcement |
| SOCK_OPS | TCP connection events | Congestion control, connection monitoring |
| SK_LOOKUP | Socket lookup (K 5.9+) | Custom load balancing, service routing |
| CGROUP_SOCK_ADDR | cgroup bind/connect | Container network policy, socket-level service load balancing |
| NETFILTER | Netfilter hooks (K 6.4+) | Firewall rules, connection tracking |
XDP (Express Data Path)
XDP processes packets at the earliest point in the driver, before any socket buffer (SKB) allocation. Three execution modes exist:
| Mode | Description | Performance |
|---|---|---|
| Native (driver) | Runs in the NIC driver’s receive routine | ~10-20 Mpps per core |
| Offloaded | Runs on SmartNIC hardware | ~50+ Mpps (hardware) |
| Generic | Runs in the kernel’s network stack (fallback) | ~1 Mpps |
```bash
# Attach XDP program with iproute2
ip link set dev eth0 xdp obj xdp_drop.o sec xdp
# Use SKB mode if driver lacks native XDP support
ip link set dev eth0 xdpgeneric obj xdp_drop.o sec xdp
# Detach
ip link set dev eth0 xdp off
# List attached XDP programs
bpftool net list
```
XDP return codes determine packet fate:
| Return Code | Action |
|---|---|
| XDP_PASS | Pass packet to the normal network stack |
| XDP_DROP | Drop packet immediately (no further processing) |
| XDP_TX | Transmit packet back out the same interface |
| XDP_REDIRECT | Redirect to another interface, a CPU (cpumap), or an AF_XDP socket (xskmap) |
| XDP_ABORTED | Drop the packet and fire the xdp:xdp_exception tracepoint (signals a program error) |
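The verdict logic of a typical XDP filter can be sketched in userspace Python to show how return codes and bounds checks fit together. This is a model, not a loadable program: the blocked port (9999) is a hypothetical policy, VLAN tags and IP fragments are ignored, and the constants mirror `enum xdp_action`. Each header read is guarded by a length check, the same pattern the verifier forces on real XDP code.

```python
import struct

# Values of enum xdp_action in <linux/bpf.h>
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HLEN = 14
ETH_P_IP = 0x0800
IPPROTO_UDP = 17
BLOCKED_UDP_PORT = 9999  # hypothetical port to filter


def xdp_verdict(frame: bytes) -> int:
    """Userspace model of an XDP filter: drop IPv4/UDP packets to BLOCKED_UDP_PORT.

    Every read is preceded by a length check, mirroring the
    `if (data + n > data_end)` bounds checks the verifier requires.
    """
    if len(frame) < ETH_HLEN:
        return XDP_PASS
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP or len(frame) < ETH_HLEN + 20:
        return XDP_PASS
    ihl = (frame[ETH_HLEN] & 0x0F) * 4  # IPv4 header length in bytes
    if ihl < 20 or frame[ETH_HLEN + 9] != IPPROTO_UDP:
        return XDP_PASS
    udp = ETH_HLEN + ihl
    if len(frame) < udp + 8:
        return XDP_PASS
    (dport,) = struct.unpack_from("!H", frame, udp + 2)
    return XDP_DROP if dport == BLOCKED_UDP_PORT else XDP_PASS
```

Anything the filter does not recognize falls through to `XDP_PASS`, so the kernel stack still handles it; only the targeted traffic is dropped before an SKB is ever allocated.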
TC and TCX
The Traffic Control (TC) hook runs after SKB allocation, so it has access to richer metadata than XDP. It supports both ingress and egress directions.
TCX (merged in Linux 6.6) modernizes TC attachment with a BPF link-based API:
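```bash
# Traditional TC attachment (cls_bpf in direct-action mode)
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress bpf da obj my_prog.o sec tc
# Modern TCX attachment (kernel 6.6+, recent bpftool): load and pin, then attach
bpftool prog load my_prog.o /sys/fs/bpf/my_prog
bpftool net attach tcx_ingress pinned /sys/fs/bpf/my_prog dev eth0
```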
TCX advantages over cls_bpf:
- BPF link support: unified lifecycle management via file descriptors
- Explicit ordering: `BPF_F_BEFORE` and `BPF_F_AFTER` for program ordering
- Atomic replacement: `BPF_F_REPLACE` with `expected_revision` for race-free updates
- Simplified return codes: `TCX_NEXT`, `TCX_PASS`, `TCX_DROP`, `TCX_REDIRECT`
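The ordering and revision semantics listed above can be modeled in a few lines of Python. This is a conceptual sketch, not the kernel's `bpf_mprog` code: programs are plain callables, the starting revision value is an assumption, and an all-`TCX_NEXT` chain is modeled as letting the packet continue. The return-code values match `enum tcx_action_base` in `<linux/bpf.h>`.

```python
# Return codes from enum tcx_action_base in <linux/bpf.h>
TCX_NEXT, TCX_PASS, TCX_DROP, TCX_REDIRECT = -1, 0, 2, 7


class RevisionMismatch(Exception):
    """Raised when the chain changed since the caller last read its revision."""


class TcxChain:
    """Toy model of one TCX attach point (e.g. eth0 ingress): an ordered program list."""

    def __init__(self):
        self.progs = []    # ordered (name, fn) pairs
        self.revision = 1  # modeled counter: bumps on every change

    def attach(self, name, fn, before=None, after=None, expected_revision=None):
        # Models expected_revision: refuse to act on a stale view of the chain
        if expected_revision is not None and expected_revision != self.revision:
            raise RevisionMismatch(f"revision is {self.revision}, expected {expected_revision}")
        names = [n for n, _ in self.progs]
        if before is not None:      # models BPF_F_BEFORE relative to `before`
            idx = names.index(before)
        elif after is not None:     # models BPF_F_AFTER relative to `after`
            idx = names.index(after) + 1
        else:                       # no flag: append at the end
            idx = len(self.progs)
        self.progs.insert(idx, (name, fn))
        self.revision += 1

    def run(self, pkt):
        # Run programs in order; TCX_NEXT defers to the next program in the chain
        for _, fn in self.progs:
            verdict = fn(pkt)
            if verdict != TCX_NEXT:
                return verdict
        return TCX_PASS  # modeled: an all-TCX_NEXT chain lets the packet continue
```

For example, a monitoring program inserted `before="policy"` sees every packet and returns `TCX_NEXT`, while the policy program makes the final call. A tool holding an old revision number gets an error instead of silently reordering someone else's chain.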
```mermaid
flowchart LR
    subgraph Ingress["Ingress Path"]
        NIC["NIC"] --> XDP["XDP Hook"]
        XDP -->|"PASS"| TC_ING["TCX Ingress<br/>(ordered chain)"]
        TC_ING --> STACK["Network Stack"]
    end
    subgraph Egress["Egress Path"]
        STACK2["Network Stack"] --> TC_EG["TCX Egress<br/>(ordered chain)"]
        TC_EG --> NIC2["NIC"]
    end
```
NetKit (Container Networking)
NetKit (kernel 6.7+) is a virtual Ethernet device designed specifically for eBPF-based container networking. It can replace veth pairs in eBPF-centric CNIs such as Cilium. NetKit devices come in pairs: a primary side in the host namespace and a peer side in the container’s namespace, with BPF programs for both sides attached through the primary device.
```c
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
SEC("netkit/primary")
int netkit_primary(struct __sk_buff *skb) {
    // Host-side processing
    return TCX_PASS;
}
SEC("netkit/peer")
int netkit_peer(struct __sk_buff *skb) {
    // Container-side processing
    return TCX_PASS;
}
char LICENSE[] SEC("license") = "GPL";
```
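NetKit removes a cost that veth forwarding pays when a packet leaves a container: with veth, the packet is queued to the per-CPU backlog and processed again in a later softirq. With NetKit, the peer-side BPF program runs directly in the container’s transmit path and can redirect the packet straight to the host’s physical device, which reportedly reduces latency by 30-50% compared to veth-based forwarding.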
eBPF Maps
Maps are the primary data-sharing mechanism between eBPF programs and userspace. The kernel maintains them; eBPF programs read and write them via helper functions, and userspace accesses them via the bpf() syscall, usually through libbpf or bpftool.
| Map Type | Key | Value | Use Case |
|---|---|---|---|
| BPF_MAP_TYPE_HASH | Any | Any | General key-value, connection tracking |
| BPF_MAP_TYPE_ARRAY | u32 | Any | Fixed-size counters, statistics |
| BPF_MAP_TYPE_PERCPU_HASH | Any | Any | Per-CPU hash for lockless updates |
| BPF_MAP_TYPE_PERCPU_ARRAY | u32 | Any | Per-CPU counters (fast, no locking) |
| BPF_MAP_TYPE_RINGBUF | none | Variable-size records | Event streaming to userspace |
| BPF_MAP_TYPE_STACK | none | Any | LIFO stack for temporary storage |
| BPF_MAP_TYPE_SOCKMAP | u32 | Socket (`struct sock *`; userspace inserts socket FDs) | Socket redirection (load balancing) |
| BPF_MAP_TYPE_SOCKHASH | Any | Socket (`struct sock *`) | Socket lookup by custom key |
| BPF_MAP_TYPE_DEVMAP | u32 | Network device (ifindex) | XDP device redirection |
| BPF_MAP_TYPE_CPUMAP | u32 | CPU entry (queue size) | XDP packet steering to CPUs |
| BPF_MAP_TYPE_ARENA | none | Memory pages (K 6.9+) | Large shared memory regions |
```c
// Define a per-CPU array map for lock-free counting
// (kernel-visible name is truncated to 15 chars: "packet_count_pe")
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 256);
    __type(key, __u32);
    __type(value, __u64);
} packet_count_percpu SEC(".maps");
// Per-packet increment (each CPU owns its slot, so no atomics are needed)
static __always_inline void increment_counter(__u32 key) {
    __u64 *val = bpf_map_lookup_elem(&packet_count_percpu, &key);
    if (val)
        *val += 1;
}
```
```bash
# List all maps loaded in the kernel
bpftool map list
# Dump all entries of a map by name
bpftool map dump name packet_count
# Look up key 0 (a __u32, given as bytes in host order: little-endian on x86-64)
bpftool map lookup name packet_count key 0 0 0 0
# Set key 0 to the __u64 value 42 (again little-endian bytes)
bpftool map update name packet_count key 0 0 0 0 value 42 0 0 0 0 0 0 0
```
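Because bpftool copies key and value bytes into memory exactly as typed, multi-byte integers must be spelled out in the host's byte order. A small helper like this sketch avoids hand-encoding mistakes; it assumes the keys and values are unsigned integers of a known size (4 bytes for `__u32`, 8 for `__u64`).

```python
import sys


def bpftool_bytes(value: int, size: int, byteorder: str = sys.byteorder) -> str:
    """Render an unsigned integer as the space-separated decimal bytes bpftool expects.

    bpftool copies key/value bytes into memory as given, so multi-byte integers
    must be written in the host's byte order (little-endian on x86-64 and arm64).
    """
    return " ".join(str(b) for b in value.to_bytes(size, byteorder))


key = bpftool_bytes(0, 4)      # __u32 key 0
value = bpftool_bytes(42, 8)   # __u64 value 42
print(f"bpftool map update name packet_count key {key} value {value}")
```

On a little-endian host this prints the same `key 0 0 0 0 value 42 0 0 0 0 0 0 0` arguments used in the update command.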
eBPF for Cloud-Native Networking
Cilium: eBPF-Based CNI
Cilium is the most widely deployed eBPF-based networking solution for Kubernetes. It can replace kube-proxy entirely with eBPF, eliminating iptables overhead, and it enforces network policy in eBPF as well:
```yaml
# Cilium NetworkPolicy: L3/L4 policy
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "web-backend-policy"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
```
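The semantics of this policy can be sketched in Python. In the real datapath, Cilium compiles labels into numeric security identities and stores allowed (identity, port, protocol) entries in a per-endpoint BPF policy map; the model below skips that and evaluates the labels directly. It is a simplification: it requires both `fromEndpoints` and `toPorts` in a rule, and it treats an unselected endpoint as allowed, ignoring any other policies that might select it.

```python
def matches(selector: dict, labels: dict) -> bool:
    """matchLabels semantics: every selector key/value must appear on the endpoint."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policy: dict, src: dict, dst: dict, port: int, proto: str) -> bool:
    """Toy evaluation of an L3/L4 CiliumNetworkPolicy for one connection attempt."""
    spec = policy["spec"]
    if not matches(spec["endpointSelector"]["matchLabels"], dst):
        return True  # destination not selected by this policy (modeled as allowed)
    for rule in spec.get("ingress", []):
        from_ok = any(matches(s["matchLabels"], src) for s in rule.get("fromEndpoints", []))
        port_ok = any(p["port"] == str(port) and p["protocol"] == proto
                      for tp in rule.get("toPorts", []) for p in tp["ports"])
        if from_ok and port_ok:
            return True
    return False  # selected endpoint, no rule matched: default deny


# The YAML policy above, as the dict a YAML parser would produce
POLICY = {"spec": {
    "endpointSelector": {"matchLabels": {"role": "backend"}},
    "ingress": [{
        "fromEndpoints": [{"matchLabels": {"role": "frontend"}}],
        "toPorts": [{"ports": [{"port": "80", "protocol": "TCP"}]}],
    }],
}}
```

Once the policy selects the backend pods, anything not explicitly allowed (another source, or port 443) is denied.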
Cilium’s eBPF datapath provides:
- kube-proxy replacement: Services are load-balanced in eBPF, both at the socket layer (`BPF_PROG_TYPE_CGROUP_SOCK_ADDR` hooks on connect/sendmsg) and in the tc/XDP datapath, with service and backend tables kept in BPF hash maps instead of iptables chains; this yields up to 10x faster connection setup and 10x lower CPU usage vs iptables
- Transparent encryption: eBPF routes pod traffic into WireGuard (or IPsec) tunnels between nodes, with a key pair per node
- Bandwidth Manager: eBPF-based EDT (Earliest Departure Time) pacing for per-pod egress rate limiting, reducing tail latency by up to 40% in large clusters
- DNS-based policy: eBPF redirects DNS traffic to Cilium’s DNS proxy, which records name-to-IP mappings so that network policy can be written against DNS names (`toFQDNs`) rather than IPs
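The socket-layer part of kube-proxy replacement can be modeled as a lookup at `connect()` time: if the destination is a Service VIP, the program rewrites it to a backend before the connection is established, so later packets already carry the backend address and need no per-packet NAT. The sketch below assumes a hypothetical service table and picks a backend at random (Cilium's default; it also supports Maglev).

```python
import random

# Hypothetical service table: (ClusterIP, port) -> backend (pod IP, port) list.
# Cilium keeps the equivalent in BPF hash maps, updated by the agent.
SERVICES = {
    ("10.96.0.10", 53): [("10.0.1.5", 53), ("10.0.2.7", 53)],
}


def connect_rewrite(dst, services=SERVICES, rng=random):
    """Model of socket-level load balancing at connect() time.

    Returns the address the socket really connects to: a backend for a
    service VIP, or the original destination for anything else.
    """
    backends = services.get(dst)
    if not backends:
        return dst
    return backends[rng.randrange(len(backends))]


print(connect_rewrite(("10.96.0.10", 53)))     # one of the two backends
print(connect_rewrite(("203.0.113.10", 443)))  # not a service: unchanged
```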
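```bash
# Verify kube-proxy replacement is active (agent status inside a Cilium pod;
# on Cilium < 1.15 the in-pod binary is `cilium` instead of `cilium-dbg`)
kubectl -n kube-system exec ds/cilium -- cilium-dbg status | grep KubeProxyReplacement
# Example output (format varies by Cilium version):
# KubeProxyReplacement:    True   [eth0, eth1]
```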
Service Mesh Without Sidecars
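Traditional service meshes inject Envoy or Linkerd sidecar proxies into every pod. Cilium’s service mesh avoids the per-pod sidecar: eBPF enforces L3/L4 policy in the kernel, and only traffic that needs L7 processing is redirected to a single Envoy proxy shared by all pods on the node: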
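```yaml
# CiliumEnvoyConfig: steer a Service's traffic through the node-local Envoy (no sidecar)
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: http-policy
spec:
  services:            # Services whose traffic is redirected to Envoy
    - name: my-service
      namespace: default
  backendServices:     # Services Envoy may forward to
    - name: my-service
      namespace: default
  resources:           # Raw Envoy resources (listeners, routes, clusters)
    - "@type": type.googleapis.com/envoy.config.listener.v3.Listener
      name: http-policy-listener
      # filter chains with HTTP routing / L7 rules omitted for brevity
```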
Compared to sidecar-based meshes, the eBPF approach reduces:
- Memory overhead: ~50MB per pod (sidecar) → ~10MB per node (eBPF)
- Added latency per request: ~2-5ms (sidecar) → ~50μs (eBPF)
Meta’s Katran: eBPF Load Balancer
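Katran is Meta’s production eBPF-based Layer 4 load balancer, processing millions of packets per second per host. It runs in XDP: it picks a backend with Maglev-style consistent hashing over a ring stored in a BPF array map, encapsulates the packet in IP-in-IP toward that backend, and sends it back out the NIC with XDP_TX: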
```c
// Simplified Katran-style XDP load balancing; helpers, maps and RING_SIZE are hypothetical
SEC("xdp")
int xdp_lb(struct xdp_md *ctx) {
    struct vip_meta *meta;
    struct real_definition *real;
    __u32 *real_idx;
    meta = get_vip_meta(ctx);          // parse headers, look up destination VIP
    if (!meta) return XDP_PASS;        // not a VIP: hand to the normal stack
    // Consistent hashing: the flow hash selects a slot in this VIP's ring
    __u32 key = meta->vip_num * RING_SIZE + flow_hash(ctx) % RING_SIZE;
    real_idx = bpf_map_lookup_elem(&ch_rings, &key);
    if (!real_idx) return XDP_PASS;
    real = bpf_map_lookup_elem(&reals, real_idx);
    if (!real) return XDP_PASS;
    // IP-in-IP encapsulate toward the backend, then bounce out the same NIC
    if (encap_ipip(ctx, real) < 0)
        return XDP_DROP;
    return XDP_TX;
}
```
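How the consistent-hashing ring gets filled is the interesting part. Katran uses a variant of Maglev hashing; the sketch below implements the basic Maglev table population from the Maglev paper, not Katran's exact code. The salted SHA-256 hashes standing in for Maglev's two hash functions and the tiny table size (13) are assumptions for readability; production tables use a much larger prime, such as the paper's default of 65537.

```python
import hashlib
from collections import Counter


def _hash(name: str, salt: str) -> int:
    """Stable 64-bit hash of a backend name (stands in for Maglev's two hash functions)."""
    return int.from_bytes(hashlib.sha256(f"{salt}:{name}".encode()).digest()[:8], "little")


def maglev_table(backends, m=13):
    """Fill a Maglev lookup table of prime size m with backend names.

    Backends take turns claiming the next free slot in their own
    pseudo-random permutation of 0..m-1, so slot counts differ by at most one.
    """
    if not backends or len(backends) > m:
        raise ValueError("need 1..m backends")
    offsets = [_hash(b, "offset") % m for b in backends]
    skips = [_hash(b, "skip") % (m - 1) + 1 for b in backends]  # 1..m-1, coprime with prime m
    nxt = [0] * len(backends)
    table = [None] * m
    filled = 0
    while True:
        for i, b in enumerate(backends):
            c = (offsets[i] + nxt[i] * skips[i]) % m
            while table[c] is not None:  # skip slots other backends already own
                nxt[i] += 1
                c = (offsets[i] + nxt[i] * skips[i]) % m
            table[c] = b
            nxt[i] += 1
            filled += 1
            if filled == m:
                return table


def pick_backend(table, flow_hash: int):
    """Per-packet lookup: index the table by the flow's 5-tuple hash."""
    return table[flow_hash % len(table)]


table = maglev_table(["a", "b", "c"])
print(Counter(table))
```

The per-packet work is a single array index, which is why the lookup fits comfortably in an XDP program; the expensive population step runs in userspace whenever the backend set changes.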
Bandwidth Manager
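Cilium’s Bandwidth Manager uses eBPF to implement EDT (Earliest Departure Time) pacing. At host egress, an eBPF program stamps each packet with a departure time (skb->tstamp) computed from its pod’s bandwidth limit, and the fq qdisc holds every packet until that time. Limits are set with the kubernetes.io/egress-bandwidth Pod annotation: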
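```yaml
# Pod egress bandwidth limit, enforced by Cilium's Bandwidth Manager (EDT)
apiVersion: v1
kind: Pod
metadata:
  name: rate-limited-app            # example name
  annotations:
    kubernetes.io/egress-bandwidth: "100M"
    # kubernetes.io/ingress-bandwidth is not enforced by the Bandwidth Manager
spec:
  containers:
    - name: app
      image: nginx                  # example image
```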
This avoids the head-of-line blocking problems of traditional rate limiters (HTB, TBF) and reduces TCP tail latency by up to 40% in multi-tenant clusters.
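The core EDT calculation is simple arithmetic, sketched below for one pod. This is a generic EDT model rather than Cilium's exact code (for instance, it leaves out the drop horizon Cilium applies to packets scheduled too far in the future). A 1500-byte packet is 12,000 bits; at 100 Mbit/s that takes 120 µs to send, so back-to-back packets are stamped 120,000 ns apart.

```python
NSEC_PER_SEC = 1_000_000_000


def edt_departures(arrivals_ns, sizes_bytes, rate_bps):
    """Assign Earliest Departure Time stamps to one pod's packets.

    A packet may leave no earlier than it arrives, and no earlier than the
    previous packet's departure plus that packet's serialization time at
    rate_bps. The qdisc then holds each packet until its stamp, so no
    per-pod queue or shaper class is needed.
    """
    next_free = 0
    stamps = []
    for arrival, size in zip(arrivals_ns, sizes_bytes):
        depart = max(arrival, next_free)
        stamps.append(depart)
        next_free = depart + size * 8 * NSEC_PER_SEC // rate_bps
    return stamps


# Three back-to-back 1500-byte packets under a 100 Mbit/s limit
print(edt_departures([0, 0, 0], [1500] * 3, 100_000_000))  # [0, 120000, 240000]
```

Because the rate limit is expressed as a timestamp rather than a queue, enforcing it needs only one stored value per pod.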
eBPF for Observability
DeepFlow
DeepFlow uses eBPF for zero-code distributed tracing — it captures function calls and network events without application modification:
```bash
helm repo add deepflow https://deepflowio.github.io/deepflow
helm repo update
helm install deepflow -n deepflow deepflow/deepflow --create-namespace \
  --set agent.ebpf.enabled=true   # check the chart's values.yaml for the exact key
```
Polar Signals reduced Kubernetes network traffic costs by 50% using eBPF-based monitoring. Datadog improved network observability while decreasing CPU usage by 35%.
bpftrace: High-Level Tracing
bpftrace provides a one-liner syntax for eBPF tracing, similar to awk for the kernel:
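```bash
# Count outbound TCP connection attempts by process name
bpftrace -e 'kprobe:tcp_connect { @[comm] = count(); }'
# Count failed glibc getaddrinfo() lookups per process (non-zero return = EAI_* error;
# adjust the libc path for your distro; resolvers that bypass glibc are not seen)
bpftrace -e 'uretprobe:/lib/x86_64-linux-gnu/libc.so.6:getaddrinfo /retval != 0/ { @failed[comm] = count(); }'
# Time from the first XDP redirect to the batch flush, per CPU (XDP_REDIRECT traffic only)
bpftrace -e '
kprobe:xdp_do_redirect /!@start[cpu]/ { @start[cpu] = nsecs; }
kprobe:xdp_do_flush /@start[cpu]/ {
  @flush_ns = hist(nsecs - @start[cpu]);
  delete(@start[cpu]);
}
'
# Block I/O latency distribution via stable tracepoints (bpftrace >= 0.20 also accepts args.dev)
bpftrace -e 'tracepoint:block:block_rq_issue { @start[args->dev, args->sector] = nsecs; }
tracepoint:block:block_rq_complete /@start[args->dev, args->sector]/ {
  @us = hist((nsecs - @start[args->dev, args->sector]) / 1000);
  delete(@start[args->dev, args->sector]);
}'
```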
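Several of these one-liners aggregate with `hist()`, which groups values into power-of-two buckets. The sketch below reproduces that bucketing so the output is easier to read; it keeps plain integers, whereas bpftrace abbreviates large bounds with K/M suffixes (e.g. `[1K, 2K)`). The sample latencies are made up.

```python
from collections import Counter


def hist_bucket(v: int) -> str:
    """Label of the power-of-two bucket that bpftrace's hist() uses for v."""
    if v < 0:
        return "(..., 0)"
    if v < 2:
        return f"[{v}]"
    low = 1 << (v.bit_length() - 1)  # largest power of two <= v
    return f"[{low}, {low * 2})"


samples_us = [0, 1, 3, 5, 6, 7, 90]  # hypothetical latencies in microseconds
print(Counter(hist_bucket(s) for s in samples_us))
```

Log-scale buckets keep the map small while still separating a 5 µs request from a 90 µs one, which is usually the distinction that matters for latency debugging.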
Practical Debugging with bpftool
bpftool is the primary administration tool for inspecting and managing eBPF programs and maps.
```bash
# Probe kernel eBPF support (program types, map types, helpers)
bpftool feature probe
# List all loaded programs
bpftool prog list
# Show details of a specific program (by ID, or by name: at most 15 characters)
bpftool prog show id 42
bpftool prog show name xdp_count
# Show programs attached to network interfaces
bpftool net list
# Show all maps
bpftool map list
# Dump a map's contents
bpftool map dump name packet_count
# Show pinned programs and maps
find /sys/fs/bpf -type f
bpftool prog show pinned /sys/fs/bpf/my_prog   # inspect one pinned program (example path)
# Attach an already-loaded XDP program by name
bpftool net attach xdp name drop_bad dev eth0
# Detach program
bpftool net detach xdp dev eth0
# Stream bpf_printk() output from the kernel trace pipe
bpftool prog tracelog
```
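bpftool's `-j` flag emits JSON, which makes it easy to script checks. As a sketch, the function below computes mean nanoseconds per invocation from `bpftool -j prog show`; it relies on the `id`, `name`, `run_time_ns` and `run_cnt` fields, and the last two only appear when `kernel.bpf_stats_enabled=1`. The sample input in the test is hand-written, not captured output.

```python
import json


def avg_runtime_ns(bpftool_json: str) -> dict:
    """Mean nanoseconds per invocation from `bpftool -j prog show` output.

    run_time_ns and run_cnt are only present when kernel.bpf_stats_enabled=1,
    so programs without them (or never run) are skipped.
    """
    stats = {}
    for prog in json.loads(bpftool_json):
        runs = prog.get("run_cnt", 0)
        if runs:
            stats[(prog["id"], prog.get("name", "?"))] = prog["run_time_ns"] / runs
    return stats


# Usage on a live system (needs root):
#   import subprocess
#   out = subprocess.run(["bpftool", "-j", "prog", "show"], capture_output=True,
#                        text=True, check=True).stdout
#   print(avg_runtime_ns(out))
```

Tracking this number over time is a cheap way to spot a program that has become expensive after a code change.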
Developing a Complete eBPF Program
Setup:
```bash
# Debian (on Ubuntu, bpftool is typically provided by linux-tools-$(uname -r))
sudo apt-get install clang llvm libbpf-dev bpftool
# Verify eBPF support
bpftool feature probe | grep -i "xdp\|tc\|sk_skb"
```
Write an XDP packet counter:
```c
// xdp_counter.c: portable without CO-RE relocations, since it reads no kernel structs
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
char LICENSE[] SEC("license") = "GPL";
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");
SEC("xdp")
int xdp_count(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        *count += 1;  // per-CPU slot: no other CPU writes it, so no atomic needed
    return XDP_PASS;
}
```
Compile and load:
```bash
# Compile to BPF bytecode (-g keeps BTF; the -I path lets <linux/bpf.h> find asm/types.h on Debian/Ubuntu)
clang -O2 -g -target bpf -I/usr/include/$(uname -m)-linux-gnu -c xdp_counter.c -o xdp_counter.bpf.o
# Load via iproute2
ip link set dev eth0 xdp obj xdp_counter.bpf.o sec xdp
# Verify
bpftool prog list | grep -A3 xdp
bpftool map dump name pkt_count
# Detach when done
ip link set dev eth0 xdp off
```
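Since `pkt_count` is a per-CPU array, `bpftool map dump` shows one value per CPU, and the real packet total is their sum. The sketch below sums them from `bpftool -j map dump name pkt_count`. It assumes the JSON shape bpftool uses for per-CPU maps (a `key` byte list plus a `values` list of `{"cpu", "value"}` objects, with bytes as `"0x.."` strings) and a little-endian host; verify the shape against your bpftool version.

```python
import json


def hex_le(byte_list) -> int:
    """Decode bpftool's ["0x2a", "0x00", ...] byte list as a little-endian integer."""
    return int.from_bytes(bytes(int(b, 16) for b in byte_list), "little")


def percpu_totals(bpftool_json: str) -> dict:
    """Sum each key's per-CPU slots from `bpftool -j map dump` of a PERCPU map."""
    totals = {}
    for entry in json.loads(bpftool_json):
        key = hex_le(entry["key"])
        totals[key] = sum(hex_le(slot["value"]) for slot in entry["values"])
    return totals
```

The summing happens in userspace on purpose: keeping per-CPU slots in the kernel is what lets the XDP program increment without atomics.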
Go eBPF with cilium/ebpf
The cilium/ebpf Go library provides idiomatic Go bindings for loading and managing eBPF programs:
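```go
package main

import (
	"log"
	"net"
	"time"

	"github.com/cilium/ebpf/link"
)

//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang xdpCounter ./xdp_counter.c

func main() {
	// Load the programs and maps that bpf2go generated from xdp_counter.c.
	objs := xdpCounterObjects{}
	if err := loadXdpCounterObjects(&objs, nil); err != nil {
		log.Fatalf("loading eBPF objects: %v", err)
	}
	defer objs.Close()

	// XDPOptions.Interface is an interface index, not a name.
	iface, err := net.InterfaceByName("eth0")
	if err != nil {
		log.Fatalf("looking up eth0: %v", err)
	}
	l, err := link.AttachXDP(link.XDPOptions{
		Program:   objs.XdpCount,
		Interface: iface.Index,
	})
	if err != nil {
		log.Fatalf("attaching XDP: %v", err)
	}
	defer l.Close() // closing the link detaches the program

	log.Println("eBPF loaded and attached. Counts:")
	for range time.Tick(time.Second) {
		// A per-CPU map returns one value per possible CPU; sum them.
		var perCPU []uint64
		if err := objs.PktCount.Lookup(uint32(0), &perCPU); err != nil {
			log.Fatalf("reading pkt_count: %v", err)
		}
		var total uint64
		for _, v := range perCPU {
			total += v
		}
		log.Printf("packets: %d", total)
	}
}
```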
eBPF vs Traditional Networking
Performance Comparison
| Scenario | iptables | eBPF (Cilium) | Improvement |
|---|---|---|---|
| Service connection setup | ~100μs | ~10μs | 10x faster |
| Rule scale | ~10K rules | ~1M rules | 100x scale |
| Rule update time | seconds | milliseconds | 1000x faster |
| Memory overhead | ~100MB | ~10MB | 10x reduction |
| Packets/core (XDP) | ~1M (iptables) | ~10M+ (XDP) | 10x throughput |
Key Advantages
- Safety: the verifier rejects any program it cannot prove safe, so a buggy program is refused at load time instead of crashing the kernel
- Performance: JIT-compiled programs run at native speed in the kernel context
- Live update: programs attach and detach without rebooting or restarting services
- Programmability: existing hook points can be given new behavior without kernel changes or modules
Security Considerations
eBPF Security Model
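```python
# Capability numbers from <linux/capability.h>
CAP_NET_ADMIN, CAP_SYS_ADMIN, CAP_PERFMON, CAP_BPF = 12, 21, 38, 39

def effective_caps() -> int:
    """Return this process's effective capability bitmask (Linux /proc)."""
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('CapEff:'):
                return int(line.split()[1], 16)
    return 0

def check_bpf_capabilities() -> bool:
    caps = effective_caps()
    # CAP_SYS_ADMIN covers everything; since 5.8, CAP_BPF (+ CAP_NET_ADMIN / CAP_PERFMON) suffices
    if not caps & ((1 << CAP_SYS_ADMIN) | (1 << CAP_BPF)):
        print("eBPF requires CAP_SYS_ADMIN, or CAP_BPF plus CAP_NET_ADMIN/CAP_PERFMON (5.8+)")
        return False
    try:
        with open('/proc/sys/kernel/bpf_stats_enabled') as f:
            if f.read().strip() != '1':
                print("Enable bpf_stats (sysctl kernel.bpf_stats_enabled=1) for run-time stats")
    except FileNotFoundError:
        print("BPF stats not available on this kernel")
    return True
```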
| Protection | Mechanism |
|---|---|
| Verifier | Static analysis: no unbounded loops, no out-of-bounds access, no null pointer dereference |
| Capabilities | CAP_BPF (Linux 5.8+) plus CAP_NET_ADMIN for networking or CAP_PERFMON for tracing; CAP_SYS_ADMIN grants everything |
| BPF Token | Kernel 6.9+: delegable token for fine-grained permission control |
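| Memory accounting | Charged to the memory cgroup since kernel 5.11; older kernels limit it with RLIMIT_MEMLOCK |
| Program size | 4096 instructions (BPF_MAXINSNS) for unprivileged programs; privileged programs up to 1M verified instructions since kernel 5.2 |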
Best Practices
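- Grant the narrowest privileges that work: prefer CAP_BPF plus CAP_NET_ADMIN or CAP_PERFMON (Linux 5.8+) over full root, or delegate access with a BPF token; keep unprivileged BPF disabled (`kernel.unprivileged_bpf_disabled`), the default on most distributions
- Verify the integrity of BPF object files (checksums or signatures) before loading them in production
- Monitor program execution time: with `kernel.bpf_stats_enabled=1`, `bpftool prog show` reports `run_time_ns` and `run_cnt`
- Pin programs and maps to the BPF filesystem (`/sys/fs/bpf`) for lifecycle management
- Use BPF CO-RE for portability instead of BCC’s runtime compilation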
The Future of eBPF
Kernel and Ecosystem Developments
| Feature | Status | Impact |
|---|---|---|
| BPF Token (K 6.9) | Merged | Fine-grained capability delegation without full root |
| BPF Arena (K 6.9) | Merged | Large shared memory regions between BPF and userspace |
| sched_ext (K 6.12) | Merged | CPU scheduling policies written in BPF and loaded at runtime |
| TCX (K 6.6) | Merged | Modern TC attachment with link API and explicit ordering |
| NetKit (K 6.7) | Merged | eBPF-native virtual Ethernet for container networking |
| BPF ISA standard (RFC 9669) | Published | IETF-standardized instruction set, easing portable runtimes beyond Linux |
| Rust support | Active | aya framework for writing BPF programs in Rust (redbpf is no longer maintained) |
| Windows eBPF | Preview | Microsoft’s eBPF for Windows, enabling cross-platform tooling |
| SmartNIC offload | Production | NVIDIA BlueField, Intel IPU running eBPF programs at line rate |
Industry Adoption
- eBPF Foundation (2025 YIR): Meta, Bytedance, Alibaba Cloud, Datadog, Ant Group reported significant production benefits
- Cloud providers: AWS (EKS network policy via the Amazon VPC CNI’s eBPF agent), GCP (GKE Dataplane V2), Azure (Azure CNI powered by Cilium) all offer eBPF-based networking
- Finance: High-frequency trading with microsecond TCP connection setup via eBPF
- Telecom: 5G core networks using eBPF for SRv6 and packet processing
- Edge: Lightweight eBPF on resource-constrained devices for observability
Conclusion
eBPF has transformed from a niche packet filter into the foundation of modern cloud-native infrastructure. Its ability to safely extend kernel behavior without modification unlocks performance (10x packet throughput), visibility (zero-code tracing), and security (kernel-level policy) that traditional approaches cannot match.
Key takeaways:
- XDP for earliest packet processing (DDoS, load balancing) — up to 20 Mpps per core
- TCX for composable traffic control with link-based lifecycle management
- NetKit for eBPF-native container networking — 30-50% lower latency than veth
- Cilium for Kubernetes networking, replacing kube-proxy with 10x better performance
- CO-RE + BTF for portable eBPF programs that work across kernel versions
- bpftool and bpftrace for debugging and observability without custom tooling
Resources
- eBPF Foundation — 2025 Year in Review
- eBPF Documentation — Program Types
- Cilium Documentation
- BPF CO-RE Reference Guide
- Linux BPF Man Pages
- bpftool — Kernel Source Tools
- bpftrace — High-Level Tracing Language
- cilium/ebpf — Go eBPF Library
- Aya — Rust eBPF Framework
- eBPF In Production Report (Linux Foundation, 2026)
- Introduction to Linux Netkit