
eBPF for Modern Networking: The Complete Guide 2026

Created: March 3, 2026 Larry Qu 14 min read

The Linux kernel traditionally required kernel modules to access internals, a high-risk approach where a single bug could crash the entire system. eBPF (extended Berkeley Packet Filter) changes this by letting developers run sandboxed programs in the kernel without modifying kernel source code or loading kernel modules. Programs are verified for safety before execution, JIT-compiled to near-native speed, and can be updated without rebooting.

In 2026, eBPF powers production infrastructure at Meta (Strobelight reduced CPU load by 20%), Google, Netflix, Datadog (35% CPU reduction in network observability), and Alibaba Cloud (19% cost reduction in L7 load balancing). Cilium, the leading eBPF-based Kubernetes CNI, runs in clusters spanning tens of thousands of nodes. The eBPF Foundation’s 2025 year-in-review documented enterprise adoption across finance, telecom, cloud, and edge computing.

This guide covers eBPF fundamentals, the main program types and hooks (including TCX and NetKit), CO-RE portability with BTF, practical bpftool debugging, and production use cases from Cilium service mesh to Meta’s Katran load balancer.

What is eBPF?

Origins and Evolution

eBPF evolved from the original BPF (Berkeley Packet Filter), created in 1992 for efficient packet filtering in tools like tcpdump. The original BPF let userspace programs install filters that run in the kernel, so packets that do not match are never copied to userspace.

eBPF, introduced in Linux 3.18 (2014), extended this dramatically:

  • General-purpose execution: Run verified programs, not just packet filters, at safe hook points in the kernel
  • JIT compilation: Just-in-time compilation to native machine code for near-native performance
  • Safety guarantees: A verifier rejects any program that cannot be proven safe (no unbounded loops, no out-of-bounds access)
  • Rich map types: Hash tables, arrays, ring buffers, sockmaps, and more for sharing data between kernel and userspace
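// Minimal eBPF program: counts every packet that reaches the XDP hook
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
struct {
    __uint(type, BPF_MAP_TYPE_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} packet_count SEC(".maps");

SEC("xdp")
int count_packets(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&packet_count, &key);
    if (count)
        __sync_fetch_and_add(count, 1);  // atomic: one slot shared by all CPUs
    return XDP_PASS;
}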

How eBPF Works — The Pipeline

flowchart LR
    A["C / Rust code"] --> B["Clang/LLVM<br/>Compile to BPF bytecode"]
    B --> C["BPF Verifier<br/>Safety checks"]
    C --> D{"Passes<br/>verification?"}
    D -->|"Yes"| E["JIT Compiler<br/>Native code"]
    D -->|"No"| F["Rejected<br/>with error"]
    E --> G["Attach to hook<br/>(XDP, TC, kprobe, ...)"]
    G --> H["Execution on<br/>kernel event"]

Key Components:

  1. eBPF Programs: Restricted C or Rust code compiled to BPF bytecode
  2. Maps: Key-value data structures for state sharing between kernel and userspace
  3. Hooks: Predefined attachment points in the kernel (XDP, TC, tracepoints, kprobes, cgroup, etc.)
  4. Verifier: Static analysis that guarantees termination, memory safety, and type correctness
  5. JIT Compiler: Converts BPF bytecode to native CPU instructions for near-native speed

BPF CO-RE and BTF

BPF CO-RE (Compile Once, Run Everywhere) solves the portability problem. Because kernel structure layouts differ across versions, tools such as BCC traditionally compiled eBPF programs on the target machine at runtime, which requires Clang and kernel headers on every host. CO-RE lets you distribute a single compiled .o file that works across kernels that expose BTF.

BTF (BPF Type Format) encodes kernel type information in a compact format. libbpf uses BTF to relocate field offsets at load time, adapting the compiled program to the running kernel.

# Check BTF availability on the current system
ls /sys/kernel/btf/vmlinux

# Generate vmlinux.h (C type definitions) from the running kernel's BTF
bpftool btf dump file /sys/kernel/btf/vmlinux format c > vmlinux.h

CO-RE macros handle field existence and type differences gracefully:

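// CO-RE: check if a field exists before reading it
// (skb is a struct sk_buff *, e.g. a kprobe argument; types come from vmlinux.h)
__u32 value = 0;
if (bpf_core_field_exists(skb->mark)) {
    value = BPF_CORE_READ(skb, mark);
}

// CO-RE: follow a pointer chain (skb->dev->ifindex), relocating each offset
int ifindex = BPF_CORE_READ(skb, dev, ifindex);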

Without CO-RE, a program compiled on kernel 5.10 would break on kernel 6.8 if struct sk_buff field offsets changed. With CO-RE and BTF, the same binary works on both.
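A toy Python model can make the relocation step concrete. It is a sketch of the idea, not libbpf’s algorithm or data format: the compiled program is assumed to carry (struct, field) relocation records, each kernel’s BTF is reduced to a dict of made-up field offsets, and new_field is a hypothetical field that only the second kernel has.

```python
# Toy model of CO-RE field relocation (illustrative only; not libbpf's format).
# The compiled object records *which* field each instruction reads, by name,
# instead of a hard-coded byte offset. At load time the loader patches in the
# offset taken from the running kernel's BTF.

# Hypothetical BTF data for two kernels: struct -> {field: byte offset}
KERNEL_A_BTF = {"sk_buff": {"len": 112, "mark": 168}}
KERNEL_B_BTF = {"sk_buff": {"len": 120, "mark": 176, "new_field": 130}}

# Relocation records emitted at compile time: (instruction index, struct, field)
RELOCATIONS = [(3, "sk_buff", "len"), (7, "sk_buff", "mark")]


def relocate(relocations, btf):
    """Return {instruction index: patched offset} for the target kernel."""
    patched = {}
    for insn, struct_name, field in relocations:
        fields = btf.get(struct_name, {})
        if field not in fields:
            raise LookupError(f"{struct_name}.{field} not in target BTF")
        patched[insn] = fields[field]
    return patched


def field_exists(btf, struct_name, field):
    """Analogue of bpf_core_field_exists(): answered at load time, not run time."""
    return field in btf.get(struct_name, {})


print(relocate(RELOCATIONS, KERNEL_A_BTF))  # {3: 112, 7: 168}
print(relocate(RELOCATIONS, KERNEL_B_BTF))  # {3: 120, 7: 176}
```

The same relocation records yield different offsets on each kernel, which is the point of CO-RE: offsets are decided when the program is loaded, not when it is compiled.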

eBPF Program Types and Hook Points

Networking Program Types

Program Type | Hook | Use Case
XDP | Driver-level, before SKB allocation | DDoS mitigation, packet filtering, load balancing
TC (cls_bpf) | Traffic control ingress/egress | Traffic shaping, packet scheduling, NAT
TCX | Modern TC with link-based attach (K 6.6+) | Composable multi-program TC chains
NetKit | Primary/peer virtual device pair (K 6.7+) | Container networking
SK_SKB | Socket-level stream parser | Protocol parsing, sockmap redirection
SK_MSG | Socket message verdict | Message-level policy enforcement
SOCK_OPS | TCP connection events | Congestion control, connection monitoring
SK_LOOKUP | Socket lookup (K 5.9+) | Custom load balancing, service routing
CGROUP_SOCK_ADDR | cgroup bind/connect | Container network policy
NETFILTER | Netfilter hooks (K 6.4+) | Firewall rules, connection tracking

XDP (Express Data Path)

XDP processes packets at the earliest point in the driver, before any socket buffer (SKB) allocation. Three execution modes exist:

Mode | Description | Performance
Native (driver) | Runs in the NIC driver’s receive routine | ~10-20 Mpps per core
Offloaded | Runs on SmartNIC hardware | ~50+ Mpps (hardware)
Generic (SKB mode) | Runs in the kernel’s network stack (fallback) | ~1 Mpps
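Those packet rates translate directly into a per-packet time budget, which is why XDP programs need to stay short. A back-of-the-envelope calculation in Python; the 3 GHz clock is an assumed example value, and the 20 bytes of per-frame overhead are Ethernet’s preamble, start delimiter, and inter-frame gap.

```python
# Per-packet time budget on one core (back-of-the-envelope arithmetic).

def budget_ns(pps):
    """Nanoseconds available per packet at `pps` packets per second."""
    return 1e9 / pps


def budget_cycles(pps, clock_hz=3e9):
    """CPU cycles per packet, assuming a 3 GHz core unless told otherwise."""
    return clock_hz / pps


def line_rate_pps(link_bps, frame_bytes):
    """Maximum packets/s on an Ethernet link; each frame costs 20 extra bytes
    on the wire (7-byte preamble, 1-byte start delimiter, 12-byte gap)."""
    return link_bps / ((frame_bytes + 20) * 8)


for label, pps in [("generic ~1 Mpps", 1e6), ("native ~10 Mpps", 10e6),
                   ("native ~20 Mpps", 20e6)]:
    print(f"{label}: {budget_ns(pps):.0f} ns, {budget_cycles(pps):.0f} cycles per packet")

# 64-byte frames at 10 Gb/s line rate
print(f"{line_rate_pps(10e9, 64) / 1e6:.2f} Mpps")  # 14.88 Mpps
```

At 10 Mpps a core has about 100 ns, or roughly 300 cycles at 3 GHz, for each packet, so a single trip to main memory can consume a large share of the budget.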
# Attach XDP program with iproute2
ip link set dev eth0 xdp obj xdp_drop.o sec xdp

# Use SKB mode if driver lacks native XDP support
ip link set dev eth0 xdpgeneric obj xdp_drop.o sec xdp

# Detach
ip link set dev eth0 xdp off

# List attached XDP programs
bpftool net list

XDP return codes determine packet fate:

Return Code | Action
XDP_PASS | Pass packet to the normal network stack
XDP_DROP | Drop packet immediately (no further processing)
XDP_TX | Transmit packet back out the same interface
XDP_REDIRECT | Redirect to another interface or CPU
XDP_ABORTED | Drop and raise a tracepoint exception
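The decision logic of a simple XDP filter can be prototyped in user space before it is written in restricted C. The Python sketch below parses Ethernet and IPv4 headers from raw bytes, performs the explicit length checks the verifier would demand before each header access, and returns a verdict. The action values are those of the kernel’s enum xdp_action; the blocklisted address and the make_frame() test helper are illustrative assumptions, and IPv4 options and VLAN tags are ignored for brevity.

```python
import ipaddress
import struct

# Values of enum xdp_action in <linux/bpf.h>
XDP_ABORTED, XDP_DROP, XDP_PASS, XDP_TX, XDP_REDIRECT = range(5)

ETH_HLEN = 14        # destination MAC + source MAC + EtherType
IPV4_MIN_HLEN = 20   # IPv4 header without options
ETH_P_IP = 0x0800

BLOCKLIST = {ipaddress.IPv4Address("203.0.113.7")}  # hypothetical blocked source


def xdp_filter(frame: bytes) -> int:
    """Return an XDP verdict for one Ethernet frame."""
    if len(frame) < ETH_HLEN:  # bounds check before reading the Ethernet header
        return XDP_DROP
    (ethertype,) = struct.unpack_from("!H", frame, 12)
    if ethertype != ETH_P_IP:
        return XDP_PASS  # not IPv4: leave it to the network stack
    if len(frame) < ETH_HLEN + IPV4_MIN_HLEN:  # bounds check before the IPv4 header
        return XDP_DROP
    src = ipaddress.IPv4Address(frame[ETH_HLEN + 12:ETH_HLEN + 16])  # source address
    return XDP_DROP if src in BLOCKLIST else XDP_PASS


def make_frame(src_ip: str, dst_ip: str = "198.51.100.1") -> bytes:
    """Build a minimal Ethernet + IPv4 header (checksum left as zero)."""
    eth = b"\xff" * 6 + b"\x02" + b"\x00" * 5 + struct.pack("!H", ETH_P_IP)
    ip = struct.pack("!BBHHHBBH4s4s", 0x45, 0, IPV4_MIN_HLEN, 0, 0, 64, 6, 0,
                     ipaddress.IPv4Address(src_ip).packed,
                     ipaddress.IPv4Address(dst_ip).packed)
    return eth + ip


print(xdp_filter(make_frame("203.0.113.7")) == XDP_DROP)  # True
print(xdp_filter(make_frame("192.0.2.10")) == XDP_PASS)   # True
```

Dropping truncated frames rather than passing them is a design choice of this sketch; a real filter might prefer XDP_PASS and let the stack discard them.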

TC and TCX

The Traffic Control (TC) hook runs after SKB allocation, so it has access to richer metadata than XDP. It supports both ingress and egress directions.

TCX (merged in Linux 6.6) modernizes TC attachment with a BPF link-based API:

# Traditional TC attachment (cls_bpf in direct-action mode)
tc qdisc add dev eth0 clsact
tc filter add dev eth0 ingress bpf da obj my_prog.o sec tc/ingress

# Modern TCX attachment (kernel 6.6+, recent bpftool; my_prog already loaded)
bpftool net attach tcx_ingress name my_prog dev eth0

TCX advantages over cls_bpf:

  • BPF link support: unified lifecycle management via file descriptors
  • Explicit ordering: BPF_F_BEFORE and BPF_F_AFTER for program ordering
  • Atomic replacement: BPF_F_REPLACE with expected_revision for race-free updates
  • Simplified return codes: TCX_NEXT, TCX_PASS, TCX_DROP, TCX_REDIRECT
flowchart LR
    subgraph Ingress["Ingress Path"]
        NIC["NIC"] --> XDP["XDP Hook"]
        XDP -->|"PASS"| TC_ING["TCX Ingress<br/>(ordered chain)"]
        TC_ING --> STACK["Network Stack"]
    end
    subgraph Egress["Egress Path"]
        STACK2["Network Stack"] --> TC_EG["TCX Egress<br/>(ordered chain)"]
        TC_EG --> NIC2["NIC"]
    end
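The ordering and return-code rules can be modeled in a few lines. The Python sketch below is a simplified model, not kernel code: programs live in an ordered list, attach() can insert before or after a named program (the role of BPF_F_BEFORE and BPF_F_AFTER), a revision counter guards against acting on a stale view (the role of expected_revision), and the chain runs until a program returns something other than TCX_NEXT. The return-code values are the kernel’s; when every program returns TCX_NEXT the kernel falls through to the next processing stage, which this model simplifies to TCX_PASS.

```python
# Simplified model of a TCX program chain (not kernel code).
TCX_NEXT, TCX_PASS, TCX_DROP, TCX_REDIRECT = -1, 0, 2, 7  # values from <linux/bpf.h>


class TcxChain:
    def __init__(self):
        self.progs = []    # ordered (name, program) pairs
        self.revision = 1  # bumped on every change to the chain

    def attach(self, name, prog, before=None, after=None, expected_revision=None):
        # Like expected_revision: refuse to act on a stale view of the chain.
        if expected_revision is not None and expected_revision != self.revision:
            raise RuntimeError("chain changed since it was read; re-read and retry")
        names = [n for n, _ in self.progs]
        if before is not None:      # role of BPF_F_BEFORE
            idx = names.index(before)
        elif after is not None:     # role of BPF_F_AFTER
            idx = names.index(after) + 1
        else:                       # no ordering constraint: append
            idx = len(self.progs)
        self.progs.insert(idx, (name, prog))
        self.revision += 1

    def run(self, pkt):
        for _, prog in self.progs:
            verdict = prog(pkt)
            if verdict != TCX_NEXT:
                return verdict      # first definitive verdict ends the chain
        return TCX_PASS             # model's simplification when all say NEXT


chain = TcxChain()
chain.attach("metrics", lambda pkt: TCX_NEXT)
chain.attach("firewall", lambda pkt: TCX_DROP if pkt["dport"] == 23 else TCX_NEXT)
chain.attach("policy", lambda pkt: TCX_NEXT, before="firewall",
             expected_revision=chain.revision)
print([name for name, _ in chain.progs])                    # ['metrics', 'policy', 'firewall']
print(chain.run({"dport": 23}), chain.run({"dport": 443}))  # 2 0
```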

NetKit (Container Networking)

NetKit (kernel 6.7+) is a virtual network device pair designed specifically for eBPF-based container networking and intended as a replacement for veth pairs; Cilium can use it as its datapath device. Each NetKit pair has a primary side (host) and a peer side (container/namespace).

SEC("netkit/primary")
int netkit_primary(struct __sk_buff *skb) {
    // Host-side processing
    return TCX_PASS;
}

SEC("netkit/peer")
int netkit_peer(struct __sk_buff *skb) {
    // Container-side processing
    return TCX_PASS;
}

NetKit avoids the per-CPU backlog queue and softirq processing that veth incurs when a packet crosses from one end of the pair to the other: the eBPF program runs directly in the device’s transmit path, reducing latency by 30-50% compared to veth-based forwarding.

eBPF Maps

Maps are the primary data-sharing mechanism between eBPF programs and userspace. The kernel maintains them; eBPF programs read and write them via helper functions, and userspace accesses them via the bpf() syscall or bpftool.

Map Type | Key | Value | Use Case
BPF_MAP_TYPE_HASH | Any | Any | General key-value, connection tracking
BPF_MAP_TYPE_ARRAY | u32 index | Any | Fixed-size counters, statistics
BPF_MAP_TYPE_PERCPU_HASH | Any | Any (one copy per CPU) | Per-CPU hash for lockless updates
BPF_MAP_TYPE_PERCPU_ARRAY | u32 index | Any (one copy per CPU) | Per-CPU counters (fast, no locking)
BPF_MAP_TYPE_RINGBUF | None | Variable-size records | Event streaming to userspace
BPF_MAP_TYPE_STACK | None | Any | LIFO stack for temporary storage
BPF_MAP_TYPE_SOCKMAP | u32 index | Socket (struct sock *) | Socket redirection (load balancing)
BPF_MAP_TYPE_SOCKHASH | Any | Socket (struct sock *) | Socket lookup by custom key
BPF_MAP_TYPE_DEVMAP | u32 index | Network device (ifindex) | XDP device redirection
BPF_MAP_TYPE_CPUMAP | u32 CPU index | CPU entry (queue size) | XDP packet steering to CPUs
BPF_MAP_TYPE_ARENA | None (mmap-able pages) | N/A | Large shared memory regions (K 6.9+)
// Define a percpu array map for lock-free counting
struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 256);
    __type(key, __u32);
    __type(value, __u64);
} packet_count_percpu SEC(".maps");

// Per-packet increment (no synchronization needed)
static __always_inline void increment_counter(__u32 key) {
    __u64 *val = bpf_map_lookup_elem(&packet_count_percpu, &key);
    if (val)
        *val += 1;
}
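The “no synchronization needed” comment holds because each CPU increments its own copy of the value; the total only exists once userspace reads all copies and adds them up, which is also why bpftool dumps per-CPU maps with one value per CPU. A toy Python model of that contract; the CPU count, the key, and the round-robin spreading of packets are arbitrary assumptions.

```python
# Toy model of a BPF_MAP_TYPE_PERCPU_ARRAY: one independent value per CPU per key.
NUM_CPUS = 4
MAX_ENTRIES = 256

# percpu[key][cpu] -> counter; each CPU only ever writes its own column
percpu = [[0] * NUM_CPUS for _ in range(MAX_ENTRIES)]


def increment_counter(key, cpu):
    """What the BPF program does: a plain += on this CPU's private slot."""
    if 0 <= key < MAX_ENTRIES:  # mirrors the NULL check after bpf_map_lookup_elem()
        percpu[key][cpu] += 1


def read_total(key):
    """What userspace does: a lookup returns every CPU's value, which it sums."""
    return sum(percpu[key])


# Spread 1,000 packets for key 6 (e.g. IPPROTO_TCP) across CPUs round-robin
for i in range(1000):
    increment_counter(6, cpu=i % NUM_CPUS)

print(percpu[6])      # [250, 250, 250, 250]
print(read_total(6))  # 1000
```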
# List all loaded maps
bpftool map list

# Dump all entries of a map
bpftool map dump name packet_count

# Look up key 0 (a __u32 key is 4 bytes in host byte order: little-endian on x86)
bpftool map lookup name packet_count key 0 0 0 0

# Set key 0 to 42 (a __u64 value is 8 bytes, also little-endian)
bpftool map update name packet_count key 0 0 0 0 value 42 0 0 0 0 0 0 0
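bpftool takes keys and values as raw bytes in the host’s byte order, which is easy to get wrong by hand. A small Python helper can generate the byte lists; it is a convenience sketch, and the __u32 key and __u64 value sizes are assumed to match the map definitions above.

```python
import struct
import sys


def bpftool_bytes(value, size):
    """Format an unsigned integer as the 'hex ..' byte list bpftool expects.

    struct's '=' prefix uses the host byte order, which is how the kernel
    stores the integer in the map.
    """
    fmt = {4: "I", 8: "Q"}[size]  # __u32 or __u64
    raw = struct.pack("=" + fmt, value)
    return "hex " + " ".join(f"{b:02x}" for b in raw)


print("host byte order:", sys.byteorder)
key = bpftool_bytes(0, 4)   # __u32 key 0
val = bpftool_bytes(42, 8)  # __u64 value 42
print(f"bpftool map update name packet_count key {key} value {val}")
```

On an x86-64 host the last line prints key hex 00 00 00 00 value hex 2a 00 00 00 00 00 00 00.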

eBPF for Cloud-Native Networking

Cilium: eBPF-Based CNI

Cilium is the most widely deployed eBPF-based networking solution for Kubernetes. It can replace kube-proxy entirely, implementing service load balancing in eBPF instead of iptables rules:

# Cilium NetworkPolicy — L3/L4 policy
apiVersion: "cilium.io/v2"
kind: CiliumNetworkPolicy
metadata:
  name: "web-backend-policy"
spec:
  endpointSelector:
    matchLabels:
      role: backend
  ingress:
  - fromEndpoints:
    - matchLabels:
        role: frontend
    toPorts:
    - ports:
      - port: "80"
        protocol: TCP
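In this policy, once an endpoint is selected by a policy with ingress rules, ingress to it becomes default-deny and only the listed peer and port combinations are allowed. A minimal Python model of that evaluation, matching only labels, port, and protocol; it deliberately ignores everything else Cilium supports (CIDR rules, entities, L7 rules, deny policies, identity allocation), and it uses integer ports where the YAML uses strings.

```python
# Minimal model of evaluating the L3/L4 CiliumNetworkPolicy above (illustrative).
POLICY = {
    "endpointSelector": {"role": "backend"},
    "ingress": [
        {"fromEndpoints": [{"role": "frontend"}],
         "toPorts": [{"port": 80, "protocol": "TCP"}]},
    ],
}


def matches(selector, labels):
    """matchLabels semantics: every key/value in the selector must be present."""
    return all(labels.get(k) == v for k, v in selector.items())


def ingress_allowed(policy, src_labels, dst_labels, port, protocol):
    if not matches(policy["endpointSelector"], dst_labels):
        return True  # destination not selected: this policy does not restrict it
    for rule in policy["ingress"]:  # selected: default deny, look for an allow rule
        peer_ok = any(matches(sel, src_labels) for sel in rule["fromEndpoints"])
        port_ok = any(p["port"] == port and p["protocol"] == protocol
                      for p in rule["toPorts"])
        if peer_ok and port_ok:
            return True
    return False


backend = {"role": "backend", "app": "api"}
print(ingress_allowed(POLICY, {"role": "frontend"}, backend, 80, "TCP"))   # True
print(ingress_allowed(POLICY, {"role": "frontend"}, backend, 443, "TCP"))  # False
print(ingress_allowed(POLICY, {"role": "batch"}, backend, 80, "TCP"))      # False
```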

Cilium’s eBPF datapath provides:

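  • kube-proxy replacement: Services are load-balanced in eBPF, translating service addresses at connect() time via cgroup socket hooks (BPF_PROG_TYPE_CGROUP_SOCK_ADDR) and in tc/XDP programs for traffic arriving from outside the node, achieving 10x faster connection setup and 10x lower CPU usage vs iptables
  • Transparent encryption: WireGuard (or IPsec) tunnels between nodes, with a key pair per node and the eBPF datapath steering pod traffic into the tunnel
  • Bandwidth Manager: eBPF-based EDT (Earliest Departure Time) rate limiting of pod egress traffic, optionally combined with BBR congestion control, reducing tail latency by 40% in large clusters
  • DNS-based policy: eBPF redirects DNS traffic to Cilium’s DNS proxy, which learns name-to-IP mappings so that network policy can be written against DNS names rather than IPs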
# Verify kube-proxy replacement is active
cilium status | grep "KubeProxyReplacement"

# Example output (exact format varies by Cilium version):
# KubeProxyReplacement:  Enabled   [eth0, eth1]

Service Mesh Without Sidecars

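Traditional service meshes inject a sidecar proxy (Envoy in Istio, linkerd2-proxy in Linkerd) into every pod. Cilium’s service mesh avoids the sidecar tax: L3/L4 processing happens in eBPF, and only traffic that needs L7 processing is redirected to a shared Envoy proxy that runs once per node: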

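# CiliumEnvoyConfig: L7 processing in the per-node Envoy, no sidecar
apiVersion: cilium.io/v2
kind: CiliumEnvoyConfig
metadata:
  name: http-policy
spec:
  services:            # traffic to this Service is redirected to Envoy
  - name: my-service
    namespace: default
  backendServices:     # Services whose endpoints Envoy may route to
  - name: my-service
    namespace: default
  # resources: Envoy Listener/RouteConfiguration definitions go here (omitted)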

Compared to sidecar-based meshes, the eBPF approach reduces:

  • Memory overhead: ~50MB per pod (sidecar) → ~10MB per node (eBPF)
  • Added latency: ~2-5ms (sidecar) → ~50μs (eBPF)

Meta’s Katran: eBPF Load Balancer

Katran is Meta’s production eBPF-based Layer 4 load balancer, processing millions of packets per second per host. It runs in XDP, picks a backend with Maglev-style consistent hashing, encapsulates the packet in IP-in-IP, and sends it back out the same NIC with XDP_TX:

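// Simplified Katran-style XDP load balancing (sketch: get_vip_meta(),
// get_flow_hash() and encap_ipip() are hypothetical stand-ins for Katran's
// header parsing and encapsulation code)
SEC("xdp")
int xdp_lb(struct xdp_md *ctx) {
    struct vip_meta *meta;
    struct real_definition *real;
    __u32 *real_idx;
    __u32 hash, key;

    meta = get_vip_meta(ctx);            // parse headers, look up the VIP
    if (!meta) return XDP_PASS;          // not load-balanced traffic

    // Maglev-style consistent hashing: the flow hash picks a slot in this VIP's ring
    hash = get_flow_hash(ctx);
    key = meta->vip_num * RING_SIZE + hash % RING_SIZE;
    real_idx = bpf_map_lookup_elem(&ch_rings, &key);
    if (!real_idx) return XDP_PASS;

    real = bpf_map_lookup_elem(&reals, real_idx);
    if (!real) return XDP_PASS;

    // Encapsulate in IP-in-IP toward the backend, then bounce out of the same NIC
    if (encap_ipip(ctx, real))
        return XDP_DROP;
    return XDP_TX;
}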
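Katran’s consistent hashing is based on Google’s Maglev algorithm. The Python sketch below builds a Maglev lookup table: each backend derives a permutation of the table’s slots from two hashes, and backends take turns claiming their next preferred free slot until the table is full, so every backend ends up with nearly the same number of slots. The table size of 13 (Maglev needs a prime size), the backend addresses, and SHA-256 as the hash function are assumptions for the example; Katran uses a much larger ring and faster hashes.

```python
import hashlib


def _h(data: str, salt: str) -> int:
    """Stable 64-bit hash (truncated SHA-256); real implementations use faster hashes."""
    return int.from_bytes(hashlib.sha256((salt + data).encode()).digest()[:8], "big")


def maglev_table(backends, m=13):
    """Build a Maglev lookup table of prime size m mapping slot -> backend."""
    # Each backend's preference list is the permutation (offset + j * skip) mod m.
    prefs = []
    for b in backends:
        offset = _h(b, "offset") % m
        skip = _h(b, "skip") % (m - 1) + 1  # in [1, m-1]; m prime => full permutation
        prefs.append((offset, skip))

    table = [None] * m
    next_j = [0] * len(backends)
    filled = 0
    while filled < m:
        for i, (offset, skip) in enumerate(prefs):  # one slot per backend per round
            while True:  # walk backend i's preferences until a free slot turns up
                slot = (offset + next_j[i] * skip) % m
                next_j[i] += 1
                if table[slot] is None:
                    break
            table[slot] = backends[i]
            filled += 1
            if filled == m:
                break
    return table


def pick_backend(table, flow):
    """Per-packet lookup: hash the flow identifier, index the table."""
    return table[_h(flow, "flow") % len(table)]


table = maglev_table(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
print(table)
print(pick_backend(table, "198.51.100.9:40000->192.0.2.1:443/tcp"))
```

Per packet, the data path only hashes the flow and indexes the table, which is the lookup the ch_rings map performs in the XDP program. Rebuilding the table after a backend change moves only some flows; Katran additionally keeps an LRU connection table so that established flows keep their backend.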

Bandwidth Manager

Cilium’s Bandwidth Manager uses eBPF to implement EDT (Earliest Departure Time) rate limiting. At egress, an eBPF program sets each packet’s departure timestamp (skb->tstamp) from the pod’s bandwidth limit, and the fq qdisc holds every packet until its timestamp is reached:

# Pod egress rate limit enforced via EDT (the Bandwidth Manager handles egress only)
apiVersion: v1
kind: Pod
metadata:
  name: rate-limited-app
  annotations:
    kubernetes.io/egress-bandwidth: "100M"
spec:
  containers:
  - name: app
    image: nginx

This avoids the head-of-line blocking problems of traditional rate limiters (HTB, TBF) and reduces TCP tail latency by up to 40% in multi-tenant clusters.
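The arithmetic behind EDT is compact: a packet of L bytes on a flow limited to R bits per second uses L × 8 / R seconds of the budget, so each packet’s departure time is the previous packet’s departure time plus that delay, or “now” if the flow has been idle long enough. The Python sketch below models per-pod stamping with a drop horizon for packets that would have to wait too long; the horizon value and structure are assumptions loosely modeled on how EDT rate limiters work, not Cilium’s exact code.

```python
# Sketch of EDT (Earliest Departure Time) stamping for one rate-limited pod.
NSEC_PER_SEC = 1_000_000_000


class EdtPacer:
    def __init__(self, rate_bps, horizon_ns=2 * NSEC_PER_SEC):
        self.rate_bytes_per_s = rate_bps / 8  # e.g. "100M" annotation -> 12.5 MB/s
        self.horizon_ns = horizon_ns          # drop rather than schedule this far ahead
        self.t_last = 0                       # departure time given to the last packet

    def stamp(self, length, now_ns):
        """Return the packet's departure time in ns, or None if it should be dropped."""
        delay = int(length * NSEC_PER_SEC / self.rate_bytes_per_s)
        t_next = self.t_last + delay
        if t_next <= now_ns:      # flow is under its rate: send immediately
            self.t_last = now_ns
            return now_ns
        if t_next - now_ns >= self.horizon_ns:
            return None           # would wait too long: drop
        self.t_last = t_next      # the fq qdisc holds the packet until t_next
        return t_next


now = 1_000_000_000
pacer = EdtPacer(rate_bps=100_000_000)
offsets = [pacer.stamp(1500, now) - now for _ in range(3)]
print(offsets)  # [0, 120000, 240000]: 1500-byte packets spaced 120 us apart

tight = EdtPacer(rate_bps=100_000_000, horizon_ns=200_000)
dropped = [tight.stamp(1500, now) for _ in range(3)]
print(dropped)  # [1000000000, 1000120000, None]
```

Because each packet simply carries a timestamp, no per-pod queue or lock is needed in the qdisc hierarchy; the single fq qdisc orders departures.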

eBPF for Observability

DeepFlow

DeepFlow uses eBPF for zero-code distributed tracing — it captures function calls and network events without application modification:

helm repo add deepflow https://deepflowio.github.io/deepflow
helm install deepflow -n deepflow deepflow/deepflow --create-namespace \
  --set agent.ebpf.enabled=true

Polar Signals reduced Kubernetes network traffic costs by 50% using eBPF-based monitoring. Datadog improved network observability while decreasing CPU usage by 35%.

bpftrace: High-Level Tracing

bpftrace provides a one-liner syntax for eBPF tracing, similar to awk for the kernel:

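# Count outbound TCP connection attempts by process name
bpftrace -e 'kprobe:tcp_connect { @[comm] = count(); }'

# Count failed getaddrinfo() name lookups by hostname (glibc resolver only;
# the libc path is the Debian/Ubuntu x86-64 location and varies by distro)
bpftrace -e '
  uprobe:/lib/x86_64-linux-gnu/libc.so.6:getaddrinfo { @host[tid] = str(arg0); }
  uretprobe:/lib/x86_64-linux-gnu/libc.so.6:getaddrinfo {
    if (retval != 0) { @failed[@host[tid]] = count(); }
    delete(@host[tid]);
  }
'

# Time from the last XDP redirect in a NAPI poll to the end-of-poll flush (ns)
bpftrace -e '
  kprobe:xdp_do_redirect { @start[cpu] = nsecs; }
  kprobe:xdp_do_flush /@start[cpu]/ {
    @latency_ns = hist(nsecs - @start[cpu]);
    delete(@start[cpu]);
  }
'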

# Block I/O latency distribution
bpftrace -e 'kprobe:blk_account_io_start { @start[arg0] = nsecs; }
             kprobe:blk_account_io_done /@start[arg0]/ {
               @us = hist((nsecs - @start[arg0]) / 1000);
               delete(@start[arg0]);
             }'
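The hist() calls above aggregate samples into power-of-two buckets inside the kernel, and bpftrace prints only the bucket counts. A Python sketch of the same log2 bucketing, handy when post-processing latency samples yourself; the labels imitate bpftrace’s output style, but the functions are not part of bpftrace, and samples are assumed to be non-negative integers.

```python
# log2 histogram in the style of bpftrace's hist(): [0], [1], [2, 4), [4, 8), ...
# Assumes non-negative integer samples.

def log2_bucket(value):
    """-1 for the [0] bucket, otherwise floor(log2(value))."""
    return value.bit_length() - 1 if value > 0 else -1


def hist(values):
    """Count samples per bucket, ordered by bucket."""
    counts = {}
    for v in values:
        b = log2_bucket(int(v))
        counts[b] = counts.get(b, 0) + 1
    return dict(sorted(counts.items()))


def label(bucket):
    if bucket < 0:
        return "[0]"
    if bucket == 0:
        return "[1]"
    lo = 1 << bucket
    return f"[{lo}, {lo * 2})"


samples_us = [0, 1, 3, 3, 5, 7, 12, 40, 41, 900]
for b, n in hist(samples_us).items():
    print(f"{label(b):>12} {n:4d} |{'@' * n}")
```

Power-of-two buckets keep the kernel-side map tiny (one counter per bucket) while still showing latency outliers spanning several orders of magnitude.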

Practical Debugging with bpftool

bpftool is the primary administration tool for inspecting and managing eBPF programs and maps.

# Show kernel eBPF support
bpftool feature

# List all loaded programs
bpftool prog list

# Show details of a specific program
bpftool prog show id 42
bpftool prog show name xdp_count_packets

# Show programs attached to a network interface
bpftool net list

# Show all maps
bpftool map list

# Dump a map's contents
bpftool map dump name packet_count

# Show pinned programs and maps
find /sys/fs/bpf -type f
bpftool prog list --bpffs    # adds each program's pinned paths

# Attach an already-loaded XDP program by name
bpftool net attach xdp name drop_bad dev eth0

# Detach program
bpftool net detach xdp dev eth0

# Stream bpf_printk() output from the kernel trace pipe
bpftool prog tracelog

Developing a Complete eBPF Program

Setup:

# Debian package names; on Ubuntu, bpftool comes from linux-tools-common / linux-tools-$(uname -r)
sudo apt-get install clang llvm libbpf-dev bpftool

# Verify eBPF support
bpftool feature | grep -i "xdp\|tc\|sk_skb"

Write an XDP packet counter:

// xdp_counter.c: minimal XDP packet counter (reads no kernel structs, so no CO-RE relocations needed)
#include <linux/bpf.h>
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_endian.h>

char LICENSE[] SEC("license") = "GPL";

struct {
    __uint(type, BPF_MAP_TYPE_PERCPU_ARRAY);
    __uint(max_entries, 1);
    __type(key, __u32);
    __type(value, __u64);
} pkt_count SEC(".maps");

SEC("xdp")
int xdp_count(struct xdp_md *ctx) {
    __u32 key = 0;
    __u64 *count = bpf_map_lookup_elem(&pkt_count, &key);
    if (count)
        *count += 1;  // per-CPU slot: no atomic operation needed
    return XDP_PASS;
}

Compile and load:

# Compile to BPF bytecode (the -I path lets Debian/Ubuntu find <asm/types.h>)
clang -O2 -g -target bpf -I/usr/include/$(uname -m)-linux-gnu -c xdp_counter.c -o xdp_counter.bpf.o

# Load via iproute2
ip link set dev eth0 xdp obj xdp_counter.bpf.o sec xdp

# Verify
bpftool prog list | grep -A3 xdp
bpftool map dump name pkt_count

# Detach when done
ip link set dev eth0 xdp off

Go eBPF with cilium/ebpf

The cilium/ebpf Go library provides idiomatic Go bindings for loading and managing eBPF programs:

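package main

import (
    "log"
    "net"
    "time"

    "github.com/cilium/ebpf/link"
)

// bpf2go compiles xdp_counter.c and generates xdpCounterObjects and
// loadXdpCounterObjects (a lower-case identifier yields unexported names).
//go:generate go run github.com/cilium/ebpf/cmd/bpf2go -cc clang xdpCounter ./xdp_counter.c

func main() {
    iface, err := net.InterfaceByName("eth0")
    if err != nil {
        log.Fatalf("looking up interface: %v", err)
    }

    objs := xdpCounterObjects{}
    if err := loadXdpCounterObjects(&objs, nil); err != nil {
        log.Fatalf("loading eBPF objects: %v", err)
    }
    defer objs.Close()

    // XDPOptions.Interface is an interface index, not a name.
    l, err := link.AttachXDP(link.XDPOptions{
        Program:   objs.XdpCount,
        Interface: iface.Index,
    })
    if err != nil {
        log.Fatalf("attaching XDP: %v", err)
    }
    defer l.Close()

    // pkt_count is a per-CPU array: one lookup returns a value for every CPU.
    for range time.Tick(time.Second) {
        var perCPU []uint64
        if err := objs.PktCount.Lookup(uint32(0), &perCPU); err != nil {
            log.Fatalf("reading map: %v", err)
        }
        var total uint64
        for _, v := range perCPU {
            total += v
        }
        log.Printf("packets seen on %s: %d", iface.Name, total)
    }
}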

eBPF vs Traditional Networking

Performance Comparison

Scenario | iptables | eBPF (Cilium) | Improvement
Service connection setup | ~100μs | ~10μs | 10x faster
Rule scale | ~10K rules | ~1M rules | 100x scale
Rule update time | seconds | milliseconds | 1000x faster
Memory overhead | ~100MB | ~10MB | 10x reduction
Packets/core (XDP) | ~1M (iptables) | ~10M+ (XDP) | 10x throughput
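Part of the gap comes from data structures rather than instruction speed. In kube-proxy’s iptables mode, a new connection is matched against service rules one after another, so the cost grows with the number of Services, whereas Cilium keeps services in BPF hash maps whose lookup cost stays roughly constant. A simplified Python model that counts comparisons (one rule per service, hash collisions ignored, virtual IPs made up):

```python
# Comparisons needed to find a service: linear rule list vs hash map (simplified).

def build_services(n):
    """n services as ((virtual IP, port), backend-set id) pairs."""
    return [((f"10.96.{i // 256}.{i % 256}", 80), i) for i in range(n)]


def iptables_style_lookup(rules, key):
    """Walk the rule list in order, like a chain of '-d VIP --dport PORT' rules."""
    for comparisons, (match, target) in enumerate(rules, start=1):
        if match == key:
            return target, comparisons
    return None, len(rules)


def hashmap_lookup(table, key):
    """One hashed probe, like bpf_map_lookup_elem() on a BPF_MAP_TYPE_HASH."""
    return table.get(key), 1


for n in (100, 10_000):
    rules = build_services(n)
    table = dict(rules)
    worst = rules[-1][0]  # the last service in the chain
    _, linear = iptables_style_lookup(rules, worst)
    _, hashed = hashmap_lookup(table, worst)
    print(f"{n:>6} services: linear={linear} comparisons, hash={hashed} probe")
```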

Key Advantages

  1. Safety: the verifier rejects any program it cannot prove safe, so a buggy eBPF program fails to load instead of crashing the kernel
  2. Performance: JIT-compiled programs run at near-native speed in kernel context
  3. Live update: programs attach and detach without rebooting or restarting services
  4. Programmability: kernel behavior at any supported hook point can be extended without kernel changes

Security Considerations

eBPF Security Model

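# Check whether this process can load networking eBPF programs (Linux)
CAP_NET_ADMIN = 12
CAP_SYS_ADMIN = 21
CAP_BPF = 39  # split out of CAP_SYS_ADMIN in Linux 5.8

def effective_caps():
    """Return this process's effective capability bitmask."""
    with open('/proc/self/status') as f:
        for line in f:
            if line.startswith('CapEff:'):
                return int(line.split()[1], 16)
    return 0

def check_bpf_capabilities():
    caps = effective_caps()
    has = {c: bool(caps & (1 << c)) for c in (CAP_NET_ADMIN, CAP_SYS_ADMIN, CAP_BPF)}
    if not (has[CAP_SYS_ADMIN] or (has[CAP_BPF] and has[CAP_NET_ADMIN])):
        print("eBPF networking requires CAP_BPF+CAP_NET_ADMIN (or CAP_SYS_ADMIN)")
        return False
    try:
        with open('/proc/sys/kernel/bpf_stats_enabled') as f:
            if f.read().strip() != '1':
                print("Enable kernel.bpf_stats_enabled to collect run_time_ns/run_cnt")
    except FileNotFoundError:
        print("BPF stats not available on this kernel")
    return True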
Protection | Mechanism
Verifier | Static analysis: no unbounded loops, no out-of-bounds access, no null pointer dereference
Capabilities | CAP_BPF (Linux 5.8+) plus CAP_NET_ADMIN for networking programs, or CAP_SYS_ADMIN
BPF Token | Kernel 6.9+: delegable token for fine-grained permission control
Memory accounting | Charged to the memory cgroup (Linux 5.11+); older kernels enforce RLIMIT_MEMLOCK
Program size | BPF_MAXINSNS (4096 instructions) for unprivileged programs; up to 1M verified instructions for privileged programs since Linux 5.2

Best Practices

  • Avoid running loaders as full root: grant CAP_BPF plus CAP_NET_ADMIN (Linux 5.8+) or delegate a BPF token, and keep unprivileged BPF disabled (kernel.unprivileged_bpf_disabled), as most distributions do by default
  • Verify the integrity of deployed BPF object files (checksums or signatures)
  • Monitor program execution time: with kernel.bpf_stats_enabled=1, bpftool prog show reports run_time_ns and run_cnt
  • Pin programs and maps to the BPF filesystem (/sys/fs/bpf) for lifecycle management
  • Use BPF CO-RE for portability instead of BCC’s runtime compilation

The Future of eBPF

Kernel and Ecosystem Developments

Feature | Status | Impact
BPF Token (K 6.9) | Merged | Fine-grained capability delegation without full root
BPF Arena (K 6.9) | Merged | Large shared memory regions between BPF and userspace
sched_ext (K 6.12) | Merged | Pluggable CPU schedulers implemented in BPF
TCX (K 6.6) | Merged | Modern TC attachment with link API and explicit ordering
NetKit (K 6.7) | Merged | eBPF-native virtual Ethernet for container networking
BPF ISA standard (RFC 9669) | Published | IETF specification of the existing BPF instruction set
Rust support | Active | aya framework for writing BPF programs in Rust (redbpf is no longer maintained)
Windows eBPF | Preview | Microsoft’s eBPF for Windows, enabling cross-platform tooling
SmartNIC offload | Production | NVIDIA BlueField, Intel IPU running eBPF programs at line rate

Industry Adoption

  • eBPF Foundation (2025 year in review): Meta, ByteDance, Alibaba Cloud, Datadog, Ant Group reported significant production benefits
  • Cloud providers: AWS (EKS network policy via the VPC CNI’s eBPF agent), GCP (GKE Dataplane V2), Azure (Azure CNI Powered by Cilium on AKS) all offer eBPF-based networking
  • Finance: High-frequency trading with microsecond TCP connection setup via eBPF
  • Telecom: 5G core networks using eBPF for SRv6 and packet processing
  • Edge: Lightweight eBPF on resource-constrained devices for observability

Conclusion

eBPF has transformed from a niche packet filter into the foundation of modern cloud-native infrastructure. Its ability to safely extend kernel behavior without modification unlocks performance (10x packet throughput), visibility (zero-code tracing), and security (kernel-level policy) that traditional approaches cannot match.

Key takeaways:

  • XDP for earliest packet processing (DDoS, load balancing) — up to 20 Mpps per core
  • TCX for composable traffic control with link-based lifecycle management
  • NetKit for eBPF-native container networking — 30-50% lower latency than veth
  • Cilium for Kubernetes networking, replacing kube-proxy with 10x better performance
  • CO-RE + BTF for portable eBPF programs that work across kernel versions
  • bpftool and bpftrace for debugging and observability without custom tooling
