Linux System Monitoring Complete Guide 2026

Introduction

Monitoring is essential for maintaining healthy Linux systems. This guide covers monitoring tools and techniques — from quick command-line checks with top and htop to full production stacks with Prometheus and Grafana.

Command Line Tools

Essential Commands

# System resource usage
top              # Interactive process viewer
htop             # Enhanced top (colorful)
btop             # Modern top (graphs, mouse support)
atop             # Advanced top with persistent logging
glances          # Cross-platform monitoring

# CPU
mpstat -P ALL 1  # Per-CPU stats
sar -u 1         # CPU utilization

# Memory
free -h          # Memory usage
vmstat 1         # Virtual memory stats

# Disk
df -h            # Disk usage
iostat -x 1      # I/O statistics
du -sh *         # Directory sizes

# Network
iftop            # Network bandwidth per connection
nload            # Network bandwidth per interface
nethogs          # Bandwidth per process
ss -tuln         # Listening ports

top, htop, and btop Comparison

All three tools show running processes, but they differ in capability:

Feature	top	htop	btop
Default install	Yes (pre-installed)	No (`apt install htop`)	No (`apt install btop`)
Interface	Minimal, monochrome (color via `z`)	Colored, scrollable	Full-screen TUI with graphs
Mouse support	No	Yes	Yes
Tree view	Yes (`V`, forest view)	Yes (F5)	Yes (`e`)
Sorting	Interactive (Shift+F, or `P`/`M` for CPU/memory)	F6 or click column headers	Click column headers
Per-process IO	No	Yes	Yes
CPU frequency	No	Yes (configurable)	Yes (real-time graph)
Network graphs	No	No	Built-in
Kill processes	`k`	F9 or `k`	`k` (or click + confirm)
Config file	`~/.toprc` or `~/.config/procps/toprc` (saved with `W`)	`~/.config/htop/htoprc`	`~/.config/btop/btop.conf`

For quick diagnostics on any server, top suffices. For daily interactive use, htop is the sweet spot. For monitoring dashboards in a terminal, btop provides the richest visual experience.

htop Customization

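# Install htop
sudo apt install htop

# ~/.config/htop/htoprc uses plain key=value lines (it is not YAML).
# htop rewrites this file when you change settings in the F2 Setup screen,
# so edit it by hand only while htop is not running.
show_cpu_usage=1
show_cpu_frequency=1
show_cpu_temperature=1   # needs an htop build with lm-sensors support
detailed_cpu_time=1
# Columns: PID USER PRI NI VIRT RES S CPU% MEM% TIME+ Command
fields=0 48 17 18 38 39 2 46 47 49 1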

btop Quick Start

sudo apt install btop
btop

Press 1 through 4 to show or hide the CPU, memory, network, and process boxes, m (or Esc) to open the main menu, and ? (or F1) for the full key bindings list. btop reads from /proc and /sys, so no kernel modules are needed.

CPU Monitoring

mpstat Per-CPU Breakdown

# Show per-CPU utilization every 2 seconds
mpstat -P ALL 2

# Output (%guest and %gnice columns omitted):
# 02:30:01 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %idle
# 02:30:03 PM  all    12.5    0.0     3.0    0.0     0.0     0.0     0.0    84.5
# 02:30:03 PM    0    15.0    0.0     5.0    0.0     0.0     0.0     0.0    80.0
# 02:30:03 PM    1    10.0    0.0     1.0    0.0     0.0     0.0     0.0    89.0

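High %iowait means CPUs are idle while disk I/O is outstanding; find the process that is reading or writing heavily (iotop, below, shows this). High %sys indicates kernel activity (system calls, drivers). High %steal (in VMs) is time the virtual CPU was ready to run but the hypervisor was serving other guests, a sign that the host is oversubscribed.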
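mpstat derives these percentages from the cumulative per-CPU tick counters in /proc/stat: it samples them twice and divides each mode's increase by the total increase. The Python sketch below reproduces that calculation under simplifying assumptions (only the first eight counters are used, and guest time, which the kernel also counts inside user, is not subtracted), so its numbers can differ slightly from mpstat's.

```python
import time

# Order of the first eight counters on each "cpu"/"cpuN" line of /proc/stat
FIELDS = ["user", "nice", "system", "idle", "iowait", "irq", "softirq", "steal"]


def parse_cpu_lines(text):
    """Return {"cpu": [...], "cpu0": [...], ...} with tick counts per mode."""
    cpus = {}
    for line in text.splitlines():
        parts = line.split()
        if parts and parts[0].startswith("cpu"):
            cpus[parts[0]] = [int(v) for v in parts[1:1 + len(FIELDS)]]
    return cpus


def cpu_percentages(before, after):
    """Share of elapsed ticks spent in each mode between two samples."""
    result = {}
    for name, new in after.items():
        if name not in before:           # CPU came online between samples
            continue
        deltas = [n - o for n, o in zip(new, before[name])]
        total = sum(deltas) or 1         # guard against a zero-length interval
        result[name] = {f: 100.0 * d / total for f, d in zip(FIELDS, deltas)}
    return result


def sample_cpu(interval=2.0):
    """Read /proc/stat twice, `interval` seconds apart (Linux only)."""
    with open("/proc/stat") as f:
        first = parse_cpu_lines(f.read())
    time.sleep(interval)
    with open("/proc/stat") as f:
        second = parse_cpu_lines(f.read())
    return cpu_percentages(first, second)
```

For example, `sample_cpu()["cpu"]["iowait"]` gives the system-wide %iowait over a 2-second window, and the `"cpu0"`, `"cpu1"`, ... keys give the per-CPU rows that `mpstat -P ALL` prints.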

Memory Monitoring

free and vmstat

free -h
#               total   used   free   shared  buff/cache  available
# Mem:           31G    12G    8.2G    1.2G      11G        17G
# Swap:          2.0G   0.0G   2.0G

available is the key metric — it estimates how much memory is available for new processes without swapping. It includes free memory plus reclaimable cache. If available drops below 10% of total, consider adding RAM or reducing the workload.

vmstat -S M 1    # -S M: memory columns in MiB instead of the default KiB
# procs  -----------memory----------  ---swap--  ---io---  --system--  -------cpu-------
#  r  b   swpd   free   buff  cache    si   so    bi   bo    in   cs   us  sy  id  wa  st
#  2  0      0   8396   2150   9011     0    0    10   55   500  800   12   3  84   1   0

Columns to watch:

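r: runnable processes (running or waiting for a CPU). If r stays above the number of CPUs, processes are queuing for CPU time.
b: processes in uninterruptible sleep, usually blocked on I/O.
si/so: memory swapped in from and out to disk per second. Sustained non-zero values, especially for so, mean memory pressure.
wa: CPU time spent idle while waiting for I/O.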
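The r and b columns come from the procs_running and procs_blocked counters in /proc/stat. A common heuristic is to compare r with the number of CPUs: when more tasks are runnable than there are CPUs, some of them are waiting. The Python sketch below applies that comparison to a single instantaneous reading (an assumption; real checks should look at several consecutive samples, since one reading is noisy).

```python
import os


def run_queue(stat_text):
    """Return (procs_running, procs_blocked) from the text of /proc/stat."""
    values = {}
    for line in stat_text.splitlines():
        key, _, rest = line.partition(" ")
        if key in ("procs_running", "procs_blocked"):
            values[key] = int(rest)
    return values["procs_running"], values["procs_blocked"]


def cpu_saturated(running, cpus=None):
    """True when more tasks are runnable than there are CPUs to run them."""
    cpus = cpus or os.cpu_count() or 1
    return running > cpus


def current_run_queue():
    """Read /proc/stat once (Linux only); the reading process itself is counted."""
    with open("/proc/stat") as f:
        return run_queue(f.read())
```

For example, `cpu_saturated(current_run_queue()[0])` compares the current run queue with `os.cpu_count()`.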

/proc/meminfo

The raw data behind free and vmstat:

cat /proc/meminfo
# MemTotal:       32912320 kB
# MemFree:         8612340 kB
# MemAvailable:   17945678 kB
# Buffers:          234567 kB
# Cached:          9123456 kB
# SwapCached:            0 kB
# ...

Parse specific values for alerting:

# Memory usage percentage
awk '/MemTotal/{t=$2} /MemAvailable/{a=$2} END{printf "%.1f%%\n", (1-a/t)*100}' /proc/meminfo
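For alerting scripts that need more than one value, parsing the file once into a dictionary is easier to extend than an awk one-liner. This Python sketch computes the same used percentage as the awk command and flags the 10%-available rule of thumb mentioned earlier; the threshold is only a heuristic, so it is a parameter.

```python
def parse_meminfo(text):
    """Return {field: value} from /proc/meminfo text (values are in kB)."""
    info = {}
    for line in text.splitlines():
        name, sep, rest = line.partition(":")
        if sep and rest.split():
            info[name.strip()] = int(rest.split()[0])
    return info


def memory_used_percent(info):
    """Same formula as the awk one-liner: (1 - MemAvailable / MemTotal) * 100."""
    return (1 - info["MemAvailable"] / info["MemTotal"]) * 100


def low_memory(info, threshold_percent=10.0):
    """True when MemAvailable is below threshold_percent of MemTotal."""
    return info["MemAvailable"] < info["MemTotal"] * threshold_percent / 100


def read_meminfo(path="/proc/meminfo"):
    """Parse the live file (Linux only)."""
    with open(path) as f:
        return parse_meminfo(f.read())
```

With the sample values above (MemTotal 32912320 kB, MemAvailable 17945678 kB), `memory_used_percent` returns about 45.5, and `low_memory` returns False.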

Disk I/O Monitoring

iostat

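# Extended I/O stats, updated every 2 seconds
iostat -x 2

# Column subset. Recent sysstat versions report r_await and w_await rather
# than a combined await, and no longer print svctm.
# Device  r/s   w/s  rkB/s  wkB/s  await  svctm  %util
# sda    45.2  12.1  567.8  123.4   2.3   0.15   5.20

Key metrics:

r/s, w/s: read/write operations per second.
await: average time per I/O request (queueing plus service) in milliseconds. Sub-millisecond to a few milliseconds is typical for SSDs and roughly 5 to 15 ms for spinning disks; sustained values well above the device's usual baseline indicate a slow or overloaded device.
%util: percentage of time the device had at least one request in flight. For a disk that serves one request at a time, 100% means saturation; SSDs, NVMe drives, and RAID arrays process requests in parallel and may still have headroom at 100%.
svctm: average service time per request. The iostat man page warns that this field is unreliable, and current sysstat releases have dropped it; use await instead.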
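iostat computes these columns from cumulative counters in /proc/diskstats: operations completed, sectors transferred (always 512-byte units), milliseconds spent on reads and writes, and milliseconds the device had I/O in flight. The Python sketch below derives r/s, w/s, rkB/s, wkB/s, await, and %util from two samples. The field positions follow the kernel's iostats documentation; the device name you pass (for example `sda`) is an assumption about your system, so check `lsblk` first.

```python
import time

SECTOR_BYTES = 512  # /proc/diskstats counts 512-byte sectors on every device


def read_disk(text, device):
    """Return the cumulative counters for one device from /proc/diskstats text."""
    for line in text.splitlines():
        parts = line.split()
        if len(parts) >= 14 and parts[2] == device:
            v = [int(x) for x in parts[3:14]]
            return {
                "reads": v[0], "sectors_read": v[2], "ms_reading": v[3],
                "writes": v[4], "sectors_written": v[6], "ms_writing": v[7],
                "ms_io": v[9],  # time the device had I/O in flight
            }
    raise KeyError(f"device {device!r} not found")


def iostat_metrics(before, after, interval_s):
    """iostat -x style figures for the interval between two samples."""
    d = {k: after[k] - before[k] for k in before}
    ios = d["reads"] + d["writes"]
    return {
        "r/s": d["reads"] / interval_s,
        "w/s": d["writes"] / interval_s,
        "rkB/s": d["sectors_read"] * SECTOR_BYTES / 1024 / interval_s,
        "wkB/s": d["sectors_written"] * SECTOR_BYTES / 1024 / interval_s,
        # await: total queue + service time divided by completed requests
        "await": (d["ms_reading"] + d["ms_writing"]) / ios if ios else 0.0,
        # %util: share of wall-clock time with at least one request in flight
        "%util": 100.0 * d["ms_io"] / (interval_s * 1000),
    }


def sample_disk(device, interval=2.0):
    """Sample /proc/diskstats twice (Linux only) for one device."""
    with open("/proc/diskstats") as f:
        first = read_disk(f.read(), device)
    time.sleep(interval)
    with open("/proc/diskstats") as f:
        second = read_disk(f.read(), device)
    return iostat_metrics(first, second, interval)
```

The %util formula shows why 100% does not always mean saturation: it only measures whether any request was outstanding, not how many the device could have handled at once.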

iotop

Monitor disk I/O per process in real time:

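# Requires root; -o: only tasks doing I/O, -P: processes instead of threads
sudo iotop -oP

# Total DISK READ: 45.67 M/s | Total DISK WRITE: 12.34 M/s
#   PID  PRIO  DISK READ  DISK WRITE  COMMAND   (column subset)
#  1234 be/4   45.23 M/s   0.00 B/s   nginx
#  5678 be/4    0.00 B/s  12.34 M/s   postgres

The -o flag shows only tasks actively doing I/O. By default iotop lists individual threads; -P shows one line per process instead, which is why the sample output has a PID column.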
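iotop gets per-task I/O accounting from the kernel's taskstats interface; the same kind of per-process counters are also readable as text in /proc/<pid>/io. This Python sketch takes the /proc route instead (a swapped-in technique, not how iotop itself works): it snapshots read_bytes and write_bytes for every process, then reports rates for processes that did I/O, roughly like `iotop -oP`. Reading other users' processes requires root.

```python
import os


def parse_proc_io(text):
    """Return {counter: value} from the text of /proc/<pid>/io."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = int(value)
    return fields


def snapshot():
    """Map pid -> (read_bytes, write_bytes) for every readable process."""
    result = {}
    for pid in filter(str.isdigit, os.listdir("/proc")):
        try:
            with open(f"/proc/{pid}/io") as f:
                io = parse_proc_io(f.read())
            result[int(pid)] = (io["read_bytes"], io["write_bytes"])
        except (OSError, KeyError):
            continue  # process exited, or permission denied
    return result


def io_rates(before, after, interval_s):
    """Return [(pid, read_B_per_s, write_B_per_s)], busiest first."""
    rates = []
    for pid, (r1, w1) in after.items():
        if pid in before:
            r0, w0 = before[pid]
            r, w = (r1 - r0) / interval_s, (w1 - w0) / interval_s
            if r or w:  # like -o: skip processes with no I/O in the interval
                rates.append((pid, r, w))
    return sorted(rates, key=lambda t: t[1] + t[2], reverse=True)
```

Usage: take `s1 = snapshot()`, wait two seconds, take `s2 = snapshot()`, then print `io_rates(s1, s2, 2.0)[:10]`. read_bytes and write_bytes count bytes that reached the storage layer, so reads served from the page cache do not show up.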

Network Monitoring

nload

Show bandwidth usage per interface with real-time graphs:

nload eth0       # use your interface name; ip -br link lists them

nload displays incoming and outgoing traffic in a split-panel format with a running graph. Use the left/right arrow keys to switch interfaces.
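Per-interface tools like nload work from the byte counters the kernel exposes in /proc/net/dev. A Python sketch of the same idea: sample the counters twice and divide the difference by the interval. The column positions (receive bytes first, transmit bytes ninth after the colon) follow the standard layout of that file.

```python
import time


def parse_net_dev(text):
    """Return {interface: (rx_bytes, tx_bytes)} from /proc/net/dev text."""
    counters = {}
    for line in text.splitlines()[2:]:       # the first two lines are headers
        name, sep, data = line.partition(":")
        fields = data.split()
        if sep and len(fields) >= 9:
            counters[name.strip()] = (int(fields[0]), int(fields[8]))
    return counters


def throughput(before, after, interval_s):
    """Return {interface: (rx_bytes_per_s, tx_bytes_per_s)}."""
    return {
        iface: ((rx - before[iface][0]) / interval_s,
                (tx - before[iface][1]) / interval_s)
        for iface, (rx, tx) in after.items()
        if iface in before
    }


def sample_net(interval=1.0):
    """Sample /proc/net/dev twice (Linux only) and return per-interface rates."""
    with open("/proc/net/dev") as f:
        first = parse_net_dev(f.read())
    time.sleep(interval)
    with open("/proc/net/dev") as f:
        second = parse_net_dev(f.read())
    return throughput(first, second, interval)
```

Multiply a rate by 8 and divide by 1,000,000 to get Mbit/s, the unit most bandwidth graphs use.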

iftop

Show bandwidth per connection:

sudo iftop -i eth0

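iftop lists traffic between pairs of hosts (add -P to include ports), which makes it useful for identifying which remote hosts are consuming bandwidth. Press T to show cumulative totals, t to cycle through display modes, p to toggle port display, and < or > to sort by source or destination.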

nethogs

Show bandwidth per process:

sudo nethogs eth0

# PID   USER      DEVICE   SENT     RECEIVED  COMMAND
# 1234  nginx     eth0     1.2Mbps  3.4Mbps   nginx: worker
# 5678  postgres  eth0     0.5Mbps  2.1Mbps   postgres

nethogs is often the most actionable of these tools because it attributes bandwidth to a specific process. Use it to track down a runaway download or an unexpected data transfer.

SAR (System Activity Reporter)

The sysstat package collects and reports system activity data. Collection is done by sadc, which is run periodically by cron (/etc/cron.d/sysstat) or, on systemd-based distributions, by the sysstat-collect.timer unit. The default interval is 10 minutes.

Installation and Setup

sudo apt install sysstat

# Debian/Ubuntu: turn on data collection in /etc/default/sysstat
sudo sed -i 's/^ENABLED="false"/ENABLED="true"/' /etc/default/sysstat

# Enable the service at boot and start it now
sudo systemctl enable --now sysstat

Historical Reports

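# CPU report for today (from the collected data files)
sar -u

# Memory report for the 25th of the month
# (Debian/Ubuntu path; RHEL-family systems use /var/log/sa/sa25)
sar -r -f /var/log/sysstat/sa25

# CPU report between specific hours
sar -u -s 08:00 -e 12:00

# The commands below sample live: "1 5" means five reports at 1-second intervals
# Disk I/O (device-level)
sar -d 1 5

# Network interfaces (without a count, runs until Ctrl+C)
sar -n DEV 1

# Task creation and context switches
sar -w 1 5

# Paging statistics
sar -B 1 5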

Report Analysis

Key patterns to watch in SAR reports:

# CPU saturation: consistently high %user + %system (> 80%)
02:00:01 PM     CPU     %user     %nice   %system   %iowait    %steal     %idle
02:10:01 PM     all     75.2       0.0      12.1       0.5       0.0      12.2

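# Memory pressure (sar -r, column subset): %memused close to 100%
02:00:01 PM kbmemfree kbmemused  %memused  kbbuffers  kbcached
02:10:01 PM   123456   31687690     96.1     234567    8912345

# Swap usage (sar -S, column subset): swap columns are not part of sar -r
02:00:01 PM kbswpused  %swpused
02:10:01 PM    123456       6.0

High %iowait points to disk bottlenecks. Swap that is merely occupied may hold idle pages the kernel moved out long ago; sustained swap traffic (pswpin/s and pswpout/s in sar -W) is the stronger sign of insufficient RAM.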
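Scanning a day of sar output by eye is tedious. The Python sketch below flags every interval where %user + %system on the "all" row exceeds a threshold (80 by default, matching the CPU saturation pattern above). It assumes the default `sar -u` column names and a C locale, since some locales print decimal commas; the Average line is skipped.

```python
def busy_intervals(sar_text, threshold=80.0):
    """Return [(timestamp, user_plus_system)] for "all" rows above threshold."""
    columns = None
    flagged = []
    for line in sar_text.splitlines():
        tokens = line.split()
        if "%user" in tokens and "CPU" in tokens:
            columns = tokens[tokens.index("CPU"):]   # CPU %user %nice ...
            continue
        if (columns is None or len(tokens) <= len(columns)
                or tokens[0].startswith("Average")):
            continue
        # Values are aligned from the right; whatever precedes them is the time
        row = dict(zip(columns, tokens[-len(columns):]))
        if row["CPU"] != "all":
            continue
        try:
            busy = float(row["%user"]) + float(row["%system"])
        except (KeyError, ValueError):
            continue
        if busy > threshold:
            timestamp = " ".join(tokens[:-len(columns)])
            flagged.append((timestamp, busy))
    return flagged
```

Usage: pipe `LC_ALL=C sar -u` (or `LC_ALL=C sar -u -f /var/log/sysstat/sa25`) into a script that prints `busy_intervals(sys.stdin.read())`. Because values are matched from the right, both 12-hour (`02:10:01 PM`) and 24-hour timestamps work.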

Monitoring Stack: Prometheus + Grafana

For a production environment, Prometheus and Grafana provide long-term metrics storage, flexible querying, and rich dashboards.

Docker Compose Setup

# docker-compose.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus_data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.path=/prometheus'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3000:3000"
    volumes:
      - grafana_data:/var/lib/grafana
    environment:
      # Change this password before exposing Grafana beyond localhost
      - GF_SECURITY_ADMIN_PASSWORD=admin

  # The service name is also the hostname Prometheus scrapes (node-exporter:9100)
  node-exporter:
    image: prom/node-exporter:latest
    ports:
      - "9100:9100"
    volumes:
      - /proc:/host/proc:ro
      - /sys:/host/sys:ro
      - /:/rootfs:ro
    command:
      - '--path.procfs=/host/proc'
      - '--path.sysfs=/host/sys'
      - '--path.rootfs=/rootfs'

volumes:
  prometheus_data:
  grafana_data:

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s
  scrape_timeout: 10s
  evaluation_interval: 15s

scrape_configs:
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  - job_name: 'node'
    static_configs:
      - targets: ['node-exporter:9100']

  # Example application target; replace with your own service or remove it
  - job_name: 'myservice'
    static_configs:
      - targets: ['myservice:8080']

Deploy the stack:

docker compose up -d

Open Grafana at http://localhost:3000 (admin/admin), add Prometheus as a data source with the URL http://prometheus:9090 (Grafana reaches Prometheus by its Compose service name), and the monitoring stack is complete.

node_exporter Metrics

node_exporter exposes hundreds of system metrics. Key metrics for alerting and dashboards:

Metric	Type	What It Measures
`node_cpu_seconds_total`	Counter	CPU time in each mode (user, system, idle, iowait)
`node_memory_MemTotal_bytes`	Gauge	Total physical memory
`node_memory_MemAvailable_bytes`	Gauge	Memory available for allocation
`node_disk_io_time_seconds_total`	Counter	Time spent doing I/O
`node_disk_read_bytes_total`	Counter	Total bytes read from disk
`node_network_receive_bytes_total`	Counter	Total bytes received over network
`node_filesystem_avail_bytes`	Gauge	Available disk space per mount
`node_load1` / `node_load5` / `node_load15`	Gauge	System load averages

CPU usage percentage query:

100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)
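rate() turns an ever-increasing counter into a per-second rate over the range window, treating any decrease as a counter reset. For node_cpu_seconds_total{mode="idle"}, that rate is the fraction of each second a CPU spent idle, which is why the query subtracts the averaged value (times 100) from 100. The Python sketch below is a simplification: real Prometheus also extrapolates to the edges of the window, so its results differ slightly.

```python
def simple_rate(samples):
    """samples: [(unix_seconds, counter_value)] sorted by time, at least 2 points."""
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        # A drop means the counter restarted (e.g. after a reboot): count from zero
        increase += cur - prev if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed


def cpu_busy_percent(idle_rates):
    """100 - avg(idle rate) * 100, mirroring the CPU usage query.

    idle_rates: one rate() value per CPU for mode="idle"; each is idle seconds
    per second, so it lies between 0 and 1.
    """
    return 100 - (sum(idle_rates) / len(idle_rates)) * 100
```

For two CPUs that were idle 80% and 60% of the time, `cpu_busy_percent([0.8, 0.6])` is 30: the instance as a whole was busy 30% of the time.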

Memory usage percentage:

(1 - (node_memory_MemAvailable_bytes / node_memory_MemTotal_bytes)) * 100

Disk space remaining:

node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100

Alerting

Prometheus Alert Rules

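Create alerts.yml next to prometheus.yml, mount it into the Prometheus container (for example, add ./alerts.yml:/etc/prometheus/alerts.yml to the prometheus service's volumes), and reference it from prometheus.yml:

# prometheus.yml
rule_files:
  - "alerts.yml"   # relative to prometheus.yml, i.e. /etc/prometheus/alerts.yml

# Forward firing alerts to Alertmanager (assumes it is reachable as alertmanager:9093)
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']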

# alerts.yml
groups:
  - name: node_alerts
    rules:
      - alert: HighCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 80
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High CPU usage on {{ $labels.instance }}"
          description: "CPU usage is {{ $value | humanize }}% for 5+ minutes"

      - alert: CriticalCPUUsage
        expr: 100 - (avg by (instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100) > 95
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Critical CPU usage on {{ $labels.instance }}"

      - alert: HighMemoryUsage
        expr: (node_memory_MemTotal_bytes - node_memory_MemAvailable_bytes) / node_memory_MemTotal_bytes * 100 > 85
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High memory usage on {{ $labels.instance }}"

      - alert: DiskSpaceLow
        expr: node_filesystem_avail_bytes{mountpoint="/"} / node_filesystem_size_bytes{mountpoint="/"} * 100 < 10
        for: 2m
        labels:
          severity: critical
        annotations:
          summary: "Disk space low on {{ $labels.instance }}"

      - alert: HighDiskIOWait
        expr: avg by (instance) (rate(node_cpu_seconds_total{mode="iowait"}[5m])) * 100 > 20
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High I/O wait on {{ $labels.instance }}"

      - alert: NodeDown
        expr: up{job="node"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Node {{ $labels.instance }} is down"
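The for: clause in these rules means the expression must be true at every evaluation for the full duration before the alert goes from pending to firing; a single false evaluation resets it. The Python sketch below models that state machine for one series under simplifying assumptions (fixed evaluation timestamps, no keep_firing_for, no retention of resolved alerts).

```python
def alert_states(evaluations, for_seconds):
    """evaluations: [(unix_seconds, expr_is_true)] in time order.

    Returns a parallel list of "inactive", "pending" or "firing".
    """
    states = []
    active_since = None
    for ts, is_true in evaluations:
        if not is_true:
            active_since = None          # any false evaluation resets the timer
            states.append("inactive")
            continue
        if active_since is None:
            active_since = ts            # first true evaluation: alert is pending
        held = ts - active_since
        states.append("firing" if held >= for_seconds else "pending")
    return states
```

With the 15 s evaluation_interval from prometheus.yml and `for: 5m`, `alert_states([(t, True) for t in range(0, 301, 15)], 300)` stays pending until the evaluation at t=300 and only then fires, so a brief CPU spike never pages anyone.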

Alertmanager Configuration

Prometheus sends alerts to Alertmanager, which handles grouping, inhibition, silencing, and notifications (email, Slack, PagerDuty):

# alertmanager.yml
route:
  receiver: 'slack'
  group_by: ['alertname', 'instance']  # without group_by, all alerts form one group
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h

receivers:
  - name: 'slack'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/...'
        channel: '#alerts'
        send_resolved: true  # also notify when an alert clears
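The effect of group_by can be modelled in a few lines: alerts whose values for the group_by labels are identical land in the same group and go out as one notification. This Python sketch ignores the timing parameters (group_wait, group_interval, repeat_interval) and sub-routes; the label lists in the example are illustrative.

```python
from collections import defaultdict


def group_alerts(alerts, group_by):
    """alerts: list of label dicts. Returns {group_key_tuple: [alerts]}.

    A label missing from an alert counts as an empty value, and an empty
    group_by puts every alert into a single group.
    """
    groups = defaultdict(list)
    for labels in alerts:
        key = tuple(labels.get(name, "") for name in group_by)
        groups[key].append(labels)
    return dict(groups)
```

For example, HighCPUUsage firing on two hosts plus DiskSpaceLow on one host yields two notifications with `group_by: ['alertname']`, three with `['alertname', 'instance']`, and one with no group_by at all.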

Grafana Dashboards

Dashboard Recommendations

Dashboard	ID (Grafana.com)	Description
Node Exporter Full	1860	Comprehensive system metrics
Node Exporter Server Metrics	16098	Simplified server overview
Linux Hosts Metrics	10180	Multi-host view
1 Node Dashboard	11076	Single node deep dive

Import a dashboard from Grafana.com:

Log in to Grafana (http://localhost:3000).
Go to Dashboards → New → Import (Create → Import in older Grafana versions).
Enter the dashboard ID (e.g., 1860) and click Load.
Select the Prometheus data source and click Import.

Custom Dashboard Queries

CPU usage per core:

100 - (avg by (cpu, instance) (rate(node_cpu_seconds_total{mode="idle"}[5m])) * 100)

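Network traffic per interface in bytes/s (loopback excluded). Selecting on job="node" matches the scrape job defined in prometheus.yml; an instance="localhost:9100" selector would match nothing there, because that job scrapes node-exporter:9100.

rate(node_network_receive_bytes_total{job="node", device!="lo"}[5m])
rate(node_network_transmit_bytes_total{job="node", device!="lo"}[5m])

Disk read/write throughput in bytes/s:

rate(node_disk_read_bytes_total{job="node"}[5m])
rate(node_disk_written_bytes_total{job="node"}[5m])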

Conclusion

Monitoring is crucial for system reliability. Start with the command-line tools (htop, iostat, iftop, sar) for daily checks and reactive debugging. Deploy Prometheus + Grafana with node_exporter for proactive alerting and historical analysis. The combination covers you from “what is happening right now” to “what happened last week.”
