Metrics Collection: Prometheus, StatsD, and Custom Metrics

TL;DR: This guide covers metrics collection with Prometheus, StatsD, and custom application metrics: the metric types, how to instrument Go and Python services, and the queries and alerts that make a system observable.


Introduction

Metrics provide quantitative measurements of system behavior:

  • Counters - Cumulative values (total requests)
  • Gauges - Point-in-time values (memory usage)
  • Histograms - Value distributions (request latency)
  • Summaries - Quantiles computed inside the application (response times); unlike histograms, they cannot be aggregated across instances
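
To make the differences between these types concrete, here is a minimal pure-Python sketch of a counter, a gauge, and a histogram. The class names mirror the metric types but are illustrative stand-ins, not the Prometheus client library. Summaries are left out because they need a streaming quantile estimator.

```python
import bisect


class Counter:
    """Cumulative value that can only go up (for example, total requests)."""

    def __init__(self):
        self.value = 0.0

    def inc(self, amount=1.0):
        if amount < 0:
            raise ValueError("counters can only increase")
        self.value += amount


class Gauge:
    """Point-in-time value that can go up or down (for example, memory in use)."""

    def __init__(self):
        self.value = 0.0

    def set(self, value):
        self.value = float(value)

    def inc(self, amount=1.0):
        self.value += amount

    def dec(self, amount=1.0):
        self.value -= amount


class Histogram:
    """Counts observations into buckets and tracks their running sum."""

    def __init__(self, bounds):
        self.bounds = sorted(bounds)                 # finite upper bounds
        self.counts = [0] * (len(self.bounds) + 1)   # last slot is the +Inf bucket
        self.sum = 0.0

    def observe(self, value):
        # bisect_left finds the first bound >= value, matching "le" (less or equal)
        self.counts[bisect.bisect_left(self.bounds, value)] += 1
        self.sum += value

    def cumulative(self):
        """Return (upper_bound, cumulative_count) pairs, ending with +Inf."""
        total, pairs = 0, []
        for bound, count in zip(self.bounds + [float("inf")], self.counts):
            total += count
            pairs.append((bound, total))
        return pairs
```

The cumulative form is what Prometheus stores for a histogram: each bucket counts every observation at or below its upper bound, so the +Inf bucket always equals the total number of observations.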

Prometheus Basics

Prometheus Installation

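# Run Prometheus, mounting the config file from the current directory
# (Docker needs an absolute host path; a bare file name is treated as a named volume)
docker run -p 9090:9090 \
  -v "$(pwd)/prometheus.yml:/etc/prometheus/prometheus.yml" \
  prom/prometheus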

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s

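# Note: when Prometheus itself runs in a container, 'localhost' refers to that
# container. Replace it with an address that is reachable from the container.
scrape_configs:
  - job_name: 'my-application'
    static_configs:
      - targets: ['localhost:8080']

  - job_name: 'node-exporter'
    static_configs:
      - targets: ['localhost:9100']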
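
Each target listed under scrape_configs is expected to answer GET /metrics with plain text in the Prometheus exposition format. The sketch below uses only the Python standard library to show roughly what such a response looks like for one labelled counter. REQUESTS_TOTAL, render_metrics, MetricsHandler, and serve are hypothetical names chosen for illustration; in practice a client library generates this output.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical in-process state; a real application would update this as it runs.
REQUESTS_TOTAL = {("GET", "200"): 42, ("POST", "500"): 3}


def render_metrics(requests_total):
    """Render a labelled counter in the Prometheus text exposition format."""
    lines = [
        "# HELP http_requests_total Total number of HTTP requests",
        "# TYPE http_requests_total counter",
    ]
    for (method, status), value in sorted(requests_total.items()):
        lines.append(
            f'http_requests_total{{method="{method}",status="{status}"}} {value}'
        )
    return "\n".join(lines) + "\n"


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = render_metrics(REQUESTS_TOTAL).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def serve(port=8080):
    """Block and serve /metrics; 8080 matches the 'my-application' target."""
    HTTPServer(("", port), MetricsHandler).serve_forever()
```

Calling serve() and then opening http://localhost:8080/metrics shows the same text that Prometheus would collect on each scrape.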

Application Instrumentation

Go Application

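package main

import (
    "log"
    "net/http"
    "strconv"

    "github.com/prometheus/client_golang/prometheus"
    "github.com/prometheus/client_golang/prometheus/promhttp"
)

var (
    httpRequestsTotal = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "http_requests_total",
            Help: "Total number of HTTP requests",
        },
        []string{"method", "endpoint", "status"},
    )

    httpRequestDuration = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "http_request_duration_seconds",
            Help:    "HTTP request duration in seconds",
            Buckets: []float64{0.001, 0.01, 0.1, 0.5, 1, 2, 5},
        },
        []string{"method", "endpoint"},
    )

    activeConnections = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "active_connections",
            Help: "Number of active connections",
        },
    )
)

func init() {
    prometheus.MustRegister(httpRequestsTotal)
    prometheus.MustRegister(httpRequestDuration)
    prometheus.MustRegister(activeConnections)
}

// statusRecorder wraps http.ResponseWriter so the middleware can read
// the status code the handler actually wrote.
type statusRecorder struct {
    http.ResponseWriter
    status int
}

func (r *statusRecorder) WriteHeader(code int) {
    r.status = code
    r.ResponseWriter.WriteHeader(code)
}

func metricsMiddleware(next http.Handler) http.Handler {
    return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
        // Caution: raw paths containing IDs create one time series per URL.
        // Prefer a route template as the endpoint label where possible.
        timer := prometheus.NewTimer(httpRequestDuration.WithLabelValues(
            r.Method, r.URL.Path,
        ))
        defer timer.ObserveDuration()

        // Track requests currently in flight
        activeConnections.Inc()
        defer activeConnections.Dec()

        // Default to 200: handlers that never call WriteHeader send 200 implicitly
        rec := &statusRecorder{ResponseWriter: w, status: http.StatusOK}
        next.ServeHTTP(rec, r)

        // Increment the counter after the handler ran, with the real status
        httpRequestsTotal.WithLabelValues(
            r.Method, r.URL.Path, strconv.Itoa(rec.status),
        ).Inc()
    })
}

func main() {
    mux := http.NewServeMux()
    // Expose the registered metrics for Prometheus to scrape
    mux.Handle("/metrics", promhttp.Handler())
    mux.Handle("/", metricsMiddleware(http.HandlerFunc(
        func(w http.ResponseWriter, r *http.Request) {
            w.Write([]byte("ok"))
        },
    )))
    log.Fatal(http.ListenAndServe(":8080", mux))
}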
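
Using the raw request path as the endpoint label, as the middleware above does with r.URL.Path, produces a new time series for every distinct URL. One common mitigation is to collapse variable path segments before using the path as a label value. The sketch below shows the idea in Python since it is language-agnostic; normalize_path and the two patterns are assumptions for illustration and would need adjusting to the URL scheme in use.

```python
import re

# Assumed patterns for illustration: numeric IDs and UUIDs.
_UUID = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)
_NUMBER = re.compile(r"\d+")


def normalize_path(path):
    """Replace variable path segments with placeholders to bound label cardinality."""
    segments = []
    # Drop the query string, then inspect each segment on its own
    for segment in path.split("?")[0].split("/"):
        if _UUID.fullmatch(segment):
            segments.append(":uuid")
        elif _NUMBER.fullmatch(segment):
            segments.append(":id")
        else:
            segments.append(segment)
    return "/".join(segments) or "/"
```

With this in place, /users/123/orders/456 and /users/789/orders/1 both map to the single label value /users/:id/orders/:id.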

Python Application

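import time

from flask import Flask, Response, g, request
from prometheus_client import (
    CONTENT_TYPE_LATEST, Counter, Gauge, Histogram, generate_latest,
)

app = Flask(__name__)

# Define metrics
http_requests_total = Counter(
    'http_requests_total',
    'Total HTTP requests',
    ['method', 'endpoint', 'status']
)

http_request_duration = Histogram(
    'http_request_duration_seconds',
    'HTTP request duration',
    ['method', 'endpoint'],
    buckets=(0.001, 0.01, 0.1, 0.5, 1.0, 2.0, 5.0)
)

active_connections = Gauge(
    'active_connections',
    'Number of active connections'
)

@app.route('/metrics')
def metrics():
    # Serve the default registry in the Prometheus text exposition format
    return Response(generate_latest(), content_type=CONTENT_TYPE_LATEST)

@app.before_request
def before_request():
    # Remember when the request started so its duration can be observed later
    g.start_time = time.perf_counter()
    active_connections.inc()

@app.after_request
def after_request(response):
    # request.endpoint is None when no route matched (for example, a 404)
    endpoint = request.endpoint or 'unknown'

    http_requests_total.labels(
        method=request.method,
        endpoint=endpoint,
        status=response.status_code
    ).inc()

    http_request_duration.labels(
        method=request.method,
        endpoint=endpoint
    ).observe(time.perf_counter() - g.start_time)

    active_connections.dec()
    return response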

Custom Business Metrics

E-commerce Metrics

// Order processing metrics
var (
    ordersPlaced = prometheus.NewCounterVec(
        prometheus.CounterOpts{
            Name: "orders_placed_total",
            Help: "Total number of orders placed",
        },
        []string{"status", "payment_method"},
    )
    
    orderValue = prometheus.NewHistogramVec(
        prometheus.HistogramOpts{
            Name:    "order_value_dollars",
            Help:    "Value of orders in dollars",
            Buckets: []float64{10, 25, 50, 100, 250, 500, 1000},
        },
        []string{"category"},
    )
    
    cartAbandonmentRate = prometheus.NewGauge(
        prometheus.GaugeOpts{
            Name: "cart_abandonment_rate",
            Help: "Rate of cart abandonment",
        },
    )
)

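// Order is a minimal illustrative type; the fields are the ones
// recordOrder reads, not a complete order model.
type Order struct {
    Status        string
    PaymentMethod string
    Category      string
    Value         float64 // order total in dollars
}

func init() {
    // Metrics must be registered before they appear on /metrics
    prometheus.MustRegister(ordersPlaced, orderValue, cartAbandonmentRate)
}

func recordOrder(order Order) {
    ordersPlaced.WithLabelValues(
        order.Status,
        order.PaymentMethod,
    ).Inc()

    orderValue.WithLabelValues(
        order.Category,
    ).Observe(order.Value)
}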
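
The cart_abandonment_rate gauge is declared above but never set. One plausible way to derive its value, assuming the application tracks how many carts were created and how many orders completed over the same period, is sketched below. The function name and both inputs are assumptions, not part of the original code.

```python
def cart_abandonment_rate(carts_created, orders_completed):
    """Fraction of carts that did not become an order, between 0.0 and 1.0."""
    if carts_created <= 0:
        return 0.0  # no carts yet; report 0 rather than dividing by zero
    # Clamp at zero in case orders were counted without a matching cart
    abandoned = max(carts_created - orders_completed, 0)
    return abandoned / carts_created
```

The result would be passed to the gauge's Set method on a schedule. An alternative is to export the two counts as counters and compute the ratio in a Prometheus query, which keeps the raw data available.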

StatsD Integration

StatsD Server

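# Run the Prometheus StatsD exporter
# Host port 8125/udp receives StatsD traffic (the exporter listens on 9125 in the container);
# port 9102 serves the translated metrics for Prometheus to scrape
docker run -p 8125:9125/udp -p 9102:9102 \
  prom/statsd-exporter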

Sending StatsD Metrics

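import statsd  # pip install statsd

# Initialize client
client = statsd.StatsClient('localhost', 8125)

# Increment counter (the method is incr, not increment)
client.incr('requests.total')

# Record value
client.gauge('active_users', 150)

# Record timing
client.timing('request.duration', 250)  # milliseconds

# Record with tags (DataDog format)
# Tags are a DogStatsD extension that the statsd package does not accept;
# use a client that supports them, such as the datadog package.
from datadog.dogstatsd import DogStatsd  # pip install datadog

dd_client = DogStatsd(host='localhost', port=8125)
dd_client.increment('requests.total', tags=['env:production', 'service:api'])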
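
Under the hood, every call above sends a short text datagram over UDP. The following sketch builds those lines by hand with the standard library to show the wire format. format_metric and send_metric are hypothetical helper names, and the |# tag suffix is the DogStatsD extension rather than core StatsD.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Build one StatsD line: <name>:<value>|<type>[|#tag1,tag2]."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        # DogStatsD extension; plain StatsD servers may reject or ignore it
        line += "|#" + ",".join(tags)
    return line


def send_metric(line, host="localhost", port=8125):
    """Send one datagram over UDP. Fire-and-forget: delivery is not confirmed."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.sendto(line.encode("utf-8"), (host, port))


# Type codes: c = counter, g = gauge, ms = timing in milliseconds
counter_line = format_metric("requests.total", 1, "c")
gauge_line = format_metric("active_users", 150, "g")
timing_line = format_metric("request.duration", 250, "ms")
tagged_line = format_metric(
    "requests.total", 1, "c", tags=["env:production", "service:api"]
)
```

Because the transport is UDP, a client keeps working even when no server is listening, which is why StatsD instrumentation adds little risk to the request path but can silently drop data.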

StatsD to Prometheus

# prometheus.yml with StatsD
scrape_configs:
  - job_name: 'statsd-exporter'
    static_configs:
      - targets: ['localhost:9102']

Histograms and Percentiles

Understanding Buckets

// Default buckets in client_golang (prometheus.DefBuckets), in seconds
[]float64{
    0.005,  // 5ms
    0.01,   // 10ms
    0.025,  // 25ms
    0.05,   // 50ms
    0.1,    // 100ms
    0.25,   // 250ms
    0.5,    // 500ms
    1.0,    // 1s
    2.5,    // 2.5s
    5.0,    // 5s
    10.0,   // 10s
}

# Querying percentiles in Prometheus (PromQL comments start with #)
# sum by (le) merges the buckets of all series before the quantile is estimated
histogram_quantile(0.50, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # p50
histogram_quantile(0.90, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # p90
histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # p95
histogram_quantile(0.99, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))  # p99
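
Percentiles from a histogram are estimates, and their accuracy depends on the bucket layout. The sketch below is a simplified Python re-implementation of the interpolation that histogram_quantile performs, assuming cumulative (upper_bound, count) pairs as input. It is meant to illustrate the idea and does not reproduce every edge case of the real function.

```python
import math


def histogram_quantile(q, buckets):
    """Estimate the q-quantile from (upper_bound, cumulative_count) pairs.

    `buckets` must be sorted by upper bound and end with the +Inf bucket.
    """
    if not 0 <= q <= 1:
        raise ValueError("q must be between 0 and 1")
    total = buckets[-1][1]
    if total == 0:
        return math.nan
    rank = q * total
    prev_bound, prev_count = 0.0, 0  # the lowest bucket is assumed to start at 0
    for bound, count in buckets:
        if count >= rank:
            if math.isinf(bound):
                # Quantile lies in the +Inf bucket: report the highest finite bound
                return prev_bound
            in_bucket = count - prev_count
            if in_bucket == 0:
                return prev_bound
            # Assume observations are spread evenly inside the bucket
            return prev_bound + (bound - prev_bound) * (rank - prev_count) / in_bucket
        prev_bound, prev_count = bound, count
    return prev_bound


# Hypothetical data: 100 requests, 50 under 100ms, 90 under 500ms, all under 1s
example = [(0.1, 50), (0.5, 90), (1.0, 100), (math.inf, 100)]
p95 = histogram_quantile(0.95, example)  # falls halfway into the 0.5s to 1s bucket
```

The estimate can never be more precise than the bucket it lands in, so bucket bounds should sit close to the latencies that matter, such as an SLO threshold.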

Service Level Metrics

RED Metrics

Metric   | Description                   | Prometheus Query
Rate     | Requests per second           | sum(rate(http_requests_total[5m]))
Errors   | Failed requests per second    | sum(rate(http_requests_total{status=~"5.."}[5m]))
Duration | 95th-percentile response time | histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))
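
The Rate and Errors queries both rely on rate(), which turns an ever-growing counter into a per-second value. A simplified Python sketch of that calculation, including the handling of counter resets, is shown below. simple_rate is a hypothetical name, and the real rate() additionally extrapolates to the edges of the time window, which this sketch ignores.

```python
def simple_rate(samples):
    """Per-second increase over (timestamp_seconds, counter_value) samples.

    A drop in value is treated as a counter reset (for example, a process
    restart), so counting resumes from zero instead of going negative.
    """
    if len(samples) < 2:
        return None  # a rate needs at least two samples
    increase = 0.0
    for (_, prev), (_, cur) in zip(samples, samples[1:]):
        increase += (cur - prev) if cur >= prev else cur
    elapsed = samples[-1][0] - samples[0][0]
    return increase / elapsed if elapsed > 0 else None


# Hypothetical scrape results, 15 seconds apart, with a restart before t=45
samples = [(0, 100), (15, 130), (30, 160), (45, 10), (60, 40)]
requests_per_second = simple_rate(samples)  # (30 + 30 + 10 + 30) / 60
```

Dividing the error rate by the total rate gives the error ratio, which is usually a better alerting signal than the raw error count because it stays comparable as traffic changes.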

USE Metrics

Metric      | Description
Utilization | Resource usage
Saturation  | Queue depth, load
Errors      | Error rate

Alerting Rules

# alerts.yml (load it by listing the file under rule_files in prometheus.yml)
groups:
  - name: application
    rules:
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status=~"5.."}[5m])) 
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate detected"
          
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95, rate(http_request_duration_seconds_bucket[5m])) > 1
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "High latency detected"
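
The for: 5m clause in both rules means the expression must stay true for five minutes before the alert fires, which filters out short spikes. The small state machine below sketches that behaviour in Python. AlertState is a hypothetical class for illustration, not part of Prometheus.

```python
class AlertState:
    """Tracks one alert through inactive, pending, and firing states."""

    def __init__(self, for_seconds):
        self.for_seconds = for_seconds
        self.active_since = None  # time the condition first became true

    def evaluate(self, condition_true, now):
        """Return the alert state after one rule evaluation at time `now`."""
        if not condition_true:
            # Any false evaluation resets the timer
            self.active_since = None
            return "inactive"
        if self.active_since is None:
            self.active_since = now
        if now - self.active_since >= self.for_seconds:
            return "firing"
        return "pending"


# Hypothetical timeline for HighErrorRate with for: 5m (300 seconds)
alert = AlertState(for_seconds=300)
timeline = [
    alert.evaluate(True, now=0),
    alert.evaluate(True, now=240),
    alert.evaluate(True, now=300),
    alert.evaluate(False, now=360),
    alert.evaluate(True, now=420),
]
```

A longer for duration reduces noisy pages at the cost of slower detection, so critical alerts often use a shorter window than warnings.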

Conclusion

Metrics collection enables:

  1. Quantitative monitoring - Measure system behavior
  2. Performance optimization - Identify bottlenecks
  3. Alerting - Detect anomalies
  4. Capacity planning - Plan for growth
  5. SLO tracking - Monitor service levels
