Monitoring with Prometheus and Grafana: A Practical Setup Guide

Created: March 7, 2026 Larry Qu 5 min read

Introduction

Prometheus collects metrics; Grafana visualizes them. Together they give you dashboards, alerts, and the data to answer “is my system healthy?” and “why is it slow?” See Javascript Guide for more context.

Prerequisites: Docker and Docker Compose installed. Basic understanding of metrics concepts.

Quick Start with Docker Compose

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_USERS_ALLOW_SIGN_UP: "false"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

volumes:
  prometheus-data:
  grafana-data:

docker compose -f docker-compose.monitoring.yml up -d
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3001 (admin/admin)
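
Once the containers are running, it can help to confirm that each service answers before moving on. The sketch below is one way to do that from Node.js 18+ (which provides a global `fetch`). It assumes the host ports mapped in the compose file above and relies on the health endpoints that Prometheus and Alertmanager (`/-/healthy`) and Grafana (`/api/health`) expose. The names `healthTargets` and `checkStack` are illustrative and not part of any library.

```javascript
// Health endpoints for the stack, assuming the host ports mapped in
// docker-compose.monitoring.yml (9090, 3001, 9093).
function healthTargets(host = 'localhost') {
    return [
        { name: 'prometheus', url: `http://${host}:9090/-/healthy` },
        { name: 'grafana', url: `http://${host}:3001/api/health` },
        { name: 'alertmanager', url: `http://${host}:9093/-/healthy` },
    ];
}

// Probe every endpoint; a network error counts as "down" rather than throwing.
// fetchImpl is injectable so the function can be exercised without a network.
async function checkStack(targets = healthTargets(), fetchImpl = fetch) {
    const results = [];
    for (const { name, url } of targets) {
        try {
            const res = await fetchImpl(url);
            results.push({ name, url, up: res.ok });
        } catch {
            results.push({ name, url, up: false });
        }
    }
    return results;
}
```

Calling `checkStack().then(console.table)` after `docker compose up -d` prints one row per service. A service that is still starting simply shows `up: false`, so the check can be repeated until all three report `true`.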

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s       # how often to scrape targets (assumed to match evaluation_interval)
  evaluation_interval: 15s   # how often to evaluate rules

# Alert rules
rule_files:
  - "alerts/*.yml"

# Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Scrape targets
scrape_configs:
  # Your Node.js application
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:3000']
    metrics_path: '/metrics'

  # Node Exporter (system metrics: CPU, memory, disk)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # PostgreSQL exporter
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
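
After Prometheus loads this configuration, the Targets page in its UI shows whether each job is being scraped. The same information is available from the HTTP API at `/api/v1/targets`. The following is a minimal sketch of reading it from Node.js 18+; the helper names `summarizeTargets` and `listTargets` and the default base URL are assumptions for illustration.

```javascript
// Reduce a /api/v1/targets response body to one entry per active scrape target.
function summarizeTargets(body) {
    if (body.status !== 'success') {
        throw new Error(`unexpected API status: ${body.status}`);
    }
    return body.data.activeTargets.map((t) => ({
        job: t.labels.job,
        instance: t.labels.instance,
        health: t.health,               // "up", "down", or "unknown"
        lastError: t.lastError || null, // empty string means no scrape error
    }));
}

// Fetch the target list from a running Prometheus.
// The base URL assumes the port mapping from the Docker Compose quick start.
async function listTargets(baseUrl = 'http://localhost:9090') {
    const res = await fetch(`${baseUrl}/api/v1/targets`);
    return summarizeTargets(await res.json());
}
```

A target whose `health` is `"down"` usually means the hostname in `static_configs` does not resolve on the Docker network or the exporter is not running; `lastError` carries the message Prometheus recorded for the last scrape.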

Instrumenting Node.js

npm install prom-client

// metrics.js
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

const registry = new Registry();

// Collect default metrics (memory, CPU, event loop lag)
collectDefaultMetrics({ register: registry });

// HTTP request counter
export const httpRequestsTotal = new Counter({
    name: 'http_requests_total',
    help: 'Total number of HTTP requests',
    labelNames: ['method', 'route', 'status_code'],
    registers: [registry],
});

// HTTP request duration histogram
export const httpRequestDuration = new Histogram({
    name: 'http_request_duration_seconds',
    help: 'HTTP request duration in seconds',
    labelNames: ['method', 'route', 'status_code'],
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
    registers: [registry],
});

// Active connections gauge
export const activeConnections = new Gauge({
    name: 'active_connections',
    help: 'Number of active connections',
    registers: [registry],
});

// Business metric: orders created
export const ordersCreated = new Counter({
    name: 'orders_created_total',
    help: 'Total orders created',
    labelNames: ['status'],
    registers: [registry],
});

export { registry };

// middleware/metrics.js: Express middleware
import { httpRequestsTotal, httpRequestDuration } from '../metrics.js';

export function metricsMiddleware(req, res, next) {
    const start = Date.now();

    // Record the request once the response has been sent
    res.on('finish', () => {
        const duration = (Date.now() - start) / 1000;
        // Prefer the matched route pattern so IDs in the URL don't create new label values
        const route = req.route?.path || req.path;
        const labels = {
            method: req.method,
            route,
            status_code: res.statusCode,
        };

        httpRequestsTotal.inc(labels);
        httpRequestDuration.observe(labels, duration);
    });

    next();
}

// app.js
import express from 'express';
import { registry, ordersCreated } from './metrics.js';
import { metricsMiddleware } from './middleware/metrics.js';
// getUsers and createOrder are your own application functions (not shown)

const app = express();

// Parse JSON request bodies so req.body is populated
app.use(express.json());

// Apply metrics middleware to all routes
app.use(metricsMiddleware);

// Expose metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
    res.set('Content-Type', registry.contentType);
    res.end(await registry.metrics());
});

// Your routes
app.get('/api/users', async (req, res) => {
    const users = await getUsers();
    res.json(users);
});

// Track business metrics
app.post('/api/orders', async (req, res, next) => {
    try {
        const order = await createOrder(req.body);
        ordersCreated.inc({ status: 'success' });
        res.json(order);
    } catch (err) {
        ordersCreated.inc({ status: 'error' });
        // Pass the error to Express; throwing from an async handler is not caught by Express 4
        next(err);
    }
});

// Port matches the 'app:3000' scrape target in prometheus.yml
app.listen(3000);

PromQL: Querying Metrics

PromQL is Prometheus’s query language. These are the queries you’ll use most:

# Request rate (requests per second over last 5 minutes)
rate(http_requests_total[5m])

# Error rate (percentage of 5xx responses)
# Both sides are summed so the division is not matched label by label
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100

# P95 latency (across all routes)
histogram_quantile(0.95,
  sum by (le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# P99 latency by route
histogram_quantile(0.99,
  sum by (route, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# Total requests in last hour
increase(http_requests_total[1h])

# Memory usage (MB)
process_resident_memory_bytes / 1024 / 1024

# CPU usage percentage
rate(process_cpu_seconds_total[5m]) * 100

# Event loop lag (Node.js)
nodejs_eventloop_lag_seconds

# Active connections
active_connections

# Orders per minute
rate(orders_created_total{status="success"}[1m]) * 60

Grafana Dashboards

Provisioning a Dashboard

# grafana/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    editable: false

Key Panels for an Application Dashboard

{
  "dashboard": {
    "title": "Application Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [{
          "expr": "sum(rate(http_requests_total[5m])) by (route)",
          "legendFormat": "{{route}}"
        }]
      },
      {
        "title": "Error Rate %",
        "type": "stat",
        "targets": [{
          "expr": "sum(rate(http_requests_total{status_code=~'5..'}[5m])) / sum(rate(http_requests_total[5m])) * 100"
        }],
        "thresholds": {
          "steps": [
            {"color": "green", "value": 0},
            {"color": "yellow", "value": 1},
            {"color": "red", "value": 5}
          ]
        }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
          "legendFormat": "p95"
        }]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [{
          "expr": "process_resident_memory_bytes / 1024 / 1024",
          "legendFormat": "RSS (MB)"
        }]
      }
    ]
  }
}

Import community dashboards: Grafana has thousands of pre-built dashboards at grafana.com/grafana/dashboards. Popular ones:

  • Node.js: Dashboard ID 11159
  • Node Exporter (system): Dashboard ID 1860
  • PostgreSQL: Dashboard ID 9628

Alerting

Alert Rules

# alerts/app.yml
groups:
  - name: application
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $value | humanizePercentage }}"
          description: "Error rate has been above 5% for 5 minutes"

      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 1 second"

      # Service down
      - alert: ServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"

      # High memory
      - alert: HighMemoryUsage
        expr: |
          process_resident_memory_bytes / 1024 / 1024 > 512
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 512MB: {{ $value | humanize }}MB"

Alertmanager Configuration

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      continue: true

receivers:
  - name: 'slack-notifications'
    slack_configs:
      # add your Slack api_url and channel here

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_KEY'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'

The Four Golden Signals

Monitor these four signals for any service:

| Signal | What it measures | Example metric |
|---|---|---|
| Latency | How long requests take | http_request_duration_seconds |
| Traffic | How much demand | http_requests_total |
| Errors | Rate of failed requests | http_requests_total{status_code=~"5.."} |
| Saturation | How “full” the service is | CPU, memory, queue depth |

Resources

  • [Prometheus Documentation](https://prometheus.io/docs/)
  • [Grafana Documentation](https://grafana.com/docs/)
  • [PromQL Cheat Sheet](https://promlabs.com/promql-cheat-sheet/)
  • [prom-client (Node.js)](https://github.com/siimon/prom-client)
  • [Grafana Dashboard Library](https://grafana.com/grafana/dashboards/)
  • [Google SRE: Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
