Monitoring with Prometheus and Grafana: A Practical Setup Guide

Introduction

Prometheus collects metrics; Grafana visualizes them. Together they give you dashboards, alerts, and the data to answer “is my system healthy?” and “why is it slow?” See the JavaScript Guide for more context.

Prerequisites: Docker and Docker Compose installed. Basic understanding of metrics concepts.

Quick Start with Docker Compose

# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_USERS_ALLOW_SIGN_UP: "false"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources

  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml

volumes:
  prometheus-data:
  grafana-data:

docker compose -f docker-compose.monitoring.yml up -d
# Prometheus: http://localhost:9090
# Grafana:    http://localhost:3001 (admin/admin)
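
Once the containers are up, it can help to confirm that each service answers before moving on. The following is a minimal sketch, assuming Node.js 18+ (for the global `fetch`) and the host ports from the compose file above. The file name `check-stack.js` and the function names `buildHealthChecks` and `checkStack` are hypothetical; the paths `/-/ready` (Prometheus, Alertmanager) and `/api/health` (Grafana) are the health endpoints those services expose.

```javascript
// check-stack.js: hypothetical helper, not part of the stack itself.

// Health endpoints for each service, using the host ports from the compose file.
function buildHealthChecks(host = 'localhost') {
    return [
        { name: 'prometheus', url: `http://${host}:9090/-/ready` },
        { name: 'grafana', url: `http://${host}:3001/api/health` },
        { name: 'alertmanager', url: `http://${host}:9093/-/ready` },
    ];
}

// Probe every endpoint in turn. fetchImpl is injectable so the logic
// can be exercised without a running stack.
async function checkStack(checks, fetchImpl = globalThis.fetch) {
    const results = [];
    for (const { name, url } of checks) {
        try {
            const res = await fetchImpl(url);
            results.push({ name, url, ok: res.ok });
        } catch (err) {
            // Connection refused, DNS failure, and similar network errors.
            results.push({ name, url, ok: false, error: err.message });
        }
    }
    return results;
}

// Usage:
// checkStack(buildHealthChecks()).then((results) => console.table(results));
```

A service that reports `ok: false` right after startup may simply still be initializing, so retrying after a few seconds is reasonable before digging into logs.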

Prometheus Configuration

# prometheus.yml
global:
  scrape_interval: 15s       # how often to scrape targets
  evaluation_interval: 15s   # how often to evaluate rules

# Alert rules
rule_files:
  - "alerts/*.yml"

# Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']

# Scrape targets
scrape_configs:
  # Your Node.js application
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:3000']
    metrics_path: '/metrics'

  # Node Exporter (system metrics: CPU, memory, disk)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']

  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']

  # PostgreSQL exporter
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']

Instrumenting Node.js

npm install prom-client

// metrics.js
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';

const registry = new Registry();

// Collect default metrics (memory, CPU, event loop lag)
collectDefaultMetrics({ register: registry });

// HTTP request counter
export const httpRequestsTotal = new Counter({
    name: 'http_requests_total',
    help: 'Total number of HTTP requests',
    labelNames: ['method', 'route', 'status_code'],
    registers: [registry],
});

// HTTP request duration histogram
export const httpRequestDuration = new Histogram({
    name: 'http_request_duration_seconds',
    help: 'HTTP request duration in seconds',
    labelNames: ['method', 'route', 'status_code'],
    buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
    registers: [registry],
});

// Active connections gauge
export const activeConnections = new Gauge({
    name: 'active_connections',
    help: 'Number of active connections',
    registers: [registry],
});

// Business metric: orders created
export const ordersCreated = new Counter({
    name: 'orders_created_total',
    help: 'Total orders created',
    labelNames: ['status'],
    registers: [registry],
});

export { registry };

// middleware/metrics.js: Express middleware
import { httpRequestsTotal, httpRequestDuration } from '../metrics.js';

export function metricsMiddleware(req, res, next) {
    const start = Date.now();

    // Record the metrics once the response has been sent
    res.on('finish', () => {
        const duration = (Date.now() - start) / 1000;
        const route = req.route?.path || req.path;
        const labels = {
            method: req.method,
            route,
            status_code: res.statusCode,
        };

        httpRequestsTotal.inc(labels);
        httpRequestDuration.observe(labels, duration);
    });

    next();
}

// app.js
import express from 'express';
import { registry, ordersCreated } from './metrics.js';
import { metricsMiddleware } from './middleware/metrics.js';

const app = express();

// Parse JSON request bodies so req.body is populated
app.use(express.json());

// Apply metrics middleware to all routes
app.use(metricsMiddleware);

// Expose metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
    res.set('Content-Type', registry.contentType);
    res.end(await registry.metrics());
});

// Your routes (getUsers and createOrder are your own application functions)
app.get('/api/users', async (req, res) => {
    const users = await getUsers();
    res.json(users);
});

// Track business metrics
app.post('/api/orders', async (req, res, next) => {
    try {
        const order = await createOrder(req.body);
        ordersCreated.inc({ status: 'success' });
        res.json(order);
    } catch (err) {
        ordersCreated.inc({ status: 'error' });
        // Pass the error to Express; throwing inside an async handler
        // is an unhandled rejection in Express 4
        next(err);
    }
});

PromQL: Querying Metrics

PromQL is Prometheus’s query language. These are the queries you’ll use most:

# Request rate (requests per second over last 5 minutes)
rate(http_requests_total[5m])

# Error rate (percentage of 5xx responses)
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
/ sum(rate(http_requests_total[5m])) * 100

# P95 latency
histogram_quantile(0.95,
  sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
)

# P99 latency by route
histogram_quantile(0.99,
  sum by (route, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)

# Total requests in last hour
increase(http_requests_total[1h])

# Memory usage (MB)
process_resident_memory_bytes / 1024 / 1024

# CPU usage percentage
rate(process_cpu_seconds_total[5m]) * 100

# Event loop lag (Node.js)
nodejs_eventloop_lag_seconds

# Active connections
active_connections

# Orders per minute
rate(orders_created_total{status="success"}[1m]) * 60

Grafana Dashboards

Provisioning a Dashboard

# grafana/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    editable: false

Key Panels for an Application Dashboard

{
  "dashboard": {
    "title": "Application Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [{
          "expr": "sum(rate(http_requests_total[5m])) by (route)",
          "legendFormat": "{{route}}"
        }]
      },
      {
        "title": "Error Rate %",
        "type": "stat",
        "targets": [{
          "expr": "sum(rate(http_requests_total{status_code=~'5..'}[5m])) / sum(rate(http_requests_total[5m])) * 100"
        }],
        "thresholds": {
          "steps": [
            {"color": "green", "value": 0},
            {"color": "yellow", "value": 1},
            {"color": "red", "value": 5}
          ]
        }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
          "legendFormat": "p95"
        }]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [{
          "expr": "process_resident_memory_bytes / 1024 / 1024",
          "legendFormat": "RSS (MB)"
        }]
      }
    ]
  }
}

Import community dashboards: Grafana has thousands of pre-built dashboards at grafana.com/grafana/dashboards. Popular ones:

Node.js: Dashboard ID 11159
Node Exporter (system): Dashboard ID 1860
PostgreSQL: Dashboard ID 9628

Alerting

Alert Rules

# alerts/app.yml
groups:
  - name: application
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $value | humanizePercentage }}"
          description: "Error rate has been above 5% for 5 minutes"

      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 1 second"

      # Service down
      - alert: ServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"

      # High memory
      - alert: HighMemoryUsage
        expr: |
          process_resident_memory_bytes / 1024 / 1024 > 512
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 512MB: {{ $value | humanize }}MB"

Alertmanager Configuration

# alertmanager.yml
global:
  resolve_timeout: 5m

route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'

  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      continue: true

receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'

  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_KEY'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'

The Four Golden Signals

Monitor these four signals for any service:

| Signal | What it measures | Example metric |
| --- | --- | --- |
| Latency | How long requests take | `http_request_duration_seconds` |
| Traffic | How much demand | `http_requests_total` |
| Errors | Rate of failed requests | `http_requests_total{status_code=~"5.."}` |
| Saturation | How “full” the service is | CPU, memory, queue depth |

Resources

- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)
- [PromQL Cheat Sheet](https://promlabs.com/promql-cheat-sheet/)
- [prom-client (Node.js)](https://github.com/siimon/prom-client)
- [Grafana Dashboard Library](https://grafana.com/grafana/dashboards/)
- [Google SRE: Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
