Introduction
Prometheus collects metrics; Grafana visualizes them. Together they give you dashboards, alerts, and the data to answer “is my system healthy?” and “why is it slow?” See Javascript Guide for more context.
Prerequisites: Docker and Docker Compose installed. Basic understanding of metrics concepts.
Quick Start with Docker Compose
# docker-compose.monitoring.yml
services:
  prometheus:
    image: prom/prometheus:latest
    ports:
      - "9090:9090"
    volumes:
      - ./prometheus.yml:/etc/prometheus/prometheus.yml
      # Alert rules referenced by rule_files in prometheus.yml
      - ./alerts:/etc/prometheus/alerts
      - prometheus-data:/prometheus
    command:
      - '--config.file=/etc/prometheus/prometheus.yml'
      - '--storage.tsdb.retention.time=30d'
  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    environment:
      GF_SECURITY_ADMIN_PASSWORD: admin
      GF_USERS_ALLOW_SIGN_UP: "false"
    volumes:
      - grafana-data:/var/lib/grafana
      - ./grafana/dashboards:/etc/grafana/provisioning/dashboards
      - ./grafana/datasources:/etc/grafana/provisioning/datasources
  alertmanager:
    image: prom/alertmanager:latest
    ports:
      - "9093:9093"
    volumes:
      - ./alertmanager.yml:/etc/alertmanager/alertmanager.yml
volumes:
  prometheus-data:
  grafana-data:
docker compose -f docker-compose.monitoring.yml up -d
# Prometheus: http://localhost:9090
# Grafana: http://localhost:3001 (admin/admin)
Prometheus Configuration
# prometheus.yml
global:
  scrape_interval: 15s      # how often to scrape targets (15s assumed here; the Prometheus default is 1m)
  evaluation_interval: 15s  # how often to evaluate rules
# Alert rules
rule_files:
  - "alerts/*.yml"
# Alertmanager
alerting:
  alertmanagers:
    - static_configs:
        - targets: ['alertmanager:9093']
# Scrape targets
scrape_configs:
  # Your Node.js application
  - job_name: 'myapp'
    static_configs:
      - targets: ['app:3000']
    metrics_path: '/metrics'
  # Node Exporter (system metrics: CPU, memory, disk)
  - job_name: 'node-exporter'
    static_configs:
      - targets: ['node-exporter:9100']
  # Prometheus itself
  - job_name: 'prometheus'
    static_configs:
      - targets: ['localhost:9090']
  # PostgreSQL exporter
  - job_name: 'postgres'
    static_configs:
      - targets: ['postgres-exporter:9187']
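To make the `metrics_path: '/metrics'` setting more concrete: on each scrape, Prometheus sends an HTTP GET to that path on every target and parses a plain-text response. The sketch below hand-renders one counter in that text format so you can see what a scrape returns. `renderCounter` and the sample values are hypothetical and only for illustration; in a real service a client library such as prom-client produces this output for you.

```javascript
// Hypothetical helper: render one counter family in the Prometheus text format.
function escapeLabelValue(value) {
  // Label values escape backslashes, double quotes, and newlines.
  return String(value)
    .replace(/\\/g, '\\\\')
    .replace(/"/g, '\\"')
    .replace(/\n/g, '\\n');
}

function renderCounter(name, help, samples) {
  const lines = [`# HELP ${name} ${help}`, `# TYPE ${name} counter`];
  for (const { labels, value } of samples) {
    const labelText = Object.entries(labels)
      .map(([key, val]) => `${key}="${escapeLabelValue(val)}"`)
      .join(',');
    lines.push(`${name}{${labelText}} ${value}`);
  }
  return lines.join('\n') + '\n';
}

// Made-up sample data, shaped like the counter defined later in this guide.
const body = renderCounter('http_requests_total', 'Total number of HTTP requests', [
  { labels: { method: 'GET', route: '/api/users', status_code: '200' }, value: 42 },
  { labels: { method: 'POST', route: '/api/orders', status_code: '500' }, value: 3 },
]);
console.log(body);
```

Each distinct combination of label values becomes its own line (its own time series), which is why labels with unbounded values, such as raw URLs or user IDs, are best avoided.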
Instrumenting Node.js
npm install prom-client
// metrics.js
import { Registry, Counter, Histogram, Gauge, collectDefaultMetrics } from 'prom-client';
const registry = new Registry();
// Collect default metrics (memory, CPU, event loop lag)
collectDefaultMetrics({ register: registry });
// HTTP request counter
export const httpRequestsTotal = new Counter({
  name: 'http_requests_total',
  help: 'Total number of HTTP requests',
  labelNames: ['method', 'route', 'status_code'],
  registers: [registry],
});
// HTTP request duration histogram
export const httpRequestDuration = new Histogram({
  name: 'http_request_duration_seconds',
  help: 'HTTP request duration in seconds',
  labelNames: ['method', 'route', 'status_code'],
  buckets: [0.005, 0.01, 0.025, 0.05, 0.1, 0.25, 0.5, 1, 2.5, 5],
  registers: [registry],
});
// Active connections gauge
export const activeConnections = new Gauge({
  name: 'active_connections',
  help: 'Number of active connections',
  registers: [registry],
});
// Business metric: orders created
export const ordersCreated = new Counter({
  name: 'orders_created_total',
  help: 'Total orders created',
  labelNames: ['status'],
  registers: [registry],
});
export { registry };
// middleware/metrics.js: Express middleware
import { httpRequestsTotal, httpRequestDuration } from '../metrics.js';
export function metricsMiddleware(req, res, next) {
  const start = Date.now();
  res.on('finish', () => {
    const duration = (Date.now() - start) / 1000;
    const route = req.route?.path || req.path;
    const labels = {
      method: req.method,
      route,
      status_code: res.statusCode,
    };
    httpRequestsTotal.inc(labels);
    httpRequestDuration.observe(labels, duration);
  });
  next();
}
// app.js
import express from 'express';
import { registry, ordersCreated } from './metrics.js';
import { metricsMiddleware } from './middleware/metrics.js';
const app = express();
// Apply metrics middleware to all routes
app.use(metricsMiddleware);
// Expose metrics endpoint for Prometheus to scrape
app.get('/metrics', async (req, res) => {
  res.set('Content-Type', registry.contentType);
  res.end(await registry.metrics());
});
// Your routes
app.get('/api/users', async (req, res) => {
  const users = await getUsers();
  res.json(users);
});
// Track business metrics
app.post('/api/orders', async (req, res, next) => {
  try {
    const order = await createOrder(req.body);
    ordersCreated.inc({ status: 'success' });
    res.json(order);
  } catch (err) {
    ordersCreated.inc({ status: 'error' });
    // Hand the error to Express instead of throwing inside an async handler
    next(err);
  }
});
PromQL: Querying Metrics
PromQL is Prometheus’s query language. These are the queries you’ll use most:
# Request rate (requests per second over last 5 minutes)
rate(http_requests_total[5m])
# Error rate (percentage of 5xx responses)
sum(rate(http_requests_total{status_code=~"5.."}[5m]))
  / sum(rate(http_requests_total[5m])) * 100
# P95 latency
histogram_quantile(0.95,
  rate(http_request_duration_seconds_bucket[5m])
)
# P99 latency by route
histogram_quantile(0.99,
  sum by (route, le) (
    rate(http_request_duration_seconds_bucket[5m])
  )
)
# Total requests in last hour
increase(http_requests_total[1h])
# Memory usage (MB)
process_resident_memory_bytes / 1024 / 1024
# CPU usage percentage
rate(process_cpu_seconds_total[5m]) * 100
# Event loop lag (Node.js)
nodejs_eventloop_lag_seconds
# Active connections
active_connections
# Orders per minute
rate(orders_created_total{status="success"}[1m]) * 60
Grafana Dashboards
Provisioning a Dashboard
# grafana/datasources/prometheus.yml
apiVersion: 1
datasources:
  - name: Prometheus
    type: prometheus
    url: http://prometheus:9090
    isDefault: true
    editable: false
Key Panels for an Application Dashboard
{
  "dashboard": {
    "title": "Application Overview",
    "panels": [
      {
        "title": "Request Rate",
        "type": "graph",
        "targets": [{
          "expr": "sum(rate(http_requests_total[5m])) by (route)",
          "legendFormat": "{{route}}"
        }]
      },
      {
        "title": "Error Rate %",
        "type": "stat",
        "targets": [{
          "expr": "sum(rate(http_requests_total{status_code=~'5..'}[5m])) / sum(rate(http_requests_total[5m])) * 100"
        }],
        "thresholds": {
          "steps": [
            {"color": "green", "value": 0},
            {"color": "yellow", "value": 1},
            {"color": "red", "value": 5}
          ]
        }
      },
      {
        "title": "P95 Latency",
        "type": "graph",
        "targets": [{
          "expr": "histogram_quantile(0.95, sum by (le) (rate(http_request_duration_seconds_bucket[5m])))",
          "legendFormat": "p95"
        }]
      },
      {
        "title": "Memory Usage",
        "type": "graph",
        "targets": [{
          "expr": "process_resident_memory_bytes / 1024 / 1024",
          "legendFormat": "RSS (MB)"
        }]
      }
    ]
  }
}
Import community dashboards: Grafana has thousands of pre-built dashboards at grafana.com/grafana/dashboards. Popular ones:
- Node.js: Dashboard ID `11159`
- Node Exporter (system): Dashboard ID `1860`
- PostgreSQL: Dashboard ID `9628`
Alerting
Alert Rules
# alerts/app.yml
groups:
  - name: application
    rules:
      # High error rate
      - alert: HighErrorRate
        expr: |
          sum(rate(http_requests_total{status_code=~"5.."}[5m]))
          / sum(rate(http_requests_total[5m])) > 0.05
        for: 5m
        labels:
          severity: critical
        annotations:
          summary: "High error rate: {{ $value | humanizePercentage }}"
          description: "Error rate has been above 5% for 5 minutes"
      # High latency
      - alert: HighLatency
        expr: |
          histogram_quantile(0.95,
            sum by (le) (rate(http_request_duration_seconds_bucket[5m]))
          ) > 1.0
        for: 5m
        labels:
          severity: warning
        annotations:
          summary: "P95 latency above 1 second"
      # Service down
      - alert: ServiceDown
        expr: up{job="myapp"} == 0
        for: 1m
        labels:
          severity: critical
        annotations:
          summary: "Application is down"
      # High memory
      - alert: HighMemoryUsage
        expr: |
          process_resident_memory_bytes / 1024 / 1024 > 512
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: "Memory usage above 512MB: {{ $value | humanize }}MB"
Alertmanager Configuration
# alertmanager.yml
global:
  resolve_timeout: 5m
route:
  group_by: ['alertname', 'severity']
  group_wait: 30s
  group_interval: 5m
  repeat_interval: 4h
  receiver: 'slack-notifications'
  routes:
    - match:
        severity: critical
      receiver: 'pagerduty'
      continue: true
receivers:
  - name: 'slack-notifications'
    slack_configs:
      - api_url: 'https://hooks.slack.com/services/YOUR/WEBHOOK/URL'
        channel: '#alerts'
        title: '{{ .GroupLabels.alertname }}'
        text: '{{ range .Alerts }}{{ .Annotations.summary }}{{ end }}'
  - name: 'pagerduty'
    pagerduty_configs:
      - routing_key: 'YOUR_PAGERDUTY_KEY'
        description: '{{ .GroupLabels.alertname }}: {{ .CommonAnnotations.summary }}'
The Four Golden Signals
Monitor these four metrics for any service:
| Signal | What it measures | Example metric |
|---|---|---|
| Latency | How long requests take | http_request_duration_seconds |
| Traffic | How much demand | http_requests_total |
| Errors | Rate of failed requests | http_requests_total{status_code=~"5.."} |
| Saturation | How “full” the service is | CPU, memory, queue depth |
Resources
- [Prometheus Documentation](https://prometheus.io/docs/)
- [Grafana Documentation](https://grafana.com/docs/)
- [PromQL Cheat Sheet](https://promlabs.com/promql-cheat-sheet/)
- [prom-client (Node.js)](https://github.com/siimon/prom-client)
- [Grafana Dashboard Library](https://grafana.com/grafana/dashboards/)
- [Google SRE: Four Golden Signals](https://sre.google/sre-book/monitoring-distributed-systems/#xref_monitoring_golden-signals)
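A closing note on the latency queries and the latency alert in this guide: `histogram_quantile` does not see individual request durations. It only sees cumulative bucket counts, and it estimates a quantile by interpolating linearly inside the bucket that contains the target rank. The sketch below approximates that calculation in plain JavaScript. The function name `estimateQuantile` and the bucket counts are made up for illustration, and the code simplifies several edge cases, so treat it as a rough model rather than Prometheus's actual implementation.

```javascript
// Rough model of histogram_quantile over cumulative buckets.
// buckets: [{ le, count }] sorted by le ascending; counts are cumulative
// and the last bucket has le = Infinity (the "+Inf" bucket).
function estimateQuantile(q, buckets) {
  const total = buckets[buckets.length - 1].count;
  if (total === 0) return NaN;
  const rank = q * total;
  // First bucket whose cumulative count reaches the target rank.
  const idx = buckets.findIndex((bucket) => bucket.count >= rank);
  if (buckets[idx].le === Infinity) {
    // Rank falls in the +Inf bucket: report the highest finite bound.
    return buckets[buckets.length - 2].le;
  }
  const lowerBound = idx === 0 ? 0 : buckets[idx - 1].le;
  const countBelow = idx === 0 ? 0 : buckets[idx - 1].count;
  const countInBucket = buckets[idx].count - countBelow;
  const fraction = (rank - countBelow) / countInBucket;
  return lowerBound + (buckets[idx].le - lowerBound) * fraction;
}

// Made-up cumulative counts for 100 observed requests.
const sampleBuckets = [
  { le: 0.1, count: 50 },
  { le: 0.25, count: 80 },
  { le: 0.5, count: 95 },
  { le: 1, count: 99 },
  { le: Infinity, count: 100 },
];

const p50 = estimateQuantile(0.5, sampleBuckets);
const p95 = estimateQuantile(0.95, sampleBuckets);
console.log({ p50, p95 });
```

With these counts the p95 estimate comes out at about 0.5 seconds, the upper bound of the bucket holding the 95th request. The estimate can never be more precise than the bucket boundaries, so it helps to place boundaries close to the latencies you alert on (for example, a bucket at `1` for a 1-second threshold).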