Introduction
When you have multiple services running across multiple servers, ssh-ing into each one to read logs doesn’t scale. Centralized logging aggregates all logs into one place where you can search, filter, and alert across your entire system.
Two main options:
- ELK Stack (Elasticsearch + Logstash + Kibana) – powerful, feature-rich, higher resource usage
- Grafana Loki – lightweight, designed for Kubernetes, integrates with Grafana
Structured Logging First
Before setting up log aggregation, make your logs machine-parseable. Structured (JSON) logs are far easier to query than plain text:
// BAD: unstructured – hard to query
console.log('User 42 logged in from 192.168.1.1 at 2026-03-30T10:00:00Z');

// GOOD: structured JSON – easy to filter and aggregate
logger.info('User login', {
  userId: 42,
  ip: '192.168.1.1',
  timestamp: new Date().toISOString(),
  userAgent: req.headers['user-agent'],
});
Winston (Node.js)
npm install winston
// logger.js
import winston from 'winston';

const logger = winston.createLogger({
  level: process.env.LOG_LEVEL || 'info',
  format: winston.format.combine(
    winston.format.timestamp(),
    winston.format.errors({ stack: true }),
    winston.format.json()
  ),
  defaultMeta: {
    service: 'api-server',
    version: process.env.APP_VERSION,
    environment: process.env.NODE_ENV,
  },
  transports: [
    // Console output (for Docker/Kubernetes log collection)
    new winston.transports.Console(),
    // File output (optional)
    new winston.transports.File({
      filename: '/var/log/app/error.log',
      level: 'error',
    }),
  ],
});

export default logger;
// Usage
logger.info('Request received', {
  method: req.method,
  path: req.path,
  userId: req.user?.id,
  requestId: req.id,
});

logger.error('Database query failed', {
  error: err.message,
  stack: err.stack,
  query: sql,
  params,
});

logger.warn('Rate limit approaching', {
  userId: req.user.id,
  remaining: rateLimit.remaining,
  resetAt: rateLimit.resetAt,
});
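Each call above emits a single JSON object with defaultMeta merged under the call-site fields. A rough sketch of the shape that winston's combine(timestamp(), json()) produces (field merging only; formatEntry is an illustrative name, not a winston API):

```javascript
// Approximate shape of one emitted entry: level + message + timestamp,
// with defaultMeta spread first so call-site fields can override it.
function formatEntry(level, message, defaultMeta, meta = {}) {
  return {
    level,
    message,
    timestamp: new Date().toISOString(),
    ...defaultMeta,
    ...meta, // call-site metadata wins on key collisions
  };
}

const entry = formatEntry(
  'info',
  'Order created',
  { service: 'api-server', environment: 'production' },
  { orderId: 'ord_1', userId: 42 }
);

console.log(entry.service); // 'api-server'
```

This merge order is why every log line automatically carries service, version, and environment without repeating them at each call site.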
Request Logging Middleware
// middleware/requestLogger.js
import { v4 as uuidv4 } from 'uuid';
import logger from '../logger.js';

export function requestLogger(req, res, next) {
  req.id = req.headers['x-request-id'] || uuidv4();
  res.setHeader('X-Request-ID', req.id);
  const start = Date.now();

  res.on('finish', () => {
    logger.info('HTTP request', {
      requestId: req.id,
      method: req.method,
      path: req.path,
      statusCode: res.statusCode,
      duration: Date.now() - start,
      userId: req.user?.id,
      ip: req.ip,
      userAgent: req.headers['user-agent'],
    });
  });

  next();
}
ELK Stack Setup
Docker Compose
# docker-compose.elk.yml
services:
  elasticsearch:
    image: docker.elastic.co/elasticsearch/elasticsearch:8.12.0
    environment:
      - discovery.type=single-node
      - xpack.security.enabled=false
      - "ES_JAVA_OPTS=-Xms512m -Xmx512m"
    ports:
      - "9200:9200"
    volumes:
      - elasticsearch-data:/usr/share/elasticsearch/data
    healthcheck:
      test: ["CMD-SHELL", "curl -f http://localhost:9200/_cluster/health || exit 1"]
      interval: 30s
      timeout: 10s
      retries: 5

  logstash:
    image: docker.elastic.co/logstash/logstash:8.12.0
    ports:
      - "5044:5044"  # Beats input
      - "5000:5000"  # TCP input
    volumes:
      - ./logstash/pipeline:/usr/share/logstash/pipeline
    depends_on:
      elasticsearch:
        condition: service_healthy

  kibana:
    image: docker.elastic.co/kibana/kibana:8.12.0
    ports:
      - "5601:5601"
    environment:
      ELASTICSEARCH_HOSTS: http://elasticsearch:9200
    depends_on:
      elasticsearch:
        condition: service_healthy

volumes:
  elasticsearch-data:
Logstash Pipeline
# logstash/pipeline/app.conf
input {
  # Receive JSON logs over TCP
  tcp {
    port => 5000
    codec => json_lines
  }
  # Receive from Filebeat
  beats {
    port => 5044
  }
}

filter {
  # Parse timestamp
  date {
    match => ["timestamp", "ISO8601"]
    target => "@timestamp"
  }
  # Add geo info from IP
  if [ip] {
    geoip {
      source => "ip"
      target => "geoip"
    }
  }
  # Remove internal fields
  mutate {
    remove_field => ["@version", "host"]
  }
}

output {
  elasticsearch {
    hosts => ["elasticsearch:9200"]
    index => "app-logs-%{+YYYY.MM.dd}"
  }
  # Debug: also print to stdout
  # stdout { codec => rubydebug }
}
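The json_lines codec on the TCP input expects newline-delimited JSON: one complete JSON object per line. A sketch of that framing (the helper name is illustrative; over a real connection you would write the result to port 5000 with Node's net module):

```javascript
// json_lines framing: one JSON document per line, newline-terminated.
function encodeJsonLines(events) {
  return events.map((e) => JSON.stringify(e)).join('\n') + '\n';
}

const payload = encodeJsonLines([
  { level: 'info', message: 'User login', userId: 42, timestamp: '2026-03-30T10:00:00Z' },
  { level: 'error', message: 'Database query failed', timestamp: '2026-03-30T10:00:01Z' },
]);

// Each line parses back into the original event.
const lines = payload.trim().split('\n').map((l) => JSON.parse(l));
console.log(lines.length); // 2
```

The trailing newline matters: Logstash treats the newline as the event delimiter, so a line that is not terminated sits in the codec's buffer until more data arrives.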
Send Logs from Node.js to Logstash
npm install winston-logstash-transport
// Add a Logstash transport alongside the console transport
import winston from 'winston';
import WinstonLogstash from 'winston-logstash-transport';

const logger = winston.createLogger({
  transports: [
    new winston.transports.Console(),
    new WinstonLogstash({
      host: process.env.LOGSTASH_HOST || 'logstash',
      port: 5000,
      ssl_enable: false,
    }),
  ],
});
Grafana Loki (Lightweight Alternative)
Loki is designed for Kubernetes and works natively with Grafana. It indexes only labels (not the full log content), making it much cheaper to run than Elasticsearch.
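Promtail normally does the shipping, but Loki's ingestion endpoint (POST /loki/api/v1/push) is plain JSON: streams keyed by a label set, with values as [nanosecond-timestamp, line] string pairs. A sketch of building that payload (the label names are assumptions matching the Promtail config further down; buildLokiPush is an illustrative helper, not a library function):

```javascript
// Build a Loki push payload. Only the label set is indexed; the log line
// itself is stored unindexed, which is what keeps Loki cheap to run.
function buildLokiPush(labels, events) {
  return {
    streams: [
      {
        stream: labels, // e.g. { service: 'api-server' }
        values: events.map(({ ts, line }) => [
          (BigInt(ts) * 1000000n).toString(), // epoch ms -> ns, as a string
          line,
        ]),
      },
    ],
  };
}

const payload = buildLokiPush(
  { service: 'api-server', container: 'api-1' },
  [{ ts: 1767139200000, line: JSON.stringify({ level: 'info', message: 'Order created' }) }]
);

// POST JSON.stringify(payload) to http://loki:3100/loki/api/v1/push
console.log(payload.streams[0].values[0][0]); // '1767139200000000000'
```

Keep the label set small and low-cardinality (service, environment, container): every distinct label combination creates a new stream, and high-cardinality labels such as userId belong in the log line, not the labels.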
Docker Compose
# docker-compose.loki.yml
services:
  loki:
    image: grafana/loki:latest
    ports:
      - "3100:3100"
    volumes:
      - ./loki-config.yml:/etc/loki/local-config.yaml
      - loki-data:/loki
    command: -config.file=/etc/loki/local-config.yaml

  promtail:
    image: grafana/promtail:latest
    volumes:
      - /var/log:/var/log:ro
      - /var/lib/docker/containers:/var/lib/docker/containers:ro
      - /var/run/docker.sock:/var/run/docker.sock  # needed for docker_sd_configs below
      - ./promtail-config.yml:/etc/promtail/config.yml
    command: -config.file=/etc/promtail/config.yml

  grafana:
    image: grafana/grafana:latest
    ports:
      - "3001:3000"
    volumes:
      - grafana-data:/var/lib/grafana

volumes:
  loki-data:
  grafana-data:
# promtail-config.yml – collect Docker container logs
server:
  http_listen_port: 9080

positions:
  filename: /tmp/positions.yaml  # where Promtail tracks read offsets

clients:
  - url: http://loki:3100/loki/api/v1/push

scrape_configs:
  - job_name: docker
    docker_sd_configs:
      - host: unix:///var/run/docker.sock
        refresh_interval: 5s
    relabel_configs:
      - source_labels: ['__meta_docker_container_name']
        target_label: container
      - source_labels: ['__meta_docker_container_label_com_docker_compose_service']
        target_label: service
Querying Loki with LogQL
# All logs from the api-server service
{service="api-server"}
# Error logs only
{service="api-server"} |= "error"
# JSON parsing and filtering
{service="api-server"} | json | statusCode >= 500
# Count errors per minute
sum(rate({service="api-server"} |= "error" [1m])) by (service)
# P95 latency from structured logs
quantile_over_time(0.95,
{service="api-server"} | json | unwrap duration [5m]
) by (path)
# Logs from multiple services
{service=~"api-server|worker"}
# Logs for a specific user (the time range, e.g. last 15 minutes, is set by the query window)
{service="api-server"} | json | userId = "42"
Kibana Queries (ELK)
# KQL (Kibana Query Language)
# Find all errors
statusCode: 500 OR statusCode: 503
# Find slow requests
duration > 1000
# Find specific user's requests
userId: "42" AND statusCode: 200
# Find errors in last hour
@timestamp > now-1h AND statusCode >= 500
# Full text search
"database connection failed"
# Wildcard
path: /api/users/*
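When you need these filters outside Kibana, say in a script or a health report, the same conditions can be sent straight to Elasticsearch as a query DSL body. A sketch of the "errors in the last hour" query, to be POSTed to /app-logs-*/_search (matching the index pattern from the Logstash output above):

```javascript
// Query DSL equivalent of: @timestamp > now-1h AND statusCode >= 500
const body = {
  query: {
    bool: {
      filter: [
        { range: { '@timestamp': { gt: 'now-1h' } } }, // Elasticsearch date math
        { range: { statusCode: { gte: 500 } } },
      ],
    },
  },
  sort: [{ '@timestamp': 'desc' }],
  size: 100,
};

// e.g. fetch('http://elasticsearch:9200/app-logs-*/_search',
//   { method: 'POST', headers: { 'Content-Type': 'application/json' },
//     body: JSON.stringify(body) })
console.log(body.query.bool.filter.length); // 2
```

Using filter rather than must skips relevance scoring, which is what you want for log queries: faster, and the clauses are cacheable.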
Log Levels and When to Use Them
// ERROR: something failed that needs attention
logger.error('Payment processing failed', {
  orderId,
  error: err.message,
  userId,
});

// WARN: unexpected but handled – investigate later
logger.warn('Deprecated API endpoint called', {
  path: req.path,
  userId: req.user?.id,
});

// INFO: normal operations, key business events
logger.info('Order created', {
  orderId: order.id,
  userId: order.userId,
  total: order.total,
});

// DEBUG: detailed diagnostic info (disabled in production)
logger.debug('Cache miss', {
  key: cacheKey,
  ttl: 3600,
});
Production log level: set LOG_LEVEL=info in production; debug generates far too much volume (and storage cost) for steady-state use.
Log Retention and Cost
Logs grow fast. Plan your retention:
# Elasticsearch ILM (Index Lifecycle Management)
# Hot: 0-7 days (fast SSD, full indexing)
# Warm: 7-30 days (slower storage, read-only)
# Cold: 30-90 days (cheapest storage)
# Delete: after 90 days
# Loki retention (loki-config.yml)
limits_config:
  retention_period: 30d  # delete logs older than 30 days

compactor:
  retention_enabled: true
Cost optimization:
- Only log what you’ll actually query
- Use sampling for high-volume debug logs
- Archive to S3/GCS for compliance, query rarely
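The sampling bullet above can be a small wrapper in front of the logger. A sketch of 1-in-N sampling for debug events (the counter-based approach and the makeSampler name are illustrative, not a winston feature):

```javascript
// Keep roughly 1 in n debug events; never drop info/warn/error.
function makeSampler(n) {
  let count = 0;
  return function shouldLog(level) {
    if (level !== 'debug') return true;
    count += 1;
    return count % n === 1; // pass the 1st, (n+1)th, (2n+1)th, ... debug event
  };
}

const shouldLog = makeSampler(100);

let kept = 0;
for (let i = 0; i < 1000; i++) {
  if (shouldLog('debug')) kept += 1;
}
console.log(kept); // 10: one in every hundred debug events survives
```

If you sample, log the sampling rate as a field on each surviving entry so that counts in your dashboards can be scaled back up.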
Alerting on Logs
# Grafana alert on error rate from Loki
- alert: HighErrorRate
  expr: |
    sum(rate({service="api-server"} |= "error" [5m])) > 10
  for: 5m
  annotations:
    summary: "More than 10 errors/second for 5 minutes"
Resources
- Elasticsearch Documentation
- Kibana Documentation
- Grafana Loki
- LogQL Reference
- Winston (Node.js)
- Pino (faster Node.js logger)