Introduction
Good logs are invaluable when debugging production issues. Bad logs are noise that obscures the signal. In modern distributed systems, logs are often the primary way to understand what’s happening when things go wrong.
This comprehensive guide covers logging best practices that make debugging easier and systems more observable. You’ll learn about log levels, structured logging, correlation IDs, log management systems, and practical implementation patterns.
Whether you’re building a small web application or a large distributed system, effective logging is essential for understanding behavior, debugging issues, and maintaining reliable systems.
Understanding Log Levels
The Right Level for Every Message
Log levels help categorize the importance and severity of log messages. Using them consistently makes it easy to filter and find relevant logs.
```
┌───────────────────────────────────────────────────────────────┐
│                     Log Level Hierarchy                       │
├───────────────────────────────────────────────────────────────┤
│                                                               │
│   CRITICAL (50) - System cannot function                      │
│        |                                                      │
│   ERROR (40)    - Operation failed                            │
│        |                                                      │
│   WARNING (30)  - Unexpected but handled                      │
│        |                                                      │
│   INFO (20)     - Normal operations                           │
│        |                                                      │
│   DEBUG (10)    - Detailed debugging information              │
│                                                               │
└───────────────────────────────────────────────────────────────┘
```
When to Use Each Level
| Level | Use For | Examples |
|---|---|---|
| DEBUG | Detailed diagnostic info | Variable values, loop iterations, function calls |
| INFO | Normal operations | User actions, API requests, business events |
| WARNING | Recoverable issues | Retries, deprecations, rate limits |
| ERROR | Failures | Exceptions, timeouts, validation errors |
| CRITICAL | System failures | Database down, cannot start |
Implementation Examples
```python
import logging
import sys

def setup_logger(name: str, level: int = logging.INFO) -> logging.Logger:
    """Set up a structured logger."""
    logger = logging.getLogger(name)
    logger.setLevel(level)

    # Console handler
    handler = logging.StreamHandler(sys.stdout)
    handler.setLevel(level)

    # Formatter
    formatter = logging.Formatter(
        '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
    )
    handler.setFormatter(formatter)
    logger.addHandler(handler)

    return logger

# Example usage
logger = setup_logger(__name__)

# DEBUG - Detailed debugging info
def process_items(items: list):
    logger.debug(f"Starting to process {len(items)} items")
    for i, item in enumerate(items):
        logger.debug(f"Processing item {i+1}/{len(items)}: {item.id}")
        # Process item
        result = process_item(item)
        logger.debug(f"Item {item.id} processed: {result}")
    logger.debug(f"Completed processing {len(items)} items")

# INFO - Normal operations
def user_login(user_id: str, ip_address: str):
    logger.info(f"User {user_id} logged in from {ip_address}")

def place_order(order_id: str, user_id: str):
    logger.info(f"Order {order_id} placed by user {user_id}")

# WARNING - Recoverable issues
def api_request_with_retry(url: str, max_retries: int = 3):
    for attempt in range(max_retries):
        try:
            return make_request(url)
        except ConnectionError as e:
            if attempt < max_retries - 1:
                logger.warning(
                    f"Request to {url} failed (attempt {attempt+1}/{max_retries}): {e}"
                )
            else:
                raise

# ERROR - Failures
def process_order(order_id: str):
    try:
        order = get_order(order_id)
        validate_order(order)
        charge_customer(order)
        fulfill_order(order)
    except ValidationError as e:
        logger.error(f"Order {order_id} validation failed: {e}")
    except PaymentError as e:
        logger.error(f"Payment failed for order {order_id}: {e}")
    except Exception as e:
        logger.error(f"Unexpected error processing order {order_id}: {e}")
        raise

# CRITICAL - System failures
def connect_to_database():
    try:
        db.connect()
    except DatabaseUnavailableError:
        logger.critical("Database unavailable - cannot serve requests")
        sys.exit(1)
```
What NOT to Log
```python
# ❌ NEVER log these:

# Sensitive data
logger.info(f"User password: {user.password}")  # SECURITY RISK!
logger.info(f"Credit card: {credit_card_number}")

# PII that isn't necessary
logger.info(f"User SSN: {ssn}")
logger.info(f"Medical records: {patient_data}")

# Secrets and keys
logger.debug(f"API key: {api_key}")
logger.debug(f"Token: {access_token}")

# Excessive debug info
for item in huge_list:
    logger.debug(f"Item: {item}")  # Will flood logs!

# ✅ DO log these:

# Action being performed
logger.info(f"Processing order {order_id}")

# Context (not sensitive parts)
logger.info(f"User {user_id} action failed")

# Error details (without sensitive data)
logger.error(f"Payment failed: {error_code}")
```
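One way to enforce the "never log sensitive data" rule mechanically is to scrub payloads before they reach the logger. A minimal sketch, assuming an illustrative key list (`SENSITIVE_KEYS` and `redact` are made-up names, not a standard API):

```python
# Keys whose values must never reach the logs. This list is illustrative;
# extend it to match your own domain.
SENSITIVE_KEYS = {"password", "token", "api_key", "ssn", "credit_card"}

def redact(data: dict) -> dict:
    """Return a copy of data with sensitive values masked."""
    clean = {}
    for key, value in data.items():
        if key.lower() in SENSITIVE_KEYS:
            clean[key] = "***REDACTED***"
        elif isinstance(value, dict):
            # Recurse so nested payloads are scrubbed too
            clean[key] = redact(value)
        else:
            clean[key] = value
    return clean

payload = {"user_id": "user_123", "password": "hunter2",
           "card": {"credit_card": "4111-1111-1111-1111"}}
print(redact(payload))
# {'user_id': 'user_123', 'password': '***REDACTED***', 'card': {'credit_card': '***REDACTED***'}}
```

Running every structured log payload through a helper like this is cheaper than auditing each call site by hand.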
Structured Logging
Why Structured Logs Matter
Structured logs use a consistent format (typically JSON) that makes them easy to parse, search, and analyze:
```python
import json
import logging
from datetime import datetime, timezone

class StructuredFormatter(logging.Formatter):
    """Format logs as JSON."""

    def format(self, record: logging.LogRecord) -> str:
        log_data = {
            'timestamp': datetime.now(timezone.utc).isoformat(),
            'level': record.levelname,
            'logger': record.name,
            'message': record.getMessage(),
            'module': record.module,
            'function': record.funcName,
            'line': record.lineno
        }

        # Add exception info if present
        if record.exc_info:
            log_data['exception'] = self.formatException(record.exc_info)

        # Add extra fields
        if hasattr(record, 'user_id'):
            log_data['user_id'] = record.user_id
        if hasattr(record, 'request_id'):
            log_data['request_id'] = record.request_id

        return json.dumps(log_data)

# Usage
logger = logging.getLogger('structured')
handler = logging.StreamHandler()
handler.setFormatter(StructuredFormatter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

# Now logs will be JSON:
# {"timestamp": "2026-03-13T10:00:00+00:00", "level": "INFO", "logger": "structured", ...}
```
Structured Log Examples
```python
import logging
from dataclasses import dataclass
from typing import Optional, Dict, Any

@dataclass
class LogContext:
    """Context for structured logging."""
    user_id: Optional[str] = None
    request_id: Optional[str] = None
    session_id: Optional[str] = None
    correlation_id: Optional[str] = None
    extra: Optional[Dict[str, Any]] = None

class StructuredLogger:
    """Logger with structured logging support."""

    def __init__(self, name: str):
        self.logger = logging.getLogger(name)

    def log(self, level: int, message: str, context: Optional[LogContext] = None, **kwargs):
        """Log with structured context."""
        extra = {}
        if context:
            if context.user_id:
                extra['user_id'] = context.user_id
            if context.request_id:
                extra['request_id'] = context.request_id
            if context.correlation_id:
                extra['correlation_id'] = context.correlation_id
            if context.extra:
                extra.update(context.extra)

        # Add any additional fields
        extra.update(kwargs)

        self.logger.log(level, message, extra=extra)

    def info(self, message: str, context: Optional[LogContext] = None, **kwargs):
        self.log(logging.INFO, message, context, **kwargs)

    def error(self, message: str, context: Optional[LogContext] = None, **kwargs):
        self.log(logging.ERROR, message, context, **kwargs)

# Usage
logger = StructuredLogger('myapp')

# Structured log with context
logger.info(
    "Order placed",
    context=LogContext(
        user_id="user_123",
        request_id="req_abc",
        correlation_id="corr_xyz"
    ),
    order_id="order_456",
    amount=99.99,
    currency="USD"
)

# Output (with a JSON formatter attached):
# {"message": "Order placed", "user_id": "user_123", "request_id": "req_abc",
#  "order_id": "order_456", "amount": 99.99, ...}
```
Correlation IDs
Tracking Requests Across Systems
Correlation IDs help trace requests across services in distributed systems:
```python
import logging
import uuid
from contextvars import ContextVar
from functools import wraps

# Store correlation ID in context
correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='')

def generate_correlation_id() -> str:
    """Generate a new correlation ID."""
    return str(uuid.uuid4())

def get_correlation_id() -> str:
    """Get current correlation ID or generate a new one."""
    corr_id = correlation_id_var.get()
    if not corr_id:
        corr_id = generate_correlation_id()
        correlation_id_var.set(corr_id)
    return corr_id

def set_correlation_id(correlation_id: str):
    """Set correlation ID for current context."""
    correlation_id_var.set(correlation_id)

# Middleware for web frameworks
def correlation_middleware(app):
    """Middleware to track correlation IDs."""
    @wraps(app)
    async def wrapper(request, *args, **kwargs):
        # Get from header or generate new
        correlation_id = request.headers.get('X-Correlation-ID')
        if not correlation_id:
            correlation_id = generate_correlation_id()

        # Set in context
        set_correlation_id(correlation_id)

        # Add to response header
        response = await app(request, *args, **kwargs)
        response.headers['X-Correlation-ID'] = correlation_id
        return response
    return wrapper

# Logging with correlation ID
def log_with_correlation(logger: logging.Logger, level: int, message: str):
    """Log with correlation ID."""
    corr_id = get_correlation_id()
    logger.log(level, f"[{corr_id}] {message}")
```
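Rather than prepending the ID to every message by hand, a `logging.Filter` can stamp it onto each record so the formatter picks it up automatically. A sketch under the same `ContextVar` approach as above (`CorrelationFilter` is an illustrative name):

```python
import logging
import uuid
from contextvars import ContextVar

correlation_id_var: ContextVar[str] = ContextVar('correlation_id', default='')

class CorrelationFilter(logging.Filter):
    """Attach the current correlation ID to every record."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Make %(correlation_id)s available to the formatter
        record.correlation_id = correlation_id_var.get() or '-'
        return True  # annotate only; never drop records

logger = logging.getLogger('corr_demo')
handler = logging.StreamHandler()
handler.setFormatter(logging.Formatter('[%(correlation_id)s] %(levelname)s %(message)s'))
handler.addFilter(CorrelationFilter())
logger.addHandler(handler)
logger.setLevel(logging.INFO)

correlation_id_var.set(str(uuid.uuid4()))
logger.info("Order placed")  # automatically prefixed with the correlation ID
```

Because the filter runs on every record, call sites stay clean and the ID cannot be forgotten.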
Log Management Pipeline
ELK Stack Overview
```
┌───────────────────────────────────────────────┐
│            Log Management Pipeline            │
├───────────────────────────────────────────────┤
│                                               │
│   Application                                 │
│        │                                      │
│        ▼                                      │
│   Filebeat / Fluentd                          │
│        │                                      │
│        ▼                                      │
│   Logstash / Kafka                            │
│        │                                      │
│        ▼                                      │
│   Elasticsearch                               │
│        │                                      │
│        ▼                                      │
│   Kibana Dashboard                            │
│                                               │
└───────────────────────────────────────────────┘
```
Python Logging to ELK
```python
import json
import logging
import socket
from datetime import datetime, timezone

class ELKHandler(logging.Handler):
    """Handler that sends logs to Logstash/ELK."""

    def __init__(self, host: str, port: int):
        super().__init__()
        self.host = host
        self.port = port

    def emit(self, record: logging.LogRecord):
        """Send log to Logstash."""
        try:
            log_data = {
                'timestamp': datetime.now(timezone.utc).isoformat(),
                'level': record.levelname,
                'logger': record.name,
                'message': record.getMessage(),
                'module': record.module,
                'function': record.funcName,
                'line': record.lineno
            }

            if record.exc_info:
                # Handlers have no formatException; borrow a bare Formatter
                log_data['exception'] = logging.Formatter().formatException(record.exc_info)

            # Add extra fields
            if hasattr(record, 'user_id'):
                log_data['user_id'] = record.user_id
            if hasattr(record, 'correlation_id'):
                log_data['correlation_id'] = record.correlation_id

            # Send to Logstash (via TCP/UDP or HTTP)
            self._send(json.dumps(log_data))
        except Exception:
            self.handleError(record)

    def _send(self, message: str):
        """Send message to Logstash over TCP."""
        sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        try:
            sock.connect((self.host, self.port))
            sock.sendall(message.encode() + b'\n')
        finally:
            sock.close()

# Setup logging with ELK
def setup_elk_logging(host: str = 'localhost', port: int = 5044):
    """Set up logging to ELK stack."""
    logger = logging.getLogger()
    logger.setLevel(logging.INFO)

    # Add ELK handler
    elk_handler = ELKHandler(host, port)
    logger.addHandler(elk_handler)

    # Also add console for local development
    console = logging.StreamHandler()
    console.setFormatter(
        logging.Formatter('%(asctime)s - %(levelname)s - %(message)s')
    )
    logger.addHandler(console)

    return logger
```
Best Practices Summary
| Practice | Implementation |
|---|---|
| Use appropriate levels | DEBUG for debugging, INFO for operations |
| Structure your logs | JSON format for easy parsing |
| Include correlation IDs | Track requests across services |
| Don’t log sensitive data | Passwords, tokens, PII |
| Log context | User ID, request ID, relevant data |
| Use appropriate granularity | Not too verbose, not too sparse |
| Ship to central system | ELK, CloudWatch, Datadog |
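The "use appropriate levels" row usually plays out per environment: production runs at INFO while local debugging runs at DEBUG. One common pattern is to read the level from an environment variable at startup; a minimal sketch, where `LOG_LEVEL` is an assumed variable name, not a convention the standard library defines:

```python
import logging
import os

# Read the desired level from the environment, defaulting to INFO.
# getattr falls back to INFO if the name is not a valid level.
level_name = os.environ.get("LOG_LEVEL", "INFO").upper()
level = getattr(logging, level_name, logging.INFO)

logging.basicConfig(
    level=level,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
logging.getLogger(__name__).info("Logger configured at %s", logging.getLevelName(level))
```

This keeps the decision out of the code, so flipping a service to DEBUG during an incident is a redeploy-free configuration change.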
Conclusion
Good logging practices are essential for debugging production issues and maintaining observable systems. By using appropriate log levels, structured logging, correlation IDs, and proper log management, you’ll be able to diagnose and fix issues quickly.
Key takeaways:
- Use log levels correctly - DEBUG, INFO, WARNING, ERROR, CRITICAL
- Structure your logs - JSON format enables easy parsing
- Include correlation IDs - Track requests across services
- Never log sensitive data - Security first
- Ship to central system - ELK, CloudWatch, etc.
- Include context - User ID, request ID, relevant data
- Be consistent - Same format across all services
By implementing these logging practices, you’ll have the visibility needed to maintain reliable production systems.