Introduction
Database connections are expensive to create and destroy. Each connection requires a TCP handshake, authentication, and memory allocation on both the client and the server. In high-throughput applications, the overhead of establishing connections can become a significant bottleneck. Connection pooling solves this by maintaining a cache of database connections that can be reused across requests.
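The arithmetic behind that bottleneck is easy to sketch. A toy calculation (illustrative numbers only, in the same range as the handshake and authentication costs shown below):

```python
def connection_overhead_share(setup_ms: float, query_ms: float) -> float:
    """Fraction of each request spent on connection setup rather than real work."""
    return setup_ms / (setup_ms + query_ms)

# ~3 ms of handshake+auth against a 2 ms query: setup dominates the request
per_request = connection_overhead_share(setup_ms=3.0, query_ms=2.0)
print(f"{per_request:.0%} of each request is connection overhead")  # 60%
```

For short queries, most of each request is spent opening the connection, which is exactly the work a pool amortizes away.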
The Problem: Connection Overhead
┌─────────────────────────────────────────────────────────┐
│ Without Connection Pooling                              │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Request 1: ──▶ [Create Conn] ──▶ [Query] ──▶ [Close]    │
│ Request 2: ──▶ [Create Conn] ──▶ [Query] ──▶ [Close]    │
│ Request 3: ──▶ [Create Conn] ──▶ [Query] ──▶ [Close]    │
│                                                         │
│ Each request:                                           │
│   • TCP handshake (~1-2ms)                              │
│   • Authentication (~1-5ms)                             │
│   • Memory allocation                                   │
│   • Query processing                                    │
│                                                         │
│ At 1000 req/sec: 1000 connections/sec                   │
│ Database overwhelmed with connection overhead           │
│                                                         │
└─────────────────────────────────────────────────────────┘
┌─────────────────────────────────────────────────────────┐
│ With Connection Pooling                                 │
├─────────────────────────────────────────────────────────┤
│                                                         │
│   ┌──────────────────────────────────────────┐          │
│   │             CONNECTION POOL              │          │
│   │  ┌──────┐ ┌──────┐ ┌──────┐ ┌──────┐     │          │
│   │  │ Conn │ │ Conn │ │ Conn │ │ Conn │ ... │          │
│   │  │  1   │ │  2   │ │  3   │ │  4   │     │          │
│   │  └──────┘ └──────┘ └──────┘ └──────┘     │          │
│   │                                          │          │
│   │  Pool Size: 20-50 connections            │          │
│   └──────────────────────────────────────────┘          │
│                                                         │
│ Request 1: ──▶ [Get Conn] ──▶ [Query] ──▶ [Return Conn] │
│ Request 2: ──▶ [Get Conn] ──▶ [Query] ──▶ [Return Conn] │
│ Request 3: ──▶ [Get Conn] ──▶ [Query] ──▶ [Return Conn] │
│                                                         │
│ Connections reused, minimal overhead                    │
│                                                         │
└─────────────────────────────────────────────────────────┘
How Connection Pools Work
┌─────────────────────────────────────────────────────────┐
│ Connection Pool Lifecycle                               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ 1. INITIALIZATION                                       │
│    ┌──────────────────────────────────────┐             │
│    │ Create min_size connections          │             │
│    │ (e.g., 5 connections on startup)     │             │
│    └──────────────────────────────────────┘             │
│                                                         │
│ 2. REQUEST HANDLING                                     │
│    ┌──────────────────────────────────────┐             │
│    │ Thread A: Get Conn → Query → Return  │             │
│    │ Thread B: Get Conn → Query → Return  │             │
│    │ Thread C: Get Conn → Query → Return  │             │
│    └──────────────────────────────────────┘             │
│                                                         │
│ 3. SCALING (when needed)                                │
│    ┌──────────────────────────────────────┐             │
│    │ If all busy and pool below max:      │             │
│    │   create a new connection            │             │
│    │ If all busy and pool at max:         │             │
│    │   wait with timeout or raise error   │             │
│    └──────────────────────────────────────┘             │
│                                                         │
│ 4. MAINTENANCE                                          │
│    ┌──────────────────────────────────────┐             │
│    │ • Test connections periodically      │             │
│    │ • Close idle connections (timeout)   │             │
│    │ • Replace broken connections         │             │
│    │ • Recycle connections (max lifetime) │             │
│    └──────────────────────────────────────┘             │
│                                                         │
└─────────────────────────────────────────────────────────┘
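The lifecycle above can be sketched in a few dozen lines of standard-library Python. This is a deliberately minimal illustration of steps 1-3 (no validation, recycling, or other maintenance), not a production pool:

```python
import queue
import threading
from typing import Any, Callable


class SimplePool:
    """Toy thread-safe connection pool mirroring the lifecycle above."""

    def __init__(self, factory: Callable[[], Any], min_size: int = 2,
                 max_size: int = 5, timeout: float = 5.0):
        self._factory = factory          # creates a new "connection"
        self._timeout = timeout
        self._max_size = max_size
        self._lock = threading.Lock()
        self._created = 0
        self._idle: queue.Queue = queue.Queue()
        for _ in range(min_size):        # 1. INITIALIZATION
            self._idle.put(factory())
            self._created += 1

    def acquire(self) -> Any:
        try:
            return self._idle.get_nowait()   # 2. reuse an idle connection
        except queue.Empty:
            with self._lock:                 # 3. SCALING: grow up to max_size
                grow = self._created < self._max_size
                if grow:
                    self._created += 1
            if grow:
                return self._factory()
            # All connections busy and at max: wait with a timeout
            # (raises queue.Empty if none is returned in time)
            return self._idle.get(timeout=self._timeout)

    def release(self, conn: Any) -> None:
        self._idle.put(conn)                 # return the connection for reuse
```

Real pools layer step 4 (health checks, idle timeouts, max lifetime) on top of exactly this skeleton.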
Configuration Parameters
Key Pool Settings
| Parameter | Description | Typical Values |
|---|---|---|
| min_size | Minimum connections kept open | 5-10 |
| max_size | Maximum connections allowed | 20-100 |
| max_overflow | Extra connections allowed beyond the base pool size | 10-20 |
| pool_timeout | Max wait for a free connection before erroring | 30 seconds |
| pool_recycle | Recycle connections (seconds) | 3600 (1 hour) |
| pool_pre_ping | Test connections before use | true |
| max_idle_time | Close idle connections after | 600 seconds |
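These settings interact, and it is easy to pick values that contradict each other or the database server's own limits. A small sanity check along these lines can catch the common cases (the function and its rules are illustrative, not from any particular library):

```python
def check_pool_settings(min_size: int, max_size: int, pool_recycle: int,
                        server_idle_timeout: int) -> list[str]:
    """Return human-readable warnings for inconsistent pool settings."""
    warnings = []
    if min_size > max_size:
        warnings.append("min_size exceeds max_size")
    if pool_recycle >= server_idle_timeout:
        warnings.append(
            "pool_recycle should be shorter than the server's idle timeout, "
            "or the server will drop connections the pool still considers live"
        )
    return warnings


# MySQL's default wait_timeout is 28800s; recycle well below it
print(check_pool_settings(5, 50, pool_recycle=3600, server_idle_timeout=28800))
# []
```

The second rule is the one that bites in practice: if the server kills idle connections before the pool recycles them, you see intermittent "server has gone away" errors under light load.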
Implementation Examples
Python with SQLAlchemy
from sqlalchemy import create_engine, text
from sqlalchemy.pool import QueuePool, NullPool
# Basic connection pool configuration
engine = create_engine(
"postgresql://user:password@localhost/mydb",
poolclass=QueuePool,
pool_size=10, # Regular connections
max_overflow=20, # Additional connections under load
pool_timeout=30, # Wait up to 30 seconds
pool_recycle=3600, # Recycle after 1 hour
pool_pre_ping=True, # Test connection before use
echo=False # Set True for SQL debugging
)
# For high-performance scenarios
high_perf_engine = create_engine(
"postgresql://user:password@localhost/mydb",
poolclass=QueuePool,
pool_size=20,
max_overflow=30,
pool_timeout=10,
pool_recycle=1800,
pool_pre_ping=True,
max_identifier_length=256
)
# For testing (no pooling)
test_engine = create_engine(
"postgresql://user:password@localhost/mydb",
poolclass=NullPool # No connection pooling
)
# Using the engine
from sqlalchemy.orm import sessionmaker
Session = sessionmaker(bind=engine)
def get_user(user_id: int) -> dict | None:
session = Session()
try:
result = session.query(User).filter(User.id == user_id).first()
return {"id": result.id, "name": result.name} if result else None
finally:
session.close() # Returns connection to pool
# Using context manager (recommended)
def get_user_safe(user_id: int) -> dict:
with engine.connect() as connection:
result = connection.execute(
text("SELECT id, name FROM users WHERE id = :id"),
{"id": user_id}
)
row = result.fetchone()
        return dict(row._mapping) if row else None  # Row -> dict via ._mapping
Node.js with pg
const { Pool } = require('pg');
// Basic pool configuration
const pool = new Pool({
host: 'localhost',
user: 'user',
password: 'password',
database: 'mydb',
max: 20, // Max connections
min: 5, // Min connections
idleTimeoutMillis: 30000, // Close idle after 30s
connectionTimeoutMillis: 2000, // Connection timeout
  maxLifetimeSeconds: 3600,      // Recycle after 1 hour
});
// High-performance pool
const highPerfPool = new Pool({
host: 'localhost',
user: 'user',
password: 'password',
database: 'mydb',
max: 50,
min: 10,
idleTimeoutMillis: 60000,
connectionTimeoutMillis: 5000,
statement_timeout: 10000, // Query timeout
query_timeout: 10000,
});
// Using the pool
async function getUser(userId) {
const client = await pool.connect();
try {
const result = await client.query(
'SELECT id, name FROM users WHERE id = $1',
[userId]
);
return result.rows[0] || null;
} finally {
client.release(); // Return to pool
}
}
// Using pool directly (auto-release)
async function getUserSimple(userId) {
const result = await pool.query(
    'SELECT id, name FROM users WHERE id = $1',
[userId]
);
  return result.rows[0] || null;
}
// Pool events
pool.on('error', (err) => {
console.error('Unexpected pool error', err);
});
pool.on('connect', (client) => {
console.log('New connection established');
});
Go with pgx
package main
import (
	"context"
	"fmt"
	"os"
	"time"

	"github.com/jackc/pgx/v5/pgxpool"
)
func main() {
// Create connection pool
config, err := pgxpool.ParseConfig("")
if err != nil {
fmt.Fprintf(os.Stderr, "Unable to parse config: %v\n", err)
os.Exit(1)
}
// Pool configuration
	config.MinConns = 5
	config.MaxConns = 50
	config.MaxConnLifetime = time.Hour        // Recycle after 1 hour
	config.MaxConnIdleTime = 10 * time.Minute // Close idle after 10 minutes
	config.HealthCheckPeriod = time.Minute    // Check every minute
// Create pool
pool, err := pgxpool.NewWithConfig(context.Background(), config)
if err != nil {
fmt.Fprintf(os.Stderr, "Unable to create pool: %v\n", err)
os.Exit(1)
}
defer pool.Close()
// Test connection
if err := pool.Ping(context.Background()); err != nil {
fmt.Fprintf(os.Stderr, "Unable to ping: %v\n", err)
os.Exit(1)
}
// Query using pool
var name string
err = pool.QueryRow(context.Background(),
"SELECT name FROM users WHERE id = $1", 1).Scan(&name)
	if err != nil {
		fmt.Fprintf(os.Stderr, "Query failed: %v\n", err)
		os.Exit(1)
	}
	fmt.Println("User:", name)
}
Java with HikariCP
import com.zaxxer.hikari.HikariConfig;
import com.zaxxer.hikari.HikariDataSource;
import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.Statement;
public class HikariCPExample {
public static void main(String[] args) {
HikariConfig config = new HikariConfig();
// Basic configuration
config.setJdbcUrl("jdbc:postgresql://localhost:5432/mydb");
config.setUsername("user");
config.setPassword("password");
// Pool sizing
config.setMinimumIdle(5); // Min idle connections
config.setMaximumPoolSize(50); // Max connections
config.setIdleTimeout(600000); // 10 minutes idle
config.setConnectionTimeout(30000); // 30 second timeout
config.setMaxLifetime(1800000); // 30 minute max lifetime
        // Connection validation (only needed for legacy pre-JDBC4 drivers;
        // modern drivers are validated via Connection.isValid())
        config.setConnectionTestQuery("SELECT 1");
config.setAutoCommit(false); // Disable auto-commit
        // Performance options (note: these three properties are MySQL
        // Connector/J options; PostgreSQL's JDBC driver uses
        // prepareThreshold and its own statement-cache settings instead)
        config.addDataSourceProperty("cachePrepStmts", "true");
        config.addDataSourceProperty("prepStmtCacheSize", "250");
        config.addDataSourceProperty("prepStmtCacheSqlLimit", "2048");
HikariDataSource dataSource = new HikariDataSource(config);
// Use the connection
try (Connection conn = dataSource.getConnection();
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery("SELECT * FROM users")) {
while (rs.next()) {
System.out.println(rs.getString("name"));
}
} catch (Exception e) {
e.printStackTrace();
} finally {
dataSource.close();
}
}
}
Pool Sizing Guidelines
Calculating Optimal Pool Size
def calculate_optimal_pool_size(
avg_query_time_ms: float,
peak_requests_per_second: int,
max_database_connections: int
) -> dict:
"""
Calculate optimal connection pool size
    Formula: Pool Size = (Peak RPS × Avg Query Time in ms) / 1000
"""
# Average time a connection is busy per request
busy_time_per_request = avg_query_time_ms / 1000 # seconds
# Connections needed to handle peak load
needed_connections = peak_requests_per_second * busy_time_per_request
# Add 20% buffer
optimal_size = int(needed_connections * 1.2)
# Cap at database limit
optimal_size = min(optimal_size, max_database_connections)
return {
"min_connections": max(5, optimal_size // 3),
"max_connections": optimal_size,
"overflow_allowed": max(10, optimal_size // 2)
}
# Example: 100 RPS, 50ms avg query, 200 max DB connections
result = calculate_optimal_pool_size(
avg_query_time_ms=50,
peak_requests_per_second=100,
max_database_connections=200
)
# Result: {'min_connections': 5, 'max_connections': 6, 'overflow_allowed': 10}
Pool Size Recommendations by Workload
┌─────────────────────────────────────────────────────────┐
│ Pool Size Recommendations                               │
├─────────────────────────────────────────────────────────┤
│                                                         │
│ Web Application (typical)                               │
│   • 10-20 connections for moderate traffic              │
│   • 50-100 connections for high traffic                 │
│   • Use connection pooler (PgBouncer, ProxySQL)         │
│                                                         │
│ Batch Processing                                        │
│   • Fewer connections, longer queries                   │
│   • 5-10 connections, process in chunks                 │
│                                                         │
│ Real-time Applications                                  │
│   • Many short queries                                  │
│   • 50-100+ connections needed                          │
│   • Consider read replicas                              │
│                                                         │
│ Serverless (AWS Lambda, etc.)                           │
│   • Each function instance needs connections            │
│   • Use RDS Proxy or similar                            │
│   • Keep connections outside function code              │
│                                                         │
└─────────────────────────────────────────────────────────┘
Connection Poolers
PostgreSQL with PgBouncer
# pgbouncer.ini
[databases]
mydb = host=localhost port=5432 dbname=mydb
[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = users.txt
pool_mode = transaction # or session
default_pool_size = 25
min_pool_size = 10
max_client_conn = 200
max_db_connections = 100
max_user_connections = 100
server_lifetime = 3600
server_idle_timeout = 600
# Start PgBouncer
pgbouncer pgbouncer.ini
# Connection string to PgBouncer
postgresql://user:password@localhost:6432/mydb
MySQL with ProxySQL
-- Add backend
INSERT INTO mysql_servers(hostgroup_id, hostname, port)
VALUES (1, '127.0.0.1', 3306);
-- Add user
INSERT INTO mysql_users(username, password, default_hostgroup)
VALUES ('app_user', 'password', 1);
-- Configure global connection limit
UPDATE global_variables
SET variable_value='50'
WHERE variable_name='mysql-max_connections';
-- Set per-backend pool size
UPDATE mysql_servers
SET max_connections=50
WHERE hostname='127.0.0.1';
-- Apply to runtime and persist (required by ProxySQL's admin interface)
LOAD MYSQL SERVERS TO RUNTIME; SAVE MYSQL SERVERS TO DISK;
LOAD MYSQL USERS TO RUNTIME;   SAVE MYSQL USERS TO DISK;
LOAD MYSQL VARIABLES TO RUNTIME; SAVE MYSQL VARIABLES TO DISK;
Monitoring and Diagnostics
Key Metrics to Monitor
import logging
from sqlalchemy import create_engine, event
engine = create_engine("postgresql://user:pass@localhost/mydb", pool_size=10)
@event.listens_for(engine, "checkout")
def receive_checkout(dbapi_connection, connection_record, connection_proxy):
logging.info(f"Connection checked out: {id(dbapi_connection)}")
@event.listens_for(engine, "checkin")
def receive_checkin(dbapi_connection, connection_record):
logging.info(f"Connection returned: {id(dbapi_connection)}")
# Custom pool status check
def get_pool_status(engine):
    pool = engine.pool
    return {
        "size": pool.size(),
        "checked_in": pool.checkedin(),
        "checked_out": pool.checkedout(),
        "overflow": pool.overflow(),
    }
# Usage
print(get_pool_status(engine))
# e.g. {'size': 10, 'checked_in': 8, 'checked_out': 2, 'overflow': 0}
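Given a status dict like the one above, a simple threshold check can drive an alert before the pool is fully exhausted (the helper names and the 80% threshold are arbitrary choices for illustration):

```python
def pool_utilization(status: dict) -> float:
    """Fraction of total capacity (base pool plus any overflow) checked out."""
    capacity = status["size"] + max(status["overflow"], 0)
    return status["checked_out"] / capacity if capacity else 0.0


def near_exhaustion(status: dict, threshold: float = 0.8) -> bool:
    """True when utilization crosses the alert threshold."""
    return pool_utilization(status) >= threshold


status = {"size": 10, "checked_in": 1, "checked_out": 9, "overflow": 0}
print(near_exhaustion(status))  # True (9 of 10 connections checked out)
```

Wiring this into a periodic metrics export gives you warning of leaks or undersized pools long before requests start timing out.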
PostgreSQL Monitoring Queries
-- View current connections
SELECT state, count(*)
FROM pg_stat_activity
WHERE datname = 'mydb'
GROUP BY state;
-- View longest running queries
SELECT pid, state, query,
now() - query_start as duration
FROM pg_stat_activity
WHERE state != 'idle'
ORDER BY duration DESC;
-- View connection pool status (connect to PgBouncer's virtual admin DB:
--   psql -h localhost -p 6432 -U user pgbouncer)
SHOW POOLS;
SHOW CLIENTS;
SHOW SERVERS;
Common Pitfalls
Pitfall 1: Pool Size Too Small
# BAD: Too few connections, requests wait
engine = create_engine(
"postgresql://user:pass@localhost/mydb",
pool_size=2,
max_overflow=0
)
# GOOD: Properly sized pool
engine = create_engine(
"postgresql://user:pass@localhost/mydb",
pool_size=20,
max_overflow=20
)
Pitfall 2: Not Returning Connections
# BAD: Connection leak
def bad_query():
conn = engine.connect() # No close!
return conn.execute(text("SELECT 1"))
# GOOD: Always return connection
def good_query():
    with engine.connect() as conn:
        # Fetch the value before the connection goes back to the pool
        return conn.execute(text("SELECT 1")).scalar()
# GOOD: Explicit close
def explicit_close():
    conn = engine.connect()
    try:
        return conn.execute(text("SELECT 1")).scalar()
    finally:
        conn.close()
Pitfall 3: No Connection Validation
# BAD: No validation, might use broken connection
engine = create_engine(
"postgresql://user:pass@localhost/mydb",
pool_pre_ping=False
)
# GOOD: Validate before use
engine = create_engine(
"postgresql://user:pass@localhost/mydb",
pool_pre_ping=True,
pool_recycle=3600
)
Pitfall 4: Blocking on Exhaustion
# BAD: No timeout turns pool exhaustion into an indefinite hang
engine = create_engine(
    "postgresql://user:pass@localhost/mydb",
    pool_timeout=None  # Wait forever for a free connection!
)
# GOOD: Proper timeout with handling
engine = create_engine(
"postgresql://user:pass@localhost/mydb",
pool_timeout=30
)
from sqlalchemy.exc import TimeoutError as PoolTimeout

def query_with_timeout():
    try:
        with engine.connect() as conn:
            return conn.execute(text("SELECT 1")).scalar()
    except PoolTimeout:
        # Handle gracefully, e.g. fall back to a cached result
        return get_cached_result()
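The fallback idea generalizes into a small helper that retries briefly and then degrades. A sketch, where `fetch` and `fallback` are any callables you supply; it catches the built-in TimeoutError, whereas with SQLAlchemy you would catch sqlalchemy.exc.TimeoutError instead:

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def with_pool_fallback(fetch: Callable[[], T], fallback: Callable[[], T],
                       retries: int = 2, backoff_s: float = 0.1) -> T:
    """Try fetch(); on timeout, retry with exponential backoff, then degrade."""
    for attempt in range(retries + 1):
        try:
            return fetch()
        except TimeoutError:
            if attempt < retries:
                time.sleep(backoff_s * (2 ** attempt))  # 0.1s, 0.2s, ...
    return fallback()  # all retries exhausted: serve the degraded result
```

A brief backoff gives in-flight requests a chance to return their connections; if the pool is still exhausted, degrading beats hanging.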
Serverless Considerations
AWS Lambda with RDS Proxy
import json
import os
import pymysql
# RDS Proxy endpoint
RDS_PROXY_HOST = os.environ['RDS_PROXY_HOST']
def lambda_handler(event, context):
try:
connection = pymysql.connect(
host=RDS_PROXY_HOST,
user=os.environ['DB_USER'],
password=os.environ['DB_PASSWORD'],
database=os.environ['DB_NAME'],
connect_timeout=5
)
try:
with connection.cursor() as cursor:
cursor.execute("SELECT * FROM users LIMIT 1")
result = cursor.fetchone()
return {"statusCode": 200, "body": json.dumps(result)}
finally:
connection.close()
except Exception as e:
return {"statusCode": 500, "body": str(e)}
Connection Management Best Practices
# BAD: Create new connection per invocation
def lambda_handler(event, context):
conn = pymysql.connect(...) # Expensive!
# ... query ...
conn.close()
# GOOD: Use connection pooler (RDS Proxy, Aurora Serverless)
# Connections are managed externally
# GOOD: Reuse connection outside handler
import pymysql
connection = None
def get_connection():
global connection
if connection is None or not connection.open:
connection = pymysql.connect(...)
return connection
Conclusion
Database connection pooling is essential for building high-performance, scalable applications. Key takeaways:
- Size your pool appropriately based on query time and expected RPS
- Always validate connections using pool_pre_ping or an equivalent pre-use check
- Use connection poolers (PgBouncer, ProxySQL) for additional efficiency
- Monitor pool metrics to detect issues before they become problems
- Handle pool exhaustion gracefully with timeouts and fallback strategies
- For serverless, use RDS Proxy or similar managed solutions
Proper connection pool configuration can improve application throughput by an order of magnitude or more compared to opening a new connection for every request.
Resources
- SQLAlchemy Connection Pool Documentation
- HikariCP GitHub
- PgBouncer Documentation
- PostgreSQL Connection Pooling
- AWS RDS Proxy