⚡ Calmops

PostgreSQL Advanced: Partitioning, JSONB, Window Functions

Introduction

PostgreSQL has evolved into a powerful, feature-rich database system capable of handling complex workloads at scale. Advanced features like table partitioning, JSONB support, and window functions enable developers to build sophisticated applications with superior performance. However, many teams underutilize these capabilities, resulting in slower queries, higher resource consumption, and scalability challenges.

This comprehensive guide covers advanced PostgreSQL features with practical examples and real-world optimization strategies.


Core Concepts & Terminology

Table Partitioning

Dividing large tables into smaller, more manageable pieces based on specific criteria (range, list, hash).

JSONB

Binary JSON data type in PostgreSQL offering superior performance and indexing capabilities compared to JSON.

Window Functions

SQL functions that perform calculations across a set of rows related to the current row.

Common Table Expression (CTE)

Named temporary result set used within a query, enabling complex query composition.

Materialized View

Database object storing query results physically, requiring explicit refresh.

Full-Text Search

PostgreSQL’s built-in capability for searching text data efficiently.

Recursive Query

Query that references itself, useful for hierarchical data.

Index Types

Different indexing strategies (B-tree, Hash, GiST, GIN, BRIN) optimized for different use cases.

Query Planner

PostgreSQL’s optimizer that determines the most efficient way to execute queries.

EXPLAIN ANALYZE

Tool for understanding query execution plans and identifying bottlenecks.
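CTEs, recursive queries, and full-text search are defined above but not revisited in the sections below; here is a minimal sketch of each. The employees and articles tables are hypothetical, used only for illustration:

```sql
-- Recursive CTE: walk a management hierarchy
-- (employees is a hypothetical table: id, name, manager_id)
WITH RECURSIVE org AS (
    SELECT id, name, manager_id, 1 AS depth
    FROM employees
    WHERE manager_id IS NULL          -- anchor: start at the root
    UNION ALL
    SELECT e.id, e.name, e.manager_id, org.depth + 1
    FROM employees e
    JOIN org ON e.manager_id = org.id -- recurse into direct reports
)
SELECT * FROM org ORDER BY depth;

-- Full-text search: match documents against a query
-- (articles is a hypothetical table: title, body)
SELECT title
FROM articles
WHERE to_tsvector('english', body) @@ to_tsquery('english', 'postgres & partition');
```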


Table Partitioning

Range Partitioning

-- Create partitioned table
CREATE TABLE orders (
    order_id BIGSERIAL,
    customer_id INTEGER,
    order_date DATE,
    amount DECIMAL(10, 2),
    status VARCHAR(50)
) PARTITION BY RANGE (order_date);

-- Create partitions by month (upper bound is exclusive)
CREATE TABLE orders_2025_01 PARTITION OF orders
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

CREATE TABLE orders_2025_02 PARTITION OF orders
    FOR VALUES FROM ('2025-02-01') TO ('2025-03-01');

CREATE TABLE orders_2025_03 PARTITION OF orders
    FOR VALUES FROM ('2025-03-01') TO ('2025-04-01');

-- Create indexes on partitions
CREATE INDEX idx_orders_2025_01_customer ON orders_2025_01(customer_id);
CREATE INDEX idx_orders_2025_02_customer ON orders_2025_02(customer_id);
CREATE INDEX idx_orders_2025_03_customer ON orders_2025_03(customer_id);

-- Query benefits from partition pruning
EXPLAIN ANALYZE
SELECT * FROM orders WHERE order_date >= '2025-02-01' AND order_date < '2025-03-01';
-- Only scans orders_2025_02 partition

List Partitioning

-- Partition by region
CREATE TABLE sales (
    sale_id BIGSERIAL,
    region VARCHAR(50),
    amount DECIMAL(10, 2),
    sale_date DATE
) PARTITION BY LIST (region);

CREATE TABLE sales_us PARTITION OF sales
    FOR VALUES IN ('US-East', 'US-West', 'US-Central');

CREATE TABLE sales_eu PARTITION OF sales
    FOR VALUES IN ('EU-North', 'EU-South', 'EU-Central');

CREATE TABLE sales_apac PARTITION OF sales
    FOR VALUES IN ('APAC-East', 'APAC-South', 'APAC-West');

-- Default partition catches rows that match no other partition
CREATE TABLE sales_other PARTITION OF sales DEFAULT;

Hash Partitioning

-- Partition by hash for even distribution
CREATE TABLE user_events (
    event_id BIGSERIAL,
    user_id INTEGER,
    event_type VARCHAR(50),
    event_data JSONB,
    created_at TIMESTAMP
) PARTITION BY HASH (user_id);

CREATE TABLE user_events_0 PARTITION OF user_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 0);

CREATE TABLE user_events_1 PARTITION OF user_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 1);

CREATE TABLE user_events_2 PARTITION OF user_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 2);

CREATE TABLE user_events_3 PARTITION OF user_events
    FOR VALUES WITH (MODULUS 4, REMAINDER 3);
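To verify that the hash function is spreading rows evenly, the system column tableoid identifies which partition each row landed in:

```sql
-- Count rows per partition; tableoid::regclass resolves to the partition name
SELECT tableoid::regclass AS partition, COUNT(*) AS row_count
FROM user_events
GROUP BY tableoid
ORDER BY partition;
```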

Automatic Partition Management

-- Create partitions automatically
CREATE TABLE logs (
    log_id BIGSERIAL,
    level VARCHAR(10),
    message TEXT,
    created_at TIMESTAMP
) PARTITION BY RANGE (created_at);

-- Create initial partition
CREATE TABLE logs_2025_01 PARTITION OF logs
    FOR VALUES FROM ('2025-01-01') TO ('2025-02-01');

-- Function to create new partitions
CREATE OR REPLACE FUNCTION create_monthly_partition()
RETURNS void AS $$
DECLARE
    partition_date DATE;
    partition_name TEXT;
    start_date DATE;
    end_date DATE;
BEGIN
    partition_date := DATE_TRUNC('month', CURRENT_DATE + INTERVAL '1 month')::DATE;
    partition_name := 'logs_' || TO_CHAR(partition_date, 'YYYY_MM');
    start_date := partition_date;
    end_date := partition_date + INTERVAL '1 month';
    
    EXECUTE format(
        'CREATE TABLE IF NOT EXISTS %I PARTITION OF logs FOR VALUES FROM (%L) TO (%L)',
        partition_name, start_date, end_date
    );
END;
$$ LANGUAGE plpgsql;

-- Schedule with pg_cron
SELECT cron.schedule('create_monthly_partitions', '0 0 1 * *', 'SELECT create_monthly_partition()');
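The same pattern supports retention: old partitions can be removed as whole tables rather than with expensive bulk DELETEs. A sketch, with an illustrative partition name:

```sql
-- Detach first so the drop does not touch the parent table
-- (DETACH ... CONCURRENTLY is available from PostgreSQL 14)
ALTER TABLE logs DETACH PARTITION logs_2025_01;
DROP TABLE logs_2025_01;
```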

JSONB Operations

JSONB Storage and Indexing

-- Create table with JSONB column
CREATE TABLE user_profiles (
    user_id BIGSERIAL PRIMARY KEY,
    username VARCHAR(255) UNIQUE,
    profile JSONB,
    created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);

-- Create GIN index for efficient JSONB queries
CREATE INDEX idx_profile_gin ON user_profiles USING GIN (profile);

-- Create expression index for specific fields
CREATE INDEX idx_profile_email ON user_profiles ((profile->>'email'));
CREATE INDEX idx_profile_age ON user_profiles ((CAST(profile->>'age' AS INTEGER)));

-- Insert JSONB data
INSERT INTO user_profiles (username, profile) VALUES
('john_doe', '{
    "email": "john.doe@example.com",
    "age": 30,
    "location": "New York",
    "preferences": {
        "theme": "dark",
        "notifications": true,
        "language": "en"
    },
    "tags": ["developer", "python", "aws"]
}'),
('jane_smith', '{
    "email": "jane.smith@example.com",
    "age": 28,
    "location": "San Francisco",
    "preferences": {
        "theme": "light",
        "notifications": false,
        "language": "en"
    },
    "tags": ["designer", "ux", "figma"]
}');

JSONB Query Examples

-- Extract text value
SELECT username, profile->>'email' as email FROM user_profiles;

-- Extract integer value
SELECT username, CAST(profile->>'age' AS INTEGER) as age FROM user_profiles
WHERE CAST(profile->>'age' AS INTEGER) > 25;

-- Query nested objects
SELECT username, profile->'preferences'->>'theme' as theme
FROM user_profiles;

-- Array operations
SELECT username, profile->'tags' as tags
FROM user_profiles
WHERE profile->'tags' @> '"python"'::jsonb;

-- Check key existence
SELECT username FROM user_profiles
WHERE profile ? 'email';

-- Get all keys
SELECT username, jsonb_object_keys(profile) as keys
FROM user_profiles;

-- Aggregate JSONB
SELECT jsonb_agg(profile) as all_profiles
FROM user_profiles;

-- Update JSONB
UPDATE user_profiles
SET profile = jsonb_set(profile, '{preferences,theme}', '"dark"')
WHERE username = 'jane_smith';

-- Delete from JSONB
UPDATE user_profiles
SET profile = profile - 'tags'
WHERE username = 'john_doe';
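Two more update idioms are worth knowing: || merges two objects (keys on the right overwrite keys on the left), and #- removes a value at a nested path:

```sql
-- Merge new keys into the profile; "location" here overwrites the old value
UPDATE user_profiles
SET profile = profile || '{"verified": true, "location": "Boston"}'::jsonb
WHERE username = 'john_doe';

-- Remove a nested key by path
UPDATE user_profiles
SET profile = profile #- '{preferences,notifications}'
WHERE username = 'jane_smith';
```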

JSONB Performance Optimization

-- Create materialized view for frequently accessed data
CREATE MATERIALIZED VIEW user_summary AS
SELECT
    user_id,
    username,
    profile->>'email' as email,
    CAST(profile->>'age' AS INTEGER) as age,
    profile->>'location' as location,
    jsonb_array_length(profile->'tags') as tag_count
FROM user_profiles;

-- Create indexes on materialized view
-- (REFRESH ... CONCURRENTLY requires a unique index on the view)
CREATE UNIQUE INDEX idx_user_summary_id ON user_summary(user_id);
CREATE INDEX idx_user_summary_email ON user_summary(email);
CREATE INDEX idx_user_summary_age ON user_summary(age);

-- Refresh materialized view without blocking concurrent reads
REFRESH MATERIALIZED VIEW CONCURRENTLY user_summary;

-- Query materialized view
SELECT * FROM user_summary WHERE age > 25 ORDER BY age DESC;

Window Functions

Basic Window Functions

-- Create sales data
CREATE TABLE sales_data (
    sale_id BIGSERIAL PRIMARY KEY,
    salesperson VARCHAR(100),
    region VARCHAR(50),
    amount DECIMAL(10, 2),
    sale_date DATE
);

-- ROW_NUMBER: Assign unique number to each row
SELECT
    sale_id,
    salesperson,
    amount,
    ROW_NUMBER() OVER (ORDER BY amount DESC) as row_num
FROM sales_data;

-- RANK: Rank with ties
SELECT
    sale_id,
    salesperson,
    amount,
    RANK() OVER (ORDER BY amount DESC) as rank,
    DENSE_RANK() OVER (ORDER BY amount DESC) as dense_rank
FROM sales_data;

-- NTILE: Divide rows into buckets
SELECT
    sale_id,
    salesperson,
    amount,
    NTILE(4) OVER (ORDER BY amount DESC) as quartile
FROM sales_data;

Aggregate Window Functions

-- Running total
SELECT
    sale_id,
    salesperson,
    amount,
    SUM(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) as running_total
FROM sales_data;

-- Moving average
SELECT
    sale_id,
    salesperson,
    amount,
    AVG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN 2 PRECEDING AND CURRENT ROW
    ) as moving_avg_3
FROM sales_data;

-- Cumulative percentage
SELECT
    sale_id,
    salesperson,
    amount,
    ROUND(100.0 * SUM(amount) OVER (
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) / SUM(amount) OVER (), 2) as cumulative_pct
FROM sales_data;

Lead/Lag Functions

-- Compare with previous row
SELECT
    sale_id,
    salesperson,
    sale_date,
    amount,
    LAG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) as previous_amount,
    amount - LAG(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) as amount_change
FROM sales_data;

-- Compare with next row
SELECT
    sale_id,
    salesperson,
    sale_date,
    amount,
    LEAD(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) as next_amount
FROM sales_data;

-- First and last values
SELECT
    sale_id,
    salesperson,
    amount,
    FIRST_VALUE(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
    ) as first_sale,
    LAST_VALUE(amount) OVER (
        PARTITION BY salesperson
        ORDER BY sale_date
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) as last_sale
FROM sales_data;

Performance Optimization

Query Analysis

-- Analyze query execution plan
EXPLAIN ANALYZE
SELECT
    salesperson,
    SUM(amount) as total_sales,
    AVG(amount) as avg_sale,
    COUNT(*) as sale_count
FROM sales_data
WHERE sale_date >= '2025-01-01'
GROUP BY salesperson
ORDER BY total_sales DESC;

-- Use BUFFERS to see cache hits
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM sales_data WHERE salesperson = 'John Smith';

-- Identify slow queries
-- (timing columns were renamed in PostgreSQL 13; older versions
-- use total_time, mean_time, max_time)
SELECT
    query,
    calls,
    total_exec_time,
    mean_exec_time,
    max_exec_time
FROM pg_stat_statements
ORDER BY mean_exec_time DESC
LIMIT 10;
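pg_stat_statements is an extension and is not enabled by default: it must be preloaded at server start, then created in each database where it will be queried:

```sql
-- postgresql.conf: shared_preload_libraries = 'pg_stat_statements'
-- After a server restart:
CREATE EXTENSION IF NOT EXISTS pg_stat_statements;

-- Reset counters when starting a new measurement window
SELECT pg_stat_statements_reset();
```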

Index Optimization

-- Create composite index
CREATE INDEX idx_sales_region_date ON sales_data(region, sale_date DESC);

-- Partial index for recent records
-- (index predicates must use only immutable expressions, so
-- CURRENT_DATE is not allowed here; use a literal cutoff date
-- and rebuild the index periodically)
CREATE INDEX idx_recent_sales ON sales_data(salesperson)
WHERE sale_date >= DATE '2025-01-01';

-- BRIN index for large tables
CREATE INDEX idx_sales_date_brin ON sales_data USING BRIN (sale_date);

-- Check index usage
-- (pg_stat_user_indexes exposes relname/indexrelname,
-- not tablename/indexname)
SELECT
    schemaname,
    relname AS tablename,
    indexrelname AS indexname,
    idx_scan,
    idx_tup_read,
    idx_tup_fetch
FROM pg_stat_user_indexes
ORDER BY idx_scan DESC;

-- Find unused indexes (skip unique indexes, which enforce
-- constraints even when never scanned)
SELECT
    s.schemaname,
    s.relname AS tablename,
    s.indexrelname AS indexname
FROM pg_stat_user_indexes s
JOIN pg_index i ON i.indexrelid = s.indexrelid
WHERE s.idx_scan = 0
  AND NOT i.indisunique;

Connection Pooling

; Configure PgBouncer for connection pooling (pgbouncer.ini)
[databases]
mydb = host=localhost port=5432 dbname=mydb

[pgbouncer]
pool_mode = transaction
max_client_conn = 1000
default_pool_size = 25
min_pool_size = 10
reserve_pool_size = 5
reserve_pool_timeout = 3

-- Monitor connections
SELECT
    datname,
    usename,
    application_name,
    state,
    COUNT(*) as connection_count
FROM pg_stat_activity
GROUP BY datname, usename, application_name, state;

Best Practices & Common Pitfalls

Best Practices

  1. Use Partitioning: Partition large tables for better performance
  2. Index Strategically: Create indexes on frequently queried columns
  3. Monitor Performance: Use EXPLAIN ANALYZE regularly
  4. Connection Pooling: Use pgBouncer for connection management
  5. Vacuum Regularly: Maintain table health with VACUUM
  6. Archive Old Data: Move old data to archive tables
  7. Use JSONB: Prefer JSONB over JSON for better performance
  8. Window Functions: Use for complex analytical queries
  9. Materialized Views: Cache expensive queries
  10. Monitor Slow Queries: Use pg_stat_statements
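For point 5, pg_stat_user_tables shows whether autovacuum is keeping up with dead-tuple accumulation:

```sql
-- Tables with the most dead tuples and their last (auto)vacuum times
SELECT relname, n_dead_tup, n_live_tup, last_vacuum, last_autovacuum
FROM pg_stat_user_tables
ORDER BY n_dead_tup DESC
LIMIT 10;
```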

Common Pitfalls

  1. Missing Indexes: Queries without proper indexes
  2. Over-Indexing: Too many indexes slowing writes
  3. Inefficient Joins: Unnecessary joins or wrong join types
  4. N+1 Queries: Fetching data in loops
  5. Large Result Sets: Fetching too much data
  6. Unoptimized Aggregations: Inefficient GROUP BY queries
  7. Connection Leaks: Not closing connections properly
  8. Bloated Tables: Not vacuuming regularly
  9. Poor Partitioning: Partitions too large or too small
  10. Ignoring Locks: Long-running transactions blocking others
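For pitfall 10, pg_blocking_pids() makes blocked and blocking sessions visible directly from pg_stat_activity:

```sql
-- Sessions that are currently blocked, and the PIDs blocking them
SELECT pid,
       pg_blocking_pids(pid) AS blocked_by,
       wait_event_type,
       state,
       query
FROM pg_stat_activity
WHERE cardinality(pg_blocking_pids(pid)) > 0;
```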



Conclusion

Advanced PostgreSQL features enable developers to build scalable, performant applications. Table partitioning improves query performance on large datasets, JSONB provides flexible schema capabilities, and window functions enable complex analytical queries.

Success requires understanding these features, monitoring performance regularly, and optimizing based on real-world usage patterns. Start with proper indexing and partitioning strategies, then gradually add advanced features as needed.

PostgreSQL’s power lies not just in its features, but in the discipline of using them correctly.
