
DuckDB Internals: Vectorized Execution, Columnar Storage, and Query Processing

Introduction

Understanding DuckDB’s internal architecture helps developers and data engineers make informed decisions about query optimization and performance tuning. DuckDB’s design prioritizes analytical query performance through vectorized execution and columnar storage.

This article explores DuckDB’s internal mechanisms, including its query processing pipeline, storage format, and the architectural decisions that enable its exceptional performance.


Architecture Overview

System Architecture

┌─────────────────────────────────────────────────────────────┐
│                     DuckDB Architecture                     │
├─────────────────────────────────────────────────────────────┤
│  SQL Layer                                                  │
│  ┌──────────┐  ┌──────────┐  ┌───────────┐  ┌────────────┐  │
│  │  Parser  │→ │ Resolver │→ │ Optimizer │→ │  Executor  │  │
│  └──────────┘  └──────────┘  └───────────┘  └────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  Vectorized Execution Engine                                │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Vector Operations (batch processing)                 │  │
│  └───────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  Storage Layer                                              │
│  ┌───────────────────────────────────────────────────────┐  │
│  │  Columnar Format (Parquet-like)                       │  │
│  └───────────────────────────────────────────────────────┘  │
├─────────────────────────────────────────────────────────────┤
│  Data Types & Extensions                                    │
└─────────────────────────────────────────────────────────────┘

Query Processing Flow

SQL Query
    │
    ▼
┌────────────┐
│   Parser   │  Parse SQL into AST
└────────────┘
    │
    ▼
┌────────────┐
│  Resolver  │  Name resolution, type checking
└────────────┘
    │
    ▼
┌────────────┐
│ Optimizer  │  Logical optimization, planning
└────────────┘
    │
    ▼
┌────────────┐
│  Executor  │  Vectorized execution
└────────────┘
    │
    ▼
Result (Vectors)

Vectorized Execution

What is Vectorized Execution?

Vectorized execution processes data in fixed-size batches of rows (2048 by default in DuckDB) rather than one row at a time. Processing a whole batch per operator call amortizes interpretation overhead, keeps data in CPU caches, and lets the compiler emit SIMD instructions.

# DuckDB processes data in vectors (2048 rows by default)
# Instead of: for each row → process
# DuckDB does: for each batch of 2048 rows → process together
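A stdlib-only Python sketch of the difference (purely illustrative; the function names are invented for this example, and DuckDB's engine is C++, not Python):

```python
from array import array

salaries = array("d", [50000.0, 60000.0, 70000.0] * 1000)

# Row-at-a-time: one evaluation (and its dispatch overhead) per value.
def raise_rowwise(rows):
    return [s * 1.1 for s in rows]

# Vectorized: process a fixed-size batch per call; a tight loop over
# one contiguous chunk is what compiled engines turn into SIMD code.
VECTOR_SIZE = 2048  # DuckDB's default vector size

def raise_vectorized(col):
    out = array("d")
    for start in range(0, len(col), VECTOR_SIZE):
        batch = col[start:start + VECTOR_SIZE]  # one contiguous chunk
        out.extend(v * 1.1 for v in batch)
    return out

assert list(raise_vectorized(salaries)) == raise_rowwise(salaries)
```

In Python both variants cost about the same, but in a compiled engine the batched form is where the cache locality and SIMD wins come from.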

Vector Types

// Simplified vector structure (illustrative, not DuckDB's exact definition)
struct Vector {
    VectorType vector_type;  // FLAT, CONSTANT, DICTIONARY, SEQUENCE
    LogicalType type;        // Column data type
    data_ptr_t data;         // Pointer to the actual values
    ValidityMask validity;   // Null bitmap (which entries are valid)
};
// The element count and any selection vector travel alongside the
// vectors in a DataChunk rather than inside each Vector.
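The selection vector deserves a note: it lets a filter pass along *indices* of qualifying rows instead of copying the surviving values into a new buffer. A minimal Python sketch of the idea (function names are hypothetical):

```python
# A filter produces a selection vector (indices of qualifying rows)
# instead of materializing a new, smaller data buffer.
ages = [31, 18, 45, 25, 67, 12]

def filter_gt(data, threshold):
    """Return a selection vector: positions where data[i] > threshold."""
    return [i for i, v in enumerate(data) if v > threshold]

sel = filter_gt(ages, 25)          # selection vector: [0, 2, 4]
# Downstream operators read through the selection vector, zero-copy:
visible = [ages[i] for i in sel]   # [31, 45, 67]
```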

Vector Operations

-- DuckDB optimizes these operations
SELECT name, salary * 1.1 FROM employees;  -- Batch multiplication
SELECT * FROM users WHERE age > 25;         -- Batch filtering
SELECT COUNT(*) FROM events;                 -- Aggregation over vector

Columnar Storage

Storage Format

DuckDB uses a columnar storage format similar to Parquet:

-- Data is stored column-by-column
-- employees table:
--   - names column: [John, Jane, Bob, ...]
--   - salaries column: [50000, 60000, 70000, ...]
--   - departments column: [Eng, Sales, Eng, ...]

Data Layout

Row-oriented (traditional):
┌────────┬─────────┬────────────┐
│ name   │ salary  │ department │
├────────┼─────────┼────────────┤
│ John   │ 50000   │ Eng        │
│ Jane   │ 60000   │ Sales      │
│ Bob    │ 70000   │ Eng        │
└────────┴─────────┴────────────┘

Column-oriented (DuckDB):
┌──────────────────────────────┐
│ name: [John, Jane, Bob]      │
├──────────────────────────────┤
│ salary: [50000, 60000, ...]  │
├──────────────────────────────┤
│ department: [Eng, Sales...]  │
└──────────────────────────────┘
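A toy Python sketch of why the columnar layout wins for analytics: an aggregate over one column touches only that column's contiguous array, while a row store drags every field of every record through the cache:

```python
# Row store: each record carries every field; an aggregate over one
# column still walks (and pulls into cache) all the others.
rows = [
    ("John", 50000, "Eng"),
    ("Jane", 60000, "Sales"),
    ("Bob",  70000, "Eng"),
]

# Column store: one contiguous array per column.
columns = {
    "name":       ["John", "Jane", "Bob"],
    "salary":     [50000, 60000, 70000],
    "department": ["Eng", "Sales", "Eng"],
}

row_total = sum(r[1] for r in rows)   # touches whole records
col_total = sum(columns["salary"])    # touches one array only
assert row_total == col_total == 180000
```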

Compression

-- DuckDB picks a compression scheme per column segment automatically.
-- Common schemes include:
-- - Dictionary encoding (repeating strings)
-- - Run-length encoding (consecutive duplicates)
-- - Bit packing (booleans, small integer ranges)
-- - Chimp/Patas (floating-point values)
-- - Delta encoding (timestamps, sequences)

-- Inspect the compression chosen per segment
PRAGMA storage_info('employees');
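Two of the listed schemes are easy to sketch in Python (simplified versions, not DuckDB's implementations):

```python
def dictionary_encode(values):
    """Replace repeating strings with small integer codes."""
    dictionary, codes, index = [], [], {}
    for v in values:
        if v not in index:
            index[v] = len(dictionary)
            dictionary.append(v)
        codes.append(index[v])
    return dictionary, codes

def run_length_encode(values):
    """Collapse consecutive duplicates into [value, count] pairs."""
    runs = []
    for v in values:
        if runs and runs[-1][0] == v:
            runs[-1][1] += 1
        else:
            runs.append([v, 1])
    return runs

depts = ["Eng", "Eng", "Eng", "Sales", "Sales", "Eng"]
dict_out = dictionary_encode(depts)   # (['Eng', 'Sales'], [0, 0, 0, 1, 1, 0])
rle_out = run_length_encode(depts)    # [['Eng', 3], ['Sales', 2], ['Eng', 1]]
```

Both trade a little decode work for much smaller buffers, which is a good deal when scans are I/O- or memory-bandwidth-bound.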

Query Processing

Parser

-- Parser converts SQL text into a parse tree (AST)
-- Input: SELECT name, salary * 1.1 FROM employees WHERE department = 'Eng'

-- Output (simplified):
-- SELECT
--   ├── targets: name, salary * 1.1
--   ├── FROM: employees
--   └── WHERE
--       └── department = 'Eng'

Query Resolver

-- The resolver (called the Binder in DuckDB's source) performs:
-- 1. Name resolution (tables, columns)
-- 2. Type checking and implicit casts
-- 3. Function binding (choosing the right overload)

-- Example error caught at bind time (illustrative message):
-- SELECT name + 1 FROM employees
-- ERROR: cannot add VARCHAR and INTEGER

Query Optimizer

-- Optimizer applies transformations:
-- 1. Predicate pushdown
-- 2. Column pruning
-- 3. Join reordering
-- 4. Constant folding

-- Example: Push down filter
-- Before: SELECT * FROM (SELECT * FROM t WHERE x > 5) WHERE y > 10
-- After: SELECT * FROM t WHERE x > 5 AND y > 10
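A miniature Python model of predicate pushdown (all names are hypothetical): pushing the combined filter into the scan means non-qualifying rows never leave the storage layer.

```python
# Toy logical plan: push the filter into the scan so rows are
# discarded before any further operator sees them.
table = [{"x": 3, "y": 20}, {"x": 8, "y": 5}, {"x": 9, "y": 15}]

def scan(rows, predicate=None):
    """Table scan that can evaluate a pushed-down predicate."""
    for row in rows:
        if predicate is None or predicate(row):
            yield row

# Before: the scan yields every row, the filter runs afterwards.
naive = [r for r in scan(table) if r["x"] > 5 and r["y"] > 10]

# After pushdown: the scan itself evaluates the combined predicate.
pushed = list(scan(table, lambda r: r["x"] > 5 and r["y"] > 10))

assert naive == pushed == [{"x": 9, "y": 15}]
```

The result is identical either way; the win is that fewer rows flow through the rest of the plan.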

Physical Planning

-- The physical planner turns the optimized logical plan into
-- executable operators, e.g. choosing a join implementation

EXPLAIN SELECT * FROM a JOIN b ON a.id = b.id;

-- Possible join operators:
-- - HASH_JOIN (default for equality joins)
-- - PIECEWISE_MERGE_JOIN (range/inequality joins)
-- - NESTED_LOOP_JOIN (fallback for arbitrary conditions)
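The hash join strategy is simple to sketch: build a hash table on one input, then probe it with the other. A simplified Python version (not DuckDB's actual operator):

```python
def hash_join(build_rows, probe_rows, build_key, probe_key):
    """Two-phase hash join: build a hash table on the (smaller)
    build side, then probe it once per row of the probe side."""
    table = {}
    for row in build_rows:
        table.setdefault(row[build_key], []).append(row)
    out = []
    for row in probe_rows:
        for match in table.get(row[probe_key], []):
            out.append({**match, **row})  # merge matched rows
    return out

a = [{"id": 1, "name": "John"}, {"id": 2, "name": "Jane"}]
b = [{"id": 1, "dept": "Eng"}, {"id": 1, "dept": "Ops"}, {"id": 3, "dept": "HR"}]
joined = hash_join(a, b, "id", "id")
# id=1 matches twice; id=2 and id=3 have no partner and drop out
```

Build cost is linear in one input and probe cost linear in the other, which is why hash joins dominate for large unsorted tables.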

Transaction Management

MVCC Implementation

-- DuckDB uses MVCC (Multi-Version Concurrency Control):
-- - Write-Ahead Log (WAL) for durability
-- - Undo buffers holding previous row versions
-- - Periodic checkpoints that merge the WAL into the database file

-- Transactions are atomic and isolated
BEGIN TRANSACTION;
UPDATE employees SET salary = salary * 1.1;
COMMIT;
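The core MVCC idea can be sketched in a few lines of Python: each write appends a version stamped with the writing transaction's id, and a reader sees the newest version visible to its snapshot (class and field names are invented for illustration):

```python
class VersionedCell:
    """One value under MVCC: a chain of (commit_txn_id, value) versions."""

    def __init__(self, value):
        self.versions = [(0, value)]

    def write(self, txn_id, value):
        # A committed write appends a new version; old ones remain
        # readable by transactions holding older snapshots.
        self.versions.append((txn_id, value))

    def read(self, snapshot_txn_id):
        # Newest version committed at or before this snapshot.
        visible = [v for t, v in self.versions if t <= snapshot_txn_id]
        return visible[-1]

salary = VersionedCell(50000)
salary.write(txn_id=5, value=55000)              # committed by txn 5
assert salary.read(snapshot_txn_id=3) == 50000   # older snapshot
assert salary.read(snapshot_txn_id=7) == 55000   # newer snapshot
```

Readers never block writers (and vice versa) because each side works with its own versions.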

Isolation Levels

-- DuckDB provides snapshot isolation:
-- - Each transaction sees a consistent snapshot of the database
-- - Conflicting concurrent writes abort with a transaction conflict
-- - There is no SET TRANSACTION ISOLATION LEVEL statement;
--   the isolation behavior is not configurable

BEGIN TRANSACTION;               -- opens a snapshot
SELECT COUNT(*) FROM employees;  -- repeatable within the transaction
COMMIT;

Indexing

Index Types

-- Create index
CREATE INDEX idx_employees_dept ON employees(department);

-- Index types:
-- - ART (Adaptive Radix Tree): explicit indexes for point lookups
--   and for enforcing PRIMARY KEY / UNIQUE constraints
-- - Zonemaps (min-max indexes): maintained automatically per row
--   group and used to skip data during scans

-- Check whether the index is used
EXPLAIN SELECT * FROM employees WHERE department = 'Engineering';

Index Statistics

-- DuckDB gathers statistics (min/max, distinct counts) automatically
-- and uses them for query planning.

-- There is no USING INDEX hint syntax; the optimizer decides when an
-- index scan pays off. Verify with EXPLAIN:
EXPLAIN SELECT * FROM employees WHERE id = 42;
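Zonemap pruning is easy to model in Python: keep a min/max per row group and skip groups that cannot possibly match (a simplified sketch, not DuckDB's code):

```python
# Three row groups, each with precomputed min/max statistics.
row_groups = [
    {"min": 1,   "max": 100, "rows": list(range(1, 101))},
    {"min": 101, "max": 200, "rows": list(range(101, 201))},
    {"min": 201, "max": 300, "rows": list(range(201, 301))},
]

def scan_gt(groups, threshold):
    """Scan for values > threshold, skipping groups whose max rules
    them out entirely (no I/O for pruned groups)."""
    scanned, out = 0, []
    for g in groups:
        if g["max"] <= threshold:
            continue  # whole group pruned via its zonemap
        scanned += 1
        out.extend(v for v in g["rows"] if v > threshold)
    return scanned, out

scanned, result = scan_gt(row_groups, 250)
assert scanned == 1                    # two of three groups skipped
assert result == list(range(251, 301))
```

This is why range filters on roughly-sorted columns (like insertion-ordered timestamps) are fast even without an explicit index.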

Memory Management

Vector Memory Layout

-- Each vector holds a contiguous buffer of values
-- Example: 2048 INTEGERs (4 bytes each) = 8 KB
-- Example: 2048 DOUBLEs (8 bytes each) = 16 KB

-- Null information lives in a separate validity bitmap,
-- so the value buffer stays densely packed

Memory Allocation Strategy

-- The buffer manager allocates storage in fixed-size blocks
-- (256 KB by default) and evicts them under memory pressure

-- Cap DuckDB's memory usage
SET memory_limit = '4GB';

-- Check memory usage by component
SELECT * FROM duckdb_memory();

Data Types Implementation

Type System

-- DuckDB supports:
-- - Integer types (TINYINT through HUGEINT)
-- - Floating-point (FLOAT, DOUBLE)
-- - Exact decimals (DECIMAL)
-- - String types (VARCHAR, BLOB)
-- - Date/Time (DATE, TIME, TIMESTAMP, INTERVAL)
-- - Nested types (LIST, STRUCT, MAP; fixed-size ARRAY since v0.10)
-- - JSON (via the json extension)

-- Each type carries:
-- - A physical representation (fixed- or variable-width bytes)
-- - An internal type id
-- - Comparison, hashing, and cast functions

Extensions

Extension Architecture

-- DuckDB is highly extensible
-- Core features as extensions:
-- - json: JSON support
-- - spatial: GIS support
-- - httpfs: HTTP file system
-- - mysql: MySQL scanner
-- - postgres: PostgreSQL scanner

-- Install extension
INSTALL json;
LOAD json;

-- List installed extensions
SELECT * FROM duckdb_extensions();

Writing Extensions

-- Extensions can add:
-- - New scalar and aggregate functions
-- - New table functions
-- - New file formats and storage handlers
-- - New data types

-- Extensions themselves are written in C++; simple UDFs can also
-- be registered from client APIs (e.g. Python)

Performance Characteristics

Why DuckDB is Fast

  1. Vectorized Execution: Batch processing with SIMD
  2. Columnar Storage: Cache-friendly, compressible
  3. No Network Overhead: In-process execution
  4. Lazy Materialization: Don’t compute until needed
  5. Statistics-Based Planning: Informed query plans

Benchmark Results

-- Rough orders of magnitude on a modern laptop (hardware-dependent):
-- - Aggregating 10M rows: well under a second
-- - Joining 100M rows: seconds
-- - Scanning 1B rows: tens of seconds

-- Versus SQLite on analytical queries,
-- DuckDB is commonly 10-100x faster

Best Practices

Query Optimization

-- 1. Use appropriate data types
CREATE TABLE test (
    id INTEGER,     -- Not VARCHAR
    amount DECIMAL(10,2)  -- Not DOUBLE
);

-- 2. Filter early so later operators see fewer rows
SELECT category, COUNT(*)
FROM large_table
WHERE date > '2025-01-01'
GROUP BY category;

-- 3. Use indexes for selective queries
CREATE INDEX idx_date ON events(date);

Data Loading

-- 1. Use COPY for bulk loading
COPY large_table FROM 'data.csv' (AUTO_DETECT TRUE);

-- 2. Batch inserts instead of one INSERT per row
INSERT INTO events VALUES (1, 'a'), (2, 'b'), (3, 'c');

-- 3. Parallel file reading (globs fan out across threads)
SELECT * FROM read_csv_auto('data/*.csv');

Common Pitfalls

Pitfall 1: Row-by-Row Processing

-- Bad: fetching one row at a time in client code
-- (e.g. a fetchone() loop over a large result set)

-- Good: express the work in SQL and let DuckDB vectorize it
SELECT * FROM large_table;  -- automatically vectorized
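When results must be consumed in client code, fetch them in batches rather than one row at a time. A stdlib sketch using `sqlite3` (the same `fetchmany` pattern applies to DuckDB's Python client):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE large_table (id INTEGER, val REAL)")
conn.executemany(
    "INSERT INTO large_table VALUES (?, ?)",
    [(i, i * 1.5) for i in range(10_000)],
)

cur = conn.execute("SELECT id, val FROM large_table")
total_rows = 0
while True:
    batch = cur.fetchmany(2048)  # one batch per call, not one row
    if not batch:
        break
    total_rows += len(batch)     # process the whole batch here

assert total_rows == 10_000
```

Batched fetching amortizes the per-call overhead of crossing the client/database boundary, mirroring the engine's own vector-at-a-time design.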

Pitfall 2: Wrong Data Types

-- Bad: Using TEXT for IDs
CREATE TABLE bad (id TEXT);

-- Good: Using appropriate types
CREATE TABLE good (id INTEGER);

Pitfall 3: Not Using Parallelism

-- DuckDB uses all available cores by default
-- Verify:
SELECT current_setting('threads');

-- Adjust if needed (e.g. to limit CPU usage on a shared machine)
SET threads = 16;


Conclusion

DuckDB’s internal architecture combines vectorized execution with columnar storage to deliver exceptional analytical performance. Understanding these internals helps developers optimize queries and leverage DuckDB’s strengths effectively.

In the next article, we’ll explore DuckDB trends in 2025-2026, including new features, extensions, and emerging use cases.
