Skip to main content

MySQL Internals: InnoDB, Storage Engine, and Query Processing

Created: March 5, 2026 CalmOps 7 min read

Introduction

Understanding MySQL’s internal architecture helps you make better design decisions, optimize queries, and troubleshoot performance issues. This article explores the core components that make MySQL work: from the InnoDB storage engine to query execution and transaction management.


MySQL Architecture Overview

Layered Architecture

┌─────────────────────────────────────┐
│     MySQL Server (SQL Layer)       │
│  ┌─────────────────────────────┐    │
│  │  Connection Manager         │    │
│  │  Query Cache (removed 8.0)  │    │
│  │  Query Optimizer            │    │
│  │  Parser                    │    │
│  └─────────────────────────────┘    │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│   Storage Engine Layer              │
│  ┌──────────┐ ┌──────────┐         │
│  │ InnoDB   │ │ MyISAM   │  ...    │
│  └──────────┘ └──────────┘         │
└─────────────────────────────────────┘
               ↓
┌─────────────────────────────────────┐
│   OS / File System                  │
└─────────────────────────────────────┘

Key Components

-- Show storage engines
SHOW ENGINES;

-- Check InnoDB status
SHOW ENGINE INNODB STATUS;

-- View system variables
SHOW VARIABLES LIKE 'innodb%';

InnoDB Storage Engine

Architecture

InnoDB Architecture
┌─────────────────────────────────────────┐
│           Buffer Pool                    │
│  ┌───────────────────────────────────┐  │
│  │  Pages (16KB each)                │  │
│  │  - Data pages                     │  │
│  │  - Index pages                    │  │
│  │  - Insert buffer                  │  │
│  │  - Adaptive hash index            │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│   Log Files                             │
│  - Redo log                             │
│  - Undo log                             │
└─────────────────────────────────────────┘
         ↓
┌─────────────────────────────────────────┐
│   Tablespaces                           │
│  - System tablespace                    │
│  - File-per-table tablespaces           │
│  - Undo tablespaces                     │
└─────────────────────────────────────────┘

Page Structure

-- InnoDB page size is 16KB
SHOW VARIABLES LIKE 'innodb_page_size';

-- Page structure
-- ┌────────────────────┐
-- │  Page Header      │  38 bytes
-- ├────────────────────┤
-- │  Infimum + Supremum│  26 bytes
-- ├────────────────────┤
-- │  User Records     │  Variable
-- ├────────────────────┤
-- │  Free Space       │  Variable
-- ├────────────────────┤
-- │  Page Directory   │  Variable
-- └────────────────────┘

Buffer Pool

Purpose and Configuration

-- Buffer pool is memory cache for table data
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Set buffer pool size (should be 70-80% of available RAM)
SET GLOBAL innodb_buffer_pool_size = 2147483648;  -- 2GB

-- Buffer pool instances (for scalability)
SHOW VARIABLES LIKE 'innodb_buffer_pool_instances';

-- View buffer pool status
SELECT 
    pool_id, 
    pool_size, 
    free_buffers, 
    database_pages 
FROM information_schema.INNODB_BUFFER_POOL_STATS;

-- Monitor page activity
SELECT * FROM information_schema.INNODB_BUFFER_PAGE;

Buffer Pool and Caching

-- What pages are in buffer pool
SELECT 
    table_name, 
    index_name, 
    COUNT(*) as pages,
    ROUND(SUM(data_size) / 1024 / 1024, 2) as MB
FROM information_schema.INNODB_BUFFER_PAGE
GROUP BY table_name, index_name
ORDER BY pages DESC
LIMIT 20;

-- Buffer pool hit ratio
SHOW STATUS LIKE 'Innodb_buffer_pool_reads';
SHOW STATUS LIKE 'Innodb_buffer_pool_read_requests';

-- Calculate hit ratio
-- (1 - reads / read_requests) * 100 = hit ratio
-- Should be > 95%

Redo Log

Write-Ahead Logging

-- Redo log files
SHOW VARIABLES LIKE 'innodb_log_file_size';
SHOW VARIABLES LIKE 'innodb_log_files_in_group';

-- How redo logs work
-- 1. Transaction modifies data in buffer pool
-- 2. Changes written to redo log (persistently)
-- 3. Later flushed to data files

-- Check log sequence number (LSN)
SHOW ENGINE INNODB STATUS\G
-- Look for: log sequence number

-- Log flush
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- 1: Write to disk on commit (default, most safe)
-- 2: Write to OS cache on commit
-- 0/ASYNC: Write to OS cache every second

Undo Log

Transaction Rollback

-- Undo logs store old versions of data
-- Used for:
-- 1. Transaction rollback
-- 2. MVCC (consistent reads)

-- Undo tablespaces
SHOW VARIABLES LIKE 'innodb_undo_tablespaces';

-- Check undo usage
SHOW ENGINE INNODB STATUS\G
-- Look for: undo tablespaces

MVCC (Multi-Version Concurrency Control)

How InnoDB Implements MVCC

-- InnoDB MVCC uses:
-- 1. Transaction IDs (DB_TRX_ID)
-- 2. Roll Pointers (DB_ROLL_PTR)
-- 3. Undo logs

-- Every row has hidden columns:
-- - DB_TRX_ID: Transaction that last modified
-- - DB_ROLL_PTR: Pointer to undo log
-- - DB_ROW_ID: Row ID (if no primary key)

-- Example: Concurrent reads
-- Transaction 1: INSERT
START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (1, 1000);
COMMIT;

-- Transaction 2: READ (sees original value)
START TRANSACTION;
SELECT * FROM accounts;  -- Sees 1000
COMMIT;

-- Transaction 3: UPDATE
START TRANSACTION;
UPDATE accounts SET balance = 2000 WHERE id = 1;
COMMIT;

-- Transaction 4: READ (sees updated value)
SELECT * FROM accounts;  -- Sees 2000

Transaction Isolation Levels

-- REPEATABLE READ (default in InnoDB)
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;
-- Consistent read: sees snapshot from first read
-- May see phantom rows in certain cases

-- READ COMMITTED
SET TRANSACTION ISOLATION LEVEL READ COMMITTED;
-- Each read sees latest committed values
-- May see non-repeatable reads

-- READ UNCOMMITTED
SET TRANSACTION ISOLATION LEVEL READ UNCOMMITTED;
-- May see uncommitted changes (dirty reads)

-- SERIALIZABLE
SET TRANSACTION ISOLATION LEVEL SERIALIZABLE;
-- Locks all selected rows
-- Equivalent to REPEATABLE READ + SELECT ... FOR SHARE

Index Structures

B-Tree Index

-- Clustered vs Secondary Indexes

-- Clustered index: 
-- - Primary key is clustered index
-- - Data stored in index order
-- - One per table

-- Secondary index:
-- - Contains primary key value
-- - Can have multiple per table

-- B-Tree structure
-- Root (page)
--   ├── Branch pages
--   │    └── ...
--   └── Leaf pages (actual data)

-- Index creation
CREATE INDEX idx_users_email ON users(email);

-- Composite index (leftmost prefix)
CREATE INDEX idx_orders_user_status ON orders(user_id, status);

-- Prefix index
CREATE INDEX idx_users_name ON users(name(10));

Index Statistics

-- Analyze table for index statistics
ANALYZE TABLE users;

-- View index information
SHOW INDEX FROM users;

-- Check index cardinality
SELECT 
    INDEX_NAME, 
    COLUMN_NAME, 
    CARDINALITY,
    SEQ_IN_INDEX
FROM information_schema.STATISTICS
WHERE TABLE_SCHEMA = 'mydb'
AND TABLE_NAME = 'users';

Query Processing

Query Execution Pipeline

SQL Query
    ↓
1. Parser (syntax check)
    ↓
2. Resolver (name resolution)
    ↓
3. Optimizer (create plan)
    ↓
4. Executor (run plan)
    ↓
Results

EXPLAIN Analysis

-- Basic EXPLAIN
EXPLAIN SELECT * FROM users WHERE email = '[email protected]';

-- EXPLAIN FORMAT=JSON (detailed)
EXPLAIN FORMAT=JSON 
SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.country = 'USA';

-- EXPLAIN ANALYZE (MySQL 8.0.18+)
EXPLAIN ANALYZE 
SELECT * FROM orders WHERE user_id = 1;

-- Common output types:
-- ALL: Full table scan (bad)
-- index: Full index scan
-- range: Index range scan
-- ref: Non-unique index lookup
-- eq_ref: Unique index lookup (good)
-- const: Constant lookup (excellent)
-- system: Single row (excellent)

Join Execution

-- Nested Loop Join
EXPLAIN SELECT * FROM orders o
JOIN users u ON o.user_id = u.id;

-- Hash Join (MySQL 8.0.18+)
EXPLAIN SELECT * FROM orders o
JOIN users u ON o.user_id = u.id;

-- Block Nested Loop (for no index)
EXPLAIN SELECT * FROM orders o
JOIN users u ON o.user_id = u.id
WHERE u.country = 'USA';

Transaction Management

ACID Properties

-- Atomicity: All or nothing
START TRANSACTION;
INSERT INTO accounts (id, balance) VALUES (1, 1000);
INSERT INTO accounts (id, balance) VALUES (2, 500);
-- Both succeed or both fail
COMMIT;

-- Consistency: Valid state
-- Enforced by constraints
ALTER TABLE accounts ADD CHECK (balance >= 0);

-- Isolation: Concurrent transactions
SET TRANSACTION ISOLATION LEVEL REPEATABLE READ;

-- Durability: Survives crashes
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
-- = 1 means fully durable

Locking

-- Row-level locks
SELECT * FROM orders WHERE id = 1 FOR UPDATE;  -- Exclusive lock
SELECT * FROM orders WHERE id = 1 FOR SHARE;   -- Shared lock

-- Gap locks
SELECT * FROM orders WHERE id BETWEEN 1 AND 10 FOR UPDATE;

-- Deadlock detection
SHOW ENGINE INNODB STATUS\G
-- Look for: LATEST DETECTED DEADLOCK

-- View locks
SELECT * FROM information_schema.INNODB_LOCKS;

-- View lock waits
SELECT * FROM information_schema.INNODB_LOCK_WAITS;

Memory Architecture

Key Memory Areas

-- Buffer pool (data and indexes)
SHOW VARIABLES LIKE 'innodb_buffer_pool_size';

-- Log buffer
SHOW VARIABLES LIKE 'innodb_log_buffer_size';

-- Sort buffer
SHOW VARIABLES LIKE 'sort_buffer_size';

-- Join buffer
SHOW VARIABLES LIKE 'join_buffer_size';

-- Read buffer
SHOW VARIABLES LIKE 'read_buffer_size';

-- Thread cache
SHOW VARIABLES LIKE 'thread_cache_size';

-- Query cache (removed in 8.0)
-- Use InnoDB buffer pool instead

Data Dictionary

MySQL 8.0 Data Dictionary

-- MySQL 8.0 uses InnoDB for data dictionary
-- No more .frm files

-- View data dictionary tables
SELECT * FROM information_schema.TABLES 
WHERE TABLE_SCHEMA = 'mysql'
AND TABLE_NAME LIKE 'innodb%';

-- Check dictionary
SELECT * FROM mysql.innodb_table_stats;
SELECT * FROM mysql.innodb_index_stats;

Conclusion

Understanding MySQL’s internals—InnoDB, MVCC, the buffer pool, and query processing—helps you design better schemas, write faster queries, and diagnose performance issues. These fundamentals apply whether you’re tuning a simple application or managing a complex distributed system.

In the next article, we’ll explore MySQL’s recent developments: new features in version 8.0 and 8.4 LTS, JSON enhancements, and the evolving ecosystem.

Comments

Share this article

Scan to read on mobile