
MinIO Internals: Understanding the Distributed Object Store

Introduction

Understanding MinIO’s internal architecture helps you optimize deployments, troubleshoot issues, and design efficient data pipelines. MinIO is built from the ground up for distributed object storage, with every component designed for horizontal scalability and high performance. This article explores the key architectural components that make MinIO work.

Architecture Overview

MinIO follows a distributed architecture:

┌──────────────────────────────────────────────────────┐
│                    MinIO Cluster                     │
├──────────────────────────────────────────────────────┤
│  ┌─────────┐  ┌─────────┐  ┌─────────┐  ┌─────────┐  │
│  │ Server 1│  │ Server 2│  │ Server 3│  │ Server 4│  │
│  ├─────────┤  ├─────────┤  ├─────────┤  ├─────────┤  │
│  │ Drive 1 │  │ Drive 2 │  │ Drive 3 │  │ Drive 4 │  │
│  │ Drive 2 │  │ Drive 1 │  │ Drive 4 │  │ Drive 3 │  │
│  └─────────┘  └─────────┘  └─────────┘  └─────────┘  │
├──────────────────────────────────────────────────────┤
│       Erasure Coding (EC:4) - 4 data, 4 parity       │
└──────────────────────────────────────────────────────┘

Erasure Coding

How Erasure Coding Works

When you write an object, MinIO:

  1. Splits the object into N data fragments (one per data drive)
  2. Computes M parity fragments using Reed-Solomon erasure coding
  3. Distributes all N+M fragments across drives and nodes

// Simplified erasure coding parameters
// N data shards, M parity shards
type EC struct {
    DataShards   int
    ParityShards int
}

// Example: 8 drives (4 data + 4 parity)
// EC:4 can recover from losing any 4 drives

Recovery

When reading from a degraded cluster:

// Read from available fragments
// Use erasure decoding to reconstruct missing data
// Continue serving requests even during failures
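MinIO's production code uses Reed-Solomon coding, which supports M parity shards. A single-parity XOR code is the simplest member of the same family, and this toy sketch (shard contents and function names are mine, for illustration only) shows the recovery mechanics end to end:

```go
package main

import "fmt"

// encodeXOR is a toy single-parity scheme: the parity shard is the
// XOR of all data shards. Reed-Solomon generalizes this to M parity
// shards, but the reconstruction idea is the same.
func encodeXOR(data [][]byte) []byte {
	parity := make([]byte, len(data[0]))
	for _, shard := range data {
		for i, b := range shard {
			parity[i] ^= b
		}
	}
	return parity
}

// recoverShard rebuilds one lost data shard from the survivors plus parity,
// since XOR-ing the survivors back out of the parity leaves the lost shard.
func recoverShard(survivors [][]byte, parity []byte) []byte {
	lost := make([]byte, len(parity))
	copy(lost, parity)
	for _, shard := range survivors {
		for i, b := range shard {
			lost[i] ^= b
		}
	}
	return lost
}

func main() {
	data := [][]byte{[]byte("AAAA"), []byte("BBBB"), []byte("CCCC")}
	parity := encodeXOR(data)

	// Simulate losing shard 1 and rebuilding it from the rest.
	rebuilt := recoverShard([][]byte{data[0], data[2]}, parity)
	fmt.Printf("rebuilt shard: %s\n", rebuilt) // rebuilt shard: BBBB
}
```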

Erasure Code Calculator

Total Drives   Data (N)   Parity (M)   Failure Tolerance
     4             2           2               2
     6             4           2               2
     8             4           4               4
    12             6           6               6
    16             8           8               8
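The table's columns follow directly from the erasure parameters; a minimal sketch (the function names are mine, not MinIO's API):

```go
package main

import "fmt"

// With M parity shards in an erasure set of N drives, any M drives
// may fail, and only the data drives contribute usable capacity.
func failureTolerance(parity int) int { return parity }

func usableBytes(totalDrives, parity int, driveBytes int64) int64 {
	dataDrives := totalDrives - parity
	return int64(dataDrives) * driveBytes
}

func main() {
	// Row 3 of the table: 8 drives, EC:4, with 4 TB drives.
	fmt.Println(usableBytes(8, 4, 4_000_000_000_000)) // 16000000000000
	fmt.Println(failureTolerance(4))                  // 4
}
```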

Distributed Hashing

MinIO uses consistent hashing for data placement:

Hash-Based Distribution

// Consistent hashing for object placement
type ConsistentHash struct {
    virtualNodes int
    hashRange    int
}

// Object key determines placement
// hash(key) -> drive assignment
// Ensures even distribution across drives
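A minimal, self-contained hash ring along these lines (illustrative only: the drive names and CRC32 hash are my choices, and MinIO itself deterministically hashes object names to erasure sets rather than using this exact structure):

```go
package main

import (
	"fmt"
	"hash/crc32"
	"sort"
)

// Ring maps hash points to drives; each drive gets several virtual
// nodes so keys spread evenly around the ring.
type Ring struct {
	points map[uint32]string // hash point -> drive
	sorted []uint32          // sorted hash points for binary search
}

func NewRing(drives []string, virtualNodes int) *Ring {
	r := &Ring{points: map[uint32]string{}}
	for _, d := range drives {
		for v := 0; v < virtualNodes; v++ {
			h := crc32.ChecksumIEEE([]byte(fmt.Sprintf("%s#%d", d, v)))
			r.points[h] = d
			r.sorted = append(r.sorted, h)
		}
	}
	sort.Slice(r.sorted, func(i, j int) bool { return r.sorted[i] < r.sorted[j] })
	return r
}

// Locate returns the drive owning the first hash point at or after hash(key).
func (r *Ring) Locate(key string) string {
	h := crc32.ChecksumIEEE([]byte(key))
	i := sort.Search(len(r.sorted), func(i int) bool { return r.sorted[i] >= h })
	if i == len(r.sorted) {
		i = 0 // wrap around the ring
	}
	return r.points[r.sorted[i]]
}

func main() {
	ring := NewRing([]string{"drive1", "drive2", "drive3", "drive4"}, 16)
	// Same key always maps to the same drive.
	fmt.Println(ring.Locate("bucket/photos/cat.jpg"))
}
```

Because only the hash points belonging to a new drive change ownership, adding a drive remaps only a small fraction of keys, which is the "minimal remapping" property noted above.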

Benefits

  • Even distribution: Objects spread across all drives
  • Minimal remapping: Adding drives moves only a small amount of data
  • Fault tolerance: Knows which drives hold each object’s fragments

Storage Engine

Object Storage Format

┌─────────────────────────────────────┐
│           Object on Disk            │
├─────────────────────────────────────┤
│  xl.meta (metadata)                 │
│  - Checksums                        │
│  - Erasure coding info              │
│  - Version info                     │
│  - Custom metadata                  │
├─────────────────────────────────────┤
│  part.1 (data fragment)             │
│  part.2 (data fragment)             │
│  ...                                │
└─────────────────────────────────────┘

Metadata Structure

{
  "version": "v1",
  "format": "xl",
  "stat": {
    "size": 1024000,
    "mtime": "2026-01-01T00:00:00Z",
    "etag": "abc123"
  },
  "parts": [
    {"number": 1, "size": 512000},
    {"number": 2, "size": 512000}
  ],
  "erasure": {
    "data": 4,
    "parity": 4,
    "blockSize": 1048576
  }
}

Write Path

1. Client sends PUT request
   ↓
2. MinIO calculates hash
   ↓
3. Select drives using consistent hash
   ↓
4. Write data fragments to drives in parallel
   ↓
5. Write metadata
   ↓
6. Return success to client

Read Path

1. Client sends GET request
   ↓
2. Find drives holding object fragments
   ↓
3. Read fragments in parallel
   ↓
4. Decode and reconstruct data
   ↓
5. Return to client

Quorum and Consensus

Write Quorum

MinIO requires a minimum number of successful writes:

  • Write quorum: N/2 + 1 drives must acknowledge write
  • For EC:4 (8 drives): 5 writes required

Read Quorum

For reading from a degraded cluster:

  • Read quorum: N/2 fragments (the data shards) are enough to reconstruct the object
  • For EC:4 (8 drives): reads still succeed with up to 4 drives offline
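The quorum arithmetic above can be captured in a couple of lines (a sketch of the rule for maximal parity, where parity = N/2; function names are mine):

```go
package main

import "fmt"

// writeQuorum: enough drives that two conflicting writes cannot
// both succeed -- the data shards plus one, floored at N/2 + 1.
func writeQuorum(n, m int) int {
	q := n - m + 1
	if q < n/2+1 {
		q = n/2 + 1
	}
	return q
}

// readQuorum: the data shards alone suffice to reconstruct an object.
func readQuorum(n, m int) int { return n - m }

func main() {
	// EC:4 on 8 drives, as in the example above.
	fmt.Println(writeQuorum(8, 4), readQuorum(8, 4)) // 5 4
}
```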

Healing

// Automatic healing after drive failure
// When drive returns:
// 1. Check fragment checksums
// 2. Recalculate missing fragments
// 3. Write to missing locations
// 4. Update metadata

Performance Characteristics

Write Performance

# Sequential writes: ~5 GB/s per node
# Parallel writes: Linear scaling with nodes

# Example: 4 nodes, 8 drives each
# Write throughput: ~20 GB/s

Read Performance

# Sequential reads: ~10 GB/s per node
# Parallel reads: Scales with nodes

# Random reads: Depends on SSD
# Typical: ~100K IOPS per node

Latency

  • Single object GET: <10ms (cached)
  • Single object PUT: <20ms
  • Metadata operations: <5ms

Caching

MinIO includes built-in caching:

// Cache configuration
type CacheConfig struct {
    Drives  []string // Cache drives
    Quota   int      // Max % used
    Exclude []string // Exclude patterns
    After   int      // Access count before caching
}

// Cache works at:
// - Drive level
// - Cluster level

Cache Behavior

# Enable cache
mc admin config set myminio cache \
  drives="/mnt/cache" \
  quota=80 \
  exclude="*.mp4,*.pdf"

# Cache hit serves from memory/disk
# Cache miss fetches from backend
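That hit/miss behavior is a read-through cache; a toy in-memory sketch of the pattern (not MinIO's cache implementation, and ignoring the quota and exclude settings):

```go
package main

import "fmt"

// Cache serves hits from its local store and populates the store
// on a miss by fetching from the backend.
type Cache struct {
	store   map[string][]byte
	backend func(key string) []byte
}

// Get returns the object and whether it was a cache hit.
func (c *Cache) Get(key string) ([]byte, bool) {
	if obj, ok := c.store[key]; ok {
		return obj, true // cache hit
	}
	obj := c.backend(key) // cache miss: fetch from backend
	c.store[key] = obj
	return obj, false
}

func main() {
	c := &Cache{
		store:   map[string][]byte{},
		backend: func(key string) []byte { return []byte("object:" + key) },
	}
	_, hit := c.Get("bucket/a.txt")
	fmt.Println(hit) // false (miss, now cached)
	_, hit = c.Get("bucket/a.txt")
	fmt.Println(hit) // true
}
```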

Data Tiering

Overview

MinIO supports multiple storage tiers:

┌─────────────────────────────────────────┐
│            Hot (NVMe SSD)               │
│  Active data, frequent access           │
└─────────────────────────────────────────┘
         ↓ Transition (30 days)
┌─────────────────────────────────────────┐
│          Warm (HDD / SATA SSD)          │
│  Less frequent access                   │
└─────────────────────────────────────────┘
         ↓ Transition (90 days)
┌─────────────────────────────────────────┐
│         Cold (Object Storage)           │
│  Glacier/cloud storage                  │
└─────────────────────────────────────────┘

Configuring Tiering

# Add a remote S3 tier (tier names are uppercase)
mc admin tier add s3 myminio TIER-1 \
  --endpoint https://s3.amazonaws.com \
  --access-key ACCESSKEY \
  --secret-key SECRETKEY \
  --bucket mybucket

# Set a lifecycle transition rule
mc ilm rule add myminio/bucket \
  --transition-days 30 \
  --transition-tier TIER-1

Scalability

Horizontal Scaling

# Expand the cluster by adding a new server pool
# (an existing erasure set cannot grow one server at a
#  time; capacity is added pool by pool at startup)

./minio server \
  http://server{1...4}/minio{1...4} \
  http://server{5...8}/minio{1...4}   # New pool

Capacity Planning

Nodes   Drives/Node   Total Capacity   Usable (EC:4)
  4       4 x 4TB          64 TB           32 TB
  8       8 x 4TB         256 TB          128 TB
 16       8 x 4TB         512 TB          256 TB

Conclusion

MinIO’s architecture is designed for simplicity and performance. Key takeaways: erasure coding provides data protection without replication overhead, consistent hashing ensures even data distribution, quorum-based operations guarantee consistency, and the distributed design scales horizontally. Understanding these internals helps you optimize deployments and troubleshoot issues.

In the next article, we’ll explore recent MinIO developments and trends for 2025-2026.
