OpenSearch Internals: Lucene, Sharding, and Replication

Introduction

Understanding OpenSearch’s internal architecture helps you optimize queries and troubleshoot issues. This article explores how OpenSearch achieves distributed search and analytics.

Apache Lucene Foundation

OpenSearch is built on Apache Lucene:

┌─────────────────────────────────────────┐
│           OpenSearch                      │
│  ┌───────────────────────────────────┐  │
│  │       REST API Layer               │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │    Cluster Management              │  │
│  └───────────────────────────────────┘  │
│  ┌───────────────────────────────────┐  │
│  │       Lucene Library                │  │
│  │  - IndexWriter                    │  │
│  │  - IndexReader                    │  │
│  │  - Searcher                       │  │
│  └───────────────────────────────────┘  │
└─────────────────────────────────────────┘

Segment-Based Storage

Index Structure

Index (shard)
  ├── _0.segment
  │   ├── _0.si    (Segment Info)
  │   ├── _0.fdm   (Field Metadata)
  │   ├── _0.fdt   (Field Data)
  │   ├── _0.fdx   (Field Index)
  │   ├── _0.tmd   (Term Metadata)
  │   ├── _0.tvd   (Term Vector Data)
  │   ├── _0.tvx   (Term Vector Index)
  │   └── _0.nvd   (Norm Data)
  ├── _1.segment
  └── _2.segment

Document Addition

# When you index a document:
# 1. Added to in-memory buffer
# 2. Written to translog
# 3. Periodically flushed to segment

POST /products/_doc
{
  "name": "Mouse",
  "price": 29.99
}

Segment Merging

# Force merge
POST /products/_forcemerge
{
  "max_num_segments": 1
}

Sharding Strategy

Primary and Replica Shards

# Index with shards
PUT /products
{
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 2
  }
}

# Shard distribution:
# Primary: 0, 1, 2
# Replicas: 0, 1, 2 (2 copies each)

Shard Routing

# Document routing
POST /products/_doc
{
  "name": "Mouse"
}

# Uses: hash(_id) % num_primary_shards

Replication

Primary-Replica Flow

Write Request
    │
    ▼
┌─────────────────┐
│  Primary Shard   │──► In-memory buffer
└─────────────────┘
    │
    ▼ (parallel)
┌───────────────┐
│ Replica 1     │──► Translog
└───────────────┘
    │
┌───────────────┐
│ Replica 2     │──► Translog
└───────────────┘

Read Request

# Coordinating node routes to:
# 1. One primary + all replicas
# 2. Waits for responses
# 3. Returns to client

Near Real-Time Search

Refresh Interval

# Default refresh: 1 second
PUT /products
{
  "settings": {
    "refresh_interval": "1s"
  }
}

# Make visible for search
POST /products/_refresh

# Disable auto-refresh
PUT /products
{
  "settings": {
    "refresh_interval": "-1"
  }
}

Translog

# Translog provides durability
PUT /products
{
  "settings": {
    "translog": {
      "sync_interval": "5s",
      "size": "5mb"
    }
  }
}

Query Execution

Query Phase

# 1. Coordinator receives request
# 2. Broadcasts to all shards
# 3. Each shard returns top-N results
# 4. Coordinator merges and returns

Search Flow

Search Request
    │
    ▼
┌─────────────────────┐
│  Query Parsing      │──► Parse DSL
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  Query Execution    │──► Execute on shards
└─────────────────────┘
    │
    ▼
┌─────────────────────┐
│  Reduce Phase       │──► Merge results
└─────────────────────┘
    │
    ▼
Response

Caching

Query Cache

# Enabled by default for filter queries
# Caches results per segment

# Clear cache
POST /products/_cache/clear

Field Data Cache

# Used for aggregations, sorts
# Loading costs memory

# Monitor
GET /products/_stats

Conclusion

Understanding OpenSearch internals—Lucene segments, sharding, and replication—helps you design better indexes and troubleshoot performance issues.