
Search Engines: Elasticsearch, OpenSearch, and Modern Search Architecture

Introduction

Search is a fundamental capability for modern applications. Whether you’re building an e-commerce product catalog, a content management system, or a logging analytics platform, the ability to quickly find relevant information is critical to user experience.

Elasticsearch and OpenSearch are distributed search engines built on Apache Lucene. They provide full-text search, real-time indexing, aggregations, and scalability across multiple nodes. These platforms power critical search infrastructure for organizations ranging from startups to Fortune 500 companies.

This article explores Elasticsearch and OpenSearch architecture, implementation patterns, and best practices for building production-ready search systems.

How Search Engines Work

┌─────────────────────────────────────────────────────────────────────┐
│                        Search Architecture                          │
├─────────────────────────────────────────────────────────────────────┤
│                                                                     │
│  ┌──────────┐    ┌──────────────┐    ┌──────────────────────────┐   │
│  │ Document │───▶│  Indexing    │───▶│   Inverted Index         │   │
│  │  Source  │    │   Pipeline   │    │   (Term → Doc IDs)       │   │
│  └──────────┘    └──────────────┘    └──────────────────────────┘   │
│                                                 │                   │
│  ┌──────────┐    ┌──────────────┐               ▼                   │
│  │  Query   │───▶│   Query      │    ┌──────────────────────────┐   │
│  │  Request │    │   Parser     │───▶│   Scoring & Ranking      │   │
│  └──────────┘    └──────────────┘    └──────────────────────────┘   │
│                                                 │                   │
└─────────────────────────────────────────────────┼───────────────────┘
                                                  ▼
                                        ┌─────────────────────────┐
                                        │   Search Results        │
                                        │   (Ranked Documents)    │
                                        └─────────────────────────┘

Inverted Index

The inverted index is the core data structure that makes search fast:

Documents:
Doc 1: "The quick brown fox jumps over the lazy dog"
Doc 2: "A quick brown animal runs very fast"
Doc 3: "The lazy dog sleeps in the sun"

Inverted Index:
┌─────────────┬────────────────────────────────┐
│ Term        │ Documents                      │
├─────────────┼────────────────────────────────┤
│ quick       │ Doc 1, Doc 2                   │
│ brown       │ Doc 1, Doc 2                   │
│ fox         │ Doc 1                          │
│ jumps       │ Doc 1                          │
│ over        │ Doc 1                          │
│ lazy        │ Doc 1, Doc 3                   │
│ dog         │ Doc 1, Doc 3                   │
│ animal      │ Doc 2                          │
│ runs        │ Doc 2                          │
│ fast        │ Doc 2                          │
│ sleeps      │ Doc 3                          │
│ sun         │ Doc 3                          │
└─────────────┴────────────────────────────────┘
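The toy corpus above can be indexed with a few lines of Python; the stopword set here is an assumption standing in for the analyzer's stopword filter:

```python
from collections import defaultdict

def build_inverted_index(docs: dict[int, str]) -> dict[str, list[int]]:
    """Map each term to the sorted list of document IDs containing it."""
    stopwords = {"the", "a", "in", "very"}  # stands in for an analyzer's stop filter
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            if term not in stopwords:
                index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

docs = {
    1: "The quick brown fox jumps over the lazy dog",
    2: "A quick brown animal runs very fast",
    3: "The lazy dog sleeps in the sun",
}
index = build_inverted_index(docs)

# A query for "lazy dog" intersects the postings lists of both terms:
matches = set(index["lazy"]) & set(index["dog"])  # {1, 3}
```

Real engines store postings in compressed, sorted segments on disk, but the lookup-and-intersect idea is the same.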

Elasticsearch/OpenSearch Basics

Index and Documents

// Index mapping (schema)
{
  "mappings": {
    "properties": {
      "title": {
        "type": "text",
        "analyzer": "standard",
        "fields": {
          "keyword": {
            "type": "keyword"
          },
          "suggest": {
            "type": "completion"
          }
        }
      },
      "description": {
        "type": "text",
        "analyzer": "english"
      },
      "price": {
        "type": "float"
      },
      "category": {
        "type": "keyword"
      },
      "tags": {
        "type": "keyword"
      },
      "rating": {
        "type": "float"
      },
      "in_stock": {
        "type": "boolean"
      },
      "created_at": {
        "type": "date"
      },
      "location": {
        "type": "geo_point"
      }
    }
  },
  "settings": {
    "number_of_shards": 3,
    "number_of_replicas": 1,
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "asciifolding", "porter_stem"]
        }
      }
    }
  }
}

Data Types

Type            Description         Use Case
text            Analyzed full-text  Product descriptions, articles
keyword         Exact values        Categories, tags, IDs
integer, float  Numeric             Prices, quantities
date            Timestamps          Created dates, events
boolean         True/false          In-stock, active
geo_point       Location            Store locations
nested          Nested objects      Complex structures
ip              IP addresses        Access logs
completion      Autocomplete        Search suggestions

Implementation Examples

Python Client

from typing import Optional

from opensearchpy import OpenSearch


class SearchClient:
    def __init__(
        self,
        hosts: list[str],
        username: Optional[str] = None,
        password: Optional[str] = None,
        use_ssl: bool = True,
        verify_certs: bool = True,
    ):
        self.client = OpenSearch(
            hosts=hosts,
            http_auth=(username, password) if username else None,
            use_ssl=use_ssl,
            verify_certs=verify_certs,
        )
    
    def create_index(
        self,
        index_name: str,
        mappings: dict,
        settings: Optional[dict] = None,
        if_not_exists: bool = True,
    ) -> Optional[dict]:
        body = {"mappings": mappings}
        if settings:
            body["settings"] = settings

        # Creating an index that already exists raises an error, so skip it
        if if_not_exists and self.client.indices.exists(index_name):
            return None
        return self.client.indices.create(index_name, body=body)
    
    def index_document(
        self,
        index_name: str,
        document: dict,
        doc_id: str = None,
    ) -> dict:
        return self.client.index(
            index=index_name,
            id=doc_id,
            body=document,
            refresh=True,
        )
    
    def bulk_index(
        self,
        index_name: str,
        documents: list[dict],
        doc_id_field: str = "id",
    ) -> dict:
        actions = []
        
        for doc in documents:
            doc_id = doc.get(doc_id_field)
            actions.append({"index": {"_index": index_name, "_id": doc_id}})
            actions.append(doc)
        
        return self.client.bulk(body=actions, refresh=True)
    
    def search(
        self,
        index_name: str,
        query: dict,
        from_: int = 0,
        size: int = 10,
        sort: Optional[list] = None,
        aggs: Optional[dict] = None,
        highlight: Optional[dict] = None,
    ) -> dict:
        body = {
            "query": query,
            "from": from_,
            "size": size,
            "sort": sort or ["_score"],
        }
        # Only attach the optional sections when given; null values are rejected
        if aggs:
            body["aggs"] = aggs
        if highlight:
            body["highlight"] = highlight

        return self.client.search(index=index_name, body=body)
    
    def match_all(self, index_name: str, size: int = 100) -> dict:
        return self.search(index_name, {"match_all": {}}, size=size)
    
    def match(
        self,
        index_name: str,
        field: str,
        query: str,
        fuzziness: str = "AUTO",
    ) -> dict:
        return self.search(
            index_name,
            {
                "match": {
                    field: {
                        "query": query,
                        "fuzziness": fuzziness,
                    }
                }
            },
        )
    
    def multi_match(
        self,
        index_name: str,
        query: str,
        fields: list[str],
        operator: str = "or",
    ) -> dict:
        return self.search(
            index_name,
            {
                "multi_match": {
                    "query": query,
                    "fields": fields,
                    "operator": operator,
                }
            },
        )
    
    def bool_query(
        self,
        index_name: str,
        must: list = None,
        should: list = None,
        filter: list = None,
        must_not: list = None,
        minimum_should_match: int = 1,
    ) -> dict:
        query = {"bool": {}}
        
        if must:
            query["bool"]["must"] = must
        if should:
            query["bool"]["should"] = should
            query["bool"]["minimum_should_match"] = minimum_should_match
        if filter:
            query["bool"]["filter"] = filter
        if must_not:
            query["bool"]["must_not"] = must_not
        
        return self.search(index_name, query)
    
    def autocomplete(
        self,
        index_name: str,
        field: str,
        prefix: str,
        size: int = 10,
    ) -> dict:
        # "suggest" is a top-level section, not a query clause, so call the
        # raw client here instead of self.search (which nests under "query")
        return self.client.search(
            index=index_name,
            body={
                "suggest": {
                    "product-suggest": {
                        "prefix": prefix,
                        "completion": {
                            "field": field,
                            "size": size,
                            "skip_duplicates": True,
                        }
                    }
                }
            },
        )
    
    def delete_document(self, index_name: str, doc_id: str) -> dict:
        return self.client.delete(index=index_name, id=doc_id, refresh=True)
    
    def delete_index(self, index_name: str) -> dict:
        return self.client.indices.delete(index=index_name)


# Usage
client = SearchClient(
    hosts=["localhost:9200"],
    username="admin",
    password="admin",
)

# Create index
client.create_index(
    "products",
    mappings={
        "properties": {
            "name": {
                "type": "text",
                "analyzer": "standard",
                "fields": {
                    "keyword": {"type": "keyword"},
                    "suggest": {"type": "completion"}
                }
            },
            "description": {"type": "text"},
            "category": {"type": "keyword"},
            "price": {"type": "float"},
            "brand": {"type": "keyword"},
            "tags": {"type": "keyword"},
            "rating": {"type": "float"},
            "in_stock": {"type": "boolean"},
            "location": {"type": "geo_point"},
        }
    },
    settings={
        "number_of_shards": 3,
        "number_of_replicas": 1,
    }
)

# Index documents
products = [
    {
        "id": "1",
        "name": "MacBook Pro 14 inch",
        "description": "Powerful laptop with M3 chip",
        "category": "Electronics",
        "price": 1999.99,
        "brand": "Apple",
        "tags": ["laptop", "computer", "apple"],
        "rating": 4.8,
        "in_stock": True,
        "location": {"lat": 37.7749, "lon": -122.4194},
    },
    {
        "id": "2",
        "name": "Sony WH-1000XM5",
        "description": "Premium noise-canceling headphones",
        "category": "Electronics",
        "price": 399.99,
        "brand": "Sony",
        "tags": ["headphones", "audio", "wireless"],
        "rating": 4.7,
        "in_stock": True,
        "location": {"lat": 37.7749, "lon": -122.4194},
    },
]

client.bulk_index("products", products)

# Search examples
# Simple match
results = client.match("products", "name", "laptop")

# Multi-field search
results = client.multi_match(
    "products",
    "powerful laptop",
    ["name^3", "description", "tags^2"],
)

# Boolean query with filters
results = client.bool_query(
    "products",
    must=[{"match": {"name": "laptop"}}],
    filter=[
        {"term": {"in_stock": True}},
        {"range": {"price": {"lte": 2000}}},
    ],
)

# Aggregations
results = client.search(
    "products",
    {"match_all": {}},
    aggs={
        "categories": {"terms": {"field": "category"}},
        "avg_price": {"avg": {"field": "price"}},
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 100},
                    {"from": 100, "to": 500},
                    {"from": 500, "to": 1000},
                    {"from": 1000},
                ]
            }
        },
    },
)

Go Client

package main

import (
    "bytes"
    "context"
    "encoding/json"
    "fmt"
    "log"
    "strings"
    
    "github.com/elastic/go-elasticsearch/v8"
    "github.com/elastic/go-elasticsearch/v8/esapi"
)

type Product struct {
    ID          string    `json:"id"`
    Name        string    `json:"name"`
    Description string    `json:"description"`
    Category    string    `json:"category"`
    Price       float64   `json:"price"`
    Brand       string    `json:"brand"`
    Tags        []string  `json:"tags"`
    Rating      float64   `json:"rating"`
    InStock     bool      `json:"in_stock"`
    Location    Location  `json:"location"`
}

type Location struct {
    Lat float64 `json:"lat"`
    Lon float64 `json:"lon"`
}

type SearchClient struct {
    client *elasticsearch.Client
    index  string
}

func NewSearchClient(addresses []string, username, password string) (*SearchClient, error) {
    cfg := elasticsearch.Config{
        Addresses: addresses,
    }
    
    if username != "" && password != "" {
        cfg.Username = username
        cfg.Password = password
    }
    
    client, err := elasticsearch.NewClient(cfg)
    if err != nil {
        return nil, err
    }
    
    return &SearchClient{
        client: client,
        index:  "products",
    }, nil
}

func (s *SearchClient) CreateIndex(ctx context.Context, mappings string) error {
    res, err := s.client.Indices.Create(
        s.index,
        s.client.Indices.Create.WithBody(strings.NewReader(mappings)),
        s.client.Indices.Create.WithContext(ctx),
    )
    if err != nil {
        return err
    }
    defer res.Body.Close()
    
    if res.IsError() {
        return fmt.Errorf("create index error: %s", res.String())
    }
    
    return nil
}

func (s *SearchClient) IndexDocument(ctx context.Context, doc Product) error {
    data, err := json.Marshal(doc)
    if err != nil {
        return err
    }
    
    req := esapi.IndexRequest{
        Index:      s.index,
        DocumentID: doc.ID,
        Body:       strings.NewReader(string(data)),
        Refresh:    "true",
    }
    
    res, err := req.Do(ctx, s.client)
    if err != nil {
        return err
    }
    defer res.Body.Close()
    
    if res.IsError() {
        return fmt.Errorf("index error: %s", res.String())
    }
    
    return nil
}

func (s *SearchClient) BulkIndex(ctx context.Context, docs []Product) error {
    var buf strings.Builder
    
    for _, doc := range docs {
        data, err := json.Marshal(doc)
        if err != nil {
            return err
        }
        buf.WriteString(fmt.Sprintf(`{"index":{"_index":"%s","_id":"%s"}}`, s.index, doc.ID))
        buf.WriteString("\n")
        buf.Write(data)
        buf.WriteString("\n")
    }
    
    res, err := s.client.Bulk(
        strings.NewReader(buf.String()),
        s.client.Bulk.Refresh("true"),
    )
    if err != nil {
        return err
    }
    defer res.Body.Close()
    
    if res.IsError() {
        return fmt.Errorf("bulk index error: %s", res.String())
    }
    
    return nil
}

func (s *SearchClient) Search(ctx context.Context, query map[string]interface{}) (*SearchResult, error) {
    var buf bytes.Buffer
    if err := json.NewEncoder(&buf).Encode(query); err != nil {
        return nil, err
    }
    
    res, err := s.client.Search(
        s.client.Search.WithContext(ctx),
        s.client.Search.WithIndex(s.index),
        s.client.Search.WithBody(&buf),
    )
    if err != nil {
        return nil, err
    }
    defer res.Body.Close()
    
    if res.IsError() {
        return nil, fmt.Errorf("search error: %s", res.String())
    }
    
    var result SearchResult
    if err := json.NewDecoder(res.Body).Decode(&result); err != nil {
        return nil, err
    }
    
    return &result, nil
}

type SearchResult struct {
    Took int `json:"took"`
    Hits struct {
        Total struct {
            Value int `json:"value"`
        } `json:"total"`
        Hits []Hit `json:"hits"`
    } `json:"hits"`
}

type Hit struct {
    ID     string                 `json:"_id"`
    Score  float64                `json:"_score"`
    Source map[string]interface{} `json:"_source"`
}

func (s *SearchClient) MatchAll(size int) (*SearchResult, error) {
    query := map[string]interface{}{
        "size": size,
        "query": map[string]interface{}{
            "match_all": map[string]interface{}{},
        },
    }
    
    return s.Search(context.Background(), query)
}

func (s *SearchClient) Match(field, query string) (*SearchResult, error) {
    searchQuery := map[string]interface{}{
        "query": map[string]interface{}{
            "match": map[string]interface{}{
                field: query,
            },
        },
    }
    
    return s.Search(context.Background(), searchQuery)
}

func (s *SearchClient) MultiMatch(query string, fields []string) (*SearchResult, error) {
    searchQuery := map[string]interface{}{
        "query": map[string]interface{}{
            "multi_match": map[string]interface{}{
                "query":  query,
                "fields": fields,
            },
        },
    }
    
    return s.Search(context.Background(), searchQuery)
}

func (s *SearchClient) BoolQuery(must, filter []map[string]interface{}) (*SearchResult, error) {
    boolQuery := map[string]interface{}{}
    
    if len(must) > 0 {
        boolQuery["must"] = must
    }
    if len(filter) > 0 {
        boolQuery["filter"] = filter
    }
    
    searchQuery := map[string]interface{}{
        "query": map[string]interface{}{
            "bool": boolQuery,
        },
    }
    
    return s.Search(context.Background(), searchQuery)
}

func (s *SearchClient) Aggregations(aggs map[string]interface{}) (*SearchResult, error) {
    searchQuery := map[string]interface{}{
        "size":  0,
        "query": map[string]interface{}{"match_all": map[string]interface{}{}},
        "aggs":  aggs,
    }
    
    return s.Search(context.Background(), searchQuery)
}

func main() {
    client, err := NewSearchClient(
        []string{"http://localhost:9200"},
        "elastic",
        "password",
    )
    if err != nil {
        log.Fatal(err)
    }
    
    mappings := `{
        "mappings": {
            "properties": {
                "name": {"type": "text"},
                "description": {"type": "text"},
                "category": {"type": "keyword"},
                "price": {"type": "float"},
                "brand": {"type": "keyword"},
                "tags": {"type": "keyword"},
                "rating": {"type": "float"},
                "in_stock": {"type": "boolean"},
                "location": {"type": "geo_point"}
            }
        }
    }`
    
    if err := client.CreateIndex(context.Background(), mappings); err != nil {
        log.Fatal(err)
    }
    
    // Index document
    product := Product{
        ID:          "1",
        Name:        "MacBook Pro",
        Description: "Powerful laptop",
        Category:    "Electronics",
        Price:       1999.99,
        Brand:       "Apple",
        Tags:        []string{"laptop", "computer"},
        Rating:      4.8,
        InStock:     true,
    }
    
    if err := client.IndexDocument(context.Background(), product); err != nil {
        log.Fatal(err)
    }
    
    // Search
    results, err := client.Match("name", "laptop")
    if err != nil {
        log.Fatal(err)
    }
    
    log.Printf("Found %d results", results.Hits.Total.Value)
    
    for _, hit := range results.Hits.Hits {
        log.Printf("ID: %s, Score: %f", hit.ID, hit.Score)
    }
}

Advanced Search Patterns

Fuzzy Search and Typos

# Handle typos and misspellings
client.search(
    "products",
    {
        "query": {
            "multi_match": {
                "query": "laptp",  # Typo
                "fields": ["name", "description"],
                "fuzziness": "AUTO",
                "prefix_length": 2,  # At least 2 chars must match exactly
            }
        }
    }
)

# Fuzziness counts insertions, deletions, substitutions, and transpositions
# of adjacent letters as single edits:
# "laptp" → "laptop" (one insertion), "lpatop" → "laptop" (one transposition)
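The AUTO setting maps term length to a maximum edit distance. A sketch of the documented default thresholds (exact match up to 2 characters, one edit for 3-5, two edits above that):

```python
def auto_fuzziness(term: str) -> int:
    """Maximum edits "AUTO" fuzziness allows for a term (default thresholds)."""
    n = len(term)
    if n <= 2:
        return 0  # very short terms must match exactly
    if n <= 5:
        return 1
    return 2

auto_fuzziness("tv")          # 0
auto_fuzziness("laptp")       # 1 edit, enough to reach "laptop"
auto_fuzziness("headphones")  # 2
```

The thresholds can be tuned per query with the `AUTO:low,high` syntax.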

Fuzzy Query Types

# Fuzzy query for specific fields
{
    "query": {
        "fuzzy": {
            "name": {
                "value": "laptop",
                "fuzziness": "AUTO",
                "max_expansions": 50,
            }
        }
    }
}

# Wildcard for pattern matching
{
    "query": {
        "wildcard": {
            "name": "*book*"
        }
    }
}

# Regex for complex patterns
{
    "query": {
        "regexp": {
            "name": "mac.*pro"
        }
    }
}
Geo Queries

# Find products near a location
client.search(
    "products",
    {
        "query": {
            "geo_distance": {
                "distance": "10km",
                "location": {
                    "lat": 37.7749,
                    "lon": -122.4194
                }
            }
        }
    }
)

# Bounding box search
{
    "query": {
        "geo_bounding_box": {
            "location": {
                "top_left": {"lat": 40.8, "lon": -74.1},
                "bottom_right": {"lat": 40.7, "lon": -73.9}
            }
        }
    }
}

# Geo polygon (custom shape)
{
    "query": {
        "geo_polygon": {
            "location": {
                "points": [
                    {"lat": 40.8, "lon": -74.1},
                    {"lat": 40.7, "lon": -74.1},
                    {"lat": 40.7, "lon": -73.9},
                ]
            }
        }
    }
}
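geo_distance filters on great-circle distance. A plain-Python haversine sketch makes it easy to sanity-check which documents a given radius should match (coordinates below are illustrative):

```python
import math

def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points."""
    r = 6371.0  # mean Earth radius in km
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))

# Distance from the query point above to a second point; anything
# returning more than 10 falls outside the "10km" radius
d = haversine_km(37.7749, -122.4194, 37.8044, -122.2712)
```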

Pagination and Deep Pagination

# Simple pagination
{
    "from": 0,
    "size": 10,
    "query": {"match_all": {}}
}

# search_after for deep pagination (avoids the cost of large "from" offsets)
# The sort must be stable and unique; newer versions discourage sorting on _id,
# so use a unique field (here id.keyword from dynamic mapping) as the tiebreaker.
# First page
first_page = client.search(
    "products", {"match_all": {}}, size=10, sort=[{"id.keyword": "asc"}]
)
last_sort_values = first_page["hits"]["hits"][-1]["sort"]

# Subsequent page: same query and sort, plus search_after
next_page = client.client.search(
    index="products",
    body={
        "query": {"match_all": {}},
        "size": 10,
        "search_after": last_sort_values,
        "sort": [{"id.keyword": "asc"}],
    },
)

# PIT (Point in Time) pins a consistent snapshot of the index for pagination
# Create PIT
pit_id = client.create_pit("products", keep_alive="5m")

# Combine the PIT with search_after rather than from/size;
# the index is implied by the PIT, so none is given in the request
{
    "query": {"match_all": {}},
    "pit": {"id": pit_id, "keep_alive": "5m"},
    "size": 10,
    "search_after": last_sort_values,
    "sort": [{"id.keyword": "asc"}],
}
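The search_after loop generalizes to a small generator. fetch_page below is a hypothetical stand-in for a client.search call with a stable sort, so the sketch stays self-contained:

```python
def scan_all(fetch_page):
    """Yield every hit by chasing search_after cursors until a page is empty."""
    search_after = None
    while True:
        hits = fetch_page(search_after)
        if not hits:
            return
        yield from hits
        # The last hit's "sort" values become the next page's search_after
        search_after = hits[-1]["sort"]

# In-memory stand-in for client.search(..., search_after=..., sort=[...])
data = [{"_id": str(i), "sort": [i]} for i in range(25)]

def fetch_page(search_after, size=10):
    start = 0 if search_after is None else search_after[0] + 1
    return data[start:start + size]

all_hits = list(scan_all(fetch_page))  # 25 hits pulled across 3 pages
```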

Highlighting

# Highlight matching terms in results
client.search(
    "products",
    {
        "query": {"match": {"name": "laptop"}},
        "highlight": {
            "fields": {
                "name": {
                    "pre_tags": ["<mark>"],
                    "post_tags": ["</mark>"],
                    "fragment_size": 50,
                    "number_of_fragments": 3,
                },
                "description": {}
            }
        }
    }
)
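Highlighted fragments come back in a per-hit highlight object keyed by field. A small helper (the sample hit is hand-written for illustration) to pull them out:

```python
def extract_snippets(hit: dict, field: str) -> list[str]:
    """Return highlighted fragments for a field; empty list if nothing matched."""
    return hit.get("highlight", {}).get(field, [])

hit = {
    "_id": "1",
    "_source": {"name": "MacBook Pro 14 inch"},
    "highlight": {"name": ["MacBook <mark>Pro</mark> 14 inch"]},
}
extract_snippets(hit, "name")         # the marked-up fragment
extract_snippets(hit, "description")  # [] when the field had no match
```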

Aggregations

# Aggregations for analytics
{
    "size": 0,
    "aggs": {
        # Terms aggregation (like GROUP BY)
        "by_category": {
            "terms": {
                "field": "category",
                "size": 20,
                "min_doc_count": 5,
            },
            "aggs": {
                "avg_price": {"avg": {"field": "price"}},
                "max_price": {"max": {"field": "price"}},
            }
        },
        
        # Range aggregation
        "price_ranges": {
            "range": {
                "field": "price",
                "ranges": [
                    {"to": 100},
                    {"from": 100, "to": 500},
                    {"from": 500, "to": 1000},
                    {"from": 1000}
                ]
            }
        },
        
        # Date histogram
        "sales_over_time": {
            "date_histogram": {
                "field": "created_at",
                "calendar_interval": "month"
            }
        },
        
        # Percentiles
        "price_percentiles": {
            "percentiles": {
                "field": "price",
                "percents": [25, 50, 75, 95, 99]
            }
        },
        
        # Significant terms (anomaly detection)
        "significant_categories": {
            "significant_terms": {
                "field": "category"
            }
        }
    }
}
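Bucket aggregations come back as a buckets list under response["aggregations"]. Flattening a terms aggregation into facet counts for a UI is a short helper; the sample response here is hand-written for illustration:

```python
def facet_counts(response: dict, agg_name: str) -> dict[str, int]:
    """Flatten a terms aggregation into {key: doc_count} for faceted navigation."""
    buckets = response["aggregations"][agg_name]["buckets"]
    return {b["key"]: b["doc_count"] for b in buckets}

sample = {
    "aggregations": {
        "by_category": {
            "buckets": [
                {"key": "Electronics", "doc_count": 120, "avg_price": {"value": 499.5}},
                {"key": "Books", "doc_count": 80, "avg_price": {"value": 19.9}},
            ]
        }
    }
}
facet_counts(sample, "by_category")  # {"Electronics": 120, "Books": 80}
```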

Autocomplete

# Use completion suggester for autocomplete
# Completion suggesters run in the top-level "suggest" section,
# not inside the query DSL
{
    "suggest": {
        "product-suggest": {
            "prefix": "mac",
            "completion": {
                "field": "name.suggest",
                "size": 10,
                "skip_duplicates": true
            }
        }
    }
}

# Suggestions come back under:
# response["suggest"]["product-suggest"][0]["options"]

Performance Optimization

Indexing Optimization

# Bulk indexing with optimal batch size
def bulk_index_optimized(client, index_name, documents, batch_size=1000):
    for i in range(0, len(documents), batch_size):
        batch = documents[i:i + batch_size]
        
        actions = []
        for doc in batch:
            actions.append({"index": {"_index": index_name}})
            actions.append(doc)
        
        # Don't refresh immediately - let background refresh happen
        client.bulk(body=actions, refresh=False)
        
        print(f"Indexed {i + len(batch)}/{len(documents)} documents")

# Use multiple workers for parallel indexing
from concurrent.futures import ThreadPoolExecutor

def parallel_bulk_index(client, index_name, documents, workers=4, batch_size=500):
    batches = [documents[i:i + batch_size] 
               for i in range(0, len(documents), batch_size)]
    
    with ThreadPoolExecutor(max_workers=workers) as executor:
        futures = [executor.submit(bulk_index_optimized, client, index_name, batch) 
                   for batch in batches]
        
        for future in futures:
            future.result()
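Rather than hand-building action lists, the opensearch-py bulk helpers accept a lazy generator of actions (helpers.bulk(client, doc_actions(...)) would consume it, handling chunking and retries). The generator itself is plain Python and easy to test:

```python
def doc_actions(index_name: str, documents, id_field: str = "id"):
    """Lazily yield bulk actions in the dict form the bulk helpers accept."""
    for doc in documents:
        yield {"_index": index_name, "_id": doc.get(id_field), "_source": doc}

# Streaming the generator avoids materializing one giant request body
actions = list(doc_actions("products", [{"id": "1", "name": "MacBook Pro"}]))
```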

Query Optimization

# Use filter instead of must when score is not needed
# Filter clauses are cached and don't calculate scores
{
    "query": {
        "bool": {
            "must": [
                {"match": {"name": "laptop"}}
            ],
            "filter": [
                {"term": {"in_stock": True}},
                {"range": {"price": {"lte": 2000}}}
            ]
        }
    }
}

# Use keyword fields for exact matches
# Much faster than text fields
{
    "query": {
        "term": {"category.keyword": "Electronics"}
    }
}

# Pre-filter with match then filter
{
    "query": {
        "bool": {
            "should": [
                {"match": {"name": {"query": "laptop", "boost": 3}}},
                {"match": {"description": "laptop"}}
            ],
            "filter": [
                {"term": {"in_stock": True}}
            ]
        }
    }
}

# Profile queries to find bottlenecks
{
    "profile": True,
    "query": { ... }
}

Index Design

# Use appropriate mappings
mappings = {
    # _source is a mapping-level setting, not a field property; disable it
    # only for very large indices where the original JSON is never fetched
    # "_source": {"enabled": False},
    "properties": {
        # Use keyword for filtering and sorting
        "status": {"type": "keyword"},  # Instead of text
        "category": {"type": "keyword"},
        
        # Use text for searching
        "description": {"type": "text"},
        
        # Use appropriate numeric types
        "price": {"type": "scaled_float", "scaling_factor": 100},
        "quantity": {"type": "integer"},
        
        # Use date types for date fields
        "created_at": {"type": "date"},
        
        # doc_values back sorting/aggregations
        # (enabled by default for most types)
    },
}

Index Aliases

# Use aliases for zero-downtime reindexing
# Create a new index with the new mappings
client.create_index("products_v2", new_mappings)

# Copy documents into it
client.reindex(
    body={
        "source": {"index": "products_v1"},
        "dest": {"index": "products_v2"},
    }
)

# Switch the alias atomically: remove and add in a single update_aliases call
# (put_alias alone would leave the alias pointing at both indices)
client.indices.update_aliases(
    body={
        "actions": [
            {"remove": {"index": "products_v1", "alias": "products"}},
            {"add": {"index": "products_v2", "alias": "products"}},
        ]
    }
)

# Now all traffic goes to products_v2
Scaling Strategies

Sharding

# More shards = more parallelism but overhead
settings = {
    "number_of_shards": 5,   # Primary shards
    "number_of_replicas": 1,  # Replicas per primary
}

Shards  Use Case
1-3     Small datasets, single node
3-5     Medium datasets, moderate query load
5-10+   Large datasets, high query load
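A common rule of thumb (a guideline, not a hard limit) targets roughly 10-50 GB per primary shard. A quick back-of-the-envelope helper:

```python
import math

def suggest_primary_shards(total_gb: float, target_shard_gb: float = 30.0) -> int:
    """Rough primary-shard count for a dataset, aiming at ~30 GB per shard."""
    return max(1, math.ceil(total_gb / target_shard_gb))

suggest_primary_shards(90)    # 3
suggest_primary_shards(1200)  # 40
```

Remember that primary shard count is fixed at index creation; changing it later means reindexing or using the split/shrink APIs.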

Replicas

Primary Shards:   [S1] [S2] [S3]
                   ↓    ↓    ↓
Replica Shards:   [R1] [R2] [R3]
  • Replicas provide high availability
  • Replicas serve read queries (load balancing)
  • More replicas = more read throughput

Data Streams

# Use data streams for append-only time-series data (logs, metrics, traces).
# They require a matching index template with data_stream enabled first:
client.indices.put_index_template(
    name="logs-template",
    body={"index_patterns": ["logs-*"], "data_stream": {}},
)
client.indices.create_data_stream("logs-2026")

Monitoring and Maintenance

Health Checks

# Cluster health
client.cluster.health()

# Index health
client.indices.stats("products")

# Node stats
client.nodes.stats()
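The health response includes a status field (green, yellow, or red). A tiny gate like this (names are illustrative) is handy in deploy scripts and readiness probes:

```python
def cluster_is_healthy(health: dict, worst_acceptable: str = "yellow") -> bool:
    """True if cluster status is no worse than worst_acceptable.

    health is the dict returned by client.cluster.health().
    """
    severity = {"green": 0, "yellow": 1, "red": 2}
    return severity[health["status"]] <= severity[worst_acceptable]

cluster_is_healthy({"status": "green"})                             # True
cluster_is_healthy({"status": "red"})                               # False
cluster_is_healthy({"status": "yellow"}, worst_acceptable="green")  # False
```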

Common Issues

Issue             Symptom             Solution
Slow queries      High latency        Use filters, optimize queries
High memory       OOM errors          Reduce shard count, add memory
Split brain       Data inconsistency  Configure minimum master-eligible nodes
Shard unassigned  Missing replicas    Check disk space, node health

Conclusion

Elasticsearch and OpenSearch provide powerful search capabilities for modern applications. The key to building successful search systems lies in understanding your query patterns, designing appropriate mappings, and implementing efficient indexing strategies.

Key takeaways:

  • Use keyword fields for filtering, text fields for search
  • Leverage filters instead of must clauses when scores aren’t needed
  • Implement proper pagination with search_after for deep pages
  • Use aggregations for analytics and faceted search
  • Monitor cluster health and optimize based on metrics
  • Plan shard and replica configuration based on data size and query patterns
