Introduction
Understanding how Meilisearch works internally helps developers make better architectural decisions, optimize search relevance, and troubleshoot issues. While Meilisearch is designed to be simple from a user’s perspective, its internals represent sophisticated engineering that delivers the instant search experiences users expect.
This article explores Meilisearch’s architecture, from how documents are stored and indexed to how queries are processed and ranked. We will examine the key data structures, algorithms, and optimizations that make Meilisearch fast and reliable.
Architecture Overview
Meilisearch is built in Rust, a systems programming language known for its performance and memory safety. This choice of language contributes significantly to Meilisearch’s speed and low resource usage.
High-Level Architecture
Meilisearch’s architecture consists of several key components working together:
- HTTP Server - Handles incoming requests and responses
- Index Engine - Manages document indexing and storage
- Search Engine - Processes queries and returns results
- Task System - Handles asynchronous operations
- Storage Layer - Manages persistent data
The HTTP server runs asynchronously using tokio, Rust’s async runtime, allowing it to handle many concurrent connections efficiently. When a request arrives, it is processed by the appropriate component based on the endpoint.
Document Flow
Understanding how documents flow through Meilisearch makes the rest of the system easier to follow:
- Ingestion - Documents are received via the API
- Parsing - Documents are parsed and validated
- Tokenization - Text fields are tokenized into searchable terms
- Indexing - Tokens are added to the inverted index
- Storage - Documents and index data are persisted
This pipeline is designed to be efficient while maintaining data consistency. Indexing is asynchronous, which improves write throughput, and newly added documents typically become searchable within milliseconds.
The Inverted Index
The inverted index is the fundamental data structure that enables fast full-text search. Unlike a forward index that maps documents to terms, an inverted index maps terms to documents.
How Inverted Index Works
In a traditional database, searching for documents containing a word requires scanning every document. With an inverted index, Meilisearch can instantly locate all documents containing a term.
Consider these documents:
Doc 1: "The quick brown fox"
Doc 2: "The lazy dog"
Doc 3: "A quick fox"
The inverted index would look like:
"the" -> [Doc 1, Doc 2]
"quick" -> [Doc 1, Doc 3]
"brown" -> [Doc 1]
"fox" -> [Doc 1, Doc 3]
"lazy" -> [Doc 2]
"dog" -> [Doc 2]
"a" -> [Doc 3]
When searching for “quick fox”, Meilisearch:
- Looks up “quick” in the inverted index -> [Doc 1, Doc 3]
- Looks up “fox” in the inverted index -> [Doc 1, Doc 3]
- Intersects the results -> [Doc 1, Doc 3]
This intersection operation is extremely fast, typically taking microseconds even for large document sets.
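The lookup-and-intersect steps above can be sketched in a few lines of Python (a toy model for illustration; Meilisearch's actual implementation is in Rust with compressed on-disk structures):

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Map each term to the set of document IDs that contain it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():
            index[term].add(doc_id)
    return index

docs = {
    1: "The quick brown fox",
    2: "The lazy dog",
    3: "A quick fox",
}
index = build_inverted_index(docs)

# Searching "quick fox" is a set intersection over two postings lists
result = index["quick"] & index["fox"]   # -> {1, 3}
```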
Index Structure
Meilisearch stores the inverted index efficiently using various compression techniques:
- Postings lists - Lists of document IDs for each term
- Term dictionary - Sorted list of all unique terms
- Term frequencies - How often each term appears in each document
- Positions - Where each term appears in each document
This structure allows Meilisearch to quickly find matching documents and calculate relevance scores.
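A positional variant of the same structure can be sketched as follows; recording positions is what later enables proximity ranking (toy Python, not the on-disk format):

```python
from collections import defaultdict

def build_positional_index(docs):
    """Record where each term occurs: {term: {doc_id: [positions]}}."""
    index = defaultdict(lambda: defaultdict(list))
    for doc_id, text in docs.items():
        for pos, term in enumerate(text.lower().split()):
            index[term][doc_id].append(pos)
    return index

index = build_positional_index({1: "the quick brown fox", 3: "a quick fox"})
# "fox" occurs at position 3 in doc 1 and position 2 in doc 3
fox_postings = {doc: positions for doc, positions in index["fox"].items()}
```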
Segment-Based Indexing
Meilisearch uses a segment-based approach to indexing. New documents are added to an in-memory segment, and periodically, the segment is flushed to disk as an immutable segment file.
Benefits of segment-based indexing include:
- Fast writes - New documents are added to memory
- Crash safety - Periodic flushing to disk
- Concurrency - Multiple segments can be searched in parallel
- Compaction - Segments can be merged to optimize performance
When searching, Meilisearch queries all relevant segments and merges the results, presenting a unified view to the user.
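In miniature, that query-all-segments-and-merge step looks like this (hypothetical segment contents, for illustration only):

```python
# Each segment holds its own postings; newer documents live in newer segments
segments = [
    {"fox": {1, 3}, "quick": {1, 3}},   # older, on-disk segment
    {"fox": {7}, "quick": {7, 9}},      # newer, in-memory segment
]

def search(term):
    """Query every segment and union the results into one unified view."""
    hits = set()
    for segment in segments:
        hits |= segment.get(term, set())
    return hits

merged = search("fox")   # -> {1, 3, 7}
```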
BM25 Ranking Algorithm
Meilisearch uses BM25 (Best Matching 25) as its primary relevance ranking algorithm. BM25 is a probabilistic ranking function that has proven effective over decades of information retrieval research.
Understanding BM25
BM25 calculates a relevance score based on:
- Term Frequency (TF) - How often does the term appear in the document?
- Inverse Document Frequency (IDF) - How rare or common is the term across all documents?
- Document Length - Are we comparing fairly across different document lengths?
The BM25 formula:
score(Q, D) = Σ_{qi in Q} IDF(qi) * (f(qi, D) * (k1 + 1)) / (f(qi, D) + k1 * (1 - b + b * |D| / avgdl))
Where:
- Q is the query and D is the document
- qi is a query term
- f(qi, D) is the frequency of qi in D
- |D| is the document length
- avgdl is the average document length across the index
- k1 and b are tuning parameters (typically k1 = 1.5, b = 0.75)
Why BM25?
BM25 remains popular because it:
- Works well without extensive tuning
- Handles term frequency saturation (diminishing returns for repeated terms)
- Accounts for document length normalization
- Has solid theoretical foundation in probabilistic relevance
BM25 in Meilisearch
Meilisearch implements BM25 with some optimizations:
# Conceptual BM25 implementation
def bm25_score(doc_terms, query_terms, idf, avg_doc_len, k1=1.5, b=0.75):
    score = 0.0
    doc_len = len(doc_terms)
    for term in query_terms:
        if term not in idf:
            continue
        tf = doc_terms.count(term)
        idf_weight = idf[term]
        # BM25 formula component
        numerator = tf * (k1 + 1)
        denominator = tf + k1 * (1 - b + b * (doc_len / avg_doc_len))
        score += idf_weight * (numerator / denominator)
    return score
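The `idf` table passed to `bm25_score` is derived from document frequencies. One common formulation is the smoothed Robertson-Spärck Jones IDF (whether Meilisearch uses this exact variant is an internal detail):

```python
import math

# Toy corpus of tokenized documents (hypothetical data)
docs = [
    ["quick", "brown", "fox"],
    ["lazy", "dog"],
    ["quick", "fox"],
]
n = len(docs)

idf = {}
for term in {t for d in docs for t in d}:
    df = sum(term in d for d in docs)                  # document frequency
    idf[term] = math.log((n - df + 0.5) / (df + 0.5) + 1)

# Rarer terms score higher: "brown" (df=1) outweighs "quick" (df=2)
```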
Configuring Ranking Behavior
Meilisearch does not expose the raw k1 and b parameters; relevance is tuned through the rankingRules setting instead:
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort",
"exactness"
]
}'
The order of the rules determines their priority; the “words” rule ranks documents by how many of the query's terms they contain.
Tokenization
Tokenization is the process of converting raw text into searchable tokens. This step is crucial for search quality and is one of the areas where Meilisearch excels.
How Tokenization Works
Meilisearch’s tokenizer performs several operations:
- Normalization - Convert to lowercase, remove accents
- Segmentation - Split text into words
- Filtering - Remove stop words and punctuation
- Stemming - Reduce words to their root form (language-specific)
For example, the text “The Quick Brown Fox!” might be tokenized to:
["quick", "brown", "fox"]
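A stripped-down tokenizer illustrating the first three steps (the stop-word list here is illustrative, not Meilisearch's; the real tokenizer is language-aware):

```python
import re

STOP_WORDS = {"the", "a", "an"}   # illustrative, not Meilisearch's list

def tokenize(text):
    """Normalize, segment, and filter; stemming is omitted for brevity."""
    text = text.lower()                       # normalization
    words = re.findall(r"[a-z0-9]+", text)    # segmentation drops punctuation
    return [w for w in words if w not in STOP_WORDS]   # filtering

tokens = tokenize("The Quick Brown Fox!")   # -> ["quick", "brown", "fox"]
```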
Normalization
Normalization ensures that variations of a word match each other:
- Case folding - “Search” becomes “search”
- Accent removal - “café” becomes “cafe”
- Character folding - Various Unicode forms are standardized
This means searches for “Search” will find documents containing “search”, “SEARCH”, or “séarch”.
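Accent removal can be approximated in a few lines with Unicode decomposition (a sketch of the idea, not Meilisearch's implementation):

```python
import unicodedata

def normalize(text):
    """Case-fold, then strip the combining marks that NFKD splits off."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(c for c in decomposed if not unicodedata.combining(c))

# "Café", "CAFE", and "cafe" all normalize to the same token
normalized = normalize("Café")   # -> "cafe"
```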
Language-Specific Processing
Meilisearch handles multiple languages with language-specific tokenization:
# Adjust tokenizer settings
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"tokenizer": {
"documentSeparator": ".",
"maxTokenLength": 500
}
}'
For optimal results with different languages, configure the appropriate settings for your content language.
N-grams
Meilisearch supports n-gram tokenization for partial matching:
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"ngramsEnabled": true,
"maxNgramDistance": 2
}'
This enables searching for partial words, which is useful for autocomplete and certain languages.
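Conceptually, n-gram tokenization slices each term into overlapping character windows so a partial query can hit an index entry. A minimal sketch:

```python
def ngrams(term, n=3):
    """All character n-grams of a term; short terms yield themselves."""
    if len(term) <= n:
        return [term]
    return [term[i:i + n] for i in range(len(term) - n + 1)]

# Indexing the trigrams of "gatsby" lets the partial query "ats" match it
grams = ngrams("gatsby")   # -> ["gat", "ats", "tsb", "sby"]
```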
Caching Strategies
Caching is essential for Meilisearch’s fast response times. The search engine employs multiple caching strategies.
Query Cache
Meilisearch caches recent query results:
{
"query": "gatsby",
"hits": [...],
"cache": true
}
This cache is automatically invalidated when relevant documents change, ensuring results remain fresh.
Filter and Sort Caches
Results of expensive operations like filtering and sorting are cached:
- Filter results are cached per unique filter combination
- Sort operations are cached and reused
- Facet distributions are cached until data changes
Cache Invalidation
Meilisearch automatically invalidates caches when:
- New documents are added to an index
- Documents are updated or deleted
- Settings change that affect search results
This automatic invalidation ensures users always see current results without manual intervention.
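The contract can be modeled with a few lines of Python (a deliberately naive sketch: a real cache invalidates selectively rather than clearing everything):

```python
class QueryCache:
    """Query-result cache that is flushed whenever the index changes."""

    def __init__(self):
        self._cache = {}

    def get(self, query):
        return self._cache.get(query)

    def put(self, query, results):
        self._cache[query] = results

    def invalidate(self):
        # Called on document adds/updates/deletes and settings changes
        self._cache.clear()

cache = QueryCache()
cache.put("gatsby", [1, 5])
hit = cache.get("gatsby")        # warm hit: [1, 5]
cache.invalidate()               # a document changed
miss = cache.get("gatsby")       # stale entry is gone: None
```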
Memory Management
Efficient memory usage is crucial for performance. Meilisearch manages memory through several mechanisms.
Memory-Mapped Files
Meilisearch uses memory-mapped files (mmap) for index data:
- Index files are mapped to virtual memory
- The operating system handles paging
- Memory is allocated on-demand as pages are accessed
Benefits include:
- Fast startup times
- Efficient memory usage across processes
- Operating system manages caching
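Python's standard library exposes the same mmap primitive, which makes the idea easy to demonstrate (the file contents here are a made-up stand-in for index data):

```python
import mmap
import os
import tempfile

# Write some pretend index data to disk
with tempfile.NamedTemporaryFile(delete=False) as f:
    f.write(b"term:fox|docs:1,3")
    path = f.name

# Map the file instead of read()-ing it; the OS pages data in lazily
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
        header = mm[:8]   # only the touched pages are loaded

os.remove(path)
```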
In-Memory Structures
Some data structures are kept entirely in memory:
- Filter caches
- Recent query caches
- Segment metadata
This in-memory storage provides microsecond access times for frequently used data.
Resource Limits
Configure resource limits to control memory usage:
MEILI_MAX_INDEXING_MEMORY=2GiB
MEILI_MAX_INDEXING_THREADS=4
These settings prevent Meilisearch from consuming excessive resources.
Query Processing Pipeline
Understanding how queries are processed helps optimize search performance.
Query Processing Stages
When a search request arrives, Meilisearch processes it through several stages:
- Parse - Parse the query string
- Tokenize - Convert query to tokens
- Plan - Determine which indexes and segments to search
- Execute - Retrieve matching documents from each segment
- Rank - Calculate relevance scores
- Merge - Combine results from all segments
- Format - Apply highlighting, truncation, pagination
Each stage is optimized for speed, and the entire pipeline typically completes in under 50 milliseconds.
Parallel Execution
Meilisearch parallelizes query execution:
- Segment Parallelism - Multiple segments are searched simultaneously
- Thread Pool - Search threads are managed in a pool
- Async I/O - Non-blocking operations maximize throughput
Configure parallelism:
MEILI_MAX_SEARCH_THREADS=4
Early Termination
To maintain fast response times, Meilisearch can terminate searches early:
- Stop searching additional segments once enough results are found
- Limit the number of documents scored
- Apply pagination limits
This ensures that even complex queries return quickly.
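A sketch of segment-level early termination (toy data structures, same caveats as the earlier examples):

```python
def search_with_limit(segments, term, limit):
    """Stop scanning further segments once enough hits are collected."""
    hits = []
    for segment in segments:
        hits.extend(segment.get(term, []))
        if len(hits) >= limit:   # early termination
            break
    return hits[:limit]

segments = [{"fox": [1, 3]}, {"fox": [7]}, {"fox": [9]}]
# With limit=2, the second and third segments are never scanned
top = search_with_limit(segments, "fox", 2)   # -> [1, 3]
```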
Typo Tolerance Implementation
Typo tolerance is one of Meilisearch’s signature features. Let’s examine how it works internally.
Edit Distance
Meilisearch calculates the edit distance (Levenshtein distance) between query terms and indexed terms:
levenshtein("gats", "gatsby") = 2
levenshtein("gatsby", "gats") = 2
levenshtein("gatsby", "great") = 3
Tolerance Configuration
The typo tolerance algorithm considers:
- Word length - Longer words allow more typos
- Edit distance - Number of insertions, deletions, or substitutions
- Character swaps - Transposition of adjacent characters
Default settings (controlled by the minWordSizeForTypos setting, whose defaults are oneTypo=5 and twoTypos=9):
- Words of 1-4 characters: 0 typos allowed
- Words of 5-8 characters: 1 typo allowed
- Words of 9 or more characters: 2 typos allowed
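As a sketch, the length-based budget looks like this; the threshold parameter names echo Meilisearch's minWordSizeForTypos setting:

```python
def allowed_typos(word, one_typo=5, two_typos=9):
    """Typo budget by word length (defaults mirror minWordSizeForTypos)."""
    if len(word) >= two_typos:
        return 2
    if len(word) >= one_typo:
        return 1
    return 0

budgets = [allowed_typos(w) for w in ("fox", "gatsby", "wonderful")]
# -> [0, 1, 2]
```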
Matching with Typos
When processing a query like “gatsby”:
- Meilisearch looks for exact matches first
- If no matches, it searches for terms within the allowed edit distance
- Results are ranked by relevance, with exact matches ranking higher
This provides the “search-as-you-type” experience users expect.
Ranking Customization
Meilisearch’s ranking can be customized to match specific requirements.
Ranking Rules
The ranking rules control how results are ordered:
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"rankingRules": [
"words", # Number of matched words
"typo", # Fewer typos preferred
"proximity", # Words closer together
"attribute", # Important attributes
"sort", # Sorted field
"exactness" # Exact matches preferred
]
}'
Attribute Priority
Rank matches in some attributes above others by ordering searchableAttributes; earlier attributes carry more weight (Meilisearch uses list order, not numeric weights):
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"searchableAttributes": [
"title",
"author",
"description"
]
}'
With this order, a match in title outranks an equivalent match in description through the "attribute" ranking rule.
Tie Breaking
When results have equal relevance scores, tie-breaking rules determine the order:
curl -X PATCH 'http://localhost:7700/indexes/books/settings' \
-H 'Authorization: Bearer your_master_key' \
-H 'Content-Type: application/json' \
-d '{
"rankingRules": [
"words",
"typo",
"proximity",
"attribute",
"sort:publication_year:desc",
"exactness"
]
}'
Performance Characteristics
Understanding Meilisearch’s performance characteristics helps with capacity planning.
Latency
Meilisearch provides consistent, low-latency search:
- Typical query: 1-10ms for most queries
- Complex queries: 10-50ms for complex filters
- Large result sets: 50-100ms with pagination
These times are consistent regardless of index size due to the inverted index structure.
Throughput
Search throughput depends on hardware and query complexity:
- Simple queries: 10,000+ queries per second
- Complex queries: 1,000-5,000 queries per second
- Indexing: 10,000-50,000 documents per second
Scalability
Meilisearch scales well within a single node:
- Documents: Tested up to 10 million documents
- Index size: Up to 100GB per index
- Memory: Requires approximately 10-20% of index size
For larger deployments, consider sharding across multiple instances.
Debugging and Diagnostics
Understanding internals helps with troubleshooting.
Task Inspection
Monitor indexing tasks:
curl -X GET 'http://localhost:7700/tasks' \
-H 'Authorization: Bearer your_master_key'
Index Statistics
Get detailed index statistics:
curl -X GET 'http://localhost:7700/indexes/books/stats' \
-H 'Authorization: Bearer your_master_key'
Returns:
- Number of documents
- Indexing status
- Field distribution
- Index size
Logging
Enable detailed logging for debugging:
MEILI_LOG_LEVEL=debug
MEILI_LOG_FILE=/var/log/meilisearch.log
Conclusion
Meilisearch’s internals represent sophisticated engineering that delivers the fast, relevant search experiences users expect. The combination of Rust’s performance, the inverted index structure, BM25 ranking, and smart tokenization creates a search engine that excels in both speed and relevance.
Understanding these internals helps developers:
- Make better configuration decisions
- Optimize search relevance
- Troubleshoot issues effectively
- Plan capacity appropriately
In the next article, we will explore Meilisearch’s recent developments and what to expect in 2025-2026.