Meilisearch supports vector search, enabling semantic and similarity-based queries using embeddings. This feature, experimental since Meilisearch v1.3 and stable since v1.12, allows searching by meaning rather than exact keywords. This post covers setup, data preparation, indexing, querying, and production tuning with vectors.
What Is Vector Search?
Vector search uses machine learning embeddings to represent text as vectors in a high-dimensional space. Similar items cluster closer together, so you can search for “find documents similar to this one” or “conceptually related content” without exact keyword overlap. Meilisearch implements hybrid search: it combines its keyword-based full-text ranking with vector similarity (dot product or cosine) in a single query.
Prerequisites
- Meilisearch v1.12 or later (stable vector search; v1.3–v1.11 used experimental flags).
- An embedding provider: local model (Sentence Transformers, BGE), SaaS API (OpenAI, Cohere, Voyage), or Hugging Face Inference API.
- Python 3.9+, Go 1.21+, or Node.js 18+ for the client examples.
Step 1: Setting Up Meilisearch
Download and start Meilisearch:
curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your_master_key"
In versions before v1.12 you needed --enable-vector-search or to call the experimental toggle endpoint. Since v1.12 vector search is stable by default. Verify your version:
curl -H "Authorization: Bearer your_master_key" http://localhost:7700/version
# => {"version":"1.13.0","commit":"abc123","pkg":"meilisearch"}
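The version cutoff described above can be checked in code before deciding whether the experimental toggle is needed. This is a minimal sketch, assuming the `version` field is a plain `major.minor.patch` string as in the sample response; `parse_version` and `needs_experimental_flag` are hypothetical helpers, not part of any Meilisearch client.

```python
def parse_version(version: str) -> tuple:
    """Split a 'major.minor.patch' string into a tuple of integers."""
    major, minor, patch = (int(part) for part in version.split(".")[:3])
    return major, minor, patch


def needs_experimental_flag(version: str) -> bool:
    """True for v1.3 through v1.11, the range this post describes as experimental."""
    major, minor, _ = parse_version(version)
    return (1, 3) <= (major, minor) < (1, 12)


print(needs_experimental_flag("1.13.0"))
print(needs_experimental_flag("1.8.2"))
```

Pre-release strings such as `1.13.0-rc1` are not handled here and would need extra parsing.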
Step 2: Choosing an Embedding Model
Your embedding model determines search quality, latency, memory usage, and cost. The table below compares the most common providers.
Embedding Model Comparison
| Provider | Model | Dimensions | Languages | Cost | Quality | Best For |
|---|---|---|---|---|---|---|
| Sentence Transformers | all-MiniLM-L6-v2 | 384 | EN | Free (local) | Good | Dev, offline, low-latency |
| Sentence Transformers | BGE-base-en-v1.5 | 768 | EN | Free (local) | Very Good | Production on-premise |
| Sentence Transformers | gtr-t5-large | 768 | EN | Free (local) | Excellent | High-accuracy offline |
| OpenAI | text-embedding-ada-002 | 1536 | Multilingual | $0.10/1M tokens | Excellent | General-purpose SaaS |
| OpenAI | text-embedding-3-small | 512 (shortened from 1536) | Multilingual | $0.02/1M tokens | Very Good | Cheap SaaS, 5x cheaper than ada-002 |
| OpenAI | text-embedding-3-large | 3072 | Multilingual | $0.13/1M tokens | Best | Max recall, high budget |
| Cohere | embed-english-v3.0 | 1024 | EN | $0.10/1M tokens | Excellent | English-first retrieval |
| Cohere | embed-multilingual-v3.0 | 1024 | Multilingual | $0.10/1M tokens | Excellent | Multilingual search |
| Voyage | voyage-2 | 1024 | Multilingual | $0.10/1M tokens | Very Good | Code + text search |
| BGE (BAAI) | bge-large-en-v1.5 | 1024 | EN | Free (local) | Excellent | Local, no API cost |
Key Trade-off: Dimensions
Higher dimensions capture more nuance but increase memory and latency. A 1536-dimension vector uses 4x the RAM of a 384-dimension one. Meilisearch stores vectors as f32 arrays: each dimension is 4 bytes. For 1M documents:
- 384-d: 1M × 384 × 4 = ~1.5 GB
- 768-d: 1M × 768 × 4 = ~3.1 GB
- 1536-d: 1M × 1536 × 4 = ~6.1 GB
Account for this when provisioning your server. For example, at 1M documents on a 2 GB VPS, 384-d vectors (~1.5 GB) barely fit alongside everything else, while 1536-d vectors (~6.1 GB) far exceed the available RAM.
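The arithmetic above can be wrapped in a small helper for capacity planning. This is a sketch of the raw vector payload only; `vector_memory_bytes` is a hypothetical name, and the estimate deliberately ignores index structures and document storage, which add overhead on top.

```python
def vector_memory_bytes(num_docs: int, dimensions: int, bytes_per_dim: int = 4) -> int:
    """Raw storage for num_docs vectors of f32 values (4 bytes per dimension)."""
    return num_docs * dimensions * bytes_per_dim


for dims in (384, 768, 1536):
    gigabytes = vector_memory_bytes(1_000_000, dims) / 1e9
    print(f"{dims}-d: {gigabytes:.2f} GB")
# 384-d: 1.54 GB
# 768-d: 3.07 GB
# 1536-d: 6.14 GB
```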
Step 3: Configuring Embedders in Meilisearch
Meilisearch supports four embedder source types: userProvided, openAi, huggingFace, and rest.
userProvided
You generate embeddings client-side and send them with each document:
from sentence_transformers import SentenceTransformer
import meilisearch

model = SentenceTransformer('all-MiniLM-L6-v2')
client = meilisearch.Client('http://localhost:7700', 'your_master_key')

client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "userProvided",
            "dimensions": 384
        }
    }
})

docs = [
    {"id": 1, "title": "Machine Learning Basics", "content": "Introduction to supervised and unsupervised learning algorithms."},
    {"id": 2, "title": "Deep Learning", "content": "Advanced neural networks with transformers and attention mechanisms."},
    {"id": 3, "title": "Regression Analysis", "content": "Statistical methods for modeling relationships between variables."},
]

# Attach a client-side embedding to each document under the embedder's name
for doc in docs:
    doc["_vectors"] = {"default": model.encode(doc["content"]).tolist()}

client.index('articles').add_documents(docs)
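Because userProvided vectors are built client-side, it can help to validate them before sending a batch. The sketch below assumes documents carry vectors under `_vectors[embedder_name]` as in the example above; `check_vectors` is a hypothetical helper, not a client library function.

```python
def check_vectors(docs, embedder, dimensions):
    """Return (doc id, problem) pairs for documents with a missing or mis-sized vector."""
    problems = []
    for doc in docs:
        vector = doc.get("_vectors", {}).get(embedder)
        if vector is None:
            problems.append((doc.get("id"), "missing vector"))
        elif len(vector) != dimensions:
            problems.append((doc.get("id"), f"expected {dimensions} dims, got {len(vector)}"))
    return problems


# Toy 3-dimensional vectors keep the example short
sample = [
    {"id": 1, "_vectors": {"default": [0.1, 0.2, 0.3]}},
    {"id": 2, "_vectors": {"default": [0.1, 0.2]}},
    {"id": 3},
]
print(check_vectors(sample, "default", 3))
```

Running a check like this before `add_documents` surfaces mismatches locally instead of as failed indexing tasks.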
openAi
Meilisearch calls the OpenAI API to generate embeddings for you at index and query time:
client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "dimensions": 512
        }
    }
})
huggingFace
Meilisearch downloads the model from the Hugging Face Hub and runs it locally on the Meilisearch server, so no API key is involved:
client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "huggingFace",
            "model": "BAAI/bge-base-en-v1.5"
        }
    }
})
rest
Meilisearch calls any custom REST endpoint that returns embeddings. This lets you use Cohere, Voyage, or a self-hosted model:
client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "rest",
            "url": "http://localhost:8080/embed",
            "apiKey": "optional_key",
            # request/response mirror the JSON your endpoint expects and returns
            "request": {"text": "{{text}}"},
            "response": {"embedding": "{{embedding}}"}
        }
    }
})
Step 4: Document Templates for Auto-Embedding
When using openAi, huggingFace, or rest sources, Meilisearch auto-generates embeddings from your document fields. You control which fields are concatenated with a documentTemplate:
client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "documentTemplate": "Title: {{doc.title}}\nContent: {{doc.content}}"
        }
    }
})
Templates use the Liquid templating language. Common patterns:
# Index only the title
"documentTemplate": "{{doc.title}}"
# Title + first 500 chars of content
"documentTemplate": "Title: {{doc.title}}\nSnippet: {{doc.content | truncate: 500}}"
# Multiple fields with a separator
"documentTemplate": "{{doc.title}} | {{doc.tags | join: ', '}} | {{doc.summary}}"
Well-crafted templates improve embedding quality because the model receives clean, focused input.
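To preview the text an embedding model would receive, the template can be approximated in plain Python. This sketch mirrors the "Title / Content" template with a character cap; it does not run Meilisearch's template engine, and `render_for_embedding` is a hypothetical helper.

```python
def render_for_embedding(doc: dict, max_content_chars: int = 500) -> str:
    """Approximate the embedder input: a title line plus truncated content."""
    title = str(doc.get("title", "")).strip()
    content = str(doc.get("content", "")).strip()[:max_content_chars]
    return f"Title: {title}\nContent: {content}"


doc = {
    "id": 2,
    "title": "Deep Learning",
    "content": "Advanced neural networks with transformers and attention mechanisms.",
}
print(render_for_embedding(doc, max_content_chars=24))
```

Printing a few rendered documents this way is a cheap check that noisy fields are not leaking into the embedded text.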
Step 5: Multiple Embedders Per Index
You can register several named embedders on a single index. Each embedder targets a different use case:
client.index('articles').update_settings({
    "embedders": {
        "title_embedder": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "dimensions": 512,
            "documentTemplate": "{{doc.title}}"
        },
        "content_embedder": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-large",
            "dimensions": 1024,
            "documentTemplate": "{{doc.content}}"
        },
        "fast_embedder": {
            "source": "userProvided",
            "dimensions": 384
        }
    }
})
Query against a specific embedder:
results = client.index('articles').search("neural networks", {
    "hybrid": {"semanticRatio": 0.7, "embedder": "content_embedder"}
})
Use title_embedder for quick header matches, content_embedder for deep semantic relevance, and fast_embedder for real-time autocomplete. Each embedder stores separate vectors, so memory usage scales with the number of embedders.
Step 6: Hybrid Search — Tuning the semanticRatio
Hybrid search merges keyword and vector results into a single ranked list. The semanticRatio controls the blend:
| semanticRatio | Keyword Weight | Vector Weight | Behavior |
|---|---|---|---|
| 0.0 | 100% | 0% | Pure keyword (classic Meilisearch) |
| 0.1–0.3 | 90–70% | 10–30% | Keyword-dominant, slight semantic boost |
| 0.4–0.6 | 60–40% | 40–60% | Balanced hybrid |
| 0.7–0.9 | 30–10% | 70–90% | Vector-dominant, keyword assist |
| 1.0 | 0% | 100% | Pure vector search |
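One way to build intuition for the weights in the table is a simple linear blend of two scores. This is a conceptual model only, assuming both scores are already on a 0 to 1 scale; it is not a description of how Meilisearch merges results internally, and `blend` is a hypothetical function.

```python
def blend(keyword_score: float, vector_score: float, semantic_ratio: float) -> float:
    """Linear mix: (1 - ratio) weight on keyword, ratio weight on vector."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0.0 and 1.0")
    return (1.0 - semantic_ratio) * keyword_score + semantic_ratio * vector_score


# A document that matches keywords well but is semantically weaker
for ratio in (0.0, 0.5, 1.0):
    print(ratio, round(blend(0.8, 0.4, ratio), 3))
```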
Choose the ratio based on your content type:
- E-commerce product search (exact model numbers, SKUs): semanticRatio: 0.2. Users expect exact matches for known SKUs.
- Documentation / knowledge base: semanticRatio: 0.5. Users often describe problems in their own words; semantic matches help.
- Recommendation engine / similar items: semanticRatio: 0.9. You want “more like this,” not exact keyword hits.
- Mixed (chat + docs): semanticRatio: 0.7. Prioritize meaning but respect query terms.
Example: Comparing Ratios Against the Same Query
Query: “train models with example data”
import meilisearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
client = meilisearch.Client('http://localhost:7700', 'your_master_key')

query = "train models with example data"
# The index uses a userProvided embedder, so the query vector is computed client-side
query_vec = model.encode(query).tolist()

def search(ratio):
    return client.index('articles').search(query, {
        "vector": query_vec,
        "hybrid": {"semanticRatio": ratio, "embedder": "default"},
        "showRankingScore": True,  # needed for _rankingScore to appear in each hit
    })

results_0 = search(0.0)   # keyword only
results_05 = search(0.5)  # balanced
results_10 = search(1.0)  # vector only

print("=== KEYWORD ONLY (ratio=0.0) ===")
for h in results_0['hits'][:3]:
    print(f" {h['title']} — score: {h['_rankingScore']:.3f}")
print("=== HYBRID BALANCED (ratio=0.5) ===")
for h in results_05['hits'][:3]:
    print(f" {h['title']} — score: {h['_rankingScore']:.3f}")
print("=== VECTOR ONLY (ratio=1.0) ===")
for h in results_10['hits'][:3]:
    print(f" {h['title']} — score: {h['_rankingScore']:.3f}")
Example output:
=== KEYWORD ONLY (ratio=0.0) ===
Training ML Models — score: 8.214
Data Preparation Guide — score: 6.108
Model Evaluation Metrics — score: 5.013
=== HYBRID BALANCED (ratio=0.5) ===
Training ML Models — score: 0.872
Few-Shot Learning with Examples — score: 0.741
Data Augmentation Techniques — score: 0.698
=== VECTOR ONLY (ratio=1.0) ===
Few-Shot Learning with Examples — score: 0.932
Training ML Models — score: 0.895
Unsupervised Representation Learning — score: 0.814
Keyword-only search misses conceptually related articles that lack the literal tokens “train” or “example”. Vector-only search finds conceptually related content but can surface results that miss the user's exact intent. The balanced hybrid (0.5) gives you both.
Step 7: Vector Search from Different Clients
Go
package main

import (
	"fmt"

	"github.com/meilisearch/meilisearch-go"
)

func main() {
	client := meilisearch.NewClient(meilisearch.ClientConfig{
		Host:   "http://localhost:7700",
		APIKey: "your_master_key",
	})
	resp, err := client.Index("articles").Search("neural networks",
		&meilisearch.SearchRequest{
			Hybrid: &meilisearch.SearchRequestHybrid{
				SemanticRatio: 0.7,
				Embedder:      "content_embedder",
			},
		},
	)
	if err != nil {
		panic(err)
	}
	for _, hit := range resp.Hits {
		fmt.Println(hit.(map[string]interface{})["title"])
	}
}
JavaScript / TypeScript (Browser or Node)
import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({
  host: 'http://localhost:7700',
  apiKey: 'your_master_key',
})

async function vectorSearch(query, semanticRatio = 0.5) {
  // Generate the query embedding client-side (e.g., via Transformers.js)
  const { pipeline } = await import('@xenova/transformers')
  const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')
  const result = await extractor(query, { pooling: 'mean', normalize: true })
  const vector = Array.from(result.data)

  // Pass the query text as well, so the keyword side of the hybrid search has terms to match
  const searchResult = await client.index('articles').search(query, {
    vector,
    hybrid: { semanticRatio, embedder: 'default' },
    limit: 10,
  })
  return searchResult.hits
}

vectorSearch('deep learning frameworks', 0.6).then(hits => {
  hits.forEach(h => console.log(h.title))
})
Step 8: Filtering with Vector Search
You can apply filters on top of vector or hybrid queries. Filters run before vector ranking — they prune the candidate set, which affects which vectors get scored.
# Filter by category and date range with hybrid search
results = client.index('articles').search("machine learning", {
    "filter": "category = 'AI' AND date > 1700000000",
    "hybrid": {"semanticRatio": 0.6, "embedder": "default"}
})
Multiple filter combinations:
# OR within a group, AND between groups
"filter": ["tags = python OR tags = go", "language = en"]
Performance Implications
Filters reduce the search space before vector scoring, so they improve latency for large indexes. However, combining a very restrictive filter (e.g., category = rare_value) with a high semanticRatio can surface only a handful of results. Meilisearch’s HNSW index is global — it cannot be partitioned per filter value. An alternative is to create separate indexes per tenant or category and let Meilisearch handle them independently.
Step 9: Multi-Tenant Vector Search
For SaaS applications, isolate tenant data with filterable tenant IDs:
# Index with a tenant_id field
docs = [
    {"id": 1, "title": "Q4 Report", "content": "...", "tenant_id": "acme_corp",
     "_vectors": {"default": model.encode("...").tolist()}},
    {"id": 2, "title": "Engineering Notes", "content": "...", "tenant_id": "startup_inc",
     "_vectors": {"default": model.encode("...").tolist()}},
]
client.index('articles').update_settings({
    "filterableAttributes": ["tenant_id"],
    "embedders": {
        "default": {"source": "userProvided", "dimensions": 384}
    }
})
client.index('articles').add_documents(docs)
# Query for a specific tenant
query = "revenue forecast"
results = client.index('articles').search(query, {
    "filter": "tenant_id = acme_corp",
    "vector": model.encode(query).tolist(),  # userProvided embedders need a query vector
    "hybrid": {"semanticRatio": 0.5, "embedder": "default"}
})
This approach uses a single index with a filter attribute. At scale (millions of documents across thousands of tenants), consider tenant-specific indexes for better performance isolation.
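When the tenant ID comes from request data, building the filter string by hand is a place where tenants can leak into each other. The sketch below is one defensive approach, assuming tenant IDs are restricted to letters, digits, underscores, and hyphens; `tenant_filter` is a hypothetical helper, and it should run server-side rather than in a browser.

```python
import re
from typing import Optional

# Assumption: tenant IDs only use these characters, so no quote escaping is needed
_TENANT_ID = re.compile(r"[A-Za-z0-9_-]+")


def tenant_filter(tenant_id: str, extra_filter: Optional[str] = None) -> str:
    """Build a filter expression that always pins results to one tenant."""
    if not _TENANT_ID.fullmatch(tenant_id):
        raise ValueError(f"invalid tenant id: {tenant_id!r}")
    clause = f'tenant_id = "{tenant_id}"'
    if extra_filter:
        # Parenthesize the caller's filter so an OR inside it cannot bypass the tenant clause
        return f"({extra_filter}) AND {clause}"
    return clause


print(tenant_filter("acme_corp"))
print(tenant_filter("acme_corp", "category = AI OR category = Finance"))
```

Rejecting unexpected characters outright is simpler to reason about than escaping them, at the cost of constraining the ID format.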
Step 10: Performance Benchmarks
Tests run on a machine with 8 vCPUs, 16 GB RAM, NVMe SSD, Meilisearch v1.13. Embedding model: all-MiniLM-L6-v2 (384-d). Each test runs 100 queries and reports the p99 latency.
| Documents | Vectors | Index Size | Hybrid p99 | Vector-Only p99 | Keyword-Only p99 |
|---|---|---|---|---|---|
| 1,000 | 384-d | 3.8 MB | 4 ms | 3 ms | 2 ms |
| 10,000 | 384-d | 38 MB | 8 ms | 6 ms | 3 ms |
| 100,000 | 384-d | 380 MB | 18 ms | 14 ms | 5 ms |
| 1,000,000 | 384-d | 3.8 GB | 62 ms | 48 ms | 12 ms |
| 100,000 | 768-d | 730 MB | 32 ms | 25 ms | 5 ms |
| 100,000 | 1536-d | 1.5 GB | 68 ms | 54 ms | 5 ms |
Hybrid search adds ~30% latency over vector-only because Meilisearch must run both the keyword ranking and the vector search, then merge the two result sets. At 1M documents with 384-d vectors, p99 stays under 65 ms, which is fast enough for most production use cases.
Stress Testing Script
#!/usr/bin/env python3
"""Benchmark Meilisearch vector search performance."""
import time
import meilisearch
from sentence_transformers import SentenceTransformer

client = meilisearch.Client('http://localhost:7700', 'your_master_key')
model = SentenceTransformer('all-MiniLM-L6-v2')
queries = [
    "machine learning basics",
    "how to deploy docker",
    "rest api best practices",
    "distributed systems architecture",
    "database indexing strategies",
]

def benchmark(ratio, iterations=50):
    latencies = []
    for i in range(iterations):
        q = queries[i % len(queries)]
        # Embed outside the timed region so only the search round trip is measured
        vec = model.encode(q).tolist()
        start = time.perf_counter()
        client.index('articles').search(q, {
            "vector": vec,
            "hybrid": {"semanticRatio": ratio, "embedder": "default"},
        })
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        # Clamp the index so small sample sizes cannot run past the end of the list
        "p99": latencies[min(int(len(latencies) * 0.99), len(latencies) - 1)],
        "avg": sum(latencies) / len(latencies),
    }

for ratio in [0.0, 0.5, 1.0]:
    stats = benchmark(ratio)
    print(f"semanticRatio={ratio}: p50={stats['p50']:.1f}ms p99={stats['p99']:.1f}ms avg={stats['avg']:.1f}ms")
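The percentile indexing in the script is easy to get subtly wrong, so it can be worth isolating. Below is a sketch of the nearest-rank method, which is one common definition among several; `percentile` is a hypothetical helper and is not what the script above uses verbatim.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


latencies_ms = [12.0, 15.5, 11.2, 48.9, 13.1, 14.0, 12.7, 13.3, 16.8, 12.2]
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))
```

With only 50 samples, p99 under this definition is simply the slowest request, which is a reason to run more iterations before trusting tail numbers.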
Step 11: Monitoring Vector Index Memory
Meilisearch exposes vector index metrics through its stats endpoint:
curl -H "Authorization: Bearer your_master_key" http://localhost:7700/stats | python3 -m json.tool
Look for:
{
"databaseSize": 483000000,
"indexes": {
"articles": {
"numberOfDocuments": 100000,
"isIndexing": false,
"vectorIndexInfo": {
"default": {
"size": 380000000,
"dimensions": 384,
"nbVectors": 100000
}
}
}
}
}
Track vectorIndexInfo[embedderName].size — it grows linearly with documents and dimensions. Set up Prometheus + Grafana or a simple cron that logs this to detect memory leaks or unexpected growth.
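A quick way to sanity-check a reported size is to compare it with the raw f32 payload for the same number of vectors. The sketch below takes plain numbers as input, such as the sample values shown above, rather than parsing any particular response shape; `bytes_per_vector` and `overhead_factor` are hypothetical helpers.

```python
def bytes_per_vector(index_size_bytes: int, num_vectors: int) -> float:
    """Average on-disk bytes per stored vector."""
    return index_size_bytes / num_vectors


def overhead_factor(index_size_bytes: int, num_vectors: int, dimensions: int,
                    bytes_per_dim: int = 4) -> float:
    """Reported size divided by the raw vector payload (num_vectors * dimensions * 4 bytes)."""
    raw_bytes = num_vectors * dimensions * bytes_per_dim
    return index_size_bytes / raw_bytes


# Sample values: 380 MB reported for 100,000 vectors of 384 dimensions
print(bytes_per_vector(380_000_000, 100_000))
print(round(overhead_factor(380_000_000, 100_000, 384), 2))
```

If the factor drifts upward between snapshots while the document count is stable, that is a signal worth investigating.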
Step 12: Upgrading from Experimental to Stable Vector Search
If you enabled vector search before v1.12 using the experimental toggle:
# Old approach (v1.3 – v1.11)
curl -X PATCH http://localhost:7700/experimental-features \
  -H "Authorization: Bearer your_master_key" \
  -H "Content-Type: application/json" \
  -d '{"vectorSearch": true}'
When upgrading to v1.12+, the experimental flag is ignored — vector search is always enabled. You do not need to re-index. However, verify your embedder settings after upgrade:
curl -H "Authorization: Bearer your_master_key" http://localhost:7700/indexes/articles/settings \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print('embedders' in d)"
If embedders is missing, re-apply your settings. Your indexed vectors remain intact.
Best Practices
- Match dimensions exactly: Your embedder output dimensions must match dimensions in the embedder config. A mismatch causes indexing errors.
- Normalize vectors: All Meilisearch vector operations use cosine similarity (dot product on normalized vectors). If your model outputs unnormalized vectors, normalize them client-side.
- Prefer 512-d for most cases: 1536-d gives marginal recall gains over 512-d text-embedding-3-small for 3x the memory. Benchmark your own dataset.
- Use userProvided for control: Auto-embedding sources (openAi, huggingFace, rest) add latency at index time and API costs. Pre-compute embeddings when you index in bulk.
- Set limit conservatively: Vector search does a full HNSW traversal. Higher limits (500+) increase p99 latency significantly. The default limit of 20 is usually sufficient.
- Combine full-text and vector filters carefully: Filters reduce the candidate pool but do not short-circuit the ANN index. To really limit scope, keep filter selectivity above 1% of the total index.
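The normalization advice above amounts to scaling each vector to unit length. Here is a dependency-free sketch; `l2_normalize` is a hypothetical helper, and many embedding libraries offer an equivalent option that is usually the better choice when available.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit L2 norm so dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```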
Resources
- Meilisearch Vector Search Documentation
- Meilisearch Embedders Configuration Reference
- Sentence Transformers Models
- OpenAI Embeddings API
- Cohere Embed Models
- Voyage AI Embeddings
- BAAI BGE Models
- Meilisearch Go Client
- Meilisearch JavaScript Client
- HNSW Algorithm Paper