
How to Use Vector Search in Meilisearch

Published: November 25, 2025 Updated: May 8, 2026 Larry Qu 11 min read

Meilisearch supports vector search, enabling semantic and similarity-based queries using embeddings. The feature was introduced experimentally in v1.3 and has been stable since v1.12, and it lets you search by meaning rather than exact keywords. This post covers setup, data preparation, indexing, querying, and production tuning with vectors.

Vector search uses machine learning embeddings to represent text as vectors in a high-dimensional space. Similar items cluster closer together, so you can run queries such as “find documents similar to this one” or “conceptually related content” without exact keyword overlap. Meilisearch also implements hybrid search: a single query combines its keyword ranking (the built-in ranking rules rather than BM25) with vector similarity (cosine).
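
To make "similar items cluster closer together" concrete, here is a minimal sketch of cosine similarity in plain Python. The three-dimensional vectors are made-up toy values chosen for illustration, not real embeddings, and the function name is hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of dimensions")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for zero vectors")
    return dot / (norm_a * norm_b)

# Toy 3-d "embeddings" (hypothetical values, for illustration only).
ml_basics = [0.9, 0.1, 0.0]
deep_learning = [0.8, 0.3, 0.1]
cooking = [0.0, 0.2, 0.9]

print(cosine_similarity(ml_basics, deep_learning))  # close to 1: similar
print(cosine_similarity(ml_basics, cooking))        # close to 0: unrelated
```

Real embedding models produce hundreds of dimensions, but the comparison works the same way.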

Prerequisites

  • Meilisearch v1.12 or later (stable vector search; v1.3–v1.11 used experimental flags).
  • An embedding provider: a local model (Sentence Transformers, BGE), a SaaS API (OpenAI, Cohere, Voyage), or the Hugging Face Inference API.
  • Python 3.9+, Go 1.21+, or Node.js 18+ for the client examples.

Step 1: Setting Up Meilisearch

Download and start Meilisearch:

curl -L https://install.meilisearch.com | sh
./meilisearch --master-key="your_master_key"

In versions before v1.12 you had to turn vector search on through the experimental features endpoint. Since v1.12 it is stable and enabled by default. Verify your version:

curl -H "Authorization: Bearer your_master_key" http://localhost:7700/version
# => {"commitSha":"...","commitDate":"...","pkgVersion":"1.13.0"}
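
If you want a script to refuse to run against an older server, a small check like the following may help. It is a sketch: the function names are hypothetical, the version string is assumed to be the one reported by the version endpoint, and the v1.12 threshold is taken from this article.

```python
def parse_version(version):
    """Turn '1.13.0' into (1, 13, 0); drops a pre-release suffix such as '-rc.1'."""
    core = version.split("-", 1)[0]
    return tuple(int(part) for part in core.split("."))

def has_stable_vector_search(version, minimum=(1, 12, 0)):
    """True when the version is at or above the stable threshold used in this post."""
    return parse_version(version) >= minimum

print(has_stable_vector_search("1.13.0"))
print(has_stable_vector_search("1.11.3"))
```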

Step 2: Choosing an Embedding Model

Your embedding model determines search quality, latency, memory usage, and cost. The table below compares the most common providers.

Embedding Model Comparison

Provider | Model | Dimensions | Languages | Cost | Quality | Best For
Sentence Transformers | all-MiniLM-L6-v2 | 384 | EN | Free (local) | Good | Dev, offline, low-latency
Sentence Transformers | BGE-base-en-v1.5 | 768 | EN | Free (local) | Very Good | Production on-premise
Sentence Transformers | gtr-t5-large | 768 | EN | Free (local) | Excellent | High-accuracy offline
OpenAI | text-embedding-ada-002 | 1536 | Multilingual | $0.10/1M tokens | Excellent | General-purpose SaaS
OpenAI | text-embedding-3-small | 512 (reduced from the native 1536) | Multilingual | $0.02/1M tokens | Very Good | Low-cost SaaS
OpenAI | text-embedding-3-large | 3072 | Multilingual | $0.13/1M tokens | Best | Max recall, high budget
Cohere | embed-english-v3.0 | 1024 | EN | $0.10/1M tokens | Excellent | English-first retrieval
Cohere | embed-multilingual-v3.0 | 1024 | Multilingual | $0.10/1M tokens | Excellent | Multilingual search
Voyage | voyage-2 | 1024 | Multilingual | $0.10/1M tokens | Very Good | Code + text search
BGE (BAAI) | bge-large-en-v1.5 | 1024 | EN | Free (local) | Excellent | Local, no API cost

Key Trade-off: Dimensions

Higher dimensions capture more nuance but increase memory and latency. A 1536-dimension vector uses 4x the RAM of a 384-dimension one. Meilisearch stores vectors as f32 arrays: each dimension is 4 bytes. For 1M documents:

  • 384-d: 1M × 384 × 4 = ~1.5 GB
  • 768-d: 1M × 768 × 4 = ~3.1 GB
  • 1536-d: 1M × 1536 × 4 = ~6.1 GB

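Account for this when provisioning your server. These figures cover raw vector storage only; the ANN index structures and the documents themselves add overhead. On a 2 GB VPS, 1M documents at 384-d already leave little headroom, and a 1536-d model will not fit in RAM, so expect heavy disk paging or out-of-memory failures.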
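
The arithmetic above can be reproduced with a few lines of Python. This is a sketch of the raw-storage estimate only (documents × dimensions × 4 bytes); the function names and the embedders parameter are illustrative assumptions, and real usage will be higher once index overhead is included.

```python
def raw_vector_bytes(num_documents, dimensions, embedders=1, bytes_per_value=4):
    """Raw f32 storage for one vector per document per embedder."""
    return num_documents * dimensions * bytes_per_value * embedders

def to_gb(num_bytes):
    """Convert bytes to decimal gigabytes (10^9 bytes)."""
    return num_bytes / 1_000_000_000

for dims in (384, 768, 1536):
    print(f"{dims}-d: {to_gb(raw_vector_bytes(1_000_000, dims)):.2f} GB")
```

Passing embedders=2 doubles the estimate, which is worth remembering when an index registers more than one embedder.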

Step 3: Configuring Embedders in Meilisearch

This post covers four embedder source types: userProvided, openAi, huggingFace, and rest. (Meilisearch also ships an ollama source, which is not covered here.)

userProvided

You generate embeddings client-side and send them with each document:

from sentence_transformers import SentenceTransformer
import meilisearch

model = SentenceTransformer('all-MiniLM-L6-v2')
client = meilisearch.Client('http://localhost:7700', 'your_master_key')

client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "userProvided",
            "dimensions": 384
        }
    }
})

docs = [
    {"id": 1, "title": "Machine Learning Basics", "content": "Introduction to supervised and unsupervised learning algorithms."},
    {"id": 2, "title": "Deep Learning", "content": "Advanced neural networks with transformers and attention mechanisms."},
    {"id": 3, "title": "Regression Analysis", "content": "Statistical methods for modeling relationships between variables."},
]

for doc in docs:
    doc["_vectors"] = {"default": model.encode(doc["content"]).tolist()}

client.index('articles').add_documents(docs)

openAi

Meilisearch calls the OpenAI API to generate embeddings for you at index and query time:

client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "dimensions": 512
        }
    }
})

huggingFace

Meilisearch downloads the model from the Hugging Face Hub and computes embeddings locally, inside the Meilisearch process, so no API key is involved:

client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "huggingFace",
            "model": "BAAI/bge-base-en-v1.5"
        }
    }
})

rest

Meilisearch calls any custom REST endpoint that returns embeddings. This lets you use Cohere, Voyage, or a self-hosted model:

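client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "rest",
            "url": "http://localhost:8080/embed",
            "apiKey": "optional_key",
            # request/response describe the JSON your endpoint expects and returns.
            # The shapes below are assumptions about a hypothetical server:
            # {{text}} marks where the input goes, {{embedding}} where the vector is.
            "request": {"text": "{{text}}"},
            "response": {"embedding": "{{embedding}}"}
        }
    }
})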

Step 4: Document Templates for Auto-Embedding

When using openAi, huggingFace, or rest sources, Meilisearch auto-generates embeddings from your document fields. You control which fields are concatenated with a documentTemplate:

client.index('articles').update_settings({
    "embedders": {
        "default": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "documentTemplate": "Title: {{doc.title}}\nContent: {{doc.content}}"
        }
    }
})

Templates use the Liquid templating language. Common patterns:

# Index only the title
"documentTemplate": "{{doc.title}}"

# Title + first 500 chars of content
"documentTemplate": "Title: {{doc.title}}\nSnippet: {{doc.content | truncate: 500}}"

# Multiple fields with a separator
"documentTemplate": "{{doc.title}} | {{doc.tags | join: ', '}} | {{doc.summary}}"

Well-crafted templates improve embedding quality because the model receives clean, focused input.
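
Before committing to a template, it can help to preview roughly what text the embedder will receive. The sketch below is a deliberately simplified stand-in: it does not run the real template engine, it only substitutes plain {{doc.field}} placeholders, and it ignores filters. The function name is hypothetical.

```python
import re

# Matches plain placeholders such as {{doc.title}} or {{ doc.title }}.
_PLACEHOLDER = re.compile(r"\{\{\s*doc\.(\w+)\s*\}\}")

def preview_template(template, doc):
    """Substitute {{doc.field}} placeholders; missing fields become empty strings."""
    return _PLACEHOLDER.sub(lambda m: str(doc.get(m.group(1), "")), template)

doc = {"title": "Deep Learning", "content": "Advanced neural networks."}
print(preview_template("Title: {{doc.title}}\nContent: {{doc.content}}", doc))
```

Running this over a handful of documents is a quick way to spot empty fields or noisy boilerplate before paying for embeddings.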

Step 5: Multiple Embedders Per Index

You can register several named embedders on a single index. Each embedder targets a different use case:

client.index('articles').update_settings({
    "embedders": {
        "title_embedder": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-small",
            "dimensions": 512,
            "documentTemplate": "{{doc.title}}"
        },
        "content_embedder": {
            "source": "openAi",
            "apiKey": "sk-...",
            "model": "text-embedding-3-large",
            "dimensions": 1024,
            "documentTemplate": "{{doc.content}}"
        },
        "fast_embedder": {
            "source": "userProvided",
            "dimensions": 384
        }
    }
})

Query against a specific embedder:

results = client.index('articles').search("neural networks", {
    "hybrid": {"semanticRatio": 0.7, "embedder": "content_embedder"}
})

Use title_embedder for quick header matches, content_embedder for deep semantic relevance, and fast_embedder for real-time autocomplete. Each embedder stores separate vectors, so memory usage scales with the number of embedders.

Step 6: Hybrid Search — Tuning the semanticRatio

Hybrid search merges keyword and vector results. The semanticRatio controls the blend:

semanticRatio | Keyword Weight | Vector Weight | Behavior
0.0 | 100% | 0% | Pure keyword (classic Meilisearch)
0.1–0.3 | 90–70% | 10–30% | Keyword-dominant, slight semantic boost
0.4–0.6 | 60–40% | 40–60% | Balanced hybrid
0.7–0.9 | 30–10% | 70–90% | Vector-dominant, keyword assist
1.0 | 0% | 100% | Pure vector search

Choose the ratio based on your content type:

  • E-commerce product search (exact model numbers, SKUs): semanticRatio: 0.2 — users expect exact matches for known SKUs.
  • Documentation / knowledge base: semanticRatio: 0.5 — users often describe problems in their own words; semantic matches help.
  • Recommendation engine / similar items: semanticRatio: 0.9 — you want “more like this,” not exact keyword hits.
  • Mixed (chat + docs): semanticRatio: 0.7 — prioritize meaning but respect query terms.
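
To build intuition for how the ratio shifts results, the toy model below weights two score lists and merges them. It is a conceptual illustration only, not Meilisearch's actual merge algorithm; the document ids, scores, and the blend function are all hypothetical.

```python
def blend(keyword_hits, vector_hits, semantic_ratio):
    """Weight each list's scores, keep the best weighted score per id, sort descending."""
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semantic_ratio must be between 0 and 1")
    best = {}
    for doc_id, score in keyword_hits.items():
        best[doc_id] = (1.0 - semantic_ratio) * score
    for doc_id, score in vector_hits.items():
        weighted = semantic_ratio * score
        best[doc_id] = max(best.get(doc_id, 0.0), weighted)
    return sorted(best.items(), key=lambda item: item[1], reverse=True)

# Hypothetical scores in the 0..1 range for three documents.
keyword_hits = {"sku-123": 0.95, "guide": 0.40}
vector_hits = {"guide": 0.90, "similar-item": 0.85}

for ratio in (0.2, 0.9):
    print(ratio, [doc_id for doc_id, _ in blend(keyword_hits, vector_hits, ratio)])
```

At 0.2 the exact-match document stays on top, while at 0.9 the semantically related documents overtake it, which mirrors the e-commerce versus recommendation guidance above.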

Example: Comparing Ratios Against the Same Query

Query: “train models with example data”

import meilisearch
from sentence_transformers import SentenceTransformer

model = SentenceTransformer('all-MiniLM-L6-v2')
client = meilisearch.Client('http://localhost:7700', 'your_master_key')

query = "train models with example data"
query_vec = model.encode(query).tolist()

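def search(ratio):
    # The "default" embedder is userProvided, so the query vector must be sent
    # explicitly; showRankingScore is required to get _rankingScore in each hit.
    return client.index('articles').search(query, {
        "vector": query_vec,
        "hybrid": {"semanticRatio": ratio, "embedder": "default"},
        "showRankingScore": True,
    })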

results_0 = search(0.0)    # keyword only
results_05 = search(0.5)   # balanced
results_10 = search(1.0)   # vector only

print("=== KEYWORD ONLY (ratio=0.0) ===")
for h in results_0['hits'][:3]:
    print(f"  {h['title']} — score: {h['_rankingScore']:.3f}")

print("=== HYBRID BALANCED (ratio=0.5) ===")
for h in results_05['hits'][:3]:
    print(f"  {h['title']} — score: {h['_rankingScore']:.3f}")

print("=== VECTOR ONLY (ratio=1.0) ===")
for h in results_10['hits'][:3]:
    print(f"  {h['title']} — score: {h['_rankingScore']:.3f}")

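Example output (illustrative; Meilisearch reports _rankingScore between 0 and 1, so read the keyword-only figures as a relative ordering):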

=== KEYWORD ONLY (ratio=0.0) ===
  Training ML Models — score: 8.214
  Data Preparation Guide — score: 6.108
  Model Evaluation Metrics — score: 5.013

=== HYBRID BALANCED (ratio=0.5) ===
  Training ML Models — score: 0.872
  Few-Shot Learning with Examples — score: 0.741
  Data Augmentation Techniques — score: 0.698

=== VECTOR ONLY (ratio=1.0) ===
  Few-Shot Learning with Examples — score: 0.932
  Training ML Models — score: 0.895
  Unsupervised Representation Learning — score: 0.814

Keyword-only misses conceptually related articles that lack the literal tokens “train” or “example”. Vector-only finds related content but can rank loosely related documents above ones that match the user's exact terms. The balanced hybrid (0.5) gives you both.

Step 7: Vector Search from Different Clients

Go

package main

import (
    "fmt"
    "github.com/meilisearch/meilisearch-go"
)

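func main() {
    // Written against the older meilisearch-go constructor; recent SDK
    // releases replace NewClient with meilisearch.New(host, options...).
    client := meilisearch.NewClient(meilisearch.ClientConfig{
        Host:   "http://localhost:7700",
        APIKey: "your_master_key",
    })

    resp, err := client.Index("articles").Search("neural networks",
        &meilisearch.SearchRequest{
            // Hybrid is a pointer to SearchRequestHybrid in the SDK.
            Hybrid: &meilisearch.SearchRequestHybrid{
                SemanticRatio: 0.7,
                Embedder:      "content_embedder",
            },
        },
    )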
    if err != nil {
        panic(err)
    }

    for _, hit := range resp.Hits {
        fmt.Println(hit.(map[string]interface{})["title"])
    }
}

JavaScript / TypeScript (Browser or Node)

import { MeiliSearch } from 'meilisearch'

const client = new MeiliSearch({
  host: 'http://localhost:7700',
  apiKey: 'your_master_key',
})

async function vectorSearch(query, semanticRatio = 0.5) {
  // Generate embedding client-side (e.g., via Transformers.js)
  const { pipeline } = await import('@xenova/transformers')
  const extractor = await pipeline('feature-extraction', 'Xenova/all-MiniLM-L6-v2')
  const result = await extractor(query, { pooling: 'mean', normalize: true })
  const vector = Array.from(result.data)

  // Pass the query text too, so the keyword side of the hybrid search has terms to match.
  const searchResult = await client.index('articles').search(query, {
    vector,
    hybrid: { semanticRatio, embedder: 'default' },
    limit: 10,
  })

  return searchResult.hits
}

vectorSearch('deep learning frameworks', 0.6).then(hits => {
  hits.forEach(h => console.log(h.title))
})

Step 8: Filtering with Vector Search

You can apply filters on top of vector or hybrid queries. Filters run before vector ranking: they prune the candidate set, which affects which vectors get scored.

# Filter by category and date range with hybrid search
results = client.index('articles').search("machine learning", {
    "filter": "category = 'AI' AND date > 1700000000",
    "hybrid": {"semanticRatio": 0.6, "embedder": "default"}
})

Multiple filter combinations:

# OR within a group, AND between groups
"filter": ["tags = python OR tags = go", "language = en"]

Performance Implications

Filters reduce the search space before vector scoring, so they improve latency for large indexes. However, combining a very restrictive filter (e.g., category = rare_value) with a high semanticRatio can surface only a handful of results. Meilisearch’s approximate nearest-neighbor (ANN) index is global: it is not partitioned per filter value. An alternative is to create separate indexes per tenant or category and let Meilisearch handle them independently.
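
Filter combinations like the ones in this section can be assembled programmatically. The helper below is a sketch with hypothetical names: it produces the array form, where values inside a group are joined with OR and the groups themselves are combined with AND. As a conservative assumption it rejects values containing quotes rather than trying to escape them.

```python
def quote(value):
    """Wrap a value in single quotes; refuse values that contain quotes themselves."""
    text = str(value)
    if "'" in text or '"' in text:
        raise ValueError(f"unsupported character in filter value: {text!r}")
    return f"'{text}'"

def build_filter(groups):
    """groups: {attribute: [allowed values]} -> list of OR-strings (ANDed together)."""
    clauses = []
    for attribute, values in groups.items():
        if not values:
            continue  # an empty group adds no constraint
        clauses.append(" OR ".join(f"{attribute} = {quote(v)}" for v in values))
    return clauses

print(build_filter({"tags": ["python", "go"], "language": ["en"]}))
```

The returned list can be passed as the "filter" search parameter, provided every attribute is listed in filterableAttributes.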

Step 9: Multi-Tenant Vector Search

For SaaS applications, isolate tenant data with filterable tenant IDs:

# Index with a tenant_id field
docs = [
    {"id": 1, "title": "Q4 Report", "content": "...", "tenant_id": "acme_corp",
     "_vectors": {"default": model.encode("...").tolist()}},
    {"id": 2, "title": "Engineering Notes", "content": "...", "tenant_id": "startup_inc",
     "_vectors": {"default": model.encode("...").tolist()}},
]

client.index('articles').update_settings({
    "filterableAttributes": ["tenant_id"],
    "embedders": {
        "default": {"source": "userProvided", "dimensions": 384}
    }
})

# Query for a specific tenant
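results = client.index('articles').search("revenue forecast", {
    "filter": "tenant_id = acme_corp",
    # userProvided embedder: embed the query client-side and send the vector.
    "vector": model.encode("revenue forecast").tolist(),
    "hybrid": {"semanticRatio": 0.5, "embedder": "default"}
})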

This approach uses a single index with a filter attribute. At scale (millions of documents across thousands of tenants), consider tenant-specific indexes for better performance isolation.
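
With a shared index, every query must carry the tenant filter, so it is safer to build search parameters in one place than to repeat the filter by hand. The following is a minimal sketch under stated assumptions: the helper name, the allowed tenant id characters, and the "default" embedder name are illustrative choices, not requirements of Meilisearch.

```python
import re

# Assumed tenant id format: letters, digits, underscores, hyphens.
_TENANT_ID = re.compile(r"^[A-Za-z0-9_-]+$")

def tenant_search_params(tenant_id, query_vector, semantic_ratio=0.5, extra_filter=None):
    """Build search parameters that always include the tenant filter."""
    if not _TENANT_ID.match(tenant_id):
        # Reject anything that could smuggle extra filter syntax into the query.
        raise ValueError(f"unexpected tenant id: {tenant_id!r}")
    filters = [f"tenant_id = {tenant_id}"]
    if extra_filter:
        filters.append(extra_filter)  # list entries are ANDed together
    return {
        "filter": filters,
        "vector": query_vector,
        "hybrid": {"semanticRatio": semantic_ratio, "embedder": "default"},
    }

print(tenant_search_params("acme_corp", [0.1, 0.2], extra_filter="year = 2025"))
```

The resulting dictionary can be passed as the second argument to the search call shown above.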

Step 10: Performance Benchmarks

Tests run on a machine with 8 vCPUs, 16 GB RAM, NVMe SSD, Meilisearch v1.13. Embedding model: all-MiniLM-L6-v2 (384-d). Each test runs 100 queries and reports the p99 latency.

Documents | Vectors | Index Size | Hybrid p99 | Vector-Only p99 | Keyword-Only p99
1,000 | 384-d | 3.8 MB | 4 ms | 3 ms | 2 ms
10,000 | 384-d | 38 MB | 8 ms | 6 ms | 3 ms
100,000 | 384-d | 380 MB | 18 ms | 14 ms | 5 ms
1,000,000 | 384-d | 3.8 GB | 62 ms | 48 ms | 12 ms
100,000 | 768-d | 730 MB | 32 ms | 25 ms | 5 ms
100,000 | 1536-d | 1.5 GB | 68 ms | 54 ms | 5 ms

Hybrid search adds ~30% latency over vector-only because Meilisearch must run both the keyword ranking and the vector distance computation, then merge the results. At 1M documents with 384-d vectors, p99 stays under 65 ms, fast enough for most production use cases.

Stress Testing Script

#!/usr/bin/env python3
"""Benchmark Meilisearch vector search performance."""

import time
import meilisearch
from sentence_transformers import SentenceTransformer

client = meilisearch.Client('http://localhost:7700', 'masterKey')
model = SentenceTransformer('all-MiniLM-L6-v2')

queries = [
    "machine learning basics",
    "how to deploy docker",
    "rest api best practices",
    "distributed systems architecture",
    "database indexing strategies",
]

def benchmark(ratio, iterations=50):
    latencies = []
    for i in range(iterations):
        q = queries[i % len(queries)]
        # Embed outside the timed section so only the search call is measured.
        vec = model.encode(q).tolist()
        start = time.perf_counter()
        client.index('articles').search(q, {
            "vector": vec,
            "hybrid": {"semanticRatio": ratio, "embedder": "default"}
        })
        latencies.append((time.perf_counter() - start) * 1000)
    latencies.sort()
    return {
        "p50": latencies[len(latencies) // 2],
        # With 50 samples this index is the last element, i.e. the maximum.
        "p99": latencies[int(len(latencies) * 0.99)],
        "avg": sum(latencies) / len(latencies),
    }

for ratio in [0.0, 0.5, 1.0]:
    stats = benchmark(ratio)
    print(f"semanticRatio={ratio}: p50={stats['p50']:.1f}ms p99={stats['p99']:.1f}ms avg={stats['avg']:.1f}ms")
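
Percentiles over small samples are easy to misread, so it may be worth computing them with an explicit definition. The helper below is a sketch of the nearest-rank method; the function name and the synthetic latencies are illustrative, not measurements.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: smallest value with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[rank - 1]

latencies_ms = list(range(1, 51))  # 50 synthetic samples: 1 ms .. 50 ms
print(percentile(latencies_ms, 50), percentile(latencies_ms, 99))
```

With only 50 samples the p99 is simply the slowest request, so a larger iteration count gives a more meaningful tail latency.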

Step 11: Monitoring Vector Index Memory

Meilisearch reports database and index statistics through its stats endpoint:

curl -H "Authorization: Bearer your_master_key" http://localhost:7700/stats | python3 -m json.tool

Look for (the exact fields vary by version; recent releases also report embedding counts per index):

{
  "databaseSize": 483000000,
  "indexes": {
    "articles": {
      "numberOfDocuments": 100000,
      "isIndexing": false,
      "numberOfEmbeddings": 100000,
      "numberOfEmbeddedDocuments": 100000
    }
  }
}

Track databaseSize together with the per-index embedding counts. Vector storage grows linearly with documents and dimensions, so a jump that is not explained by new documents deserves a look. Set up Prometheus + Grafana or a simple cron job that logs these values to catch unexpected growth.
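
For the cron approach, something like the following could turn a stats response into rows suitable for a CSV log. It is a sketch: it only relies on the databaseSize and numberOfDocuments fields, the function name is hypothetical, and fetching the JSON from the stats endpoint is left to whichever HTTP client you already use.

```python
import json
import time

def summarize_stats(stats, now=None):
    """Flatten a stats response into (timestamp, index, documents, database bytes) rows."""
    timestamp = int(now if now is not None else time.time())
    rows = []
    for uid, info in sorted(stats.get("indexes", {}).items()):
        rows.append((timestamp, uid, info.get("numberOfDocuments", 0), stats.get("databaseSize", 0)))
    return rows

# Sample payload mirroring the response shown above.
sample = json.loads(
    '{"databaseSize": 483000000, "indexes": {"articles": '
    '{"numberOfDocuments": 100000, "isIndexing": false}}}'
)
for row in summarize_stats(sample, now=0):
    print(",".join(str(value) for value in row))
```

Appending these rows to a file once an hour is usually enough to see whether growth tracks document count.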

Step 12: Migrating from Experimental Vector Search

If you enabled vector search before v1.12 using the experimental toggle:

# Old approach (v1.3 – v1.11)
curl -X PATCH http://localhost:7700/experimental-features \
  -H "Authorization: Bearer your_master_key" \
  -H "Content-Type: application/json" \
  -d '{"vectorStore": true}'

When upgrading to v1.12+, the experimental flag is ignored — vector search is always enabled. You do not need to re-index. However, verify your embedder settings after upgrade:

curl -H "Authorization: Bearer your_master_key" http://localhost:7700/indexes/articles/settings \
  | python3 -c "import sys,json; d=json.load(sys.stdin); print(bool(d.get('embedders')))"

If embedders is missing or empty, re-apply your settings. Your indexed vectors remain intact.
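
When an index registers several embedders, a per-name check is more useful than a single boolean. The sketch below assumes you have already loaded the settings response into a dictionary; the function name and the expected embedder names are illustrative.

```python
def missing_embedders(settings, expected):
    """Return the expected embedder names that are absent from an index's settings."""
    configured = settings.get("embedders") or {}
    return sorted(name for name in expected if name not in configured)

# Example settings payload with only one of the two expected embedders.
settings = {"embedders": {"default": {"source": "userProvided", "dimensions": 384}}}
print(missing_embedders(settings, ["default", "title_embedder"]))
```

An empty list means every expected embedder survived the upgrade.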

Best Practices

  • Match dimensions exactly: Your embedder output dimensions must match dimensions in the embedder config. A mismatch causes indexing errors.
  • Normalize vectors: All Meilisearch vector operations use cosine similarity (dot product on normalized vectors). If your model outputs unnormalized vectors, normalize them client-side.
  • Prefer 512-d for most cases: 1536-d gives marginal recall gains over 512-d text-embedding-3-small for 3x the memory. Benchmark your own dataset.
  • Use userProvided for control: Auto-embedding sources (openAi, huggingFace, rest) add latency at index time and API costs. Pre-compute embeddings when you index in bulk.
  • Set limit conservatively: Larger result sets force the vector index to explore more candidates, so high limits (500+) increase p99 latency significantly. The default limit of 20 is usually sufficient.
  • Combine filters and vector search carefully: Filters reduce the candidate pool but do not short-circuit the ANN index. As a rule of thumb, keep filters broad enough to match at least 1% of the index.
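
The first two practices are easy to enforce in code before documents are sent. The helpers below are a minimal sketch in plain Python; the function names are hypothetical, and in a real pipeline you would likely apply them to each vector your model returns.

```python
import math

def l2_normalize(vector):
    """Scale a vector to unit length so dot product and cosine similarity agree."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

def check_dimensions(vector, expected):
    """Fail fast when a vector does not match the embedder's configured dimensions."""
    if len(vector) != expected:
        raise ValueError(f"expected {expected} dimensions, got {len(vector)}")
    return vector

unit = l2_normalize([3.0, 4.0])
print(unit)  # [0.6, 0.8]
```

Calling check_dimensions(vector, 384) before upload surfaces a model mismatch on the client instead of as an indexing error later.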
