Introduction
Artificial intelligence has transformed how we build search applications. Traditional keyword search, while still valuable, no longer satisfies users who expect semantic understanding and contextual relevance. Meilisearch has evolved to meet these challenges, offering vector search capabilities that power modern AI applications.
This comprehensive guide explores how to leverage Meilisearch for AI-powered applications. We will cover vector search fundamentals, building Retrieval-Augmented Generation (RAG) pipelines, embedding management, and integration with popular AI frameworks like LangChain.
Understanding Vector Search
Vector search represents a fundamental shift in how search engines find relevant content.
What are Embeddings?
Embeddings are numerical representations of text, images, or other data that capture semantic meaning. These vectors are generated by machine learning models trained to place similar items close together in a high-dimensional space.
Consider these sentences:
- “The cat sat on the mat”
- “A feline rested on the rug”
These sentences have different words but similar meanings. An embedding model would place them close together in vector space, enabling semantic matching.
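To see this in practice, here is a minimal sketch using the open-source all-MiniLM-L6-v2 model (the same model used later in this guide): encode both sentences and compare them with cosine similarity.
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
sentences = ["The cat sat on the mat", "A feline rested on the rug"]
embeddings = model.encode(sentences)
# A cosine similarity closer to 1.0 means the sentences are closer in meaning
print(util.cos_sim(embeddings[0], embeddings[1]))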
How Vector Search Works
Vector search finds the nearest vectors to a query vector:
- Convert the query into a vector using an embedding model
- Compare the query vector against stored document vectors
- Return the most similar documents based on distance metrics
Common similarity measures include:
- Cosine similarity - Measures angle between vectors
- Euclidean distance - Straight-line distance
- Dot product - Projection of one vector onto another
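These measures are easy to compute directly. The toy NumPy sketch below shows all three on made-up three-dimensional vectors (real embeddings have hundreds or thousands of dimensions):
import numpy as np
a = np.array([0.2, 0.1, 0.9])  # toy query vector
b = np.array([0.3, 0.0, 0.8])  # toy document vector
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))  # angle between vectors
euclidean = np.linalg.norm(a - b)  # straight-line distance
dot = np.dot(a, b)  # projection of one vector onto the other
print(cosine, euclidean, dot)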
Meilisearch Vector Support
Meilisearch supports storing and searching vectors:
# Enable experimental vector store
MEILI_EXPERIMENTAL_VECTOR_STORE=true
Add documents with embeddings:
{
  "id": "doc-1",
  "title": "Introduction to Machine Learning",
  "content": "Machine learning is a subset of artificial intelligence...",
  "embedding": [0.123, -0.456, 0.789, 0.012, -0.345, ...]
}
Search semantically:
{
  "q": "What is deep learning?",
  "hybrid": true,
  "semanticRatio": 0.8
}
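From Python, one way to send that request is a plain HTTP call to the search route. This is a minimal sketch assuming a local instance, a master key of your_master_key, and an index named knowledge_base; the hybrid parameter shape simply mirrors the example above and may differ between Meilisearch versions:
import requests
resp = requests.post(
    "http://localhost:7700/indexes/knowledge_base/search",
    headers={"Authorization": "Bearer your_master_key"},
    json={"q": "What is deep learning?", "hybrid": True, "semanticRatio": 0.8},
)
print(resp.json()["hits"])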
Building Embeddings
Effective vector search requires high-quality embeddings.
Choosing an Embedding Model
Select models based on your use case:
| Model | Dimensions | Best For | Speed |
|---|---|---|---|
| sentence-transformers/all-MiniLM-L6-v2 | 384 | General purpose | Fast |
| openai/text-embedding-ada-002 | 1536 | Production | Medium |
| cohere/embed-multilingual-v3.0 | 1024 | Multi-language | Medium |
| BAAI/bge-large-zh-v1.5 | 1024 | Chinese | Slow |
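Whichever model you choose, confirm its output dimension before indexing, since every stored vector must have the same length. A quick check with the open-source MiniLM model from the table:
from sentence_transformers import SentenceTransformer
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
print(model.get_sentence_embedding_dimension())  # 384 for this model
print(model.max_seq_length)  # longer inputs are truncated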
Generating Embeddings with Python
from sentence_transformers import SentenceTransformer
import json
# Load embedding model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
# Sample documents
documents = [
    {"id": "1", "title": "Machine Learning Basics", "content": "Introduction to ML..."},
    {"id": "2", "title": "Deep Learning Guide", "content": "Neural networks explained..."},
    {"id": "3", "title": "Python Programming", "content": "Learn Python..."}
]
# Generate embeddings
for doc in documents:
    text = f"{doc['title']} {doc['content']}"
    embedding = model.encode(text).tolist()
    doc['embedding'] = embedding
# Save for Meilisearch
with open('documents_with_embeddings.json', 'w') as f:
    json.dump(documents, f)
Batch Processing for Large Datasets
For large document collections, process in batches:
from sentence_transformers import SentenceTransformer
from tqdm import tqdm
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
def generate_embeddings_batch(documents, batch_size=32):
    embeddings = []
    texts = [f"{doc['title']} {doc['content']}" for doc in documents]
    for i in tqdm(range(0, len(texts), batch_size)):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings.tolist())
    return embeddings
# Process documents
documents = load_documents()  # Your document loading function
embeddings = generate_embeddings_batch(documents)
# Add embeddings to documents
for doc, emb in zip(documents, embeddings):
    doc['embedding'] = emb
RAG Pipeline Architecture
Retrieval-Augmented Generation combines the power of LLMs with relevant context from your data.
RAG Overview
RAG improves LLM responses by:
- Retrieving relevant documents from a knowledge base
- Including the retrieved context in the prompt
- Generating responses based on the augmented context
This approach:
- Reduces hallucinations
- Provides up-to-date information
- Enables citation of sources
- Keeps data private
Building a RAG Pipeline with Meilisearch
Step 1: Index Your Knowledge Base
from sentence_transformers import SentenceTransformer
from meilisearch import Client
import json
# Initialize clients
embed_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
meili = Client('http://localhost:7700', 'your_master_key')
# Create index for documents
index = meili.index('knowledge_base')
index.update_settings({
    'searchableAttributes': ['title', 'content'],
    'filterableAttributes': ['category', 'source'],
    'sortableAttributes': ['date']
})
# Load and process documents
documents = load_your_documents()  # Load your documents
# Generate embeddings
for doc in documents:
    text = f"{doc['title']} {doc['content']}"
    doc['embedding'] = embed_model.encode(text).tolist()
# Add to Meilisearch
task = index.add_documents(documents)
print(f"Indexed {len(documents)} documents")
Step 2: Retrieve Relevant Context
def retrieve_context(query, top_k=5):
    # Generate query embedding
    query_embedding = embed_model.encode(query).tolist()
    # Search Meilisearch, passing the query vector for semantic retrieval
    results = index.search('', {
        'vector': query_embedding,
        'filter': 'category = "documentation"',
        'limit': top_k
    })
    # Return top documents
    return [hit['content'] for hit in results['hits']]
# Test retrieval
query = "How do I configure Meilisearch?"
context = retrieve_context(query)
print(f"Retrieved {len(context)} relevant passages")
Step 3: Generate with LLM
from openai import OpenAI
client = OpenAI(api_key='your_openai_key')
def generate_response(query, context):
    # Build augmented prompt
    prompt = f"""Based on the following context, answer the question.
Context:
{chr(10).join(context)}
Question: {query}
Answer:"""
    # Generate response
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.7
    )
    return response.choices[0].message.content
# Full RAG pipeline
query = "How do I configure Meilisearch?"
context = retrieve_context(query)
response = generate_response(query, context)
print(response)
Hybrid Search for RAG
Combine keyword and semantic search for better results:
def hybrid_retrieve(query, top_k=5):
    # Generate query embedding
    query_embedding = embed_model.encode(query).tolist()
    # Search with hybrid approach, supplying both the keyword query and its vector
    results = index.search(query, {
        'vector': query_embedding,
        'hybrid': True,
        'semanticRatio': 0.7,  # 70% semantic, 30% keyword
        'limit': top_k,
        'attributesToRetrieve': ['title', 'content', 'source'],
        'attributesToHighlight': ['content']
    })
    return results['hits']
This hybrid approach:
- Uses keyword matching for precision
- Uses semantic search for recall
- Balances both for optimal results
Semantic Caching
Reduce costs and latency by caching similar queries.
Why Cache Semantically?
A semantic cache stores each query alongside its response so that:
- Similar queries can return cached results
- API calls to expensive LLMs are reduced
- Response time improves for repeated questions
Implementation
import hashlib
from meilisearch import Client
from sentence_transformers import SentenceTransformer
class SemanticCache:
    def __init__(self, meili_url, api_key, threshold=0.9):
        self.client = Client(meili_url, api_key)
        self.embed_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.threshold = threshold  # intended similarity cutoff for semantic matching
        cache_index_name = 'semantic_cache'
        # Initialize cache index (ignore the error if it already exists)
        try:
            self.client.create_index(cache_index_name, {'primaryKey': 'id'})
        except Exception:
            pass
        self.cache_index = self.client.index(cache_index_name)
    def _generate_cache_id(self, query):
        return hashlib.md5(query.encode()).hexdigest()
    def get_or_query(self, query, fetch_func):
        # Check cache first (keyword lookup over previously cached queries)
        cache_key = self._generate_cache_id(query)
        try:
            results = self.cache_index.search(query, {
                'limit': 1,
                'attributesToRetrieve': ['response', 'query_hash']
            })
            if results['hits']:
                return results['hits'][0]['response']
        except Exception:
            pass
        # Cache miss - execute function
        response = fetch_func(query)
        # Store query, embedding, and response in the cache index
        query_embedding = self.embed_model.encode(query).tolist()
        self.cache_index.add_documents([{
            'id': cache_key,
            'query': query,
            'query_hash': cache_key,
            'response': response,
            'embedding': query_embedding
        }])
        return response
# Usage
cache = SemanticCache('http://localhost:7700', 'your_key')
def fetch_from_llm(query):
    # Your LLM call here
    return "Expensive LLM response..."
result = cache.get_or_query("What is Meilisearch?", fetch_from_llm)
Multi-Modal Search
Extend Meilisearch for images and other media.
Image Embeddings
Generate embeddings for images:
from transformers import CLIPModel, CLIPProcessor
from PIL import Image
import torch
from meilisearch import Client
# Load CLIP model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
def encode_image(image_path):
    image = Image.open(image_path).convert('RGB')
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)
    return image_features.numpy().flatten().tolist()
# Process product images
products = [
    {"id": "1", "name": "Red T-Shirt", "image_path": "tshirt.jpg"},
    {"id": "2", "name": "Blue Jeans", "image_path": "jeans.jpg"}
]
for product in products:
    product['image_embedding'] = encode_image(product['image_path'])
# Index in Meilisearch
client = Client('http://localhost:7700', 'your_master_key')
index = client.index('products')
index.add_documents(products)
Search Images by Text
def search_images_by_text(query):
    # Encode text query with CLIP
    inputs = processor(text=[query], return_tensors="pt")
    with torch.no_grad():
        text_features = model.get_text_features(**inputs)
    query_embedding = text_features.numpy().flatten().tolist()
    # Search (requires custom implementation or vector-only index)
    results = index.search('', {
        'vector': query_embedding,
        'limit': 10
    })
    return results['hits']
Integration with LangChain
LangChain integrates with Meilisearch through its community vector store, which can be exposed as a standard retriever. The snippets below sketch this setup.
LangChain Retriever
from langchain_community.vectorstores import Meilisearch
from langchain_community.embeddings import HuggingFaceEmbeddings
import meilisearch
# Set up embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)
# Wrap a Meilisearch index as a LangChain vector store
client = meilisearch.Client("http://localhost:7700", "your_api_key")
vector_store = Meilisearch(
    embedding=embeddings,
    client=client,
    index_name="documents"
)
# Expose it as a retriever
retriever = vector_store.as_retriever(search_kwargs={"k": 5})
# Use directly
docs = retriever.invoke("What is machine learning?")
for doc in docs:
    print(doc.page_content)
LangChain RAG Chain
from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)
# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)
# Query
result = qa_chain.invoke({"query": "How does Meilisearch handle typos?"})
print(result['result'])
# Check sources
for doc in result['source_documents']:
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")
Custom Meilisearch Loader
Create custom document loaders:
from langchain_core.document_loaders import BaseLoader
from langchain_core.documents import Document
from meilisearch import Client
class MeilisearchLoader(BaseLoader):
    def __init__(self, host, api_key, index_name, embedding_field='embedding'):
        self.client = Client(host, api_key)
        self.index = self.client.index(index_name)
        self.embedding_field = embedding_field
    def load(self):
        documents = []
        results = self.index.get_all_documents()
        for hit in results['hits']:
            content = hit.get('content', '')
            metadata = {k: v for k, v in hit.items()
                        if k not in [self.embedding_field, 'content']}
            documents.append(Document(
                page_content=content,
                metadata=metadata
            ))
        return documents
# Usage
loader = MeilisearchLoader(
    host="http://localhost:7700",
    api_key="your_key",
    index_name="documents"
)
docs = loader.load()
Production Considerations
Running AI-powered search in production requires careful planning.
Embedding Model Selection
Consider:
- Dimension count - Higher = more precise but more storage
- Model size - Affects latency and cost
- Language support - Choose multilingual if needed
- Latency - Balance quality vs speed
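Dimension count also drives storage. As a rough, illustrative estimate that counts only raw float32 vectors and ignores index overhead:
num_docs = 1_000_000
dims = 1536  # e.g. a 1536-dimensional model
bytes_per_float = 4  # float32
print(num_docs * dims * bytes_per_float / 1024**3, "GB")  # ~5.7 GB of raw vectors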
Cost Optimization
AI features can be expensive:
# Cache aggressively
CACHE_TTL = 3600  # 1 hour
# Use smaller models when possible
MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # Fast, cheap
# Batch requests
def batch_embed(texts):
    return embed_model.encode(texts, batch_size=32)
Scaling Vector Search
For large-scale deployments:
- Shard across instances - Distribute by document ID
- Use approximate nearest neighbors - Trade some accuracy for speed
- Pre-compute embeddings - Generate at indexing time
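To illustrate sharding by document ID, a stable hash can route each document (and each query fan-out) to one of several instances. shard_for and shard_urls below are hypothetical helpers, not Meilisearch features:
import hashlib
def shard_for(doc_id: str, num_shards: int = 4) -> int:
    # Map a document ID to a shard index using a stable hash
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return int(digest, 16) % num_shards
# Hypothetical per-shard Meilisearch URLs
shard_urls = [f"http://meili-{i}:7700" for i in range(4)]
print(shard_urls[shard_for("doc-12345")])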
Monitoring
Track key metrics:
import time
from dataclasses import dataclass
@dataclass
class SearchMetrics:
    query: str
    latency_ms: float
    hits: int
    cache_hit: bool
    def to_dict(self):
        return {
            'query': self.query,
            'latency_ms': self.latency_ms,
            'hits': self.hits,
            'cache_hit': self.cache_hit,
            'timestamp': time.time()
        }
def monitored_search(query, index):
    start = time.time()
    # check_cache, get_cached, and store_cache are your own caching helpers
    cache_hit = check_cache(query)
    if cache_hit:
        results = get_cached(query)
    else:
        results = index.search(query)
        store_cache(query, results)
    latency = (time.time() - start) * 1000
    return SearchMetrics(
        query=query,
        latency_ms=latency,
        hits=len(results['hits']),
        cache_hit=cache_hit
    )
Best Practices
Follow these guidelines for successful AI search implementations.
Data Preparation
- Clean your data - Remove noise and inconsistencies
- Chunk strategically - Optimal chunk size is typically 500-1000 tokens (see the sketch after this list)
- Include context - Add metadata for better retrieval
- Generate quality embeddings - Use appropriate models
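As an example of strategic chunking, here is a minimal word-based splitter with overlap. Word counts are only a rough stand-in for tokens, and chunk_text is an illustrative helper, not part of Meilisearch:
def chunk_text(text: str, max_words: int = 200, overlap: int = 40):
    # Split text into overlapping word-based chunks
    words = text.split()
    chunks = []
    step = max_words - overlap
    for start in range(0, len(words), step):
        chunk = " ".join(words[start:start + max_words])
        if chunk:
            chunks.append(chunk)
    return chunks
# Each chunk becomes its own document with a reference back to the source
chunks = chunk_text("Meilisearch is an open-source search engine ...", max_words=50)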
Query Optimization
- Use hybrid search - Combine keyword and semantic
- Tune semantic ratio - Adjust based on use case
- Implement caching - Reduce redundant computation
- Monitor latency - Track and optimize slow queries
Evaluation
Regularly evaluate your system:
def evaluate_retrieval(queries, ground_truth, retriever):
    results = []
    for query in queries:
        retrieved = retriever.invoke(query)
        retrieved_ids = [doc.metadata['id'] for doc in retrieved]
        # Calculate metrics
        precision = len(set(retrieved_ids) & set(ground_truth[query])) / len(retrieved_ids)
        recall = len(set(retrieved_ids) & set(ground_truth[query])) / len(ground_truth[query])
        results.append({
            'query': query,
            'precision': precision,
            'recall': recall
        })
    return {
        'avg_precision': sum(r['precision'] for r in results) / len(results),
        'avg_recall': sum(r['recall'] for r in results) / len(results)
    }
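A minimal usage sketch, assuming each document's id is stored in its metadata and that ground_truth maps every query to its human-labeled relevant IDs (the IDs here are made up):
queries = ["How do I configure Meilisearch?", "What is hybrid search?"]
ground_truth = {
    "How do I configure Meilisearch?": ["doc-12", "doc-31"],
    "What is hybrid search?": ["doc-7"],
}
metrics = evaluate_retrieval(queries, ground_truth, retriever)
print(metrics)  # {'avg_precision': ..., 'avg_recall': ...}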
External Resources
- Meilisearch Vector Search Documentation
- LangChain Documentation
- Sentence Transformers
- OpenAI Embeddings
- CLIP Model
Conclusion
Meilisearch provides a powerful foundation for AI-powered search applications. Its vector search capabilities, combined with traditional keyword search through hybrid approaches, enable sophisticated retrieval systems that can power RAG pipelines, semantic caching, and multi-modal applications.
Key takeaways:
- Vector search enables semantic understanding beyond keywords
- Hybrid search combines the best of both approaches
- RAG pipelines leverage your data with LLMs
- Proper caching reduces costs and improves latency
- Integration with LangChain simplifies AI application development
As AI continues to transform search, Meilisearch’s vector capabilities position it as an excellent choice for modern intelligent applications. The combination of fast keyword search with semantic understanding provides the best of both worlds for most use cases.
In the next article, we will explore real-world production use cases for Meilisearch across different industries and applications.