
Meilisearch for AI: Vector Search, RAG, and Intelligent Applications

Introduction

Artificial intelligence has transformed how we build search applications. Traditional keyword search, while still valuable, no longer meets users' expectations of semantic understanding and contextual relevance. Meilisearch has evolved to meet these challenges, offering vector search capabilities that power modern AI applications.

This comprehensive guide explores how to leverage Meilisearch for AI-powered applications. We will cover vector search fundamentals, building Retrieval-Augmented Generation (RAG) pipelines, embedding management, and integration with popular AI frameworks like LangChain.

Vector Search Fundamentals

Vector search represents a fundamental shift in how search engines find relevant content.

What are Embeddings?

Embeddings are numerical representations of text, images, or other data that capture semantic meaning. These vectors are generated by machine learning models trained to place similar items close together in a high-dimensional space.

Consider these sentences:

  • “The cat sat on the mat”
  • “A feline rested on the rug”

These sentences have different words but similar meanings. An embedding model would place them close together in vector space, enabling semantic matching.
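
As a minimal sketch of this idea with the sentence-transformers library (the model below is just one common choice):

from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Encode both sentences into vectors
vectors = model.encode([
    "The cat sat on the mat",
    "A feline rested on the rug",
])

# Cosine similarity close to 1.0 means "semantically similar"
similarity = util.cos_sim(vectors[0], vectors[1])
print(float(similarity))  # typically well above 0.5 for this pair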

How Vector Search Works

Vector search finds the nearest vectors to a query vector:

  1. Convert the query into a vector using an embedding model
  2. Compare the query vector against stored document vectors
  3. Return the most similar documents based on distance metrics

Common similarity measures include:

  • Cosine similarity - Measures angle between vectors
  • Euclidean distance - Straight-line distance
  • Dot product - Projection of one vector onto another
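
For intuition, the three measures can be written in a few lines of NumPy. This is a toy sketch on two small vectors, not Meilisearch's internal implementation:

import numpy as np

a = np.array([0.1, 0.3, 0.5])   # query vector (toy values)
b = np.array([0.2, 0.1, 0.6])   # document vector (toy values)

# Cosine similarity: angle between the vectors (1.0 = same direction)
cosine = np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Euclidean distance: straight-line distance (smaller = more similar)
euclidean = np.linalg.norm(a - b)

# Dot product: projection of one vector onto another (larger = more similar)
dot = np.dot(a, b)

print(cosine, euclidean, dot)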

Meilisearch Vector Support

Meilisearch supports storing and searching vectors:

# Enable experimental vector store
MEILI_EXPERIMENTAL_VECTOR_STORE=true
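
Depending on the Meilisearch version, vector search may instead be toggled at runtime through the experimental features endpoint. A hedged sketch using Python's requests library (the flag name and route depend on your server version, so check its documentation):

import requests

MEILI_URL = "http://localhost:7700"   # assumed local instance
MEILI_KEY = "your_master_key"         # assumed master key

response = requests.patch(
    f"{MEILI_URL}/experimental-features",
    headers={"Authorization": f"Bearer {MEILI_KEY}"},
    json={"vectorStore": True},  # flag name has varied across versions
)
print(response.json())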

Add documents with embeddings:

{
  "id": "doc-1",
  "title": "Introduction to Machine Learning",
  "content": "Machine learning is a subset of artificial intelligence...",
  "embedding": [0.123, -0.456, 0.789, 0.012, -0.345, ...]
}

Search semantically:

{
  "q": "What is deep learning?",
  "hybrid": true,
  "semanticRatio": 0.8
}
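
Note that recent Meilisearch releases expect user-provided vectors in a reserved _vectors field tied to an embedder declared in the index settings, rather than in an arbitrary field such as embedding. A hedged sketch with the official Python client, assuming a local instance and master key (exact setting names depend on your server and client versions):

from meilisearch import Client

client = Client('http://localhost:7700', 'your_master_key')
index = client.index('knowledge_base')

# Declare a user-provided embedder; dimensions must match your model
# (384 for all-MiniLM-L6-v2).
index.update_settings({
    'embedders': {
        'default': {'source': 'userProvided', 'dimensions': 384}
    }
})

# Documents then carry their vectors under _vectors, keyed by embedder name.
index.add_documents([{
    'id': 'doc-1',
    'title': 'Introduction to Machine Learning',
    '_vectors': {'default': [0.123, -0.456, 0.789]}  # truncated for readability
}])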

Building Embeddings

Effective vector search requires high-quality embeddings.

Choosing an Embedding Model

Select models based on your use case:

Model                                     Dimensions   Best For          Speed
sentence-transformers/all-MiniLM-L6-v2    384          General purpose   Fast
openai/text-embedding-ada-002             1536         Production        Medium
Cohere embed-multilingual-v3.0            1024         Multi-language    Medium
BAAI/bge-large-zh-v1.5                    1024         Chinese           Slow

Generating Embeddings with Python

from sentence_transformers import SentenceTransformer
import json

# Load embedding model
model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

# Sample documents
documents = [
    {"id": "1", "title": "Machine Learning Basics", "content": "Introduction to ML..."},
    {"id": "2", "title": "Deep Learning Guide", "content": "Neural networks explained..."},
    {"id": "3", "title": "Python Programming", "content": "Learn Python..."}
]

# Generate embeddings
for doc in documents:
    text = f"{doc['title']} {doc['content']}"
    embedding = model.encode(text).tolist()
    doc['embedding'] = embedding

# Save for Meilisearch
with open('documents_with_embeddings.json', 'w') as f:
    json.dump(documents, f)

Batch Processing for Large Datasets

For large document collections, process in batches:

from sentence_transformers import SentenceTransformer
from tqdm import tqdm

model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')

def generate_embeddings_batch(documents, batch_size=32):
    embeddings = []
    texts = [f"{doc['title']} {doc['content']}" for doc in documents]
    
    for i in tqdm(range(0, len(texts), batch_size)):
        batch = texts[i:i + batch_size]
        batch_embeddings = model.encode(batch)
        embeddings.extend(batch_embeddings.tolist())
    
    return embeddings

# Process documents
documents = load_documents()  # Your document loading function
embeddings = generate_embeddings_batch(documents)

# Add embeddings to documents
for doc, emb in zip(documents, embeddings):
    doc['embedding'] = emb

RAG Pipeline Architecture

Retrieval-Augmented Generation combines the power of LLMs with relevant context from your data.

RAG Overview

RAG improves LLM responses by:

  1. Retrieving relevant documents from a knowledge base
  2. Including the retrieved context in the prompt
  3. Generating responses based on the augmented context

This approach:

  • Reduces hallucinations
  • Provides up-to-date information
  • Enables citation of sources
  • Keeps data private

Building a RAG Pipeline with Meilisearch

Step 1: Index Your Knowledge Base

from sentence_transformers import SentenceTransformer
from meilisearch import Client
import json

# Initialize clients
embed_model = SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')
meili = Client('http://localhost:7700', 'your_master_key')

# Create index for documents
index = meili.index('knowledge_base')
index.update_settings({
    'searchableAttributes': ['title', 'content'],
    'filterableAttributes': ['category', 'source'],
    'sortableAttributes': ['date']
})

# Load and process documents
documents = load_your_documents()  # Load your documents

# Generate embeddings
for doc in documents:
    text = f"{doc['title']} {doc['content']}"
    doc['embedding'] = embed_model.encode(text).tolist()

# Add to Meilisearch
task = index.add_documents(documents)
print(f"Indexed {len(documents)} documents")

Step 2: Retrieve Relevant Context

def retrieve_context(query, top_k=5):
    # Generate query embedding
    query_embedding = embed_model.encode(query).tolist()

    # Hybrid search: pass both the keyword query and the query vector.
    # The exact parameter shape depends on your Meilisearch version.
    results = index.search(query, {
        'vector': query_embedding,
        'hybrid': {'embedder': 'default', 'semanticRatio': 0.8},
        'filter': 'category = "documentation"',
        'limit': top_k
    })

    # Return the content of the top documents
    return [hit['content'] for hit in results['hits']]

# Test retrieval
query = "How do I configure Meilisearch?"
context = retrieve_context(query)
print(f"Retrieved {len(context)} relevant passages")

Step 3: Generate with LLM

from openai import OpenAI

client = OpenAI(api_key='your_openai_key')

def generate_response(query, context):
    # Build augmented prompt
    prompt = f"""Based on the following context, answer the question.

Context:
{chr(10).join(context)}

Question: {query}

Answer:"""
    
    # Generate response
    response = client.chat.completions.create(
        model='gpt-4o',
        messages=[{'role': 'user', 'content': prompt}],
        temperature=0.7
    )
    
    return response.choices[0].message.content

# Full RAG pipeline
query = "How do I configure Meilisearch?"
context = retrieve_context(query)
response = generate_response(query, context)
print(response)

Hybrid Search for RAG

Combine keyword and semantic search for better results:

def hybrid_retrieve(query, top_k=5):
    # Generate query embedding
    query_embedding = embed_model.encode(query).tolist()

    # Search with hybrid approach (parameter shape depends on Meilisearch version)
    results = index.search(query, {
        'vector': query_embedding,
        'hybrid': {'embedder': 'default', 'semanticRatio': 0.7},  # 70% semantic, 30% keyword
        'limit': top_k,
        'attributesToRetrieve': ['title', 'content', 'source'],
        'attributesToHighlight': ['content']
    })

    return results['hits']

This hybrid approach:

  • Uses keyword matching for precision
  • Uses semantic search for recall
  • Balances both for optimal results
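
Since RAG benefits from citing its sources, it helps to turn the hybrid hits into a context block that keeps each passage's origin. A small helper sketch (the source field matches the attributes retrieved above; adjust to your schema):

def format_context(hits):
    """Turn Meilisearch hits into a prompt-friendly context block with sources."""
    passages = []
    for hit in hits:
        source = hit.get('source', 'unknown')
        passages.append(f"[{source}] {hit['content']}")
    return "\n\n".join(passages)

# Usage with the hybrid retriever defined above
hits = hybrid_retrieve("How do I configure Meilisearch?")
prompt_context = format_context(hits)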

Semantic Caching

Reduce costs and latency by caching similar queries.

Why Cache Semantically?

With semantic caching:

  • Similar queries return cached results
  • Calls to expensive LLM APIs are reduced
  • Response time improves for repeated questions

Implementation

import hashlib

import numpy as np
from meilisearch import Client
from sentence_transformers import SentenceTransformer

class SemanticCache:
    def __init__(self, meili_url, api_key, threshold=0.9):
        self.client = Client(meili_url, api_key)
        self.embed_model = SentenceTransformer('all-MiniLM-L6-v2')
        self.threshold = threshold

        # Create the cache index if it does not exist yet
        try:
            self.client.create_index('semantic_cache', {'primaryKey': 'id'})
        except Exception:
            pass  # index already exists

        self.cache_index = self.client.index('semantic_cache')

    def _generate_cache_id(self, query):
        return hashlib.md5(query.encode()).hexdigest()

    @staticmethod
    def _cosine(a, b):
        a, b = np.array(a), np.array(b)
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    def get_or_query(self, query, fetch_func):
        cache_key = self._generate_cache_id(query)
        query_embedding = self.embed_model.encode(query).tolist()

        # Look for a similar cached query. Candidate lookup here is keyword-based;
        # the stored embedding is then compared against the new query's embedding
        # so that only sufficiently similar queries reuse the cached response.
        try:
            results = self.cache_index.search(query, {
                'limit': 1,
                'attributesToRetrieve': ['response', 'embedding']
            })
            if results['hits']:
                hit = results['hits'][0]
                if self._cosine(query_embedding, hit['embedding']) >= self.threshold:
                    return hit['response']
        except Exception:
            pass  # treat any cache error as a miss

        # Cache miss - execute the expensive function
        response = fetch_func(query)

        # Store query, embedding, and response for future lookups
        # (indexing is asynchronous, so the entry becomes searchable shortly after)
        self.cache_index.add_documents([{
            'id': cache_key,
            'query': query,
            'query_hash': cache_key,
            'response': response,
            'embedding': query_embedding
        }])

        return response

# Usage
cache = SemanticCache('http://localhost:7700', 'your_key')

def fetch_from_llm(query):
    # Your LLM call here
    return "Expensive LLM response..."

result = cache.get_or_query("What is Meilisearch?", fetch_from_llm)

Multi-Modal Search

Extend Meilisearch to images and other media.

Image Embeddings

Generate embeddings for images:

from transformers import CLIPModel, CLIPProcessor
from PIL import Image
from meilisearch import Client
import torch

# Load CLIP model
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def encode_image(image_path):
    image = Image.open(image_path).convert('RGB')
    inputs = processor(images=image, return_tensors="pt")
    
    with torch.no_grad():
        image_features = model.get_image_features(**inputs)
    
    return image_features.numpy().flatten().tolist()

# Process product images
products = [
    {"id": "1", "name": "Red T-Shirt", "image_path": "tshirt.jpg"},
    {"id": "2", "name": "Blue Jeans", "image_path": "jeans.jpg"}
]

for product in products:
    product['image_embedding'] = encode_image(product['image_path'])

# Index product documents (with their image embeddings) in Meilisearch
client = Client('http://localhost:7700', 'your_master_key')
index = client.index('products')
index.add_documents(products)

Search Images by Text

def search_images_by_text(query):
    # Encode the text query with CLIP (no_grad so the tensor can be converted to NumPy)
    with torch.no_grad():
        text_features = model.get_text_features(**processor(
            text=[query], return_tensors="pt"
        ))
    query_embedding = text_features.numpy().flatten().tolist()
    
    # Search (requires custom implementation or vector-only index)
    results = index.search('', {
        'vector': query_embedding,
        'limit': 10
    })
    
    return results['hits']

Integration with LangChain

LangChain integrates with Meilisearch through its community vector store, which can be used directly or exposed as a retriever.

LangChain Retriever

from meilisearch import Client
from langchain_community.vectorstores import Meilisearch
from langchain_community.embeddings import HuggingFaceEmbeddings

# Set up embeddings
embeddings = HuggingFaceEmbeddings(
    model_name="sentence-transformers/all-MiniLM-L6-v2"
)

# Create the Meilisearch vector store and expose it as a retriever
# (constructor arguments may differ slightly between langchain-community versions)
client = Client(url="http://localhost:7700", api_key="your_api_key")
vector_store = Meilisearch(
    client=client,
    embedding=embeddings,
    index_name="documents"
)
retriever = vector_store.as_retriever(search_kwargs={"k": 5})

# Use directly
docs = retriever.invoke("What is machine learning?")
for doc in docs:
    print(doc.page_content)

LangChain RAG Chain

from langchain_openai import ChatOpenAI
from langchain.chains import RetrievalQA
from langchain.prompts import PromptTemplate

# Initialize LLM
llm = ChatOpenAI(model="gpt-4o", temperature=0)

# Create QA chain
qa_chain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=retriever,
    return_source_documents=True
)

# Query
result = qa_chain.invoke("How does Meilisearch handle typos?")
print(result['result'])

# Check sources
for doc in result['source_documents']:
    print(f"Source: {doc.metadata.get('source', 'Unknown')}")

Custom Meilisearch Loader

Create custom document loaders:

from langchain_community.document_loaders.base import BaseLoader
from langchain_core.documents import Document
from meilisearch import Client

class MeilisearchLoader(BaseLoader):
    def __init__(self, host, api_key, index_name, embedding_field='embedding'):
        self.client = Client(host, api_key)
        self.index = self.client.index(index_name)
        self.embedding_field = embedding_field

    def load(self):
        documents = []
        # Fetch documents from the index; the return shape depends on the
        # meilisearch-python version (an object with a .results list, or a plain list).
        results = self.index.get_documents({'limit': 1000})
        hits = getattr(results, 'results', results)

        for hit in hits:
            data = hit if isinstance(hit, dict) else vars(hit)
            content = data.get('content', '')
            metadata = {k: v for k, v in data.items()
                        if k not in [self.embedding_field, 'content']}

            documents.append(Document(
                page_content=content,
                metadata=metadata
            ))

        return documents

# Usage
loader = MeilisearchLoader(
    host="http://localhost:7700",
    api_key="your_key",
    index_name="documents"
)

docs = loader.load()

Production Considerations

Running AI-powered search in production requires careful planning.

Embedding Model Selection

Consider:

  1. Dimension count - Higher = more precise but more storage
  2. Model size - Affects latency and cost
  3. Language support - Choose multilingual if needed
  4. Latency - Balance quality vs speed
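
As a rough check on the first trade-off above, raw vector storage grows linearly with dimension count (assuming 4-byte float32 values, before index overhead or compression):

def vector_storage_mb(num_documents, dimensions, bytes_per_value=4):
    """Approximate raw storage for document vectors (float32, no index overhead)."""
    return num_documents * dimensions * bytes_per_value / 1024 / 1024

print(vector_storage_mb(1_000_000, 384))    # ~1465 MB for all-MiniLM-L6-v2
print(vector_storage_mb(1_000_000, 1536))   # ~5859 MB for text-embedding-ada-002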

Cost Optimization

AI features can be expensive:

# Cache aggressively
CACHE_TTL = 3600  # 1 hour

# Use smaller models when possible
MODEL = "sentence-transformers/all-MiniLM-L6-v2"  # Fast, cheap

# Batch requests
def batch_embed(texts):
    return embed_model.encode(texts, batch_size=32)

Scaling

For large-scale deployments:

  1. Shard across instances - Distribute by document ID
  2. Use approximate nearest neighbors - Trade some accuracy for speed
  3. Pre-compute embeddings - Generate at indexing time
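
A hedged sketch of the first point, routing documents to one of several Meilisearch instances by hashing the document ID (the shard URLs and key below are placeholders):

import hashlib
from meilisearch import Client

# Placeholder shard endpoints; in practice these come from deployment config.
SHARDS = [
    Client('http://meili-shard-0:7700', 'your_key'),
    Client('http://meili-shard-1:7700', 'your_key'),
    Client('http://meili-shard-2:7700', 'your_key'),
]

def shard_for(doc_id: str) -> Client:
    """Pick a shard deterministically from the document ID."""
    digest = hashlib.md5(doc_id.encode()).hexdigest()
    return SHARDS[int(digest, 16) % len(SHARDS)]

def add_document(doc):
    shard_for(doc['id']).index('documents').add_documents([doc])

At query time, the search fans out to every shard and the results are merged client-side.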

Monitoring

Track key metrics:

import time
from dataclasses import dataclass
from typing import Optional

@dataclass
class SearchMetrics:
    query: str
    latency_ms: float
    hits: int
    cache_hit: bool
    
    def to_dict(self):
        return {
            'query': self.query,
            'latency_ms': self.latency_ms,
            'hits': self.hits,
            'cache_hit': self.cache_hit,
            'timestamp': time.time()
        }

# check_cache, get_cached, and store_cache are application-specific
# caching helpers, not shown here.
def monitored_search(query, index):
    start = time.time()
    cache_hit = check_cache(query)
    
    if cache_hit:
        results = get_cached(query)
    else:
        results = index.search(query)
        store_cache(query, results)
    
    latency = (time.time() - start) * 1000
    
    return SearchMetrics(
        query=query,
        latency_ms=latency,
        hits=len(results['hits']),
        cache_hit=cache_hit
    )

Best Practices

Follow these guidelines for successful AI search implementations.

Data Preparation

  1. Clean your data - Remove noise and inconsistencies
  2. Chunk strategically - Optimal chunk size is typically 500-1000 tokens
  3. Include context - Add metadata for better retrieval
  4. Generate quality embeddings - Use appropriate models
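
A minimal sketch of strategic chunking, splitting on words with overlap. Word counts are a rough proxy for the 500-1000 token guideline; a production pipeline would use the tokenizer of your embedding model:

def chunk_text(text, chunk_size=800, overlap=100):
    """Split text into overlapping word-based chunks (rough proxy for tokens)."""
    words = text.split()
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(words), step):
        chunks.append(" ".join(words[start:start + chunk_size]))
        if start + chunk_size >= len(words):
            break
    return chunks

chunks = chunk_text("your long document text here...", chunk_size=800, overlap=100)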

Query Optimization

  1. Use hybrid search - Combine keyword and semantic
  2. Tune semantic ratio - Adjust based on use case
  3. Implement caching - Reduce redundant computation
  4. Monitor latency - Track and optimize slow queries

Evaluation

Regularly evaluate your system:

def evaluate_retrieval(queries, ground_truth, retriever):
    results = []
    
    for query in queries:
        retrieved = retriever.invoke(query)
        retrieved_ids = [doc.metadata['id'] for doc in retrieved]
        
        # Calculate metrics
        precision = len(set(retrieved_ids) & set(ground_truth[query])) / len(retrieved_ids)
        recall = len(set(retrieved_ids) & set(ground_truth[query])) / len(ground_truth[query])
        
        results.append({
            'query': query,
            'precision': precision,
            'recall': recall
        })
    
    return {
        'avg_precision': sum(r['precision'] for r in results) / len(results),
        'avg_recall': sum(r['recall'] for r in results) / len(results)
    }


Conclusion

Meilisearch provides a powerful foundation for AI-powered search applications. Its vector search capabilities, combined with traditional keyword search through hybrid approaches, enable sophisticated retrieval systems that can power RAG pipelines, semantic caching, and multi-modal applications.

Key takeaways:

  • Vector search enables semantic understanding beyond keywords
  • Hybrid search combines the best of both approaches
  • RAG pipelines leverage your data with LLMs
  • Proper caching reduces costs and improves latency
  • Integration with LangChain simplifies AI application development

As AI continues to transform search, Meilisearch’s vector capabilities position it as an excellent choice for modern intelligent applications. The combination of fast keyword search with semantic understanding provides the best of both worlds for most use cases.

In the next article, we will explore real-world production use cases for Meilisearch across different industries and applications.
