In the rapidly evolving landscape of information retrieval, traditional keyword-based search methods are being augmented by advanced semantic understanding powered by Large Language Models (LLMs). While Semantic Search (also known as Vector Search) has revolutionized how we find information by understanding context and intent, it often falls short in scenarios requiring precise, exact matches. This is where Hybrid Search emerges as the optimal solution, blending the precision of keyword search with the contextual depth of semantic search.
This comprehensive guide will walk you through building a robust hybrid search system using Rust, a systems programming language renowned for its performance, memory safety, and concurrency features. We’ll explore the theoretical foundations, practical implementation, and real-world considerations for creating production-ready search solutions.
Understanding Core Concepts and Terminology
Before diving into implementation, let’s establish a solid foundation by defining the key terms and concepts that form the backbone of hybrid search systems.
1.1 Keyword Search (Lexical Search)
Keyword Search, also known as lexical search, operates on the principle of exact word matching and term frequency analysis. It relies on traditional information retrieval techniques like BM25 (Best Matching 25), which scores documents based on:
- Term Frequency (TF): How often a search term appears in a document
- Inverse Document Frequency (IDF): How rare a term is across the entire document collection
BM25 is an improvement over the older TF-IDF algorithm, incorporating document length normalization to prevent bias toward longer documents.
Advantages: Excellent for exact matches, product codes, technical terms, and structured data queries.
Limitations: Cannot understand synonyms, context, or conceptual relationships.
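To make the scoring intuition concrete, here is a minimal, self-contained sketch of the per-term BM25 formula in Rust. The parameter values k1 = 1.2 and b = 0.75 are the common defaults; the function and variable names are illustrative, not from any library:

```rust
// Simplified single-term BM25 score for one document.
// k1 controls term-frequency saturation; b controls length normalization.
fn bm25_term_score(
    tf: f64,          // term frequency in the document
    doc_len: f64,     // length of the document (in terms)
    avg_doc_len: f64, // average document length in the collection
    n_docs: f64,      // total number of documents
    doc_freq: f64,    // number of documents containing the term
) -> f64 {
    let k1 = 1.2;
    let b = 0.75;
    // Inverse document frequency: rare terms score higher
    let idf = ((n_docs - doc_freq + 0.5) / (doc_freq + 0.5) + 1.0).ln();
    // Saturated, length-normalized term frequency
    let norm = tf * (k1 + 1.0) / (tf + k1 * (1.0 - b + b * doc_len / avg_doc_len));
    idf * norm
}

fn main() {
    // A term appearing 3 times in an average-length document,
    // found in 10 of 1,000 documents in the collection.
    let score = bm25_term_score(3.0, 100.0, 100.0, 1000.0, 10.0);
    println!("BM25 term score: {score:.4}");
}
```

A full BM25 score is the sum of this quantity over every query term, which is exactly what Tantivy computes internally.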
1.2 Semantic Search (Vector Search)
Semantic Search leverages embeddings - dense vector representations of text that capture semantic meaning. These vectors are generated using transformer models like BERT, RoBERTa, or specialized embedding models such as BGE (BAAI General Embedding).
Key Components:
- Embeddings: Numerical vectors (typically 384-1024 dimensions) representing text semantics
- Vector Similarity: Measured using metrics like Cosine Similarity, Euclidean Distance, or Dot Product
- Dense Retrieval: Finding relevant documents by comparing vector similarities
Advantages: Understands intent, handles synonyms, and captures contextual relationships.
Limitations: Computationally expensive, may miss exact matches, and requires significant resources for large-scale deployment.
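As a concrete example of the similarity metrics listed above, cosine similarity is just the dot product of two vectors divided by the product of their magnitudes. A minimal sketch:

```rust
// Cosine similarity between two embedding vectors.
// Returns a value in [-1.0, 1.0]; 1.0 means the vectors point in the
// same direction (maximally similar in the semantic-search sense).
fn cosine_similarity(a: &[f32], b: &[f32]) -> f32 {
    assert_eq!(a.len(), b.len(), "embeddings must have the same dimension");
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a: f32 = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b: f32 = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    if norm_a == 0.0 || norm_b == 0.0 {
        return 0.0; // define similarity with a zero vector as 0
    }
    dot / (norm_a * norm_b)
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [2.0, 4.0, 6.0]; // same direction, different magnitude
    let c = [-1.0, 0.0, 1.0];
    println!("sim(a, b) = {:.3}", cosine_similarity(&a, &b)); // ≈ 1.000
    println!("sim(a, c) = {:.3}", cosine_similarity(&a, &c));
}
```

Because cosine similarity ignores magnitude, it is the default choice for comparing text embeddings; vector databases like Qdrant implement it natively so you rarely write this loop yourself.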
1.3 Hybrid Search
Hybrid Search combines both approaches by:
- Executing searches with both methods (often in parallel)
- Applying a fusion algorithm to merge and rerank the two result lists
The most popular fusion technique is Reciprocal Rank Fusion (RRF).
1.4 Reciprocal Rank Fusion (RRF)
RRF is a simple yet effective method for combining ranked lists from different retrieval systems. Unlike score-based fusion (which requires normalization), RRF uses position-based ranking:
RRF Score(d) = Σ 1 / (k + r_i(d))
Where:
- k is a constant (typically 60) that dampens the contribution of lower-ranked items
- r_i(d) is the rank of document d in the i-th retrieval system
Why RRF? It works well because:
- No need to normalize different scoring scales
- Mathematically sound and empirically validated
- Computationally efficient
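The RRF formula can be demonstrated with a small, self-contained sketch; the document IDs below are made up purely for illustration:

```rust
use std::collections::HashMap;

// Fuse any number of ranked ID lists with Reciprocal Rank Fusion.
// Ranks are 1-based; k (typically 60) dampens lower-ranked contributions.
fn rrf_fuse(lists: &[Vec<&str>], k: f64) -> Vec<(String, f64)> {
    let mut scores: HashMap<String, f64> = HashMap::new();
    for list in lists {
        for (i, id) in list.iter().enumerate() {
            *scores.entry(id.to_string()).or_insert(0.0) += 1.0 / (k + (i + 1) as f64);
        }
    }
    let mut fused: Vec<_> = scores.into_iter().collect();
    fused.sort_by(|a, b| b.1.partial_cmp(&a.1).unwrap());
    fused
}

fn main() {
    // "doc2" is ranked 2nd by keyword search and 1st by semantic search,
    // so it overtakes "doc1" (ranked 1st and 3rd) in the fused ranking.
    let keyword = vec!["doc1", "doc2", "doc3"];
    let semantic = vec!["doc2", "doc4", "doc1"];
    for (id, score) in rrf_fuse(&[keyword, semantic], 60.0) {
        println!("{id}: {score:.5}");
    }
}
```

Notice that no raw retrieval scores appear anywhere: only positions matter, which is why RRF sidesteps the normalization problem entirely. Section 4.5 applies the same logic inside the hybrid engine.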
1.5 Large Language Models (LLMs) and Retrieval-Augmented Generation (RAG)
LLMs like GPT-4 or Claude are powerful language models trained on vast amounts of text data. RAG combines retrieval systems with LLMs to provide accurate, up-to-date responses by grounding model outputs in retrieved documents.
Hybrid search is particularly crucial for RAG applications, ensuring both precision and comprehensiveness.
The Architecture of Hybrid Search Systems
A well-designed hybrid search architecture consists of several interconnected components working in harmony:
┌──────────────────┐      ┌──────────────────┐
│    User Query    │─────▶│   Query Router   │
└──────────────────┘      └────────┬─────────┘
                                   │
                     ┌─────────────┴─────────────┐
                     │                           │
            ┌────────▼────────┐         ┌────────▼────────┐
            │ Keyword Search  │         │ Semantic Search │
            │     Engine      │         │     Engine      │
            │    (Tantivy)    │         │   (Vector DB)   │
            └────────┬────────┘         └────────┬────────┘
                     │                           │
                     └─────────────┬─────────────┘
                                   │
                         ┌─────────▼─────────┐
                         │   Result Fusion   │
                         │       (RRF)       │
                         └─────────┬─────────┘
                                   │
                         ┌─────────▼─────────┐
                         │     Reranked      │
                         │      Results      │
                         └───────────────────┘
Component Breakdown:
- Query Router: Determines whether to route queries to keyword, semantic, or hybrid search based on query characteristics
- Keyword Search Engine: Handles exact matches and structured queries
- Semantic Search Engine: Processes natural language queries and conceptual searches
- Result Fusion Module: Applies RRF or other fusion algorithms to combine results
- Reranking Layer: May include additional ML models for fine-tuning result order
The Rust Ecosystem for Search Implementation
Rust’s rich ecosystem provides excellent tools for building high-performance search systems:
3.1 Tantivy: The Full-Text Search Engine
Tantivy is Rust’s answer to Apache Lucene, offering:
- High Performance: Written in Rust with zero-copy deserialization
- BM25 Scoring: Industry-standard relevance scoring
- Custom Tokenizers: Support for various languages and text processing needs
- Real-time Indexing: Add documents and search immediately
3.2 Vector Databases
Qdrant: Qdrant is a vector database with excellent Rust bindings, featuring:
- High-dimensional vectors: Support for embeddings up to 65,536 dimensions
- Advanced filtering: Combine vector similarity with metadata filters
- Distributed deployment: Horizontal scaling for large datasets
LanceDB: LanceDB offers:
- Embedded mode: Run directly in your application without external services
- SQL interface: Query vectors using familiar SQL syntax
- Rust-native: Written in Rust for seamless integration
3.3 Embedding Libraries
Fastembed: Fastembed provides:
- Local inference: Generate embeddings without API calls
- Multiple models: Support for various embedding models
- ONNX runtime: Efficient CPU inference
- Quantization: Reduced memory footprint
Detailed Implementation Guide
Let’s build a complete hybrid search system step by step.
4.1 Project Setup and Dependencies
First, create a new Rust project and add the necessary dependencies:
// Cargo.toml
[package]
name = "hybrid-search-engine"
version = "0.1.0"
edition = "2021"
[dependencies]
tantivy = "0.21"
fastembed = "2.1"
qdrant-client = "1.7"
tokio = { version = "1.0", features = ["full"] }
serde = { version = "1.0", features = ["derive"] }
serde_json = "1.0"
anyhow = "1.0"
4.2 Setting Up the Document Schema
Define a comprehensive document structure that supports both keyword and semantic search:
use serde::{Deserialize, Serialize};
use tantivy::schema::*;
#[derive(Debug, Clone, Serialize, Deserialize)]
pub struct Document {
pub id: String,
pub title: String,
pub content: String,
pub category: String,
pub tags: Vec<String>,
pub created_at: chrono::DateTime<chrono::Utc>,
}
pub fn create_schema() -> Schema {
let mut schema_builder = Schema::builder();
// Stored fields for retrieval
schema_builder.add_text_field("id", STRING | STORED);
schema_builder.add_text_field("title", TEXT | STORED);
schema_builder.add_text_field("content", TEXT | STORED); // stored so search results can return the body
schema_builder.add_text_field("category", STRING | STORED | FAST);
schema_builder.add_text_field("tags", STRING | STORED);
// Additional unstored fields indexed for full-text search
schema_builder.add_text_field("title_index", TEXT);
schema_builder.add_text_field("content_index", TEXT);
schema_builder.build()
}
4.3 Implementing Keyword Search with Tantivy
Create a comprehensive keyword search module:
use tantivy::{Index, IndexReader};
use tantivy::query::QueryParser;
use tantivy::collector::TopDocs;
use std::path::Path;
pub struct KeywordSearchEngine {
index: Index,
reader: IndexReader,
query_parser: QueryParser,
}
impl KeywordSearchEngine {
pub fn new(index_path: &Path) -> anyhow::Result<Self> {
let schema = create_schema();
let index = if index_path.exists() {
Index::open_in_dir(index_path)?
} else {
std::fs::create_dir_all(index_path)?;
Index::create_in_dir(index_path, schema.clone())?
};
let reader = index.reader()?;
let title_field = schema.get_field("title_index").unwrap();
let content_field = schema.get_field("content_index").unwrap();
let mut query_parser = QueryParser::for_index(&index, vec![title_field, content_field]);
query_parser.set_conjunction_by_default();
Ok(Self {
index,
reader,
query_parser,
})
}
pub fn add_document(&self, doc: &Document) -> anyhow::Result<()> {
// Note: creating a writer and committing per document is simple but slow;
// in production, reuse one long-lived writer and batch commits.
let mut index_writer = self.index.writer(50_000_000)?;
let mut tantivy_doc = tantivy::Document::default();
tantivy_doc.add_text(self.index.schema().get_field("id").unwrap(), &doc.id);
tantivy_doc.add_text(self.index.schema().get_field("title").unwrap(), &doc.title);
tantivy_doc.add_text(self.index.schema().get_field("content").unwrap(), &doc.content);
tantivy_doc.add_text(self.index.schema().get_field("category").unwrap(), &doc.category);
for tag in &doc.tags {
tantivy_doc.add_text(self.index.schema().get_field("tags").unwrap(), tag);
}
// Add to full-text index
tantivy_doc.add_text(self.index.schema().get_field("title_index").unwrap(), &doc.title);
tantivy_doc.add_text(self.index.schema().get_field("content_index").unwrap(), &doc.content);
index_writer.add_document(tantivy_doc)?;
index_writer.commit()?;
Ok(())
}
pub fn search(&self, query: &str, limit: usize) -> anyhow::Result<Vec<(f32, Document)>> {
let searcher = self.reader.searcher();
let parsed_query = self.query_parser.parse_query(query)?;
let top_docs = searcher.search(&parsed_query, &TopDocs::with_limit(limit))?;
let mut results = Vec::new();
for (score, doc_address) in top_docs {
let retrieved_doc = searcher.doc(doc_address)?;
let id = retrieved_doc.get_first(self.index.schema().get_field("id").unwrap())
.and_then(|f| f.as_text()).unwrap_or("").to_string();
let title = retrieved_doc.get_first(self.index.schema().get_field("title").unwrap())
.and_then(|f| f.as_text()).unwrap_or("").to_string();
let content = retrieved_doc.get_first(self.index.schema().get_field("content").unwrap())
.and_then(|f| f.as_text()).unwrap_or("").to_string();
let category = retrieved_doc.get_first(self.index.schema().get_field("category").unwrap())
.and_then(|f| f.as_text()).unwrap_or("").to_string();
let tags = retrieved_doc.get_all(self.index.schema().get_field("tags").unwrap())
.filter_map(|f| f.as_text().map(|s| s.to_string()))
.collect();
results.push((score, Document {
id,
title,
content,
category,
tags,
created_at: chrono::Utc::now(), // In real implementation, store and retrieve this
}));
}
Ok(results)
}
}
4.4 Implementing Semantic Search with Qdrant
Set up the vector search component:
use qdrant_client::prelude::*;
use fastembed::{TextEmbedding, InitOptions, EmbeddingModel};
pub struct SemanticSearchEngine {
client: QdrantClient,
collection_name: String,
embedding_model: TextEmbedding,
}
impl SemanticSearchEngine {
pub async fn new(qdrant_url: &str, collection_name: &str) -> anyhow::Result<Self> {
let client = QdrantClient::from_url(qdrant_url).build()?;
let embedding_model = TextEmbedding::try_new(InitOptions {
model_name: EmbeddingModel::BGESmallENV15,
show_download_progress: true,
..Default::default()
})?;
// Create collection if it doesn't exist
if !client.collection_exists(collection_name).await? {
client.create_collection(&CreateCollection {
collection_name: collection_name.to_string(),
vectors_config: Some(VectorsConfig {
config: Some(Config::Params(VectorParams {
size: 384, // BGE-Small-EN-V1.5 dimension
distance: Distance::Cosine.into(),
..Default::default()
})),
}),
..Default::default()
}).await?;
}
Ok(Self {
client,
collection_name: collection_name.to_string(),
embedding_model,
})
}
pub async fn add_document(&self, doc: &Document) -> anyhow::Result<()> {
let embedding = self.embedding_model.embed(vec![&format!("{} {}", doc.title, doc.content)], None)?
.into_iter().next().unwrap();
// Qdrant point IDs must be UUIDs or unsigned integers, so doc.id
// should hold a UUID string here.
let point = PointStruct::new(
doc.id.clone(),
embedding,
serde_json::json!({
"title": doc.title,
"category": doc.category,
"tags": doc.tags,
"created_at": doc.created_at.to_rfc3339()
}).try_into()?,
);
self.client.upsert_points(&self.collection_name, vec![point], None).await?;
Ok(())
}
pub async fn search(&self, query: &str, limit: usize) -> anyhow::Result<Vec<(f32, Document)>> {
let query_embedding = self.embedding_model.embed(vec![query], None)?
.into_iter().next().unwrap();
let search_result = self.client.search_points(&SearchPoints {
collection_name: self.collection_name.clone(),
vector: query_embedding,
limit: limit as u64,
with_payload: Some(true.into()),
..Default::default()
}).await?;
let mut results = Vec::new();
for point in search_result.result {
let payload = point.payload;
let title = payload.get("title").and_then(|v| v.as_str()).unwrap_or("").to_string();
let category = payload.get("category").and_then(|v| v.as_str()).unwrap_or("").to_string();
let tags: Vec<String> = payload.get("tags")
.and_then(|v| v.as_array())
.map(|arr| arr.iter().filter_map(|v| v.as_str().map(|s| s.to_string())).collect())
.unwrap_or_default();
results.push((point.score, Document {
// A Qdrant PointId is either a UUID string or an integer; recover a string form.
id: point.id.and_then(|id| id.point_id_options).map(|opt| match opt {
qdrant_client::qdrant::point_id::PointIdOptions::Uuid(s) => s,
qdrant_client::qdrant::point_id::PointIdOptions::Num(n) => n.to_string(),
}).unwrap_or_default(),
title,
content: String::new(), // Payload doesn't include full content for brevity
category,
tags,
created_at: chrono::Utc::now(),
}));
}
Ok(results)
}
}
4.5 Implementing Result Fusion with RRF
Create the fusion logic:
pub struct HybridSearchEngine {
keyword_engine: KeywordSearchEngine,
semantic_engine: SemanticSearchEngine,
}
impl HybridSearchEngine {
pub fn new(keyword_engine: KeywordSearchEngine, semantic_engine: SemanticSearchEngine) -> Self {
Self {
keyword_engine,
semantic_engine,
}
}
pub async fn hybrid_search(&self, query: &str, limit: usize) -> anyhow::Result<Vec<(f32, Document)>> {
// The Tantivy search is synchronous, so run it first, then await the
// semantic search. (Running both truly in parallel would require moving
// the blocking search onto another thread, e.g. via tokio::task::spawn_blocking.)
let keyword_results = self.keyword_engine.search(query, limit * 2)?;
let semantic_results = self.semantic_engine.search(query, limit * 2).await?;
// Create a map of document IDs to their ranks
let mut keyword_ranks = std::collections::HashMap::new();
for (rank, (_, doc)) in keyword_results.iter().enumerate() {
keyword_ranks.insert(doc.id.clone(), rank);
}
let mut semantic_ranks = std::collections::HashMap::new();
for (rank, (_, doc)) in semantic_results.iter().enumerate() {
semantic_ranks.insert(doc.id.clone(), rank);
}
// Calculate RRF scores
let mut rrf_scores = std::collections::HashMap::new();
let k = 60.0;
// Collect all unique document IDs
let mut all_docs = std::collections::HashMap::new();
for (_, doc) in &keyword_results {
all_docs.insert(doc.id.clone(), doc.clone());
}
for (_, doc) in &semantic_results {
all_docs.insert(doc.id.clone(), doc.clone());
}
for (doc_id, doc) in &all_docs {
let keyword_rank = keyword_ranks.get(doc_id).copied();
let semantic_rank = semantic_ranks.get(doc_id).copied();
let mut score = 0.0;
if let Some(rank) = keyword_rank {
score += 1.0 / (k + rank as f32);
}
if let Some(rank) = semantic_rank {
score += 1.0 / (k + rank as f32);
}
rrf_scores.insert(doc_id.clone(), score);
}
// Sort by RRF score and return top results
let mut sorted_results: Vec<_> = rrf_scores.into_iter()
.filter_map(|(id, score)| all_docs.get(&id).map(|doc| (score, doc.clone())))
.collect();
sorted_results.sort_by(|a, b| b.0.partial_cmp(&a.0).unwrap());
sorted_results.truncate(limit);
Ok(sorted_results)
}
}
Common Pitfalls and Best Practices
5.1 Pitfalls to Avoid
- Score Normalization Issues: Never directly add BM25 and cosine similarity scores - they operate on different scales.
- Query Routing Problems: Avoid hard-coded thresholds for determining when to use keyword vs. semantic search. Use query analysis instead.
- Embedding Model Mismatch: Ensure the same embedding model is used for indexing and querying.
- Memory Inefficiency: Vector databases can consume significant memory. Use quantization and proper indexing strategies.
- Cold Start Problem: New documents may not appear in semantic search until embeddings are generated.
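To illustrate the score-normalization pitfall: if you do want score-based fusion rather than rank-based RRF, scores from each system must first be mapped onto a common scale. A minimal min-max normalization sketch (not tied to any particular library) shows one common way to do this:

```rust
// Min-max normalization maps a list of scores into [0, 1] so that scores
// from different systems (unbounded BM25 vs. bounded cosine similarity)
// can be combined on a common scale.
fn min_max_normalize(scores: &[f32]) -> Vec<f32> {
    let min = scores.iter().cloned().fold(f32::INFINITY, f32::min);
    let max = scores.iter().cloned().fold(f32::NEG_INFINITY, f32::max);
    if (max - min).abs() < f32::EPSILON {
        return vec![1.0; scores.len()]; // all scores equal
    }
    scores.iter().map(|s| (s - min) / (max - min)).collect()
}

fn main() {
    let bm25_scores = [12.4, 8.1, 3.0];     // unbounded BM25 scale
    let cosine_scores = [0.92, 0.88, 0.75]; // [-1, 1] cosine scale
    // After normalization both lists live in [0, 1] and can be
    // combined with a weighted sum.
    println!("{:?}", min_max_normalize(&bm25_scores));
    println!("{:?}", min_max_normalize(&cosine_scores));
}
```

Note that min-max normalization is sensitive to outliers in the result set, which is one reason RRF's rank-based approach is often preferred in practice.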
5.2 Best Practices
- Query Analysis: Implement query classification to route queries appropriately:
  - Short queries (< 3 words): Favor keyword search
  - Natural language queries: Use hybrid search
  - Exact phrases: Prioritize keyword search
- Caching Strategy: Cache frequently searched embeddings and results.
- Incremental Updates: Implement efficient mechanisms for adding new documents without full reindexing.
- Monitoring and Metrics: Track query latency, result quality, and system performance.
- A/B Testing: Regularly test different fusion algorithms and parameters.
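The query-analysis guidelines above can be sketched as a simple heuristic classifier. The thresholds and rules here are illustrative starting points only (per the pitfalls section, hard-coded thresholds should eventually give way to learned query analysis):

```rust
#[derive(Debug, PartialEq)]
enum SearchStrategy {
    Keyword,
    Hybrid,
}

// A minimal, heuristic query router following the guidelines above.
// A production system would use richer signals (query logs, a trained
// classifier) rather than these hard-coded rules.
fn classify_query(query: &str) -> SearchStrategy {
    let trimmed = query.trim();
    // Exact quoted phrases: prioritize keyword search
    if trimmed.starts_with('"') && trimmed.ends_with('"') {
        return SearchStrategy::Keyword;
    }
    // Short queries (< 3 words): favor keyword search
    if trimmed.split_whitespace().count() < 3 {
        return SearchStrategy::Keyword;
    }
    // Longer natural-language queries: use hybrid search
    SearchStrategy::Hybrid
}

fn main() {
    println!("{:?}", classify_query("SKU-4821"));
    println!("{:?}", classify_query("\"error code 137\""));
    println!("{:?}", classify_query("how do I paginate search results"));
}
```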
Pros and Cons: Hybrid Search vs. Alternatives
6.1 Advantages of Hybrid Search
- Superior Relevance: Combines precision and recall for better user experience
- Robustness: Works well across diverse query types
- Scalability: Can leverage existing infrastructure investments
- Cost-Effective: Better performance than pure semantic search for many use cases
6.2 Disadvantages
- Complexity: Requires managing multiple search systems
- Latency: Parallel execution adds some overhead
- Tuning Required: RRF parameters need optimization for specific domains
- Resource Intensive: Requires both text indexing and vector storage
6.3 Comparison with Alternatives
Pure Keyword Search:
- Pros: Simple, fast, exact matches
- Cons: Poor semantic understanding
- Best for: Structured data, exact term matching
Pure Semantic Search:
- Pros: Excellent for natural language, handles synonyms
- Cons: Expensive, may miss exact matches
- Best for: Conversational interfaces, exploratory search
Hybrid Search:
- Pros: Balanced approach, best of both worlds
- Cons: Most complex to implement
- Best for: Production applications requiring high accuracy
Deployment Architecture
For production deployment, consider this architecture:
┌──────────────────┐      ┌──────────────────┐
│  Load Balancer   │      │   API Gateway    │
│   (Nginx/HA)     │      │  (Kong/Traefik)  │
└────────┬─────────┘      └────────┬─────────┘
         │                         │
         └────────────┬────────────┘
                      │
            ┌─────────▼─────────┐
            │   Hybrid Search   │
            │      Service      │
            │      (Rust)       │
            └─────────┬─────────┘
                      │
        ┌─────────────┼─────────────┐
        │             │             │
┌───────▼─────────┐   │   ┌─────────▼───────┐
│  Tantivy Index  │   │   │    Vector DB    │
│   (Local/SSD)   │   │   │  (Qdrant/Lance) │
└─────────────────┘   │   └─────────────────┘
                      │
            ┌─────────▼─────────┐
            │     Embedding     │
            │      Service      │
            │    (Fastembed)    │
            └───────────────────┘
Key Considerations:
- Horizontal Scaling: Use Kubernetes for container orchestration
- Data Persistence: Implement backup strategies for indexes and vectors
- Monitoring: Integrate with Prometheus and Grafana for observability
- Caching: Use Redis for query result caching
Further Resources and Alternative Technologies
7.1 Learning Resources
- Books:
  - “Information Retrieval: Implementing and Evaluating Search Engines” by Stefan Büttcher et al.
  - “Deep Learning for Search” by Tommaso Teofili
- Research Papers:
  - “Reciprocal Rank Fusion outperforms Condorcet and individual Rank Learning Methods” (Cormack et al., 2009)
  - “Dense Passage Retrieval for Open-Domain Question Answering” (Karpukhin et al., 2020)
7.2 Alternative Technologies
Search Engines:
- Elasticsearch: Mature, feature-rich, but resource-intensive
- Meilisearch: Fast, easy to use, good for small to medium projects
- Typesense: Open-source, typo-tolerant, real-time search
Vector Databases:
- Pinecone: Managed vector database with excellent performance
- Weaviate: Open-source with GraphQL interface
- Milvus: High-performance, distributed vector database
Embedding Models:
- OpenAI Embeddings: High quality, but API-dependent
- Sentence Transformers: Python library with many pre-trained models
- Cohere Embed: Commercial embedding service
Hybrid Search Frameworks:
- LangChain: Python framework with hybrid search capabilities
- LlamaIndex: Data framework with hybrid retrieval
- Vespa: Open-source big data serving engine with hybrid search
7.3 Community and Tools
- Rust Search Ecosystem: Follow Tantivy and Fastembed
- Vector Search Community: Join discussions on VectorDB and Pinecone blogs
- Research Updates: Follow papers on arXiv with keywords “hybrid search” and “dense retrieval”
Conclusion
Hybrid search represents the current state-of-the-art in information retrieval, offering a compelling balance between precision and semantic understanding. By leveraging Rust’s performance characteristics and its mature search ecosystem, developers can build systems that scale efficiently while maintaining high relevance.
The implementation we’ve explored provides a solid foundation that can be extended with advanced features like query expansion, result diversification, and machine learning-based reranking. As the field continues to evolve, staying updated with the latest research and tools will be crucial for maintaining competitive search experiences.
Remember, the key to successful hybrid search implementation lies not just in the technology stack, but in understanding your users’ needs and continuously optimizing based on real-world usage patterns. With Rust’s reliability and performance, you’re well-equipped to build search systems that can handle the demands of modern applications.