โšก Calmops

Vector Databases: Pinecone vs Milvus vs Qdrant - Complete Comparison

Introduction

Vector databases have become essential infrastructure for AI applications, powering semantic search, retrieval-augmented generation (RAG), recommendation systems, and similarity matching. As organizations build AI-powered products, choosing the right vector database impacts performance, scalability, and development velocity.

This guide compares three leading vector databases: Pinecone, Milvus, and Qdrant. Each offers distinct approaches to vector search with different trade-offs around deployment options, scalability, and features.

Before diving into the databases, let’s understand the core concepts:

  • Embeddings: Dense numerical representations of data (text, images, audio) generated by ML models
  • Vector Search: Finding similar items based on embedding similarity using metrics like cosine similarity, Euclidean distance, or dot product
  • Approximate Nearest Neighbor (ANN): Algorithms that trade some accuracy for speed in large-scale searches
# Example: generating embeddings to store in a vector database
from sentence_transformers import SentenceTransformer

# Generate embeddings
model = SentenceTransformer('all-MiniLM-L6-v2')
texts = [
    "Machine learning is transforming software development",
    "Deep learning enables breakthrough AI capabilities",
    "Vector databases power semantic search applications"
]

embeddings = model.encode(texts)
print(f"Embedding dimension: {embeddings[0].shape}")  # (384,)
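The similarity metrics listed above are simple to compute directly; a minimal NumPy sketch, using toy 3-dimensional vectors for readability (real embeddings would be 384-dimensional):

```python
import numpy as np

def cosine_similarity(a, b):
    # Cosine similarity: dot product divided by the product of the norms
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Toy vectors for illustration
a = np.array([1.0, 0.0, 1.0])
b = np.array([1.0, 1.0, 0.0])

print(f"cosine similarity: {cosine_similarity(a, b):.4f}")  # 0.5000
print(f"euclidean distance: {np.linalg.norm(a - b):.4f}")   # 1.4142
print(f"dot product: {np.dot(a, b):.4f}")                   # 1.0000
```

Vector databases apply exactly these formulas, just accelerated with ANN indexes over millions of vectors.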

What is Pinecone?

Pinecone is a managed vector database designed for simplicity and scalability. It offers a cloud-native, serverless architecture that handles infrastructure automatically.

Key Features

  • Fully Managed: No server provisioning or infrastructure management
  • Serverless Option: Pay only for storage and compute used
  • Metadata Filtering: Filter by metadata before or after vector search
  • Real-time Index Updates: Add, update, and delete vectors without rebuilding
  • Hybrid Search: Combine dense (embedding) and sparse (keyword) search

Pinecone Best Practices

from pinecone import Pinecone, ServerlessSpec
import os

# Initialize Pinecone
pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index with serverless spec
pc.create_index(
    name="semantic-search",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(
        cloud="aws",
        region="us-west-2"
    )
)

# Connect to index
index = pc.Index("semantic-search")

# Upsert vectors with metadata
vectors = [
    {
        "id": "doc-001",
        "values": [0.1, 0.2, 0.3, ...],  # 384-dim embedding
        "metadata": {
            "title": "ML Guide",
            "category": "technology",
            "published_date": "2025-01-15",
            "author": "Jane Smith"
        }
    },
    {
        "id": "doc-002",
        "values": [0.4, 0.5, 0.6, ...],
        "metadata": {
            "title": "AI Fundamentals",
            "category": "technology",
            "published_date": "2025-02-20",
            "author": "John Doe"
        }
    }
]

index.upsert(vectors=vectors, namespace="documents")

# Query with metadata filtering
query_results = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    include_metadata=True,
    filter={
        "category": {"$eq": "technology"},
        "published_date": {"$gte": "2025-01-01"}
    }
)

for result in query_results['matches']:
    print(f"Score: {result['score']:.4f}")
    print(f"Title: {result['metadata']['title']}")
# Pinecone hybrid search (dense + sparse)
index = pc.Index("hybrid-search")

# Prepare sparse vector (BM25-style term indices and weights)
sparse_vector = {
    "indices": [100, 200, 300],
    "values": [0.5, 0.3, 0.2]
}

# Dense embedding
dense_vector = [0.1, 0.2, 0.3, ...]

# Pinecone's query API has no alpha parameter; the dense/sparse balance
# is applied client-side by scaling both vectors before querying
alpha = 0.5  # 1.0 = dense only, 0.0 = sparse only
scaled_dense = [v * alpha for v in dense_vector]
scaled_sparse = {
    "indices": sparse_vector["indices"],
    "values": [v * (1 - alpha) for v in sparse_vector["values"]]
}

results = index.query(
    vector=scaled_dense,
    sparse_vector=scaled_sparse,
    top_k=10
)

What is Milvus?

Milvus is an open-source vector database originally developed by Zilliz and now a top-level project at LF AI & Data Foundation. It offers both standalone and distributed deployment options.

Key Features

  • Open Source: Apache 2.0 licensed, self-hostable
  • Distributed Architecture: Scale horizontally for massive datasets
  • Multiple Index Types: IVF variants, HNSW, DiskANN, and more
  • Rich Data Types: Support for binary, sparse, and dense vectors
  • Time Travel: Query historical data states (Milvus 2.0-2.2; removed in 2.3)
  • Multi-tenancy: Built-in support for multiple users/tenants

Milvus Best Practices

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType, utility

# Connect to Milvus
connections.connect(host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=384),
    FieldSchema(name="text", dtype=DataType.VARCHAR, max_length=10000),
    FieldSchema(name="category", dtype=DataType.VARCHAR, max_length=100),
    FieldSchema(name="timestamp", dtype=DataType.INT64)
]

schema = CollectionSchema(fields=fields, description="Document collection")

# Create collection
collection = Collection(name="documents", schema=schema)

# Create index for efficient search
index_params = {
    "index_type": "HNSW",
    "metric_type": "L2",
    "params": {
        "M": 16,
        "efConstruction": 256
    }
}

collection.create_index(field_name="embedding", index_params=index_params)

# Load collection into memory
collection.load()

# Insert data: one column per non-auto-id schema field, equal lengths
data = [
    [[0.1, 0.2, 0.3, ...], [0.4, 0.5, 0.6, ...]],   # embeddings (384-dim each)
    ["Machine learning guide", "AI fundamentals"],  # text
    ["technology", "technology"],                   # category
    [1704067200, 1704153600]                        # timestamps
]

insert_result = collection.insert(data)
print(f"Inserted {insert_result.insert_count} vectors")

# Search
search_params = {"metric_type": "L2", "params": {"ef": 64}}

query_embedding = [[0.1, 0.2, 0.3, ...]]

results = collection.search(
    data=query_embedding,
    anns_field="embedding",
    param=search_params,
    limit=10,
    expr='category == "technology"',
    output_fields=["text", "category", "timestamp"]
)

for hit in results[0]:
    print(f"ID: {hit.id}, Distance: {hit.distance}")
    print(f"Text: {hit.entity.get('text')}")
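The multi-tenancy listed among Milvus's key features is commonly implemented with partitions. A minimal sketch, reusing the collection, data, and search parameters from the example above (tenant names are hypothetical):

```python
# One partition per tenant (hypothetical tenant names)
for tenant in ["tenant_a", "tenant_b"]:
    if not collection.has_partition(tenant):
        collection.create_partition(tenant)

# Insert into a specific tenant's partition
collection.insert(data, partition_name="tenant_a")

# Searches are then scoped to that tenant's data only
tenant_results = collection.search(
    data=query_embedding,
    anns_field="embedding",
    param=search_params,
    limit=10,
    partition_names=["tenant_a"]
)
```

Partition-scoped searches also skip irrelevant segments entirely, so per-tenant queries stay fast as the total collection grows.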

Milvus Time Travel

Note: Time Travel was removed in Milvus 2.3; the example below applies to Milvus 2.0-2.2.

# Query the data state at a specific timestamp (time travel)
results = collection.search(
    data=query_embedding,
    anns_field="embedding",
    param=search_params,
    limit=10,
    output_fields=["text"],
    travel_timestamp=1704153600  # Historical state at this time
)

What is Qdrant?

Qdrant is an open-source vector search engine written in Rust. It emphasizes performance, developer experience, and ease of use.

Key Features

  • High Performance: Written in Rust for memory safety and speed
  • Flexible Deployment: Docker, Kubernetes, or cloud
  • Payload Storage: Rich metadata alongside vectors
  • Filtering: Powerful filtering with payload conditions
  • RESTful API: Easy integration with any language
  • Quantization: Support for binary and product quantization

Qdrant Best Practices

from qdrant_client import QdrantClient, models

# Initialize client
client = QdrantClient(host="localhost", port=6333)

# Create collection with HNSW index and binary quantization
client.create_collection(
    collection_name="documents",
    vectors_config=models.VectorParams(
        size=384,
        distance=models.Distance.COSINE,
        hnsw_config=models.HnswConfigDiff(
            m=16,
            ef_construct=256,
            full_scan_threshold=10000
        ),
        quantization_config=models.BinaryQuantization(
            binary=models.BinaryQuantizationConfig(always_ram=True)
        )
    )
)

# Payload indexes are created separately, per field
client.create_payload_index(
    collection_name="documents",
    field_name="category",
    field_schema=models.PayloadSchemaType.KEYWORD
)
client.create_payload_index(
    collection_name="documents",
    field_name="rating",
    field_schema=models.PayloadSchemaType.INTEGER
)
# (title and published_date follow the same pattern with TEXT/DATETIME schemas)

# Insert points with payloads
points = []
for i, (embedding, text, category) in enumerate(zip(
    embeddings,
    ["ML Guide", "AI Fundamentals", "Vector DBs"],
    ["technology", "technology", "data"]
)):
    points.append(
        models.PointStruct(
            id=i,
            vector=embedding.tolist(),
            payload={
                "title": text,
                "category": category,
                "rating": 5,
                "published_date": "2025-01-15T00:00:00Z"
            }
        )
    )

client.upsert(
    collection_name="documents",
    points=points
)

# Search with filtering
search_result = client.search(
    collection_name="documents",
    query_vector=[0.1, 0.2, 0.3, ...],
    limit=10,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(
                key="category",
                match=models.MatchValue(value="technology")
            ),
            models.FieldCondition(
                key="rating",
                range=models.Range(gte=4)
            )
        ]
    ),
    with_payload=True,
    with_vectors=False
)

for result in search_result:
    print(f"Score: {result.score}")
    print(f"Title: {result.payload['title']}")

Qdrant Batch and Async Operations

import asyncio
from qdrant_client import AsyncQdrantClient, QdrantClient, models

client = QdrantClient(host="localhost")

# Batch search: one request object per query vector
results = client.search_batch(
    collection_name="documents",
    requests=[
        models.SearchRequest(vector=[0.1, 0.2, ...], limit=5),
        models.SearchRequest(vector=[0.3, 0.4, ...], limit=5),
    ]
)

# Async upsert for large datasets
async def upsert_large_dataset():
    async_client = AsyncQdrantClient(host="localhost", timeout=120)
    points = []
    for i, embedding in enumerate(embeddings):
        points.append(models.PointStruct(
            id=i,
            vector=embedding.tolist(),
            payload={"text": texts[i]}
        ))

        if len(points) >= 1000:
            await async_client.upsert(
                collection_name="documents",
                points=points
            )
            points = []

    if points:
        await async_client.upsert(
            collection_name="documents",
            points=points
        )
    await async_client.close()

asyncio.run(upsert_large_dataset())

Feature Comparison

Feature     | Pinecone        | Milvus             | Qdrant
------------|-----------------|--------------------|----------------------
Deployment  | Cloud (managed) | Self-hosted, cloud | Self-hosted, cloud
Open Source | No              | Yes                | Yes
Pricing     | Usage-based     | Infrastructure     | Free tier, paid cloud
Scalability | Auto-scaling    | Horizontal         | Vertical + sharding
Index Types | Proprietary     | IVF, HNSW, DiskANN | HNSW, quantization
Filtering   | Pre/post-filter | Pre-filter         | Filterable HNSW
Time Travel | No              | Yes (pre-2.3 only) | No (but snapshots)
API         | gRPC, REST      | gRPC, REST         | REST, gRPC
Languages   | Python, JS, Go  | Python, Go, Java   | Python, Go, JS, Rust
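As the table notes, Qdrant has no time travel but does offer snapshots for point-in-time backup and restore. A minimal sketch, assuming the documents collection from the earlier examples:

```python
from qdrant_client import QdrantClient

client = QdrantClient(host="localhost", port=6333)

# Create a point-in-time snapshot of one collection
snapshot = client.create_snapshot(collection_name="documents")
print(f"Created snapshot: {snapshot.name}")

# List the snapshots available for that collection
for snap in client.list_snapshots(collection_name="documents"):
    print(snap.name, snap.creation_time)
```

Snapshots can later be downloaded or used to restore the collection, which covers most backup use cases that time travel would otherwise serve.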

When to Use Each Database

Use Pinecone When:

  • You want minimal infrastructure management
  • You need hybrid search capabilities
  • Your team prioritizes developer experience
  • You’re building on AWS, GCP, or Azure
# Good: Pinecone for quick deployment
from pinecone import Pinecone

pc = Pinecone(api_key="your-key")
index = pc.Index("quick-start")
index.upsert([{"id": "1", "values": [0.1]*384}])
# Done - fully managed

Use Milvus When:

  • You need full control over infrastructure
  • You require time travel capabilities
  • You’re building a large-scale system (billion+ vectors)
  • You need multi-tenancy support
# Good: Milvus for large-scale, self-hosted
from pymilvus import connections, Collection

connections.connect(host="milvus-cluster", port="19530")
collection = Collection("billion-scale")
collection.load()
# Full control over deployment

Use Qdrant When:

  • You need high performance with limited resources
  • You prefer Rust-based systems
  • You want powerful filtering capabilities
  • You’re building a hybrid cloud/self-hosted solution
# Good: Qdrant for performance-critical apps
from qdrant_client import QdrantClient

client = QdrantClient(host="qdrant.internal")
results = client.search(
    collection_name="production",
    query_vector=embedding,
    limit=10
)
# Fast, memory-efficient search

Bad Practices to Avoid

Bad Practice 1: Not Using Proper Index Configuration

# Bad: brute-force FLAT index with no tuning
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "FLAT", "metric_type": "L2", "params": {}}
)

# For production with millions of vectors, this is extremely slow

Bad Practice 2: Ignoring Quantization

# Bad: Storing full-precision vectors without quantization
# Results in 4x memory usage, slower search
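That 4x figure is easy to verify; a back-of-the-envelope NumPy sketch of scalar quantization (float32 to uint8), the same trick these databases apply internally:

```python
import numpy as np

rng = np.random.default_rng(42)
vectors = rng.standard_normal((10_000, 384)).astype(np.float32)

# Scalar quantization: map each float32 to a uint8 bucket over [min, max]
lo, hi = vectors.min(), vectors.max()
scale = (hi - lo) / 255.0
quantized = np.round((vectors - lo) / scale).astype(np.uint8)

print(f"float32: {vectors.nbytes / 1e6:.1f} MB")    # 15.4 MB
print(f"uint8:   {quantized.nbytes / 1e6:.1f} MB")  # 3.8 MB

# Reconstruction error stays small relative to the vector norms
restored = quantized.astype(np.float32) * scale + lo
rel_err = np.linalg.norm(vectors - restored) / np.linalg.norm(vectors)
print(f"relative error: {rel_err:.4f}")
```

One quarter of the memory for roughly 1% reconstruction error is why quantization is usually worth enabling before scaling hardware.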

Bad Practice 3: No Metadata Filtering Strategy

# Bad: fetching a huge result set, then filtering in the application
results = index.query(vector=query_vector, top_k=10000, include_metadata=True)
filtered = [r for r in results['matches'] if r['metadata']['category'] == 'tech']  # Slow!

Good Practices Summary

Index Selection Guide

Data Size | Index Type    | Use Case
----------|---------------|------------------------------
< 10K     | FLAT/IVF_FLAT | Simple, exact results
10K - 1M  | IVF_PQ        | Balanced speed/accuracy
1M - 10M  | HNSW          | High speed, good accuracy
10M+      | DiskANN       | Large scale, memory efficient
# Good: Proper index selection based on data size
if num_vectors < 10_000:
    index_type = "FLAT"
    index_params = {}
elif num_vectors < 1_000_000:
    index_type = "IVF_PQ"
    index_params = {"nlist": 128, "nbits": 8}
elif num_vectors < 10_000_000:
    index_type = "HNSW"
    index_params = {"M": 16, "efConstruction": 256}
else:
    index_type = "DISKANN"
    index_params = {}

Filtering Best Practices

# Good: Use database-native filtering
from qdrant_client import models

results = client.search(
    collection_name="documents",
    query_vector=embedding,
    limit=10,
    query_filter=models.Filter(
        must=[
            models.FieldCondition(key="category", match=models.MatchValue(value="tech")),
            models.FieldCondition(key="date", range=models.DatetimeRange(gte="2025-01-01T00:00:00Z"))
        ]
    )
)

Monitoring and Optimization

# Good: Monitor index build progress and optimize
from pymilvus import utility

progress = utility.index_building_progress("documents")
print(f"Indexed rows: {progress['indexed_rows']}")
print(f"Total rows: {progress['total_rows']}")
