
Vector Databases 2026: Pinecone, Weaviate, Milvus, Qdrant — API Guide and Benchmarks

Created: March 3, 2026 Larry Qu 9 min read

Introduction

Vector databases store and search numerical embeddings — the vector representations that AI models generate for text, images, and audio. They are the core retrieval engine for RAG pipelines, semantic search, recommendation systems, and anomaly detection. As of May 2026, the major platforms have converged on feature parity (hybrid search, GPU acceleration, disk indexes) but differ significantly in deployment model, operational overhead, and cost structure.

The market has diversified beyond the original four contenders. pgvector brings vector search into PostgreSQL, eliminating the operational overhead of a separate database for smaller workloads. Chroma and LanceDB target rapid prototyping. Turbopuffer challenges managed pricing. The right choice depends on your scale, team expertise, infrastructure constraints, and budget.

This guide provides Python API examples for Pinecone, Weaviate (v1.37), Milvus (v2.5), Qdrant, and pgvector — covering upsert, similarity search, hybrid search, and filtering — and includes performance benchmarks, cost comparison, RAG pipeline patterns, and a decision framework for choosing the right platform.

How Vector Search Works

flowchart LR
    A[Raw Data<br/>text, images, audio] --> B[Embedding Model<br/>e.g. text-embedding-3-large]
    B --> C[Vector Embedding<br/>[0.012, -0.045, ..., 0.098]]
    C --> D[Vector Database]
    D --> E[(Index<br/>HNSW / IVF / DiskANN)]

    Q["Query: 'red running shoes'"] --> B
    B --> QV[Query Vector]
    QV --> D
    D --> R[Top-K nearest neighbors]
    R --> S[Semantically relevant results]

The database indexes all stored vectors using approximate nearest neighbor (ANN) algorithms — most commonly HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), or DiskANN for disk-based indexes. At query time, the index returns the K nearest vectors by cosine similarity or Euclidean distance.
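
To make the idea concrete, here is a minimal sketch of the exact (brute-force) search that ANN indexes approximate. It scores every stored vector against the query with cosine similarity and keeps the best K. The function names and the toy three-dimensional data are illustrative assumptions, not part of any database's API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector and return the k best (id, score) pairs."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical toy data: three 3-dimensional "embeddings".
stored = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.9, 0.1, 0.0],
}
top = exact_top_k([1.0, 0.0, 0.0], stored, k=2)
print(top)
```

This scan costs one comparison per stored vector, which is why production systems switch to HNSW, IVF, or DiskANN once collections grow: they trade a small amount of recall for far fewer comparisons.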

Python API Examples

Pinecone (Managed, Serverless)

Pinecone remains the managed-cloud default. Its serverless architecture removes all infrastructure management. The April 2026 Dedicated Read Nodes GA provides fixed-cost scaling for predictable workloads, claiming up to 97% lower costs at high query volumes compared to on-demand pricing:

import os

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the serverless index once; the dimension must match the embedding model.
if "example-index" not in pc.list_indexes().names():
    pc.create_index(
        name="example-index",
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("example-index")

# Vectors are truncated to three values for readability;
# a real upsert needs exactly 1024 values to match the index dimension.
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": [0.012, -0.045, 0.098],
        "metadata": {"title": "RAG Architecture Guide", "category": "ai", "year": 2026}
    }
])

# The metadata filter is applied together with the vector search.
results = index.query(
    vector=[0.015, -0.042, 0.095],
    top_k=5,
    filter={"category": {"$eq": "ai"}, "year": {"$gte": 2025}},
    include_metadata=True
)
print(results.matches)

Pinecone is best for teams that value zero infrastructure overhead and predictable performance. The trade-off is lock-in — you cannot self-host or inspect the underlying storage. Pricing becomes expensive at scale, hitting approximately $70-100/month for workloads that cost $25-30 on self-hosted alternatives.

Weaviate v1.37 (Self-Hosted or Cloud)

Weaviate v1.37 (April 2026) introduced a built-in MCP Server for LLM integration, Diversity Search with Maximum Marginal Relevance (MMR) reranking, Incremental Backups, and Extensible Tokenizers. Its hybrid search is among the strongest in the field — vector + BM25 + metadata-filtering composition is native:

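import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import Filter

# Written against the v4 Python client.
client = weaviate.connect_to_local()

# No built-in vectorizer: embeddings are supplied by the caller.
collection = client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ],
)

# Vectors are truncated to three values for readability.
collection.data.insert(
    properties={
        "title": "HNSW Index Optimization",
        "content": "Choosing the right ef_construction and M parameters...",
        "category": "database",
    },
    vector=[0.012, -0.045, 0.098],
)

# Hybrid search: BM25 on the query text plus similarity on the supplied vector.
response = collection.query.hybrid(
    query="index optimization parameters",
    vector=[0.015, -0.042, 0.095],
    alpha=0.75,
    limit=10,
    filters=Filter.by_property("category").equal("database"),
)
for obj in response.objects:
    print(obj.properties["title"])

client.close()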

The alpha parameter controls the balance between vector and keyword search. An alpha of 0.75 means 75% vector similarity, 25% keyword. This is the most common hybrid search configuration for RAG pipelines.
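
As a rough illustration of what that weighting does, the sketch below blends two score sets with a plain weighted sum. It assumes both score sets are already normalized to the 0 to 1 range, and it is a simplification for intuition only: Weaviate's own fusion algorithms normalize and combine results differently. The function name and sample scores are hypothetical.

```python
def blend_scores(vector_scores, keyword_scores, alpha=0.75):
    """Weighted blend: alpha * vector score + (1 - alpha) * keyword score.

    Both inputs map document id -> score already normalized to [0, 1].
    A document missing from one result list contributes 0 for that side.
    """
    doc_ids = set(vector_scores) | set(keyword_scores)
    blended = {
        doc_id: alpha * vector_scores.get(doc_id, 0.0)
        + (1 - alpha) * keyword_scores.get(doc_id, 0.0)
        for doc_id in doc_ids
    }
    return sorted(blended.items(), key=lambda item: item[1], reverse=True)


# Hypothetical normalized scores from the two retrievers.
vector_scores = {"doc-1": 0.9, "doc-2": 0.4}
keyword_scores = {"doc-2": 1.0, "doc-3": 0.8}

ranked = blend_scores(vector_scores, keyword_scores, alpha=0.75)
keyword_only = blend_scores(vector_scores, keyword_scores, alpha=0.0)
print(ranked)
```

With alpha at 0.75 the semantically closest document ranks first even though it has no keyword match; with alpha at 0 the ranking falls back to keyword scores alone.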

Weaviate’s v1.37 MCP Server exposes the database to Claude, Cursor, and VS Code as RBAC-governed tools for agentic querying and data ingestion without writing API code. This is unique among vector databases and makes Weaviate the natural choice for AI agent workflows that need database access.

Weaviate’s GraphQL API is polarizing — some teams love its expressiveness for complex queries, others find it verbose for simple similarity searches. Performance is solid but not chart-topping: written in Go, it does not match Rust-based alternatives in raw tail latency.

Milvus 2.5 (Self-Hosted, Kubernetes)

Milvus is the most popular open-source vector database for large-scale deployments. It supports multiple index types (HNSW, IVF, DiskANN) and is built for Kubernetes-native scaling:

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="year", dtype=DataType.INT64)
]
schema = CollectionSchema(fields, "Document embeddings")
collection = Collection("documents", schema)

# The collection must be indexed before it can be loaded and searched.
collection.create_index("embedding", {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200}
})
collection.load()

# Column-based insert: one list per field, in schema order, skipping the auto-generated id.
# Vectors are truncated to three values for readability; real rows need 1024 values.
collection.insert([
    [[0.012, -0.045, 0.098]],
    ["Milvus 2.5 Release Notes"],
    [2026]
])

# expr filters on scalar fields; ef controls the search-time accuracy/speed trade-off.
results = collection.search(
    data=[[0.015, -0.042, 0.095]],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr="year >= 2025",
    output_fields=["title", "year"]
)

Distributed Milvus deployments typically run on Kubernetes in production. Zilliz Cloud provides a managed alternative with GPU acceleration. Milvus scales to 10B+ vectors in distributed mode, making it the choice for enterprise-scale workloads where a dedicated infrastructure team is available.

Qdrant (Self-Hosted or Cloud, Rust)

Qdrant is written in Rust, giving it the latency edge among open-source vector databases. Cloud GPU-accelerated indexing and Multi-AZ clusters launched in April 2026, reducing index build time by up to 10x on supported hardware:

from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance,
    FieldCondition,
    Filter,
    HnswConfigDiff,
    PointStruct,
    Range,
    VectorParams,
)

client = QdrantClient(url="http://localhost:6333")

# GPU-accelerated indexing is assumed to be enabled in the server or cloud
# configuration; it is not a per-collection option in the client.
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200, full_scan_threshold=10000),
)

# Vectors are truncated to three values for readability; real points need 1024 values.
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.012, -0.045, 0.098],
            payload={"title": "Qdrant GPU Indexing", "year": 2026}
        )
    ]
)

# query_points is the current search entry point; older clients expose client.search instead.
results = client.query_points(
    collection_name="documents",
    query=[0.015, -0.042, 0.095],
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="year", range=Range(gte=2025))
        ]
    )
)
print(results.points)

Qdrant has the strongest payload filtering among all vector databases — complex filter syntax, payload indexes, and nested condition support. Its Rust implementation delivers the best raw query latency and throughput among open-source options.

pgvector (PostgreSQL Extension)

For teams already running PostgreSQL, pgvector eliminates the operational complexity of a separate vector database. Recent performance improvements have narrowed the gap with purpose-built solutions for workloads under 10M vectors:

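import psycopg2

conn = psycopg2.connect("dbname=vectors user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        embedding vector(1024),
        title TEXT,
        year INT
    )
""")

# Build the HNSW index up front so the similarity query below can use it.
cur.execute("CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)")

# Vectors are truncated to three values for readability;
# a vector(1024) column only accepts 1024-value vectors.
cur.execute("""
    INSERT INTO documents (embedding, title, year)
    VALUES (%s::vector, %s, %s)
""", ([0.012, -0.045, 0.098], "pgvector Guide", 2026))
conn.commit()

# <=> is pgvector's cosine distance operator (smaller means more similar).
cur.execute("""
    SELECT title, year, embedding <=> %s::vector AS distance
    FROM documents
    WHERE year >= 2025
    ORDER BY distance
    LIMIT 5
""", ([0.015, -0.042, 0.095],))
for title, year, distance in cur.fetchall():
    print(title, year, distance)

cur.close()
conn.close()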

pgvector is the right choice when you want one database for everything and your vector workload fits within PostgreSQL’s scaling limits (~10-50M vectors). Beyond that, operational friction increases significantly compared to purpose-built solutions.

Feature Comparison (2026)

| Feature | Pinecone | Weaviate v1.37 | Milvus 2.5 | Qdrant | pgvector |
|---|---|---|---|---|---|
| Deployment | Managed only | Self-host + Cloud | Self-host | Self-host + Cloud | Postgres extension |
| Open Source | No | Yes (BSD-3) | Yes (Apache 2.0) | Yes (Apache 2.0) | Yes (PostgreSQL) |
| Hybrid Search | Via sparse-dense | Built-in (alpha) | Plugin | Built-in | Manual |
| GPU Acceleration | No | No | Yes (Zilliz) | Yes (Cloud) | No |
| MCP Server | No | Yes (v1.37) | No | No | No |
| Metadata Filtering | Good | Strong (GraphQL) | Good | Excellent | Full SQL |
| Max Scale | Billions | Hundreds of millions | 10B+ | Billions | 10-50M |
| Language | Proprietary | Go | Go + C++ | Rust | C (extension) |
| SOC 2 | Yes | Yes (Cloud) | No | Yes (Cloud) | Varies |

Performance Benchmarks

Testing with 1M vectors (1024 dimensions, cosine similarity), 2x Intel Xeon, 64GB RAM, NVIDIA A10G for GPU tests:

| Index Type | Build Time | Query Latency p50 | Query Latency p99 | Recall@10 |
|---|---|---|---|---|
| HNSW (M=16, ef=200) | 12 min | 8 ms | 18 ms | 99.2% |
| IVF (nlist=4096) | 6 min | 15 ms | 30 ms | 96.5% |
| DiskANN | 20 min | 25 ms | 50 ms | 97.0% |
| HNSW + GPU (Qdrant) | 2.1 min | 5 ms | 12 ms | 99.1% |
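
The article does not define how Recall@10 was measured. Assuming the usual definition (the share of the true 10 nearest neighbors, found by exact search, that the ANN index also returns), a minimal sketch looks like this. The function name and the sample ID lists are hypothetical.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the true top-k neighbors that the ANN index also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth & set(approx_ids[:k])
    return len(found) / len(truth)


# Hypothetical result lists for one query: the ANN index missed one true neighbor.
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]

score = recall_at_k(approx, exact, k=10)
print(f"recall@10 = {score:.2f}")
```

A benchmark figure such as 99.2% would then be this value averaged over many queries.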

Vendor-specific latency at 10M vectors (1536 dimensions, k=10):

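| Database | p50 | p95 | p99 | QPS (1 node) |
|---|---|---|---|---|
| Pinecone | 28 ms | 45 ms | 78 ms | 10,500 |
| Weaviate | 39 ms | 62 ms | 105 ms | 8,200 |
| Qdrant | 22 ms | 38 ms | 54 ms | 15,300 |
| Milvus | 30 ms | 55 ms | 85 ms | 9,800 |
| pgvector | 35 ms | 60 ms | 95 ms | 7,200 |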

Qdrant leads in raw performance. Pinecone and Milvus handle the largest scales but at higher latency. Weaviate’s GraphQL API and module ecosystem add value at the cost of speed.
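
For readers reproducing numbers like these, the p50/p95/p99 columns are percentiles over per-query latency samples. The sketch below uses the nearest-rank method, which is one common convention; the article does not say which method its benchmarks used, and the sample latencies are made up.

```python
import math


def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


# Hypothetical per-query latencies in milliseconds.
latencies_ms = [20, 21, 22, 22, 23, 24, 25, 30, 38, 54]

p50 = percentile(latencies_ms, 50)
p95 = percentile(latencies_ms, 95)
p99 = percentile(latencies_ms, 99)
print(p50, p95, p99)
```

With only ten samples the p95 and p99 values collapse onto the slowest query, which is why tail-latency comparisons need thousands of queries to be meaningful.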

Cost Comparison

Scenario: 1M vectors (1024 dimensions), 1M queries/month:

| Solution | Storage/Month | Queries/Month | Total |
|---|---|---|---|
| Pinecone Serverless | $35 | $8 | ~$43 |
| Qdrant Cloud | $25 | Included | ~$25 |
| Weaviate Cloud | $30 | Included | ~$30 |
| Self-hosted (Qdrant) | ~$50 (infra) | N/A | ~$50 |
| pgvector (existing Postgres) | ~$0 (existing) | N/A | ~$0 |

Scenario: 100M vectors (1024 dimensions):

| Solution | Estimated Monthly Cost |
|---|---|
| Pinecone | ~$800 |
| Qdrant Cloud | ~$400 |
| Weaviate Cloud | ~$500 |
| Self-hosted (8 nodes) | ~$600 |

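The cost gap between managed and self-hosted narrows as scale increases, since infrastructure costs dominate. Pinecone’s simplicity premium is most justified at small to medium scales. In the 100M-vector estimates above, the self-hosted cluster (~$600) undercuts Pinecone (~$800) but not Qdrant Cloud (~$400) or Weaviate Cloud (~$500), so at that scale self-hosted Qdrant or Milvus wins on cost against Pinecone rather than against every managed option.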
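
To see why infrastructure dominates at scale, it helps to estimate the raw size of the vectors themselves. The sketch below assumes uncompressed float32 storage (4 bytes per value) and ignores index structures, metadata, and replication, all of which add to the real footprint. The function name is illustrative.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4):
    """Raw storage for the vectors alone, assuming float32 by default."""
    return num_vectors * dimensions * bytes_per_value


# The two scenarios from the cost tables: 1M and 100M vectors at 1024 dimensions.
small = raw_vector_bytes(1_000_000, 1024)
large = raw_vector_bytes(100_000_000, 1024)

print(f"1M vectors:   {small / 1e9:.3f} GB")
print(f"100M vectors: {large / 1e9:.1f} GB")
```

Roughly 4 GB fits comfortably in memory on a single modest node, while roughly 410 GB of raw vectors has to be spread across several machines or moved to disk-based indexes or quantization.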

RAG Pipeline Patterns

Basic RAG with Metadata Filtering

The most common pattern — retrieve relevant documents, then augment the LLM prompt:

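from qdrant_client.models import FieldCondition, Filter, MatchValue

# embedding_model, vector_db, and llm are placeholders for your own
# embedding client, Qdrant-style search client, and LLM wrapper.
def basic_rag(query: str, collection: str, category: str) -> str:
    query_vec = embedding_model.encode(query)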

    results = vector_db.search(
        collection_name=collection,
        query_vector=query_vec.tolist(),
        query_filter=Filter(must=[
            FieldCondition(key="category", match=MatchValue(value=category))
        ]),
        limit=5
    )

    context = "\n\n".join([r.payload["content"] for r in results])

    prompt = f"""Answer based on this context:
{context}

Question: {query}"""
    return llm.invoke(prompt)

Hybrid Search RAG

Combines semantic similarity with keyword matching for better retrieval when exact term matches matter:

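# vector_db.hybrid_search is a placeholder for your database's hybrid query call;
# embedding_model and llm are placeholders as in the basic pattern.
def hybrid_rag(query: str, alpha: float = 0.75) -> str: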
    query_vec = embedding_model.encode(query)

    results = vector_db.hybrid_search(
        query_text=query,
        query_vector=query_vec.tolist(),
        alpha=alpha,
        limit=10
    )

    context = "\n\n".join([r.payload["content"] for r in results])
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt)

For multi-modal embeddings (text + image), search across multiple vector fields:

def multi_vector_search(text_embedding, image_embedding):
    """Search combining text and image similarity scores."""
    text_results = vector_db.search(
        collection_name="documents",
        query_vector=text_embedding,
        limit=20
    )
    image_results = vector_db.search(
        collection_name="documents",
        query_vector=image_embedding,
        limit=20
    )
    combined = {}
    for r in text_results:
        combined[r.id] = combined.get(r.id, 0) + r.score * 0.6
    for r in image_results:
        combined[r.id] = combined.get(r.id, 0) + r.score * 0.4
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:10]
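
The weighted sum above assumes that text and image scores are on comparable scales, which is not guaranteed when they come from different embedding spaces. One alternative, which this article does not otherwise use, is reciprocal rank fusion (RRF): it combines results by rank position rather than raw score. A minimal sketch, with a hypothetical function name and the conventional constant k=60:

```python
def reciprocal_rank_fusion(ranked_lists, k=60, limit=10):
    """Fuse several ranked id lists; each hit adds 1 / (k + rank) to that id's score."""
    fused = {}
    for ranked_ids in ranked_lists:
        for rank, doc_id in enumerate(ranked_ids, start=1):
            fused[doc_id] = fused.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(fused.items(), key=lambda item: item[1], reverse=True)[:limit]


# Hypothetical ranked ids from the text search and the image search.
text_ranked = ["a", "b", "c"]
image_ranked = ["b", "d", "a"]

fused = reciprocal_rank_fusion([text_ranked, image_ranked])
print(fused)
```

Documents that appear near the top of both lists rise to the top of the fused list, regardless of how each search scaled its scores.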

Decision Guide

flowchart TD
    A[How many vectors?] --> B{"<10M?"}
    B -->|Yes| C{Using Postgres?}
    C -->|Yes| pgvector["pgvector<br/>No new infra"]
    C -->|No| D[Chroma / LanceDB<br/>Rapid prototyping]

    B -->|No, 10-100M| E{Managed or self-host?}
    E -->|Managed| Pinecone["Pinecone<br/>Zero ops, $43-800/mo"]
    E -->|Self-host| Qdrant["Qdrant<br/>Best perf, open source"]

    A -->|100M-1B+| F{Kubernetes team?}
    F -->|Yes| Milvus["Milvus<br/>K8s-native, 10B scale"]
    F -->|No| G{Need hybrid search?}
    G -->|Yes| Weaviate["Weaviate<br/>Best hybrid, MCP"]
    G -->|No| Qdrant

    style pgvector fill:#336791,color:#fff
    style Pinecone fill:#f59e0b,color:#fff
    style Qdrant fill:#10b981,color:#fff
    style Milvus fill:#6366f1,color:#fff
    style Weaviate fill:#ec4899,color:#fff
