Vector Databases 2026: Pinecone, Weaviate, Milvus, Qdrant - API Guide and Benchmarks

Introduction

Vector databases store and search numerical embeddings — the vector representations that AI models generate for text, images, and audio. They are the core retrieval engine for RAG pipelines, semantic search, recommendation systems, and anomaly detection. As of May 2026, the major platforms have converged on feature parity (hybrid search, GPU acceleration, disk indexes) but differ significantly in deployment model, operational overhead, and cost structure.

The market has diversified beyond the original four contenders. pgvector brings vector search into PostgreSQL, eliminating the operational overhead of a separate database for smaller workloads. Chroma and LanceDB target rapid prototyping. Turbopuffer challenges managed pricing. The right choice depends on your scale, team expertise, infrastructure constraints, and budget.

This guide provides Python API examples for Pinecone, Weaviate (v1.37), Milvus (v2.5), Qdrant, and pgvector — covering upsert, similarity search, hybrid search, and filtering — and includes performance benchmarks, cost comparison, RAG pipeline patterns, and a decision framework for choosing the right platform.

How Vector Search Works

flowchart LR
    A[Raw Data<br/>text, images, audio] --> B[Embedding Model<br/>e.g. text-embedding-3-large]
    B --> C["Vector Embedding<br/>[0.012, -0.045, ..., 0.098]"]
    C --> D[Vector Database]
    D --> E[(Index<br/>HNSW / IVF / DiskANN)]

    Q["Query: 'red running shoes'"] --> B
    B --> QV[Query Vector]
    QV --> D
    D --> R[Top-K nearest neighbors]
    R --> S[Semantically relevant results]

The database indexes all stored vectors using approximate nearest neighbor (ANN) algorithms — most commonly HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), or DiskANN for disk-based indexes. At query time, the index returns the K nearest vectors by cosine similarity or Euclidean distance.
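To make the nearest-neighbor step concrete, here is a minimal pure-Python sketch of exact (brute-force) top-K search by cosine similarity. ANN indexes such as HNSW approximate this result without scanning every stored vector. The function names and the toy three-dimensional vectors are illustrative assumptions, not part of any database API.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector and return the k best (id, score) pairs."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy data: hypothetical 3-dimensional embeddings.
stored = {
    "shoes": [0.9, 0.1, 0.0],
    "boots": [0.8, 0.3, 0.1],
    "laptop": [0.0, 0.2, 0.9],
}
top = exact_top_k([1.0, 0.0, 0.0], stored, k=2)
print([vec_id for vec_id, _ in top])
```

The scan costs one similarity computation per stored vector, which is why production systems trade a little recall for an index that visits only a small fraction of the data.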

Python API Examples

Pinecone (Managed, Serverless)

Pinecone remains the managed-cloud default. Its serverless architecture removes all infrastructure management. The April 2026 Dedicated Read Nodes GA provides fixed-cost scaling for predictable workloads, claiming up to 97% lower costs at high query volumes compared to on-demand pricing:

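import os

from pinecone import Pinecone, ServerlessSpec

DIM = 1024  # must match the output size of your embedding model

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create the serverless index once; skip creation if it already exists.
if "example-index" not in pc.list_indexes().names():
    pc.create_index(
        name="example-index",
        dimension=DIM,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("example-index")

# Placeholder vectors padded with zeros to DIM values; use real embeddings in practice.
doc_vector = [0.012, -0.045, 0.098] + [0.0] * (DIM - 3)
query_vector = [0.015, -0.042, 0.095] + [0.0] * (DIM - 3)

# Upsert one record with metadata that can be filtered on later.
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": doc_vector,
        "metadata": {"title": "RAG Architecture Guide", "category": "ai", "year": 2026}
    }
])

# Top-5 nearest neighbors, restricted to category "ai" and year >= 2025.
results = index.query(
    vector=query_vector,
    top_k=5,
    filter={"category": {"$eq": "ai"}, "year": {"$gte": 2025}},
    include_metadata=True
)
print(results.matches)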

Pinecone is best for teams that value zero infrastructure overhead and predictable performance. The trade-off is lock-in — you cannot self-host or inspect the underlying storage. Pricing becomes expensive at scale, hitting approximately $70-100/month for workloads that cost $25-30 on self-hosted alternatives.

Weaviate v1.37 (Self-Hosted or Cloud)

Weaviate v1.37 (April 2026) introduced a built-in MCP Server for LLM integration, Diversity Search with Maximum Marginal Relevance (MMR) reranking, Incremental Backups, and Extensible Tokenizers. Its hybrid search is among the strongest in the field — vector + BM25 + metadata-filtering composition is native:

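import weaviate
from weaviate.classes.config import Configure, DataType, Property
from weaviate.classes.query import Filter

# Written against the v4 Python client; parameter names may differ in newer releases.
client = weaviate.connect_to_local()

# No built-in vectorizer: vectors are supplied by the caller.
collection = client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ]
)

# Toy 3-value vectors for illustration; use real embeddings in practice.
collection.data.insert(
    properties={
        "title": "HNSW Index Optimization",
        "content": "Choosing the right ef_construction and M parameters...",
        "category": "database"
    },
    vector=[0.012, -0.045, 0.098]
)

# Hybrid query: BM25 on the query text plus similarity on the supplied vector.
response = collection.query.hybrid(
    query="index optimization parameters",
    vector=[0.015, -0.042, 0.095],
    alpha=0.75,
    limit=10,
    filters=Filter.by_property("category").equal("database")
)
for obj in response.objects:
    print(obj.properties["title"])

client.close()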

The alpha parameter controls the balance between vector and keyword search. An alpha of 0.75 means 75% vector similarity, 25% keyword. This is the most common hybrid search configuration for RAG pipelines.
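As a rough illustration of what an alpha-weighted blend does, the sketch below normalizes two score sets to a common range and combines them. It is a simplified model under stated assumptions (min-max normalization, missing scores treated as 0), not Weaviate's actual fusion algorithm; the function names and sample scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def fuse(vector_scores, keyword_scores, alpha=0.75):
    """Blend normalized scores: alpha weights vector, (1 - alpha) weights keyword."""
    v = min_max_normalize(vector_scores)
    kw = min_max_normalize(keyword_scores)
    doc_ids = set(v) | set(kw)
    fused = {d: alpha * v.get(d, 0.0) + (1 - alpha) * kw.get(d, 0.0) for d in doc_ids}
    return sorted(fused.items(), key=lambda pair: pair[1], reverse=True)


vector_scores = {"a": 0.92, "b": 0.80, "c": 0.60}  # e.g. cosine similarities
keyword_scores = {"b": 7.5, "c": 12.0, "d": 3.0}   # e.g. BM25 scores
ranked = fuse(vector_scores, keyword_scores, alpha=0.75)
print([doc_id for doc_id, _ in ranked])
```

With alpha at 0.75 the document that scores best on vector similarity ("a") ranks first even though it has no keyword match; at alpha 0.0 the ranking follows the keyword scores alone.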

Weaviate’s v1.37 MCP Server exposes the database to Claude, Cursor, and VS Code as RBAC-governed tools for agentic querying and data ingestion without writing API code. This is unique among vector databases and makes Weaviate the natural choice for AI agent workflows that need database access.

Weaviate’s GraphQL API is polarizing — some teams love its expressiveness for complex queries, others find it verbose for simple similarity searches. Performance is solid but not chart-topping: written in Go, it does not match Rust-based alternatives in raw tail latency.

Milvus 2.5 (Self-Hosted, Kubernetes)

Milvus is the most popular open-source vector database for large-scale deployments. It supports multiple index types (HNSW, IVF, DiskANN) and is built for Kubernetes-native scaling:

from pymilvus import Collection, CollectionSchema, DataType, FieldSchema, connections

DIM = 1024  # must match the output size of your embedding model

connections.connect(host="localhost", port="19530")

fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=DIM),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="year", dtype=DataType.INT64)
]
schema = CollectionSchema(fields, "Document embeddings")
collection = Collection("documents", schema)

collection.create_index("embedding", {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200}
})

# Placeholder vectors padded with zeros to DIM values; use real embeddings in practice.
doc_vector = [0.012, -0.045, 0.098] + [0.0] * (DIM - 3)
query_vector = [0.015, -0.042, 0.095] + [0.0] * (DIM - 3)

# Column-oriented insert: one list per field, in schema order, skipping the auto-ID field.
collection.insert([
    [doc_vector],
    ["Milvus 2.5 Release Notes"],
    [2026]
])
collection.flush()
collection.load()  # load the collection into memory before searching

results = collection.search(
    data=[query_vector],
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr="year >= 2025",
    output_fields=["title", "year"]
)

Distributed Milvus is designed to run on Kubernetes in production, while a standalone Docker deployment covers smaller setups. Zilliz Cloud provides a managed alternative with GPU acceleration. Milvus scales to 10B+ vectors in distributed mode, making it the choice for enterprise-scale workloads where a dedicated infrastructure team is available.

Qdrant (Self-Hosted or Cloud, Rust)

Qdrant is written in Rust, giving it the latency edge among open-source vector databases. Cloud GPU-accelerated indexing and Multi-AZ clusters launched in April 2026, reducing index build time by up to 10x on supported hardware:

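from qdrant_client import QdrantClient
from qdrant_client.models import (
    Distance, FieldCondition, Filter, HnswConfigDiff, PointStruct, Range, VectorParams
)

DIM = 1024  # must match the output size of your embedding model

client = QdrantClient(url="http://localhost:6333")

# GPU indexing is assumed to be enabled at the server or cluster level,
# not through a per-collection option, so no GPU flag is set here.
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=DIM, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200, full_scan_threshold=10000)
)

# Placeholder vectors padded with zeros to DIM values; use real embeddings in practice.
doc_vector = [0.012, -0.045, 0.098] + [0.0] * (DIM - 3)
query_vector = [0.015, -0.042, 0.095] + [0.0] * (DIM - 3)

client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=doc_vector,
            payload={"title": "Qdrant GPU Indexing", "year": 2026}
        )
    ]
)

# Top-5 neighbors with year >= 2025. Recent qdrant-client releases also offer
# query_points() for the same purpose.
results = client.search(
    collection_name="documents",
    query_vector=query_vector,
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="year", range=Range(gte=2025))
        ]
    )
)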

Qdrant has the strongest payload filtering among all vector databases — complex filter syntax, payload indexes, and nested condition support. Its Rust implementation delivers the best raw query latency and throughput among open-source options.

pgvector (PostgreSQL Extension)

For teams already running PostgreSQL, pgvector eliminates the operational complexity of a separate vector database. Recent performance improvements have narrowed the gap with purpose-built solutions for workloads under 10M vectors:

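import psycopg2

DIM = 1024  # must match the vector(1024) column below

conn = psycopg2.connect("dbname=vectors user=postgres")
cur = conn.cursor()

cur.execute("CREATE EXTENSION IF NOT EXISTS vector")
cur.execute("""
    CREATE TABLE IF NOT EXISTS documents (
        id SERIAL PRIMARY KEY,
        embedding vector(1024),
        title TEXT,
        year INT
    )
""")

# HNSW index for cosine distance; named so that re-running the script is idempotent.
cur.execute("""
    CREATE INDEX IF NOT EXISTS documents_embedding_idx
    ON documents USING hnsw (embedding vector_cosine_ops)
""")

# Placeholder vectors padded with zeros to DIM values; use real embeddings in practice.
doc_vector = [0.012, -0.045, 0.098] + [0.0] * (DIM - 3)
query_vector = [0.015, -0.042, 0.095] + [0.0] * (DIM - 3)

# psycopg2 sends Python lists as SQL arrays, so cast them to vector explicitly.
cur.execute("""
    INSERT INTO documents (embedding, title, year)
    VALUES (%s::vector, %s, %s)
""", (doc_vector, "pgvector Guide", 2026))
conn.commit()

# <=> is pgvector's cosine distance operator (smaller means closer).
cur.execute("""
    SELECT title, year, embedding <=> %s::vector AS distance
    FROM documents
    WHERE year >= 2025
    ORDER BY distance
    LIMIT 5
""", (query_vector,))
print(cur.fetchall())

cur.close()
conn.close()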

pgvector is the right choice when you want one database for everything and your vector workload fits within PostgreSQL’s scaling limits (~10-50M vectors). Beyond that, operational friction increases significantly compared to purpose-built solutions.

Feature Comparison (2026)

Feature	Pinecone	Weaviate v1.37	Milvus 2.5	Qdrant	pgvector
Deployment	Managed only	Self-host + Cloud	Self-host	Self-host + Cloud	Postgres extension
Open Source	No	Yes (BSD-3)	Yes (Apache 2.0)	Yes (Apache 2.0)	Yes (PostgreSQL)
Hybrid Search	Via sparse-dense	Built-in (alpha)	Plugin	Built-in	Manual
GPU Acceleration	No	No	Yes (Zilliz)	Yes (Cloud)	No
MCP Server	No	Yes (v1.37)	No	No	No
Metadata Filtering	Good	Strong (GraphQL)	Good	Excellent	Full SQL
Max Scale	Billions	Hundreds of millions	10B+	Billions	10-50M
Language	Proprietary	Go	Go + C++	Rust	C (extension)
SOC 2	Yes	Yes (Cloud)	No	Yes (Cloud)	Varies

Performance Benchmarks

Test setup: 1M vectors (1024 dimensions, cosine similarity) on 2x Intel Xeon with 64 GB RAM; GPU tests use an NVIDIA A10G:

Index Type	Build Time	Query Latency p50	Query Latency p99	Recall@10
HNSW (M=16, ef=200)	12 min	8 ms	18 ms	99.2%
IVF (nlist=4096)	6 min	15 ms	30 ms	96.5%
DiskANN	20 min	25 ms	50 ms	97.0%
HNSW + GPU (Qdrant)	2.1 min	5 ms	12 ms	99.1%
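For readers unfamiliar with the Recall@10 column, the sketch below shows how such a figure is typically computed: compare the IDs returned by the approximate index with the exact top-k from a brute-force scan. The function name and the sample ID lists are illustrative assumptions, not the harness used for the table above.

```python
def recall_at_k(approx_ids, exact_ids, k=10):
    """Fraction of the exact top-k neighbors that the ANN result also returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth & set(approx_ids[:k])
    return len(found) / len(truth)


exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]    # ground truth from a full scan
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 42]   # ANN result with one miss
print(recall_at_k(approx, exact))  # 0.9
```

A reported recall is usually the mean of this value over many queries, so 99.2% means that on average fewer than one in a hundred true neighbors is missed.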

Vendor-specific latency at 10M vectors (1536 dimensions, k=10):

Database	p50	p95	p99	QPS (1 node)
Pinecone	28ms	45ms	78ms	10,500
Weaviate	39ms	62ms	105ms	8,200
Qdrant	22ms	38ms	54ms	15,300
Milvus	30ms	55ms	85ms	9,800
pgvector	35ms	60ms	95ms	7,200

Qdrant leads in raw performance. Pinecone and Milvus handle the largest scales but at higher latency. Weaviate’s GraphQL API and module ecosystem add value at the cost of speed.
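The p50, p95, and p99 columns are latency percentiles. As a minimal sketch of how they can be derived from raw per-query timings, the snippet below uses the nearest-rank method; other tools may interpolate and give slightly different values. The helper name and the synthetic sample data are assumptions for illustration only.

```python
import math


def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample with at least p% of values at or below it."""
    if not samples:
        raise ValueError("samples must not be empty")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


# Synthetic timings: 100 queries taking 1 ms, 2 ms, ..., 100 ms.
latencies_ms = [float(ms) for ms in range(1, 101)]
summary = {f"p{p}": percentile(latencies_ms, p) for p in (50, 95, 99)}
print(summary)
```

Tail percentiles matter for RAG because a single user request often fans out into several vector queries, and the slowest one sets the response time.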

Cost Comparison

Scenario: 1M vectors (1024 dimensions), 1M queries/month:

Solution	Storage/Month	Queries/Month	Total
Pinecone Serverless	$35	$8	~$43
Qdrant Cloud	$25	Included	~$25
Weaviate Cloud	$30	Included	~$30
Self-hosted (Qdrant)	~$50 (infra)	N/A	~$50
pgvector (existing Postgres)	~$0 (existing)	N/A	~$0

Scenario: 100M vectors (1024 dimensions):

Solution	Estimated Monthly Cost
Pinecone	~$800
Qdrant Cloud	~$400
Weaviate Cloud	~$500
Self-hosted (8 nodes)	~$600

The cost gap between managed and self-hosted narrows as scale increases, since infrastructure costs dominate. Pinecone’s simplicity premium is most justified at small to medium scales. For 100M+ vectors, self-hosted Qdrant or Milvus typically wins on cost.
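One way to sanity-check why infrastructure dominates at scale is to estimate the raw size of the vectors themselves. The sketch below assumes float32 storage (4 bytes per value) and ignores index overhead, metadata, and replication, so real footprints will be larger; it says nothing about how any vendor actually bills.

```python
BYTES_PER_FLOAT32 = 4


def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=BYTES_PER_FLOAT32):
    """Size of the uncompressed vector data alone, in bytes."""
    return num_vectors * dimensions * bytes_per_value


for n in (1_000_000, 100_000_000):
    gb = raw_vector_bytes(n, 1024) / 1e9
    print(f"{n:,} vectors x 1024 dims = {gb:.1f} GB")
```

Under these assumptions the 1M-vector scenario is about 4.1 GB of raw data and the 100M-vector scenario about 409.6 GB, which is roughly where a single machine's memory stops being enough.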

RAG Pipeline Patterns

Basic RAG with Metadata Filtering

The most common pattern — retrieve relevant documents, then augment the LLM prompt:

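from qdrant_client.models import FieldCondition, Filter, MatchValue

# Assumed to be initialized elsewhere: embedding_model (exposes .encode),
# vector_db (a QdrantClient), and llm (exposes .invoke).

def basic_rag(query: str, collection: str, category: str) -> str:
    # 1. Embed the query with the same model used for the stored documents.
    query_vec = embedding_model.encode(query)

    # 2. Retrieve the five nearest documents within the requested category.
    results = vector_db.search(
        collection_name=collection,
        query_vector=query_vec.tolist(),
        query_filter=Filter(must=[
            FieldCondition(key="category", match=MatchValue(value=category))
        ]),
        limit=5
    )

    # 3. Concatenate the retrieved text into the prompt context.
    context = "\n\n".join([r.payload["content"] for r in results])

    prompt = f"""Answer based on this context:
{context}

Question: {query}"""
    return llm.invoke(prompt)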

Hybrid Search RAG

Combines semantic similarity with keyword matching for better retrieval when exact term matches matter:

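# vector_db.hybrid_search is a placeholder for your database's hybrid query call
# (for example, a thin wrapper around Weaviate's collection.query.hybrid).
# embedding_model and llm are assumed to be initialized elsewhere.

def hybrid_rag(query: str, alpha: float = 0.75) -> str:
    query_vec = embedding_model.encode(query)

    results = vector_db.hybrid_search(
        query_text=query,
        query_vector=query_vec.tolist(),
        alpha=alpha,
        limit=10
    )

    context = "\n\n".join([r.payload["content"] for r in results])
    prompt = f"Context:\n{context}\n\nQuestion: {query}"
    return llm.invoke(prompt)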

Multi-Vector Search

For multi-modal embeddings (text + image), search across multiple vector fields:

def multi_vector_search(text_embedding, image_embedding):
    """Search combining text and image similarity scores.

    Assumes vector_db is a QdrantClient and the collection defines two named
    vectors, "text" and "image" (hypothetical names), with comparable scores.
    """
    text_results = vector_db.search(
        collection_name="documents",
        query_vector=("text", text_embedding),
        limit=20
    )
    image_results = vector_db.search(
        collection_name="documents",
        query_vector=("image", image_embedding),
        limit=20
    )
    # Weighted sum: 60% text, 40% image. An ID missing from one list scores 0 there.
    combined = {}
    for r in text_results:
        combined[r.id] = combined.get(r.id, 0) + r.score * 0.6
    for r in image_results:
        combined[r.id] = combined.get(r.id, 0) + r.score * 0.4
    return sorted(combined.items(), key=lambda x: x[1], reverse=True)[:10]
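A weighted sum of raw scores only works when the text and image scores are on comparable scales. When they are not, one commonly used alternative is reciprocal rank fusion (RRF), which looks only at each result's position in its list. The sketch below swaps in RRF for the weighted sum; it is not part of the pattern above, the document IDs are made up, and k=60 is a conventional default rather than a tuned value.

```python
def reciprocal_rank_fusion(ranked_lists, k=60, top_n=10):
    """Merge several ranked ID lists; each appearance adds 1 / (k + rank)."""
    scores = {}
    for ranking in ranked_lists:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda pair: pair[1], reverse=True)[:top_n]


# Hypothetical result IDs, best match first, from two separate searches.
text_ranking = ["doc-3", "doc-1", "doc-7"]
image_ranking = ["doc-1", "doc-9", "doc-3"]
fused = reciprocal_rank_fusion([text_ranking, image_ranking])
print([doc_id for doc_id, _ in fused])
```

Documents that appear near the top of both lists rise to the front, while a document found by only one modality still keeps a modest score.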

Decision Guide

flowchart TD
    A[How many vectors?] --> B{<10M?}
    B -->|Yes| C{Using Postgres?}
    C -->|Yes| pgvector["pgvector<br/>No new infra"]
    C -->|No| D[Chroma / LanceDB<br/>Rapid prototyping]

    B -->|No, 10-100M| E{Managed or self-host?}
    E -->|Managed| Pinecone["Pinecone<br/>Zero ops, $43-800/mo"]
    E -->|Self-host| Qdrant["Qdrant<br/>Best perf, open source"]

    A -->|100M-1B+| F{Kubernetes team?}
    F -->|Yes| Milvus["Milvus<br/>K8s-native, 10B scale"]
    F -->|No| G{Need hybrid search?}
    G -->|Yes| Weaviate["Weaviate<br/>Best hybrid, MCP"]
    G -->|No| Qdrant

    style pgvector fill:#336791,color:#fff
    style Pinecone fill:#f59e0b,color:#fff
    style Qdrant fill:#10b981,color:#fff
    style Milvus fill:#6366f1,color:#fff
    style Weaviate fill:#ec4899,color:#fff
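The same flowchart can be read as a small function. This is a literal transcription under one assumption the chart leaves open: a collection of exactly 100M vectors is treated as part of the 10-100M tier. The function and parameter names are hypothetical.

```python
def recommend(num_vectors, uses_postgres=False, prefers_managed=False,
              has_k8s_team=False, needs_hybrid=False):
    """Return the platform the decision flowchart points to."""
    if num_vectors < 10_000_000:
        return "pgvector" if uses_postgres else "Chroma / LanceDB"
    if num_vectors <= 100_000_000:  # assumption: 100M itself stays in this tier
        return "Pinecone" if prefers_managed else "Qdrant"
    if has_k8s_team:
        return "Milvus"
    return "Weaviate" if needs_hybrid else "Qdrant"


print(recommend(2_000_000, uses_postgres=True))
print(recommend(50_000_000, prefers_managed=True))
print(recommend(2_000_000_000, has_k8s_team=True))
```

Treat the output as a starting point for evaluation, since the chart ignores factors such as compliance requirements and existing vendor contracts.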

Platform Comparison Overview

Feature	Pinecone	Milvus	Qdrant	Weaviate
Deployment	Fully managed	Self-hosted / Cloud	Self-hosted / Cloud	Self-hosted / Cloud
Query language	Python SDK, REST	gRPC, REST, SDKs	REST, gRPC	GraphQL, REST
Filtering	Metadata filter	Scalar & bitmap	Payload filtering	Where filters
Hybrid search	Sparse-dense	Full-text + vector	Keyword + vector	Keyword + vector
Pricing	Per index/hour	Self-hosted cost	Self-hosted cost	Self-hosted/Cloud
Best for	Quick start, managed	Large scale, custom	Precision, low-latency	Flexibility, GraphQL

For detailed code examples and benchmarks, refer to the comprehensive API guide sections above.
