
Vector Databases 2026: Pinecone, Weaviate, Milvus, Qdrant — API Guide and Benchmarks

Created: March 3, 2026 Larry Qu 5 min read

Introduction

Vector databases store and search numerical embeddings, the vector representations that AI models generate for text, images, and audio. They are the core retrieval engine for RAG pipelines, semantic search, recommendation systems, and anomaly detection. As of May 2026, the major platforms all offer hybrid search, while GPU acceleration and disk-based indexes are available on only some of them. They also differ significantly in deployment model, operational overhead, and cost structure.

This guide provides Python code examples for Pinecone, Weaviate (v1.37), Milvus (v2.5), and Qdrant, covering upsert, similarity search, hybrid search, and filtering. It closes with a feature comparison of the latest versions and a benchmark table to help you choose the right platform.

How Vector Search Works

flowchart LR
    A[Raw Data<br/>text, images, audio] --> B[Embedding Model<br/>e.g. text-embedding-3-large]
    B --> C["Vector Embedding<br/>[0.012, -0.045, ..., 0.098]"]
    C --> D[Vector Database]
    D --> E[(Index<br/>HNSW / IVF / DiskANN)]

    Q["Query: 'red running shoes'"] --> B
    B --> QV[Query Vector]
    QV --> D
    D --> R[Top-K nearest neighbors]
    R --> S[Semantically relevant results]

The database indexes all stored vectors using approximate nearest neighbor (ANN) algorithms — most commonly HNSW (Hierarchical Navigable Small World), IVF (Inverted File Index), or DiskANN for disk-based indexes. At query time, the index returns the K nearest vectors by cosine similarity or Euclidean distance.
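To make the top-K step concrete, here is a minimal sketch of exact (brute-force) nearest-neighbor search by cosine similarity. It is the baseline that ANN indexes such as HNSW approximate, not how any of the databases below implement search; the function names and the toy vectors are hypothetical.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def exact_top_k(query, vectors, k):
    """Score every stored vector and return the k best (id, score) pairs."""
    scored = [(vec_id, cosine_similarity(query, vec)) for vec_id, vec in vectors.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Hypothetical 3-dimensional toy data; real embeddings have hundreds of dimensions.
store = {
    "shoes": [0.9, 0.1, 0.0],
    "boots": [0.8, 0.3, 0.1],
    "laptop": [0.0, 0.2, 0.9],
}
top = exact_top_k([1.0, 0.0, 0.0], store, k=2)
print([vec_id for vec_id, _ in top])
```

Brute force compares the query against every stored vector, so its cost grows linearly with collection size. An ANN index gives up a small amount of recall to avoid that full scan.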

Python API Examples

Pinecone (Managed, Serverless)

from pinecone import Pinecone, ServerlessSpec
import os

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Create index (serverless, 1024-dim embeddings)
if "example-index" not in pc.list_indexes().names():
    pc.create_index(
        name="example-index",
        dimension=1024,
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1")
    )

index = pc.Index("example-index")

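# Upsert vectors with metadata.
# The vectors shown are truncated for display; each one must contain
# exactly 1024 floats to match the index dimension.
index.upsert(vectors=[
    {
        "id": "doc-001",
        "values": [0.012, -0.045, 0.098],  # truncated; supply all 1024 values
        "metadata": {"title": "RAG Architecture Guide", "category": "ai", "year": 2026}
    }
])

# Query with a metadata filter (top-level conditions are combined with AND)
results = index.query(
    vector=[0.015, -0.042, 0.095],  # truncated; must also be 1024-dim
    top_k=5,
    filter={"category": {"$eq": "ai"}, "year": {"$gte": 2025}},
    include_metadata=True  # return stored metadata with each match
)
print(results.matches)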

Pinecone’s April 2026 Dedicated Read Nodes GA provides fixed-cost scaling for predictable workloads, claiming up to 97% lower costs at high query volumes compared to serverless on-demand pricing.
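The metadata filter in the query above combines its conditions with AND: a record matches only if `category` equals "ai" and `year` is at least 2025. The following sketch mimics that behavior in plain Python to show the semantics. It is an illustration, not Pinecone's implementation; the `matches` helper and the `OPERATORS` table are hypothetical and cover only a subset of comparison operators.

```python
import operator

# Subset of comparison operators, chosen for illustration only.
OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def matches(metadata, flt):
    """Return True if metadata satisfies every condition in flt (logical AND)."""
    for field, conditions in flt.items():
        if field not in metadata:
            return False  # a missing field never matches in this sketch
        for op_name, expected in conditions.items():
            if not OPERATORS[op_name](metadata[field], expected):
                return False
    return True


flt = {"category": {"$eq": "ai"}, "year": {"$gte": 2025}}
print(matches({"category": "ai", "year": 2026}, flt))
print(matches({"category": "ai", "year": 2024}, flt))
```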

Weaviate v1.37 (Self-Hosted or Cloud)

Weaviate v1.37 (April 2026) introduced a built-in MCP Server for LLM integration, Diversity Search with Maximum Marginal Relevance (MMR) reranking, Incremental Backups, and Extensible Tokenizers:

import weaviate
from weaviate.classes.config import Configure, Property, DataType
from weaviate.classes.query import Filter

client = weaviate.connect_to_local()

# Create collection (HNSW is the default vector index).
# No server-side vectorizer is configured, so vectors are supplied by the client.
collection = client.collections.create(
    name="Documents",
    vectorizer_config=Configure.Vectorizer.none(),
    properties=[
        Property(name="title", data_type=DataType.TEXT),
        Property(name="content", data_type=DataType.TEXT),
        Property(name="category", data_type=DataType.TEXT),
    ]
)

# Insert an object together with its pre-computed embedding
collection.data.insert(
    properties={
        "title": "HNSW Index Optimization",
        "content": "Choosing the right ef_construction and M parameters...",
        "category": "database"
    },
    vector=[0.012, -0.045, 0.098]  # truncated for display
)

# Hybrid search (combines vector similarity + keyword BM25).
# Without a vectorizer, the query vector must be passed explicitly.
response = collection.query.hybrid(
    query="index optimization parameters",
    vector=[0.015, -0.042, 0.095],  # truncated for display
    alpha=0.75,  # 0 = pure BM25, 1 = pure vector
    limit=10,
    filters=Filter.by_property("category").equal("database")
)

client.close()

The alpha parameter controls the balance between vector and keyword search. An alpha of 0.75 weights the vector similarity score at 75% and the BM25 keyword score at 25%. A vector-leaning value like this is a common starting point for RAG pipelines where both semantic relevance and exact term matching matter.
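As a rough illustration of how alpha weighting behaves, the sketch below normalizes each score set to the range 0 to 1 and blends the two with alpha. This is a simplified model under stated assumptions: Weaviate's own fusion algorithms may differ in detail, and the function names and scores are hypothetical.

```python
def min_max_normalize(scores):
    """Rescale a {doc_id: score} mapping to the range [0, 1]."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def hybrid_scores(vector_scores, keyword_scores, alpha):
    """Blend normalized scores: alpha * vector + (1 - alpha) * keyword."""
    v = min_max_normalize(vector_scores)
    k = min_max_normalize(keyword_scores)
    docs = set(v) | set(k)
    # A document missing from one result set contributes 0 for that side.
    return {doc: alpha * v.get(doc, 0.0) + (1 - alpha) * k.get(doc, 0.0) for doc in docs}


# Hypothetical raw scores: cosine similarities and BM25 scores use different scales.
vector_scores = {"a": 0.9, "b": 0.5, "c": 0.1}
keyword_scores = {"b": 12.0, "c": 4.0}

fused = hybrid_scores(vector_scores, keyword_scores, alpha=0.75)
ranking = sorted(fused, key=fused.get, reverse=True)
print(ranking)
```

Normalization matters because BM25 scores and cosine similarities are not on the same scale; blending raw values would let one side dominate regardless of alpha.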

Weaviate’s v1.37 MCP Server exposes the database to Claude, Cursor, and VS Code as RBAC-governed tools for agentic querying and data ingestion without writing API code.

Milvus 2.5 (Self-Hosted, Kubernetes)

Milvus 2.5.27 (February 2026) includes security hardening for the /expr endpoint, Pulsar client race condition fixes, and Azure storage improvements. For new deployments, use the Python SDK pymilvus:

from pymilvus import connections, Collection, FieldSchema, CollectionSchema, DataType

connections.connect(host="localhost", port="19530")

# Define schema
fields = [
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="embedding", dtype=DataType.FLOAT_VECTOR, dim=1024),
    FieldSchema(name="title", dtype=DataType.VARCHAR, max_length=512),
    FieldSchema(name="year", dtype=DataType.INT64)
]
schema = CollectionSchema(fields, "Document embeddings")
collection = Collection("documents", schema)

# Create HNSW index
collection.create_index("embedding", {
    "index_type": "HNSW",
    "metric_type": "COSINE",
    "params": {"M": 16, "efConstruction": 200}
})
collection.load()

# Insert in column format: one list per field, in schema order.
# The auto-generated "id" field is omitted.
collection.insert([
    [[0.012, -0.045, 0.098]],        # embedding (truncated; must be 1024-dim)
    ["Milvus 2.5 Release Notes"],    # title
    [2026]                           # year
])

# Search with metadata filter
results = collection.search(
    data=[[0.015, -0.042, 0.095]],   # truncated; must be 1024-dim
    anns_field="embedding",
    param={"metric_type": "COSINE", "params": {"ef": 64}},
    limit=5,
    expr="year >= 2025",
    output_fields=["title", "year"]
)

Production Milvus clusters are typically deployed on Kubernetes. Zilliz Cloud provides a managed alternative with the same API.

Qdrant (Self-Hosted or Cloud, with GPU Indexing)

Qdrant Cloud launched GPU-accelerated indexing and Multi-AZ clusters in April 2026:

from qdrant_client import QdrantClient
from qdrant_client.models import (
    VectorParams, Distance, PointStruct, Filter, FieldCondition, Range, HnswConfigDiff
)

client = QdrantClient(url="http://localhost:6333")

# Create collection with explicit HNSW parameters.
# GPU indexing is configured on the server (GPU-enabled build), not through
# a create_collection argument, so no GPU flag is passed here.
client.create_collection(
    collection_name="documents",
    vectors_config=VectorParams(size=1024, distance=Distance.COSINE),
    hnsw_config=HnswConfigDiff(m=16, ef_construct=200, full_scan_threshold=10000)
)

# Upsert
client.upsert(
    collection_name="documents",
    points=[
        PointStruct(
            id=1,
            vector=[0.012, -0.045, 0.098],  # truncated; must be 1024-dim
            payload={"title": "Qdrant GPU Indexing", "year": 2026}
        )
    ]
)

# Search with filter (query_points replaces the deprecated search method)
results = client.query_points(
    collection_name="documents",
    query=[0.015, -0.042, 0.095],  # truncated; must be 1024-dim
    limit=5,
    query_filter=Filter(
        must=[
            FieldCondition(key="year", range=Range(gte=2025))
        ]
    )
)
print(results.points)

Qdrant’s GPU-accelerated indexing reduces index build time by up to 10x on supported hardware (NVIDIA A10G or better) while maintaining the same recall quality.

Feature Comparison (2026)

| Feature | Pinecone | Weaviate v1.37 | Milvus 2.5 | Qdrant |
|---|---|---|---|---|
| Deployment | Managed only | Self-hosted + Cloud | Self-hosted | Self-hosted + Cloud |
| Hybrid Search | Via sparse-dense | Built-in (alpha param) | Built-in | Built-in |
| GPU Acceleration | No | No | Yes (Zilliz Cloud) | Yes (Qdrant Cloud) |
| MCP Server | No | Yes (v1.37) | No | No |
| Disk Index | No | No | DiskANN | Yes |
| Free Tier | Yes (1GB) | Yes (Cloud sandbox) | OSS (self-host) | Yes (1GB Cloud) |
| Latest Version | 2026.1 | v1.37.0 (Apr 2026) | v2.5.27 (Feb 2026) | Cloud Apr 2026 |
| Best For | Zero-ops managed | Developer flexibility | Enterprise scale | High performance |

Performance Benchmarks

| Index Type | Build Time (1M vectors, 1024d) | Query Latency (p99) | Recall@10 |
|---|---|---|---|
| HNSW (M=16, ef=200) | 12 min | 8 ms | 99.2% |
| IVF (nlist=4096) | 6 min | 15 ms | 96.5% |
| DiskANN | 20 min | 25 ms | 97.0% |
| HNSW + GPU (Qdrant) | 2.1 min | 5 ms | 99.1% |

Hardware: 2x Intel Xeon, 64GB RAM, NVIDIA A10G (for GPU tests). Dataset: 1M vectors, 1024 dimensions, cosine similarity.
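For readers who want to run a comparable measurement on their own data, the sketch below shows one common way to compute the two quality metrics in the table: Recall@K against exact ground truth, and a nearest-rank p99 latency. It does not describe how the figures above were produced; the function names and sample values are hypothetical.

```python
import math


def recall_at_k(approx_ids, exact_ids, k):
    """Fraction of the true top-k neighbors that the ANN index returned."""
    truth = set(exact_ids[:k])
    if not truth:
        return 0.0
    found = truth.intersection(approx_ids[:k])
    return len(found) / len(truth)


def percentile(values, pct):
    """Nearest-rank percentile of a non-empty list, with pct in (0, 100]."""
    ordered = sorted(values)
    rank = math.ceil(pct * len(ordered) / 100)
    return ordered[max(rank, 1) - 1]


# Hypothetical result for one query: the index missed one true neighbor (id 10).
exact = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
approx = [1, 2, 3, 4, 5, 6, 7, 8, 9, 11]
print(recall_at_k(approx, exact, k=10))

# Hypothetical per-query latencies in milliseconds.
latencies_ms = list(range(1, 101))
print(percentile(latencies_ms, 99))
```

In practice, recall is averaged over many queries, with the exact neighbors computed once by brute force on the same dataset.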
