
Vector Databases: Pinecone vs Milvus vs Weaviate

Introduction

Vector databases are the backbone of modern AI applications. They enable semantic search, recommendation systems, and retrieval-augmented generation by efficiently storing and querying embeddings.

This guide compares major vector database solutions and implementation patterns.


What Are Vector Databases?

How Vector Search Works

Traditional Database:
Query: "SELECT * WHERE name = 'John'"
Match: Exact string match only

Vector Database:
Query: embed("Find customers similar to John") → [0.2, 0.8, 0.1, ...]
Match: Find vectors closest in semantic space
Result: Similar names, similar attributes, similar behavior
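
To make "closest in semantic space" concrete, here is a minimal brute-force sketch of the search a vector database performs. Production systems replace this full scan with approximate nearest-neighbor (ANN) indexes such as HNSW or IVF; the toy vectors below are illustrative only.

import numpy as np

# Toy corpus of 4-dimensional "embeddings" (real ones have hundreds of dims)
vectors = np.array([
    [0.2, 0.8, 0.1, 0.3],
    [0.9, 0.1, 0.4, 0.2],
    [0.3, 0.7, 0.2, 0.4],
])

def nearest_neighbors(query_vec, vectors, top_k=2):
    """Brute-force cosine-similarity search: the operation a vector DB accelerates."""
    # Normalize rows so the dot product equals cosine similarity
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query_vec = query_vec / np.linalg.norm(query_vec)
    scores = vectors @ query_vec
    return np.argsort(scores)[::-1][:top_k]  # indices, best match first

print(nearest_neighbors(np.array([0.25, 0.75, 0.15, 0.35]), vectors))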

Use Cases

✅ Semantic search (Google-like, not keyword-based)
✅ Recommendation systems (similarity matching)
✅ RAG (Retrieval-Augmented Generation)
✅ Image/video search (visual similarity)
✅ Duplicate detection
✅ Anomaly detection
✅ Question answering

Embedding Models

Text Embeddings

from sentence_transformers import SentenceTransformer

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed texts
texts = [
    "The cat sat on the mat",
    "A dog plays in the park",
    "Feline rests on fabric"
]

embeddings = model.encode(texts)
print(embeddings.shape)  # (3, 384) - three 384-dimensional vectors

# Similarity search
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([embeddings[0]], embeddings[1:])
# e.g. [[0.15, 0.72]] - the first text is most similar to the third ("cat" ≈ "feline")

Model                   Dimensions  Speed    Quality    Use Case
────────────────────────────────────────────────────────────────
all-MiniLM-L6-v2        384         Fast     Good       General purpose
all-mpnet-base-v2       768         Medium   Excellent  General purpose
bge-small-en            384         Fast     Good       English retrieval
bge-large-en            1024        Slow     Excellent  English retrieval
text-embedding-3-small  1536        Medium   Excellent  OpenAI API
text-embedding-3-large  3072        Slow     Excellent  OpenAI API
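
For the hosted OpenAI models in the table, embeddings come from the API rather than a local model. A minimal sketch using the current openai Python SDK:

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.embeddings.create(
    model="text-embedding-3-small",
    input=["The cat sat on the mat"],
)
vector = response.data[0].embedding
print(len(vector))  # 1536 dimensions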

Pinecone

What is Pinecone?

Pinecone = Fully managed vector database
├── No infrastructure management
├── Automatic scaling
├── Built-in filtering
├── Multi-region support
└── Pay per usage

Pricing Model

Pinecone Pricing (approximate; check current plans):
- Index storage: ~$0.11 per 1,000 vectors/month (1M vectors ≈ $110/month)
- Vector capacity: Scales automatically
- Queries: Included
- Free tier: 1M vectors total

Example: 100M vectors
Cost: ~$11,000/month
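
A back-of-the-envelope estimate under the per-1,000-vector rate above (a rough sketch; actual bills depend on plan, region, and dimensions):

def estimate_monthly_cost(num_vectors: int, rate_per_1k: float = 0.11) -> float:
    """Rough storage cost: rate is USD per 1,000 vectors per month."""
    return num_vectors / 1_000 * rate_per_1k

print(estimate_monthly_cost(1_000_000))    # 110.0
print(estimate_monthly_cost(100_000_000))  # 11000.0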

Implementation

from pinecone import Pinecone, ServerlessSpec

# Initialize (the legacy pinecone.init()/environment API is deprecated)
pc = Pinecone(api_key="YOUR_API_KEY")

# Create index
pc.create_index(
    name="document-search",
    dimension=384,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)

# Get index
index = pc.Index("document-search")

# Upsert vectors as (id, values, metadata) tuples
vectors = [
    ("id-1", [0.1, 0.2, 0.3, ...], {"text": "Sample text"}),
    ("id-2", [0.4, 0.5, 0.6, ...], {"text": "Another text"}),
]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    include_metadata=True,
)

for match in results.matches:
    print(f"ID: {match.id}, Score: {match.score}")
    print(f"Metadata: {match.metadata}")

Milvus

What is Milvus?

Milvus = Open-source vector database
├── Self-hosted (full control)
├── Scalable (distributed)
├── Fast (optimized for vectors)
├── No vendor lock-in
└── Free and open source

Architecture

Milvus Cluster:
├── Root Coordinator (metadata management)
├── Query Coordinators (query execution)
├── Data Coordinators (data management)
├── Query Nodes (query processing)
├── Data Nodes (data ingestion)
└── Index Nodes (index building)

Installation & Setup

# Run Milvus standalone with Docker (the official install scripts on
# milvus.io also start the required etcd and MinIO dependencies)
docker run -d \
  --name milvus-standalone \
  -p 19530:19530 \
  -p 9091:9091 \
  milvusdb/milvus:latest \
  milvus run standalone

# Or with Helm on Kubernetes
helm repo add milvus https://zilliztech.github.io/milvus-helm/
helm install milvus milvus/milvus -n milvus-ns
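
A quick connectivity check before building anything on top (a small sketch using pymilvus):

from pymilvus import connections, utility

connections.connect("default", host="localhost", port="19530")
print(utility.get_server_version())  # prints the server version if reachable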

Implementation

from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Create collection with schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("text", DataType.VARCHAR, max_length=1000),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),
])

collection = Collection(
    name="documents",
    schema=schema,
    using="default"
)

# Insert data (column-ordered lists matching the schema field order)
data = [
    [1, 2, 3],                                            # id
    ["doc1", "doc2", "doc3"],                             # text
    [[0.1, 0.2, ...], [0.4, 0.5, ...], [0.7, 0.8, ...]],  # embedding
]
collection.insert(data)

# Create index
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "IVF_FLAT",
        "metric_type": "L2",
        "params": {"nlist": 128},
    }
)

# Load the collection into memory before searching
collection.load()

# Search
results = collection.search(
    data=[[0.1, 0.2, 0.3, ...]],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=10,
    output_fields=["id", "text"],
)

for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Score: {hit.distance}")

Weaviate

What is Weaviate?

Weaviate = Graph + Vector Database
├── Semantic graph structure
├── GraphQL API
├── Built-in LLM integration
├── Self-hosted or managed
└── Open source core

Key Features

✅ GraphQL queries (semantic)
✅ Automatic vectorization
✅ LLM integration
✅ Horizontal scaling
✅ Multi-tenancy

Implementation

import weaviate

# Connect to Weaviate (v3 "classic" Python client shown here;
# the current v4 client uses a different connection API)
client = weaviate.Client("http://localhost:8080")

# Create class (schema)
class_definition = {
    "class": "Document",
    "properties": [
        {
            "name": "title",
            "dataType": ["string"],
        },
        {
            "name": "content",
            "dataType": ["text"],
        },
    ],
    "vectorizer": "text2vec-openai",  # Auto-vectorize
}

client.schema.create_class(class_definition)

# Add objects
doc1 = {
    "title": "Machine Learning Basics",
    "content": "Machine learning is a subset of AI..."
}

client.data_object.create(
    doc1,
    class_name="Document",
)

# Semantic search (GraphQL)
query = (
    client.query
    .get("Document", ["title", "content", "_additional {distance}"])
    .with_near_text({"concepts": ["neural networks"]})
    .with_limit(10)
    .do()
)

for doc in query["data"]["Get"]["Document"]:
    print(f"Title: {doc['title']}")
    print(f"Distance: {doc['_additional']['distance']}")

Comparison Matrix

Feature              Pinecone    Milvus      Weaviate
──────────────────────────────────────────────────────
Deployment           Managed     Self/Cloud  Self/Managed
Cost                 High        Low         Low
Scalability          Automatic   Manual      Good
Filtering            ✅ Yes      ✅ Yes      ✅ Yes
GraphQL              ❌ No       ❌ No       ✅ Yes
LLM Integration      Basic       No          ✅ Yes
Performance          Excellent   Excellent   Good
Learning Curve       Easy        Medium      Medium
Community            Large       Growing     Growing

Approximate pricing (1M vectors):
Pinecone:    ~$110/month (managed)
Milvus:      ~$50-100/month (self-hosted infrastructure)
Weaviate:    ~$100-200/month (managed)

Real-World Implementation: RAG System

from sentence_transformers import SentenceTransformer
from pymilvus import Collection, connections
from openai import OpenAI

class RAGSystem:
    def __init__(self):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        connections.connect("default", host="localhost", port="19530")
        # Assumes the "documents" collection and index from the Milvus
        # section above already exist
        self.collection = Collection("documents")
        self.collection.load()
        self.llm = OpenAI()  # reads OPENAI_API_KEY from the environment

    def index_documents(self, documents):
        """Add documents to the vector database"""
        embeddings = self.embedder.encode([d["text"] for d in documents])

        # Column-ordered data matching the schema: id, text, embedding
        data = [
            [d["id"] for d in documents],
            [d["text"] for d in documents],
            embeddings.tolist(),
        ]
        self.collection.insert(data)
        self.collection.flush()

    def retrieve(self, query, top_k=5):
        """Retrieve the text of the most relevant documents"""
        query_embedding = self.embedder.encode([query])[0].tolist()

        results = self.collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 10}},
            limit=top_k,
            output_fields=["text"],
        )

        return [hit.entity.get("text") for hit in results[0]]

    def generate_answer(self, query):
        """RAG: Retrieve + Generate"""
        # Step 1: Retrieve context
        context = " ".join(self.retrieve(query))

        # Step 2: Generate with the LLM
        prompt = f"""Based on this context, answer the question.

Context: {context}

Question: {query}

Answer:"""

        response = self.llm.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )

        return response.choices[0].message.content

# Usage
rag = RAGSystem()

# Index documents
documents = [
    {"id": 1, "text": "Machine learning uses algorithms to learn from data"},
    {"id": 2, "text": "Deep learning uses neural networks with multiple layers"},
]
rag.index_documents(documents)

# Generate answer
answer = rag.generate_answer("What is machine learning?")
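
Real corpora rarely arrive as tidy one-sentence documents. A simple (hypothetical) helper like chunk_text below splits long text into overlapping word windows before indexing, which usually improves retrieval granularity:

def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping chunks of roughly chunk_size words."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[i:i + chunk_size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]

# Each chunk then becomes its own document for rag.index_documents()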

Performance Benchmarks

Query Latency (100M vectors, top-100 results):

Operation           Pinecone    Milvus      Weaviate
─────────────────────────────────────────────────────
Single query        50-100ms    30-80ms     100-150ms
Batch (100)         500-1000ms  400-800ms   1000-2000ms
Filtered search     100-200ms   80-150ms    200-300ms
Range search        200-400ms   150-300ms   400-600ms
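
These figures vary widely with hardware, index type, and recall targets, so it is worth measuring on your own data. A minimal timing sketch against the Milvus collection above:

import time

def measure_latency(collection, query_vector, runs=100):
    """Median single-query latency in milliseconds."""
    latencies = []
    for _ in range(runs):
        start = time.perf_counter()
        collection.search(
            data=[query_vector],
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 10}},
            limit=100,
        )
        latencies.append((time.perf_counter() - start) * 1000)
    return sorted(latencies)[len(latencies) // 2]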

Glossary

  • Vector Database: Database optimized for vector/embedding storage
  • Embedding: Numerical representation of semantic meaning
  • Vector Search: Finding similar vectors efficiently
  • RAG: Retrieval-Augmented Generation
  • Cosine Similarity: Measure of vector similarity
