Introduction
Vector databases are the backbone of modern AI applications. They enable semantic search, recommendation systems, and retrieval-augmented generation by efficiently storing and querying embeddings.
This guide compares major vector database solutions and implementation patterns.
What Are Vector Databases?
How Vector Search Works
Traditional Database:
Query: "SELECT * WHERE name = 'John'"
Match: Exact string match only
Vector Database:
Query: embed("Find customers similar to John") → [0.2, 0.8, 0.1, ...]
Match: Find vectors closest in semantic space
Result: Similar names, similar attributes, similar behavior
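Under the hood, "closest in semantic space" usually means ranking stored vectors by cosine similarity (or a distance metric) against the query vector. A minimal brute-force sketch with NumPy — the vectors and query below are illustrative placeholders, not real embeddings:

import numpy as np

# Toy "database" of stored embeddings (one row per item) -- illustrative values
vectors = np.array([
    [0.2, 0.8, 0.1],
    [0.9, 0.1, 0.3],
    [0.3, 0.7, 0.2],
])
query = np.array([0.2, 0.8, 0.1])

# Cosine similarity = dot product of L2-normalized vectors
normed = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
q = query / np.linalg.norm(query)
scores = normed @ q

# Indices of the top-2 closest vectors, best first
top_k = np.argsort(scores)[::-1][:2]
print(top_k, scores[top_k])

Real vector databases replace this O(n) linear scan with approximate nearest-neighbor (ANN) indexes such as HNSW or IVF, trading a little recall for much lower latency.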
Use Cases
✅ Semantic search (Google-like, not keyword-based)
✅ Recommendation systems (similarity matching)
✅ RAG (Retrieval-Augmented Generation)
✅ Image/video search (visual similarity)
✅ Duplicate detection
✅ Anomaly detection
✅ Question answering
Embedding Models
Text Embeddings
from sentence_transformers import SentenceTransformer

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed texts
texts = [
    "The cat sat on the mat",
    "A dog plays in the park",
    "Feline rests on fabric",
]
embeddings = model.encode(texts)
print(embeddings.shape)  # (3, 384) - 384-dimensional vectors

# Similarity search
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([embeddings[0]], embeddings[1:])
# e.g. [[0.15, 0.72]] - the first text is most similar to the third ("Feline rests on fabric")
Popular Embedding Models
Model                    Dimensions  Speed   Quality    Use Case
──────────────────────────────────────────────────────────────
all-MiniLM-L6-v2         384         Fast    Good       General purpose
all-mpnet-base-v2        768         Medium  Excellent  General purpose
bge-small-en             384         Fast    Good       English retrieval
bge-large-en             1024        Slow    Excellent  English retrieval
text-embedding-3-small   1536        Medium  Excellent  OpenAI API
text-embedding-3-large   3072        Slow    Excellent  OpenAI API
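Switching models is a one-line change, but the vector index must be created with a dimension that matches the model's output. A quick way to check, using a model name from the table above:

from sentence_transformers import SentenceTransformer

# Higher-quality general-purpose model from the table above
model = SentenceTransformer('all-mpnet-base-v2')

# The vector database index must be created with this dimension (768 here)
print(model.get_sentence_embedding_dimension())  # 768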
Pinecone
What is Pinecone?
Pinecone = Fully managed vector database
├── No infrastructure management
├── Automatic scaling
├── Built-in filtering
├── Multi-region support
└── Pay per usage
Pricing Model
Pinecone Pricing (illustrative; plans change, check current pricing):
- Index storage: ~$0.11 per 1,000 vectors/month (1M vectors ≈ $110/month)
- Vector capacity: Scales automatically
- Queries: Included
- Free tier: 1M vectors total
Example: 100M vectors
Cost: ~$11,000/month
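A back-of-the-envelope cost helper using the rate above (the per-1K rate is taken from this illustrative table, not a quoted price):

# Rough monthly storage cost at ~$0.11 per 1,000 vectors (illustrative rate)
def pinecone_monthly_cost(num_vectors: int, rate_per_1k: float = 0.11) -> float:
    return num_vectors / 1_000 * rate_per_1k

print(pinecone_monthly_cost(1_000_000))    # 110.0
print(pinecone_monthly_cost(100_000_000))  # 11000.0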
Implementation
# Uses the legacy pinecone-client v2 API; newer SDK versions (v3+)
# use `from pinecone import Pinecone` instead of `pinecone.init`.
import pinecone

# Initialize
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# Create index
pinecone.create_index(
    name="document-search",
    dimension=384,   # must match the embedding model's output dimension
    metric="cosine",
)

# Get index
index = pinecone.Index("document-search")

# Upsert vectors as (id, values, metadata) tuples
vectors = [
    ("id-1", [0.1, 0.2, 0.3, ...], {"text": "Sample text"}),
    ("id-2", [0.4, 0.5, 0.6, ...], {"text": "Another text"}),
]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    include_metadata=True,
)
for match in results["matches"]:
    print(f"ID: {match['id']}, Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
Milvus
What is Milvus?
Milvus = Open-source vector database
├── Self-hosted (full control)
├── Scalable (distributed)
├── Fast (optimized for vectors)
├── No vendor lock-in
└── Free and open source
Architecture
Milvus Cluster:
├── Root Coordinator (metadata management)
├── Query Coordinator (query scheduling)
├── Data Coordinator (data management)
├── Query Nodes (query processing)
├── Data Nodes (data ingestion)
└── Index Nodes (index building)
Installation & Setup
# Install with Docker (standalone mode; production setups typically use
# the official docker-compose file, which also starts etcd and MinIO)
docker run -d \
  --name milvus \
  -p 19530:19530 \
  -p 9091:9091 \
  milvusdb/milvus:latest

# Or Kubernetes (after adding the Milvus Helm chart repository)
helm install milvus milvus/milvus -n milvus-ns
Implementation
from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections,
)

# Connect
connections.connect("default", host="localhost", port=19530)

# Create collection with schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("text", DataType.VARCHAR, max_length=1000),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),
])
collection = Collection(
    name="documents",
    schema=schema,
    using="default",
)

# Insert data (column-based: one list per field, in schema order)
data = [
    [1, 2, 3],                                             # id
    ["doc1", "doc2", "doc3"],                              # text
    [[0.1, 0.2, ...], [0.4, 0.5, ...], [0.7, 0.8, ...]],   # embedding
]
collection.insert(data)

# Create index, then load the collection into memory before searching
collection.create_index(
    field_name="embedding",
    index_params={"index_type": "IVF_FLAT", "metric_type": "L2", "params": {"nlist": 128}},
)
collection.load()

# Search
results = collection.search(
    data=[[0.1, 0.2, 0.3, ...]],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=10,
    output_fields=["id", "text"],
)
for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}")  # L2: lower is closer
Weaviate
What is Weaviate?
Weaviate = Vector database with a graph-like data model
├── Cross-references between objects (graph-like schema)
├── GraphQL API
├── Built-in vectorizer & LLM integrations
├── Self-hosted or managed
└── Open source core
Key Features
✅ GraphQL queries (semantic)
✅ Automatic vectorization
✅ LLM integration
✅ Horizontal scaling
✅ Multi-tenancy
Implementation
# Uses the Weaviate Python client v3 API; the v4 client
# (weaviate.connect_to_local, collections API) differs.
import weaviate

# Connect to Weaviate
client = weaviate.Client("http://localhost:8080")

# Create class (schema)
class_definition = {
    "class": "Document",
    "properties": [
        {
            "name": "title",
            "dataType": ["text"],
        },
        {
            "name": "content",
            "dataType": ["text"],
        },
    ],
    # Auto-vectorize on insert (requires an OpenAI API key
    # configured on the Weaviate server)
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_definition)

# Add objects
doc1 = {
    "title": "Machine Learning Basics",
    "content": "Machine learning is a subset of AI...",
}
client.data_object.create(
    doc1,
    class_name="Document",
)

# Semantic search (GraphQL)
query = (
    client.query
    .get("Document", ["title", "content", "_additional {distance}"])
    .with_near_text({"concepts": ["neural networks"]})
    .with_limit(10)
    .do()
)
for doc in query["data"]["Get"]["Document"]:
    print(f"Title: {doc['title']}")
    print(f"Distance: {doc['_additional']['distance']}")
Comparison Matrix
Feature Pinecone Milvus Weaviate
─────────────────────────────────────────────────────
Deployment Managed Self/Cloud Self/Managed
Cost High Low Low
Scalability Automatic Manual Good
Filtering ✅ Yes ✅ Yes ✅ Yes
GraphQL ❌ No ❌ No ✅ Yes
LLM Integration Basic No ✅ Yes
Performance Excellent Excellent Good
Learning Curve Easy Medium Medium
Community Large Growing Growing
Pricing Comparison (1M vectors, approximate):
Pinecone: ~$110/month (managed)
Milvus:   ~$50-100/month (self-hosted infrastructure costs)
Weaviate: ~$100-200/month (managed)
Real-World Implementation: RAG System
from sentence_transformers import SentenceTransformer
from pymilvus import Collection, connections
from openai import OpenAI

class RAGSystem:
    def __init__(self):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        connections.connect("default", host="localhost", port=19530)
        self.collection = Collection("documents")
        self.collection.load()  # load into memory before searching
        self.llm = OpenAI()     # reads OPENAI_API_KEY from the environment

    def index_documents(self, documents):
        """Add documents to the vector database."""
        embeddings = self.embedder.encode([d["text"] for d in documents])
        data = [
            [d["id"] for d in documents],    # id
            [d["text"] for d in documents],  # text
            embeddings.tolist(),             # embedding
        ]
        self.collection.insert(data)

    def retrieve(self, query, top_k=5):
        """Retrieve the texts of the most relevant documents."""
        query_embedding = self.embedder.encode([query])[0].tolist()
        results = self.collection.search(
            data=[query_embedding],
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 10}},
            limit=top_k,
            output_fields=["text"],
        )
        return [hit.entity.get("text") for hit in results[0]]

    def generate_answer(self, query):
        """RAG: retrieve context, then generate an answer."""
        # Step 1: Retrieve context
        context = " ".join(self.retrieve(query))
        # Step 2: Generate with LLM
        prompt = f"""Based on this context, answer the question.
Context: {context}
Question: {query}
Answer:"""
        response = self.llm.chat.completions.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response.choices[0].message.content

# Usage
rag = RAGSystem()

# Index documents
documents = [
    {"id": 1, "text": "Machine learning uses algorithms to learn from data"},
    {"id": 2, "text": "Deep learning uses neural networks with multiple layers"},
]
rag.index_documents(documents)

# Generate answer
answer = rag.generate_answer("What is machine learning?")
Performance Benchmarks
Query Latency (indicative figures; 100M vectors, top-100 results — varies with hardware, index type, and configuration):
Operation Pinecone Milvus Weaviate
─────────────────────────────────────────────────
Single query 50-100ms 30-80ms 100-150ms
Batch (100) 500-1000ms 400-800ms 1000-2000ms
Filtered search 100-200ms 80-150ms 200-300ms
Range search 200-400ms 150-300ms 400-600ms
Glossary
- Vector Database: Database optimized for vector/embedding storage
- Embedding: Numerical representation of semantic meaning
- Vector Search: Finding similar vectors efficiently
- RAG: Retrieval-Augmented Generation
- Cosine Similarity: Measure of vector similarity