Introduction
Vector databases are the backbone of modern AI applications. They enable semantic search, recommendation systems, and retrieval-augmented generation by efficiently storing and querying embeddings.
This guide compares major vector database solutions and implementation patterns.
What Are Vector Databases?
A vector database stores high-dimensional embeddings and retrieves the nearest neighbors of a query vector, typically using approximate nearest-neighbor (ANN) indexes rather than exact scans.
How Vector Search Works
Traditional Database:
  Query: SELECT * WHERE name = 'John'
  Match: Exact string match only

Vector Database:
  Query: embed("Find customers similar to John") → [0.2, 0.8, 0.1, ...]
  Match: Find vectors closest in semantic space
  Result: Similar names, similar attributes, similar behavior
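"Closest in semantic space" boils down to nearest-neighbor search. Here is a minimal brute-force sketch with NumPy; real vector databases replace this full scan with approximate nearest-neighbor (ANN) indexes such as HNSW or IVF:

import numpy as np

def search(query: np.ndarray, vectors: np.ndarray, top_k: int = 3) -> np.ndarray:
    """Return indices of the top_k vectors most similar to the query."""
    # Normalize so that a dot product equals cosine similarity
    q = query / np.linalg.norm(query)
    v = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    scores = v @ q
    # Sort descending and keep the best top_k
    return np.argsort(scores)[::-1][:top_k]

vectors = np.random.rand(1000, 384)  # stand-ins for document embeddings
query = np.random.rand(384)
print(search(query, vectors))        # e.g. [412 87 903]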
Use Cases
✅ Semantic search (meaning-based, not keyword-based)
✅ Recommendation systems (similarity matching)
✅ RAG (Retrieval-Augmented Generation)
✅ Image/video search (visual similarity)
✅ Duplicate detection
✅ Anomaly detection
✅ Question answering
Embedding Models
Text Embeddings
from sentence_transformers import SentenceTransformer

# Load embedding model
model = SentenceTransformer('all-MiniLM-L6-v2')

# Embed texts
texts = [
    "The cat sat on the mat",
    "A dog plays in the park",
    "Feline rests on fabric",
]
embeddings = model.encode(texts)
print(embeddings.shape)  # (3, 384) - 384-dimensional vectors

# Similarity search
from sklearn.metrics.pairwise import cosine_similarity
similarities = cosine_similarity([embeddings[0]], embeddings[1:])
# e.g. [[0.15, 0.72]] - the first text is most similar to the third ("Feline rests on fabric")
Popular Embedding Models
Model                   Dimensions  Speed   Quality    Use Case
----------------------------------------------------------------
all-MiniLM-L6-v2        384         Fast    Good       General purpose
all-mpnet-base-v2       768         Medium  Excellent  General purpose
bge-small-en            384         Fast    Good       English
bge-large-en            1024        Slow    Excellent  English
text-embedding-3-small  1536        Medium  Excellent  OpenAI API
text-embedding-3-large  3072        Slow    Excellent  OpenAI API
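Dimension count drives storage and query cost roughly linearly. A quick back-of-the-envelope sketch for raw float32 storage (index overhead excluded):

def index_size_gb(num_vectors: int, dims: int, bytes_per_float: int = 4) -> float:
    """Rough raw-storage estimate for float32 embeddings (index overhead excluded)."""
    return num_vectors * dims * bytes_per_float / 1024**3

print(index_size_gb(1_000_000, 384))   # ~1.4 GB for all-MiniLM-L6-v2
print(index_size_gb(1_000_000, 3072))  # ~11.4 GB for text-embedding-3-large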
Pinecone
What is Pinecone?
Pinecone = Fully managed vector database
├── No infrastructure management
├── Automatic scaling
├── Built-in filtering
├── Multi-region support
└── Pay per usage
Pricing Model
Pinecone Pricing (indicative; check current plans):
- Index storage: ~$0.11 per 1,000 vectors/month (1M vectors ≈ $110/month)
- Vector capacity: Scales automatically
- Queries: Included
- Free tier: 1M vectors total
Example: 100M vectors
Cost: ~$11,000/month
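The arithmetic above as a small sketch (the $0.11 per 1,000 vectors rate is the figure quoted here; verify it against current Pinecone pricing before budgeting):

def pinecone_monthly_cost(num_vectors: int, rate_per_1k: float = 0.11) -> float:
    """Estimated monthly storage cost at the rate quoted above."""
    return num_vectors / 1000 * rate_per_1k

print(pinecone_monthly_cost(1_000_000))    # 110.0
print(pinecone_monthly_cost(100_000_000))  # 11000.0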
Implementation
import pinecone

# Initialize (legacy pinecone-client v2 API; v3+ uses pinecone.Pinecone(api_key=...))
pinecone.init(api_key="YOUR_API_KEY", environment="us-west1-gcp")

# Create index
pinecone.create_index(
    name="document-search",
    dimension=384,  # must match the embedding model's output size
    metric="cosine",
)

# Get index
index = pinecone.Index("document-search")

# Upsert vectors as (id, values, metadata) tuples
vectors = [
    ("id-1", [0.1, 0.2, 0.3, ...], {"text": "Sample text"}),
    ("id-2", [0.4, 0.5, 0.6, ...], {"text": "Another text"}),
]
index.upsert(vectors=vectors)

# Query
results = index.query(
    vector=[0.1, 0.2, 0.3, ...],
    top_k=10,
    include_metadata=True,
)
for match in results["matches"]:
    print(f"ID: {match['id']}, Score: {match['score']}")
    print(f"Metadata: {match['metadata']}")
Milvus
What is Milvus?
Milvus = Open-source vector database
├── Self-hosted (full control)
├── Scalable (distributed)
├── Fast (optimized for vectors)
├── No vendor lock-in
└── Free and open source
Architecture
Milvus Cluster:
├── Root Coordinator (metadata management)
├── Query Coordinators (query execution)
├── Data Coordinators (data management)
├── Query Nodes (query processing)
├── Data Nodes (data ingestion)
└── Index Nodes (index building)
Installation & Setup
# Install with Docker
# Note: standalone Milvus also depends on etcd and object storage;
# the official docker-compose file wires these up for you.
docker run -d \
  --name milvus \
  -p 19530:19530 \
  -p 9091:9091 \
  milvusdb/milvus:latest

# Or Kubernetes
helm install milvus milvus/milvus -n milvus-ns
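Once the container is up, a quick connectivity check, assuming the pymilvus 2.x client used below:

from pymilvus import connections, utility

# Connect and confirm the server is reachable
connections.connect("default", host="localhost", port="19530")
print(utility.get_server_version())  # e.g. "v2.3.x"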
Implementation
from pymilvus import (
    Collection, CollectionSchema, FieldSchema, DataType, connections,
)

# Connect
connections.connect("default", host="localhost", port="19530")

# Create collection with schema
schema = CollectionSchema([
    FieldSchema("id", DataType.INT64, is_primary=True),
    FieldSchema("text", DataType.VARCHAR, max_length=1000),
    FieldSchema("embedding", DataType.FLOAT_VECTOR, dim=384),
])
collection = Collection(
    name="documents",
    schema=schema,
    using="default",
)

# Insert data (column order matches the schema: id, text, embedding)
data = [
    [1, 2, 3],
    ["doc1", "doc2", "doc3"],
    [[0.1, 0.2, ...], [0.4, 0.5, ...], [0.7, 0.8, ...]],
]
collection.insert(data)

# Create index
collection.create_index(
    field_name="embedding",
    index_params={
        "index_type": "IVF_FLAT",
        "metric_type": "L2",
        "params": {"nlist": 128},
    },
)

# Load the collection into memory, then search
collection.load()
results = collection.search(
    data=[[0.1, 0.2, 0.3, ...]],
    anns_field="embedding",
    param={"metric_type": "L2", "params": {"nprobe": 10}},
    limit=10,
    output_fields=["id", "text"],
)
for hits in results:
    for hit in hits:
        print(f"ID: {hit.id}, Distance: {hit.distance}")
Weaviate
What is Weaviate?
Weaviate = Graph + Vector Database
├── Semantic graph structure
├── GraphQL API
├── Built-in LLM integration
├── Self-hosted or managed
└── Open source core
Key Features
✅ GraphQL queries (semantic)
✅ Automatic vectorization
✅ LLM integration
✅ Horizontal scaling
✅ Multi-tenancy
Implementation
import weaviate

# Connect to Weaviate (v3 Python client; v4 uses weaviate.connect_to_local())
client = weaviate.Client("http://localhost:8080")

# Create class (schema)
class_definition = {
    "class": "Document",
    "properties": [
        {
            "name": "title",
            "dataType": ["string"],
        },
        {
            "name": "content",
            "dataType": ["text"],
        },
    ],
    # Auto-vectorize; requires an OpenAI API key configured on the server
    "vectorizer": "text2vec-openai",
}
client.schema.create_class(class_definition)

# Add objects
doc1 = {
    "title": "Machine Learning Basics",
    "content": "Machine learning is a subset of AI..."
}
client.data_object.create(
    doc1,
    class_name="Document",
)

# Semantic search (GraphQL)
query = (
    client.query
    .get("Document", ["title", "content", "_additional {distance}"])
    .with_near_text({"concepts": ["neural networks"]})
    .with_limit(10)
    .do()
)
for doc in query["data"]["Get"]["Document"]:
    print(f"Title: {doc['title']}")
    print(f"Distance: {doc['_additional']['distance']}")
Comparison Matrix
Feature          Pinecone    Milvus       Weaviate
----------------------------------------------------
Deployment       Managed     Self/Cloud   Self/Managed
Cost             High        Low          Low
Scalability      Automatic   Manual       Good
Filtering        ✅ Yes      ✅ Yes       ✅ Yes
GraphQL          ❌ No       ❌ No        ✅ Yes
LLM Integration  Basic       No           ✅ Yes
Performance      Excellent   Excellent    Good
Learning Curve   Easy        Medium       Medium
Community        Large       Growing      Growing
Pricing Comparison (1M vectors, indicative):
Pinecone: ~$110/month
Milvus: ~$50-100/month (self-hosted infrastructure costs)
Weaviate: ~$100-200/month (managed)
Real-World Implementation: RAG System
from sentence_transformers import SentenceTransformer
from pymilvus import Collection, connections
import openai

class RAGSystem:
    def __init__(self):
        self.embedder = SentenceTransformer('all-MiniLM-L6-v2')
        connections.connect("default", host="localhost", port="19530")
        # Assumes the "documents" collection from the Milvus section exists
        self.collection = Collection("documents")

    def index_documents(self, documents):
        """Add documents to the vector database"""
        embeddings = self.embedder.encode([d["text"] for d in documents])
        # Column order matches the schema: id, text, embedding
        data = [
            [d["id"] for d in documents],
            [d["text"] for d in documents],
            embeddings.tolist(),
        ]
        self.collection.insert(data)
        self.collection.flush()

    def retrieve(self, query, top_k=5):
        """Retrieve the texts of the most relevant documents"""
        query_embedding = self.embedder.encode([query])[0]
        self.collection.load()
        results = self.collection.search(
            data=[query_embedding.tolist()],
            anns_field="embedding",
            param={"metric_type": "L2", "params": {"nprobe": 10}},
            limit=top_k,
            output_fields=["text"],
        )
        return [hit.entity.get("text") for hit in results[0]]

    def generate_answer(self, query):
        """RAG: Retrieve + Generate"""
        # Step 1: Retrieve context
        context = " ".join(self.retrieve(query))
        # Step 2: Generate with LLM (legacy openai<1.0 API)
        prompt = f"""Based on this context, answer the question.
Context: {context}
Question: {query}
Answer:"""
        response = openai.ChatCompletion.create(
            model="gpt-3.5-turbo",
            messages=[{"role": "user", "content": prompt}],
        )
        return response["choices"][0]["message"]["content"]

# Usage
rag = RAGSystem()

# Index documents
documents = [
    {"id": 1, "text": "Machine learning uses algorithms to learn from data"},
    {"id": 2, "text": "Deep learning uses neural networks with multiple layers"},
]
rag.index_documents(documents)

# Generate answer
answer = rag.generate_answer("What is machine learning?")
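One step this example glosses over: production RAG pipelines rarely embed whole documents. A minimal word-based chunking helper as a sketch (hypothetical; real systems often split on sentences or tokens instead):

def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into overlapping word-based chunks (overlap < chunk_size)."""
    words = text.split()
    step = chunk_size - overlap
    return [
        " ".join(words[start:start + chunk_size])
        for start in range(0, len(words), step)
    ]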
Performance Benchmarks
Query Latency (100M vectors, top-100 results):
Operation        Pinecone     Milvus      Weaviate
----------------------------------------------------
Single query     50-100ms     30-80ms     100-150ms
Batch (100)      500-1000ms   400-800ms   1000-2000ms
Filtered search  100-200ms    80-150ms    200-300ms
Range search     200-400ms    150-300ms   400-600ms
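Numbers like these vary heavily with hardware, index parameters, and dataset. A small sketch for measuring your own latencies, where search_fn wraps any of the query calls shown earlier:

import time
import statistics

def benchmark(search_fn, queries, warmup: int = 5) -> None:
    """Print p50/p95 latency of search_fn over a list of query inputs."""
    for q in queries[:warmup]:  # warm caches before measuring
        search_fn(q)
    latencies = []
    for q in queries:
        start = time.perf_counter()
        search_fn(q)
        latencies.append((time.perf_counter() - start) * 1000)  # ms
    latencies.sort()
    print(f"p50: {statistics.median(latencies):.1f} ms")
    print(f"p95: {latencies[int(len(latencies) * 0.95)]:.1f} ms")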
Glossary
- Vector Database: Database optimized for vector/embedding storage
- Embedding: Numerical representation of semantic meaning
- Vector Search: Finding similar vectors efficiently
- RAG: Retrieval-Augmented Generation
- Cosine Similarity: Similarity measure based on the angle between two vectors (1 = same direction, 0 = orthogonal)