Introduction
As AI applications become mainstream, the need to store and query high-dimensional embeddings has driven the emergence of vector databases. These specialized databases excel at similarity search, powering applications from semantic search engines to recommendation systems to retrieval-augmented generation (RAG) for generative AI.
In 2026, vector databases have matured from niche solutions to essential infrastructure for AI applications. This guide explores vector database concepts, implementation patterns, and practical guidance for building AI-powered applications.
Understanding Vector Databases
What Are Vector Embeddings?
Vector embeddings are numerical representations of data (text, images, audio) that capture semantic meaning:
# Text to embedding conversion
text = "The quick brown fox jumps over the lazy dog"

# Embedding model (e.g., OpenAI text-embedding-3-small)
embedding = [
    0.023, -0.089, 0.034,  # ... 1536 dimensions
    ...
]

# Similar texts have similar embeddings
similar_text = "A fast fox leaps over a sleepy canine"
similar_embedding = generate_embedding(similar_text)

# Cosine similarity measures semantic similarity
similarity = cosine_similarity(embedding, similar_embedding)
# 0.92 - high similarity!
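The `cosine_similarity` helper used above can be written in a few lines of NumPy. A minimal sketch (a real pipeline would get embeddings from a model rather than toy 2-D vectors):

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine similarity: dot product divided by the product of the norms.
    Ranges from -1 (opposite) through 0 (orthogonal) to 1 (same direction)."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

# Same direction -> 1.0; orthogonal -> 0.0 (magnitude does not matter)
print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # 1.0
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))  # 0.0
```

Note that cosine similarity ignores vector magnitude, which is why it is the default metric for most text-embedding models.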
Why Vector Databases?
Traditional databases struggle with similarity search:
-- Traditional exact match - won't find synonyms
SELECT * FROM products WHERE name = 'running shoes';
-- Vector search - finds semantic matches
SELECT * FROM products
WHERE embedding <=> '[0.023, -0.089, ...]' < 0.3;
Vector databases specialize in:
- Approximate nearest neighbor (ANN) search: Find similar items in milliseconds
- High-dimensional indexing: Handle embeddings with thousands of dimensions
- Scalability: Billions of vectors with fast queries
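For context, the baseline that an ANN index replaces is a brute-force linear scan. A sketch in NumPy with synthetic data (cosine metric): exact and simple, but cost grows linearly with collection size, which is exactly what ANN indexes avoid.

```python
import numpy as np

def exact_knn(query, vectors, k=5):
    """Exact k-nearest-neighbor search by cosine similarity.
    O(n * d) per query - fine for thousands of vectors, not for billions."""
    vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    query = query / np.linalg.norm(query)
    scores = vectors @ query                 # cosine similarity for all n vectors
    top = np.argpartition(-scores, k)[:k]    # best k, unordered
    return top[np.argsort(-scores[top])]     # best k, sorted by similarity

rng = np.random.default_rng(0)
db = rng.normal(size=(10_000, 64))           # 10k synthetic 64-d vectors
q = rng.normal(size=64)
print(exact_knn(q, db, k=3))
```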
Use Cases
| Use Case | Description |
|---|---|
| Semantic Search | Find relevant documents by meaning, not keywords |
| RAG | Retrieve context for LLM generation |
| Recommendations | Similar products, content, users |
| Image Search | Find visually similar images |
| Fraud Detection | Identify anomalous patterns |
| Deduplication | Find duplicate content |
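To make the deduplication row concrete: near-duplicates can be flagged by thresholding pairwise cosine similarity. A small NumPy sketch (the 0.95 threshold is an arbitrary example value; tune it per corpus):

```python
import numpy as np

def find_near_duplicates(embeddings, threshold=0.95):
    """Return index pairs (i, j) whose cosine similarity exceeds threshold."""
    X = np.asarray(embeddings, dtype=float)
    X = X / np.linalg.norm(X, axis=1, keepdims=True)  # normalize rows
    sims = X @ X.T                                    # all pairwise similarities
    pairs = []
    for i in range(len(X)):
        for j in range(i + 1, len(X)):
            if sims[i, j] > threshold:
                pairs.append((i, j))
    return pairs

vecs = np.array([[1.0, 0.0], [0.99, 0.05], [0.0, 1.0]])
print(find_near_duplicates(vecs))  # [(0, 1)]
```

The O(n²) pairwise comparison is only for illustration; at scale you would use the vector index itself to fetch each item's nearest neighbors.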
Vector Database Options
pgvector (PostgreSQL)
Open-source, runs in PostgreSQL:
# Enable pgvector in the database first: CREATE EXTENSION vector;
# pip install pgvector sqlalchemy
from pgvector.sqlalchemy import Vector
from sqlalchemy import Column, Integer, Text
from sqlalchemy.orm import declarative_base

Base = declarative_base()

# Define embedding column
class Document(Base):
    __tablename__ = 'documents'
    id = Column(Integer, primary_key=True)
    content = Column(Text)
    embedding = Column(Vector(1536))  # OpenAI embeddings

-- Create index for fast search (SQL)
CREATE INDEX ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);

-- Query for similar documents
-- Note: <=> returns cosine *distance* (lower = more similar)
SELECT content, (embedding <=> query_embedding) AS distance
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
Pros: Integrated with PostgreSQL, open-source, familiar SQL
Cons: Less optimized than specialized solutions
Pinecone
Managed vector database:
from pinecone import Pinecone

# Initialize (the older pinecone.init(...) API is deprecated)
pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("documents")

# Upsert vectors
index.upsert(
    vectors=[
        {
            "id": "doc1",
            "values": [0.023, -0.089, ...],  # embedding
            "metadata": {"text": "Document content"}
        },
    ],
    namespace="example"
)

# Query
results = index.query(
    vector=[0.023, -0.089, ...],
    top_k=5,
    include_metadata=True,
    namespace="example"
)
Pros: Fully managed, excellent performance, serverless option
Cons: Proprietary, cloud-only
Weaviate
Open-source with GraphQL and REST APIs:
import weaviate

# Connect (Weaviate v3 client)
client = weaviate.Client(
    url="http://localhost:8080",
    additional_headers={
        "X-OpenAI-Api-Key": "YOUR_KEY"
    }
)

# Add data (the vectorizer, e.g. text2vec-openai, is configured
# on the class schema, not per object)
client.data_object.create(
    class_name="Document",
    data_object={
        "content": "Your document text",
        "title": "Document Title"
    }
)

# Query
result = client.query.get(
    "Document",
    ["content", "title"]
).with_near_text({
    "concepts": ["search concept"]
}).with_limit(5).do()
Pros: Open-source, multiple vectorizers, GraphQL
Cons: Requires more setup than cloud services
Chroma
Lightweight, embedded:
import chromadb

# Create client (in-memory or persistent)
client = chromadb.PersistentClient(path="./chroma_db")

# Create collection
collection = client.create_collection("documents")

# Add embeddings
collection.add(
    documents=["Doc 1 content", "Doc 2 content"],
    ids=["doc1", "doc2"],
    embeddings=[[0.1, 0.2, ...], [0.3, 0.4, ...]]
)

# Query
results = collection.query(
    query_texts=["search query"],
    n_results=5
)
Pros: Simple, lightweight, great for prototyping
Cons: Limited production features
Implementation Patterns
Retrieval-Augmented Generation (RAG)
The most common pattern:
class RAGSystem:
    def __init__(self):
        self.llm = OpenAI()
        self.vector_db = Pinecone("index-name")
        self.embedder = OpenAIEmbeddings()

    def answer_question(self, question):
        # 1. Embed the question
        question_embedding = self.embedder.embed(question)

        # 2. Retrieve relevant context
        context_results = self.vector_db.query(
            vector=question_embedding,
            top_k=5
        )

        # 3. Build prompt with context
        context = "\n\n".join([
            r["metadata"]["text"] for r in context_results["matches"]
        ])
        prompt = f"""Answer the question based on this context:

Context: {context}

Question: {question}

Answer:"""

        # 4. Generate answer
        return self.llm.generate(prompt)
Semantic Search
import numpy as np

class SemanticSearch:
    def __init__(self, documents):
        self.documents = documents
        self.embedder = OpenAIEmbeddings()
        # Embed all documents once; normalize rows so that a
        # dot product equals cosine similarity
        embeddings = np.array(self.embedder.embed([
            doc["content"] for doc in documents
        ]))
        self.embeddings = embeddings / np.linalg.norm(
            embeddings, axis=1, keepdims=True
        )

    def search(self, query, top_k=10):
        # Embed and normalize the query
        q = np.array(self.embedder.embed(query))
        q = q / np.linalg.norm(q)

        # Cosine similarity against every document
        scores = self.embeddings @ q

        # Return top results, best first
        indices = np.argsort(scores)[-top_k:][::-1]
        return [
            {"document": self.documents[i], "score": float(scores[i])}
            for i in indices
        ]
Hybrid Search
Combine vector and keyword search:
class HybridSearch:
    def __init__(self):
        self.vector_db = Pinecone("hybrid-index")
        self.keyword_db = Elasticsearch("keyword-index")

    def search(self, query, alpha=0.5):
        # Vector search
        vector_results = self.vector_db.query(
            embed(query),
            top_k=20
        )

        # Keyword search
        keyword_results = self.keyword_db.search(
            query,
            size=20
        )

        # Combine with reranking (alpha weights vector vs. keyword scores)
        return self.rerank(
            query,
            vector_results,
            keyword_results,
            alpha=alpha
        )
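The `rerank` step above is left abstract. One common, assumption-light way to merge the two ranked lists is reciprocal rank fusion (RRF), which uses only ranks rather than raw scores and so sidesteps the problem that vector and keyword scores live on incompatible scales (an alternative to alpha-weighted score blending):

```python
def reciprocal_rank_fusion(result_lists, k=60):
    """Merge ranked lists of document ids: score(d) = sum of 1 / (k + rank)
    over every list d appears in. k=60 is the constant from the original
    RRF paper; documents ranked highly in any list bubble up."""
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc3", "doc1", "doc7"]    # ranked by cosine similarity
keyword_hits = ["doc1", "doc9", "doc3"]   # ranked by BM25
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))
# ['doc1', 'doc3', 'doc9', 'doc7']
```

doc1 and doc3 appear in both lists, so they outrank documents found by only one retriever.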
Indexing Strategies
Choosing the Right Index
| Index Type | Best For | Trade-offs |
|---|---|---|
| HNSW | High recall, moderate scale | Memory intensive |
| IVFFlat | Large datasets, lower recall | Fast build, moderate search |
| PQ | Massive scale, lower recall | Compression, accuracy loss |
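To make the IVFFlat trade-off in the table concrete, here is a toy inverted-file index in NumPy. It is purely illustrative (random centroids instead of trained k-means): vectors are bucketed under their nearest centroid, and a query probes only the few closest buckets, trading recall for speed.

```python
import numpy as np

def build_ivf(vectors, n_lists=8, seed=0):
    """Toy IVF index: pick centroids, assign each vector to its
    nearest centroid's inverted list."""
    rng = np.random.default_rng(seed)
    centroids = vectors[rng.choice(len(vectors), n_lists, replace=False)]
    dists = np.linalg.norm(vectors[:, None] - centroids[None], axis=2)
    assign = np.argmin(dists, axis=1)
    lists = {i: np.where(assign == i)[0] for i in range(n_lists)}
    return centroids, lists

def ivf_search(query, vectors, centroids, lists, n_probe=2, k=3):
    """Search only the n_probe closest lists - raising n_probe
    raises recall and cost (pgvector's analog is ivfflat.probes)."""
    nearest = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    candidates = np.concatenate([lists[i] for i in nearest])
    dists = np.linalg.norm(vectors[candidates] - query, axis=1)
    return candidates[np.argsort(dists)[:k]]

rng = np.random.default_rng(1)
vecs = rng.normal(size=(200, 16))
centroids, lists = build_ivf(vecs)
ids = ivf_search(rng.normal(size=16), vecs, centroids, lists, n_probe=3, k=5)
print(ids)
```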
HNSW Index
-- pgvector HNSW index
CREATE INDEX ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);

-- m: number of connections per layer
-- ef_construction: search width during index build
Index Parameters
# Pinecone pod-based index configuration
# (managed services like Pinecone tune HNSW parameters such as
# m/ef_construction internally; self-hosted engines expose them directly)
index_config = {
    "name": "production-index",
    "dimension": 1536,
    "metric": "cosine",
    "pods": 2,
    "pod_type": "p1"
}
Best Practices
Embedding Models
Choose the right embedding model:
embedding_models = {
    "openai_text-embedding-3-small": {
        "dimensions": 1536,
        "cost": "low",
        "quality": "good",
        "use_case": "General purpose"
    },
    "openai_text-embedding-3-large": {
        "dimensions": 3072,
        "cost": "medium",
        "quality": "best",
        "use_case": "High accuracy"
    },
    "cohere-embed-english-v3": {
        "dimensions": 1024,
        "cost": "medium",
        "quality": "good",
        "use_case": "English-specific"
    }
}
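Dimension count drives both storage and query cost. Some newer models (e.g. OpenAI's text-embedding-3 family) are trained so their embeddings can be shortened; the client-side equivalent is truncate-then-renormalize. A sketch (only valid for models trained to support shortening; truncating arbitrary embeddings loses accuracy unpredictably):

```python
import numpy as np

def shorten_embedding(embedding, dims):
    """Keep the first `dims` components and re-normalize to unit length.
    Trades a little accuracy for proportionally smaller storage."""
    v = np.asarray(embedding, dtype=float)[:dims]
    return v / np.linalg.norm(v)

full = np.random.default_rng(0).normal(size=3072)   # stand-in embedding
short = shorten_embedding(full, 256)
print(short.shape, round(float(np.linalg.norm(short)), 6))  # (256,) 1.0
```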
Chunking Strategies
# Fixed-size chunking
def chunk_text(text, chunk_size=1000, overlap=100):
    chunks = []
    step = chunk_size - overlap
    for i in range(0, len(text), step):
        chunks.append(text[i:i + chunk_size])
    return chunks

# Semantic chunking: start a new chunk when the next sentence
# drifts away from the previous one
def semantic_chunk(text, embedder, threshold=0.8):
    sentences = split_into_sentences(text)
    embeddings = embedder.embed(sentences)
    chunks = []
    current_chunk = [sentences[0]]
    for i in range(1, len(sentences)):
        # Compare each sentence to the one immediately before it
        if cosine_similarity(embeddings[i], embeddings[i - 1]) > threshold:
            current_chunk.append(sentences[i])
        else:
            chunks.append(" ".join(current_chunk))
            current_chunk = [sentences[i]]
    chunks.append(" ".join(current_chunk))  # don't drop the final chunk
    return chunks
Handling Updates
# Incremental updates with versioning
class VersionedVectorStore:
    def __init__(self):
        self.db = Pinecone("versioned-index")
        self.version = 0

    def update_document(self, doc_id, new_content):
        new_embedding = embed(new_content)
        self.db.upsert(
            vectors=[{
                "id": f"{doc_id}_v{self.version}",
                "values": new_embedding,
                "metadata": {
                    "content": new_content,
                    "version": self.version
                }
            }],
            namespace="documents"
        )
        # Optionally delete the previous version so queries
        # never return stale content alongside fresh content
        self.version += 1
Performance Optimization
Query Optimization
# Prefilter with metadata
results = index.query(
    vector=query_embedding,
    top_k=10,
    filter={"category": "documentation"},  # Metadata filter
    include_metadata=True
)

# Batch queries for throughput (the query API takes one vector
# at a time; batching keeps memory bounded and reuses connections)
def batch_search(queries, batch_size=100):
    results = []
    for i in range(0, len(queries), batch_size):
        batch = queries[i:i + batch_size]
        for vector in batch:
            results.append(index.query(vector=vector, top_k=5))
    return results
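Embedding calls are often the dominant cost in these pipelines, and identical text embeds to the identical vector, so caching pays off immediately. A sketch with a content-hash key (`fake_embed` is a stand-in for any real embedding call):

```python
import hashlib

class EmbeddingCache:
    """Avoid re-embedding unchanged text by keying on a content hash."""
    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.cache = {}
        self.hits = 0

    def embed(self, text):
        key = hashlib.sha256(text.encode()).hexdigest()
        if key in self.cache:
            self.hits += 1
        else:
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]

calls = []
def fake_embed(text):
    calls.append(text)           # count "API" calls actually made
    return [float(len(text))]    # stand-in embedding

cache = EmbeddingCache(fake_embed)
cache.embed("hello")
cache.embed("hello")             # second call served from cache
print(len(calls), cache.hits)    # 1 1
```

In production, the same idea is usually backed by Redis or the database itself rather than an in-process dict.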
Monitoring
# Track key metrics
metrics = {
    "query_latency_p50": 0,
    "query_latency_p99": 0,
    "index_size": 0,
    "search_count": 0,
}
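The placeholder metrics above can be populated with a small tracker; a sketch using wall-clock timing and NumPy percentiles (any real deployment would export these to a metrics system instead of computing them in-process):

```python
import time
import numpy as np

class LatencyTracker:
    """Record per-query wall-clock latency and report p50/p99."""
    def __init__(self):
        self.samples_ms = []

    def observe(self, fn, *args):
        # Time a single call and keep its result
        start = time.perf_counter()
        result = fn(*args)
        self.samples_ms.append((time.perf_counter() - start) * 1000)
        return result

    def summary(self):
        s = np.array(self.samples_ms)
        return {
            "query_latency_p50": float(np.percentile(s, 50)),
            "query_latency_p99": float(np.percentile(s, 99)),
            "search_count": len(s),
        }

tracker = LatencyTracker()
for _ in range(100):
    tracker.observe(sum, range(1000))   # stand-in for a vector query
print(tracker.summary()["search_count"])  # 100
```

Tail latency (p99) matters more than the median here: ANN queries that fall back to scanning large posting lists are exactly the ones users notice.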
The Future of Vector Databases
Emerging trends:
- Native GPU acceleration: Faster HNSW on GPUs
- Hybrid retrieval: Better keyword + vector combination
- Multi-modal embeddings: Images, audio, video
- Serverless: Pay-per-query vector databases
Conclusion
Vector databases are essential infrastructure for AI applications. Whether you choose the simplicity of Chroma for prototyping, the power of Pinecone for production, or the flexibility of pgvector for existing PostgreSQL deployments, understanding vector search fundamentals enables building powerful AI applications.
Start simple, measure performance, and scale as needed. The vector database ecosystem continues evolving rapidly, offering better tools and integrations.