Introduction
As AI applications proliferate, the need to store and search embeddings (vector representations of text, images, and other data) has become critical. While specialized vector databases have emerged, many organizations are finding that PostgreSQL with the pgvector extension provides the simplest path to vector search capabilities.
In 2026, pgvector has matured significantly, offering production-ready vector operations alongside PostgreSQL’s proven relational capabilities. This guide covers implementing vector search in PostgreSQL, from installation to production deployment.
Understanding Vector Search
What are Embeddings?
Embeddings are numerical representations of data that capture semantic meaning:
# Example: Text to embedding
text = "The cat sat on the mat"
embedding = [0.12, -0.34, 0.78, ...] # 1536 dimensions for text-embedding-3-small
# Similar texts have similar embeddings
similar_text = "A feline rested on the rug"
similar_embedding = [0.11, -0.33, 0.79, ...] # Very similar!
different_text = "Space rockets fly to Mars"
different_embedding = [-0.45, 0.67, -0.12, ...] # Very different!
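The "similar texts have similar embeddings" intuition can be made concrete with cosine similarity, the metric used throughout this guide. A minimal sketch using the toy 3-dimensional vectors above (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means identical direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors from the example above (truncated to 3 dimensions)
cat_mat = [0.12, -0.34, 0.78]
feline_rug = [0.11, -0.33, 0.79]
rockets = [-0.45, 0.67, -0.12]

print(cosine_similarity(cat_mat, feline_rug))  # close to 1.0
print(cosine_similarity(cat_mat, rockets))     # negative: very different
```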
Why PostgreSQL for Vectors?
- Single Database: Store vectors alongside your existing data
- Mature Features: ACID compliance, replication, backups
- Familiar Tools: Use existing PostgreSQL knowledge
- Performance: Good enough for most use cases
- Simplicity: One system to learn and maintain
Setting Up pgvector
Installation
# Ubuntu/Debian
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt install postgresql-16-pgvector
# Docker
docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=vectors \
  pgvector/pgvector:pg16
Enable Extension
-- Create extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
SELECT * FROM pg_extension WHERE extname = 'vector';
Creating Vector Columns
Table Definition
-- Simple documents table with vectors
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
    embedding vector(1536), -- dimension of text-embedding-3-small / ada-002
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Or HNSW (also approximate: slower to build, but faster queries with better recall)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- For exact (but slower) search, create no vector index at all --
-- pgvector then falls back to a sequential scan
Embedding Dimensions
Different models have different dimensions:
| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| text-embedding-ada-002 | 1536 |
| sentence-transformers/all-MiniLM-L6-v2 | 384 |
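Mixing models is a common bug: a `vector(1536)` column rejects 384- or 3072-dimensional vectors at insert time, often deep inside a batch job. A hedged guard worth adding in front of inserts (the helper name and the 1536 default are illustrative):

```python
EXPECTED_DIM = 1536  # must match the vector(N) declared on the column

def validate_embedding(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Fail fast if an embedding's length doesn't match the column dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, column expects {expected_dim}; "
            "did you switch embedding models without migrating the column?"
        )
    return embedding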
Inserting Embeddings
Python Example
import numpy as np
import psycopg2
from psycopg2.extras import Json
from openai import OpenAI
from pgvector.psycopg2 import register_vector  # pip install pgvector

client = OpenAI()
conn = psycopg2.connect("dbname=vectors user=postgres password=password")
register_vector(conn)  # lets psycopg2 pass numpy arrays as vector parameters
def create_documents_table(conn):
with conn.cursor() as cur:
cur.execute("""
CREATE TABLE IF NOT EXISTS documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Create index
cur.execute("""
CREATE INDEX IF NOT EXISTS idx_documents_embedding
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100)
""")
conn.commit()
def generate_embedding(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    # Returned as a numpy array so register_vector can adapt it to the
    # vector type (a plain list is sent as a Postgres array instead)
    return np.array(response.data[0].embedding)
def insert_document(conn, title: str, content: str, metadata: dict = None):
    embedding = generate_embedding(content)
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO documents (title, content, embedding, metadata)
            VALUES (%s, %s, %s, %s)
            RETURNING id
        """, (title, content, embedding,
              Json(metadata) if metadata else None))  # dicts need the Json adapter
        doc_id = cur.fetchone()[0]
    conn.commit()
    return doc_id
Batch Insertion
def insert_documents_batch(conn, documents: list[dict]):
    # Generate embeddings for the whole batch in one API call
    texts = [doc["content"] for doc in documents]
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    embeddings = [np.array(item.embedding) for item in response.data]
    with conn.cursor() as cur:
        for doc, embedding in zip(documents, embeddings):
            cur.execute("""
                INSERT INTO documents (title, content, embedding, metadata)
                VALUES (%s, %s, %s, %s)
            """, (doc["title"], doc["content"], embedding,
                  Json(doc["metadata"]) if doc.get("metadata") else None))
    conn.commit()
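Embedding APIs cap how many inputs (and tokens) one request may carry, so large corpora should be split into batches first. A small sketch, reusing the `insert_documents_batch` defined above; the batch size of 100 is an assumption, check your provider's documented limits:

```python
def chunk(items: list, size: int) -> list[list]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def insert_documents_chunked(conn, documents: list[dict], batch_size: int = 100):
    # One embeddings API request per batch keeps each request under the
    # provider's per-request input limit
    for batch in chunk(documents, batch_size):
        insert_documents_batch(conn, batch)
```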
Similarity Search
Basic Similarity Queries
-- Cosine similarity (most common); query_embedding stands in for your
-- parameterized query vector
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
-- Euclidean distance
SELECT id, title,
embedding <-> query_embedding AS distance
FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 5;
-- Inner product (for normalized vectors)
-- Note: <#> returns the *negative* inner product, so ascending order
-- still puts the best matches first
SELECT id, title,
       (embedding <#> query_embedding) * -1 AS inner_product
FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 5;
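To make the three operators concrete, here is what each one computes, reproduced in plain Python on toy vectors:

```python
import math

def l2_distance(a, b):
    """pgvector's <-> operator: straight-line (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """pgvector's <=> operator: 1 - cosine similarity (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def neg_inner_product(a, b):
    """pgvector's <#> operator: negated dot product, so ASC sort ranks best first."""
    return -sum(x * y for x, y in zip(a, b))
```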
Filtered Search
-- Search with filters
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
WHERE metadata->>'category' = 'technology'
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Date-based filtering
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> query_embedding
LIMIT 10;
Python Implementation
def semantic_search(conn, query: str, limit: int = 5, filters: dict = None):
    # Generate query embedding
    query_embedding = generate_embedding(query)
    # Build query
    sql = """
        SELECT id, title, content,
               1 - (embedding <=> %s) AS similarity
        FROM documents
    """
    params = [query_embedding]
    # Add filters. Filter *keys* are interpolated into the SQL string, so
    # never accept them from untrusted input -- whitelist allowed keys.
    if filters:
        conditions = []
        for key, value in filters.items():
            conditions.append(f"metadata->>'{key}' = %s")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY embedding <=> %s LIMIT %s"
    params.extend([query_embedding, limit])
with conn.cursor() as cur:
cur.execute(sql, params)
results = cur.fetchall()
return [
{
"id": row[0],
"title": row[1],
"content": row[2],
"similarity": row[3]
}
for row in results
]
Hybrid Search
Combine vector and text search:
-- Hybrid search: combine vector and keyword scores
-- (the scores go in a subquery because PostgreSQL does not allow
-- output aliases inside ORDER BY expressions)
SELECT id, title, vector_sim, text_rank
FROM (
    SELECT id, title,
           1 - (embedding <=> query_embedding) AS vector_sim,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', %s)) AS text_rank
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
       OR embedding <=> query_embedding < 0.5
) scored
ORDER BY vector_sim * 0.7 + text_rank * 0.3 DESC
LIMIT 10;
def hybrid_search(conn, query: str, limit: int = 5):
    query_embedding = generate_embedding(query)
    # Scores are computed in a subquery: aliases like vector_sim can't be
    # reused inside ORDER BY expressions at the same query level
    sql = """
        SELECT id, title, content, vector_sim, text_rank
        FROM (
            SELECT id, title, content,
                   1 - (embedding <=> %s) AS vector_sim,
                   ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', %s)) AS text_rank
            FROM documents
            WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
               OR embedding <=> %s < 0.8
        ) scored
        ORDER BY vector_sim * 0.7 + text_rank * 0.3 DESC
        LIMIT %s
    """
    params = [
        query_embedding, query, query,
        query_embedding, limit
    ]
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
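An alternative to hand-tuned weights like 0.7/0.3 is Reciprocal Rank Fusion (RRF), which merges ranked lists using only ranks, so the two scores never need to be on the same scale. A sketch (the constant k=60 is the value commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of document ids: each document accumulates
    1 / (k + rank) per list it appears in, then ids are sorted by score."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = [3, 1, 7]   # ids ranked by vector similarity
keyword_hits = [7, 3, 9]  # ids ranked by ts_rank
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # [3, 7, 1, 9]
```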
Building RAG Systems
Complete RAG Implementation
class PostgresRAG:
def __init__(self, conn, openai_client):
self.conn = conn
self.client = openai_client
def index_documents(self, documents: list[dict]):
"""Index documents into the database."""
insert_documents_batch(self.conn, documents)
def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
"""Retrieve relevant documents for a query."""
results = semantic_search(self.conn, query, limit=top_k)
return results
    def answer(self, question: str, context_limit: int = 5) -> dict:
"""Answer a question using RAG."""
# Retrieve relevant context
context_docs = self.retrieve(question, top_k=context_limit)
# Build context string
context = "\n\n".join([
f"Document {i+1}: {doc['content'][:500]}..."
for i, doc in enumerate(context_docs)
])
# Generate answer
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Use the provided context to answer questions accurately."
},
{
"role": "user",
"content": f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
}
],
temperature=0.7
)
return {
"answer": response.choices[0].message.content,
"sources": [
{"id": doc["id"], "title": doc["title"], "similarity": doc["similarity"]}
for doc in context_docs
]
}
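Retrieval quality also depends on how documents are split before indexing: whole documents dilute the embedding, while tiny fragments lose context. A naive character-window chunker with overlap, as one possible sketch (the 1000/200 sizes are assumptions to tune for your corpus):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows so content that spans
    a boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then indexed as its own row, so similarity search returns the relevant passage rather than the whole document.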
Handling Updates
def update_document_embedding(conn, doc_id: int, new_content: str):
"""Update document and regenerate embedding."""
new_embedding = generate_embedding(new_content)
with conn.cursor() as cur:
cur.execute("""
UPDATE documents
SET content = %s,
embedding = %s,
updated_at = CURRENT_TIMESTAMP
WHERE id = %s
""", (new_content, new_embedding, doc_id))
conn.commit()
def delete_document(conn, doc_id: int):
"""Delete a document."""
with conn.cursor() as cur:
cur.execute("DELETE FROM documents WHERE id = %s", (doc_id,))
conn.commit()
Performance Optimization
Index Selection
-- IVFFlat index (faster build, slower query, approximate)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- more lists = faster queries but lower recall
                    -- unless you also raise ivfflat.probes
-- HNSW index (slower build, faster query, better accuracy)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
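The pgvector README suggests starting points for these parameters: `lists` around rows/1000 up to roughly a million rows and sqrt(rows) beyond that, with `ivfflat.probes` starting near sqrt(lists). A small helper encoding those heuristics as a sketch (treat the results as starting points to benchmark, not final values):

```python
import math

def recommended_ivfflat_lists(row_count: int) -> int:
    """pgvector's documented starting point: rows/1000 up to ~1M rows,
    sqrt(rows) beyond that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))

def recommended_probes(lists: int) -> int:
    """A common starting point for the ivfflat.probes setting."""
    return max(1, int(math.sqrt(lists)))
```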
Query Optimization
-- Reindex to maintain index quality
REINDEX INDEX idx_documents_embedding;
-- Analyze for query planner
ANALYZE documents;
-- Check index usage
EXPLAIN ANALYZE
SELECT id, title
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;
Partitioning
-- Partition by date for large datasets
CREATE TABLE documents (
id BIGSERIAL,
embedding vector(1536),
created_at DATE
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE documents_2026_01 PARTITION OF documents
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE documents_2026_02 PARTITION OF documents
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
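Monthly partition DDL like the above is tedious to write by hand; a small generator can emit it instead. A hedged sketch (the function name and formatting are illustrative, and production setups often use pg_partman for this):

```python
from datetime import date

def monthly_partition_ddl(table: str, year: int, month: int) -> str:
    """Generate the CREATE TABLE ... PARTITION OF statement for one month."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the next month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table}\n"
        f"    FOR VALUES FROM ('{start}') TO ('{end}');"
    )

print(monthly_partition_ddl("documents", 2026, 3))
```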
Production Considerations
Connection Pooling
from psycopg2 import pool
# Connection pool
connection_pool = pool.ThreadedConnectionPool(
minconn=5,
maxconn=20,
host="localhost",
database="vectors",
user="postgres",
password="password"
)
def get_connection():
return connection_pool.getconn()
def return_connection(conn):
connection_pool.putconn(conn)
Monitoring
-- Check table and index size
SELECT
    pg_size_pretty(pg_relation_size('documents')) AS table_size,
    pg_size_pretty(pg_relation_size('idx_documents_embedding')) AS index_size;
-- Query performance
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%embedding%'
ORDER BY total_exec_time DESC;
-- Index usage
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%';
Backup and Recovery
# Backup with pg_dump
pg_dump -h localhost -U postgres -Fc vectors > vectors_backup.dump
# Restore
pg_restore -h localhost -U postgres -d vectors vectors_backup.dump
Advanced Patterns
Multi-tenant Vector Search
-- Per-tenant vector tables
CREATE TABLE tenant_documents (
id BIGSERIAL,
tenant_id UUID NOT NULL,
title TEXT,
embedding vector(1536),
PRIMARY KEY (tenant_id, id)
);
-- ivfflat and hnsw don't support multicolumn indexes, so index the
-- tenant key and the embedding separately
CREATE INDEX ON tenant_documents (tenant_id);
CREATE INDEX ON tenant_documents USING hnsw (embedding vector_cosine_ops);
-- Query for specific tenant
SELECT * FROM tenant_documents
WHERE tenant_id = %s
ORDER BY embedding <=> %s
LIMIT 10;
Metadata Filtering Best Practices
-- Efficient JSONB queries
CREATE INDEX idx_documents_metadata ON documents USING gin (metadata);
-- Use containment for filtering
SELECT * FROM documents
WHERE metadata @> '{"category": "tech"}'
ORDER BY embedding <=> %s
LIMIT 10;
Combining Multiple Embeddings
-- Store multiple vector types
ALTER TABLE documents ADD COLUMN title_embedding vector(384);
ALTER TABLE documents ADD COLUMN image_embedding vector(512);
-- Query different embeddings (scores go in a subquery so the aliases
-- can be combined in ORDER BY)
SELECT id, doc_sim, title_sim, image_sim
FROM (
    SELECT id,
           embedding <=> %s AS doc_sim,
           title_embedding <=> %s AS title_sim,
           image_embedding <=> %s AS image_sim
    FROM documents
) scored
ORDER BY (doc_sim + title_sim + image_sim) / 3
LIMIT 10;
Migration from Other Databases
Import from Pinecone
def import_from_pinecone(pinecone_index, conn):
    # Page through all vectors in the Pinecone index
    pagination_token = None
    while True:
        response = pinecone_index.list(
            limit=1000,
            pagination_token=pagination_token
        )
        # Insert into PostgreSQL directly, keeping the original embeddings
        # rather than regenerating them with insert_documents_batch
        with conn.cursor() as cur:
            for item in response["vectors"]:
                metadata = {k: v for k, v in item["metadata"].items() if k != "text"}
                cur.execute("""
                    INSERT INTO documents (title, content, embedding, metadata)
                    VALUES (%s, %s, %s, %s)
                """, (item["id"], item["metadata"].get("text", ""),
                      np.array(item["values"]), Json(metadata)))
        conn.commit()
        pagination_token = response.get("pagination_token")
        if not pagination_token:
            break
Import from Weaviate
def import_from_weaviate(weaviate_client, class_name: str, conn):
    # Vectors aren't regular properties in Weaviate; request them
    # via _additional.vector
    result = weaviate_client.query.get(
        class_name,
        ["content"]
    ).with_additional(["vector"]).do()
    documents = [
        {
            "title": class_name,
            "content": obj["content"],
            "embedding": obj["_additional"]["vector"],
            "metadata": {k: v for k, v in obj.items()
                         if k not in ["content", "_additional"]}
        }
        for obj in result["data"]["Get"][class_name]
    ]
    # Note: insert_documents_batch re-embeds content with your own model;
    # insert directly (as in import_from_pinecone) to keep the stored vectors
    insert_documents_batch(conn, documents)
Conclusion
PostgreSQL with pgvector provides an excellent foundation for AI applications requiring vector search. By combining vectors with your existing relational data, you simplify your architecture without sacrificing capability.
Start with a simple implementation: add the extension, create a column, generate some embeddings, and try a similarity search. The familiar SQL interface makes it easy to add filtering, combine with text search, and scale as needed.
For most applications, PostgreSQL’s vector capabilities are more than sufficient. Reserve specialized vector databases for extreme scale or specialized requirements.