Introduction
PostgreSQL with pgvector has become the go-to solution for teams building AI applications that want vector capabilities without deploying separate vector databases. The combination leverages PostgreSQL’s reliability, ecosystem, and operational familiarity while adding vector similarity search. In 2026, pgvector powers production RAG systems, recommendation engines, and semantic search across organizations of all sizes.
The appeal is straightforward: many organizations already run PostgreSQL. Adding vector capabilities through pgvector requires no new infrastructure, no new operational expertise, and no new monitoring systems. For teams building AI applications with existing PostgreSQL deployments, or teams that prefer to minimize infrastructure complexity, pgvector provides an attractive path to vector search.
What Is Vector Search and Why Does It Matter?
Vector search finds items by semantic meaning rather than exact keyword matches. Machine learning models convert text, images, or audio into embedding vectors: arrays of floating-point numbers that represent semantic content. Similar items cluster together in vector space, so calculating the distance between vectors reveals semantic similarity.
Traditional databases rely on exact matches and pattern matching (LIKE, full-text search). These approaches miss synonyms, paraphrases, and conceptually related content. A search for “canine companions” won’t match “dog training” in a keyword system, but vector search captures the semantic relationship.
pgvector brings this capability directly into PostgreSQL. You store embeddings alongside relational data, run vector similarity queries with standard SQL, and combine vector search with PostgreSQL’s filtering, joins, and full-text search. This unified approach eliminates the need for a separate vector database, reducing infrastructure complexity and operational overhead.
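To make the distance idea concrete, here is a pure-Python sketch (no database required) of the cosine distance that pgvector's `<=>` operator computes; the function name is illustrative:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Cosine distance as pgvector's <=> operator defines it: 1 - cos(angle)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Identical direction -> 0, orthogonal -> 1, opposite -> 2
print(cosine_distance([1.0, 0.0], [2.0, 0.0]))  # 0.0
print(cosine_distance([1.0, 0.0], [0.0, 1.0]))  # 1.0
```

Magnitude is ignored; only direction matters, which is why cosine distance suits text embeddings whose lengths carry little meaning.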
Prerequisites
Before installing pgvector, ensure you have:
- PostgreSQL 12 or later (PostgreSQL 16+ recommended for best performance)
- A C compiler and build tools (for source installations)
- Sufficient memory for index building (at least 2GB for production workloads)
- Database superuser privileges (or a role permitted to run CREATE EXTENSION) to install extensions
Installing pgvector
pgvector installation varies by deployment method. Choose the approach that matches your environment.
Linux (Debian/Ubuntu) via APT
$ sudo apt update
$ sudo apt install postgresql-16-pgvector
# => Installs pgvector for PostgreSQL 16
For other PostgreSQL versions, match the package version:
$ sudo apt install postgresql-15-pgvector
$ sudo apt install postgresql-14-pgvector
macOS via Homebrew
$ brew install pgvector
$ brew services restart postgresql
Docker
PostgreSQL containers require pgvector installed at build time. Use the official pgvector/pgvector image (for example, pgvector/pgvector:pg16) or install it in a custom Dockerfile:
FROM postgres:16
RUN apt-get update && \
apt-get install -y postgresql-16-pgvector && \
rm -rf /var/lib/apt/lists/*
Build and run:
$ docker build -t postgres-pgvector .
$ docker run -e POSTGRES_PASSWORD=secret -d postgres-pgvector
Build from Source
When package managers don’t provide pgvector, build from source:
$ git clone https://github.com/pgvector/pgvector.git
$ cd pgvector
$ make
$ sudo make install
# => Installs the extension files to the PostgreSQL extension directory
If PostgreSQL is installed from a non-default location, specify the pg_config path:
$ make PG_CONFIG=/usr/local/pgsql/bin/pg_config
$ sudo make PG_CONFIG=/usr/local/pgsql/bin/pg_config install
Cloud PostgreSQL Services
| Service | Enabling pgvector |
|---|---|
| AWS RDS for PostgreSQL | CREATE EXTENSION vector; (supported natively, no parameter group change required) |
| Google Cloud SQL | CREATE EXTENSION vector; (available by default) |
| Azure Database for PostgreSQL | Enabled via server parameters or CREATE EXTENSION |
| Supabase | Available by default |
| Neon | Available by default |
Enabling the Extension
Connect to your database and create the extension:
CREATE EXTENSION IF NOT EXISTS vector;
Verify the installation:
SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
-- Returns: vector | 0.8.0 (or later)
Creating Vector Columns
Vector columns store embeddings as arrays of floating-point numbers. Specify the dimension when creating the column; it must match your embedding model's output size.
SQL: Creating a Table with Vector Columns
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536), -- OpenAI text-embedding-3-small
summary_embedding vector(384), -- A smaller model for quick similarity
source TEXT,
created_at TIMESTAMPTZ DEFAULT NOW(),
metadata JSONB
);
Common embedding dimensions by model:
| Model | Dimensions | Use Case |
|---|---|---|
| text-embedding-3-small | 1536 | General-purpose text embeddings |
| text-embedding-3-large | 3072 | Maximum precision |
| text-embedding-ada-002 | 1536 | Legacy OpenAI embeddings |
| all-MiniLM-L6-v2 | 384 | Lightweight, fast |
| BAAI/bge-base-en-v1.5 | 768 | Balanced quality and speed |
| Cohere embed-english-v3.0 | 1024 | High-quality English retrieval |
Inserting Vectors
Insert embeddings using the vector cast syntax:
INSERT INTO documents (title, content, embedding, source)
VALUES (
'PostgreSQL Performance Tuning',
'Optimize your PostgreSQL database with proper configuration...',
'[0.023, -0.045, 0.112, ...]'::vector(1536),
'blog'
);
For batch inserts, use multi-row syntax or COPY:
INSERT INTO documents (title, content, embedding, source) VALUES
('Article One', 'Content one...', '[0.1, 0.2, 0.3]'::vector(3), 'blog'),
('Article Two', 'Content two...', '[0.4, 0.5, 0.6]'::vector(3), 'docs'),
('Article Three', 'Content three...', '[0.7, 0.8, 0.9]'::vector(3), 'wiki');
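When loading embeddings in bulk (for example via COPY), each vector must be rendered as a pgvector text literal such as `[0.1,0.2,0.3]`, and its dimension must match the column. A small illustrative helper (the function name is an assumption, not part of pgvector):

```python
def to_vector_literal(embedding: list[float], expected_dim: int) -> str:
    """Render an embedding as a pgvector text literal, validating its dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(f"expected {expected_dim} dimensions, got {len(embedding)}")
    return "[" + ",".join(repr(x) for x in embedding) + "]"

print(to_vector_literal([0.1, 0.2, 0.3], 3))  # [0.1,0.2,0.3]
```

Validating the dimension client-side turns a confusing server error into an immediate, descriptive one.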
Python: Inserting Embeddings with psycopg2
import json
import os
import psycopg2
from psycopg2.extras import execute_values

def insert_documents(records: list[dict]) -> int:
    """Insert document records with embeddings. Returns count inserted."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            sql = """
                INSERT INTO documents (title, content, embedding, source)
                VALUES %s
            """
            values = [
                (
                    r["title"],
                    r["content"],
                    json.dumps(r["embedding"]),  # pgvector accepts a JSON-style array literal
                    r["source"]
                )
                for r in records
            ]
            execute_values(cur, sql, values, template="(%s, %s, %s::vector(1536), %s)")
            conn.commit()
            return cur.rowcount
    except psycopg2.Error as e:
        conn.rollback()
        print(f"[ERROR] Insert failed: {e}")
        raise
    finally:
        conn.close()

# Usage
records = [
    {
        "title": "Vector Search Basics",
        "content": "Understanding vector embeddings...",
        "embedding": [0.023, -0.045, 0.112, 0.078, -0.091],
        "source": "blog"
    }
]
count = insert_documents(records)
print(f"[INFO] Inserted {count} documents")
Python: Using asyncpg
import asyncpg
import json
import os

async def insert_documents_async(records: list[dict]) -> int:
    """Async insert of document records with embeddings."""
    conn = await asyncpg.connect(
        host="localhost",
        database="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        async with conn.transaction():
            await conn.executemany(
                """
                INSERT INTO documents (title, content, embedding, source)
                VALUES ($1, $2, $3::vector(1536), $4)
                """,
                [
                    (r["title"], r["content"], json.dumps(r["embedding"]), r["source"])
                    for r in records
                ]
            )
        return len(records)
    except asyncpg.PostgresError as e:
        print(f"[ERROR] Async insert failed: {e}")
        raise
    finally:
        await conn.close()
Go: Inserting Embeddings with pgx
package main
import (
	"context"
	"encoding/json"
	"fmt"

	"github.com/jackc/pgx/v5"
	"github.com/jackc/pgx/v5/pgxpool"
)

type Document struct {
	Title     string
	Content   string
	Embedding []float32
	Source    string
}

func insertDocuments(ctx context.Context, pool *pgxpool.Pool, docs []Document) (int64, error) {
	batch := &pgx.Batch{}
	for _, doc := range docs {
		embJSON, err := json.Marshal(doc.Embedding)
		if err != nil {
			return 0, fmt.Errorf("marshal embedding: %w", err)
		}
		sql := `INSERT INTO documents (title, content, embedding, source)
			VALUES ($1, $2, $3::vector(1536), $4)`
		batch.Queue(sql, doc.Title, doc.Content, string(embJSON), doc.Source)
	}
	br := pool.SendBatch(ctx, batch)
	defer br.Close()
	var total int64
	for range docs {
		if _, err := br.Exec(); err != nil {
			return total, fmt.Errorf("batch exec: %w", err)
		}
		total++
	}
	return total, nil
}
Similarity Search Queries
pgvector provides three distance operators for similarity search:
| Operator | Distance Metric | Best For | Range |
|---|---|---|---|
| <-> | Euclidean (L2) | General-purpose | 0 to ∞ |
| <=> | Cosine distance | Semantic text search | 0 to 2 |
| <#> | Negative inner product | Maximum inner product search | -∞ to ∞ |
Choose the operator that matches the distance metric used during embedding model training. Most text embedding models use cosine similarity.
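For unit-normalized embeddings the three metrics are monotone transformations of the dot product, so they produce identical rankings; a quick pure-Python check (helper names are illustrative):

```python
import math

def normalize(v: list[float]) -> list[float]:
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v]

def l2(a, b):      # what <-> computes
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine(a, b):  # what <=> computes
    return 1 - sum(x * y for x, y in zip(a, b)) / (
        math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

def neg_ip(a, b):  # what <#> computes
    return -sum(x * y for x, y in zip(a, b))

query = normalize([0.2, 0.9, 0.1])
docs = [normalize(v) for v in ([0.1, 1.0, 0.0], [1.0, 0.1, 0.2], [0.3, 0.8, 0.2])]

# For unit-length vectors, all three metrics rank candidates identically
for metric in (l2, cosine, neg_ip):
    ranking = sorted(range(len(docs)), key=lambda i: metric(query, docs[i]))
    print(metric.__name__, ranking)
```

The practical consequence: if your model outputs normalized vectors, `<#>` is often the cheapest choice, since it skips the square root and normalization work.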
SQL: Cosine Similarity Search
SELECT
id,
title,
content,
1 - (embedding <=> '[0.023, -0.045, 0.112, ...]'::vector(1536)) AS similarity
FROM documents
ORDER BY embedding <=> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 10;
SQL: Euclidean Distance Search
SELECT
id,
title,
content,
embedding <-> '[0.023, -0.045, 0.112, ...]'::vector(1536) AS distance
FROM documents
ORDER BY embedding <-> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 5;
SQL: Inner Product Search
Maximum inner product search (MIPS) is useful when embeddings are normalized and you want to maximize similarity:
SELECT
id,
title,
content,
-(embedding <#> '[0.023, -0.045, 0.112, ...]'::vector(1536)) AS inner_product
FROM documents
ORDER BY embedding <#> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 10;
-- Note: <#> returns negative inner product, so ORDER BY returns largest first
Python: Querying with Similarity Search
def search_similar(query_vector: list[float], limit: int = 10, threshold: float = 0.7) -> list[dict]:
    """Search for documents by cosine similarity. Returns results above threshold."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, title, content, source,
                       1 - (embedding <=> %s::vector(1536)) AS similarity
                FROM documents
                WHERE 1 - (embedding <=> %s::vector(1536)) > %s
                ORDER BY similarity DESC
                LIMIT %s
                """,
                (json.dumps(query_vector), json.dumps(query_vector), threshold, limit)
            )
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]
    except psycopg2.Error as e:
        print(f"[ERROR] Search failed: {e}")
        return []
    finally:
        conn.close()

# Usage
results = search_similar([0.023, -0.045, 0.112], limit=5, threshold=0.75)
for r in results:
    print(f"[RESULT] {r['title']} - similarity: {r['similarity']:.3f}")
Go: Querying with pgx
type SearchResult struct {
	ID         int64
	Title      string
	Content    string
	Source     string
	Similarity float64
}

func searchSimilar(ctx context.Context, pool *pgxpool.Pool, queryVec []float32, limit int) ([]SearchResult, error) {
	embJSON, err := json.Marshal(queryVec)
	if err != nil {
		return nil, fmt.Errorf("marshal query vector: %w", err)
	}
	rows, err := pool.Query(ctx, `
		SELECT id, title, content, source,
		       1 - (embedding <=> $1::vector(1536)) AS similarity
		FROM documents
		ORDER BY embedding <=> $1::vector(1536)
		LIMIT $2`, string(embJSON), limit)
	if err != nil {
		return nil, fmt.Errorf("query: %w", err)
	}
	defer rows.Close()
	var results []SearchResult
	for rows.Next() {
		var r SearchResult
		if err := rows.Scan(&r.ID, &r.Title, &r.Content, &r.Source, &r.Similarity); err != nil {
			return nil, fmt.Errorf("scan: %w", err)
		}
		results = append(results, r)
	}
	return results, rows.Err()
}
Indexing Strategies
Vector indexes prevent full-table scans during similarity search. Without an index, PostgreSQL compares the query vector against every row, which is acceptable for small datasets but unusable beyond tens of thousands of rows.
Creating an IVFFlat Index
IVFFlat (Inverted File with Flat Compression) partitions vectors into clusters. Queries search only the nearest clusters, reducing the comparison count.
CREATE INDEX idx_documents_embedding_ivf
ON documents
USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
The lists parameter controls the number of clusters. General guidance:
| Dataset Size | Recommended lists | Build Time |
|---|---|---|
| < 100K rows | 10-20 | Seconds |
| 100K-1M rows | 100 | Minutes |
| 1M-10M rows | 500-1000 | 10-30 minutes |
| 10M+ rows | 2000-4000 | Hours |
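The pgvector documentation suggests a starting point of rows/1000 lists up to about 1M rows and sqrt(rows) beyond that, which the table above roughly follows. A sketch of that rule of thumb:

```python
import math

def recommended_lists(row_count: int) -> int:
    """pgvector's rule of thumb: rows/1000 up to 1M rows, sqrt(rows) beyond."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))

print(recommended_lists(500_000))    # 500
print(recommended_lists(4_000_000))  # 2000
```

Treat the result as a starting point to benchmark, not a final setting; recall and latency depend on your data distribution.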
Adjust the probes parameter at query time to balance speed vs. recall:
SET ivfflat.probes = 10;
-- Default is 1; higher values search more clusters, improving recall
Performance impact of probes:
-- Low probes = fast but lower recall
SET ivfflat.probes = 5;
-- Balanced
SET ivfflat.probes = 20;
-- High recall (approaches brute-force quality)
SET ivfflat.probes = 100;
Creating an HNSW Index
HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure. It provides better recall than IVFFlat at the cost of larger index size and slower builds.
CREATE INDEX idx_documents_embedding_hnsw
ON documents
USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
Parameters:
| Parameter | Default | Range | Effect |
|---|---|---|---|
| m | 16 | 2-100 | Connections per node. Higher = better recall, larger index |
| ef_construction | 64 | 4-1000 | Search scope during build. Higher = better quality, slower build |
Set the ef_search parameter at query time:
SET hnsw.ef_search = 100;
-- Default is 40; higher values improve recall but increase latency
IVFFlat vs HNSW: Choosing the Right Index
| Factor | IVFFlat | HNSW |
|---|---|---|
| Query Speed | Fast (with enough probes) | Very fast |
| Recall Quality | Good (with enough probes) | Excellent |
| Index Size | Small (10-20% of data) | Large (2-4x of IVFFlat) |
| Build Time | Fast | Slow (5-10x IVFFlat) |
| Insert Performance | Fast (no rebuild needed) | Slow (graph restructuring) |
| Memory Usage | Low | High |
| Best For | Large datasets, write-heavy | High-recall, read-heavy |
Choose HNSW when query recall is critical and you have sufficient memory. Choose IVFFlat when building large indexes with limited resources or when new vectors arrive frequently.
Distance Operator Access Methods
Each index type requires the matching operator class:
-- For cosine similarity
USING ivfflat (embedding vector_cosine_ops)
USING hnsw (embedding vector_cosine_ops)
-- For Euclidean (L2) distance
USING ivfflat (embedding vector_l2_ops)
USING hnsw (embedding vector_l2_ops)
-- For inner product
USING ivfflat (embedding vector_ip_ops)
USING hnsw (embedding vector_ip_ops)
Hybrid Search: Combining Vector Similarity with Metadata Filtering
Real-world applications rarely rely on vector similarity alone. Most queries need to filter by metadata (date, category, author, or source) while also finding semantically similar results.
SQL: Vector Search with Metadata Filtering
SELECT
id,
title,
1 - (embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)) AS similarity,
source,
created_at
FROM documents
WHERE
source = 'blog'
AND created_at >= '2026-01-01'
AND metadata->>'language' = 'en'
ORDER BY embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)
LIMIT 20;
Logically, PostgreSQL applies the WHERE clause to restrict candidate rows, so selective filters shrink the search space before similarity comparison. With approximate indexes (IVFFlat, HNSW), however, the index scan can run first and the filter afterward, which may return fewer rows than requested; pgvector 0.8.0 added iterative index scans to mitigate this.
SQL: Vector Search + Full-Text Search (Hybrid)
Combine semantic and keyword matching for maximum relevance:
WITH semantic AS (
SELECT id, title, content,
1 - (embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)) AS score
FROM documents
ORDER BY embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)
LIMIT 100
),
keyword AS (
SELECT id, title, content,
ts_rank(to_tsvector('english', content),
plainto_tsquery('english', 'postgresql performance tuning')) AS score
FROM documents
WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'postgresql performance tuning')
LIMIT 100
),
combined AS (
SELECT id,
       COALESCE(semantic.title, keyword.title) AS title,
       COALESCE(semantic.content, keyword.content) AS content,
       0.7 * COALESCE(semantic.score, 0) + 0.3 * COALESCE(keyword.score, 0) AS score
FROM semantic
FULL OUTER JOIN keyword USING (id)
)
SELECT id, title, content, score
FROM combined
ORDER BY score DESC
LIMIT 10;
-- Weighted blend: 70% semantic relevance, 30% keyword match
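The same blending logic can be expressed in pure Python, which is handy for unit-testing ranking behavior without a database. A missing score counts as 0, mirroring a full outer join; the helper name is illustrative:

```python
def blend_scores(semantic: dict[int, float], keyword: dict[int, float],
                 w_sem: float = 0.7, w_kw: float = 0.3) -> list[tuple[int, float]]:
    """Weighted blend of two score maps (id -> score), highest combined score first."""
    ids = set(semantic) | set(keyword)
    blended = {i: w_sem * semantic.get(i, 0.0) + w_kw * keyword.get(i, 0.0) for i in ids}
    return sorted(blended.items(), key=lambda kv: kv[1], reverse=True)

results = blend_scores({1: 0.9, 2: 0.5}, {2: 0.8, 3: 0.6})
# doc 1 first (~0.63), then doc 2 (~0.59), then doc 3 (~0.18)
print(results)
```

Note that cosine similarity and ts_rank live on different scales, so in practice the weights need tuning (or the scores need normalizing) per corpus.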
Python: Hybrid Search with Metadata
def hybrid_search(
    query_vector: list[float],
    query_text: str,
    source_filter: str | None = None,
    limit: int = 20
) -> list[dict]:
    """Hybrid search combining vector similarity, full-text, and metadata filtering."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            sql = """
                WITH semantic AS (
                    SELECT id, title, content, source, created_at,
                           1 - (embedding <=> %s::vector(1536)) AS vector_score
                    FROM documents
                    WHERE (%s IS NULL OR source = %s)
                    ORDER BY embedding <=> %s::vector(1536)
                    LIMIT 200
                ),
                keyword AS (
                    SELECT id, title, content, source, created_at,
                           ts_rank(to_tsvector('english', content),
                                   plainto_tsquery('english', %s)) AS text_score
                    FROM documents
                    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
                      AND (%s IS NULL OR source = %s)
                    LIMIT 200
                )
                SELECT
                    COALESCE(s.id, k.id) AS id,
                    COALESCE(s.title, k.title) AS title,
                    COALESCE(s.content, k.content) AS content,
                    COALESCE(s.source, k.source) AS source,
                    0.6 * COALESCE(s.vector_score, 0) +
                    0.4 * COALESCE(k.text_score, 0) AS combined_score
                FROM semantic s
                FULL OUTER JOIN keyword k ON s.id = k.id
                ORDER BY combined_score DESC
                LIMIT %s
            """
            vec_json = json.dumps(query_vector)
            cur.execute(sql, (
                vec_json, source_filter, source_filter,
                vec_json,
                query_text, query_text, source_filter, source_filter,
                limit
            ))
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]
    except psycopg2.Error as e:
        print(f"[ERROR] Hybrid search failed: {e}")
        return []
    finally:
        conn.close()
Performance Tuning
Production pgvector deployments require deliberate tuning beyond default PostgreSQL settings.
PostgreSQL Configuration
Adjust these parameters in postgresql.conf:
# Memory for query sorting and index building
work_mem = '256MB' # Default: 4MB
maintenance_work_mem = '2GB' # Default: 64MB; affects index creation
# Shared buffers for caching
shared_buffers = '8GB' # 25% of RAM for dedicated DB servers
# Effective cache size for query planning
effective_cache_size = '24GB' # 75% of RAM
# Parallel query workers; vector searches benefit from parallelism
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 16
Python: Measuring Query Performance
import time
def benchmark_search(query_vector: list[float], iterations: int = 10) -> dict:
    """Benchmark vector search latency. Returns timing stats."""
    latencies = []
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            for _ in range(iterations):
                start = time.perf_counter()
                cur.execute(
                    """
                    SELECT id FROM documents
                    ORDER BY embedding <=> %s::vector(1536)
                    LIMIT 10
                    """,
                    (json.dumps(query_vector),)
                )
                cur.fetchall()
                latencies.append(time.perf_counter() - start)
        avg_ms = (sum(latencies) / len(latencies)) * 1000
        return {
            "avg_ms": round(avg_ms, 2),
            "min_ms": round(min(latencies) * 1000, 2),
            "max_ms": round(max(latencies) * 1000, 2),
            "iterations": iterations
        }
    except psycopg2.Error as e:
        print(f"[ERROR] Benchmark failed: {e}")
        return {}
    finally:
        conn.close()
# Usage
stats = benchmark_search([0.023, -0.045, 0.112])
# Example output: {"avg_ms": 12.34, "min_ms": 8.91, "max_ms": 18.76, "iterations": 10}
print(f"[INFO] Average query latency: {stats['avg_ms']}ms")
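Averages hide tail latency, so it is worth reporting percentiles over the same samples. A nearest-rank percentile sketch (function name is illustrative):

```python
import math

def latency_percentile(latencies_ms: list[float], pct: float) -> float:
    """Nearest-rank percentile of a latency sample; pct in (0, 100]."""
    ordered = sorted(latencies_ms)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]

samples = [8.9, 10.2, 11.5, 12.3, 12.8, 13.1, 14.0, 15.2, 16.7, 18.8]
print(latency_percentile(samples, 50))  # 12.8
print(latency_percentile(samples, 95))  # 18.8
```

For production dashboards, p95 and p99 over a rolling window are far more informative than the mean.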
Index Maintenance
Rebuild indexes periodically to maintain performance after many inserts or updates:
REINDEX INDEX idx_documents_embedding_hnsw;
-- Acquires an exclusive lock; schedule during maintenance windows
For zero-downtime rebuilds, create a new index concurrently, then drop the old one:
CREATE INDEX CONCURRENTLY idx_documents_embedding_hnsw_new
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
DROP INDEX CONCURRENTLY idx_documents_embedding_hnsw;
ALTER INDEX idx_documents_embedding_hnsw_new
RENAME TO idx_documents_embedding_hnsw;
Partitioning Large Datasets
For tables with hundreds of millions of vectors, partition by a natural attribute:
CREATE TABLE documents (
id BIGSERIAL,
title TEXT,
content TEXT,
embedding vector(1536),
source TEXT,
created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY LIST (source);
CREATE TABLE documents_blog PARTITION OF documents FOR VALUES IN ('blog');
CREATE TABLE documents_docs PARTITION OF documents FOR VALUES IN ('docs');
CREATE TABLE documents_wiki PARTITION OF documents FOR VALUES IN ('wiki');
-- Create indexes on each partition separately
CREATE INDEX idx_documents_blog_embedding
ON documents_blog USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
CREATE INDEX idx_documents_docs_embedding
ON documents_docs USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 200);
Queries that filter on the partition key scan only the relevant partition, reducing the search space dramatically.
Production Considerations
Monitoring
Track these metrics for vector workloads:
-- Index size monitoring
SELECT
indexname,
pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';
-- Query latency for vector searches
SELECT
query,
calls,
mean_exec_time::numeric(10,2) AS avg_ms,
total_exec_time::numeric(10,2) AS total_ms
FROM pg_stat_statements
WHERE query LIKE '%<=>%'
ORDER BY mean_exec_time DESC;
Connection Pooling
Vector search queries benefit from connection pooling. Use PgBouncer or the application-level pooling in pgx and asyncpg:
# asyncpg connection pool (create inside an async function)
pool = await asyncpg.create_pool(
    host="localhost",
    database="vectordb",
    user="app_user",
    password=os.environ["DB_PASSWORD"],
    min_size=5,
    max_size=20
)
Backup Considerations
Standard PostgreSQL backup tools work with vector columns, but note that vector indexes are large:
# pg_dump with custom format for compression
$ pg_dump -Fc vectordb > vectordb_backup.dump
# Size estimate for vector-heavy tables
$ psql -d vectordb -c "
SELECT pg_size_pretty(pg_total_relation_size('documents')) AS total_size,
pg_size_pretty(pg_indexes_size('documents')) AS index_size;
"
For large vector datasets, physical backups (pg_basebackup, WAL archiving) are faster than logical dumps.
Building a RAG Pipeline with pgvector
A complete RAG pipeline stores document embeddings, searches them with user queries, and feeds results to an LLM:
import json
import os

import openai
import psycopg2

def rag_query(user_query: str, openai_client: openai.Client, limit: int = 5) -> str:
    """Retrieve relevant documents and generate an LLM response."""
    # Step 1: Generate query embedding
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=user_query
    )
    query_vector = response.data[0].embedding

    # Step 2: Search for relevant documents
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT title, content,
                       1 - (embedding <=> %s::vector(1536)) AS similarity
                FROM documents
                ORDER BY embedding <=> %s::vector(1536)
                LIMIT %s
                """,
                (json.dumps(query_vector), json.dumps(query_vector), limit)
            )
            documents = cur.fetchall()
    except psycopg2.Error as e:
        print(f"[ERROR] RAG search failed: {e}")
        return "An error occurred during search."
    finally:
        conn.close()

    if not documents:
        return "No relevant documents found."

    # Step 3: Build context and generate response
    context = "\n\n".join(
        f"Title: {doc[0]}\nContent: {doc[1]}"
        for doc in documents
    )
    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer the user's question using the provided context."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {user_query}"
            }
        ]
    )
    return completion.choices[0].message.content
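Step 3 above concatenates retrieved documents without bounding the prompt size. A simple character-budget guard (a rough proxy for token limits; the helper name is illustrative) keeps the context from growing unbounded:

```python
def build_context(documents: list[tuple[str, str]], max_chars: int = 4000) -> str:
    """Concatenate (title, content) pairs into a prompt context,
    stopping before the character budget is exceeded."""
    parts: list[str] = []
    used = 0
    for title, content in documents:
        chunk = f"Title: {title}\nContent: {content}"
        if used + len(chunk) > max_chars:
            break  # keep earlier (higher-similarity) documents
        parts.append(chunk)
        used += len(chunk) + 2  # account for the joining blank line
    return "\n\n".join(parts)

ctx = build_context([("A", "short"), ("B", "x" * 10_000)], max_chars=100)
print(ctx)  # only the first document fits
```

Because results arrive ordered by similarity, truncating from the tail drops the least relevant context first. A tokenizer-based budget is more precise if your stack includes one.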
Comparison with Alternative Vector Databases
pgvector is not the only vector search option. Here is how it compares:
| Factor | pgvector | Pinecone | Weaviate | Milvus |
|---|---|---|---|---|
| Database ops | Existing PostgreSQL expertise | New platform | New platform | New platform |
| Index types | IVF, HNSW | HNSW (managed) | HNSW | IVF, HNSW, DiskANN |
| Hybrid search | Native SQL | Limited | Native | Limited |
| Transaction support | ACID | None | Limited | Limited |
| Scaling | Read replicas, partitioning | Auto-scaling | Sharding | Sharding |
| Operational cost | Existing DB infrastructure | Per-vector pricing | Self-hosted or cloud | Self-hosted or Zilliz |
Choose pgvector when you want to minimize infrastructure complexity and already use PostgreSQL. Dedicated vector databases make sense for very large scale (billions of vectors) or specialized requirements like real-time streaming ingestion.
Common Pitfalls and How to Avoid Them
Wrong operator class: Using vector_cosine_ops with Euclidean queries produces correct results but misses the index. Match the operator class to your query operator.
Missing index rebuild: After bulk inserting millions of vectors, IVFFlat indexes degrade. Rebuild or use lists proportional to the final dataset size.
Ignoring work_mem: The default 4MB causes disk spilling for large vector sorts. Increase work_mem to at least 256MB for production workloads.
Wrong distance metric: Using cosine distance on embeddings trained with inner product loss produces poor results. Verify your embedding model’s expected distance metric.
No metadata filtering: Without filters, vector search on a 10M-row table searches all vectors. Add WHERE clauses on frequently filtered columns to improve latency.
Conclusion
PostgreSQL with pgvector provides a capable vector search solution that leverages existing PostgreSQL infrastructure. The extension adds vector data types, similarity operators, and efficient indexes to the world's most popular open-source database. For teams building AI applications with existing PostgreSQL deployments, or teams that prefer to minimize infrastructure complexity, pgvector offers an attractive path to vector search.
The key to successful pgvector deployment is understanding the trade-offs between index types, query parameters, and performance characteristics. IVFFlat indexes provide good performance for large datasets with moderate recall requirements. HNSW indexes provide the highest recall at the cost of larger indexes and slower builds. The choice depends on specific requirements for recall, latency, and resource usage.
Production deployments require attention to indexing strategies, query optimization, and operational concerns. The investment in proper configuration and tuning pays dividends in query performance and system reliability. For many AI applications, pgvector provides the vector capabilities needed without the complexity of separate vector database deployments.
Resources
- pgvector GitHub Repository
- pgvector Documentation
- PostgreSQL Vector Search Guide
- HNSW Algorithm Paper: original research on Hierarchical Navigable Small World graphs
- pgvector Performance Benchmarks: official benchmark suite
- PostgreSQL Full-Text Search Documentation: for hybrid search implementations