Introduction
As AI applications proliferate, the need to store and search embeddings (vector representations of text, images, and other data) has become critical. While specialized vector databases have emerged, many organizations are finding that PostgreSQL with the pgvector extension provides the simplest path to vector search capabilities.
In 2026, pgvector has matured significantly, offering production-ready vector operations alongside PostgreSQL’s proven relational capabilities. This guide covers implementing vector search in PostgreSQL, from installation to production deployment.
Understanding Vector Search
What are Embeddings?
Embeddings are numerical representations of data that capture semantic meaning:
# Example: Text to embedding
text = "The cat sat on the mat"
embedding = [0.12, -0.34, 0.78, ...] # 1536 dimensions for text-embedding-3-small
# Similar texts have similar embeddings
similar_text = "A feline rested on the rug"
similar_embedding = [0.11, -0.33, 0.79, ...] # Very similar!
different_text = "Space rockets fly to Mars"
different_embedding = [-0.45, 0.67, -0.12, ...] # Very different!
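The "similar texts have similar embeddings" intuition can be made concrete with cosine similarity, the metric used throughout this guide. A minimal sketch using the toy 3-dimensional vectors above (real embeddings have hundreds or thousands of dimensions):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: 1.0 means identical direction, 0 means orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy vectors from the example above (truncated to 3 dimensions)
cat_mat = [0.12, -0.34, 0.78]
feline_rug = [0.11, -0.33, 0.79]
rockets = [-0.45, 0.67, -0.12]

print(cosine_similarity(cat_mat, feline_rug))  # close to 1.0
print(cosine_similarity(cat_mat, rockets))     # negative: very different
```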
Why PostgreSQL for Vectors?
- Single Database: Store vectors alongside your existing data
- Mature Features: ACID compliance, replication, backups
- Familiar Tools: Use existing PostgreSQL knowledge
- Performance: Good enough for most use cases
- Simplicity: One system to learn and maintain
Setting Up pgvector
Installation
# Ubuntu/Debian
curl -fsSL https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo gpg --dearmor -o /usr/share/keyrings/postgresql-keyring.gpg
echo "deb [signed-by=/usr/share/keyrings/postgresql-keyring.gpg] http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" | sudo tee /etc/apt/sources.list.d/pgdg.list
sudo apt update
sudo apt install postgresql-16-pgvector
# Docker
docker run -d \
  -p 5432:5432 \
  -e POSTGRES_PASSWORD=password \
  -e POSTGRES_DB=vectors \
  pgvector/pgvector:pg16
Enable Extension
-- Create extension
CREATE EXTENSION IF NOT EXISTS vector;
-- Verify
SELECT * FROM pg_extension WHERE extname = 'vector';
Creating Vector Columns
Table Definition
-- Simple documents table with vectors
CREATE TABLE documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
    embedding vector(1536), -- dimension of text-embedding-3-small / ada-002
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
-- Create index for fast similarity search
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100);
-- Or HNSW (also approximate: slower to build, but faster queries with better recall)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops);
-- For exact (but slower) search, create no vector index at all --
-- pgvector then falls back to a sequential scan
Embedding Dimensions
Different models have different dimensions:
| Model | Dimensions |
|---|---|
| text-embedding-3-small | 1536 |
| text-embedding-3-large | 3072 |
| text-embedding-ada-002 | 1536 |
| sentence-transformers/all-MiniLM-L6-v2 | 384 |
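Mixing models is a common bug: a `vector(1536)` column rejects 384- or 3072-dimensional vectors at insert time, often deep inside a batch job. A hedged guard worth adding in front of inserts (the helper name and the 1536 default are illustrative):

```python
EXPECTED_DIM = 1536  # must match the vector(N) declared on the column

def validate_embedding(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Fail fast if an embedding's length doesn't match the column dimension."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"Embedding has {len(embedding)} dimensions, column expects {expected_dim}; "
            "did you switch embedding models without migrating the column?"
        )
    return embedding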
Inserting Embeddings
Python Example
import numpy as np
import psycopg2
from psycopg2.extras import Json
from openai import OpenAI
from pgvector.psycopg2 import register_vector  # pip install pgvector

client = OpenAI()
conn = psycopg2.connect("dbname=vectors user=postgres password=password")
register_vector(conn)  # lets psycopg2 pass numpy arrays as vector parameters
def create_documents_table(conn):
with conn.cursor() as cur:
cur.execute("""
CREATE TABLE IF NOT EXISTS documents (
id BIGSERIAL PRIMARY KEY,
title TEXT NOT NULL,
content TEXT NOT NULL,
embedding vector(1536),
metadata JSONB,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
)
""")
# Create index
cur.execute("""
CREATE INDEX IF NOT EXISTS idx_documents_embedding
ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100)
""")
conn.commit()
def generate_embedding(text: str) -> np.ndarray:
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=text
    )
    # Returned as a numpy array so register_vector can adapt it to the
    # vector type (a plain list is sent as a Postgres array instead)
    return np.array(response.data[0].embedding)
def insert_document(conn, title: str, content: str, metadata: dict = None):
    embedding = generate_embedding(content)
    with conn.cursor() as cur:
        cur.execute("""
            INSERT INTO documents (title, content, embedding, metadata)
            VALUES (%s, %s, %s, %s)
            RETURNING id
        """, (title, content, embedding,
              Json(metadata) if metadata else None))  # dicts need the Json adapter
        doc_id = cur.fetchone()[0]
    conn.commit()
    return doc_id
Batch Insertion
def insert_documents_batch(conn, documents: list[dict]):
    # Generate embeddings for the whole batch in one API call
    texts = [doc["content"] for doc in documents]
    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=texts
    )
    embeddings = [np.array(item.embedding) for item in response.data]
    with conn.cursor() as cur:
        for doc, embedding in zip(documents, embeddings):
            cur.execute("""
                INSERT INTO documents (title, content, embedding, metadata)
                VALUES (%s, %s, %s, %s)
            """, (doc["title"], doc["content"], embedding,
                  Json(doc["metadata"]) if doc.get("metadata") else None))
    conn.commit()
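Embedding APIs cap how many inputs (and tokens) one request may carry, so large corpora should be split into batches first. A small sketch, reusing the `insert_documents_batch` defined above; the batch size of 100 is an assumption, check your provider's documented limits:

```python
def chunk(items: list, size: int) -> list[list]:
    """Split a list into consecutive batches of at most `size` items."""
    return [items[i:i + size] for i in range(0, len(items), size)]

def insert_documents_chunked(conn, documents: list[dict], batch_size: int = 100):
    # One embeddings API request per batch keeps each request under the
    # provider's per-request input limit
    for batch in chunk(documents, batch_size):
        insert_documents_batch(conn, batch)
```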
Similarity Search
Basic Similarity Queries
-- Cosine similarity (most common); query_embedding stands in for your
-- parameterized query vector
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
ORDER BY embedding <=> query_embedding
LIMIT 5;
-- Euclidean distance
SELECT id, title,
embedding <-> query_embedding AS distance
FROM documents
ORDER BY embedding <-> query_embedding
LIMIT 5;
-- Inner product (for normalized vectors)
-- Note: <#> returns the *negative* inner product, so ascending order
-- still puts the best matches first
SELECT id, title,
       (embedding <#> query_embedding) * -1 AS inner_product
FROM documents
ORDER BY embedding <#> query_embedding
LIMIT 5;
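To make the three operators concrete, here is what each one computes, reproduced in plain Python on toy vectors:

```python
import math

def l2_distance(a, b):
    """pgvector's <-> operator: straight-line (Euclidean) distance."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """pgvector's <=> operator: 1 - cosine similarity (0 = identical direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (na * nb)

def neg_inner_product(a, b):
    """pgvector's <#> operator: negated dot product, so ASC sort ranks best first."""
    return -sum(x * y for x, y in zip(a, b))
```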
Filtered Search
-- Search with filters
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
WHERE metadata->>'category' = 'technology'
ORDER BY embedding <=> query_embedding
LIMIT 10;
-- Date-based filtering
SELECT id, title,
1 - (embedding <=> query_embedding) AS similarity
FROM documents
WHERE created_at > NOW() - INTERVAL '30 days'
ORDER BY embedding <=> query_embedding
LIMIT 10;
Python Implementation
def semantic_search(conn, query: str, limit: int = 5, filters: dict = None):
    # Generate query embedding
    query_embedding = generate_embedding(query)
    # Build query
    sql = """
        SELECT id, title, content,
               1 - (embedding <=> %s) AS similarity
        FROM documents
    """
    params = [query_embedding]
    # Add filters. Filter *keys* are interpolated into the SQL string, so
    # never accept them from untrusted input -- whitelist allowed keys.
    if filters:
        conditions = []
        for key, value in filters.items():
            conditions.append(f"metadata->>'{key}' = %s")
            params.append(value)
        sql += " WHERE " + " AND ".join(conditions)
    sql += " ORDER BY embedding <=> %s LIMIT %s"
    params.extend([query_embedding, limit])
with conn.cursor() as cur:
cur.execute(sql, params)
results = cur.fetchall()
return [
{
"id": row[0],
"title": row[1],
"content": row[2],
"similarity": row[3]
}
for row in results
]
Hybrid Search
Combine vector and text search:
-- Hybrid search: combine vector and keyword scores
-- (the scores go in a subquery because PostgreSQL does not allow
-- output aliases inside ORDER BY expressions)
SELECT id, title, vector_sim, text_rank
FROM (
    SELECT id, title,
           1 - (embedding <=> query_embedding) AS vector_sim,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', %s)) AS text_rank
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
       OR embedding <=> query_embedding < 0.5
) scored
ORDER BY vector_sim * 0.7 + text_rank * 0.3 DESC
LIMIT 10;
def hybrid_search(conn, query: str, limit: int = 5):
    query_embedding = generate_embedding(query)
    # Scores are computed in a subquery: aliases like vector_sim can't be
    # reused inside ORDER BY expressions at the same query level
    sql = """
        SELECT id, title, content, vector_sim, text_rank
        FROM (
            SELECT id, title, content,
                   1 - (embedding <=> %s) AS vector_sim,
                   ts_rank(to_tsvector('english', content),
                           plainto_tsquery('english', %s)) AS text_rank
            FROM documents
            WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
               OR embedding <=> %s < 0.8
        ) scored
        ORDER BY vector_sim * 0.7 + text_rank * 0.3 DESC
        LIMIT %s
    """
    params = [
        query_embedding, query, query,
        query_embedding, limit
    ]
    with conn.cursor() as cur:
        cur.execute(sql, params)
        return cur.fetchall()
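An alternative to hand-tuned weights like 0.7/0.3 is Reciprocal Rank Fusion (RRF), which merges ranked lists using only ranks, so the two scores never need to be on the same scale. A sketch (the constant k=60 is the value commonly used in the RRF literature):

```python
def reciprocal_rank_fusion(rankings: list[list[int]], k: int = 60) -> list[int]:
    """Merge ranked lists of document ids: each document accumulates
    1 / (k + rank) per list it appears in, then ids are sorted by score."""
    scores: dict[int, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = [3, 1, 7]   # ids ranked by vector similarity
keyword_hits = [7, 3, 9]  # ids ranked by ts_rank
print(reciprocal_rank_fusion([vector_hits, keyword_hits]))  # [3, 7, 1, 9]
```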
Building RAG Systems
Complete RAG Implementation
class PostgresRAG:
def __init__(self, conn, openai_client):
self.conn = conn
self.client = openai_client
def index_documents(self, documents: list[dict]):
"""Index documents into the database."""
insert_documents_batch(self.conn, documents)
def retrieve(self, query: str, top_k: int = 5) -> list[dict]:
"""Retrieve relevant documents for a query."""
results = semantic_search(self.conn, query, limit=top_k)
return results
    def answer(self, question: str, context_limit: int = 5) -> dict:
"""Answer a question using RAG."""
# Retrieve relevant context
context_docs = self.retrieve(question, top_k=context_limit)
# Build context string
context = "\n\n".join([
f"Document {i+1}: {doc['content'][:500]}..."
for i, doc in enumerate(context_docs)
])
# Generate answer
response = self.client.chat.completions.create(
model="gpt-4o",
messages=[
{
"role": "system",
"content": "You are a helpful assistant. Use the provided context to answer questions accurately."
},
{
"role": "user",
"content": f"Context:\n{context}\n\nQuestion: {question}\n\nAnswer:"
}
],
temperature=0.7
)
return {
"answer": response.choices[0].message.content,
"sources": [
{"id": doc["id"], "title": doc["title"], "similarity": doc["similarity"]}
for doc in context_docs
]
}
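Retrieval quality also depends on how documents are split before indexing: whole documents dilute the embedding, while tiny fragments lose context. A naive character-window chunker with overlap, as one possible sketch (the 1000/200 sizes are assumptions to tune for your corpus):

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into overlapping character windows so content that spans
    a boundary still appears intact in at least one chunk."""
    if overlap >= chunk_size:
        raise ValueError("overlap must be smaller than chunk_size")
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks
```

Each chunk is then indexed as its own row, so similarity search returns the relevant passage rather than the whole document.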
Handling Updates
def update_document_embedding(conn, doc_id: int, new_content: str):
"""Update document and regenerate embedding."""
new_embedding = generate_embedding(new_content)
with conn.cursor() as cur:
cur.execute("""
UPDATE documents
SET content = %s,
embedding = %s,
updated_at = CURRENT_TIMESTAMP
WHERE id = %s
""", (new_content, new_embedding, doc_id))
conn.commit()
def delete_document(conn, doc_id: int):
"""Delete a document."""
with conn.cursor() as cur:
cur.execute("DELETE FROM documents WHERE id = %s", (doc_id,))
conn.commit()
Performance Optimization
Index Selection
-- IVFFlat index (faster build, slower query, approximate)
CREATE INDEX ON documents USING ivfflat (embedding vector_cosine_ops)
WITH (lists = 100); -- more lists = faster queries but lower recall
                    -- unless you also raise ivfflat.probes
-- HNSW index (slower build, faster query, better accuracy)
CREATE INDEX ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 64);
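The pgvector README suggests starting points for these parameters: `lists` around rows/1000 up to roughly a million rows and sqrt(rows) beyond that, with `ivfflat.probes` starting near sqrt(lists). A small helper encoding those heuristics as a sketch (treat the results as starting points to benchmark, not final values):

```python
import math

def recommended_ivfflat_lists(row_count: int) -> int:
    """pgvector's documented starting point: rows/1000 up to ~1M rows,
    sqrt(rows) beyond that."""
    if row_count <= 1_000_000:
        return max(1, row_count // 1000)
    return int(math.sqrt(row_count))

def recommended_probes(lists: int) -> int:
    """A common starting point for the ivfflat.probes setting."""
    return max(1, int(math.sqrt(lists)))
```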
Query Optimization
-- Reindex to maintain index quality
REINDEX INDEX idx_documents_embedding;
-- Analyze for query planner
ANALYZE documents;
-- Check index usage
EXPLAIN ANALYZE
SELECT id, title
FROM documents
ORDER BY embedding <=> (SELECT embedding FROM documents WHERE id = 1)
LIMIT 5;
Partitioning
-- Partition by date for large datasets
CREATE TABLE documents (
id BIGSERIAL,
embedding vector(1536),
created_at DATE
) PARTITION BY RANGE (created_at);
-- Create partitions
CREATE TABLE documents_2026_01 PARTITION OF documents
FOR VALUES FROM ('2026-01-01') TO ('2026-02-01');
CREATE TABLE documents_2026_02 PARTITION OF documents
FOR VALUES FROM ('2026-02-01') TO ('2026-03-01');
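Monthly partition DDL like the above is tedious to write by hand; a small generator can emit it instead. A hedged sketch (the function name and formatting are illustrative, and production setups often use pg_partman for this):

```python
from datetime import date

def monthly_partition_ddl(table: str, year: int, month: int) -> str:
    """Generate the CREATE TABLE ... PARTITION OF statement for one month."""
    start = date(year, month, 1)
    # The upper bound is exclusive: the first day of the next month
    end = date(year + 1, 1, 1) if month == 12 else date(year, month + 1, 1)
    return (
        f"CREATE TABLE {table}_{start:%Y_%m} PARTITION OF {table}\n"
        f"    FOR VALUES FROM ('{start}') TO ('{end}');"
    )

print(monthly_partition_ddl("documents", 2026, 3))
```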
Production Considerations
Connection Pooling
from psycopg2 import pool
# Connection pool
connection_pool = pool.ThreadedConnectionPool(
minconn=5,
maxconn=20,
host="localhost",
database="vectors",
user="postgres",
password="password"
)
def get_connection():
return connection_pool.getconn()
def return_connection(conn):
connection_pool.putconn(conn)
Monitoring
-- Check table and index size
SELECT
    pg_size_pretty(pg_relation_size('documents')) AS table_size,
    pg_size_pretty(pg_relation_size('idx_documents_embedding')) AS index_size;
-- Query performance
SELECT
query,
calls,
mean_exec_time,
total_exec_time
FROM pg_stat_statements
WHERE query LIKE '%embedding%'
ORDER BY total_exec_time DESC;
-- Index usage
SELECT
indexrelname,
idx_scan,
idx_tup_read,
idx_tup_fetch
FROM pg_stat_user_indexes
WHERE indexrelname LIKE '%embedding%';
Backup and Recovery
# Backup with pg_dump
pg_dump -h localhost -U postgres -Fc vectors > vectors_backup.dump
# Restore
pg_restore -h localhost -U postgres -d vectors vectors_backup.dump
Advanced Patterns
Multi-tenant Vector Search
-- Per-tenant vector tables
CREATE TABLE tenant_documents (
id BIGSERIAL,
tenant_id UUID NOT NULL,
title TEXT,
embedding vector(1536),
PRIMARY KEY (tenant_id, id)
);
-- ivfflat and hnsw don't support multicolumn indexes, so index the
-- tenant key and the embedding separately
CREATE INDEX ON tenant_documents (tenant_id);
CREATE INDEX ON tenant_documents USING hnsw (embedding vector_cosine_ops);
-- Query for specific tenant
SELECT * FROM tenant_documents
WHERE tenant_id = %s
ORDER BY embedding <=> %s
LIMIT 10;
Metadata Filtering Best Practices
-- Efficient JSONB queries
CREATE INDEX idx_documents_metadata ON documents USING gin (metadata);
-- Use containment for filtering
SELECT * FROM documents
WHERE metadata @> '{"category": "tech"}'
ORDER BY embedding <=> %s
LIMIT 10;
Combining Multiple Embeddings
-- Store multiple vector types
ALTER TABLE documents ADD COLUMN title_embedding vector(384);
ALTER TABLE documents ADD COLUMN image_embedding vector(512);
-- Query different embeddings (scores go in a subquery so the aliases
-- can be combined in ORDER BY)
SELECT id, doc_sim, title_sim, image_sim
FROM (
    SELECT id,
           embedding <=> %s AS doc_sim,
           title_embedding <=> %s AS title_sim,
           image_embedding <=> %s AS image_sim
    FROM documents
) scored
ORDER BY (doc_sim + title_sim + image_sim) / 3
LIMIT 10;
Migration from Other Databases
Import from Pinecone
def import_from_pinecone(pinecone_index, conn):
    # Page through all vectors in the Pinecone index
    pagination_token = None
    while True:
        response = pinecone_index.list(
            limit=1000,
            pagination_token=pagination_token
        )
        # Insert into PostgreSQL directly, keeping the original embeddings
        # rather than regenerating them with insert_documents_batch
        with conn.cursor() as cur:
            for item in response["vectors"]:
                metadata = {k: v for k, v in item["metadata"].items() if k != "text"}
                cur.execute("""
                    INSERT INTO documents (title, content, embedding, metadata)
                    VALUES (%s, %s, %s, %s)
                """, (item["id"], item["metadata"].get("text", ""),
                      np.array(item["values"]), Json(metadata)))
        conn.commit()
        pagination_token = response.get("pagination_token")
        if not pagination_token:
            break
Import from Weaviate
def import_from_weaviate(weaviate_client, class_name: str, conn):
    # Vectors aren't regular properties in Weaviate; request them
    # via _additional.vector
    result = weaviate_client.query.get(
        class_name,
        ["content"]
    ).with_additional(["vector"]).do()
    documents = [
        {
            "title": class_name,
            "content": obj["content"],
            "embedding": obj["_additional"]["vector"],
            "metadata": {k: v for k, v in obj.items()
                         if k not in ["content", "_additional"]}
        }
        for obj in result["data"]["Get"][class_name]
    ]
    # Note: insert_documents_batch re-embeds content with your own model;
    # insert directly (as in import_from_pinecone) to keep the stored vectors
    insert_documents_batch(conn, documents)
Conclusion
PostgreSQL with pgvector provides an excellent foundation for AI applications requiring vector search. By combining vectors with your existing relational data, you simplify your architecture without sacrificing capability.
Start with a simple implementation: add the extension, create a column, generate some embeddings, and try a similarity search. The familiar SQL interface makes it easy to add filtering, combine with text search, and scale as needed.
For most applications, PostgreSQL’s vector capabilities are more than sufficient. Reserve specialized vector databases for extreme scale or specialized requirements.