
PostgreSQL Vector Search with pgvector: Complete Guide 2026

Introduction

PostgreSQL with pgvector has become the go-to solution for teams building AI applications that want vector capabilities without deploying separate vector databases. The combination leverages PostgreSQL’s reliability, ecosystem, and operational familiarity while adding vector similarity search. In 2026, pgvector powers production RAG systems, recommendation engines, and semantic search across organizations of all sizes.

The appeal is straightforward: many organizations already run PostgreSQL. Adding vector capabilities through pgvector requires no new infrastructure, no new operational expertise, and no new monitoring systems. For teams building AI applications with existing PostgreSQL deployments, or teams that prefer to minimize infrastructure complexity, pgvector provides an attractive path to vector search.

What Is Vector Search and Why Does It Matter?

Vector search finds items by semantic meaning rather than exact keyword matches. Machine learning models convert text, images, or audio into embedding vectors: arrays of floating-point numbers that represent semantic content. Similar items cluster together in vector space, so calculating the distance between vectors reveals semantic similarity.

Traditional databases rely on exact matches and pattern matching (LIKE, full-text search). These approaches miss synonyms, paraphrases, and conceptually related content. A search for “canine companions” won’t match “dog training” in a keyword system, but vector search captures the semantic relationship.
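As a toy illustration (with made-up 3-dimensional vectors; real embedding models emit hundreds or thousands of dimensions), cosine similarity is just the dot product of two vectors divided by the product of their magnitudes:

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product divided by the product of magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Hypothetical 3-dimensional "embeddings" for illustration only
dog_training = [0.9, 0.8, 0.1]
canine_companions = [0.85, 0.75, 0.2]
tax_filing = [0.05, 0.1, 0.95]

print(cosine_similarity(dog_training, canine_companions))  # close to 1: semantically related
print(cosine_similarity(dog_training, tax_filing))         # much lower: unrelated topics
```

Keyword matching sees no overlap between "canine companions" and "dog training", but their embeddings sit close together in vector space, which is exactly what the distance operators below measure.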

pgvector brings this capability directly into PostgreSQL. You store embeddings alongside relational data, run vector similarity queries with standard SQL, and combine vector search with PostgreSQL’s filtering, joins, and full-text search. This unified approach eliminates the need for a separate vector database, reducing infrastructure complexity and operational overhead.

Prerequisites

Before installing pgvector, ensure you have:

  • PostgreSQL 12 or later (PostgreSQL 16+ recommended for best performance)
  • A C compiler and build tools (for source installations)
  • Sufficient memory for index building (at least 2GB for production workloads)
  • Database superuser privileges (or a role allowed to run CREATE EXTENSION) to install extensions

Installing pgvector

pgvector installation varies by deployment method. Choose the approach that matches your environment.

Linux (Debian/Ubuntu) via APT

$ sudo apt update
$ sudo apt install postgresql-16-pgvector
# => Installs pgvector for PostgreSQL 16

For other PostgreSQL versions, match the package version:

$ sudo apt install postgresql-15-pgvector
$ sudo apt install postgresql-14-pgvector

macOS via Homebrew

$ brew install pgvector
$ brew services restart postgresql

Docker

PostgreSQL containers require pgvector installed at build time. Use the official pgvector image or install it in a custom Dockerfile:

FROM postgres:16

RUN apt-get update && \
    apt-get install -y postgresql-16-pgvector && \
    rm -rf /var/lib/apt/lists/*

Build and run:

$ docker build -t postgres-pgvector .
$ docker run -e POSTGRES_PASSWORD=secret -d postgres-pgvector

Build from Source

When package managers don’t provide pgvector, build from source:

$ git clone https://github.com/pgvector/pgvector.git
$ cd pgvector
$ make
$ sudo make install
# => Installs the extension files to the PostgreSQL extension directory

If PostgreSQL is installed from a non-default location, specify the pg_config path:

$ make PG_CONFIG=/usr/local/pgsql/bin/pg_config
$ sudo make PG_CONFIG=/usr/local/pgsql/bin/pg_config install

Cloud PostgreSQL Services

Service Enabling pgvector
AWS RDS for PostgreSQL CREATE EXTENSION vector; after enabling in parameter group
Google Cloud SQL CREATE EXTENSION vector; available by default
Azure Database for PostgreSQL Enabled via server parameters or CREATE EXTENSION
Supabase Available by default
Neon Available by default

Enabling the Extension

Connect to your database and create the extension:

CREATE EXTENSION IF NOT EXISTS vector;

Verify the installation:

SELECT extname, extversion FROM pg_extension WHERE extname = 'vector';
-- Returns: vector | 0.8.0 (or later)

Creating Vector Columns

Vector columns store embeddings as arrays of floating-point numbers. Specify the dimension when creating the column; it must match your embedding model's output size.

SQL: Creating a Table with Vector Columns

CREATE TABLE documents (
    id BIGSERIAL PRIMARY KEY,
    title TEXT NOT NULL,
    content TEXT NOT NULL,
    embedding vector(1536),        -- OpenAI text-embedding-3-small
    summary_embedding vector(384), -- A smaller model for quick similarity
    source TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    metadata JSONB
);

Common embedding dimensions by model:

Model Dimensions Use Case
text-embedding-3-small 1536 General-purpose text embeddings
text-embedding-3-large 3072 Maximum precision
text-embedding-ada-002 1536 Legacy OpenAI embeddings
all-MiniLM-L6-v2 384 Lightweight, fast
BAAI/bge-base-en-v1.5 768 Balanced quality and speed
Cohere embed-english-v3.0 1024 Long document support
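Inserting a vector whose length differs from the declared column dimension fails with a pgvector error, so a client-side check can fail fast before the round trip to the database. A minimal sketch (the validate_embedding helper is illustrative, not part of pgvector):

```python
EXPECTED_DIM = 1536  # must match the vector(1536) column definition

def validate_embedding(embedding: list[float], expected_dim: int = EXPECTED_DIM) -> list[float]:
    """Raise early instead of letting the INSERT fail with a dimension mismatch."""
    if len(embedding) != expected_dim:
        raise ValueError(
            f"embedding has {len(embedding)} dimensions, expected {expected_dim}"
        )
    return embedding

validate_embedding([0.0] * 1536)  # passes
try:
    validate_embedding([0.1, 0.2, 0.3])  # a 3-dim vector against a 1536-dim column
except ValueError as e:
    print(f"[ERROR] {e}")
```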

Inserting Vectors

Insert embeddings using the vector cast syntax:

INSERT INTO documents (title, content, embedding, source)
VALUES (
    'PostgreSQL Performance Tuning',
    'Optimize your PostgreSQL database with proper configuration...',
    '[0.023, -0.045, 0.112, ...]'::vector(1536),
    'blog'
);

For batch inserts, use multi-row syntax or COPY:

INSERT INTO documents (title, content, embedding, source) VALUES
    ('Article One', 'Content one...', '[0.1, 0.2, 0.3]'::vector(3), 'blog'),
    ('Article Two', 'Content two...', '[0.4, 0.5, 0.6]'::vector(3), 'docs'),
    ('Article Three', 'Content three...', '[0.7, 0.8, 0.9]'::vector(3), 'wiki');

Python: Inserting Embeddings with psycopg2

import json
import os

import psycopg2
from psycopg2.extras import execute_values

def insert_documents(records: list[dict]) -> int:
    """Insert document records with embeddings. Returns count inserted."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            sql = """
                INSERT INTO documents (title, content, embedding, source)
                VALUES %s
            """
            values = [
                (
                    r["title"],
                    r["content"],
                    json.dumps(r["embedding"]),  # pgvector accepts JSON array
                    r["source"]
                )
                for r in records
            ]
            execute_values(cur, sql, values, template="(%s, %s, %s::vector(1536), %s)")
            conn.commit()
            return cur.rowcount
    except psycopg2.Error as e:
        conn.rollback()
        print(f"[ERROR] Insert failed: {e}")
        raise
    finally:
        conn.close()

# Usage
records = [
    {
        "title": "Vector Search Basics",
        "content": "Understanding vector embeddings...",
        "embedding": [0.023, -0.045, 0.112, 0.078, -0.091],
        "source": "blog"
    }
]
count = insert_documents(records)
print(f"[INFO] Inserted {count} documents")

Python: Using asyncpg

import asyncpg
import json
import os

async def insert_documents_async(records: list[dict]) -> int:
    """Async insert of document records with embeddings."""
    conn = await asyncpg.connect(
        host="localhost",
        database="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        async with conn.transaction():
            result = await conn.executemany(
                """
                INSERT INTO documents (title, content, embedding, source)
                VALUES ($1, $2, $3::vector(1536), $4)
                """,
                [
                    (r["title"], r["content"], json.dumps(r["embedding"]), r["source"])
                    for r in records
                ]
            )
        return len(records)
    except asyncpg.PostgresError as e:
        print(f"[ERROR] Async insert failed: {e}")
        raise
    finally:
        await conn.close()

Go: Inserting Embeddings with pgx

package main

import (
    "context"
    "encoding/json"
    "fmt"

    "github.com/jackc/pgx/v5"
    "github.com/jackc/pgx/v5/pgxpool"
)

type Document struct {
    Title     string
    Content   string
    Embedding []float32
    Source    string
}

func insertDocuments(ctx context.Context, pool *pgxpool.Pool, docs []Document) (int64, error) {
    batch := &pgx.Batch{}
    for _, doc := range docs {
        embJSON, err := json.Marshal(doc.Embedding)
        if err != nil {
            return 0, fmt.Errorf("marshal embedding: %w", err)
        }
        sql := `INSERT INTO documents (title, content, embedding, source)
                VALUES ($1, $2, $3::vector(1536), $4)`
        batch.Queue(sql, doc.Title, doc.Content, string(embJSON), doc.Source)
    }

    br := pool.SendBatch(ctx, batch)
    defer br.Close()

    var total int64
    for range docs {
        _, err := br.Exec()
        if err != nil {
            return total, fmt.Errorf("batch exec: %w", err)
        }
        total++
    }
    return total, nil
}

Similarity Search Queries

pgvector provides three distance operators for similarity search:

Operator Distance Metric Best For Range
<-> Euclidean (L2) General-purpose 0 to ∞
<=> Cosine Semantic text search 0 to 2
<#> Negative inner product Maximum inner product -∞ to ∞

Choose the operator that matches the distance metric used during embedding model training. Most text embedding models use cosine similarity.

SELECT
    id,
    title,
    content,
    1 - (embedding <=> '[0.023, -0.045, 0.112, ...]'::vector(1536)) AS similarity
FROM documents
ORDER BY embedding <=> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 10;

For Euclidean (L2) distance, order by the <-> operator instead:

SELECT
    id,
    title,
    content,
    embedding <-> '[0.023, -0.045, 0.112, ...]'::vector(1536) AS distance
FROM documents
ORDER BY embedding <-> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 5;

Maximum inner product search (MIPS) ranks by the raw dot product. For normalized embeddings it orders results identically to cosine distance while being cheaper to compute:

SELECT
    id,
    title,
    content,
    -(embedding <#> '[0.023, -0.045, 0.112, ...]'::vector(1536)) AS inner_product
FROM documents
ORDER BY embedding <#> '[0.023, -0.045, 0.112, ...]'::vector(1536)
LIMIT 10;
-- Note: <#> returns negative inner product, so ORDER BY returns largest first

Python: Searching with psycopg2

def search_similar(query_vector: list[float], limit: int = 10, threshold: float = 0.7) -> list[dict]:
    """Search for documents by cosine similarity. Returns results above threshold."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT id, title, content, source,
                       1 - (embedding <=> %s::vector(1536)) AS similarity
                FROM documents
                WHERE 1 - (embedding <=> %s::vector(1536)) > %s
                ORDER BY similarity DESC
                LIMIT %s
                """,
                (json.dumps(query_vector), json.dumps(query_vector), threshold, limit)
            )
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]
    except psycopg2.Error as e:
        print(f"[ERROR] Search failed: {e}")
        return []
    finally:
        conn.close()

# Usage
results = search_similar([0.023, -0.045, 0.112], limit=5, threshold=0.75)
for r in results:
    print(f"[RESULT] {r['title']} - similarity: {r['similarity']:.3f}")

Go: Querying with pgx

type SearchResult struct {
    ID         int64
    Title      string
    Content    string
    Source     string
    Similarity float64
}

func searchSimilar(ctx context.Context, pool *pgxpool.Pool, queryVec []float32, limit int) ([]SearchResult, error) {
    embJSON, err := json.Marshal(queryVec)
    if err != nil {
        return nil, fmt.Errorf("marshal query vector: %w", err)
    }

    rows, err := pool.Query(ctx, `
        SELECT id, title, content, source,
               1 - (embedding <=> $1::vector(1536)) AS similarity
        FROM documents
        ORDER BY embedding <=> $1::vector(1536)
        LIMIT $2`, string(embJSON), limit)
    if err != nil {
        return nil, fmt.Errorf("query: %w", err)
    }
    defer rows.Close()

    var results []SearchResult
    for rows.Next() {
        var r SearchResult
        if err := rows.Scan(&r.ID, &r.Title, &r.Content, &r.Source, &r.Similarity); err != nil {
            return nil, fmt.Errorf("scan: %w", err)
        }
        results = append(results, r)
    }
    return results, rows.Err()
}

Indexing Strategies

Vector indexes prevent full-table scans during similarity search. Without an index, PostgreSQL compares the query vector against every row, which is acceptable for small datasets but unusable beyond tens of thousands of rows.

Creating an IVFFlat Index

IVFFlat (Inverted File with Flat Compression) partitions vectors into clusters. Queries search only the nearest clusters, reducing the comparison count.

CREATE INDEX idx_documents_embedding_ivf
    ON documents
    USING ivfflat (embedding vector_cosine_ops)
    WITH (lists = 100);

The lists parameter controls the number of clusters. General guidance:

Dataset Size Recommended lists Build Time
< 100K rows lists = 10-20 Seconds
100K-1M rows lists = 100 Minutes
1Mโ€“10M rows lists = 500-1000 10-30 minutes
10M+ rows lists = 2000-4000 Hours
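The pgvector documentation's rule of thumb, which the table above rounds into buckets, is lists ≈ rows/1000 up to about a million rows and sqrt(rows) beyond that. A small helper sketching that heuristic (illustrative only):

```python
import math

def recommended_lists(row_count: int) -> int:
    """Heuristic from the pgvector docs: rows/1000 up to ~1M rows, sqrt(rows) beyond."""
    if row_count <= 0:
        return 1
    if row_count <= 1_000_000:
        return max(10, row_count // 1000)
    return int(math.sqrt(row_count))

print(recommended_lists(50_000))      # 50
print(recommended_lists(500_000))     # 500
print(recommended_lists(25_000_000))  # 5000
```

Whichever value you choose, size lists for the final dataset, not the current row count, or plan to rebuild the index after bulk loading.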

Adjust the probes parameter at query time to balance speed vs. recall:

SET ivfflat.probes = 10;
-- Default is 1; higher values search more clusters, improving recall

Performance impact of probes:

-- Low probes = fast but lower recall
SET ivfflat.probes = 5;
-- Balanced
SET ivfflat.probes = 20;
-- High recall (approaches brute-force quality)
SET ivfflat.probes = 100;

Creating an HNSW Index

HNSW (Hierarchical Navigable Small World) builds a multi-layer graph structure. It provides better recall than IVFFlat at the cost of larger index size and slower builds.

CREATE INDEX idx_documents_embedding_hnsw
    ON documents
    USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

Parameters:

Parameter Default Range Effect
m 16 4-100 Connections per node. Higher = better recall, larger index
ef_construction 200 100-800 Search scope during build. Higher = better quality, slower build

Set the ef_search parameter at query time:

SET hnsw.ef_search = 100;
-- Default is 40; higher values improve recall but increase latency

IVFFlat vs HNSW: Choosing the Right Index

Factor IVFFlat HNSW
Query Speed Fast (with enough probes) Very fast
Recall Quality Good (with enough probes) Excellent
Index Size Small (10-20% of data) Large (2-4x of IVFFlat)
Build Time Fast Slow (5-10x IVFFlat)
Insert Performance Fast (no rebuild needed) Slow (graph restructuring)
Memory Usage Low High
Best For Large datasets, write-heavy High-recall, read-heavy

Choose HNSW when query recall is critical and you have sufficient memory. Choose IVFFlat when building large indexes with limited resources or when new vectors arrive frequently.

Distance Operator Access Methods

Each index type requires the matching operator class:

-- For cosine similarity
USING ivfflat (embedding vector_cosine_ops)
USING hnsw (embedding vector_cosine_ops)

-- For Euclidean (L2) distance
USING ivfflat (embedding vector_l2_ops)
USING hnsw (embedding vector_l2_ops)

-- For inner product
USING ivfflat (embedding vector_ip_ops)
USING hnsw (embedding vector_ip_ops)

Hybrid Search: Combining Vector Similarity with Metadata Filtering

Real-world applications rarely rely on vector similarity alone. Most queries need to filter by metadata (date, category, author, or source) while also finding semantically similar results.

SQL: Vector Search with Metadata Filtering

SELECT
    id,
    title,
    1 - (embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)) AS similarity,
    source,
    created_at
FROM documents
WHERE
    source = 'blog'
    AND created_at >= '2026-01-01'
    AND metadata->>'language' = 'en'
ORDER BY embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)
LIMIT 20;

With a sequential scan, PostgreSQL applies the WHERE clause before ranking, so selective filters shrink the candidate set cheaply. With an approximate index (IVFFlat or HNSW), the filter is instead applied to the rows returned by the index scan, which can yield fewer results than requested when filters are highly selective; pgvector 0.8+ mitigates this with iterative index scans that keep fetching candidates until the limit is satisfied.

SQL: Vector Search + Full-Text Search (Hybrid)

Combine semantic and keyword matching for maximum relevance:

WITH semantic AS (
    SELECT id, title, content,
           1 - (embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)) AS score
    FROM documents
    ORDER BY embedding <=> '[0.023, -0.045, 0.112]'::vector(1536)
    LIMIT 100
),
keyword AS (
    SELECT id, title, content,
           ts_rank(to_tsvector('english', content),
                   plainto_tsquery('english', 'postgresql performance tuning')) AS score
    FROM documents
    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', 'postgresql performance tuning')
    LIMIT 100
),
combined AS (
    SELECT id,
           COALESCE(semantic.title, keyword.title) AS title,
           COALESCE(semantic.content, keyword.content) AS content,
           0.7 * COALESCE(semantic.score, 0) + 0.3 * COALESCE(keyword.score, 0) AS score
    FROM semantic
    FULL OUTER JOIN keyword USING (id)
)
SELECT id, title, content, score
FROM combined
ORDER BY score DESC
LIMIT 10;
-- Weighted blend: 70% semantic relevance, 30% keyword match

Python: Hybrid Search with Metadata

def hybrid_search(
    query_vector: list[float],
    query_text: str,
    source_filter: str | None = None,
    limit: int = 20
) -> list[dict]:
    """Hybrid search combining vector similarity, full-text, and metadata filtering."""
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            sql = """
                WITH semantic AS (
                    SELECT id, title, content, source, created_at,
                           1 - (embedding <=> %s::vector(1536)) AS vector_score
                    FROM documents
                    WHERE (%s IS NULL OR source = %s)
                    ORDER BY embedding <=> %s::vector(1536)
                    LIMIT 200
                ),
                keyword AS (
                    SELECT id, title, content, source, created_at,
                           ts_rank(to_tsvector('english', content),
                                   plainto_tsquery('english', %s)) AS text_score
                    FROM documents
                    WHERE to_tsvector('english', content) @@ plainto_tsquery('english', %s)
                      AND (%s IS NULL OR source = %s)
                    LIMIT 200
                )
                SELECT
                    COALESCE(s.id, k.id) AS id,
                    COALESCE(s.title, k.title) AS title,
                    COALESCE(s.content, k.content) AS content,
                    COALESCE(s.source, k.source) AS source,
                    0.6 * COALESCE(s.vector_score, 0) +
                    0.4 * COALESCE(k.text_score, 0) AS combined_score
                FROM semantic s
                FULL OUTER JOIN keyword k ON s.id = k.id
                ORDER BY combined_score DESC
                LIMIT %s
            """
            vec_json = json.dumps(query_vector)
            cur.execute(sql, (
                vec_json, source_filter, source_filter,
                vec_json,
                query_text, query_text, source_filter, source_filter,
                limit
            ))
            columns = [desc[0] for desc in cur.description]
            return [dict(zip(columns, row)) for row in cur.fetchall()]
    except psycopg2.Error as e:
        print(f"[ERROR] Hybrid search failed: {e}")
        return []
    finally:
        conn.close()

Performance Tuning

Production pgvector deployments require deliberate tuning beyond default PostgreSQL settings.

PostgreSQL Configuration

Adjust these parameters in postgresql.conf:

# Memory for query sorting and index building
work_mem = '256MB'           # Default: 4MB
maintenance_work_mem = '2GB' # Default: 64MB - affects index creation

# Shared buffers for caching
shared_buffers = '8GB'       # 25% of RAM for dedicated DB servers

# Effective cache size for query planning
effective_cache_size = '24GB' # 75% of RAM

# Parallel query workers - vector searches benefit from parallelism
max_parallel_workers_per_gather = 4
max_parallel_workers = 8
max_worker_processes = 16

Python: Measuring Query Performance

import json
import os
import time

import psycopg2

def benchmark_search(query_vector: list[float], iterations: int = 10) -> dict:
    """Benchmark vector search latency. Returns timing stats."""
    latencies = []
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            for _ in range(iterations):
                start = time.perf_counter()
                cur.execute(
                    """
                    SELECT id FROM documents
                    ORDER BY embedding <=> %s::vector(1536)
                    LIMIT 10
                    """,
                    (json.dumps(query_vector),)
                )
                cur.fetchall()
                latencies.append(time.perf_counter() - start)

        avg_ms = (sum(latencies) / len(latencies)) * 1000
        return {
            "avg_ms": round(avg_ms, 2),
            "min_ms": round(min(latencies) * 1000, 2),
            "max_ms": round(max(latencies) * 1000, 2),
            "iterations": iterations
        }
    except psycopg2.Error as e:
        print(f"[ERROR] Benchmark failed: {e}")
        return {}
    finally:
        conn.close()

# Usage
stats = benchmark_search([0.023, -0.045, 0.112])
# Example output: {"avg_ms": 12.34, "min_ms": 8.91, "max_ms": 18.76, "iterations": 10}
print(f"[INFO] Average query latency: {stats['avg_ms']}ms")

Index Maintenance

Rebuild indexes periodically to maintain performance after many inserts or updates:

REINDEX INDEX idx_documents_embedding_hnsw;
-- Acquires an exclusive lock; schedule during maintenance windows

For zero-downtime rebuilds, create a new index concurrently, then drop the old one:

CREATE INDEX CONCURRENTLY idx_documents_embedding_hnsw_new
    ON documents USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

DROP INDEX CONCURRENTLY idx_documents_embedding_hnsw;

ALTER INDEX idx_documents_embedding_hnsw_new
    RENAME TO idx_documents_embedding_hnsw;

Partitioning Large Datasets

For tables with hundreds of millions of vectors, partition by a natural attribute:

CREATE TABLE documents (
    id BIGSERIAL,
    title TEXT,
    content TEXT,
    embedding vector(1536),
    source TEXT,
    created_at TIMESTAMPTZ DEFAULT NOW()
) PARTITION BY LIST (source);

CREATE TABLE documents_blog PARTITION OF documents FOR VALUES IN ('blog');
CREATE TABLE documents_docs PARTITION OF documents FOR VALUES IN ('docs');
CREATE TABLE documents_wiki PARTITION OF documents FOR VALUES IN ('wiki');

-- Create indexes on each partition separately
CREATE INDEX idx_documents_blog_embedding
    ON documents_blog USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

CREATE INDEX idx_documents_docs_embedding
    ON documents_docs USING hnsw (embedding vector_cosine_ops)
    WITH (m = 16, ef_construction = 200);

Queries that filter on the partition key scan only the relevant partition, reducing the search space dramatically.

Production Considerations

Monitoring

Track these metrics for vector workloads:

-- Index size monitoring
SELECT
    indexname,
    pg_size_pretty(pg_relation_size(indexrelid)) AS index_size
FROM pg_stat_user_indexes
WHERE indexname LIKE '%embedding%';

-- Query latency for vector searches
SELECT
    query,
    calls,
    mean_exec_time::numeric(10,2) AS avg_ms,
    total_exec_time::numeric(10,2) AS total_ms
FROM pg_stat_statements
WHERE query LIKE '%<=>%'
ORDER BY mean_exec_time DESC;

Connection Pooling

Vector search queries benefit from connection pooling. Use PgBouncer or the application-level pooling in pgx and asyncpg:

# asyncpg connection pool
pool = await asyncpg.create_pool(
    host="localhost",
    database="vectordb",
    user="app_user",
    password=os.environ["DB_PASSWORD"],
    min_size=5,
    max_size=20
)

Backup Considerations

Standard PostgreSQL backup tools work with vector columns, but note that vector indexes are large:

# pg_dump with custom format for compression
$ pg_dump -Fc vectordb > vectordb_backup.dump

# Size estimate for vector-heavy tables
$ psql -d vectordb -c "
    SELECT pg_size_pretty(pg_total_relation_size('documents')) AS total_size,
           pg_size_pretty(pg_indexes_size('documents')) AS index_size;
"

For large vector datasets, physical backups (pg_basebackup, WAL archiving) are faster than logical dumps.

Building a RAG Pipeline with pgvector

A complete RAG pipeline stores document embeddings, searches them with user queries, and feeds results to an LLM:

import json
import os

import openai
import psycopg2

def rag_query(user_query: str, openai_client: openai.Client, limit: int = 5) -> str:
    """Retrieve relevant documents and generate an LLM response."""
    # Step 1: Generate query embedding
    response = openai_client.embeddings.create(
        model="text-embedding-3-small",
        input=user_query
    )
    query_vector = response.data[0].embedding

    # Step 2: Search for relevant documents
    conn = psycopg2.connect(
        host="localhost",
        dbname="vectordb",
        user="app_user",
        password=os.environ["DB_PASSWORD"]
    )
    try:
        with conn.cursor() as cur:
            cur.execute(
                """
                SELECT title, content,
                       1 - (embedding <=> %s::vector(1536)) AS similarity
                FROM documents
                ORDER BY embedding <=> %s::vector(1536)
                LIMIT %s
                """,
                (json.dumps(query_vector), json.dumps(query_vector), limit)
            )
            documents = cur.fetchall()
    except psycopg2.Error as e:
        print(f"[ERROR] RAG search failed: {e}")
        return "An error occurred during search."
    finally:
        conn.close()

    if not documents:
        return "No relevant documents found."

    # Step 3: Build context and generate response
    context = "\n\n".join([
        f"Title: {doc[0]}\nContent: {doc[1]}"
        for doc in documents
    ])

    completion = openai_client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "system",
                "content": "Answer the user's question using the provided context."
            },
            {
                "role": "user",
                "content": f"Context:\n{context}\n\nQuestion: {user_query}"
            }
        ]
    )
    return completion.choices[0].message.content

Comparison with Alternative Vector Databases

pgvector is not the only vector search option. Here is how it compares:

Factor pgvector Pinecone Weaviate Milvus
Database ops Existing PostgreSQL expertise New platform New platform New platform
Index types IVFFlat, HNSW HNSW (managed) HNSW IVF, HNSW, DiskANN
Hybrid search Native SQL Limited Native Limited
Transaction support ACID None Limited Limited
Scaling Read replicas, partitioning Auto-scaling Sharding Sharding
Operational cost Existing DB infrastructure Per-vector pricing Self-hosted or cloud Self-hosted or Zilliz

Choose pgvector when you want to minimize infrastructure complexity and already use PostgreSQL. Dedicated vector databases make sense for very large scale (billions of vectors) or specialized requirements like real-time streaming ingestion.

Common Pitfalls and How to Avoid Them

Wrong operator class: Using vector_cosine_ops with Euclidean queries produces correct results but misses the index. Match the operator class to your query operator.

Missing index rebuild: After bulk inserting millions of vectors, IVFFlat indexes degrade. Rebuild or use lists proportional to the final dataset size.

Ignoring work_mem: The default 4MB causes disk spilling for large vector sorts. Increase work_mem to at least 256MB for production workloads.

Wrong distance metric: Using cosine distance on embeddings trained with inner product loss produces poor results. Verify your embedding model’s expected distance metric.

No metadata filtering: Without filters, vector search on a 10M-row table searches all vectors. Add WHERE clauses on frequently filtered columns to improve latency.

Conclusion

PostgreSQL with pgvector provides a capable vector search solution that leverages existing PostgreSQL infrastructure. The extension adds vector data types, similarity operators, and efficient indexes to the world’s most popular open-source database. For teams building AI applications with existing PostgreSQL deployments, or teams that prefer to minimize infrastructure complexity, pgvector offers an attractive path to vector search.

The key to successful pgvector deployment is understanding the trade-offs between index types, query parameters, and performance characteristics. IVFFlat indexes provide good performance for large datasets with moderate recall requirements. HNSW indexes provide the highest recall at the cost of larger indexes and slower builds. The choice depends on specific requirements for recall, latency, and resource usage.

Production deployments require attention to indexing strategies, query optimization, and operational concerns. The investment in proper configuration and tuning pays dividends in query performance and system reliability. For many AI applications, pgvector provides the vector capabilities needed without the complexity of separate vector database deployments.
