TL;DR
This article maps out a production-ready vector search pipeline using Redis (RediSearch / Redis Stack). It includes concrete, runnable examples (Python + Node), index creation and tuning for HNSW, a benchmark harness to measure recall vs latency, production scaling patterns, and an operational checklist.
1. Why Redis for vector search
Redis delivers low-latency KNN queries, co-located document storage (RedisJSON) and secondary indexes (RediSearch), making hybrid search (filters + semantic similarity) simple and fast. For many RAG and recommendation systems, this single-system approach reduces complexity compared to separate vector DBs plus a document store.
2. Pipeline overview
- Ingest documents (JSON) with metadata
- Compute or obtain embeddings from a provider
- Store embeddings (JSON path or separate binary key)
- Create RediSearch index with VECTOR (HNSW)
- Query with KNN and apply post-processing/reranking
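Concretely, the ingest step reduces to assembling one JSON payload per document. A minimal sketch of that shape (the field names and the doc: key prefix are assumptions that match the index schema used later in this article):

```python
import json

def build_doc(doc_id, title, body, embedding, category=None):
    """Assemble the JSON payload to store under doc:<id> via JSON.SET."""
    doc = {"title": title, "body": body, "embedding": embedding}
    if category is not None:
        doc["category"] = category
    return f"doc:{doc_id}", json.dumps(doc)
```

The returned key/payload pair can be written with `JSON.SET <key> $ <payload>`.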
3. Embedding provider choices
- Hosted APIs (e.g. OpenAI, Cohere, AWS Bedrock): easy integration, stable models, pay-per-call.
- Self-hosted (sentence-transformers, ggml-based models): lower marginal cost at scale, more ops overhead.
- Recommendation: start with a provider for development; add a local bulk-reembedding pipeline for refreshes.
Dimension advice: use your model's native output dimension (commonly 384, 768, 1024, or 1536). Verify the model's dimension before building the index; DIM must match exactly.
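A quick sanity check before indexing is cheap insurance; a stdlib-only sketch (EXPECTED_DIM is an assumption, set it to your model's actual output size). Note that with DISTANCE_METRIC COSINE, RediSearch handles normalization for you; normalizing yourself is mainly useful if you index with IP or want scores comparable across models:

```python
import math

EXPECTED_DIM = 1536  # assumption: replace with your model's actual dimension

def validate_and_normalize(vec):
    """Check dimension and L2-normalize the vector."""
    if len(vec) != EXPECTED_DIM:
        raise ValueError(f"expected {EXPECTED_DIM} dims, got {len(vec)}")
    norm = math.sqrt(sum(x * x for x in vec))
    if norm == 0.0:
        raise ValueError("zero vector cannot be normalized")
    return [x / norm for x in vec]
```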
4. Storage strategies (detailed)
A. RedisJSON (single object)
- Pros: one key per doc, easier queries, RediSearch can index JSON paths directly.
- Cons: updating vectors rewrites JSON; larger objects increase memory churn.
B. Separate vector key (HASH or STRING binary)
- Pros: cheap vector updates, less JSON churn.
- Cons: need to keep doc ↔ vector coherence, more complex indexing.
Hybrid: store metadata in JSON and keep vector bytes in a separate HSET or STRING; create a synthetic document index that references the vector path.
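One way to keep the doc and vector keys coherent, and collocated under Redis Cluster, is to embed the same hash tag in both key names. The naming scheme below is illustrative, not prescriptive:

```python
def doc_key(tenant, doc_id):
    # {tenant} is a cluster hash tag: both keys hash to the same slot
    return f"doc:{{{tenant}}}:{doc_id}"

def vec_key(tenant, doc_id):
    return f"vec:{{{tenant}}}:{doc_id}"
```

Write metadata with `JSON.SET doc_key(...)` and the raw vector bytes with `SET`/`HSET vec_key(...)`; deleting a document should always touch both keys in one MULTI/EXEC block.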
5. Creating a RediSearch index (recommended settings)
Example (JSON + HNSW vectors, 1536-dim):
FT.CREATE idx:docs ON JSON PREFIX 1 "doc:" SCHEMA \
  $.title AS title TEXT SORTABLE \
  $.body AS body TEXT \
  $.embedding AS embedding VECTOR HNSW 10 \
    TYPE FLOAT32 DIM 1536 DISTANCE_METRIC COSINE \
    M 16 EF_CONSTRUCTION 200
Notes:
- The number after HNSW (10 here) counts the attribute arguments that follow: five name/value pairs.
- Set DIM and TYPE to match your embedding bytes exactly; mismatches fail at index or query time.
- On Redis Cluster, ensure doc keys and their vectors are collocated (use hash tags, e.g. doc:{tenant}:...).
6. Serializing embeddings (Python and Node)
Python (numpy):
import numpy as np

def to_vec_bytes(embedding_list):
    # float32 little-endian bytes, matching TYPE FLOAT32 in the index
    return np.array(embedding_list, dtype=np.float32).tobytes()
Node (Float32Array -> Buffer):
function toVecBuffer(embedding) {
  // Float32Array yields the same little-endian float32 layout
  const floatArray = new Float32Array(embedding);
  return Buffer.from(floatArray.buffer);
}
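If numpy is unavailable, the standard-library struct module produces the same little-endian float32 bytes, and makes it easy to verify a round trip (an alternative sketch, not a replacement for the numpy version above):

```python
import struct

def to_vec_bytes_std(embedding):
    # '<' = little-endian, 'f' = float32; matches TYPE FLOAT32 in the schema
    return struct.pack(f'<{len(embedding)}f', *embedding)

def from_vec_bytes_std(raw):
    # 4 bytes per float32 component
    return list(struct.unpack(f'<{len(raw) // 4}f', raw))
```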
7. Query examples (Python runnable)
This Python example uses redis-py client to issue a raw FT.SEARCH with KNN. Adjust for your client library helpers when available.
import numpy as np
from redis import Redis
r = Redis()
# query vector (numpy float32)
q = np.random.rand(1536).astype('float32')
q_bytes = q.tobytes()
# raw FT.SEARCH with KNN (pass vector bytes via PARAMS)
query = "*=>[KNN 10 @embedding $BLOB AS score]"
res = r.execute_command(
    'FT.SEARCH', 'idx:docs', query,
    'PARAMS', '2', 'BLOB', q_bytes,
    'SORTBY', 'score',
    'RETURN', '3', 'title', 'body', 'score',
    'DIALECT', '2'
)
# parse `res` according to RediSearch result format
print(res)
Notes: the exact parameterization differs by client and RediSearch version; DIALECT 2 enables newer query features.
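Parsing the raw reply is mechanical: the first element is the total hit count, then document keys alternate with flat field/value lists. A sketch (assumes the default RESP2 reply shape, with bytes already decoded to str for readability):

```python
def parse_search_reply(reply):
    """Turn [count, key1, [f1, v1, ...], key2, [...], ...] into (count, [(key, fields_dict)])."""
    total = reply[0]
    docs = []
    for i in range(1, len(reply), 2):
        key = reply[i]
        fields = reply[i + 1]
        # flat [name, value, name, value, ...] list -> dict
        docs.append((key, dict(zip(fields[::2], fields[1::2]))))
    return total, docs
```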
8. Hybrid search example
Apply filters then KNN (e.g., category + semantic):
FT.SEARCH idx:docs "(@category:{finance})=>[KNN 10 @embedding $BLOB AS score]" PARAMS 2 BLOB <vec> RETURN 3 title body score DIALECT 2
Pattern: filter candidates first to reduce vector work and improve precision.
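A tiny helper to assemble the hybrid query string (a sketch; it does no escaping of tag values, so treat the defaults as assumptions matching the schema above):

```python
def hybrid_knn_query(filter_expr, k, vector_field='embedding',
                     blob_param='BLOB', score_alias='score'):
    # filter_expr like "@category:{finance}"; use "*" for no filter
    return f"({filter_expr})=>[KNN {k} @{vector_field} ${blob_param} AS {score_alias}]"
```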
9. Benchmark harness (Python)
A simple harness to measure latency and Recall@K vs brute-force:
import time
import numpy as np
from redis import Redis

r = Redis()

# Load a small corpus for the brute-force baseline.
# Assumes vectors are L2-normalized, so dot product == cosine similarity.
corpus = np.load('corpus.npy')    # shape (N, D)
queries = np.load('queries.npy')  # shape (Q, D)

def brute_topk(q, k=10):
    sims = corpus @ q
    idx = np.argpartition(-sims, k)[:k]
    return set(int(i) for i in idx)

def redis_query(q_bytes, k=10):
    # RETURN 0 returns only document keys; assumes keys are "doc:<corpus index>"
    res = r.execute_command(
        'FT.SEARCH', 'idx:docs',
        f'*=>[KNN {k} @embedding $BLOB AS score]',
        'PARAMS', '2', 'BLOB', q_bytes,
        'RETURN', '0', 'DIALECT', '2')
    # reply format: [count, key1, key2, ...]
    return set(int(key.decode().split(':')[1]) for key in res[1:])

k = 10
latencies, recalls = [], []
for q in queries:
    q_bytes = q.astype('float32').tobytes()
    start = time.perf_counter()
    ids = redis_query(q_bytes, k)
    latencies.append(time.perf_counter() - start)
    recalls.append(len(brute_topk(q, k) & ids) / k)
print(f'mean latency: {np.mean(latencies)*1000:.2f} ms, recall@{k}: {np.mean(recalls):.3f}')
Measure recall while varying EF_RUNTIME (query-time ef) to find the sweet spot.
10. HNSW tuning guidance
- M (16-64): more links per node => more memory, better recall.
- EF_CONSTRUCTION (100-500): higher => slower index builds, better index quality.
- EF_RUNTIME (per query): tune to your latency budget; higher => better recall, higher latency.
Practical flow: build at moderate EF_CONSTRUCTION, test EF_RUNTIME from 20..200, and measure the recall/latency curve.
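Given measured (ef, recall, p99 latency) points from the harness above, choosing the operating point is just a filter-and-max; the numbers below are hypothetical:

```python
def best_ef(curve, latency_budget_ms):
    """curve: list of (ef, recall, p99_ms) tuples. Pick the highest-recall
    point that fits the latency budget; prefer the smaller ef on ties."""
    feasible = [p for p in curve if p[2] <= latency_budget_ms]
    if not feasible:
        return None
    return max(feasible, key=lambda p: (p[1], -p[0]))
```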
11. Scaling & topology
- Per-tenant indexes or per-domain partitioning are simplest to scale horizontally.
- For multi-tenant systems, route queries to the node owning the tenant’s index.
- When cross-index search is required, run queries in parallel across shards and merge results at application level.
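The application-level merge in the last bullet is a k-way merge on distance (RediSearch returns cosine *distance*, so lower is closer). A sketch assuming each shard's result list is already sorted ascending:

```python
import heapq

def merge_shard_results(shard_results, k):
    """shard_results: list of per-shard [(key, distance), ...] lists,
    each sorted ascending by distance. Returns the global top-k."""
    merged = heapq.merge(*shard_results, key=lambda kv: kv[1])
    return list(merged)[:k]
```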
12. Operational checklist (expanded)
- Secure: ACLs, TLS, and network controls
- Backup: test restore of RDB/AOF and module data
- Observability: Prometheus exporter, RedisInsight for cluster visibility
- Reindex: scripted reindex with blue-green switch to avoid downtime
- Cost: monitor memory and consider Redis on Flash (RoF) or vector quantization when memory cost becomes limiting
13. FAQs and common pitfalls
- Q: Why are KNN queries slower during concurrent writes? A: HNSW index updates are heavier than point writes; prefer batched writes or maintain a write buffer and background merge.
- Q: What causes poor recall? A: Low EF_RUNTIME or low M; also vector normalization mismatches or poor-quality embeddings.
14. Next steps & repo idea
Provide a runnable sample repo with these components:
- ingestion worker (reads docs, computes embeddings, writes Redis)
- index builder/script
- query service (HTTP) exposing /search
- benchmark harness
This will be implemented as a companion GitHub repo linked from this article.
Resources
- RediSearch vector docs: https://redis.io/docs/stack/search/
- Redis Stack tutorials and community examples
Appendix: Common FT.SEARCH examples, parsing notes, and CLI tips added to repo.