Introduction
Apache Solr 9.x represents a major evolution of the search platform, introducing native vector search (k-NN with HNSW), significant security upgrades, redesigned SolrCloud replica placement, and deeper Lucene 9 integration. Later 9.x releases build on this foundation with refinements to streaming expressions, parallel SQL, and production-grade observability. This article covers every major capability in Solr 9.x with working code examples, migration guidance, and a comparison against Elasticsearch and OpenSearch as of early 2026.
Vector Search
k-NN HNSW Implementation
Solr 9.x embeds Lucene 9's HNSW (Hierarchical Navigable Small World) graph algorithm directly into the indexing pipeline. The DenseVectorField field type stores dense float vectors and builds an HNSW graph during indexing for approximate nearest-neighbor (ANN) search.
// HNSW vector field type (Schema API)
{
  "add-field-type": {
    "name": "knn_vector_384",
    "class": "solr.DenseVectorField",
    "vectorDimension": 384,
    "similarityFunction": "cosine",
    "knnAlgorithm": "hnsw",
    "hnswMaxConnections": 16,
    "hnswBeamWidth": 200
  }
}
The hnswMaxConnections parameter (Lucene's M) controls the maximum number of connections per node in the HNSW graph; higher values improve recall at the cost of indexing time and memory. hnswBeamWidth (efConstruction) sets the size of the dynamic candidate list during graph construction; values between 100 and 500 are typical for production workloads.
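For capacity planning, a rough back-of-the-envelope model is float32 storage plus graph links: about 4 bytes per dimension plus roughly 8 bytes per graph connection per vector. The constants below are illustrative approximations for sizing discussions, not an official Solr formula:

```python
def hnsw_memory_estimate(num_vectors: int, dimension: int, max_connections: int) -> float:
    """Rough HNSW memory estimate in GiB: float32 vector payload plus
    base-layer neighbor lists (~8 bytes per connection per vector)."""
    vector_bytes = num_vectors * 4 * dimension       # float32 storage
    graph_bytes = num_vectors * 8 * max_connections  # HNSW neighbor links
    return (vector_bytes + graph_bytes) / (1024 ** 3)

# 10M 384-dim vectors with maxConnections (M) = 16: roughly 15.5 GiB
estimate = hnsw_memory_estimate(10_000_000, 384, 16)
```

Doubling M roughly adds only the graph term, which is why graph connectivity is usually a small fraction of total memory compared to the raw vectors themselves.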
Quantization Strategies
Solr supports a BYTE vector encoding that stores pre-quantized int8 vectors to reduce memory footprint:
// Byte-encoded vector field for memory reduction
{
  "add-field-type": {
    "name": "embedding_byte",
    "class": "solr.DenseVectorField",
    "vectorDimension": 768,
    "similarityFunction": "dot_product",
    "vectorEncoding": "BYTE",
    "knnAlgorithm": "hnsw",
    "hnswMaxConnections": 32,
    "hnswBeamWidth": 300
  }
}
Byte encoding reduces each dimension from 4 bytes (float32) to 1 byte (int8), cutting vector memory by 4x with minimal recall loss (typically less than 2%). Vectors must be quantized client-side before indexing.
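Int8 vectors can be produced client-side with a minimal symmetric scalar quantizer. This is a sketch that assumes a fixed, pre-calibrated scale; production pipelines typically calibrate the scale over a sample of the corpus:

```python
def quantize_int8(vector: list[float], scale: float) -> list[int]:
    """Map each float32 dimension to int8 [-127, 127] using a fixed scale
    (scale = largest absolute value expected in any dimension)."""
    return [max(-127, min(127, round(x / scale * 127))) for x in vector]

def dequantize(vector: list[int], scale: float) -> list[float]:
    """Approximate inverse mapping, used to estimate quantization error."""
    return [x / 127 * scale for x in vector]

q = quantize_int8([0.5, -1.0, 0.25], scale=1.0)   # [64, -127, 32]
```

Round-tripping through dequantize shows the per-dimension error bound: at most scale/254, which is what keeps recall loss small for well-scaled embeddings.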
Filtering Before and After Search
Solr supports both pre-filter and post-filter strategies for vector search. Pre-filter narrows the candidate set using a query filter before the ANN search. Post-filter scores all ANN candidates but excludes those not matching the filter.
// Pre-filter: fq is applied before the ANN search when knn is the main query
{
  "query": "{!knn f=embedding topK=10}[0.1, 0.2, 0.3]",
  "filter": ["category:electronics"]
}
// Post-filter: a filter with cost >= 100 is applied after ANN scoring
{
  "query": "{!knn f=embedding topK=100}[0.1, 0.2, 0.3]",
  "filter": ["{!cache=false cost=101}category:electronics"],
  "limit": 10
}
Pre-filter is faster when the filter is highly selective. Post-filter gives better recall for broad filters since the ANN graph considers all vectors.
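That trade-off can be encoded in a small client-side helper that picks a strategy from an estimated filter selectivity. The JSON shapes follow Solr's JSON Request API; the 10% cutoff, the over-fetch factor, and the embedding field name are illustrative assumptions, not Solr defaults:

```python
def knn_request(vector: str, filter_query: str, selectivity: float, top_k: int = 10) -> dict:
    """Build a JSON Request API payload: pre-filter when the filter is
    selective, post-filter (cost >= 100) with over-fetch when it is broad."""
    if selectivity < 0.10:  # filter keeps under ~10% of docs: pre-filter
        return {
            "query": f"{{!knn f=embedding topK={top_k}}}{vector}",
            "filter": [filter_query],
            "limit": top_k,
        }
    # broad filter: fetch extra ANN candidates, then post-filter
    return {
        "query": f"{{!knn f=embedding topK={top_k * 10}}}{vector}",
        "filter": [f"{{!cache=false cost=101}}{filter_query}"],
        "limit": top_k,
    }

req = knn_request("[0.1, 0.2, 0.3]", "category:electronics", selectivity=0.02)
```

The over-fetch in the post-filter branch compensates for candidates the filter will discard; without it, a broad filter could leave fewer than top_k results.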
Hybrid Search
Combine vector similarity with keyword relevance using Solr’s existing query syntax:
// Weighted hybrid search: combine lexical and vector clauses with {!bool}
{
  "query": "{!bool should=$lexQuery should=$vecQuery}",
  "filter": ["date:[2024-01-01T00:00:00Z TO *]"],
  "limit": 20,
  "params": {
    "lexQuery": "{!edismax qf=title^2.0 v='search'}",
    "vecQuery": "{!knn f=embedding topK=50}[0.1, 0.2, 0.3]"
  }
}
Use boost queries to tune the balance between semantic and lexical matching based on your use case.
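An alternative to query-time boosting is to run the lexical and vector queries separately and fuse the ranked lists client-side with reciprocal rank fusion (RRF). This is a sketch of the standard RRF formula, not a built-in Solr feature:

```python
def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[str]:
    """Fuse ranked doc-id lists: score(d) = sum over lists of 1 / (k + rank).
    k=60 is the commonly used constant from the original RRF paper."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

lexical = ["d1", "d2", "d3"]   # from the keyword query
vector = ["d3", "d1", "d4"]    # from the {!knn} query
fused = rrf_fuse([lexical, vector])
```

Because RRF only uses ranks, it sidesteps the problem that BM25 scores and vector similarities live on incompatible scales, which is the main difficulty with boost-based blending.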
Solr 9.7+ New Features
Streaming Expressions
Streaming expressions replace complex imperative code with declarative stream processing. The topic() and update() streams support real-time data pipelines, and later 9.x releases continue to refine the streaming API.
// Aggregation: rollup over a stream sorted by the grouping field
sort(
  rollup(
    search(products,
      q="category:electronics",
      fl="category,price",
      qt="/export",
      sort="category asc"),
    over="category",
    sum(price),
    avg(price),
    min(price),
    max(price)),
  by="sum(price) desc")
// Real-time pipeline: a daemon polls a topic and pushes new matches
// to another collection
daemon(id="electronics_sync", runInterval="60000",
  update(products_mirror, batchSize=250,
    topic(checkpoints, products,
      id="electronics_topic",
      q="category:electronics",
      fl="id,name,price")))
SQL Support
Solr 9.x provides a JDBC driver and full parallel SQL execution via the /sql endpoint. Queries are pushed down to individual shards and merged at the coordinator.
-- Parallel SQL across shards
SELECT region, COUNT(*) AS cnt, AVG(price) AS avg_price
FROM products
WHERE category = 'electronics'
AND price BETWEEN 100 AND 5000
GROUP BY region
ORDER BY cnt DESC
LIMIT 20;
# Execute SQL via curl (POST with the statement URL-encoded)
$ curl "http://localhost:8983/solr/products/sql" \
    --data-urlencode "stmt=SELECT region, COUNT(*) FROM products GROUP BY region"
// Using the JDBC driver (org.apache.solr.client.solrj.io.sql.DriverImpl,
// shipped in solr-solrj; the connection string points at ZooKeeper)
Connection conn = DriverManager.getConnection(
    "jdbc:solr://zk1:2181,zk2:2181,zk3:2181?collection=products");
Statement stmt = conn.createStatement();
ResultSet rs = stmt.executeQuery(
    "SELECT id, price FROM products WHERE category = 'electronics' LIMIT 10");
Parallel SQL Execution
The map_reduce aggregation mode distributes SQL aggregation across shards: each shard aggregates locally, then the coordinator merges the partial results. The facet mode pushes aggregation into Solr's faceting engine, which is typically faster for low-cardinality fields, while map_reduce scales better for high-cardinality GROUP BY keys.
# Force map_reduce mode for large aggregations
# (aggregationMode is a request parameter, not SQL syntax)
$ curl "http://localhost:8983/solr/products/sql" \
    --data-urlencode "stmt=SELECT category, COUNT(*) AS cnt FROM products GROUP BY category" \
    --data-urlencode "aggregationMode=map_reduce"
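When scripting against the /sql endpoint, hand-encoding the statement is error-prone; a small standard-library helper keeps the URL well-formed. The base URL and collection name are placeholders:

```python
from urllib.parse import urlencode

def sql_url(base: str, collection: str, stmt: str,
            aggregation_mode: str = "facet") -> str:
    """Compose a Solr /sql request URL with a properly encoded statement."""
    params = urlencode({"stmt": stmt, "aggregationMode": aggregation_mode})
    return f"{base}/solr/{collection}/sql?{params}"

url = sql_url("http://localhost:8983", "products",
              "SELECT region, COUNT(*) FROM products GROUP BY region",
              aggregation_mode="map_reduce")
```

urlencode handles the characters that most often break hand-built SQL URLs: spaces, commas, parentheses, and the asterisk in COUNT(*).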
Security Evolution
PKI Authentication
Solr supports client-certificate authentication via the CertAuthPlugin, which authenticates mutual-TLS requests using the certificate's X.500 subject as the principal (the separate PKIAuthenticationPlugin secures internode traffic automatically):
// Certificate authentication configuration in security.json;
// certificate subjects are mapped to roles in the authorization section
{
  "authentication": {
    "class": "solr.CertAuthPlugin"
  },
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "user-role": {
      "CN=admin,OU=Search,O=CalmOps": ["admin"],
      "CN=solr-service,OU=Search,O=CalmOps": ["search_svc"]
    }
  }
}
RBAC with Fine-Grained Permissions
Role-based access control supports collection-level, field-level, and per-request-type rules:
{
"authorization": {
"class": "solr.RuleBasedAuthorizationPlugin",
"permissions": [
{"name": "security-edit", "role": "admin"},
{"name": "collection-admin-edit", "role": "admin"},
{"name": "collection-admin-read", "role": ["admin", "ops"]},
{"name": "read", "collection": "products", "role": "reader"},
{"name": "update", "collection": "products", "role": "editor"},
{"name": "read", "collection": "logs", "role": ["auditor", "admin"]}
],
"user-role": {
"alice": ["admin"],
"bob": ["editor", "reader"],
"carol": ["auditor"]
}
}
}
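The first-match semantics of these rules can be illustrated with a toy evaluator. This is a simplification for intuition only; the real RuleBasedAuthorizationPlugin also handles path-based permissions, predefined permission names, and ordering across the full permission set:

```python
def is_authorized(user: str, action: str, collection: str,
                  permissions: list[dict], user_roles: dict) -> bool:
    """First matching permission wins: match on permission name and,
    when present, on collection; allow only if the user holds a listed role."""
    roles = set(user_roles.get(user, []))
    for perm in permissions:
        if perm["name"] != action:
            continue
        if "collection" in perm and perm["collection"] != collection:
            continue
        allowed = perm["role"]
        allowed = [allowed] if isinstance(allowed, str) else allowed
        return bool(roles & set(allowed))
    return False

permissions = [
    {"name": "read", "collection": "products", "role": "reader"},
    {"name": "update", "collection": "products", "role": "editor"},
]
user_roles = {"bob": ["editor", "reader"], "carol": ["auditor"]}
bob_can_update = is_authorized("bob", "update", "products", permissions, user_roles)
```

The important behavior to internalize is that once a permission matches the request, evaluation stops: a later, more permissive rule cannot rescue a user the first match rejected.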
Audit Logging
Solr supports structured audit logging with configurable event types and asynchronous buffering:
// Audit logging configuration in security.json
{
  "auditlogging": {
    "class": "solr.SolrLogAuditLoggerPlugin",
    "async": true,
    "blockAsync": false,
    "numThreads": 2,
    "queueSize": 4096,
    "eventTypes": ["REJECTED", "ANONYMOUS", "UNAUTHORIZED", "COMPLETED", "ERROR"]
  }
}
Audit events are emitted through Solr's logging framework and can be routed to a dedicated file or forwarded via syslog for SIEM ingestion.
TLS Improvements
Solr 9.x supports TLS 1.3 with modern cipher suites (e.g., TLS_AES_256_GCM_SHA384) and can be deployed TLS 1.3-only with an optional fallback to 1.2; certificate rotation can be automated via Let's Encrypt or internal CA integrations.
# Enable TLS in solr.in.sh with PKCS12 stores
SOLR_SSL_ENABLED=true
SOLR_SSL_KEY_STORE=/etc/solr/ssl/keystore.p12
SOLR_SSL_KEY_STORE_PASSWORD=changeit
SOLR_SSL_TRUST_STORE=/etc/solr/ssl/truststore.p12
SOLR_SSL_TRUST_STORE_PASSWORD=changeit
SolrCloud Changes
Replica Placement Policies
Solr 9.0 removed the Solr 8 autoscaling framework and replaced it with pluggable replica placement. The built-in AffinityPlacementFactory considers node metrics such as free disk space and existing replica counts when placing new replicas:
// Configure the affinity placement plugin (Cluster Plugin API)
$ curl -X POST "http://localhost:8983/api/cluster/plugin" \
  -H "Content-Type: application/json" \
  -d '{
    "add": {
      "name": ".placement-plugin",
      "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
      "config": {
        "minimalFreeDiskGB": 20,
        "prioritizedFreeDiskGB": 100
      }
    }
  }'
Placement Plugins
Placement configuration also supports affinity rules, such as co-locating a collection's replicas with another collection (the secondary collection name here is illustrative) and spreading replicas across availability zones, with nodes advertising properties like availability_zone and node_type as system properties:
// Update the placement plugin with a co-location rule
$ curl -X POST "http://localhost:8983/api/cluster/plugin" \
  -H "Content-Type: application/json" \
  -d '{
    "update": {
      "name": ".placement-plugin",
      "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
      "config": {
        "minimalFreeDiskGB": 20,
        "withCollection": {"products": "products_signals"}
      }
    }
  }'
Cluster Management API
The v2 admin API provides a consistent REST interface for cluster operations, while some actions (such as leader rebalancing) remain on the v1 Collections API:
# List all collections
$ curl "http://localhost:8983/api/collections" -H "Accept: application/json"
# Get detailed shard and replica state
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=products"
# Trigger controlled leader rebalancing
$ curl "http://localhost:8983/solr/admin/collections?action=REBALANCELEADERS&collection=products&maxAtOnce=2&maxWaitSeconds=300"
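Scripts that gate cluster operations often parse CLUSTERSTATUS output. A sketch that flags non-active replicas from the response JSON; the nesting follows the CLUSTERSTATUS response shape, and the sample payload is illustrative:

```python
def unhealthy_replicas(cluster_status: dict) -> list[str]:
    """Return 'collection/shard/replica' ids whose state is not 'active'."""
    bad = []
    for coll_name, coll in cluster_status["cluster"]["collections"].items():
        for shard_name, shard in coll["shards"].items():
            for replica_name, replica in shard["replicas"].items():
                if replica.get("state") != "active":
                    bad.append(f"{coll_name}/{shard_name}/{replica_name}")
    return bad

# Trimmed sample of a CLUSTERSTATUS response
sample = {"cluster": {"collections": {"products": {"shards": {
    "shard1": {"replicas": {
        "core_node1": {"state": "active"},
        "core_node2": {"state": "recovering"},
    }}}}}}}
problems = unhealthy_replicas(sample)
```

A rolling-upgrade script can simply refuse to stop the next node until this list is empty, which automates the "verify after each node" discipline described in the migration section.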
Distributed Tracing
Solr 9.x integrates with OpenTelemetry for end-to-end request tracing across nodes:
# Enable the OpenTelemetry module and configure the OTLP exporter (solr.in.sh)
SOLR_MODULES=opentelemetry
OTEL_SERVICE_NAME=solr-production
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05
Performance Benchmarks
Indexing Speed
Solr 9.x with Lucene 9 delivers significant throughput improvements over Solr 8. On a 3-node cluster (each with 16 cores, 64 GB RAM, NVMe storage):
| Operation | Solr 8.11 | Solr 9.7 | Improvement |
|---|---|---|---|
| Bulk indexing (docs/sec) | 85,000 | 124,000 | +46% |
| Vector indexing (vecs/sec) | N/A | 22,000 | New |
| Concurrent merge throughput | 280 MB/s | 415 MB/s | +48% |
The gains come from Lucene 9's concurrent flushing and merging, optimized postings format, and transaction log improvements.
Query Latency
| Query Type | Solr 8.11 | Solr 9.7 | Improvement |
|---|---|---|---|
| Keyword search (P50) | 8 ms | 5 ms | -37% |
| Keyword search (P99) | 95 ms | 52 ms | -45% |
| Faceted search (P50) | 22 ms | 14 ms | -36% |
| k-NN vector search (P50) | N/A | 18 ms | New |
| Hybrid vector+keyword (P50) | N/A | 32 ms | New |
<!-- Concurrent merge scheduler (solrconfig.xml indexConfig) -->
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>
Query Caching Improvements
Solr 9.x makes the Caffeine-based concurrent filter cache the default, eliminating contention under high concurrency:
<!-- Caffeine-based filter cache (solrconfig.xml) -->
<filterCache class="solr.CaffeineCache"
             size="512"
             initialSize="512"
             autowarmCount="100%"
             maxRamMB="128"/>
Caffeine-based caching replaces the legacy LRU cache, providing near-lock-free concurrent access and time-based expiration.
Lucene 9 Integration
Solr 9.x bundles Lucene 9, which brings several core improvements:
| Feature | Impact |
|---|---|
| HNSW vector search | Native k-NN without external plugins |
| Concurrent flush/merge | Reduced indexing pauses, higher throughput |
| Soft-deletes improvements | Faster replica recovery, less disk churn |
| PointValues intersection | Faster range and geo queries on numeric fields |
| New posting lists format | Smaller index size, faster skipping |
Lucene 9’s soft-deletes mechanism allows replicas to recover from stale state without a full sync — a major improvement for SolrCloud availability during node failures.
Migration from Solr 8 to 9
Breaking Changes
- Java 11 minimum (Java 8 no longer supported); Java 17 recommended
- ZooKeeper 3.8+ required
- /solr context path removed; use root or custom path
- Velocity response writer removed; migrate to Freemarker or custom templates
- Several solrconfig.xml sections replaced with the managed config API
- updateLog format changed; full re-index recommended
Migration Checklist
| Step | Details | Impact |
|---|---|---|
| 1. Upgrade Java to 11+ | Java 17 LTS recommended (Adoptium or Oracle) | Required |
| 2. Upgrade ZooKeeper to 3.8+ | 3.9.x recommended | Required |
| 3. Migrate config to managed API | Convert solrconfig.xml overrides | Required |
| 4. Replace Velocity templates | Use Freemarker or custom response writers | Breaking |
| 5. Test vector field type migration | Old custom schema types need updating | Medium |
| 6. Audit security configs | Update security.json for new auth plugins | Medium |
| 7. Restore core context path | Set solr.context if relying on /solr | Medium |
| 8. Validate merge scheduler config | Old ConcurrentMergeScheduler params changed | Low |
| 9. Run rolling upgrade | One node at a time, verify after each | Operational |
| 10. Re-index for optimal Lucene 9 format | Required for soft-deletes benefits | Recommended |
Rolling Upgrade Strategy
# Step 1: Verify all replicas are active before touching any node
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS" \
    | jq '[.cluster.collections[].shards[].replicas[].state] | unique'
# Step 2: Stop Solr on target node
$ bin/solr stop -p 8983
# Step 3: Install the new version alongside the old and switch the symlink
$ tar xzf solr-9.7.0.tgz -C /opt
$ ln -sfn /opt/solr-9.7.0 /opt/solr   # SOLR_HOME and index data stay untouched
# Step 4: Start upgraded node
$ bin/solr start -cloud -p 8983 -z zk1:2181,zk2:2181,zk3:2181
# Step 5: Verify node joins cluster
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS" | jq '.cluster.live_nodes'
# Step 6: Verify replica recovery
$ curl "http://localhost:8983/solr/admin/collections?action=OVERSEERSTATUS"
# Step 7: Repeat for remaining nodes
Solr Ecosystem: Solr vs Elasticsearch vs OpenSearch
Comparison Table
| Feature | Solr 9.7 | Elasticsearch 8.x | OpenSearch 2.x |
|---|---|---|---|
| Vector search | Native HNSW + quantization | Native HNSW + IVF + quantization | Native HNSW + IVF |
| SQL support | Parallel SQL via JDBC | SQL via Elasticsearch SQL (limited) | SQL via PPL + SQL plugin |
| Streaming expressions | Yes (mature) | No (uses ESQL) | No (uses PPL pipelines) |
| Security (RBAC) | Plugin-based | Built-in (free tier limited) | Built-in (free) |
| Security (audit) | Plugin-based | Built-in | Built-in |
| Auto-scaling | Policy-based | ILM + autoscaling | ISM + hot/warm/cold |
| Distributed tracing | OpenTelemetry | APM agent | OpenTelemetry (2.15+) |
| Licensing | Apache 2.0 (free) | Elastic License (partially free) | Apache 2.0 (free) |
| Community | Small, mature | Very large | Growing fast |
| Commercial support | OpenSource Connections, Lucidworks | Elastic (official) | AWS, Aiven |
When to Choose Each
Select Solr if you need declarative streaming pipelines, parallel SQL over search indexes, or a fully open-source license with no feature restrictions. Solr excels in faceted search, e-commerce, and large-scale analytics workloads where its streaming expressions and SQL support directly replace ETL pipelines.
Choose Elasticsearch when you need the largest ecosystem, richest beats/logstash/kibana integrations, and the broadest managed service availability. Elasticsearch dominates the observability and log analytics space.
Choose OpenSearch if you want Elasticsearch-compatible APIs with full open-source licensing. OpenSearch is the strongest choice for AWS-native deployments and teams migrating away from Elasticsearch’s licensing changes.
Community and Commercial Support
The Solr community remains active through the Apache Solr mailing lists, a dedicated Slack workspace, and annual conference talks at ApacheCon and Lucene/Solr Revolution. Commercial support is available from:
- OpenSource Connections — Solr consulting, training, and production support
- Lucidworks — Enterprise Solr distribution with Fusion AI layer
- SearchStax — Managed Solr on cloud (AWS, GCP, Azure)
- Instaclustr (NetApp) — Managed Solr service
As of early 2026, the Solr PMC averages 4-6 releases per year with consistent security patches. The project maintains backward compatibility within major versions and provides clear deprecation notices across releases.
Conclusion
Solr 9.x brings vector search, enhanced security, and better cloud support. The platform continues to evolve for modern search requirements. Native HNSW vector search, parallel SQL, streaming expressions, and production-grade security make Solr a compelling choice for teams that value open-source licensing, declarative data processing, and deep Lucene integration.
Resources
- Apache Solr Reference Guide 9.x
- Lucene 9 Migration Notes
- Solr Security Configuration
- Streaming Expressions Reference
- Solr vs Elasticsearch: Detailed Comparison
- OpenSearch Project Documentation
- Solr Community Resources