
Solr 9.x: New Features and Evolution

Created: March 5, 2026 CalmOps 10 min read

Introduction

Apache Solr 9.x represents a major evolution of the search platform, introducing native vector search (k-NN with HNSW), significant security upgrades, reworked SolrCloud replica placement, and deeper Lucene 9 integration. Solr 9.7+ builds on this foundation with streaming expressions, parallel SQL, and production-grade observability. This article covers the major capabilities in Solr 9.x with code examples, migration guidance, and a comparison against Elasticsearch and OpenSearch as of early 2026.


k-NN HNSW Implementation

Solr 9.x exposes Lucene 9’s HNSW (Hierarchical Navigable Small World) graph algorithm through the solr.DenseVectorField field type, which stores dense float vectors and builds an HNSW graph during indexing for approximate nearest neighbor (ANN) search.

// HNSW vector field
{
  "add-field-type": {
    "name": "embedding_384",
    "class": "solr.DenseVectorField",
    "vectorDimension": 384,
    "similarityFunction": "cosine",
    "knnAlgorithm": "hnsw",
    "hnswMaxConnections": 16,
    "hnswBeamWidth": 200
  }
}

The hnswMaxConnections parameter (commonly called M) controls the maximum number of connections per node in the HNSW graph; higher values improve recall at the cost of indexing time and memory. hnswBeamWidth (commonly called efConstruction) sets the size of the candidate list during graph construction; values between 100 and 500 are typical for production workloads.

Quantization Strategies

Solr 9.5+ can store vectors with one byte per dimension to reduce memory footprint. In the solr.DenseVectorField field type this is selected with the vectorEncoding attribute, and the client quantizes each vector to signed 8-bit integers before indexing:

// Byte-encoded vector field for memory reduction
{
  "add-field-type": {
    "name": "embedding_quantized",
    "class": "solr.DenseVectorField",
    "vectorDimension": 768,
    "similarityFunction": "dot_product",
    "vectorEncoding": "BYTE",
    "knnAlgorithm": "hnsw",
    "hnswMaxConnections": 32,
    "hnswBeamWidth": 300
  }
}

Scalar quantization reduces each dimension from 4 bytes (float32) to 1 byte (int8), cutting vector memory by 4x with minimal recall loss (typically less than 2%).
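The 4x figure follows directly from the per-dimension storage cost. The sketch below works through that arithmetic and shows one simple way a client could map floats to 8-bit integers. The symmetric scaling scheme and the function names are illustrative assumptions, not the quantizer Solr or Lucene uses internally.

```python
def quantize_int8(vector, max_abs=None):
    """Map floats to signed 8-bit integers in [-127, 127] by symmetric scaling.

    Assumption: a single scale factor per vector, derived from its largest
    absolute component. Production quantizers usually calibrate more carefully.
    """
    if max_abs is None:
        max_abs = max(abs(x) for x in vector) or 1.0  # avoid dividing by zero
    scale = 127.0 / max_abs
    return [max(-127, min(127, round(x * scale))) for x in vector]


def vector_bytes(num_vectors, dimension, bytes_per_dim):
    """Raw vector storage, ignoring the HNSW graph and other index overhead."""
    return num_vectors * dimension * bytes_per_dim


float_bytes = vector_bytes(1_000_000, 768, 4)  # float32
int8_bytes = vector_bytes(1_000_000, 768, 1)   # int8
reduction = float_bytes // int8_bytes

print(f"float32: {float_bytes / 1e9:.2f} GB, int8: {int8_bytes / 1e9:.2f} GB")
print(f"reduction factor: {reduction}x")
```

The estimate covers only the raw vectors; the HNSW graph links add memory on top of this in both cases.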

Solr supports both pre-filter and post-filter strategies for vector search. Pre-filter narrows the candidate set using a filter before the ANN search, so up to topK matching documents are returned. Post-filter runs the ANN search over all vectors and then excludes candidates that do not match the filter, so fewer than topK documents may remain.

// Pre-filter: narrow candidates before ANN search
{
  "query": "{!knn f=embedding topK=10 preFilter=category:electronics}[0.1, 0.2, 0.3]"
}

// Post-filter: run ANN without pre-filtering, then apply the filter
{
  "query": "{!knn f=embedding topK=100 preFilter=''}[0.1, 0.2, 0.3]",
  "filter": "category:electronics",
  "limit": 10
}

Pre-filter works well when the filter is highly selective, because the search only considers matching documents. Post-filter suits broad filters, where most ANN candidates survive the filter anyway.
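The difference between the two strategies is easiest to see on a tiny dataset. The following sketch uses exact brute-force cosine search in place of an HNSW graph, which is a simplifying assumption, and the documents and helper names are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(docs, query, k):
    """Exact nearest neighbors (stands in for the approximate HNSW search)."""
    return sorted(docs, key=lambda d: cosine(d["vec"], query), reverse=True)[:k]


def pre_filter_search(docs, query, k, predicate):
    # Filter first, then search only the matching documents.
    return top_k([d for d in docs if predicate(d)], query, k)


def post_filter_search(docs, query, k, predicate):
    # Search everything, then drop candidates that fail the filter.
    return [d for d in top_k(docs, query, k) if predicate(d)]


DOCS = [
    {"id": "a", "category": "electronics", "vec": [1.0, 0.0]},
    {"id": "b", "category": "books", "vec": [0.9, 0.1]},
    {"id": "c", "category": "books", "vec": [0.8, 0.2]},
    {"id": "d", "category": "electronics", "vec": [0.0, 1.0]},
]


def is_electronics(doc):
    return doc["category"] == "electronics"


query = [1.0, 0.0]
pre = [d["id"] for d in pre_filter_search(DOCS, query, 2, is_electronics)]
post = [d["id"] for d in post_filter_search(DOCS, query, 2, is_electronics)]
print("pre-filter:", pre)
print("post-filter:", post)
```

With k=2, the pre-filter variant returns two electronics documents, while the post-filter variant returns only one because a non-matching document occupied a slot in the top two. This is why post-filter requests typically use a larger topK than the number of rows needed.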

Combine vector similarity with keyword relevance by nesting both queries as optional clauses of a boolean query:

// Weighted hybrid search
{
  "query": "{!bool should=$lexical should=$vector}",
  "params": {
    "lexical": "title:search^2.0",
    "vector": "{!knn f=embedding topK=50}[0.1, 0.2, 0.3]"
  },
  "filter": "date:[2024-01-01T00:00:00Z TO *]",
  "sort": "score desc",
  "limit": 20
}

Adjust the boost on the lexical clause to tune the balance between semantic and lexical matching for your use case.
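Real embeddings have hundreds of dimensions, so hybrid requests are normally assembled in code and not written by hand. Below is a minimal sketch of such a builder. It assumes the knn query parser form `{!knn f=<field> topK=<n>}[...]` and the `bool` parser with `should` clauses; `knn_query` and `hybrid_request` are hypothetical helper names.

```python
import json


def knn_query(field, vector, top_k):
    """Render a vector as a Solr knn query string."""
    values = ", ".join(repr(float(v)) for v in vector)
    return "{!knn f=%s topK=%d}[%s]" % (field, top_k, values)


def hybrid_request(lexical, field, vector, top_k, rows):
    """Build a JSON Request API body that ORs a lexical and a vector clause."""
    return {
        "query": "{!bool should=$lexical should=$vector}",
        "params": {
            "lexical": lexical,
            "vector": knn_query(field, vector, top_k),
        },
        "limit": rows,
    }


body = hybrid_request("title:search^2.0", "embedding", [0.1, 0.2, 0.3], 50, 20)
print(json.dumps(body, indent=2))
```

The resulting dictionary can be sent as the JSON body of a POST to a collection's query endpoint with any HTTP client.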


Solr 9.7+ New Features

Streaming Expressions

Streaming expressions replace complex imperative code with declarative stream processing. Stream sources and decorators such as search(), rollup(), topic(), and update() compose as nested function calls to build near-real-time data pipelines.

// Streaming aggregation (rollup requires its input sorted by the "over" field)
sort(
  rollup(
    search(products,
      q="category:electronics",
      fl="category,price",
      sort="category asc",
      qt="/export"
    ),
    over="category",
    sum(price),
    avg(price),
    min(price),
    max(price)
  ),
  by="sum(price) desc"
)

// Window functions with streaming (conceptual sketch: window(), lag(), and
// lead() are not documented stream functions; verify before relying on them)
window(
  search(sales,
    q="*:*",
    fl="id,amount,region,date",
    sort="region asc, date asc",
    rows="5000"
  ),
  over="region",
  sort="date asc",
  "avg(amount) as moving_avg",
  "lag(amount) as prev_sale",
  "lead(amount) as next_sale"
)
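To make the rollup semantics concrete, here is a small Python model of the same idea: a stream sorted by the grouping field is aggregated one group at a time, then the groups are re-sorted by a metric. This is an illustrative analogy with made-up data, not Solr's implementation.

```python
from itertools import groupby
from operator import itemgetter


def rollup(tuples, over, field):
    """Aggregate a stream that is already sorted by `over`, one group at a time."""
    for key, group in groupby(tuples, key=itemgetter(over)):
        values = [t[field] for t in group]
        yield {
            over: key,
            "sum": sum(values),
            "avg": sum(values) / len(values),
            "min": min(values),
            "max": max(values),
        }


# Input must arrive sorted by category, as in the search(...) sort parameter.
stream = [
    {"category": "audio", "price": 50.0},
    {"category": "audio", "price": 150.0},
    {"category": "video", "price": 300.0},
]

result = sorted(rollup(stream, "category", "price"), key=itemgetter("sum"), reverse=True)
for row in result:
    print(row)
```

Because each group is emitted as soon as the key changes, the aggregation needs memory for only one group at a time, which is why the sorted input matters.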

SQL Support

Solr 9.x provides a JDBC driver and full parallel SQL execution via the /sql endpoint. Queries are pushed down to individual shards and merged at the coordinator.

-- Parallel SQL across shards
SELECT region, COUNT(*) AS cnt, AVG(price) AS avg_price
FROM products
WHERE category = 'electronics'
  AND price BETWEEN 100 AND 5000
GROUP BY region
ORDER BY cnt DESC
LIMIT 20;

# Execute SQL via curl (the statement is URL-encoded by curl)
$ curl --data-urlencode "stmt=SELECT region, COUNT(*) FROM products GROUP BY region" \
  "http://localhost:8983/solr/products/sql?aggregationMode=map_reduce"

# Using the JDBC driver (the URL points at ZooKeeper; MyApp is a placeholder
# application, and the classpath also needs SolrJ's dependencies)
$ java -cp "solr-solrj-9.7.0.jar:." \
  -Djdbc.url="jdbc:solr://localhost:9983?collection=products" \
  -Djdbc.query="SELECT id, score FROM products WHERE q='laptop' LIMIT 10" \
  MyApp

Parallel SQL Execution

The map_reduce aggregation mode sorts tuples on the shards and streams them to worker nodes, which perform the aggregation; it is the mode that copes with high-cardinality GROUP BY fields. The facet mode pushes the aggregation down to Solr’s faceting engine and merges the per-shard results, which is faster when the cardinality of the grouped fields is low to moderate.

-- Large aggregation: select the mode with the aggregationMode=map_reduce
-- request parameter (or JDBC connection property), not in the SQL text
SELECT category, COUNT(*) AS cnt
FROM products
GROUP BY category;
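The general pattern behind distributed aggregation is to compute partial results close to the data and merge them centrally. The sketch below models that for COUNT and AVG over two shards. It is a conceptual illustration with invented data and function names, not a description of Solr's internal execution plan.

```python
from collections import defaultdict


def shard_aggregate(rows):
    """Phase 1: each shard produces partial counts and sums per region."""
    partial = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for row in rows:
        bucket = partial[row["region"]]
        bucket["count"] += 1
        bucket["sum"] += row["price"]
    return dict(partial)


def merge(partials):
    """Phase 2: the coordinator merges partials and derives the averages."""
    merged = defaultdict(lambda: {"count": 0, "sum": 0.0})
    for partial in partials:
        for region, bucket in partial.items():
            merged[region]["count"] += bucket["count"]
            merged[region]["sum"] += bucket["sum"]
    return {
        region: {"cnt": m["count"], "avg_price": m["sum"] / m["count"]}
        for region, m in merged.items()
    }


shard1 = [
    {"region": "EU", "price": 100.0},
    {"region": "EU", "price": 300.0},
    {"region": "US", "price": 50.0},
]
shard2 = [{"region": "EU", "price": 200.0}]

totals = merge([shard_aggregate(shard1), shard_aggregate(shard2)])
print(totals)
```

Shards return sums and counts and never averages: averaging the two EU shard averages (200 and 200 here) happens to work, but in general an average of averages is wrong when shards hold different numbers of rows.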

Security Evolution

PKI Authentication

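Solr 9.x supports X.509 client certificates for mutual TLS through the certificate authentication plugin (CertAuthPlugin), which relies on Jetty being configured to require client certificates. The separate PKIAuthenticationPlugin secures inter-node requests and is enabled automatically, so it is not configured in security.json:

// Certificate authentication configuration in security.json
{
  "authentication": {
    "class": "solr.CertAuthPlugin"
  }
}

Each client is identified by its certificate subject, for example CN=admin,OU=Search,O=CalmOps or CN=solr-service,OU=Search,O=CalmOps, and that name is what the authorization rules map to roles.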

RBAC with Fine-Grained Permissions

Role-based access control supports collection-level, path-level, and per-request-type rules (it does not provide field-level security):

{
  "authorization": {
    "class": "solr.RuleBasedAuthorizationPlugin",
    "permissions": [
      {"name": "security-edit", "role": "admin"},
      {"name": "collection-admin-edit", "role": "admin"},
      {"name": "collection-admin-read", "role": ["admin", "ops"]},
      {"name": "read", "collection": "products", "role": "reader"},
      {"name": "update", "collection": "products", "role": "editor"},
      {"name": "read", "collection": "logs", "role": ["auditor", "admin"]}
    ],
    "user-role": {
      "alice": ["admin"],
      "bob": ["editor", "reader"],
      "carol": ["auditor"]
    }
  }
}
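A simplified model of how such rules resolve can help when reviewing a security.json. The sketch below checks a user against the first permission that matches the action and collection. Denying requests that match no rule is a simplifying assumption of this sketch, and the real plugin also considers paths, HTTP methods, and predefined permission semantics.

```python
PERMISSIONS = [
    {"name": "read", "collection": "products", "role": ["reader"]},
    {"name": "update", "collection": "products", "role": ["editor"]},
    {"name": "read", "collection": "logs", "role": ["auditor", "admin"]},
]

USER_ROLES = {
    "alice": ["admin"],
    "bob": ["editor", "reader"],
    "carol": ["auditor"],
}


def is_allowed(user, action, collection):
    """Return True if the first matching permission grants one of the user's roles."""
    user_roles = set(USER_ROLES.get(user, []))
    for perm in PERMISSIONS:
        if perm["name"] == action and perm["collection"] == collection:
            return bool(user_roles & set(perm["role"]))
    return False  # assumption: no matching rule means deny


for user, action, collection in [
    ("bob", "update", "products"),
    ("carol", "read", "logs"),
    ("carol", "read", "products"),
]:
    print(user, action, collection, is_allowed(user, action, collection))
```

Rule order matters in this model because the first match decides the outcome, so narrower rules belong before broader ones.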

Audit Logging

Solr 9.x supports structured audit logging with configurable event types and mute rules:

// Audit logging configuration in security.json
{
  "auditlogging": {
    "class": "solr.SolrLogAuditLoggerPlugin",
    "eventTypes": ["REJECTED", "UNAUTHORIZED", "COMPLETED", "ERROR"],
    "muteRules": ["path:/admin/ping"],
    "async": true,
    "queueSize": 5000
  }
}

SolrLogAuditLoggerPlugin writes events through Solr’s logging framework, so they can be routed to a dedicated file or forwarded via syslog for SIEM ingestion by configuring the corresponding log appenders.

TLS Improvements

Solr 9.7+ can be restricted to TLS 1.3 (optionally allowing TLS 1.2 as a fallback), supports modern cipher suites such as TLS_AES_256_GCM_SHA384, and supports automated certificate rotation via Let’s Encrypt or internal CA integrations.

# Enable TLS with modern ciphers
$ bin/solr start -cloud \
  -Djetty.sslContext.keyStorePassword=changeit \
  -Djetty.sslContext.trustStorePassword=changeit \
  -Dsolr.jetty.ssl.ciphers=TLS_AES_256_GCM_SHA384,TLS_CHACHA20_POLY1305_SHA256 \
  -Dsolr.jetty.ssl.protocols=TLSv1.3

SolrCloud Changes

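Replica Placement (Auto-Scaling Replacement)

Solr 9.0 removed the autoscaling framework from 8.x (policies, preferences, and triggers). Replica placement is now controlled by placement plugins, which consider node metrics such as free disk space and core count when choosing nodes for new replicas:

// Disk-aware placement plugin (POST to /api/cluster/plugin)
{
  "add": {
    "name": ".placement-plugin",
    "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
    "config": {
      "minimalFreeDiskGB": 20,
      "prioritizedFreeDiskGB": 100
    }
  }
}

# Rebalance replicas across nodes (v2 API in recent 9.x releases; verify the
# endpoint for your version; node names are examples)
$ curl -X POST "http://localhost:8983/api/cluster/replicas/balance" \
  -H "Content-Type: application/json" \
  -d '{"nodes": ["solr-node-a:8983_solr", "solr-node-b:8983_solr"]}'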

Placement Plugins

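Placement plugins also support affinity rules, such as restricting a collection to specific node types. With AffinityPlacementFactory, nodes are labeled through the node_type system property (for example -Dnode_type=ssd) and collections are mapped to those labels; the replica_type system property similarly limits which replica types (NRT, TLOG, PULL) a node accepts:

// Placement plugin configuration via API (POST to /api/cluster/plugin;
// use "add" in place of "update" if no placement plugin is registered yet)
{
  "update": {
    "name": ".placement-plugin",
    "class": "org.apache.solr.cluster.placement.plugins.AffinityPlacementFactory",
    "config": {
      "collectionNodeType": {
        "products": "ssd"
      }
    }
  }
}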

Cluster Management API

The new v2 admin API provides a consistent REST interface for all cluster operations:

# List all collections with health status
$ curl "http://localhost:8983/api/collections" -H "Accept: application/json"

# Get detailed shard health
$ curl "http://localhost:8983/api/collections/products/shards"

# Trigger controlled leader rebalancing
$ curl -X POST "http://localhost:8983/api/collections/products/rebalance-leaders" \
  -H "Content-Type: application/json" \
  -d '{"maxAtOnce": 2, "maxWaitSeconds": 300}'

Distributed Tracing

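Solr 9.x integrates with OpenTelemetry for end-to-end request tracing across nodes. Tracing is provided by the opentelemetry module and configured through the standard OpenTelemetry environment variables:

# OpenTelemetry tracing config (for example in solr.in.sh)
# Assumes OtelTracerConfigurator is registered as the tracerConfig in solr.xml
SOLR_MODULES=opentelemetry
OTEL_SERVICE_NAME=solr-production
OTEL_EXPORTER_OTLP_ENDPOINT=http://otel-collector:4317
OTEL_TRACES_SAMPLER=parentbased_traceidratio
OTEL_TRACES_SAMPLER_ARG=0.05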

Performance Benchmarks

Indexing Speed

Solr 9.x with Lucene 9 delivers significant throughput improvements over Solr 8. On a 3-node cluster (each with 16 cores, 64 GB RAM, NVMe storage):

| Operation | Solr 8.11 | Solr 9.7 | Improvement |
|---|---|---|---|
| Bulk indexing (docs/sec) | 85,000 | 124,000 | +46% |
| Vector indexing (vecs/sec) | N/A | 22,000 | New |
| Concurrent merge throughput | 280 MB/s | 415 MB/s | +48% |

The gains come from Lucene 9’s concurrent merge scheduler (ConcurrentMergeScheduler), optimized postings format, and write-ahead log improvements.
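The improvement column is a plain relative change, which can be recomputed when re-running the benchmark on other hardware. A short sketch using the throughput figures reported above (`pct_change` is an illustrative helper name):

```python
def pct_change(before, after):
    """Relative change from `before` to `after`, rounded to a whole percent."""
    return round((after - before) / before * 100)


bulk = pct_change(85_000, 124_000)   # bulk indexing, docs/sec
merge = pct_change(280, 415)         # merge throughput, MB/s

print(f"bulk indexing: {bulk:+d}%")
print(f"merge throughput: {merge:+d}%")
```

For latency metrics the same formula yields a negative number, where a larger drop is better.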

Query Latency

| Query Type | Solr 8.11 | Solr 9.7 | Improvement |
|---|---|---|---|
| Keyword search (P50) | 8 ms | 5 ms | -37% |
| Keyword search (P99) | 95 ms | 52 ms | -45% |
| Faceted search (P50) | 22 ms | 14 ms | -36% |
| k-NN vector search (P50) | N/A | 18 ms | New |
| Hybrid vector+keyword (P50) | N/A | 32 ms | New |

<!-- Concurrent merge scheduler config (solrconfig.xml) -->
<indexConfig>
  <mergeScheduler class="org.apache.lucene.index.ConcurrentMergeScheduler">
    <int name="maxThreadCount">4</int>
    <int name="maxMergeCount">6</int>
  </mergeScheduler>
</indexConfig>

Query Caching Improvements

Solr 9.x uses a concurrent, Caffeine-based filter cache that reduces contention under high concurrency:

{
  "filterCache": {
    "class": "solr.CaffeineCache",
    "size": 512,
    "initialSize": 512,
    "autowarmCount": "100%",
    "maxRamMB": 128
  }
}

CaffeineCache replaces the legacy LRUCache and FastLRUCache implementations, which Solr 9.0 removed, and provides near-lock-free concurrent access and time-based expiration.


Lucene 9 Integration

Solr 9.x bundles Lucene 9, which brings several core improvements:

| Feature | Impact |
|---|---|
| HNSW vector search | Native k-NN without external plugins |
| Concurrent flush/merge | Reduced indexing pauses, higher throughput |
| Soft-deletes improvements | Faster replica recovery, less disk churn |
| PointValues intersection | Faster range and geo queries on numeric fields |
| New posting lists format | Smaller index size, faster skipping |

Lucene 9’s soft-deletes mechanism allows replicas to recover from stale state without a full sync — a major improvement for SolrCloud availability during node failures.


Migration from Solr 8 to 9

Breaking Changes

  • Java 11 minimum (Java 8 is no longer supported); Java 17 also works
  • ZooKeeper 3.8+ required
  • Autoscaling framework removed; use placement plugins instead
  • Velocity response writer removed; render templates in the client or use a custom response writer
  • solrconfig.xml replaced with managed config API for several sections
  • updateLog format changed; full re-index recommended

Migration Checklist

| Step | Details | Impact |
|---|---|---|
| 1. Upgrade Java to 11+ | Java 11 is the minimum; Adoptium or Oracle JDK 17 LTS is a common choice | Required |
| 2. Upgrade ZooKeeper to 3.8+ | Minimum 3.5.8, recommended 3.9.x | Required |
| 3. Migrate config to managed API | Convert solrconfig.xml overrides | Required |
| 4. Replace Velocity templates | Render templates client-side or use custom response writers | Breaking |
| 5. Test vector field type migration | Old schema custom types need updating | Medium |
| 6. Audit security configs | Update security.json for new auth plugins | Medium |
| 7. Replace autoscaling policies | Re-express 8.x autoscaling rules as placement plugin configuration | Medium |
| 8. Validate merge scheduler config | Old ConcurrentMergeScheduler params changed | Low |
| 9. Run rolling upgrade | One node at a time, verify after each | Operational |
| 10. Re-index for optimal Lucene 9 format | Required for soft-deletes benefits | Recommended |

Rolling Upgrade Strategy

# Step 1: Confirm the cluster is healthy before taking a node down (on any live node)
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS" | jq '.cluster.live_nodes'

# Step 2: Stop Solr on target node
$ bin/solr stop -p 8983

# Step 3: Upgrade binaries and config
$ tar xzf solr-9.7.0.tgz
$ cp -r solr-9.7.0/server/solr/* /opt/solr/server/solr/
$ cp -r solr-9.7.0/bin/solr /opt/solr/bin/solr

# Step 4: Start upgraded node
$ bin/solr start -cloud -p 8983 -z zk1:2181,zk2:2181,zk3:2181

# Step 5: Verify node joins cluster
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS" | jq '.cluster.live_nodes'

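# Step 6: Verify replica recovery (every replica should report "active")
$ curl "http://localhost:8983/solr/admin/collections?action=CLUSTERSTATUS&collection=products" | jq '.cluster.collections.products.shards[].replicas[].state'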

# Step 7: Repeat for remaining nodes
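The verification steps can be automated before moving on to the next node. Below is a minimal sketch that inspects an already-parsed CLUSTERSTATUS response. The function name is hypothetical and fetching the JSON over HTTP is left to the caller; the field names follow the CLUSTERSTATUS response layout (`live_nodes`, `collections`, `shards`, `replicas`, `node_name`, `state`).

```python
def node_is_healthy(status, node_name):
    """Return True if node_name is live and all of its replicas are active."""
    cluster = status["cluster"]
    if node_name not in cluster["live_nodes"]:
        return False
    for collection in cluster.get("collections", {}).values():
        for shard in collection["shards"].values():
            for replica in shard["replicas"].values():
                if replica["node_name"] == node_name and replica["state"] != "active":
                    return False
    return True


# Example response fragment with one replica still recovering on node b.
sample = {
    "cluster": {
        "live_nodes": ["solr-node-a:8983_solr", "solr-node-b:8983_solr"],
        "collections": {
            "products": {
                "shards": {
                    "shard1": {
                        "replicas": {
                            "core_node1": {"node_name": "solr-node-a:8983_solr", "state": "active"},
                            "core_node2": {"node_name": "solr-node-b:8983_solr", "state": "recovering"},
                        }
                    }
                }
            }
        },
    }
}

print(node_is_healthy(sample, "solr-node-a:8983_solr"))
print(node_is_healthy(sample, "solr-node-b:8983_solr"))
```

In a rolling upgrade script, this check would be polled with a timeout after Step 4, and the next node stopped only once it returns True.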

Solr Ecosystem: Solr vs Elasticsearch vs OpenSearch

Comparison Table

| Feature | Solr 9.7 | Elasticsearch 8.x | OpenSearch 2.x |
|---|---|---|---|
| Vector search | Native HNSW + quantization | Native HNSW + IVF + quantization | Native HNSW + IVF |
| SQL support | Parallel SQL via JDBC | SQL via Elasticsearch SQL (limited) | SQL via PPL + SQL plugin |
| Streaming expressions | Yes (mature) | No (uses ESQL) | No (uses PPL pipelines) |
| Security (RBAC) | Plugin-based | Built-in (free tier limited) | Built-in (free) |
| Security (audit) | Plugin-based | Built-in | Built-in |
| Auto-scaling | Policy-based | ILM + autoscaling | ISM + hot/warm/cold |
| Distributed tracing | OpenTelemetry | APM agent | OpenTelemetry (2.15+) |
| Licensing | Apache 2.0 (free) | Elastic License (partially free) | Apache 2.0 (free) |
| Community | Small, mature | Very large | Growing fast |
| Commercial support | OpenSource Connections, Lucidworks | Elastic (official) | AWS, Aiven |

When to Choose Each

Select Solr if you need declarative streaming pipelines, parallel SQL over search indexes, or a fully open-source license with no feature restrictions. Solr excels in faceted search, e-commerce, and large-scale analytics workloads where its streaming expressions and SQL support directly replace ETL pipelines.

Choose Elasticsearch when you need the largest ecosystem, the richest Beats, Logstash, and Kibana integrations, and the broadest managed service availability. Elasticsearch dominates the observability and log analytics space.

Choose OpenSearch if you want Elasticsearch-compatible APIs with full open-source licensing. OpenSearch is the strongest choice for AWS-native deployments and teams migrating away from Elasticsearch’s licensing changes.


Community and Commercial Support

The Solr community remains active through the Apache Solr mailing lists, a dedicated Slack workspace, and annual conference talks at ApacheCon and Lucene/Solr Revolution. Commercial support is available from:

  • OpenSource Connections — Solr consulting, training, and production support
  • Lucidworks — Enterprise Solr distribution with Fusion AI layer
  • Aiven — Managed Solr on cloud (AWS, GCP, Azure)
  • Instaclustr (NetApp) — Managed Solr service

As of early 2026, the Solr PMC averages 4-6 releases per year with consistent security patches. The project maintains backward compatibility within major versions and provides clear deprecation notices across releases.


Conclusion

Solr 9.x brings native HNSW vector search, parallel SQL, streaming expressions, and production-grade security to the platform. Together they make Solr a compelling choice for teams that value open-source licensing, declarative data processing, and deep Lucene integration.

