Introduction
Most data structures require trade-offs. Arrays offer fast access but slow insertions. Linked lists make insertions easy but searching is slow. Binary search trees provide balanced performance, but implementing them correctly is notoriously tricky.
Enter skip lists—a data structure that achieves the best of multiple worlds. With expected O(log n) time complexity for search, insert, and delete operations, skip lists rival balanced trees while being significantly simpler to implement. Their elegance and efficiency have made them favorites in practice, with Redis using them for ordered sets.
This article explores skip lists, explaining how they work, why they’re efficient, and when you might choose them over alternatives.
The Problem with Linked Lists
Imagine you’re searching for an item in a sorted linked list. Even though the data is sorted, you have no choice but to scan from the beginning, checking each node until you find your target. This gives O(n) search time—slow for large lists.
What if you could create “express lanes” that let you skip over large portions of the list? That’s exactly what skip lists do.
How Skip Lists Work
A skip list is a layered linked list where:
- The bottom layer contains all elements in sorted order
- Each higher layer acts as an “express lane,” containing a subset of elements from below
- Elements in higher layers are chosen randomly (typically with 50% probability)
- Searching starts from the highest layer and descends progressively
Think of it like a highway system. To travel from city A to city Z, you could take the local road through every town, or you could take highways that bypass many towns. Skip lists create multiple “highways” through the data.
Searching in a Skip List
Searching for a target value follows these steps:
- Start at the highest level of the list
- Move forward as long as the next node’s value is less than the target
- When you can’t move forward, descend one level
- Repeat until you reach the bottom level
- Check if the current node matches the target
This process lets you skip vast portions of the data. At each level, you’re essentially making a binary search among the elements at that level. With roughly log₂(n) levels and O(1) movement per level, search takes O(log n) expected time.
Insertion: Building the Express Lanes
Insertion is where the magic happens. When you insert a new element:
- Find where it should go in the bottom layer (standard linked list insertion)
- Randomly determine how many “express lanes” it should appear in
- Insert it into those corresponding higher layers
The random selection uses a geometric distribution—each level has a 50% chance of including the new element, assuming a 1/2 probability. This ensures that roughly half the elements appear in level 1, a quarter in level 2, an eighth in level 3, and so on.
This probabilistic approach means:
- Higher levels have fewer elements, creating longer “jumps”
- The overall structure maintains O(log n) expected height
- No complex rebalancing operations are needed
Deletion in Skip Lists
Deletion is straightforward:
- Find the element in all layers where it appears
- Remove it from each layer’s linked list
If the element doesn’t exist in a particular layer, you simply skip it. No complex rebalancing is needed because you’re just removing nodes—you never have to “fill in” gaps like with B-Trees.
Implementing Skip Lists
The simplicity of skip lists is one of their greatest strengths. Here’s what you need:
Node structure: Each node contains a value and an array of forward pointers (one for each level where the node appears).
Level management: You need a way to generate random levels. A simple approach uses repeated coin flips until you get tails (or reach a maximum level).
Search algorithm: The core search logic is just a few lines—compare, move forward, or descend.
Compare this to red-black trees, which require multiple rotation cases and careful color management. Skip lists achieve similar performance with a fraction of the code.
Time Complexity Analysis
Skip lists provide expected O(log n) time complexity for all major operations:
- Search: O(log n) expected
- Insert: O(log n) expected
- Delete: O(log n) expected
- Range query: O(log n + k) where k is the number of elements in the range
The key word is “expected.” While worst-case performance can be O(n), the probability of such worst cases is astronomically small with proper implementation. The mathematical analysis shows that a skip list with n elements has expected height of O(log n).
Space Complexity
A skip list uses more space than a single linked list because elements appear in multiple layers. On average, each element appears in 2 layers (since the sum of 1/2 + 1/4 + 1/8 + … = 1).
This gives O(n) space complexity, with the constant factor being roughly 2x a regular linked list. This trade-off is usually acceptable given the performance benefits.
Skip Lists in Practice
Skip lists are used in several real-world systems:
Redis uses skip lists for ordered sets, providing fast sorted operations while maintaining O(log n) insertion and retrieval.
Lucene (the search library behind Elasticsearch) uses skip lists in its inverted index for fast intersection of posting lists.
LevelDB, RocksDB and other key-value stores use skip lists as their underlying memtable structure.
The simplicity of skip lists also makes them excellent for teaching balanced tree concepts. Students can understand skip lists more easily than red-black trees while learning similar algorithmic concepts.
When to Use Skip Lists
Skip lists are excellent when:
- You need O(log n) operations with simpler implementation than balanced trees
- Memory is available for the extra levels (roughly 2x a linked list)
- You need ordered data with fast insertions and deletions
- Range queries are important (skip lists handle these well)
They might not be the best choice when:
- Worst-case guarantees must be strictly enforced (use arrays or balanced trees)
- Memory is extremely constrained
- You need persistent/immutable data structures
Probabilistic Analysis: Expected Levels
The expected height of a skip list with n elements and probability p (typically 0.5) is:
E[height] = log_(1/p) (n) = log₂(n) when p = 0.5
For p = 0.5, the expected number of levels across all nodes is n/(1-p) = 2n, giving O(n) expected space.
Probability Distribution
The probability that a node reaches level L is p^(L-1). For p = 0.5:
- Level 1: 100% of nodes
- Level 2: 50% of nodes
- Level 3: 25% of nodes
- Level 4: 12.5% of nodes
- Level k: 1/2^(k-1) of nodes
The maximum level for n elements is approximately log_(1/p)(n) with high probability.
from math import log2
def expected_height(n, p=0.5):
return log2(n) if p == 0.5 else log(n, 1/p)
def max_level_probability(n, p=0.5):
"""Probability that max level exceeds 3 * expected_height."""
expected = expected_height(n, p)
return (n * p ** (3 * expected - 1)) if expected > 0 else 0
Full Search, Insert, and Delete Algorithms
Complete Python Implementation
import random
class SkipListNode:
def __init__(self, value, level):
self.value = value
self.forward = [None] * (level + 1)
class SkipList:
def __init__(self, max_level=16, p=0.5):
self.max_level = max_level
self.p = p
self.header = SkipListNode(None, max_level)
self.level = 0
def _random_level(self):
level = 0
while random.random() < self.p and level < self.max_level:
level += 1
return level
def search(self, target):
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].value < target:
current = current.forward[i]
current = current.forward[0]
return current is not None and current.value == target
def insert(self, value):
update = [None] * (self.max_level + 1)
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].value < value:
current = current.forward[i]
update[i] = current
new_level = self._random_level()
if new_level > self.level:
for i in range(self.level + 1, new_level + 1):
update[i] = self.header
self.level = new_level
new_node = SkipListNode(value, new_level)
for i in range(new_level + 1):
new_node.forward[i] = update[i].forward[i]
update[i].forward[i] = new_node
def delete(self, value):
update = [None] * (self.max_level + 1)
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].value < value:
current = current.forward[i]
update[i] = current
target = current.forward[0]
if target and target.value == value:
for i in range(self.level + 1):
if update[i].forward[i] != target:
break
update[i].forward[i] = target.forward[i]
while self.level > 0 and self.header.forward[self.level] is None:
self.level -= 1
def display(self):
for i in range(self.level, -1, -1):
current = self.header.forward[i]
nodes = []
while current:
nodes.append(str(current.value))
current = current.forward[i]
print(f"Level {i}: {' -> '.join(nodes) if nodes else 'None'}")
Java Implementation
import java.util.Random;
public class SkipList<T extends Comparable<T>> {
private static final double P = 0.5;
private static final int MAX_LEVEL = 16;
private final Node<T> header = new Node<>(null, MAX_LEVEL);
private int level = 0;
private final Random random = new Random();
private static class Node<T> {
T value;
Node<T>[] forward;
@SuppressWarnings("unchecked")
Node(T value, int level) {
this.value = value;
this.forward = new Node[level + 1];
}
}
private int randomLevel() {
int lvl = 0;
while (random.nextDouble() < P && lvl < MAX_LEVEL)
lvl++;
return lvl;
}
public boolean search(T target) {
Node<T> current = header;
for (int i = level; i >= 0; i--) {
while (current.forward[i] != null
&& current.forward[i].value.compareTo(target) < 0)
current = current.forward[i];
}
current = current.forward[0];
return current != null && current.value.equals(target);
}
public void insert(T value) {
@SuppressWarnings("unchecked")
Node<T>[] update = new Node[MAX_LEVEL + 1];
Node<T> current = header;
for (int i = level; i >= 0; i--) {
while (current.forward[i] != null
&& current.forward[i].value.compareTo(value) < 0)
current = current.forward[i];
update[i] = current;
}
int newLevel = randomLevel();
if (newLevel > level) {
for (int i = level + 1; i <= newLevel; i++)
update[i] = header;
level = newLevel;
}
Node<T> newNode = new Node<>(value, newLevel);
for (int i = 0; i <= newLevel; i++) {
newNode.forward[i] = update[i].forward[i];
update[i].forward[i] = newNode;
}
}
public void delete(T value) {
@SuppressWarnings("unchecked")
Node<T>[] update = new Node[MAX_LEVEL + 1];
Node<T> current = header;
for (int i = level; i >= 0; i--) {
while (current.forward[i] != null
&& current.forward[i].value.compareTo(value) < 0)
current = current.forward[i];
update[i] = current;
}
Node<T> target = current.forward[0];
if (target != null && target.value.equals(value)) {
for (int i = 0; i <= level; i++) {
if (update[i].forward[i] != target)
break;
update[i].forward[i] = target.forward[i];
}
while (level > 0 && header.forward[level] == null)
level--;
}
}
}
Concurrent Skip Lists
Skip lists naturally support concurrent operations better than balanced trees because updates only affect local pointers, not global structure.
import java.util.concurrent.atomic.AtomicMarkableReference;
public class LockFreeSkipList<T extends Comparable<T>> {
static class Node<T> {
T value;
AtomicMarkableReference<Node<T>>[] next;
int topLevel;
@SuppressWarnings("unchecked")
Node(T value, int height) {
this.value = value;
this.topLevel = height;
this.next = new AtomicMarkableReference[height + 1];
for (int i = 0; i <= height; i++)
this.next[i] = new AtomicMarkableReference<>(null, false);
}
}
// Lock-free find, insert, delete using CAS operations
// See The Art of Multiprocessor Programming for full implementation
}
Java’s ConcurrentSkipListMap and ConcurrentSkipListSet (JDK 1.6+) are production-ready concurrent skip lists providing sorted, thread-safe collections.
Comparison with Balanced BSTs
| Feature | Skip List | Red-Black Tree | AVL Tree |
|---|---|---|---|
| Search (avg) | O(log n) | O(log n) | O(log n) |
| Insert (avg) | O(log n) | O(log n) | O(log n) |
| Delete (avg) | O(log n) | O(log n) | O(log n) |
| Implementation complexity | Low | High | High |
| Space overhead | ~2x linked list | ~3x pointers | ~3x pointers |
| Concurrency | Excellent | Difficult | Difficult |
| Range queries | O(log n + k) | O(log n + k) | O(log n + k) |
| Worst-case guaranteed | No (probabilistic) | Yes | Yes |
| Rebalancing required | No | Yes (rotations) | Yes (rotations) |
Redis Sorted Sets
Redis uses skip lists as the underlying data structure for sorted sets (ZSET). Key operations:
# Add members with scores
ZADD leaderboard 100 "alice"
ZADD leaderboard 200 "bob"
ZADD leaderboard 150 "charlie"
# Range queries (ascending)
ZRANGE leaderboard 0 -1 WITHSCORES
# Range by score
ZRANGEBYSCORE leaderboard 100 200
# Get rank
ZRANK leaderboard "bob"
# Get score
ZSCORE leaderboard "alice"
# Count elements in score range
ZCOUNT leaderboard 0 150
Redis chose skip lists over BSTs for sorted sets because:
- Skip lists are simpler to implement and debug
- Range queries are naturally efficient (linked list structure)
- The probabilistic approach performs well in practice
- Memory overhead is acceptable
Memory Layout and Performance
Skip list nodes are allocated individually, leading to potential memory fragmentation. However, the memory overhead per node is still manageable:
def estimate_memory(n, p=0.5, max_level=32):
"""Estimate memory usage of a skip list in bytes."""
pointer_size = 8 # 64-bit system
value_size = 8 # pointer to value object
overhead = 16 # Python object overhead (simplified)
total_pointers = n / (1 - p) # Expected total forward pointers
total_bytes = n * (value_size + overhead) + total_pointers * pointer_size
return total_bytes
# Example: 1M integers
mem_skip = estimate_memory(1_000_000)
mem_array = 1_000_000 * 28 # Python int object
mem_linked = 1_000_000 * (28 + 8) # value + next pointer
For a list of 1 million integers, a skip list uses approximately 2x memory of a singly linked list and 3-4x memory of an array. The performance gains typically justify this overhead.
Indexed Skip Lists (by Rank)
A common extension is supporting access by position (rank) in addition to by value:
class IndexedSkipListNode:
def __init__(self, value, level):
self.value = value
self.forward = [None] * (level + 1)
self.span = [0] * (level + 1) # Number of nodes skipped
class IndexedSkipList(SkipList):
"""Skip list supporting O(log n) access by rank."""
def _random_level(self):
level = 0
while random.random() < self.p and level < self.max_level:
level += 1
return level
def insert(self, value):
update = [None] * (self.max_level + 1)
rank = [0] * (self.max_level + 1)
current = self.header
for i in range(self.level, -1, -1):
rank[i] = rank[i+1] if i < self.level else 0
while current.forward[i] and current.forward[i].value < value:
rank[i] += current.span[i]
current = current.forward[i]
update[i] = current
new_level = self._random_level()
if new_level > self.level:
for i in range(self.level + 1, new_level + 1):
rank[i] = 0
update[i] = self.header
update[i].span[i] = self._size
self.level = new_level
new_node = IndexedSkipListNode(value, new_level)
for i in range(new_level + 1):
new_node.forward[i] = update[i].forward[i]
update[i].forward[i] = new_node
new_node.span[i] = update[i].span[i] - (rank[0] - rank[i])
update[i].span[i] = (rank[0] - rank[i]) + 1
for i in range(new_level + 1, self.level + 1):
update[i].span[i] += 1
self._size += 1
def get_by_rank(self, r):
"""Get element at position r (1-indexed)."""
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and r > current.span[i]:
r -= current.span[i]
current = current.forward[i]
return current.forward[0].value
def find_rank(self, value):
"""Find rank of element by value. Returns 0 if not found."""
rank = 0
current = self.header
for i in range(self.level, -1, -1):
while current.forward[i] and current.forward[i].value < value:
rank += current.span[i]
current = current.forward[i]
current = current.forward[0]
if current and current.value == value:
return rank + 1
return 0
Deterministic Skip Lists
While standard skip lists use randomization, deterministic 1-2-3 skip lists use fixed rules to ensure O(log n) worst-case behavior:
class DeterministicSkipList:
"""1-2-3 deterministic skip list using promotion rules."""
# Each gap between nodes at level i can contain 1, 2, or 3 nodes at level i-1.
# When a gap reaches 4, the middle node is promoted.
# This guarantees O(log n) worst-case without probability.
pass # Implementation follows similar structure with promotion logic
Comparison with Other Ordered Data Structures
| Operation | Skip List | B-Tree | Red-Black Tree | Sorted Array |
|---|---|---|---|---|
| Search | O(log n) avg | O(log n) | O(log n) | O(log n) bin |
| Insert | O(log n) avg | O(log n) | O(log n) | O(n) |
| Delete | O(log n) avg | O(log n) | O(log n) | O(n) |
| Range query | O(log n + k) | O(log n + k) | O(log n + k) | O(k) + binary |
| Sequential iter | O(1) per step | O(1) per step | O(log n) per step | O(1) |
| Concurrency | Excellent | Good | Poor | Read-only |
| Memory | ~2n pointers | ~3n pointers | ~3n pointers | n values |
| Cache locality | Poor | Excellent (disk) | Poor | Excellent |
| Implementation | Simple | Complex | Very complex | Trivial |
External Resources
- Skip Lists: A Probabilistic Alternative to Balanced Trees - Original paper by William Pugh
- GeeksforGeeks - Skip List
- Redis Documentation - Sorted Sets
- OpenDSA - Skip Lists
Conclusion
Skip lists demonstrate that sometimes simpler approaches can match or exceed more complex ones. By embracing randomization, they achieve the same performance as carefully balanced trees with a fraction of the implementation complexity.
The next time you need an ordered data structure with fast operations, consider skip lists. Their elegance, efficiency, and simplicity make them an excellent choice for many applications.
Comments