Skip Lists: Simplicity Meets Efficiency

Introduction

Most data structures require trade-offs. Arrays offer fast access but slow insertions. Linked lists make insertions easy but searching is slow. Binary search trees provide balanced performance, but implementing them correctly is notoriously tricky.

Enter skip lists—a data structure that achieves the best of multiple worlds. With expected O(log n) time complexity for search, insert, and delete operations, skip lists rival balanced trees while being significantly simpler to implement. Their elegance and efficiency have made them favorites in practice, with Redis using them for ordered sets.

This article explores skip lists, explaining how they work, why they’re efficient, and when you might choose them over alternatives.

The Problem with Linked Lists

Imagine you’re searching for an item in a sorted linked list. Even though the data is sorted, you have no choice but to scan from the beginning, checking each node until you find your target. This gives O(n) search time—slow for large lists.

What if you could create “express lanes” that let you skip over large portions of the list? That’s exactly what skip lists do.

How Skip Lists Work

A skip list is a layered linked list where:

The bottom layer contains all elements in sorted order
Each higher layer acts as an “express lane,” containing a subset of elements from below
Elements in higher layers are chosen randomly (typically with 50% probability)
Searching starts from the highest layer and descends progressively

Think of it like a highway system. To travel from city A to city Z, you could take the local road through every town, or you could take highways that bypass many towns. Skip lists create multiple “highways” through the data.

Searching in a Skip List

Searching for a target value follows these steps:

Start at the highest level of the list
Move forward as long as the next node’s value is less than the target
When you can’t move forward, descend one level
Repeat until you reach the bottom level
Check if the current node matches the target

This process lets you skip vast portions of the data. At each level, you’re essentially making a binary search among the elements at that level. With roughly log₂(n) levels and O(1) movement per level, search takes O(log n) expected time.

Insertion: Building the Express Lanes

Insertion is where the magic happens. When you insert a new element:

Find where it should go in the bottom layer (standard linked list insertion)
Randomly determine how many “express lanes” it should appear in
Insert it into those corresponding higher layers

The random selection uses a geometric distribution—each level has a 50% chance of including the new element, assuming a 1/2 probability. This ensures that roughly half the elements appear in level 1, a quarter in level 2, an eighth in level 3, and so on.

This probabilistic approach means:

Higher levels have fewer elements, creating longer “jumps”
The overall structure maintains O(log n) expected height
No complex rebalancing operations are needed

Deletion in Skip Lists

Deletion is straightforward:

Find the element in all layers where it appears
Remove it from each layer’s linked list

If the element doesn’t exist in a particular layer, you simply skip it. No complex rebalancing is needed because you’re just removing nodes—you never have to “fill in” gaps like with B-Trees.

Implementing Skip Lists

The simplicity of skip lists is one of their greatest strengths. Here’s what you need:

Node structure: Each node contains a value and an array of forward pointers (one for each level where the node appears).

Level management: You need a way to generate random levels. A simple approach uses repeated coin flips until you get tails (or reach a maximum level).

Search algorithm: The core search logic is just a few lines—compare, move forward, or descend.

Compare this to red-black trees, which require multiple rotation cases and careful color management. Skip lists achieve similar performance with a fraction of the code.

Time Complexity Analysis

Skip lists provide expected O(log n) time complexity for all major operations:

Search: O(log n) expected
Insert: O(log n) expected
Delete: O(log n) expected
Range query: O(log n + k) where k is the number of elements in the range

The key word is “expected.” While worst-case performance can be O(n), the probability of such worst cases is astronomically small with proper implementation. The mathematical analysis shows that a skip list with n elements has expected height of O(log n).

Space Complexity

A skip list uses more space than a single linked list because elements appear in multiple layers. On average, each element appears in 2 layers (since the sum of 1/2 + 1/4 + 1/8 + … = 1).

This gives O(n) space complexity, with the constant factor being roughly 2x a regular linked list. This trade-off is usually acceptable given the performance benefits.

Skip Lists in Practice

Skip lists are used in several real-world systems:

Redis uses skip lists for ordered sets, providing fast sorted operations while maintaining O(log n) insertion and retrieval.

Lucene (the search library behind Elasticsearch) uses skip lists in its inverted index for fast intersection of posting lists.

LevelDB, RocksDB and other key-value stores use skip lists as their underlying memtable structure.

The simplicity of skip lists also makes them excellent for teaching balanced tree concepts. Students can understand skip lists more easily than red-black trees while learning similar algorithmic concepts.

When to Use Skip Lists

Skip lists are excellent when:

You need O(log n) operations with simpler implementation than balanced trees
Memory is available for the extra levels (roughly 2x a linked list)
You need ordered data with fast insertions and deletions
Range queries are important (skip lists handle these well)

They might not be the best choice when:

Worst-case guarantees must be strictly enforced (use arrays or balanced trees)
Memory is extremely constrained
You need persistent/immutable data structures

Probabilistic Analysis: Expected Levels

The expected height of a skip list with n elements and probability p (typically 0.5) is:

E[height] = log_(1/p) (n) = log₂(n) when p = 0.5

For p = 0.5, the expected number of levels across all nodes is n/(1-p) = 2n, giving O(n) expected space.

Probability Distribution

The probability that a node reaches level L is p^(L-1). For p = 0.5:

Level 1: 100% of nodes
Level 2: 50% of nodes
Level 3: 25% of nodes
Level 4: 12.5% of nodes
Level k: 1/2^(k-1) of nodes

The maximum level for n elements is approximately log_(1/p)(n) with high probability.

from math import log2

def expected_height(n, p=0.5):
    return log2(n) if p == 0.5 else log(n, 1/p)

def max_level_probability(n, p=0.5):
    """Probability that max level exceeds 3 * expected_height."""
    expected = expected_height(n, p)
    return (n * p ** (3 * expected - 1)) if expected > 0 else 0

Full Search, Insert, and Delete Algorithms

Complete Python Implementation

import random

class SkipListNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * (level + 1)

class SkipList:
    def __init__(self, max_level=16, p=0.5):
        self.max_level = max_level
        self.p = p
        self.header = SkipListNode(None, max_level)
        self.level = 0

    def _random_level(self):
        level = 0
        while random.random() < self.p and level < self.max_level:
            level += 1
        return level

    def search(self, target):
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < target:
                current = current.forward[i]
        current = current.forward[0]
        return current is not None and current.value == target

    def insert(self, value):
        update = [None] * (self.max_level + 1)
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current
        new_level = self._random_level()
        if new_level > self.level:
            for i in range(self.level + 1, new_level + 1):
                update[i] = self.header
            self.level = new_level
        new_node = SkipListNode(value, new_level)
        for i in range(new_level + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node

    def delete(self, value):
        update = [None] * (self.max_level + 1)
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                current = current.forward[i]
            update[i] = current
        target = current.forward[0]
        if target and target.value == value:
            for i in range(self.level + 1):
                if update[i].forward[i] != target:
                    break
                update[i].forward[i] = target.forward[i]
            while self.level > 0 and self.header.forward[self.level] is None:
                self.level -= 1

    def display(self):
        for i in range(self.level, -1, -1):
            current = self.header.forward[i]
            nodes = []
            while current:
                nodes.append(str(current.value))
                current = current.forward[i]
            print(f"Level {i}: {' -> '.join(nodes) if nodes else 'None'}")

Java Implementation

import java.util.Random;

public class SkipList<T extends Comparable<T>> {
    private static final double P = 0.5;
    private static final int MAX_LEVEL = 16;
    private final Node<T> header = new Node<>(null, MAX_LEVEL);
    private int level = 0;
    private final Random random = new Random();

    private static class Node<T> {
        T value;
        Node<T>[] forward;
        @SuppressWarnings("unchecked")
        Node(T value, int level) {
            this.value = value;
            this.forward = new Node[level + 1];
        }
    }

    private int randomLevel() {
        int lvl = 0;
        while (random.nextDouble() < P && lvl < MAX_LEVEL)
            lvl++;
        return lvl;
    }

    public boolean search(T target) {
        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.forward[i] != null
                    && current.forward[i].value.compareTo(target) < 0)
                current = current.forward[i];
        }
        current = current.forward[0];
        return current != null && current.value.equals(target);
    }

    public void insert(T value) {
        @SuppressWarnings("unchecked")
        Node<T>[] update = new Node[MAX_LEVEL + 1];
        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.forward[i] != null
                    && current.forward[i].value.compareTo(value) < 0)
                current = current.forward[i];
            update[i] = current;
        }
        int newLevel = randomLevel();
        if (newLevel > level) {
            for (int i = level + 1; i <= newLevel; i++)
                update[i] = header;
            level = newLevel;
        }
        Node<T> newNode = new Node<>(value, newLevel);
        for (int i = 0; i <= newLevel; i++) {
            newNode.forward[i] = update[i].forward[i];
            update[i].forward[i] = newNode;
        }
    }

    public void delete(T value) {
        @SuppressWarnings("unchecked")
        Node<T>[] update = new Node[MAX_LEVEL + 1];
        Node<T> current = header;
        for (int i = level; i >= 0; i--) {
            while (current.forward[i] != null
                    && current.forward[i].value.compareTo(value) < 0)
                current = current.forward[i];
            update[i] = current;
        }
        Node<T> target = current.forward[0];
        if (target != null && target.value.equals(value)) {
            for (int i = 0; i <= level; i++) {
                if (update[i].forward[i] != target)
                    break;
                update[i].forward[i] = target.forward[i];
            }
            while (level > 0 && header.forward[level] == null)
                level--;
        }
    }
}

Concurrent Skip Lists

Skip lists naturally support concurrent operations better than balanced trees because updates only affect local pointers, not global structure.

import java.util.concurrent.atomic.AtomicMarkableReference;

public class LockFreeSkipList<T extends Comparable<T>> {
    static class Node<T> {
        T value;
        AtomicMarkableReference<Node<T>>[] next;
        int topLevel;

        @SuppressWarnings("unchecked")
        Node(T value, int height) {
            this.value = value;
            this.topLevel = height;
            this.next = new AtomicMarkableReference[height + 1];
            for (int i = 0; i <= height; i++)
                this.next[i] = new AtomicMarkableReference<>(null, false);
        }
    }
    // Lock-free find, insert, delete using CAS operations
    // See The Art of Multiprocessor Programming for full implementation
}

Java’s ConcurrentSkipListMap and ConcurrentSkipListSet (JDK 1.6+) are production-ready concurrent skip lists providing sorted, thread-safe collections.

Comparison with Balanced BSTs

Feature	Skip List	Red-Black Tree	AVL Tree
Search (avg)	O(log n)	O(log n)	O(log n)
Insert (avg)	O(log n)	O(log n)	O(log n)
Delete (avg)	O(log n)	O(log n)	O(log n)
Implementation complexity	Low	High	High
Space overhead	~2x linked list	~3x pointers	~3x pointers
Concurrency	Excellent	Difficult	Difficult
Range queries	O(log n + k)	O(log n + k)	O(log n + k)
Worst-case guaranteed	No (probabilistic)	Yes	Yes
Rebalancing required	No	Yes (rotations)	Yes (rotations)

Redis Sorted Sets

Redis uses skip lists as the underlying data structure for sorted sets (ZSET). Key operations:

# Add members with scores
ZADD leaderboard 100 "alice"
ZADD leaderboard 200 "bob"
ZADD leaderboard 150 "charlie"

# Range queries (ascending)
ZRANGE leaderboard 0 -1 WITHSCORES

# Range by score
ZRANGEBYSCORE leaderboard 100 200

# Get rank
ZRANK leaderboard "bob"

# Get score
ZSCORE leaderboard "alice"

# Count elements in score range
ZCOUNT leaderboard 0 150

Redis chose skip lists over BSTs for sorted sets because:

Skip lists are simpler to implement and debug
Range queries are naturally efficient (linked list structure)
The probabilistic approach performs well in practice
Memory overhead is acceptable

Memory Layout and Performance

Skip list nodes are allocated individually, leading to potential memory fragmentation. However, the memory overhead per node is still manageable:

def estimate_memory(n, p=0.5, max_level=32):
    """Estimate memory usage of a skip list in bytes."""
    pointer_size = 8  # 64-bit system
    value_size = 8    # pointer to value object
    overhead = 16     # Python object overhead (simplified)
    total_pointers = n / (1 - p)  # Expected total forward pointers
    total_bytes = n * (value_size + overhead) + total_pointers * pointer_size
    return total_bytes

# Example: 1M integers
mem_skip = estimate_memory(1_000_000)
mem_array = 1_000_000 * 28  # Python int object
mem_linked = 1_000_000 * (28 + 8)  # value + next pointer

For a list of 1 million integers, a skip list uses approximately 2x memory of a singly linked list and 3-4x memory of an array. The performance gains typically justify this overhead.

Indexed Skip Lists (by Rank)

A common extension is supporting access by position (rank) in addition to by value:

class IndexedSkipListNode:
    def __init__(self, value, level):
        self.value = value
        self.forward = [None] * (level + 1)
        self.span = [0] * (level + 1)  # Number of nodes skipped

class IndexedSkipList(SkipList):
    """Skip list supporting O(log n) access by rank."""

    def _random_level(self):
        level = 0
        while random.random() < self.p and level < self.max_level:
            level += 1
        return level

    def insert(self, value):
        update = [None] * (self.max_level + 1)
        rank = [0] * (self.max_level + 1)
        current = self.header
        for i in range(self.level, -1, -1):
            rank[i] = rank[i+1] if i < self.level else 0
            while current.forward[i] and current.forward[i].value < value:
                rank[i] += current.span[i]
                current = current.forward[i]
            update[i] = current
        new_level = self._random_level()
        if new_level > self.level:
            for i in range(self.level + 1, new_level + 1):
                rank[i] = 0
                update[i] = self.header
                update[i].span[i] = self._size
            self.level = new_level
        new_node = IndexedSkipListNode(value, new_level)
        for i in range(new_level + 1):
            new_node.forward[i] = update[i].forward[i]
            update[i].forward[i] = new_node
            new_node.span[i] = update[i].span[i] - (rank[0] - rank[i])
            update[i].span[i] = (rank[0] - rank[i]) + 1
        for i in range(new_level + 1, self.level + 1):
            update[i].span[i] += 1
        self._size += 1

    def get_by_rank(self, r):
        """Get element at position r (1-indexed)."""
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and r > current.span[i]:
                r -= current.span[i]
                current = current.forward[i]
        return current.forward[0].value

    def find_rank(self, value):
        """Find rank of element by value. Returns 0 if not found."""
        rank = 0
        current = self.header
        for i in range(self.level, -1, -1):
            while current.forward[i] and current.forward[i].value < value:
                rank += current.span[i]
                current = current.forward[i]
        current = current.forward[0]
        if current and current.value == value:
            return rank + 1
        return 0

Deterministic Skip Lists

While standard skip lists use randomization, deterministic 1-2-3 skip lists use fixed rules to ensure O(log n) worst-case behavior:

class DeterministicSkipList:
    """1-2-3 deterministic skip list using promotion rules."""
    # Each gap between nodes at level i can contain 1, 2, or 3 nodes at level i-1.
    # When a gap reaches 4, the middle node is promoted.
    # This guarantees O(log n) worst-case without probability.
    pass  # Implementation follows similar structure with promotion logic

Comparison with Other Ordered Data Structures

Operation	Skip List	B-Tree	Red-Black Tree	Sorted Array
Search	O(log n) avg	O(log n)	O(log n)	O(log n) bin
Insert	O(log n) avg	O(log n)	O(log n)	O(n)
Delete	O(log n) avg	O(log n)	O(log n)	O(n)
Range query	O(log n + k)	O(log n + k)	O(log n + k)	O(k) + binary
Sequential iter	O(1) per step	O(1) per step	O(log n) per step	O(1)
Concurrency	Excellent	Good	Poor	Read-only
Memory	~2n pointers	~3n pointers	~3n pointers	n values
Cache locality	Poor	Excellent (disk)	Poor	Excellent
Implementation	Simple	Complex	Very complex	Trivial

External Resources

Skip Lists: A Probabilistic Alternative to Balanced Trees - Original paper by William Pugh
GeeksforGeeks - Skip List
Redis Documentation - Sorted Sets
OpenDSA - Skip Lists

Conclusion

Skip lists demonstrate that sometimes simpler approaches can match or exceed more complex ones. By embracing randomization, they achieve the same performance as carefully balanced trees with a fraction of the implementation complexity.

The next time you need an ordered data structure with fast operations, consider skip lists. Their elegance, efficiency, and simplicity make them an excellent choice for many applications.

Skip Lists: Simplicity Meets Efficiency

Introduction

The Problem with Linked Lists

How Skip Lists Work

Searching in a Skip List

Insertion: Building the Express Lanes

Deletion in Skip Lists

Implementing Skip Lists

Time Complexity Analysis

Space Complexity

Skip Lists in Practice

When to Use Skip Lists

Probabilistic Analysis: Expected Levels

Probability Distribution

Full Search, Insert, and Delete Algorithms

Complete Python Implementation

Java Implementation

Concurrent Skip Lists

Comparison with Balanced BSTs

Redis Sorted Sets

Memory Layout and Performance

Indexed Skip Lists (by Rank)

Deterministic Skip Lists

Comparison with Other Ordered Data Structures

External Resources

Conclusion

Comments

Share this article

👍 Was this article helpful?