Union-Find: Solving Connectivity Problems Efficiently

Introduction

Consider these problems: Are two people in the same social network connected? Are two pixels in the same region of an image? Is adding this road to the highway network going to create a cycle? These seemingly different questions share a common theme—they’re all about connectivity.

The Union-Find data structure, also known as Disjoint Set Union (DSU), provides an elegant and efficient solution for such connectivity problems. Despite its simplicity, it’s one of the most useful data structures in computer science, forming the backbone of algorithms like Kruskal’s minimum spanning tree and network connectivity analysis.

This article explores Union-Find, explaining how it works and why it’s so efficient.

The Basic Problem

We start with n elements, each in its own set. We need to support two operations:

Find: Given an element, identify which set it belongs to (usually represented by a “representative” element)

Union: Merge two sets into one

That’s it. Just these two operations, but they enable solving remarkably complex problems.

Naive Approaches and Their Problems

You could represent sets with linked lists or arrays, but performance suffers:

Finding the set representative could take O(n)
Merging sets might require updating all elements
Repeated operations become prohibitively expensive

For large datasets, this is unacceptable. We need something smarter.

Union-Find: The Smart Approach

Union-Find uses two key optimizations that make operations nearly constant time (amortized):

Path compression: When we find an element’s representative, we make a direct path to the root for all visited elements. This flattens the structure over time.

Union by rank/size: When merging sets, we attach the smaller tree under the larger one. This keeps the tree balanced.

With these optimizations, operations take almost O(1) amortized time—specifically α(n), the inverse Ackermann function, which is effectively constant for all practical values of n.

Implementation Details

Data structure: We maintain an array parent[i] where i is each element. The parent of the representative element points to itself.

Find with path compression:

def find(x):
    if parent[x] != x:
        parent[x] = find(parent[x])  # Path compression
    return parent[x]

Union by size:

def union(x, y):
    root_x = find(x)
    root_y = find(y)
    if root_x != root_y:
        if size[root_x] < size[root_y]:
            root_x, root_y = root_y, root_x
        parent[root_y] = root_x
        size[root_x] += size[root_y]

Understanding Path Compression

Path compression is elegant. When you call find(x), you recursively find the root. While unwinding the recursion, you update each node’s parent to point directly to the root.

This means the first find might be O(log n), but subsequent finds become faster. Over time, the tree becomes nearly flat, and operations become constant time.

The mathematical proof that this gives amortized O(α(n)) complexity is one of the most elegant results in data structure analysis.

Real-World Applications

Union-Find appears in many practical applications:

Kruskal’s algorithm: For finding minimum spanning trees, we sort edges by weight and add them if they connect different components (using Union-Find to check).

Network connectivity: Analyzing whether adding a connection would create cycles, or determining connected components in a network.

Image processing: Finding connected components in binary images for segmentation.

Social networks: Determining friend groups and detecting communities.

Dependency resolution: Package managers use Union-Find to resolve dependencies and detect circular dependencies.

Solving Problems with Union-Find

Here’s how you’d determine if two elements are connected:

class UnionFind:
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
    
    def find(self, x):
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]
    
    def union(self, x, y):
        px, py = self.find(x), self.find(y)
        if px == py:
            return False  # Already connected
        if self.size[px] < self.size[py]:
            px, py = py, px
        self.parent[py] = px
        self.size[px] += self.size[py]
        return True
    
    def connected(self, x, y):
        return self.find(x) == self.find(y)

Advanced Variations

Weighted Union-Find: Track additional information like component size or weight to optimize decisions.

Union-Find with rollback: Support undoing operations, useful in certain algorithms like offline queries.

Union-Find as graph: Sometimes viewing Union-Find as building a dynamic graph helps conceptualize problems.

Why It’s So Efficient

The key insight is that we don’t need to maintain the actual set, just connectivity. We sacrifice some information (we can’t list all elements in a set efficiently) but gain massive speed.

The inverse Ackermann function α(n) grows so slowly that for n = 10^6, α(n) ≈ 4, and for n = 10^9, α(n) ≈ 5. This is effectively constant time.

Union by Rank vs Union by Size

Union by Rank

Union by rank uses the tree height (rank) to decide which root becomes the parent:

def union_by_rank(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if rank[rx] < rank[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    if rank[rx] == rank[ry]:
        rank[rx] += 1
    return True

Union by Size

Union by size uses the number of elements:

def union_by_size(x, y):
    rx, ry = find(x), find(y)
    if rx == ry:
        return False
    if size[rx] < size[ry]:
        rx, ry = ry, rx
    parent[ry] = rx
    size[rx] += size[ry]
    return True

Comparison:

Criterion	Union by Rank	Union by Size
Information stored	Tree height	Component size
Memory	Same	Same
Time complexity	α(n)	α(n)
Additional utility	None	Can query component size

Path compression tends to flatten the tree, making rank an overestimate of actual height. This does not affect correctness.

Inverse Ackermann Complexity

The inverse Ackermann function α(n) is one of the slowest-growing functions in mathematics:

A(0, n) = n + 1 A(m, 0) = A(m-1, 1) A(m, n) = A(m-1, A(m, n-1))

A(0, 1) = 2 A(1, 1) = 3 A(2, 1) = 7 A(3, 1) = 2^65536 - 3 (already astronomical) A(4, 1) = 2^2^… (incomprehensibly large)

α(n) = the smallest m such that A(m, m) ≥ n

n	α(n)
10^2	3
10^10	4
2^65536	5

For any practical problem, α(n) ≤ 5. This is why Union-Find operations are called “almost O(1).”

Path Compression: Iterative vs Recursive

Recursive (shown above)

Uses call stack to flatten. Simple but risks stack overflow for deep trees.

Iterative Path Compression

def find_iterative(x):
    """Find root and compress path iteratively."""
    root = x
    while parent[root] != root:
        root = parent[root]
    while x != root:
        next_node = parent[x]
        parent[x] = root
        x = next_node
    return root

Two-Pass Path Compression

The iterative approach makes two passes: one to find the root, one to compress. Both the recursive and iterative approaches achieve the same α(n) amortized complexity.

Kruskal’s Minimum Spanning Tree

def kruskal(vertices, edges):
    """
    vertices: number of vertices
    edges: list of (weight, u, v)
    returns: list of edges in MST
    """
    uf = UnionFind(vertices)
    mst = []
    edges.sort()  # Sort by weight
    for weight, u, v in edges:
        if uf.union(u, v):  # Different components
            mst.append((u, v, weight))
    return mst

# Example usage
edges = [
    (10, 0, 1), (6, 0, 2), (5, 0, 3),
    (15, 1, 3), (4, 2, 3)
]
mst = kruskal(4, edges)
print(f"MST edges: {mst}")  # [(2, 3, 4), (0, 2, 5), (0, 1, 10)]

Kruskal’s algorithm runs in O(E log E) due to sorting, and the Union-Find operations add O(E α(V)) overhead.

Connected Components

def connected_components(graph, n):
    """Find all connected components in an undirected graph."""
    uf = UnionFind(n)
    for u in range(n):
        for v in graph[u]:
            uf.union(u, v)
    components = {}
    for v in range(n):
        root = uf.find(v)
        if root not in components:
            components[root] = []
        components[root].append(v)
    return list(components.values())

# Example: social network friends
graph = {
    0: [1, 2],
    1: [0],
    2: [0],
    3: [4],
    4: [3],
    5: []
}
comps = connected_components(graph, 6)
print(f"Components: {comps}")
# [[0, 1, 2], [3, 4], [5]]

Cycle Detection

Union-Find detects cycles in undirected graphs efficiently:

def has_cycle(edges, n):
    """Check if adding edges creates a cycle."""
    uf = UnionFind(n)
    for u, v in edges:
        if not uf.union(u, v):  # Already connected = cycle
            return True
    return False

edges = [(0, 1), (1, 2), (2, 0)]
print(has_cycle(edges, 3))  # True

This is used in:

Network design: Detecting loops when adding connections
Circuit analysis: Finding feedback loops
Deadlock detection: Resource allocation graphs

C++ Implementation

class UnionFind {
    vector<int> parent, sz;
public:
    UnionFind(int n) : parent(n), sz(n, 1) {
        for (int i = 0; i < n; i++)
            parent[i] = i;
    }

    int find(int x) {
        while (parent[x] != x) {
            parent[x] = parent[parent[x]];  // Path halving
            x = parent[x];
        }
        return x;
    }

    bool unite(int a, int b) {
        int ra = find(a), rb = find(b);
        if (ra == rb) return false;
        if (sz[ra] < sz[rb]) swap(ra, rb);
        parent[rb] = ra;
        sz[ra] += sz[rb];
        return true;
    }

    bool connected(int a, int b) {
        return find(a) == find(b);
    }

    int component_size(int x) {
        return sz[find(x)];
    }
};

Path Halving

Notice the C++ find uses path halving (parent[x] = parent[parent[x]]) instead of full compression. This is a middle ground that compresses less aggressively but reduces CPU overhead per find. It still achieves α(n) amortized complexity.

Image Processing: Connected Component Labeling

Union-Find is used in binary image segmentation to label connected regions:

def connected_component_labeling(image):
    """
    image: 2D binary numpy array (0 = background, 1 = foreground)
    returns: labeled image where each component has unique integer label
    """
    rows, cols = len(image), len(image[0])
    uf = UnionFind(rows * cols)

    def idx(r, c):
        return r * cols + c

    # First pass: union adjacent foreground pixels
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                if r > 0 and image[r-1][c] == 1:
                    uf.union(idx(r, c), idx(r-1, c))
                if c > 0 and image[r][c-1] == 1:
                    uf.union(idx(r, c), idx(r, c-1))

    # Label each pixel with its component root
    labels = {}
    labeled = [[0] * cols for _ in range(rows)]
    next_label = 1
    for r in range(rows):
        for c in range(cols):
            if image[r][c] == 1:
                root = uf.find(idx(r, c))
                if root not in labels:
                    labels[root] = next_label
                    next_label += 1
                labeled[r][c] = labels[root]
    return labeled

Applications:

Medical imaging: Identifying tumors, organs, or cells
Object detection: Separating distinct objects in a scene
Document analysis: Segmenting text regions from images
Robotics: Identifying obstacles in sensor data

Java Implementation

public class UnionFind {
    private int[] parent;
    private int[] rank;
    private int count;  // Number of components

    public UnionFind(int n) {
        parent = new int[n];
        rank = new int[n];
        count = n;
        for (int i = 0; i < n; i++)
            parent[i] = i;
    }

    public int find(int x) {
        if (parent[x] != x)
            parent[x] = find(parent[x]);
        return parent[x];
    }

    public boolean union(int x, int y) {
        int rx = find(x), ry = find(y);
        if (rx == ry) return false;
        if (rank[rx] < rank[ry]) {
            parent[rx] = ry;
        } else if (rank[rx] > rank[ry]) {
            parent[ry] = rx;
        } else {
            parent[ry] = rx;
            rank[rx]++;
        }
        count--;
        return true;
    }

    public int getCount() { return count; }

    public boolean connected(int x, int y) {
        return find(x) == find(y);
    }
}

LeetCode-Style Problems

Number of Islands

def num_islands(grid):
    if not grid:
        return 0
    rows, cols = len(grid), len(grid[0])
    uf = UnionFind(rows * cols)
    ones = 0
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                ones += 1
                idx = r * cols + c
                if r > 0 and grid[r-1][c] == '1':
                    uf.union(idx, (r-1) * cols + c)
                if c > 0 and grid[r][c-1] == '1':
                    uf.union(idx, r * cols + (c-1))

    # Count distinct roots among '1' cells
    roots = set()
    for r in range(rows):
        for c in range(cols):
            if grid[r][c] == '1':
                roots.add(uf.find(r * cols + c))
    return len(roots)

Accounts Merge

def accounts_merge(accounts):
    """Merge Gmail accounts belonging to same person."""
    email_to_name = {}
    email_to_id = {}
    uf = UnionFind(10000)
    id_counter = 0
    for name, *emails in accounts:
        for email in emails:
            email_to_name[email] = name
            if email not in email_to_id:
                email_to_id[email] = id_counter
                id_counter += 1
            uf.union(email_to_id[emails[0]], email_to_id[email])
    result = defaultdict(list)
    for email in email_to_name:
        root = uf.find(email_to_id[email])
        result[root].append(email)
    return [[email_to_name[emails[0]]] + sorted(emails)
            for emails in result.values()]

External Resources

Conclusion

Union-Find demonstrates that simple ideas, properly optimized, can be extraordinarily powerful. With just two operations—find and union—we can solve connectivity problems that would otherwise be intractable.

The next time you encounter a problem about grouping, connectivity, or cycles, consider Union-Find. It’s likely the right tool, and its elegant implementation makes it a pleasure to use.

Union-Find: Solving Connectivity Problems Efficiently

Introduction

The Basic Problem

Naive Approaches and Their Problems

Union-Find: The Smart Approach

Implementation Details

Understanding Path Compression

Real-World Applications

Solving Problems with Union-Find

Advanced Variations

Why It’s So Efficient

Union by Rank vs Union by Size

Union by Rank

Union by Size

Inverse Ackermann Complexity

Path Compression: Iterative vs Recursive

Recursive (shown above)

Iterative Path Compression

Two-Pass Path Compression

Kruskal’s Minimum Spanning Tree

Connected Components

Cycle Detection

C++ Implementation

Path Halving

Image Processing: Connected Component Labeling

Java Implementation

LeetCode-Style Problems

Number of Islands

Accounts Merge

External Resources

Conclusion

Comments

Share this article

👍 Was this article helpful?