Introduction
If you thought about your family tree or an organizational chart, you’ve already encountered trees in the real world. In computer science, trees are hierarchical data structures that organize data in parent-child relationships—much like a family tree but with computational precision.
Trees are everywhere: file systems, XML/HTML documents, compiler abstract syntax trees, decision trees in machine learning, and binary search trees in databases. Understanding trees is fundamental to computer science.
This article explores tree fundamentals, types, and operations.
What Makes a Tree?
A tree is a connected acyclic graph. More practically:
- Each node (except the root) has exactly one parent
- Each node can have zero or more children
- There’s exactly one path between any two nodes
- The root is the single node without a parent
Key terminology:
- Root: Topmost node
- Leaf: Node with no children
- Internal node: Node with at least one child
- Height: Length of longest path from root to leaf
- Depth: Length of path from root to node
- Subtree: A node and all its descendants
Types of Trees
By number of children:
- General trees: Unlimited children
- Binary trees: Maximum 2 children per node
- Ternary trees: Maximum 3 children per node
By properties:
- Full binary tree: Every node has 0 or 2 children (no nodes with exactly 1 child)
- Complete binary tree: All levels completely filled except possibly last, filled left to right
- Perfect binary tree: All internal nodes have 2 children, all leaves at same level
- Balanced tree: Left and right subtrees height difference is bounded (usually ≤ 1)
- Binary Search Tree (BST): Left child < parent < right child
Binary Trees
The most common tree type in computer science. Each node has at most two children, typically called “left” and “right.”
Why binary trees matter:
- Efficient search, insert, delete (O(log n) when balanced)
- Represent hierarchical data naturally
- Foundation for expression trees, syntax trees, decision trees
Tree Traversals
Processing every node in a tree requires systematic traversal. There are several fundamental approaches:
Depth-First Search (DFS):
- Preorder: Visit node, then left subtree, then right (Root-Left-Right)
- Inorder: Left subtree, node, right subtree (Left-Root-Right) — for BSTs gives sorted order
- Postorder: Left subtree, right subtree, then node (Left-Right-Root)
Breadth-First Search (BFS):
- Level order: Visit nodes level by level, left to right
# Inorder traversal (BST gives sorted order)
def inorder(node):
if node:
inorder(node.left)
print(node.value)
inorder(node.right)
# Level order using queue
from collections import deque
def level_order(root):
if not root:
return
queue = deque([root])
while queue:
node = queue.popleft()
print(node.value)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
Binary Search Trees (BST)
BSTs organize data for efficient lookup:
- Left subtree contains only values less than node
- Right subtree contains only values greater than node
- Both subtrees are also BSTs
Operations:
- Search: O(log n) average, O(n) worst if tree becomes skewed
- Insert: Follow search path, add leaf
- Delete: Three cases—leaf, one child, two children (replace with inorder successor)
Balancing Trees
When data insertions/deletions follow predictable patterns, BSTs can become unbalanced (essentially linked lists), losing their efficiency.
Self-balancing trees automatically maintain balance:
- AVL trees: Height balance property (left and right heights differ by ≤ 1)
- Red-black trees: Color-based balancing with simpler rotations
- B-trees: Optimized for disk access
Applications of Trees
In computing:
- File systems
- HTML/XML DOM
- Compilers (AST)
- Routing tables (trie)
In algorithms:
- Binary search
- Heap structures
- Decision trees (ML)
- Graph traversal
In data organization:
- Database indexes (B+ trees)
- JSON structure
- Organization charts
Tree Terminology Deep Dive
Node Relationships
- Parent: The immediate ancestor of a node
- Child: A node directly connected below another node
- Siblings: Nodes sharing the same parent
- Ancestor: Any node on the path from root to the node (inclusive)
- Descendant: Any node in the subtree of the given node
- Degree of node: Number of children a node has
- Degree of tree: Maximum degree among all nodes
- Internal path length: Sum of depths of all internal nodes
- External path length: Sum of depths of all leaf nodes
Binary Tree Properties
- The maximum number of nodes at level i is 2^i (root at level 0).
- The maximum number of nodes in a binary tree of height h is 2^(h+1) - 1.
- For any binary tree with n leaves and i internal nodes, the relationship depends on the tree type:
- Full binary tree: n = i + 1
- Complete binary tree: height = floor(log₂(n))
- The minimum height of a binary tree with n nodes is floor(log₂(n)).
- A binary tree with n nodes has exactly n - 1 edges.
Iterative Tree Traversals
While recursive traversals are elegant, iterative versions avoid stack overflow on deep trees.
Preorder (Iterative)
def preorder_iterative(root):
if not root:
return []
result = []
stack = [root]
while stack:
node = stack.pop()
result.append(node.value)
if node.right:
stack.append(node.right)
if node.left:
stack.append(node.left)
return result
Inorder (Iterative)
def inorder_iterative(root):
result = []
stack = []
current = root
while stack or current:
while current:
stack.append(current)
current = current.left
current = stack.pop()
result.append(current.value)
current = current.right
return result
Postorder (Iterative - Two-Stack Method)
def postorder_iterative(root):
if not root:
return []
result = []
stack1 = [root]
stack2 = []
while stack1:
node = stack1.pop()
stack2.append(node)
if node.left:
stack1.append(node.left)
if node.right:
stack1.append(node.right)
while stack2:
result.append(stack2.pop().value)
return result
Level Order (BFS)
from collections import deque
def level_order_iterative(root):
if not root:
return []
result = []
queue = deque([root])
while queue:
level_size = len(queue)
level = []
for _ in range(level_size):
node = queue.popleft()
level.append(node.value)
if node.left:
queue.append(node.left)
if node.right:
queue.append(node.right)
result.append(level)
return result
Binary Tree Serialization
Serialization converts a tree to a string (for storage/transmission). Deserialization reconstructs it.
class Codec:
def serialize(self, root):
"""Preorder serialization with '#' for None."""
if not root:
return "#"
return f"{root.value},{self.serialize(root.left)},{self.serialize(root.right)}"
def deserialize(self, data):
def _build(nodes):
val = next(nodes)
if val == "#":
return None
node = TreeNode(int(val))
node.left = _build(nodes)
node.right = _build(nodes)
return node
return _build(iter(data.split(",")))
Level Order Serialization
def serialize_level_order(root):
if not root:
return "[]"
result = []
queue = deque([root])
while queue:
node = queue.popleft()
if node:
result.append(str(node.value))
queue.append(node.left)
queue.append(node.right)
else:
result.append("#")
return "[" + ",".join(result) + "]"
Threaded Binary Trees
Threaded binary trees replace null pointers with references to inorder predecessor/successor, enabling traversal without recursion or stack.
Single-Threaded (Right pointers to successor)
class ThreadedNode:
def __init__(self, value):
self.value = value
self.left = None
self.right = None
self.is_thread = False # True if right is a thread to successor
class ThreadedBinaryTree:
def __init__(self, root=None):
self.root = root
def inorder(self):
"""Inorder traversal without stack or recursion."""
current = self.root
while current and current.left:
current = current.left
result = []
while current:
result.append(current.value)
if current.is_thread:
current = current.right
else:
current = current.right
while current and current.left:
current = current.left
return result
Threaded trees are useful in:
- Database indexing: Fast inorder traversal without recursion
- Expression evaluation: Quick sequential processing
- Embedded systems: Limited stack space
Applications of Trees in Detail
File System as a Tree
import os
def print_directory_tree(path, indent=0):
"""Display directory structure as a tree."""
prefix = " " * indent + "├── " if indent else ""
if os.path.isdir(path):
print(f"{prefix}{os.path.basename(path)}/")
for entry in sorted(os.listdir(path)):
print_directory_tree(os.path.join(path, entry), indent + 1)
else:
print(f"{prefix}{os.path.basename(path)}")
DOM Tree Parsing
from html.parser import HTMLParser
class DOMNode:
def __init__(self, tag, attrs=None):
self.tag = tag
self.attrs = attrs or {}
self.children = []
self.parent = None
self.text = ""
class DOMBuilder(HTMLParser):
def __init__(self):
super().__init__()
self.root = DOMNode("document")
self.stack = [self.root]
def handle_starttag(self, tag, attrs):
node = DOMNode(tag, dict(attrs))
node.parent = self.stack[-1]
self.stack[-1].children.append(node)
self.stack.append(node)
def handle_endtag(self, tag):
while self.stack and self.stack[-1].tag != tag:
self.stack.pop()
if self.stack:
self.stack.pop()
def handle_data(self, data):
if data.strip():
self.stack[-1].text += data.strip()
Decision Trees (Classification)
class DecisionNode:
def __init__(self, feature=None, threshold=None, left=None, right=None, value=None):
self.feature = feature # Feature index to split on
self.threshold = threshold # Value threshold
self.left = left # Left child (<= threshold)
self.right = right # Right child (> threshold)
self.value = value # Prediction (for leaf nodes)
def predict(tree, sample):
"""Traverse decision tree to predict class."""
if tree.value is not None:
return tree.value
if sample[tree.feature] <= tree.threshold:
return predict(tree.left, sample)
else:
return predict(tree.right, sample)
Expression Tree Evaluation
class ExprNode:
def __init__(self, value, left=None, right=None):
self.value = value
self.left = left
self.right = right
def evaluate(node):
"""Evaluate an arithmetic expression tree."""
if node.left is None and node.right is None:
return int(node.value)
left_val = evaluate(node.left)
right_val = evaluate(node.right)
if node.value == '+': return left_val + right_val
if node.value == '-': return left_val - right_val
if node.value == '*': return left_val * right_val
if node.value == '/': return left_val / right_val
Java Tree Implementation
public class BinaryTree<T extends Comparable<T>> {
static class Node<T> {
T value;
Node<T> left, right;
Node(T value) { this.value = value; }
}
private Node<T> root;
public void insert(T value) {
root = insertRec(root, value);
}
private Node<T> insertRec(Node<T> node, T value) {
if (node == null) return new Node<>(value);
if (value.compareTo(node.value) < 0)
node.left = insertRec(node.left, value);
else if (value.compareTo(node.value) > 0)
node.right = insertRec(node.right, value);
return node;
}
public boolean search(T value) {
return searchRec(root, value) != null;
}
private Node<T> searchRec(Node<T> node, T value) {
if (node == null || value.equals(node.value)) return node;
if (value.compareTo(node.value) < 0)
return searchRec(node.left, value);
return searchRec(node.right, value);
}
public List<T> inorder() {
List<T> result = new ArrayList<>();
inorderRec(root, result);
return result;
}
private void inorderRec(Node<T> node, List<T> result) {
if (node != null) {
inorderRec(node.left, result);
result.add(node.value);
inorderRec(node.right, result);
}
}
public int height() {
return heightRec(root);
}
private int heightRec(Node<T> node) {
if (node == null) return 0;
return 1 + Math.max(heightRec(node.left), heightRec(node.right));
}
}
External Resources
Conclusion
Trees are fundamental to computer science, providing efficient data organization and manipulation. Understanding trees—particularly binary trees and BSTs—lays the foundation for more complex structures like heaps, tries, and balanced trees.
The key concepts—traversal, search, balance—apply broadly. Master these fundamentals, and you’ll be well-prepared for advanced data structures and algorithms.
Comments