Introduction
Big O notation describes algorithm efficiency. Understanding it helps you write faster code and choose the right approach. This guide covers Big O fundamentals with examples.
What Is Big O
Definition
Big O describes how runtime scales with input size:
O(n) = "order n"
O(n²) = "order n squared"
Why It Matters
- Predict performance
- Compare algorithms
- Optimize code
- Pass interviews
Common Complexities
O(1) - Constant
# Always one operation
def first_element(arr):
return arr[0]
O(log n) - Logarithmic
# Binary search
def binary_search(arr, target):
left, right = 0, len(arr) - 1
while left <= right:
mid = (left + right) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
left = mid + 1
else:
right = mid - 1
return -1
```python
### O(n) - Linear
```python
# Single loop
def find_max(arr):
max_val = arr[0]
for num in arr:
if num > max_val:
max_val = num
return max_val
O(n log n) - Linearithmic
# Sorting: merge sort, heap sort
# Most efficient comparison sorts
def merge_sort(arr):
if len(arr) <= 1:
return arr
mid = len(arr) // 2
left = merge_sort(arr[:mid])
right = merge_sort(arr[mid:])
return merge(left, right)
```python
### O(n²) - Quadratic
```python
# Nested loops
def bubble_sort(arr):
n = len(arr)
for i in range(n):
for j in range(0, n-i-1):
if arr[j] > arr[j+1]:
arr[j], arr[j+1] = arr[j+1], arr[j]
O(2^n) - Exponential
# Recursive without memoization
def fibonacci(n):
if n <= 1:
return n
return fibonacci(n-1) + fibonacci(n-2)
```python
### O(n!) - Factorial
```python
# Permutations
def permutations(arr):
if len(arr) <= 1:
return [arr]
result = []
for i, element in enumerate(arr):
remaining = arr[:i] + arr[i+1:]
for p in permutations(remaining):
result.append([element] + p)
return result
Visual Comparison
| Complexity | 10 | 100 | 1000 |
|---|---|---|---|
| O(1) | 1 | 1 | 1 |
| O(log n) | 3 | 7 | 10 |
| O(n) | 10 | 100 | 1000 |
| O(n log n) | 30 | 700 | 10,000 |
| O(n²) | 100 | 10,000 | 1,000,000 |
| O(2^n) | 1024 | 2^100 | 2^1000 |
Space Complexity
O(1) - Constant
def sum_arr(arr):
total = 0 # One variable
for num in arr:
total += num
return total
```python
### O(n) - Linear
```python
def double_arr(arr):
return [x * 2 for x in arr] # New array
Analyzing Code
Rules
- Drop constants
- Drop lower terms
- Different inputs = different variables
Examples
# O(n) + O(n) = O(n)
def example1(n):
for i in range(n):
print(i)
for i in range(n):
print(i)
# O(n²) + O(n) = O(n²)
def example2(n):
for i in range(n):
for j in range(n):
print(i, j)
for i in range(n):
print(i)
# O(1) inside O(n) = O(n)
def example3(n):
for i in range(n):
x = i * 2 # O(1)
```python
## Common Patterns
### Loops
- Single loop: O(n)
- Two nested loops: O(n²)
- Loop with halving: O(log n)
### Recursion
- Single call: O(n)
- Binary tree recursion: O(2^n)
### Data Structures
| Operation | Array | Linked List | Hash Table |
|-----------|-------|-------------|------------|
| Access | O(1) | O(n) | O(1) |
| Search | O(n) | O(n) | O(1) |
| Insert | O(n) | O(1) | O(1) |
| Delete | O(n) | O(1) | O(1) |
## Practical Application
### Choose Algorithms
| Input Size | Use |
|-----------|------|
| < 1000 | O(n²) okay |
| < 100K | O(n log n) |
| < 10M | O(n) |
| 10M+ | O(log n), O(1) |
### Optimization
```python
# Before: O(n²)
def has_duplicates_bad(arr):
for i in range(len(arr)):
for j in range(i+1, len(arr)):
if arr[i] == arr[j]:
return True
return False
# After: O(n)
def has_duplicates_good(arr):
seen = set()
for num in arr:
if num in seen:
return True
seen.add(num)
return False
Conclusion
Big O helps you reason about efficiency. Focus on dominant factors, drop constants, and think about how algorithms scale. Practice analyzing code to build intuition.
In-Depth Complexity Analysis with Code Examples
O(1) — Constant Time
The algorithm takes the same time regardless of input size. Array access by index, hash table lookup, and basic arithmetic operations are O(1).
# Real-world O(1) examples
def is_even(n):
return n % 2 == 0
# Hash table lookup
def get_user_name(users, user_id):
return users.get(user_id) # O(1) average case
# Stack/queue operations
stack.append(item) # O(1)
stack.pop() # O(1)
Real-world scenario: A CDN redirects users to the nearest server using a hash-based lookup from IP address to server URL. Regardless of how many servers are in the pool, the lookup is O(1).
O(log n) — Logarithmic Time
Each step halves the problem size. Binary search, balanced tree operations, and exponentiation by squaring are O(log n).
# Balanced BST lookup
import bisect
def find_closest(sorted_arr, target):
idx = bisect.bisect_left(sorted_arr, target)
if idx == 0:
return sorted_arr[0]
if idx == len(sorted_arr):
return sorted_arr[-1]
before = sorted_arr[idx - 1]
after = sorted_arr[idx]
return before if target - before < after - target else after
# Exponentiation by squaring
def fast_power(base, exp):
"""O(log exp) vs O(exp) naive."""
if exp == 0:
return 1
half = fast_power(base, exp // 2)
return half * half * base if exp % 2 else half * half
Real-world scenario: A phonebook with 1 billion entries needs at most 30 lookups with binary search (log₂ 1B ≈ 30). Each lookup halves the remaining entries.
O(n) — Linear Time
Processing each element once. Finding max/min, linear search, and simple iteration are O(n).
# Linear time examples
def has_duplicate_linear(arr):
"""O(n) using hash set."""
seen = set()
for x in arr:
if x in seen:
return True
seen.add(x)
return False
# Counting frequencies
from collections import Counter
def most_frequent(arr):
return Counter(arr).most_common(1)[0][0]
# Two-pointer technique
def is_palindrome(s):
i, j = 0, len(s) - 1
while i < j:
if s[i] != s[j]:
return False
i += 1
j -= 1
return True
Real-world scenario: A streaming service scans through a video file to generate thumbnails. Processing 4K video takes twice as long as 1080p; the time scales linearly with the number of frames.
O(n log n) — Linearithmic Time
The most efficient comparison-based sorting algorithms achieve O(n log n). Divide-and-conquer algorithms like merge sort, heap sort, and many tree operations fall here.
# Sorting is the classic O(n log n) problem
import heapq
def heap_sort(arr):
heapq.heapify(arr)
return [heapq.heappop(arr) for _ in range(len(arr))]
# Partitioning for quickselect
def find_kth_largest(nums, k):
"""Quickselect: O(n) average, O(n²) worst."""
return heapq.nlargest(k, nums)[-1] # O(n log k)
# Many divide-and-conquer algorithms
def count_inversions(arr):
"""Count inversions using merge sort: O(n log n)."""
if len(arr) <= 1:
return arr, 0
mid = len(arr) // 2
left, inv_left = count_inversions(arr[:mid])
right, inv_right = count_inversions(arr[mid:])
merged, inv_merge = merge_and_count(left, right)
return merged, inv_left + inv_right + inv_merge
Real-world scenario: Sorting 1 million records by date. O(n²) would take ~10¹² operations (hours). O(n log n) takes ~20 million operations (seconds). This is why databases use B-trees (O(log n) search after O(n log n) build).
O(n²) — Quadratic Time
Nested loops over the input. Bubble sort, insertion sort, and naive matrix multiplication are O(n²). Acceptable for n < 1000.
# Quadratic time examples
def naive_matrix_multiply(A, B):
n = len(A)
C = [[0] * n for _ in range(n)]
for i in range(n):
for j in range(n):
for k in range(n): # Triple nested = O(n³)
C[i][j] += A[i][k] * B[k][j]
return C
# All pairs shortest distance (naive)
def closest_pair_bruteforce(points):
n = len(points)
min_dist = float('inf')
for i in range(n):
for j in range(i + 1, n):
dist = math.dist(points[i], points[j])
min_dist = min(min_dist, dist)
return min_dist
# Compare all elements
def has_duplicate_bruteforce(arr):
for i in range(len(arr)):
for j in range(i + 1, len(arr)):
if arr[i] == arr[j]:
return True
return False
Real-world scenario: Recommending friends of friends in a social network. For each user, check all their friends’ friends. With 1000 friends each and 1M users, this becomes impractical. Graph algorithms reduce this to O(V + E).
O(2^n) — Exponential Time
Recursive enumeration of all subsets. Brute-force solutions for NP-hard problems (subset sum, traveling salesman) are O(2^n).
# Exponential time examples
def all_subsets(arr):
"""Generate all 2^n subsets."""
if not arr:
return [[]]
subsets = all_subsets(arr[1:])
return subsets + [[arr[0]] + s for s in subsets]
def fibonacci_exponential(n):
"""2^n without memoization."""
if n <= 1:
return n
return fibonacci_exponential(n - 1) + fibonacci_exponential(n - 2)
# With memoization -> O(n)
def fibonacci_linear(n, memo={}):
if n in memo:
return memo[n]
if n <= 1:
return n
memo[n] = fibonacci_linear(n - 1) + fibonacci_linear(n - 2)
return memo[n]
Real-world scenario: Cracking a 4-digit PIN requires at most 10⁴ = 10,000 attempts. Cracking a 12-character password with 72 possible characters requires 72¹² ≈ 10²² attempts — physically impossible with brute force.
Space Complexity Analysis
Space Complexity Classes
| Class | Memory Use | Example |
|---|---|---|
| O(1) | Constant (no scaling) | In-place reversal |
| O(n) | Linear to input | Hash table, copy of array |
| O(n²) | Quadratic | Adjacency matrix |
| O(log n) | Logarithmic | Binary search recursion |
| O(n log n) | Linearithmic | Merge sort recursion |
# O(1) space: in-place modification
def reverse_array_in_place(arr):
i, j = 0, len(arr) - 1
while i < j:
arr[i], arr[j] = arr[j], arr[i]
i += 1
j -= 1
# O(n) space: build a hash table
def find_missing_numbers(nums):
"""Returns missing numbers from 1..n."""
present = set(nums)
return [i for i in range(1, len(nums) + 1) if i not in present]
# O(n²) space: 2D DP table
def create_dp_table(n):
return [[0] * n for _ in range(n)]
# O(log n) space: recursive binary search
def binary_search_rec(arr, target, low=0, high=None):
if high is None:
high = len(arr) - 1
if low > high:
return -1
mid = (low + high) // 2
if arr[mid] == target:
return mid
elif arr[mid] < target:
return binary_search_rec(arr, target, mid + 1, high)
return binary_search_rec(arr, target, low, mid - 1)
Space-Time Tradeoffs
Hash tables use O(n) space to achieve O(1) lookup. Dynamic programming uses O(n²) space to avoid recomputation. In memory-constrained environments, preferring O(1) space algorithms may be worth the time cost.
Amortized Analysis
Amortized analysis bounds the average cost of operations over a sequence. A single operation may be expensive, but most are cheap.
Dynamic Array (ArrayList)
class DynamicArray:
"""Amortized analysis: O(1) append."""
def __init__(self):
self.capacity = 1
self.arr = [None] * self.capacity
self.size = 0
def append(self, item):
if self.size == self.capacity:
# Resize: O(n) copy
self.capacity *= 2
new_arr = [None] * self.capacity
for i in range(self.size):
new_arr[i] = self.arr[i]
self.arr = new_arr
self.arr[self.size] = item
self.size += 1
Most appends are O(1). When the array is full, resizing copies all n elements (O(n)). Over n appends, the total cost is O(n), so amortized cost is O(1) per operation. This analysis applies to Python lists, Java ArrayList, and C++ vector.
Best, Average, and Worst Case
Understanding different case complexities is critical for algorithm selection:
| Algorithm | Best | Average | Worst |
|---|---|---|---|
| Quick sort | O(n log n) | O(n log n) | O(n²) |
| Merge sort | O(n log n) | O(n log n) | O(n log n) |
| Binary search | O(1) | O(log n) | O(log n) |
| Hash table lookup | O(1) | O(1) | O(n) |
| Insertion sort | O(n) | O(n²) | O(n²) |
# Quicksort: worst case on already-sorted arrays with naive pivot
def quicksort_worst_case(arr):
"""Demonstrates O(n²) worst case with sorted input."""
if len(arr) <= 1:
return arr
# Always picks last element as pivot -> worst case for sorted input
pivot = arr[-1]
left = [x for x in arr[:-1] if x <= pivot]
right = [x for x in arr[:-1] if x > pivot]
return quicksort_worst_case(left) + [pivot] + quicksort_worst_case(right)
# Use random pivot for O(n log n) average
import random
def quicksort_random(arr):
if len(arr) <= 1:
return arr
pivot = random.choice(arr)
left = [x for x in arr if x < pivot]
mid = [x for x in arr if x == pivot]
right = [x for x in arr if x > pivot]
return quicksort_random(left) + mid + quicksort_random(right)
Complexity Classes (P, NP, NP-Complete)
P: Polynomial Time
Problems solvable in O(n^k) time (sorting, shortest path, matrix multiplication). These are tractable and efficient algorithms exist.
NP: Nondeterministic Polynomial Time
Problems whose solutions can be verified in polynomial time, but may take exponential time to find. Examples: subset sum, graph coloring, traveling salesman (decision version).
NP-Complete
The hardest problems in NP. If any NP-complete problem has a polynomial solution, then all NP problems do. Examples: 3-SAT, Hamiltonian cycle, knapsack (decision version), traveling salesman.
NP-Hard
At least as hard as NP-complete problems, but may not be in NP (decisions may not be verifiable in polynomial time). Examples: halting problem, traveling salesman (optimization version), knapsack (optimization version).
# NP-complete: Subset Sum (verifiable in polynomial time)
def verify_subset_sum(nums, subset, target):
"""O(n) verification of subset sum solution."""
return sum(nums[i] for i in subset) == target
# NP-hard: TSP (optimization) requires exponential time to solve exactly
def traveling_salesman_bruteforce(distances):
"""O(n!) exact TSP solution."""
n = len(distances)
cities = list(range(n))
min_distance = float('inf')
best_route = None
import itertools
for route in itertools.permutations(cities[1:]):
route = (0,) + route + (0,)
dist = sum(distances[route[i]][route[i+1]] for i in range(n))
if dist < min_distance:
min_distance = dist
best_route = route
return best_route, min_distance
P vs NP
The question “Is P = NP?” asks whether every problem whose solution can be quickly verified can also be quickly solved. This is one of the seven Millennium Prize Problems with a $1M reward. Most computer scientists believe P ≠ NP, meaning NP-complete problems have no efficient exact algorithms.
Common Algorithm Complexities Reference
| Algorithm | Time | Space | Notes |
|---|---|---|---|
| Sorting | |||
| Merge sort | O(n log n) | O(n) | Stable, good for linked lists |
| Quick sort | O(n log n) avg | O(log n) | In-place, cache-friendly |
| Heap sort | O(n log n) | O(1) | In-place, not stable |
| Counting sort | O(n + k) | O(k) | Requires small integer range |
| Radix sort | O(nk) | O(n + k) | For fixed-length integers |
| Search | |||
| Linear search | O(n) | O(1) | Unstructured data |
| Binary search | O(log n) | O(1) | Sorted array |
| BST search | O(h) | O(1) | h = log n (balanced) |
| Graph | |||
| BFS/DFS | O(V + E) | O(V) | All nodes and edges |
| Dijkstra | O((V+E) log V) | O(V) | Non-negative weights |
| Floyd-Warshall | O(V³) | O(V²) | All-pairs shortest |
| Bellman-Ford | O(VE) | O(V) | Negative weights allowed |
| Prim’s MST | O(E log V) | O(V) | Minimum spanning tree |
| Kruskal’s MST | O(E log E) | O(V) | Uses union-find |
| Dynamic Programming | |||
| Fibonacci (memo) | O(n) | O(n) | vs O(2ⁿ) naive |
| LCS | O(mn) | O(mn) | Longest common subsequence |
| Edit distance | O(mn) | O(mn) | String comparison |
| Floyd-Warshall | O(V³) | O(V²) | All-pairs shortest |
Comparison Growth Table
| n | O(1) | O(log n) | O(n) | O(n log n) | O(n²) | O(2ⁿ) |
|---|---|---|---|---|---|---|
| 10 | 1 | 3 | 10 | 33 | 100 | 1,024 |
| 100 | 1 | 7 | 100 | 664 | 10,000 | 10³⁰ |
| 1,000 | 1 | 10 | 1,000 | 9,966 | 1,000,000 | 10³⁰¹ |
| 10,000 | 1 | 13 | 10,000 | 132,877 | 100,000,000 | — |
| 100,000 | 1 | 17 | 100,000 | 1,660,964 | 10¹⁰ | — |
| 1,000,000 | 1 | 20 | 1,000,000 | 19,931,569 | 10¹² | — |
A practical rule: O(n²) works for n < 1000; O(n log n) for n < 10⁷; O(n) for n < 10⁸; O(log n) for any size.
Comments