Introduction
If there’s one data structure that every programmer uses daily without thinking about it, it’s the hash table. Whether you’re using a Python dictionary, a JavaScript object, a Java HashMap, or a C++ unordered_mapโyou’re using a hash table. They’re the backbone of databases, caches, and countless applications.
Understanding hash tables deeply helps you make better architectural decisions and debug performance issues. This article explores how hash tables work, different implementations, and practical considerations.
The Core Idea
A hash table stores key-value pairs, allowing fast lookup by key. The magic is converting keys (strings, numbers, objects) into array indices through a hash function.
็ๆณๆ ๅตไธ: Key โ Hash Function โ Index โ Value (O(1) lookup)
The challenge is handling collisionsโwhen two different keys hash to the same index.
Hash Functions
A good hash function distributes keys uniformly across the array. Properties:
- Deterministic: Same key always produces same hash
- Fast: Computing the hash should be quick
- Uniform: Keys spread evenly across all buckets
Common hash functions:
- Division method: h(k) = k mod m (choose m as prime, not power of 2)
- Multiplication method: h(k) = floor(m(kA mod 1)) where A is constant
- Universal hashing: Randomize hash function to prevent worst-case behavior
For strings, algorithms like FNV or MurmurHash perform well.
Collision Resolution
When two keys collide, we need a strategy:
Chaining (Separate Linking)
Each bucket is a linked list. Collisions just add to the list.
Pros: Simple, handles arbitrary load factor, delete is easy Cons: Cache performance poor, worst-case O(n)
Load factor (ฮฑ): n/m = number of elements / number of buckets. Keep ฮฑ < 0.75 for good performance.
Open Addressing
All elements stored in the table itself. On collision, probe for another slot.
Linear probing: Check next slot: h(k), h(k)+1, h(k)+2… Quadratic probing: Check slots with quadratic spacing Double hashing: Use second hash function for probe sequence
Pros: Better cache performance, no pointers Cons: Clustering issues, delete is tricky
Resizing and Rehashing
When load factor exceeds threshold (typically 0.7), resize the table:
- Create new, larger table (typically 2x size)
- Rehash all elements into new table
- Use new hash function
This is expensive (O(n)) but happens infrequently. Amortized cost remains O(1).
Practical Implementations
Python dict: Uses open addressing with randomized seeds. Keys must be hashable.
Java HashMap: Uses chaining with treeify threshold (JDK 8+ converts้พ่กจ to็บข้ปๆ for long chains).
Go map: Uses open addressing with random iteration order.
Rust HashMap: Uses SipHash for DoS resistance.
Handling Unhashable Keys
Some keys can’t be hashed (lists, dicts). Solutions:
- Convert to immutable equivalent (tuple)
- Use custom hash based on content
- Use identity hash
Advanced Topics
Consistent hashing: Used in distributed systems (Cassandra, DynamoDB). Minimizes reshuffling when adding/removing servers.
Bloom filters: Use multiple hash functions for membership testing (covered in earlier article).
Perfect hashing: Two-level scheme guaranteeing O(1) worst-case for static sets.
When to Use Hash Tables
Use when: Fast lookup by key is primary operation, keys are simple types, memory is available.
Avoid when: Ordered iteration matters (use tree), keys are complex objects, memory is tight.
Performance Considerations
- Time complexity: Average O(1) lookup/insert/delete, worst O(n) with poor hash function
- Space complexity: O(n) for stored elements plus overhead
- Cache: Chaining is cache-unfriendly; open addressing is better
- Thread safety: Not thread-safe; use concurrent implementations
External Resources
Conclusion
Hash tables are foundational data structures with nearly constant-time operations. Understanding their trade-offsโchaining vs open addressing, hash function quality, load factor managementโhelps you use them effectively.
Next time you reach for a dictionary or map, remember the elegant simplicity of hashing turning arbitrary keys into fast lookups.
Comments