Introduction
Understanding memory is fundamental to computer science. From RAM to CPU cache to virtual memory, how computers store and access data directly impacts performance. This guide explains memory hierarchies and how to optimize for them.
Memory Hierarchy
Levels
CPU Registers   (fastest, smallest)
      ↓
L1 Cache
      ↓
L2 Cache
      ↓
L3 Cache
      ↓
RAM
      ↓
SSD/HDD         (slowest, largest)
Why Hierarchy Matters
| Level | Size | Latency | Bandwidth |
|---|---|---|---|
| Registers | ~1 KB per core | 0.3 ns | Very high |
| L1 | 32-64 KB | 1 ns | High |
| L2 | 256 KB - 1 MB | 3 ns | High |
| L3 | 8-64 MB | 12 ns | Medium |
| RAM | 8-64 GB | 100 ns | Medium |
| SSD | 1-8 TB | 100,000 ns (100 µs) | Low |
Random Access Memory (RAM)
Types
DRAM
- Dynamic RAM
- Needs refresh
- Used for main memory
- Cheaper, slower
SRAM
- Static RAM
- No refresh needed
- Used for cache
- Expensive, faster
DDR Generations
| Type | Peak Bandwidth (per channel) | Voltage |
|---|---|---|
| DDR4-3200 | 25.6 GB/s | 1.2 V |
| DDR5-6400 | 51.2 GB/s | 1.1 V |
RAM Timing
- CL (CAS Latency): Cycles between a column read command and the first data
- tRCD: Cycles between row activation (RAS) and a column access (CAS)
- tRP: Cycles to precharge (close) a row before opening another
Lower timing values mean lower latency, but low-latency modules cost more.
CPU Cache
Purpose
Bridge the speed gap between CPU and RAM.
Cache Levels
L1 Cache
- Smallest, fastest
- Split into instruction (L1i) and data (L1d) caches
- Usually 32-64 KB
L2 Cache
- Larger, slower
- Usually private per core on modern CPUs
- 256 KB - 1 MB
L3 Cache
- Largest, slowest cache level
- 8-64 MB
- Shared across cores
Cache Lines
- Smallest unit of transfer between cache and memory (typically 64 bytes)
- Prefetching reduces misses
- Aligned data accesses help
Cache Mapping
- Direct mapped: each block maps to exactly one cache location
- Fully associative: a block can be placed anywhere in the cache
- Set associative: a compromise; a block maps to one set, but any way within it
Cache Performance
Cache Hits and Misses
| Type | Description |
|---|---|
| Hit | Data in cache |
| Miss | Not in cache |
Hit Rate
Hit Rate = Hits / (Hits + Misses)
Miss Types
- Compulsory: First access
- Capacity: Cache too small
- Conflict: Mapping collision
Improving Cache Performance
```c
// Bad: strided access -- matrix[j][i] jumps N elements on each
// iteration, touching a new cache line every time (C stores
// arrays row-major, so this walks down columns)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[j][i];

// Good: sequential access -- matrix[i][j] walks memory in order,
// so every byte of each 64-byte cache line is used before moving on
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[i][j];
```
Virtual Memory
What Is Virtual Memory
- Memory management technique
- Extends address space beyond RAM
- Provides isolation between processes
How It Works
Virtual Address → Page Table → Physical Address
                      ↓
         Not in RAM → Page Fault → Load from Disk
Page Tables
VPN → PTE → PFN + Flags
Page Faults
1. CPU requests a virtual address
2. The page is not resident in RAM (page fault)
3. The OS loads the page from disk into RAM
4. Execution resumes at the faulting instruction
TLB (Translation Lookaside Buffer)
- Hardware cache of recent virtual-to-physical translations
- Avoids a full page-table walk on most accesses
- A hit costs roughly a cycle; a miss triggers a page-table walk
Memory Optimization
Programming Tips
- Data locality: Access memory sequentially
- Cache-friendly: Small, hot data in cache
- Prefetching: Hint to load ahead
- Reduce allocations: Reuse buffers
Example: Loop Tiling
```c
// Basic: for large N, rows of A and columns of B are evicted
// and re-fetched over and over, thrashing the cache
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

// Tiled (cache-friendly): work on T x T blocks that fit in cache,
// so each block of A and B is reused many times before eviction
for (int ii = 0; ii < N; ii += T)
    for (int jj = 0; jj < N; jj += T)
        for (int kk = 0; kk < N; kk += T)
            for (int i = ii; i < min(ii + T, N); i++)
                for (int j = jj; j < min(jj + T, N); j++)
                    for (int k = kk; k < min(kk + T, N); k++)
                        C[i][j] += A[i][k] * B[k][j];
```
Data Structures
- Arrays: Better cache than linked lists
- SoA vs AoS: Structure of Arrays often faster
Conclusion
Understanding the memory hierarchy is crucial for performance. Design code for locality, choose cache-friendly data structures, and profile to find the real bottlenecks. The speed gap between CPU and memory keeps growing, so these techniques only become more important.
Resources
- Computer Architecture: A Quantitative Approach
- What Every Programmer Should Know About Memory
- Memory Optimization Guide