Introduction
Understanding memory is fundamental to computer science. From RAM to CPU cache to virtual memory, how computers store and access data directly impacts performance. This guide explains memory hierarchies and how to optimize for them.
Memory Hierarchy
Levels
CPU Registers   (fastest, smallest)
      ↓
L1 Cache
      ↓
L2 Cache
      ↓
L3 Cache
      ↓
RAM
      ↓
SSD/HDD         (slowest, largest)
Why Hierarchy Matters
| Level | Size | Latency | Bandwidth |
|---|---|---|---|
| Registers | ~1 KB per core | 0.3 ns | Very high |
| L1 | 32-64 KB | 1 ns | High |
| L2 | 256 KB - 1 MB | 3 ns | High |
| L3 | 8-64 MB | 12 ns | Medium |
| RAM | 8-64 GB | 100 ns | Medium |
| SSD | 1-8 TB | 100,000 ns (100 µs) | Low |
Random Access Memory (RAM)
Types
DRAM
- Dynamic RAM
- Needs refresh
- Used for main memory
- Cheaper, slower
SRAM
- Static RAM
- No refresh needed
- Used for cache
- Expensive, faster
DDR Generations
| Type | Peak Bandwidth (per channel) | Voltage |
|---|---|---|
| DDR4-3200 | 25.6 GB/s | 1.2 V |
| DDR5-6400 | 51.2 GB/s | 1.1 V |
RAM Timing
- CL (CAS Latency): Cycles between a column read command and the first data
- tRCD: Cycles between row activation (RAS) and a column access (CAS)
- tRP: Cycles to precharge (close) a row before opening another
Lower timing values mean lower latency, but low-latency modules cost more.
CPU Cache
Purpose
Bridge the speed gap between CPU and RAM.
Cache Levels
L1 Cache
- Smallest, fastest
- Split into instruction (L1i) and data (L1d) caches
- Usually 32-64 KB
L2 Cache
- Larger, slower
- Usually private per core on modern CPUs
- 256 KB - 1 MB
L3 Cache
- Largest, slowest cache level
- 8-64 MB
- Shared across cores
Cache Lines
- Smallest unit of transfer between cache and memory (typically 64 bytes)
- Prefetching reduces misses
- Aligned data accesses help
Cache Mapping
- Direct mapped: each block maps to exactly one cache location
- Fully associative: a block can be placed anywhere in the cache
- Set associative: a compromise; a block maps to one set, but any way within it
Cache Performance
Cache Hits and Misses
| Type | Description |
|---|---|
| Hit | Data in cache |
| Miss | Not in cache |
Hit Rate
Hit Rate = Hits / (Hits + Misses)
Miss Types
- Compulsory: First access
- Capacity: Cache too small
- Conflict: Mapping collision
Improving Cache Performance
```c
// Bad: strided access -- matrix[j][i] jumps N elements on each
// iteration, touching a new cache line every time (C stores
// arrays row-major, so this walks down columns)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[j][i];

// Good: sequential access -- matrix[i][j] walks memory in order,
// so every byte of each 64-byte cache line is used before moving on
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[i][j];
```
Virtual Memory
What Is Virtual Memory
- Memory management technique
- Extends address space beyond RAM
- Provides isolation between processes
How It Works
Virtual Address → Page Table → Physical Address
                      ↓
         Not in RAM → Page Fault → Load from Disk
Page Tables
VPN → PTE → PFN + Flags
Page Faults
1. CPU requests a virtual address
2. The page is not resident in RAM (page fault)
3. The OS loads the page from disk into RAM
4. Execution resumes at the faulting instruction
TLB (Translation Lookaside Buffer)
- Hardware cache of recent virtual-to-physical translations
- Avoids a full page-table walk on most accesses
- A hit costs roughly a cycle; a miss triggers a page-table walk
Memory Optimization
Programming Tips
- Data locality: Access memory sequentially
- Cache-friendly: Small, hot data in cache
- Prefetching: Hint to load ahead
- Reduce allocations: Reuse buffers
Example: Loop Tiling
```c
// Basic: for large N, rows of A and columns of B are evicted
// and re-fetched over and over, thrashing the cache
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

// Tiled (cache-friendly): work on T x T blocks that fit in cache,
// so each block of A and B is reused many times before eviction
for (int ii = 0; ii < N; ii += T)
    for (int jj = 0; jj < N; jj += T)
        for (int kk = 0; kk < N; kk += T)
            for (int i = ii; i < min(ii + T, N); i++)
                for (int j = jj; j < min(jj + T, N); j++)
                    for (int k = kk; k < min(kk + T, N); k++)
                        C[i][j] += A[i][k] * B[k][j];
```
Data Structures
- Arrays: Better cache than linked lists
- SoA vs AoS: Structure of Arrays often faster
Conclusion
Understanding the memory hierarchy is crucial for performance. Design code for locality, choose cache-friendly data structures, and profile to find the real bottlenecks. The speed gap between CPU and memory keeps growing, so these techniques only become more important.
Resources
- Computer Architecture: A Quantitative Approach
- What Every Programmer Should Know About Memory
- Memory Optimization Guide