
Understanding Memory: RAM, Cache, and Virtual Memory

Introduction

Understanding memory is fundamental to computer science. From RAM to CPU cache to virtual memory, how computers store and access data directly impacts performance. This guide explains memory hierarchies and how to optimize for them.

Memory Hierarchy

Levels

CPU Registers
    ↓ (fastest, smallest)
L1 Cache
    ↓
L2 Cache
    ↓
L3 Cache
    ↓
RAM
    ↓
SSD/HDD
    ↓ (slowest, largest)

Why Hierarchy Matters

Level      Size       Latency      Bandwidth
Registers  32 bytes   0.3 ns       Very high
L1         32-64 KB   1 ns         High
L2         256 KB     3 ns         Medium
L3         8-64 MB    12 ns        Medium
RAM        8-64 GB    100 ns       Medium
SSD        TBs        ~100,000 ns  Low
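To put the table in perspective, latencies can be converted to CPU cycles. Assuming a 3 GHz clock (an illustrative figure, not from the table), a RAM access costs hundreds of cycles while an L1 hit costs only a few:

```c
/* cycles = latency_ns * clock_ghz, since one cycle at f GHz lasts 1/f ns.
   The 3 GHz clock used in the examples below is an assumption. */
double latency_cycles(double latency_ns, double clock_ghz) {
    return latency_ns * clock_ghz;
}
```

At 3 GHz, the table's 1 ns L1 hit is about 3 cycles, while the 100 ns RAM access is about 300 cycles, which is why a last-level cache miss can stall the pipeline.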

Random Access Memory (RAM)

Types

DRAM

  • Dynamic RAM
  • Needs refresh
  • Used for main memory
  • Cheaper, slower

SRAM

  • Static RAM
  • No refresh needed
  • Used for cache
  • Expensive, faster

DDR Generations

Type  Bandwidth  Power
DDR4  25.6 GB/s  Low
DDR5  51.2 GB/s  Lower

RAM Timing

  • CL (CAS Latency): Delay in cycles
  • tRCD: RAS to CAS delay
  • tRP: Row precharge time

Lower is faster but more expensive.
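Because timings are counted in cycles, the real delay depends on the clock as well. A quick sketch (the DDR4-3200/DDR5-6000 transfer rates are example figures): since DDR transfers twice per clock, one cycle lasts 2000 / (rate in MT/s) nanoseconds.

```c
/* True CAS latency in nanoseconds for a DDR module:
   cycle time (ns) = 2000 / transfer_rate (MT/s), because DDR
   performs two transfers per clock cycle. */
double cas_latency_ns(int cas_cycles, int transfer_rate_mts) {
    return cas_cycles * 2000.0 / transfer_rate_mts;
}
```

For example, DDR4-3200 CL16 and DDR5-6000 CL30 both work out to 10 ns of true latency, so a higher CL number alone does not mean slower memory.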

CPU Cache

Purpose

Bridge the speed gap between CPU and RAM.

Cache Levels

L1 Cache

  • Smallest, fastest
  • Split into instruction (L1i) and data (L1d) caches
  • Usually 32-64 KB

L2 Cache

  • Larger, slower
  • Often shared
  • 256 KB - 1 MB

L3 Cache

  • Largest, slowest shared cache
  • 8-64 MB
  • Shared across cores

Cache Lines

  • Smallest unit of transfer between levels (typically 64 bytes)
  • Prefetching reduces misses
  • Aligned data accesses help
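One practical consequence of 64-byte lines: when two threads write to variables that share a line, the line ping-pongs between cores ("false sharing"). A common fix, sketched below with C11 `alignas` and assuming 64-byte lines (typical on x86-64), is to pad hot per-thread data out to a full line:

```c
#include <stdalign.h>

/* Assuming 64-byte cache lines: align each per-thread counter to
   its own line so two threads never write to the same line. */
#define CACHE_LINE 64

struct padded_counter {
    alignas(CACHE_LINE) long value;  /* owns a whole cache line */
};
```

The alignment forces `sizeof(struct padded_counter)` up to 64 bytes, trading memory for the removal of inter-core line bouncing.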

Cache Mapping

  • Direct mapped: One location
  • Fully associative: Any location
  • Set associative: Compromise
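For a set-associative cache, the set an address maps to is determined by simple arithmetic on the address. A sketch, with the 32 KB / 8-way / 64-byte-line geometry below chosen as a typical L1 example:

```c
/* Set index for a set-associative cache:
   set = (address / line_size) % num_sets.
   A 32 KB, 8-way cache with 64-byte lines has
   32768 / 64 / 8 = 64 sets. */
unsigned cache_set(unsigned long addr, unsigned line_size, unsigned num_sets) {
    return (unsigned)((addr / line_size) % num_sets);
}
```

Two addresses exactly `line_size * num_sets` bytes apart map to the same set, which is how the conflict misses described below arise even when the cache is not full.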

Cache Performance

Cache Hits and Misses

Type  Description
Hit   Requested data is already in the cache
Miss  Data must be fetched from a lower level

Hit Rate

Hit Rate = Hits / (Hits + Misses)
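The hit rate feeds directly into average memory access time (AMAT), a standard back-of-the-envelope model: AMAT = hit time + miss rate × miss penalty. A sketch using the illustrative 1 ns hit / 100 ns penalty figures from the table above:

```c
/* Hit rate and average memory access time (AMAT):
   AMAT = hit_time + miss_rate * miss_penalty. */
double hit_rate(long hits, long misses) {
    return (double)hits / (double)(hits + misses);
}

double amat_ns(double hit_time_ns, double miss_rate, double miss_penalty_ns) {
    return hit_time_ns + miss_rate * miss_penalty_ns;
}
```

With a 95% hit rate, a 1 ns hit time, and a 100 ns miss penalty, AMAT is 1 + 0.05 × 100 = 6 ns; even a small miss rate dominates the average.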

Miss Types

  • Compulsory: First access
  • Capacity: Cache too small
  • Conflict: Mapping collision

Improving Cache Performance

// Bad: strided access (walks down columns of a row-major array)
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[j][i];  // jumps a full row each step

// Good: sequential access
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        sum += matrix[i][j];  // walks memory in order

Virtual Memory

What Is Virtual Memory

  • Memory management technique
  • Extends address space beyond RAM
  • Provides isolation between processes

How It Works

Virtual Address → Page Table → Physical Address
     ↓
  Not in RAM → Page Fault → Load from Disk

Page Tables

VPN (virtual page number) → PTE (page table entry) → PFN (physical frame number) + flags
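The VPN is just the high bits of the virtual address. Assuming 4 KB pages (so a 12-bit offset), the split looks like this:

```c
/* Assuming 4 KB pages: the low 12 bits of a virtual address are
   the offset within the page, the remaining high bits are the VPN. */
#define PAGE_SHIFT 12
#define PAGE_MASK  ((1UL << PAGE_SHIFT) - 1)

unsigned long vpn(unsigned long vaddr)         { return vaddr >> PAGE_SHIFT; }
unsigned long page_offset(unsigned long vaddr) { return vaddr & PAGE_MASK; }
```

The page table translates only the VPN to a PFN; the offset is copied through unchanged into the physical address.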

Page Faults

  1. CPU requests address
  2. Page not in RAM
  3. OS loads from disk
  4. Resume execution

TLB (Translation Lookaside Buffer)

  • Hardware cache for page table
  • Reduces translation overhead
  • Very fast lookup
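A useful way to quantify a TLB is its "reach": the amount of memory it can translate without a page-table walk, i.e. entries × page size. The 64-entry / 4 KB figures below are illustrative:

```c
/* TLB reach = number of entries * page size: the span of memory
   addressable without walking the page table. */
unsigned long tlb_reach(unsigned long entries, unsigned long page_size) {
    return entries * page_size;
}
```

A 64-entry TLB with 4 KB pages covers only 256 KB, which is one reason huge pages (2 MB on x86-64) can speed up large-working-set programs.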

Memory Optimization

Programming Tips

  1. Data locality: Access memory sequentially
  2. Cache-friendly: Small, hot data in cache
  3. Prefetching: Hint to load ahead
  4. Reduce allocations: Reuse buffers
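Tip 3 can be made explicit in code. A sketch using the GCC/Clang builtin `__builtin_prefetch`; the prefetch distance of 16 elements is an assumption that should be tuned per workload:

```c
/* Software prefetching: ask the hardware to start loading a[i+16]
   while a[i] is being summed. Arguments to __builtin_prefetch:
   address, rw (0 = read), locality hint (3 = keep in all caches). */
long sum_with_prefetch(const long *a, int n) {
    long sum = 0;
    for (int i = 0; i < n; i++) {
        if (i + 16 < n)
            __builtin_prefetch(&a[i + 16], 0, 3);
        sum += a[i];
    }
    return sum;
}
```

For purely sequential scans like this one, the hardware prefetcher usually does the job already; explicit prefetching pays off mostly on irregular but predictable access patterns.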

Example: Loop Tiling

// Basic: once a row of A or column of B falls out of cache,
// it must be reloaded on every pass
for (int i = 0; i < N; i++)
    for (int j = 0; j < N; j++)
        for (int k = 0; k < N; k++)
            C[i][j] += A[i][k] * B[k][j];

// Tiled (cache-friendly): tile size T chosen so three T x T
// blocks fit in cache together
for (int ii = 0; ii < N; ii += T)
    for (int jj = 0; jj < N; jj += T)
        for (int kk = 0; kk < N; kk += T)
            for (int i = ii; i < min(ii+T, N); i++)
                for (int j = jj; j < min(jj+T, N); j++)
                    for (int k = kk; k < min(kk+T, N); k++)
                        C[i][j] += A[i][k] * B[k][j];

Data Structures

  • Arrays: Better cache than linked lists
  • SoA vs AoS: Structure of Arrays often faster
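The SoA/AoS difference is easiest to see in code. A sketch with a hypothetical particle type: summing only the x field with AoS drags y and z through the cache too, while SoA touches only x data.

```c
/* AoS: fields of each particle are interleaved in memory. */
struct particle { float x, y, z; };

/* SoA: each field is its own contiguous array (names illustrative). */
struct particles_soa { float *x, *y, *z; };

float sum_x_aos(const struct particle *p, int n) {
    float s = 0;
    for (int i = 0; i < n; i++) s += p[i].x;  /* loads x, y, z lines */
    return s;
}

float sum_x_soa(const struct particles_soa *p, int n) {
    float s = 0;
    for (int i = 0; i < n; i++) s += p->x[i];  /* loads only x lines */
    return s;
}
```

When a pass uses all fields of each element, AoS is fine; SoA wins when hot loops touch only a subset of the fields, since every byte fetched into a cache line is then useful.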

Conclusion

Understanding the memory hierarchy is crucial for performance. Design code for locality, choose cache-friendly data structures, and profile to find bottlenecks. The speed gap between CPU and memory continues to grow, so these techniques only become more valuable.

