
Memory Systems

L15–L16 (Post-Midterm)

DRAM organization, timing, and scheduling

Topics Covered

  • DRAM organization (channel/DIMM/rank/chip/bank)
  • Page mode DRAM & row buffer
  • Row buffer hit/miss/conflict
  • DRAM refresh (burst vs distributed)
  • Bank interleaving & channels
  • Address mapping
  • DRAM scheduling (FCFS, FR-FCFS)
  • Latency components
01

DRAM Subsystem Organization

The DRAM subsystem has a deep hierarchy:

  • Channel: Independent memory controller connection with its own data bus (e.g., 64-bit wide). Multiple channels multiply bandwidth.
  • DIMM (Dual Inline Memory Module): Physical stick of memory containing multiple chips. Multiple DIMMs can share a channel.
  • Rank: A group of DRAM chips that operate in lockstep on the same DIMM. A 64-bit channel uses 8 chips per rank (each chip contributes 8 bits).
  • Chip: Individual DRAM chip containing multiple banks.
  • Bank: Independent array within a chip with its own row buffer (sense amplifiers). Banks can be accessed concurrently.
  • Row/Column: Within a bank, data is organized as a 2D array. A row is activated into the row buffer; columns are read from the row buffer.

Key hierarchy: Channel > DIMM > Rank > Chip > Bank > Row/Column

Transferring a 64-byte cache block: The 64-bit bus moves 8 bytes per transfer, with each of the rank's 8 chips supplying 1 byte per transfer. Filling a 64B cache line therefore takes 8 transfers, and each chip contributes 8 bytes in total.
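The arithmetic above can be sketched in a few lines (the widths are the example values from this section, not fixed constants):

```python
# Sketch: how a 64B cache line maps onto a rank of 8 x8 DRAM chips
# sharing a 64-bit channel (example parameters from the notes).
LINE_BYTES = 64
BUS_BITS = 64          # channel data bus width
CHIP_BITS = 8          # bits each chip drives per transfer

chips_per_rank = BUS_BITS // CHIP_BITS          # 8 chips in lockstep
transfers = LINE_BYTES // (BUS_BITS // 8)       # 8 bus transfers per line
bytes_per_chip = LINE_BYTES // chips_per_rank   # 8 bytes from each chip

print(chips_per_rank, transfers, bytes_per_chip)  # 8 8 8
```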

Key Points

  • Hierarchy: Channel > DIMM > Rank > Chip > Bank > Row/Column
  • Channel = independent data bus (64-bit wide)
  • Rank = group of chips operating in lockstep (8 chips x 8 bits = 64 bits)
  • Bank = independent array with own row buffer, enables concurrency
  • 64B cache line = 8 chips each contributing 8 bytes across the rank
⚠

Exam Tip

The hierarchy is crucial. Know how a cache block is spread across chips in a rank and why multiple banks/channels improve bandwidth.

02

Page Mode DRAM & Row Buffer

DRAM banks use a row buffer (implemented by sense amplifiers) to hold an entire activated row.

Three DRAM commands:

  1. ACTIVATE (ACT / RAS): Opens a row — copies the entire row from the DRAM array into the row buffer. Destructive to the array (must write back eventually). Takes t_RAS time.
  2. READ/WRITE (CAS): Accesses a specific column within the open row buffer. Much faster than a full row activation. Takes t_CAS time.
  3. PRECHARGE (PRE): Closes the current row — writes the row buffer contents back to the array and prepares the bank for a new row activation. Takes t_RP time.

Row buffer states:

  • Row buffer hit: Requested row is already open in the row buffer → just CAS → fastest (t_CAS)
  • Row buffer miss (closed bank): No row is open → ACT + CAS (t_RAS + t_CAS)
  • Row buffer conflict: A different row is open → PRE + ACT + CAS → slowest (t_RP + t_RAS + t_CAS)
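The three cases above map directly to a latency function. This sketch uses the notes' simplified timing model; the nanosecond values are made-up examples, not real device parameters:

```python
# Sketch: row-buffer state -> access latency (simplified model from the notes).
# T_CAS, T_RAS, T_RP are assumed example values in ns, not datasheet numbers.
T_CAS, T_RAS, T_RP = 15, 35, 15

def access_latency(open_row, requested_row):
    if open_row == requested_row:    # row buffer hit: CAS only
        return T_CAS
    if open_row is None:             # miss on a closed bank: ACT + CAS
        return T_RAS + T_CAS
    return T_RP + T_RAS + T_CAS      # conflict: PRE + ACT + CAS

print(access_latency(7, 7))     # hit      -> 15
print(access_latency(None, 7))  # miss     -> 50
print(access_latency(3, 7))     # conflict -> 65
```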

Key Points

  • Row buffer = sense amplifiers holding one activated row
  • ACT opens a row (t_RAS), CAS reads/writes column (t_CAS), PRE closes row (t_RP)
  • Row hit: CAS only (fast)
  • Row miss (closed): ACT + CAS (medium)
  • Row conflict (wrong row open): PRE + ACT + CAS (slow)
⚠

Exam Tip

Distinguish the three row buffer states clearly. Many exam questions give access patterns and ask for the total latency based on hit/miss/conflict.

03

DRAM Refresh

DRAM capacitors leak charge and must be refreshed periodically (typically every 64 ms for the entire memory).

Refresh strategies:

  • Burst refresh: Refresh all rows back-to-back. Memory is unavailable for a long burst. Simpler but causes long stalls.
  • Distributed refresh: Spread refresh commands throughout the 64ms window. Each refresh command handles a few rows. Shorter individual stalls but more frequent interruptions.

Impact: During refresh, the bank being refreshed cannot service memory requests → performance and energy overhead. As DRAM capacity grows, more rows need refreshing → refresh overhead increases.

Refresh penalty grows with capacity: More rows → more refresh commands → more time spent refreshing → less time available for actual accesses.
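A back-of-the-envelope model makes the capacity scaling concrete. All numbers here are assumptions for illustration (the 64 ms window is from the notes; the per-refresh blocking time and row counts are invented):

```python
# Rough sketch of refresh overhead vs capacity (illustrative numbers only).
REFRESH_WINDOW_NS = 64_000_000   # 64 ms window from the notes
T_REFRESH_NS = 350               # assumed time one refresh command blocks the bank

def refresh_overhead(num_rows, rows_per_refresh=1):
    """Fraction of time the bank is unavailable due to refresh."""
    commands = num_rows / rows_per_refresh
    return commands * T_REFRESH_NS / REFRESH_WINDOW_NS

# Doubling the number of rows doubles the fraction of time spent refreshing.
print(f"{refresh_overhead(8192):.2%}")   # 4.48%
print(f"{refresh_overhead(16384):.2%}")  # 8.96%
```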

Key Points

  • DRAM must be refreshed every ~64ms (capacitor charge leaks)
  • Burst refresh: all at once (long stall), distributed: spread out (shorter stalls)
  • Refresh blocks the bank — no accesses during refresh
  • Refresh overhead grows with DRAM capacity (more rows to refresh)
04

Multiple Banks & Channels

Multiple banks within a chip/rank enable bank-level parallelism:

  • Each bank has its own row buffer and operates independently
  • While one bank is activating a row, another can be servicing a CAS
  • Interleaving accesses across banks hides latency

Multiple channels provide additional bandwidth:

  • Each channel has its own data bus, command bus, and memory controller port
  • Channels operate completely independently
  • Two channels → 2x peak bandwidth

Address mapping/interleaving: How physical addresses are mapped to channels, ranks, banks, rows, and columns significantly affects performance.

  • Using lower address bits for bank/channel selection provides better interleaving because lower bits have more entropy (vary more across consecutive accesses)
  • Row interleaving: consecutive cache blocks go to different banks → maximizes bank-level parallelism for sequential access patterns
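One way to see this is to write out the bit extraction for a cache-block-interleaved mapping. The layout below (| row | bank | channel | block offset |) and the bit widths are illustrative assumptions, not a fixed standard:

```python
# Sketch: extracting channel/bank/row from a physical address, assuming a
# cache-block-interleaved layout: | row | bank | channel | block offset |.
BLOCK_BITS = 6     # 64B cache blocks
CHANNEL_BITS = 1   # 2 channels (assumed)
BANK_BITS = 3      # 8 banks (assumed)

def map_address(addr):
    addr >>= BLOCK_BITS                        # drop the block offset
    channel = addr & ((1 << CHANNEL_BITS) - 1)
    addr >>= CHANNEL_BITS
    bank = addr & ((1 << BANK_BITS) - 1)
    row = addr >> BANK_BITS
    return channel, bank, row

# Consecutive 64B blocks alternate channels, then spread across banks:
for a in (0x0, 0x40, 0x80, 0xC0):
    print(hex(a), map_address(a))
# 0x0  -> (0, 0, 0)   0x40 -> (1, 0, 0)
# 0x80 -> (0, 1, 0)   0xc0 -> (1, 1, 0)
```

Because the low bits vary fastest, sequential accesses fan out across both channels and all banks instead of hammering one bank.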

Key Points

  • Multiple banks enable bank-level parallelism (concurrent operations)
  • Multiple channels multiply bandwidth (independent data buses)
  • Address interleaving: use lower bits for bank/channel selection (more entropy)
  • Good interleaving maximizes parallelism across banks/channels
⚠

Exam Tip

Understand address interleaving — which bits map to which part of the hierarchy. Lower bits change more frequently, so mapping them to bank/channel spreads accesses evenly.

05

DRAM Scheduling Policies

The memory controller decides the order in which queued memory requests are serviced. This order significantly impacts performance.

FCFS (First-Come, First-Served):

  • Service requests in arrival order
  • Simple, fair, but ignores row buffer state
  • May cause unnecessary row conflicts when a row hit request is waiting behind a row conflict request

FR-FCFS (First-Ready, First-Come, First-Served):

  • Priority 1: Row buffer hits first (they're fastest — CAS only)
  • Priority 2: Among equal-priority requests, oldest first (FCFS)
  • Maximizes row buffer hit rate → maximizes DRAM throughput
  • But can be unfair — streaming accesses (many hits to same row) can starve other requests

Key insight: FR-FCFS prioritizes throughput over fairness. This can cause quality-of-service issues in multi-core/multi-application systems.
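A toy simulation of one bank shows why the reordering matters. The request stream and timing values are made-up examples; the scheduler logic follows the two FR-FCFS priority rules stated above:

```python
# Toy sketch: total service time under FCFS vs FR-FCFS for one bank.
# Timing values are assumed example numbers (ns), as in the row-buffer section.
T_CAS, T_RAS, T_RP = 15, 35, 15

def service(requests, policy):
    """Service a queue of row numbers; return total bank latency."""
    queue, open_row, total = list(requests), None, 0
    while queue:
        if policy == "FR-FCFS":
            hits = [r for r in queue if r == open_row]
            req = hits[0] if hits else queue[0]   # oldest hit, else oldest
        else:                                     # FCFS: strict arrival order
            req = queue[0]
        queue.remove(req)                         # removes oldest occurrence
        if req == open_row:
            total += T_CAS                        # hit
        elif open_row is None:
            total += T_RAS + T_CAS                # miss (closed bank)
        else:
            total += T_RP + T_RAS + T_CAS         # conflict
        open_row = req
    return total

reqs = [5, 9, 5, 5]                # rows of queued requests, oldest first
print(service(reqs, "FCFS"))       # 50 + 65 + 65 + 15 = 195
print(service(reqs, "FR-FCFS"))    # 50 + 15 + 15 + 65 = 145
```

FR-FCFS batches the three row-5 requests into hits; note that if row-5 requests kept arriving, the row-9 request could wait indefinitely, which is exactly the starvation concern above.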

Key Points

  • FCFS: simple, fair, but misses row buffer optimization opportunities
  • FR-FCFS: prioritize row hits → maximize throughput, then FCFS for ties
  • FR-FCFS maximizes row buffer hit rate but can be unfair
  • Memory scheduling directly impacts both throughput and fairness
⚠

Exam Tip

Know how to trace through a sequence of memory requests under both FCFS and FR-FCFS and calculate total latency. FR-FCFS reorders to exploit row buffer hits.

06

Memory Latency Components

The total memory access latency has several components:

  1. CPU to memory controller: On-chip interconnect latency
  2. Controller latency / queuing: Time spent waiting in the memory controller queue + scheduling decision time
  3. Controller to DRAM: Bus transfer of command (address/command bus)
  4. Bank access latency:
    • Row hit: t_CAS (Column Access Strobe)
    • Row miss (closed): t_RAS + t_CAS
    • Row conflict: t_RP + t_RAS + t_CAS (Precharge + Activate + CAS)
  5. DRAM to controller (data bus): Data transfer across the memory bus. For a 64B cache line at 64-bit bus width: 8 transfers (each 8 bytes)
  6. Controller to CPU: On-chip interconnect back to the processor

Total latency can easily be 100+ ns for a row conflict, vs. ~50 ns for a row buffer hit.
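Summing the six components for one access can be sketched as follows; every nanosecond value is an assumed example, chosen only to land in the 50–100+ ns range the notes quote:

```python
# Sketch: total memory latency as a sum of the six components (assumed ns values).
T_CAS, T_RAS, T_RP = 15, 35, 15
INTERCONNECT = 5        # CPU <-> controller, each direction (assumed)
QUEUING = 10            # controller queuing + scheduling (assumed)
CMD_BUS = 2             # controller -> DRAM command transfer (assumed)
DATA_BUS = 8 * 1        # 8 transfers of 8B on a 64-bit bus, 1 ns each (assumed)

def total_latency(bank_ns):
    return (INTERCONNECT + QUEUING + CMD_BUS     # components 1-3
            + bank_ns                            # component 4: bank access
            + DATA_BUS + INTERCONNECT)           # components 5-6

print(total_latency(T_CAS))                  # row hit:      45 ns
print(total_latency(T_RP + T_RAS + T_CAS))   # row conflict: 95 ns
```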

Key Points

  • Latency path: CPU→controller→DRAM command→bank access→data bus→controller→CPU
  • Bank latency dominates: t_CAS (hit), t_RAS+t_CAS (miss), t_RP+t_RAS+t_CAS (conflict)
  • Data transfer: 64B line = 8 transfers on 64-bit bus
  • Total memory latency is typically 50-100+ ns