Memory Systems
DRAM organization, timing, and scheduling
DRAM Subsystem Organization
The DRAM subsystem has a deep hierarchy:
- Channel: Independent memory controller connection with its own data bus (e.g., 64-bit wide). Multiple channels multiply bandwidth.
- DIMM (Dual Inline Memory Module): Physical stick of memory containing multiple chips. Multiple DIMMs can share a channel.
- Rank: A group of DRAM chips that operate in lockstep on the same DIMM. A 64-bit channel uses 8 chips per rank (each chip contributes 8 bits).
- Chip: Individual DRAM chip containing multiple banks.
- Bank: Independent array within a chip with its own row buffer (sense amplifiers). Banks can be accessed concurrently.
- Row/Column: Within a bank, data is organized as a 2D array. A row is activated into the row buffer; columns are read from the row buffer.
Key hierarchy: Channel > DIMM > Rank > Chip > Bank > Row/Column
Transferring a 64-byte cache block: The block is striped across the rank, so each of the 8 chips contributes 8 bytes. The 64-bit bus carries 8 bytes per transfer, so filling a 64B cache line takes a burst of 8 transfers.
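The striping arithmetic above can be checked directly. A minimal sketch, assuming the geometry used in these notes (64-bit channel, x8 chips, 64B cache lines):

```python
# How a 64B cache line is striped across one rank.
# Assumed geometry: 64-bit channel, 8-bit (x8) DRAM chips.
CHANNEL_WIDTH_BITS = 64
CHIP_WIDTH_BITS = 8            # x8 chips
CACHE_LINE_BYTES = 64

chips_per_rank = CHANNEL_WIDTH_BITS // CHIP_WIDTH_BITS       # 8 chips in lockstep
bytes_per_transfer = CHANNEL_WIDTH_BITS // 8                 # 8 bytes per bus beat
transfers_per_line = CACHE_LINE_BYTES // bytes_per_transfer  # burst of 8 transfers
bytes_per_chip = CACHE_LINE_BYTES // chips_per_rank          # 8 bytes from each chip

print(chips_per_rank, transfers_per_line, bytes_per_chip)    # 8 8 8
```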
Key Points
- Hierarchy: Channel > DIMM > Rank > Chip > Bank > Row/Column
- Channel = independent data bus (64-bit wide)
- Rank = group of chips operating in lockstep (8 chips x 8 bits = 64 bits)
- Bank = independent array with own row buffer, enables concurrency
- 64B cache line = 8 chips each contributing 8 bytes across the rank
Exam Tip
The hierarchy is crucial. Know how a cache block is spread across chips in a rank and why multiple banks/channels improve bandwidth.
Page Mode DRAM & Row Buffer
DRAM banks use a row buffer (implemented by sense amplifiers) to hold an entire activated row.
Three DRAM commands:
- ACTIVATE (ACT / RAS): Opens a row — copies the entire row from the DRAM array into the row buffer. Destructive to the array (must write back eventually). Takes t_RAS time.
- READ/WRITE (CAS): Accesses a specific column within the open row buffer. Much faster than a full row activation. Takes t_CAS time.
- PRECHARGE (PRE): Closes the current row — writes the row buffer contents back to the array and prepares the bank for a new row activation. Takes t_RP time.
Row buffer states:
- Row buffer hit: Requested row is already open in the row buffer → just CAS → fastest (t_CAS)
- Row buffer miss (closed bank): No row is open → ACT + CAS (t_RAS + t_CAS)
- Row buffer conflict: A different row is open → PRE + ACT + CAS → slowest (t_RP + t_RAS + t_CAS)
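The three states can be turned into a small latency calculator. This is a sketch using the t_CAS/t_RAS/t_RP names from these notes; the nanosecond values are illustrative, not from a real datasheet:

```python
# Per-access latency for one DRAM bank, by row buffer state.
# Timing values (ns) are illustrative placeholders.
t_CAS, t_RAS, t_RP = 15, 35, 15

def access_latency(open_row, requested_row):
    """Return (new_open_row, latency_ns) for one access."""
    if open_row == requested_row:        # row buffer hit: CAS only
        return requested_row, t_CAS
    if open_row is None:                 # closed bank: ACT + CAS
        return requested_row, t_RAS + t_CAS
    # conflict: PRE the wrong row, then ACT + CAS
    return requested_row, t_RP + t_RAS + t_CAS

open_row, total = None, 0
for row in [5, 5, 7, 5]:                 # miss, hit, conflict, conflict
    open_row, lat = access_latency(open_row, row)
    total += lat
print(total)                             # 50 + 15 + 65 + 65 = 195
```

Tracing a short access pattern like this is exactly the kind of calculation exam questions tend to ask for.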
Key Points
- Row buffer = sense amplifiers holding one activated row
- ACT opens a row (t_RAS), CAS reads/writes column (t_CAS), PRE closes row (t_RP)
- Row hit: CAS only (fast)
- Row miss (closed): ACT + CAS (medium)
- Row conflict (wrong row open): PRE + ACT + CAS (slow)
Exam Tip
Distinguish the three row buffer states clearly. Many exam questions give access patterns and ask for the total latency based on hit/miss/conflict.
DRAM Refresh
DRAM capacitors leak charge and must be refreshed periodically (typically every 64 ms for the entire memory).
Refresh strategies:
- Burst refresh: Refresh all rows back-to-back. Memory is unavailable for a long burst. Simpler but causes long stalls.
- Distributed refresh: Spread refresh commands throughout the 64ms window. Each refresh command handles a few rows. Shorter individual stalls but more frequent interruptions.
Impact: During refresh, the bank being refreshed cannot service memory requests → performance and energy overhead. As DRAM capacity grows, more rows need refreshing → refresh overhead increases.
Refresh penalty grows with capacity: More rows → more refresh commands → more time spent refreshing → less time available for actual accesses.
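A back-of-envelope estimate makes the overhead concrete. The command count and per-refresh blocking time below are assumed example values, not figures from the notes:

```python
# Rough refresh overhead under distributed refresh.
# Assumed: 8192 refresh commands per 64 ms window, each blocking
# the bank for t_RFC; both numbers are illustrative.
REFRESH_WINDOW_NS = 64_000_000   # 64 ms retention window
refresh_commands = 8192
t_RFC_ns = 350                   # time one refresh command occupies the bank

busy_ns = refresh_commands * t_RFC_ns
overhead = busy_ns / REFRESH_WINDOW_NS
print(f"{overhead:.2%}")         # ~4.48% of bank time spent refreshing
```

Doubling the row count (larger capacity) doubles either the command count or the per-command work, which is why the overhead grows with density.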
Key Points
- DRAM must be refreshed every ~64ms (capacitor charge leaks)
- Burst refresh: all at once (long stall), distributed: spread out (shorter stalls)
- Refresh blocks the bank — no accesses during refresh
- Refresh overhead grows with DRAM capacity (more rows to refresh)
Multiple Banks & Channels
Multiple banks within a chip/rank enable bank-level parallelism:
- Each bank has its own row buffer and operates independently
- While one bank is activating a row, another can be servicing a CAS
- Interleaving accesses across banks hides latency
Multiple channels provide additional bandwidth:
- Each channel has its own data bus, command bus, and memory controller port
- Channels operate completely independently
- Two channels → 2x peak bandwidth
Address mapping/interleaving: How physical addresses are mapped to channels, ranks, banks, rows, and columns significantly affects performance.
- Using lower address bits for bank/channel selection provides better interleaving because lower bits have more entropy (vary more across consecutive accesses)
- Row interleaving: consecutive cache blocks go to different banks → maximizes bank-level parallelism for sequential access patterns
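Address mapping can be sketched as slicing bit fields out of the physical address. The field widths below are made up for illustration (they are not any real controller's layout); the point is that the low-order bits just above the cache-line offset select channel and bank, so consecutive lines interleave:

```python
# Illustrative physical-address decode; field widths are assumptions.
FIELDS = [          # (name, width in bits), listed LSB first
    ("offset", 6),  # 64B cache line offset
    ("channel", 1), # 2 channels
    ("bank", 3),    # 8 banks
    ("column", 7),
    ("row", 15),
]

def decode(addr):
    out, shift = {}, 0
    for name, width in FIELDS:
        out[name] = (addr >> shift) & ((1 << width) - 1)
        shift += width
    return out

# Consecutive 64B lines alternate channels, then banks:
for line in range(4):
    d = decode(line * 64)
    print(line, d["channel"], d["bank"])
# line 0 -> ch 0 bank 0; line 1 -> ch 1 bank 0; line 2 -> ch 0 bank 1; ...
```

Mapping row bits (high, low-entropy) to channel/bank instead would send long sequential runs to one bank and serialize them.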
Key Points
- Multiple banks enable bank-level parallelism (concurrent operations)
- Multiple channels multiply bandwidth (independent data buses)
- Address interleaving: use lower bits for bank/channel selection (more entropy)
- Good interleaving maximizes parallelism across banks/channels
Exam Tip
Understand address interleaving — which bits map to which part of the hierarchy. Lower bits change more frequently, so mapping them to bank/channel spreads accesses evenly.
DRAM Scheduling Policies
The memory controller decides the order in which queued memory requests are serviced. This order significantly impacts performance.
FCFS (First-Come, First-Served):
- Service requests in arrival order
- Simple, fair, but ignores row buffer state
- May waste an open row: a pending row-hit request can sit behind an older row-conflict request that closes the row first
FR-FCFS (First-Ready, First-Come, First-Served):
- Priority 1: Row buffer hits first (they're fastest — CAS only)
- Priority 2: Among equal-priority requests, oldest first (FCFS)
- Maximizes row buffer hit rate → maximizes DRAM throughput
- But can be unfair — streaming accesses (many hits to same row) can starve other requests
Key insight: FR-FCFS prioritizes throughput over fairness. This can cause quality-of-service issues in multi-core/multi-application systems.
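The two policies can be compared on a toy single-bank trace. A minimal sketch, reusing the illustrative timing values from earlier (not a full controller model; no per-cycle arrival times):

```python
# FCFS vs FR-FCFS on one bank; timing values (ns) are illustrative.
t_CAS, t_RAS, t_RP = 15, 35, 15

def latency(open_row, row):
    if open_row == row:
        return t_CAS                  # row hit
    if open_row is None:
        return t_RAS + t_CAS          # closed bank
    return t_RP + t_RAS + t_CAS       # conflict

def run(queue, fr_fcfs):
    open_row, total = None, 0
    pending = list(queue)             # list order = arrival order
    while pending:
        if fr_fcfs:
            hits = [r for r in pending if r == open_row]
            req = hits[0] if hits else pending[0]  # oldest row hit, else oldest
        else:
            req = pending[0]          # strict arrival order
        pending.remove(req)           # removes the oldest matching entry
        total += latency(open_row, req)
        open_row = req
    return total

trace = [1, 2, 1, 2]                  # requests alternating between two rows
print(run(trace, fr_fcfs=False))      # FCFS:    50 + 65 + 65 + 65 = 245
print(run(trace, fr_fcfs=True))       # FR-FCFS: serves 1,1,2,2 -> 50+15+65+15 = 145
```

The reordering converts two conflicts into hits, which is the throughput win; a long run of hits to one row would keep winning priority, which is the starvation risk.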
Key Points
- FCFS: simple, fair, but misses row buffer optimization opportunities
- FR-FCFS: prioritize row hits → maximize throughput, then FCFS for ties
- FR-FCFS maximizes row buffer hit rate but can be unfair
- Memory scheduling directly impacts both throughput and fairness
Exam Tip
Know how to trace through a sequence of memory requests under both FCFS and FR-FCFS and calculate total latency. FR-FCFS reorders to exploit row buffer hits.
Memory Latency Components
The total memory access latency has several components:
- CPU to memory controller: On-chip interconnect latency
- Controller latency / queuing: Time spent waiting in the memory controller queue + scheduling decision time
- Controller to DRAM: Bus transfer of command (address/command bus)
- Bank access latency:
  - Row hit: t_CAS (Column Access Strobe)
  - Row miss (closed): t_RAS + t_CAS
  - Row conflict: t_RP + t_RAS + t_CAS (Precharge + Activate + CAS)
- DRAM to controller (data bus): Data transfer across the memory bus. For a 64B cache line at 64-bit bus width: 8 transfers (each 8 bytes)
- Controller to CPU: On-chip interconnect back to the processor
Total latency can easily be 100+ ns for a row conflict, vs. ~50 ns for a row buffer hit.
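Adding up the components for one access is straightforward. All values below are illustrative placeholders chosen to roughly match the figures quoted above, not measurements:

```python
# End-to-end latency for one 64B read that suffers a row conflict.
# Every value (ns) is an illustrative placeholder.
ns = dict(
    cpu_to_ctrl=5,    # on-chip interconnect to the memory controller
    queueing=10,      # controller queue wait + scheduling
    cmd_bus=2,        # command/address to DRAM
    bank=65,          # row conflict: t_RP + t_RAS + t_CAS = 15 + 35 + 15
    data_bus=8,       # 8 transfers of 8B on a 64-bit bus
    ctrl_to_cpu=5,    # interconnect back to the core
)
total = sum(ns.values())
print(total)          # 95 ns for this row-conflict example
```

Swapping the bank term for a row hit (t_CAS = 15) drops the same sum to 45 ns, which is why the hit/miss/conflict distinction dominates the total.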
Key Points
- Latency path: CPU→controller→DRAM command→bank access→data bus→controller→CPU
- Bank latency dominates: t_CAS (hit), t_RAS+t_CAS (miss), t_RP+t_RAS+t_CAS (conflict)
- Data transfer: 64B line = 8 transfers on 64-bit bus
- Total memory latency is typically 50-100+ ns