Pipelining & Dependencies
Pipeline design, data/control dependencies, forwarding
Topics Covered
Single-Cycle Machine (MIPS Datapath)
In a single-cycle machine, every instruction completes in one clock cycle. The clock period must be long enough for the slowest instruction (typically a load: IF + ID + EX + MEM + WB all in one cycle).
MIPS datapath stages (which become pipeline stages):
- IF (Instruction Fetch): Read instruction from I-cache using PC
- ID (Instruction Decode): Decode instruction, read register file
- EX (Execute): ALU operation, address calculation, branch resolution
- MEM (Memory Access): Load/store access D-cache
- WB (Write Back): Write result to register file
Problem: The clock cycle time is determined by the longest instruction. A simple ADD doesn't need MEM, but it still takes the full cycle. This wastes time for fast instructions.
Solution: Pipelining β divide execution into stages and overlap multiple instructions.
Key Points
- β’Single-cycle: one instruction per cycle, clock = slowest instruction time
- β’MIPS stages: IF β ID β EX β MEM β WB
- β’Problem: fast instructions wait for slow instruction's cycle time
- β’Motivation for pipelining: overlap instructions in different stages
Pipelining Basics
Pipelining overlaps the execution of multiple instructions. While one instruction is in EX, the next is in ID, and the one after is in IF.
Ideal pipeline assumptions:
- All operations are identical (take the same time)
- All operations are independent (no dependencies)
- Sub-operations are uniform (all stages take the same time)
Throughput formula for k-stage pipeline processing N instructions:
BW = N / (k + N - 1) cycles
With latch (pipeline register) overhead S per stage:
BW_k = 1 / (T/k + S)
where T = total unpipelined delay, k = number of stages, S = latch delay per stage.
Ideal speedup: k (equal to number of stages). In practice, always less due to:
- Pipeline stalls: Dependencies force bubbles (empty cycles)
- Latch overhead: Each stage boundary adds S time
- Non-uniform stages: Clock period = longest stage β shorter stages waste time (internal fragmentation)
- External fragmentation: Not all operations use all stages (e.g., ADD doesn't need MEM)
Key Points
- β’Pipeline overlaps multiple instructions in different stages
- β’Ideal: all ops identical, independent, uniform sub-operations
- β’Throughput: BW_k = 1/(T/k + S), ideal speedup = k stages
- β’Reality: stalls, latch overhead, non-uniform stages reduce speedup
- β’Internal fragmentation: clock = longest stage; External: not all ops use all stages
Exam Tip
Be able to calculate throughput with the BW formula. Know the three ideal pipeline assumptions and what happens when each is violated.
Data Dependencies & Hazards
Dependencies between instructions cause pipeline hazards that require stalls or special handling.
Types of data dependencies:
- RAW (Read After Write) β True dependency: Instruction B reads a value that instruction A writes. B must wait for A's result.
ADD R1,R2,R3; SUB R4,R1,R5(SUB reads R1 that ADD writes). - WAR (Write After Read) β Anti-dependency: Instruction B writes a register that A reads. Only a problem in out-of-order execution.
ADD R1,R2,R3; SUB R2,R4,R5(SUB writes R2 that ADD reads). - WAW (Write After Write) β Output dependency: Both A and B write the same register. Only a problem with out-of-order or multi-cycle operations.
ADD R1,R2,R3; SUB R1,R4,R5.
In a simple in-order pipeline: Only RAW dependencies cause hazards (WAR and WAW are handled by in-order execution naturally).
Control dependencies: A branch instruction determines whether subsequent instructions should execute. If the branch is not yet resolved, subsequent instructions may be wrong-path.
Key Points
- β’RAW (true): B reads what A writes β always a real dependency
- β’WAR (anti): B writes what A reads β problem only in out-of-order
- β’WAW (output): both write same register β problem in out-of-order / multi-cycle
- β’In-order pipeline: only RAW causes stalls
- β’Control dependency: branch outcome determines subsequent instructions
Data Forwarding / Bypassing
Forwarding (bypassing) is a hardware technique to resolve RAW hazards without stalling (when possible).
Key insight: The result is computed at the end of the EX stage (or MEM for loads), but is not written to the register file until the WB stage. A dependent instruction needs the value at the start of its EX stage.
Without forwarding: Must stall until the producing instruction writes back (2 cycle bubble for EXβEX dependency).
With forwarding:
- Add multiplexers at ALU inputs
- Forward the result directly from the output of EX (or MEM) to the input of EX for the dependent instruction
- EX-to-EX forwarding: Result from EX of instruction i is forwarded to EX of instruction i+1. No stall needed.
- MEM-to-EX forwarding: Result from MEM of instruction i forwarded to EX of instruction i+1. Also no stall if available in time.
Load-use dependency: Even with forwarding, a load followed by a dependent instruction still requires 1 cycle stall. The load data is only available after MEM, but the dependent instruction needs it at EX β there's a 1-cycle gap that cannot be forwarded away.
The register file is a communication abstraction: Forwarding "short-circuits" the register file to get data to dependent instructions sooner.
Key Points
- β’Forwarding: send result directly from EX/MEM output to next instruction's EX input
- β’Eliminates most RAW stalls (EXβEX, MEMβEX forwarding)
- β’Load-use hazard: load + dependent instruction still needs 1-cycle stall
- β’Register file = communication abstraction; forwarding bypasses it for speed
Exam Tip
Draw pipeline diagrams with forwarding paths. The load-use case (1 stall even with forwarding) is a classic exam question. Be able to identify where stalls are unavoidable.