8 - Instruction Pipeline

ucla | CS M151B | 2024-02-01 16:11


Single Cycle CPU

  • single cycle: instrs execute in series, no parallelism
  • simple design, efficient
  • but long latency, since the cycle must be long enough for the longest propagation delay
  • latency affects cycle time -> slower CPU time
  • solution: multi-cycle pipelining

    Multi-Cycle Processing

  • Improved Propagation Delay

  • Limitations

  • hardware under-utilization -> idle cycles during operations
  • solution: pipeline

Pipelined Processing

  • fetch a new instruction each cycle
  • PC incremented each cycle
  • Performance Advantage

  • Pipelined Architecture

  • at each cycle, set the next PC and pass the fetched instruction to control

    Example Run
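
As a rough illustration of such a run, the sketch below prints which stage each instruction occupies in each cycle. It assumes a classic five-stage pipeline (IF, ID, EX, MEM, WB) with no hazards; the stage names and the helper `pipeline_schedule` are assumptions for illustration, not taken from the lecture.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]  # assumed five-stage pipeline


def pipeline_schedule(num_instrs, stages=STAGES):
    """Return rows[i][c] = stage of instruction i in cycle c ('' if idle)."""
    total_cycles = num_instrs + len(stages) - 1
    rows = []
    for i in range(num_instrs):
        row = [""] * total_cycles
        for s, name in enumerate(stages):
            row[i + s] = name  # instruction i is in stage s during cycle i + s
        rows.append(row)
    return rows


if __name__ == "__main__":
    for i, row in enumerate(pipeline_schedule(3)):
        print(f"I{i}: " + " ".join(f"{cell:>3}" for cell in row))
```

Each row is shifted one cycle to the right of the previous one, which is the "fetch a new instruction each cycle" behavior.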

Latency

  • given base instruction cycle times
  • we can find single cycle latency as
  • while multi-cycle pipelined is
  • although the latency is reduced, there are issues with register dependency
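
A minimal sketch of this kind of comparison, using the usual textbook model: a single-cycle clock period must cover all stages back to back, while a pipelined clock period is set by the slowest stage. The stage delays below are made-up numbers, and pipeline register overhead is ignored; none of this is taken from the lecture.

```python
# Hypothetical stage delays in picoseconds (illustrative numbers only).
stage_delay_ps = {"IF": 200, "ID": 100, "EX": 200, "MEM": 200, "WB": 100}


def single_cycle_period(delays):
    # One long cycle must cover every stage back to back.
    return sum(delays.values())


def pipelined_period(delays):
    # Every stage gets one cycle, so the slowest stage sets the clock.
    return max(delays.values())


def total_time(num_instrs, delays, pipelined):
    if pipelined:
        # Fill the pipeline once, then one instruction completes per cycle.
        cycles = len(delays) + num_instrs - 1
        return cycles * pipelined_period(delays)
    return num_instrs * single_cycle_period(delays)
```

With these numbers the clock period drops from 800 ps to 200 ps, and 100 instructions take 20,800 ps instead of 80,000 ps. A single instruction still passes through all five stages.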

    Pipeline Hazards

    Register Dependency

  • pipelined instructions disturb the sequential assumption of execution

    Control Hazards

  • control flow dependency b/w instrs
  • condition or branch addr may not yet be available to read

    Structural Hazards

  • resource sharing b/w instrs - prior instr using a resource needed by next instrs
  • specifically about the h/w sharing
  • e.g., multi-cycle divide

    Data Hazards

  • register sharing b/w instrs
  • prior operands req by next instr
  • read after write (RAW)
    • an issue bc the read and the write hit the same register in different stages, so the next instr can read before the prior instr has written
  • write after write (WAW) - not an issue
    • 2nd write happens a cycle after so no worries (in-order execution, issue if out of order)
  • write after read (WAR) - not an issue
    • not an issue due to in-order execution (due to separation in cycles)
  • read after read (RAR) - not an issue
    • no issue bc both are just reads and pipeline separates reads by >= 1 cycle
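
The four cases above can be sketched as a small classifier over two instructions. The `Instr` record and the `dependencies` helper are hypothetical names introduced for illustration; the sketch only labels the dependency type and does not decide whether it is a hazard in a given pipeline.

```python
from typing import NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    dest: Optional[str]        # register written, or None
    srcs: Tuple[str, ...]      # registers read


def dependencies(first: Instr, second: Instr) -> set:
    """Classify register dependencies of `second` on the earlier `first`."""
    found = set()
    if first.dest is not None and first.dest in second.srcs:
        found.add("RAW")   # second reads what first writes
    if first.dest is not None and first.dest == second.dest:
        found.add("WAW")   # both write the same register
    if second.dest is not None and second.dest in first.srcs:
        found.add("WAR")   # second writes what first reads
    if set(first.srcs) & set(second.srcs):
        found.add("RAR")   # both read a shared register
    return found
```

For example, `dependencies(Instr("r1", ("r2", "r3")), Instr("r4", ("r1", "r5")))` reports only RAW, the one case that needs handling in an in-order pipeline.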

      Resolving RAW

  • given this RAW example
  • naive approach - stalling the pipeline
    • insert a NOP - i.e., an op that does nothing “bubbles” through the pipeline, so the register is not read until after the NOP
  • smarter solution - Data Forwarding
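
The naive stalling approach can be sketched as a pass that pads a program with NOPs. The instruction tuples, the `insert_nops` helper, and the default `min_distance=3` (two bubbles between back-to-back dependent instrs) are assumptions for illustration; the number of bubbles actually needed depends on the pipeline.

```python
NOP = ("nop", None, ())  # (opcode, dest, srcs)


def insert_nops(program, min_distance=3):
    """Pad with NOPs so each RAW consumer issues at least `min_distance`
    slots after the instruction that produces its source register."""
    out = []
    for op, dest, srcs in program:
        while True:
            # Slots still too close to the instruction about to issue.
            recent = out[max(0, len(out) - (min_distance - 1)):]
            if not any(d is not None and d in srcs for _, d, _ in recent):
                break
            out.append(NOP)  # bubble: pushes the producer further back
        out.append((op, dest, srcs))
    return out


if __name__ == "__main__":
    prog = [("add", "r1", ("r2", "r3")), ("sub", "r4", ("r1", "r5"))]
    for instr in insert_nops(prog):
        print(instr[0])
```

Every bubble is a wasted cycle, which is why forwarding is preferred when the value already exists somewhere in the pipeline.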

    Data Forwarding

  • add a wire that carries the value being written by I0 directly to the ALU input, selected when a flag indicating RAW is set high
  • then MUX the source to the ALU b/w I1's own register operand and the value forwarded from I0
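
A minimal behavioral sketch of that MUX, assuming I0 is `r1 = r2 + r3` and I1 is `r4 = r1 + r5` issued back to back. The instruction pair, register values, and function names are hypothetical; only the select-between-register-and-forwarded-value idea comes from the notes.

```python
def alu_operand(reg_value, forwarded_value, forward_select):
    """2-to-1 MUX in front of an ALU input."""
    return forwarded_value if forward_select else reg_value


def run_pair(regs, forwarding):
    """I0: r1 = r2 + r3 ; I1: r4 = r1 + r5, with I1 right behind I0."""
    i0_result = regs["r2"] + regs["r3"]  # I0's ALU output, not yet written to r1
    stale_r1 = regs["r1"]                # what I1 read from the register file
    raw = True                           # I1's source r1 is I0's destination
    a = alu_operand(stale_r1, i0_result, forwarding and raw)
    return a + regs["r5"]                # I1's ALU output (value for r4)


if __name__ == "__main__":
    regs = {"r1": 0, "r2": 2, "r3": 3, "r5": 10}
    print(run_pair(regs, forwarding=False))  # uses the stale r1
    print(run_pair(regs, forwarding=True))   # uses I0's forwarded result
```

Without the select set, I1 computes with the old contents of r1; with it set, I1 sees I0's result without waiting for write-back.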

    Cost

  • must add a wire from prev instr stage to next stage
  • also need a mux to select source
  • increases propagation delay

    Data Forwarding: Memory ALU

  • direct wire is not practical when reading from data memory due to propagation delay
  • in these cases we can add a NOP in between so the loaded value is ready by the time the next instr needs it

General Designs

  • we decide the design based on propagation delay: reading from a reg is expensive, so adding a MUX there is not great; instead we add it after the ALU, where the propagation delay is low

    Selecting the MUX when data forwarding is required

  • look ahead to check if the destination register is the same as the source for the next pipeline stage
  • then also look ahead for the other source/destination register of the second command -> logical-OR these, then decide to handle if HI
  • then either add a NOP or set data forward select to HI
  • we can decide this using a combinational module at the pipeline stage that sends the select immediately before the execution i.e., in design 1, at the green bar after the ALU or the green bar after the Data Memory
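
The selection steps above can be sketched as a small combinational function. The name `hazard_control`, its arguments, and the rule "stall when the producer is a load, otherwise forward" are assumptions for illustration, based on the NOP-for-memory-reads case described earlier in these notes.

```python
def hazard_control(prev_dest, prev_is_load, src_a, src_b):
    """Return (stall, fwd_a, fwd_b) for the instruction about to execute.

    prev_dest    -- destination register of the instr one stage ahead, or None
    prev_is_load -- True if that instr reads its result from data memory
    src_a, src_b -- source registers of the instr about to execute
    """
    match_a = prev_dest is not None and prev_dest == src_a
    match_b = prev_dest is not None and prev_dest == src_b
    hazard = match_a or match_b        # logical OR of the two comparisons
    stall = hazard and prev_is_load    # loaded value not ready: insert a NOP
    fwd_a = match_a and not stall      # otherwise set the MUX select HI
    fwd_b = match_b and not stall
    return stall, fwd_a, fwd_b
```

For instance, `hazard_control("r1", False, "r1", "r5")` gives `(False, True, False)`: no stall, forward into the first ALU input only.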