13 - ROB

ucla | CS M151B | 2024-02-22 16:08

Modern processor pipeline so far

introduce register renaming to mitigate data dependency
introduce scoreboard to mitigate RAW hazards
Dynamic Scheduling
Limitations of Tomasulo
Common data Bus (CDB) bottleneck
- single broadcaast medium
- sharing aamong functional unit (FU), load/store unit (LSU) and reg file
- contention bw multiple producers and scalability bw multiple reveivers
- long latency broadcast
performance limit by CDB latency - multiple CDB allow multiple in praallel
Solutions
split buses: FU, LSU - expensive
crosbar switch - alo expensive
point to point connetions
higher level bus
Reorder Buffer (ROB)
reorder instructions
e.g., reorder to prevent branch prediction 3 stall
Motivation
Maintaining program order semantics
- Ensuring that CPU resources accessible to the application are updated in correct order – (e.g. code debugging)
Facilitate precise exception handling
- Ensuring that only the effects of instructions up to the one causing the exception are committed
Enabling Speculative Execution
- Supporting the ability to discard the result of speculatively executed instructions (e.g. branch misprediction)
  ROB operations
Separate architected (ARF) vs physical registers (PRF)
Track program order of all in-flight instructions
Enable in-order completion / commit / retirement
basically, ROB tracks head and tail pointer of regs in instructions
- operates like a circular linked list The ROB replaces the PRF:
  Process
  Issue Stage
read next instr from instr buffer
check for structural hazards
- check reservation station and ROB availability
- if none, then stall for hazard
read the RAT
- find PRF regs given source ARF reg
  - the PRF regs are entries in the ROB
allocate a new entry in the ROB
- ROB is sorted llist
- head points to oldest
- shift tail down
- copy ARF info to new entry in ROB - needed for callback/commit
update RAT
- if isntr has dest reg
- use rob dest index
- rob_idx => RAT->rd
- then RAT propagates renaming to reservation stations
dispatch instruction to reservation stations
Execute Stage
after reservation isntrs issued
wait for validity of source operands
- operands updated via CDB (common data bus)
proceed to execution once all operands ready
arbitrate selection of instr if multiple ready at RS (reservation stations)
then 1 RS intr is sent to ALU
Write
broaddcast result on CDB and update RS instrs
- dont update ARF
update ROB
- mark ready bit in rob to signify the instr has completed
free RS entry
Commit
wait until ROB’s head pointer’s ready bit is valid
- then write that back to ARF (update RAT) or update MEM
- shift head pointer down
- then wait until that ready bit is done
what the user sees:
- although C, F, G all executed before A, A’s ready bit in the ROB was not valid until now so it only gets reflected in ARF after it validates
- thus internally it is OOE but user still sees in order exec
Register renaming using ROB

Unified RS

instead of an RS for each add/sub and mult/div, use one unified RS
arbitration must be done to determine which ALU to go to from the URS
- OoO but not SuperScalar
  Because of only 1 shared CDB
CPU limited to 1 IPC issue
commit needs to happen >1 IPC to be sustained on superscalar
Faulty Solution: Dual (N) Issue
need to check resource availability of RS and ROB for 2 intrs ata time
then needs to do dual renaming
- simple unless RAW hazard:
- if there is RAW the dependency prevents parallel renaming
then update RS/ROBs in parallel
thus its expensive to heck for RAW for renaming for more than 4
Multiple CDB
what is used today
but requires RS entry to check operands acros both CDBs’ ROB/RAT
thus for >1 IPC we need multiple read ports on ROB and write on ARF
Checkpointing
Create checkpoints of RAT, if one succeeds then delete other chckpoints:
Handling LD/ST
handled separately bc cycle count is not deterministic and store is unpredictable
so we want to limit loads and eensure stores in order
Solution: queues
LD/ST queues
if LD ST is the same address and we observe ordering, then we can use loads from previous loads and stores in order
memory update in order via ROB
we determine address through ALU or independent Address Unit
Stores
wait for source operands
write value into table (queue)
wait for address
write address into table
then exec
Load
wait for address
write to table
search for earliest LD/ST of same address from queue to avoid another trip
write value to table
broadcast value via CDB

Optimization

Forwarding

Ready load with same address
- Forward value
Ready store with same address
- Forward value
Pending load with same address
- Wait
Pending store with unknown address
- Wait or speculate (forward value)?
Otherwise
- Go to memory
  Speculation
When load complete, check the store address
- If different address, we proceed as usual
- Otherwise load forwarding
  - Drop the fetched value
  - Doesn’t happen often
Common solution

13 - ROB

Table of Contents

Modern processor pipeline so far

Dynamic Scheduling

Limitations of Tomasulo

Solutions

Reorder Buffer (ROB)

Motivation

ROB operations

Process

Issue Stage

Execute Stage

Write

Commit

Unified RS

OoO but not SuperScalar

Faulty Solution: Dual (N) Issue

Multiple CDB

Checkpointing

Handling LD/ST

Solution: queues

Stores

Load

Optimization

Forwarding

Speculation