

# ודט עבקו בפשב

# Very Large Scale Integration II - VLSI II Basic Pipelining (cont'd)

#### Y. Fırat Kula

## ITU VLSI Laboratories Istanbul Technical University



www.vlsi.itu.edu.tr

26.05.2022



ודט עבקו נאפי

## ENGINEERING THE FUTURE

# **Basic Pipelining (cont'd)**

- Data Hazard Handling
  - RAW Hazards
  - Load-Use Hazards

## Control Hazard Handling

- Jumps
- Branches



ITU VLSI LAB

## ENGINEERING THE FUTURE

# **Hazards Review**

- Structural
  - Use seperate memories
  - Half-cycle register operation
- Data
  - RAW Hazards (Bypassing)
  - Load-Use Hazards (Bypassing & Stalling)
- Control
  - Jumps
  - Branches

#### INNOVATION • QUALITY • RELIABILITY

Stalling solves it all, but we want to avoid that...



ודט עבקו נפש

## ENGINEERING THE FUTURE

# **Hazards Review**

- Structural
  - Use seperate memories
  - Half-cycle register operation
- Data
  - RAW Hazards (Bypassing)
  - Load-Use Hazards (Bypassing & Stalling)
- Control
  - Jumps
  - Branches

#### INNOVATION • QUALITY • RELIABILITY



## ENGINEERING THE FUTURE

# ITU VLSI LABS

# **Read-After-Write (RAW) Hazard**

• Consider the following sequence...

sub x2, x1,x3
and x12,x2,x5
or x13,x6,x2
add x14,x2,x2
sd x15,100(x2)

• We can resolve this with forwarding

How do we decide when to forward?



## ENGINEERING THE FUTURE

# ודט עבקו באשב

# **Read-After-Write (RAW) Hazard**

• Reminder...





#### INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

7

26.05.2022



TU VL51

## ENGINEERING THE FUTURE

# **Read-After-Write (RAW) Hazard**

<u>sub</u> x2, x1,x3

and x12, x2,x5

<u>or</u> x13, x6,<mark>x2</mark>





ודט עבקו בפפ

## ENGINEERING THE FUTURE

# Forwarding: How to decide?

- Pass register numbers along pipeline
  - $ID/EX.RegisterRs = register number for Rs in ID/EX_$
  - ID/EX.RegisterRt = register number for Rt in ID/EX
  - ID/EX.RegisterRd = register number for Rd in ID/EX Destination Register
- Current instruction being executed in ID/EX register
- Previous instruction is in the EX/MEM register
- Second previous is in the MEM/WB register
- RAW Data hazards when

   Ia. EX/MEM.RegisterRd = ID/EX.RegisterRs
   Ib. EX/MEM.RegisterRd = ID/EX.RegisterRt
   2a. MEM/WB.RegisterRd = ID/EX.RegisterRs
   2b. MEM/WB.RegisterRd = ID/EX.RegisterRt



**ALU** inputs

#### INNOVATION • QUALITY • RELIABILITY



ודט עבקו בפפ

## ENGINEERING THE FUTURE

# Forwarding: How to decide?

- But only if forwarding instruction will write to a register!
  - EX/MEM.RegWrite, MEM/WB.RegWrite
- And only if Rd for that instruction is not R0
  - EX/MEM.RegisterRd  $\neq$  0
  - MEM/WB.RegisterRd  $\neq$  0



## ENGINEERING THE FUTURE

# Forwarding: How to decide?

- Detecting RAW hazard with Previous Instruction
- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 (Forward from EX/MEM pipe stage)
- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 (Forward from EX/MEM pipe stage)

Forward from EX logic (Forward from ALU output)

- Detecting RAW hazard with Second Previous
- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs))
   ForwardA = 10 (Forward from MEM/WB pipe stage)
- if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) ForwardB = 10 (Forward from MEM/WB pipe stage)

Forward from MEM logic (Forward from Data Memory Output)

#### INNOVATION • QUALITY • RELIABILITY

#### **Forwarding Hardware**



INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

12

26.05.2022

#### **Reminder: Pipelined Control Signals**



INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr



## ENGINEERING THE FUTURE

# **Forwarding Control Signals**

| Mux control   | Source | Explanation                                                                    |
|---------------|--------|--------------------------------------------------------------------------------|
| ForwardA = 00 | ID/EX  | The first ALU operand comes from the register file.                            |
| ForwardA = 10 | EX/MEM | The first ALU operand is forwarded from the prior ALU result.                  |
| ForwardA = 01 | MEM/WB | The first ALU operand is forwarded from data memory or an earlier ALU result.  |
| ForwardB = 00 | ID/EX  | The second ALU operand comes from the register file.                           |
| ForwardB = 10 | EX/MEM | The second ALU operand is forwarded from the prior ALU result.                 |
| ForwardB = 01 | MEM/WB | The second ALU operand is forwarded from data memory or an earlier ALU result. |

#### INNOVATION • QUALITY • RELIABILITY



ITU VLSI LAB

## ENGINEERING THE FUTURE

# Yet Another Complication...

• Consider the following sequence...

add x1,x1,x2 add x1,x1,x3 add x1,x1,x4

Which one to forward?

- > Both hazards occur
  - Want to use the most recent

We must revise our MEM forward condition – Only forward if EX hazard condition is not true



### ENGINEERING THE FUTURE

# Yet Another Complication...

add x1, x1,x2

add x1, x1, x3

add x1, x1,x4





## ENGINEERING THE FUTURE

# **Corrected Forwarding Logic**

- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRs)) ForwardA = 01 (Forward from EX/MEM pipe stage)
- if (EX/MEM.RegWrite and (EX/MEM.RegisterRd ≠ 0) and (EX/MEM.RegisterRd = ID/EX.RegisterRt)) ForwardB = 01 (Forward from EX/MEM pipe stage)

Forward from EX logic (Forward from ALU output)

 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRs)) and (EX/MEM.RegisterRd != ID/EX.RegisterRs)

ForwardA = 10 (Forward from MEM/WB pipe stage)

 if (MEM/WB.RegWrite and (MEM/WB.RegisterRd ≠ 0) and (MEM/WB.RegisterRd = ID/EX.RegisterRt)) and (EX/MEM.RegisterRd != ID/EX.RegisterRt)
 ForwardB = 10 (Forward from MEM/WB pipe stage) Forward from MEM logic (Forward from Data Memory Output)

INNOVATION • QUALITY • RELIABILITY

#### **Datapath with Complete Forwarding Hardware**



INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

26.05.2022



## ENGINEERING THE FUTURE

# **Memory-to-Memory Copies**



Can be resolved by adding a Forward Unit and a MUX to the MEM stage

#### INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr



ITU VLSI LAB

## ENGINEERING THE FUTURE

# **Hazards Review**

- Structural
  - Use seperate memories
  - Half-cycle register operation
- Data
  - RAW Hazards (Bypassing)
  - Load-Use Hazards (Bypassing & Stalling)
- Control
  - Jumps
  - Branches

#### **Load-Use Hazards**

IΜ Reg DM Reg É lw x2, 4(x1)and x4, x2,x5 П IΜ Reg Ş Reg or x8, x2,x6 Reg IM Reg DM add x9, x4, x2 Reg IM Reg DM IM Rec Reg Ê DM

#### INNOVATION • QUALITY • RELIABILITY

21

26.05.2022

#### **Load-Use Hazards**



INNOVATION • QUALITY • RELIABILITY

22

www.vlsi.itu.edu.tr

26.05.2022



ודט עבקו נפפ

## ENGINEERING THE FUTURE

# **Load-Use Hazard Detection**

- Check when using instruction is decoded in ID stage
- ALU operand register numbers in ID stage are given by
  - IF/ID.RegisterRs, IF/ID.RegisterRt
- › Load-use hazard when

if(ID/EX.MemRead)
and (ID/EX.RegisterRd = IF/ID.RegisterRs)
or (ID/EX.RegisterRd = IF/ID.RegisterRt)
stall the pipeline (insert bubble)

#### INNOVATION • QUALITY • RELIABILITY



ITU VLSI LAB

## ENGINEERING THE FUTURE

# How to Stall in Hardware?

- Force control values in ID/EX register to 0
  - EX, MEM and WB do nop (no-operation)
- > Prevent update of PC and IF/ID register
  - Using instruction is decoded again
  - Following instruction is fetched again
  - 1-cycle stall allows MEM to read data for 1d
    - > Can subsequently forward to EX stage

A Hazard stall unit, and a MUX that selects between original control signals and zeros

#### **Datapath with Hazard Stall Hardware**



INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

25

26.05.2022



ודט עבקו בפפ

## ENGINEERING THE FUTURE

# Load-Use Hazards

- > Stalls reduce performance
  - But are required to get correct results
- Compiler can arrange code to avoid hazards and stalls
  - Requires knowledge of the pipeline structure



ודט עבקו נאפ

## ENGINEERING THE FUTURE

# **Control Hazards**

- When the flow of instruction adress is not sequential (PC = PC + 4)
  - Jumps (jal, jalr...)
  - Conditional Branches (beq, bge...)
  - Exceptions (unknown opcodes)
- Much less frequent than data hazards, but harder to deal with



## ENGINEERING THE FUTURE

# **RISC-V Jumps and Branches**

#### JAL: unconditional jump to PC+immediate

| 31      | 30       | 21        | 20      | 19        | 12 11 | 76     | 0 |
|---------|----------|-----------|---------|-----------|-------|--------|---|
| imm[20] | imm[10:1 | l]        | imm[11] | imm[19:12 | rd rd | opcode |   |
| 1       | 10       |           | 1       | 8         | 5     | 7      |   |
|         | off      | set[20:1] |         |           | dest  | JAL    |   |

#### JALR: indirect jump to rs1+immediate

| 31           | 20 19 | 15 14 12 1 | 11 7 | 6 0    |
|--------------|-------|------------|------|--------|
| imm[11:0]    | rs1   | funct3     | rd   | opcode |
| 12           | 5     | 3          | 5    | 7      |
| offset[11:0] | base  | 0          | dest | JALR   |

Branch: if (rs1 conds rs2), branch to PC+immediate

| 31      | 30 2       | 25 24 20 | 19 1 | 5 14 12 | 11       | 8 7     | 6      |
|---------|------------|----------|------|---------|----------|---------|--------|
| imm[12] | imm[10:5]  | rs2      | rs1  | funct3  | imm[4:1] | imm[11] | opcode |
| 1       | 6          | 5        | 5    | 3       | 4        | 1       | 7      |
| offset  | [12, 10:5] | src2     | src1 | BEQ/BNE | offset[1 | 1,4:1]  | BRANCH |
| offset  | [12,10:5]  | src2     | src1 | BLT[U]  | offset[1 | 1,4:1]  | BRANCH |
| offset  | [12, 10:5] | src2     | src1 | BGE[U]  | offset[1 | 1,4:1]  | BRANCH |

#### INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr



ITU VLSI LABS

## ENGINEERING THE FUTURE

# **RISC-V Jumps and Branches**

| Each instruction fetch depends on one or two pieces of information from the preceding instruction:<br>1) Is the preceding instruction a taken branch?<br>2) If so, what is the target address? |                    |                    |  |  |  |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------|--------------------|--|--|--|--|
| <ul> <li>JAL: unconditional jump to PC+immediate</li> <li>JALR: indirect jump to rs1+immediate</li> <li>Branch: if (rs1 conds rs2), branch to PC+immediate</li> </ul>                          |                    |                    |  |  |  |  |
| Instruction                                                                                                                                                                                    | Taken known?       | Target known?      |  |  |  |  |
| JAL                                                                                                                                                                                            | After Inst. Decode | After Inst. Decode |  |  |  |  |
| JALR                                                                                                                                                                                           | After Inst. Decode | After Reg. Fetch   |  |  |  |  |
| B <cond.></cond.>                                                                                                                                                                              | After Execute      | After Inst. Decode |  |  |  |  |

#### INNOVATION • QUALITY • RELIABILITY



ודט עבקו נפש

## ENGINEERING THE FUTURE

# **Hazards Review**

- Structural
  - Use seperate memories
  - Half-cycle register operation
- Data
  - RAW Hazards (Bypassing)
  - Load-Use Hazards (Bypassing & Stalling)
- Control
  - Jumps
  - Branches



TU VLSI LAB



## Jumps

Jumps are decoded in ID, so wee need one flush...

(jump)

"flush" this insturction

(jump target)



#### INNOVATION • QUALITY • RELIABILITY



ודט עבקו נאפ

## ENGINEERING THE FUTURE

# "Bubble" and "Flush"

- Bubble is when we insert a "nop" in the pipeline
  - We prevent the instructions earlier in the pipeline from progressing down the pipeline, for one cycle, by zeroing control signals
  - Later instructions progresses normally
- Flush is when we replace an ongoing instruction with a "nop"
  - Zero the control signals for the instruction to be flushed (we zeroed all bits in IF/ID register, before decoding the control signals)

#### **Flushing for Jumps**





TU VLSI LAB

## ENGINEERING THE FUTURE

## Jumps

- Can we resolve jumps in IF stage (CPI=1) ?
  - Pre-calculated Look-up Tables can be added in IF stage (compiler+hardware support)





ודנו עבקו בפפ

## ENGINEERING THE FUTURE

# **Hazards Review**

- Structural
  - Use seperate memories
  - Half-cycle register operation
- Data
  - RAW Hazards (Bypassing)
  - Load-Use Hazards (Bypassing & Stalling)
- Control
  - Jumps
  - Branches

#### **Reminder: Pipelined Control Signals**



INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

#### **Branches**



#### INNOVATION • QUALITY • RELIABILITY



www.vlsi.itu.edu.tr

26.05.2022



נח תרצו רשם

### ENGINEERING THE FUTURE

## **Branches: A Better Way**

Move branch decision hardware earlier in the pipeline

beq

flush

(beq target)



#### INNOVATION • QUALITY • RELIABILITY

38

26.05.2022



ודט עבקו בפפי

### ENGINEERING THE FUTURE

# **Revised Branch Decision**

- Move hardware to determine outcome to ID stage
  - Target address adder
  - Register comparator
- > Example: branch taken

36: sub x10, x4, x8 40: beq x1, x3, 16 // PC-relative branch // to 40+16\*2=72

44: and x12, x2, x5
48: orr x13, x2, x6
52: add x14, x4, x2
56: sub x15, x6, x7
...
72: ld x4, 50(x7)

INNOVATION • QUALITY • RELIABILITY

39

#### **Revised Branch Decision Hardware + Example cont'd**



INNOVATION • QUALITY • RELIABILITY

40

26.05.2022

#### **Revised Branch Decision Hardware + Example cont'd**



#### INNOVATION • QUALITY • RELIABILITY

41

www.vlsi.itu.edu.tr



ודט עבקו נפפ

### ENGINEERING THE FUTURE

## **Branch Hazards: More Details**

| WB  | add3   | ×1,                       |
|-----|--------|---------------------------|
| MEM | add2   | ×3,                       |
| EX  | add1   | ×4,                       |
| ID  | beq    | ×1, ×2,                   |
| IF  | (next_ | _instruction_in_sequence) |

| WB  | add3  | x3,                      |
|-----|-------|--------------------------|
| MEM | add2  | x1,                      |
| EX  | add1  | x4,                      |
| ID  | beq   | x1, x2,                  |
| IF  | (next | instruction in sequence) |

• No issue, Reg file write before read solves this.

• EX/MB forwarding: For cases like this, we must forward from EX/MB to the comparator inputs



ITU VLSI LAB

### ENGINEERING THE FUTURE

## **Branch Hazards: More Details**

### Branch EX/MEM forwarding decision logic

if (IDControl.Branch) and (EX/MEM.RegisterRd ! = 0) and (EX/MEM.RegisterRd = IF/ID.RegisterRs) ForwardC = 1

if (IDControl.Branch) and (EX/MEM.RegisterRd ! = 0) and (EX/MEM.RegisterRd = IF/ID.RegisterRt) ForwardD = 1



ודט עבאו באשב

### ENGINEERING THE FUTURE

## **Branch Hazards: More Details**

| WB d | add3 | x3, |  |
|------|------|-----|--|
|------|------|-----|--|

- MEM add3 x4, ...
- EX add3 x1, ...
- ID beq x1, x2, ...

• Similar case with Load-Use hazards, stall is unavoidable

IF

#### **Revised Branch Decision Hardware**



INNOVATION • QUALITY • RELIABILITY

45

www.vlsi.itu.edu.tr



ודט עבקו בפפ

### ENGINEERING THE FUTURE

# **Branch Hazards**

- Do nothing
  - Three cycle stall (flush)
- Resolve decision in EX stage
  - Stall reduced to two cycles
  - Increases critical time of EX stage, which is already high
- Resolve decision in ID stage
  - Stall reduced to one cycle
  - Extra comparator and selection logic
  - Needs another forwarding unit for ID stage
- Can we make branch decision in IF stage (No stall -> CPI=1)?



ודט עבקו נפש

### ENGINEERING THE FUTURE

## **Branch Hazards**

### Compiler based solution:

Delay slots: The one stall cycle after a branch instruction is called a delay slot. Compiler can find some other instruction to put there, that will not have any effect on endangered registers.

- But it may not be always feasible to find suitable instructions to place in delay slots, especially in longer pipelines...

#### INNOVATION • QUALITY • RELIABILITY

47



ודט עבקו באפ

### ENGINEERING THE FUTURE

# **Branch Prediction**

- Motivation:
  - Branch penalties limit performance of deeply pipelined processors
  - Modern branch predictors have high accuracy
  - (>95%) and can reduce branch penalties significantly
- Required hardware support:
  - Prediction structures:
    - Branch history tables, branch target buffers, etc.
  - Mispredict recovery mechanisms:
    - Keep result computation separate from commit
    - Kill instructions following branch in pipeline
    - Restore state to that following branch



ITU VLSI LAB

### ENGINEERING THE FUTURE

# **Static Branch Prediction**

 Resolve branch hazards by assuming an outcome, then proceeding without waiting for the actual branch outcome

Overall probability a branch is taken is ~60-70% but:



ISA can attach preferred direction semantics to branches, e.g., Motorola MC88110 bne0 (preferred taken) beq0 (not taken)

#### INNOVATION • QUALITY • RELIABILITY

49



립니

LU VL5

### ENGINEERING THE FUTURE

## **Static Branch Prediction**

• Flushing with misprediction (branch not taken)

beq x1, x0,16

and x12, x2,x5 (flush)

or x13, x6,x2 (destination)



Flush by asserting IF.Flush signal, converting bits in IF/ID instruction field to zero and turning to "nop" instruction

INNOVATION • QUALITY • RELIABILITY

www.vlsi.itu.edu.tr

50

26.05.2022



ודט עבקו נפפ

### ENGINEERING THE FUTURE

# **Static Branch Prediction**

- "Predict not taken" works well for top controlled loops
- "Branch taken" prediction
  - Always incurs one cycle stall (assuming branch decision is done in ID stage)
  - Is there a way to store the adress of the branch beforehand?
- For longer pipelines, branch penalty greatly increases, so a better way is required...



ודט עבקו נפפ

### ENGINEERING THE FUTURE

# **Dynamic Branch Prediction**

- In deeper and superscalar pipelines, branch penalty is more significant
- > Use dynamic prediction
  - Branch prediction buffer (aka branch history table)
  - Indexed by recent branch instruction addresses
  - Stores outcome (taken/not taken)
  - To execute a branch
    - > Check table, expect the same outcome
    - > Start fetching from fall-through or target
    - > If wrong, flush pipeline and flip prediction



TU VLSI LAB

### ENGINEERING THE FUTURE

# **Branch History Table**

- A branch prediction buffer (or branch history table BHT) in the IF stage indexed by the PC, conatins bits passed to the ID stage through IF/ID pipeline register that tells whether the branch was taken last time it was executed.
- If the prediction is wrong, flush and return the old instruction (handling this did not change, but the prediction correctness rate greatly increases)
- A 4096 bit BHT varies from %1 to %18 percent mispredction, tried on various benchmark programs



ITU VLSI LAB

### ENGINEERING THE FUTURE

## **Branch Target Buffer**

BHT tells us if we take the branch or not, but it does not tell us where to go!

A branch target buffer (BHT) in IF stage catches the branch target addresses. Using the prediction bits from IF/ID stage, we choose whether or not take the jump destination or the sequential instruction.







ודט עבקו נפפי

### ENGINEERING THE FUTURE

# **Dynamic Branch Prediction**

- 1-bit prediction scheme
  - Low-portion address as address for a one-bit flag for Taken or NotTaken historically
  - Simple
- 2-bit prediction
  - Miss twice to change



ודט עבקו בפפ



## **1-bit Predictor**

> Inner loop branches mispredicted twice!



- Mispredict as taken on last iteration of inner loop
- Then mispredict as not taken on first iteration of inner loop next time around

#### INNOVATION • QUALITY • RELIABILITY

56



ודט עבאו באשב

### ENGINEERING THE FUTURE

## **2-bit Predictor**

It takes two wrong predictions to change decision



#### INNOVATION • QUALITY • RELIABILITY



www.vlsi.itu.edu.tr



### ENGINEERING THE FUTURE

## **References and Further Reading**

- http://passlab.github.io/CSE564/not es/lecture09\_RISCV\_Impl\_pipeline. pdf
- https://inst.eecs.berkeley.edu/~cs61
   c/fa17/lec/13/L13%20Pipelining%20
   (1up).pdf

