| Computer Architecture                                                                     | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                |
|-------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------|
| 2. The                                                                                    | Pipeline                                                                                                   |
| In <b>pipelining</b> , <b>m</b> ultiple tasks (fo                                         | or example, instructions) are executed in parallel.                                                        |
| To use the pipelining approach                                                            | efficiently                                                                                                |
| 1. We must have tasks that ar                                                             | e <u>repeated</u> many times on different data.                                                            |
| <ol> <li>Tasks must be divided into <u>s</u><br/>performed <u>in parallel</u>.</li> </ol> | small pieces (operations or actions) that can be                                                           |
| Example of a pipeline: an autor                                                           | nobile assembly line.                                                                                      |
| The task                                                                                  |                                                                                                            |
| <ul> <li>is the construction of a car,</li> </ul>                                         |                                                                                                            |
| <ul> <li>is repeated many times for a</li> </ul>                                          | different cars,                                                                                            |
| <ul> <li>consists of some operations,</li> </ul>                                          | such as attaching the doors, attaching the tires.                                                          |
| Each operation                                                                            |                                                                                                            |
| <ul> <li>has its own station in the pip</li> </ul>                                        | beline (assembly line).                                                                                    |
| <ul> <li>is performed in parallel with</li> </ul>                                         | other operations but on a different car.                                                                   |
|                                                                                           | ching the doors of the i <sup>th</sup> car, another worker is<br>(i+1) <sup>st</sup> car at the same time. |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                         | COOSO 2013 - 2021 Feza BUZLUCA 2.1                                                                         |

| Computer Architect                                                                  | ure               |                              |                     |                     |  |  |
|-------------------------------------------------------------------------------------|-------------------|------------------------------|---------------------|---------------------|--|--|
| Example: And                                                                        | automobile asse   | embly line with <sup>.</sup> | three stations      |                     |  |  |
|                                                                                     | $\Rightarrow$     |                              |                     | $\Longrightarrow$   |  |  |
|                                                                                     | Station 1         | Station 2                    | Station 3           |                     |  |  |
| Step = 1                                                                            | Car 1             |                              |                     |                     |  |  |
|                                                                                     | Station 1         | Station 2                    | Station 3           |                     |  |  |
| Step = 2                                                                            | Car 2             | Car 1                        |                     |                     |  |  |
|                                                                                     | Station 1         | Station 2                    | Station 3           |                     |  |  |
| Step = 3                                                                            | Car 3             | Car 2                        | Car 1               | ——> Car 1 is ready. |  |  |
|                                                                                     | Station 1         | Station 2                    | Station 3           |                     |  |  |
| A                                                                                   | t the end of Step | ) = 3 the Car 1 (T           | ask 1) has been com | pleted.             |  |  |
|                                                                                     |                   |                              |                     |                     |  |  |
| Step = 4                                                                            | Car 4             | Car 3                        | Car 2               | Car 2 is ready.     |  |  |
|                                                                                     | Station 1         | Station 2                    | Station 3           |                     |  |  |
| After Step = 3 (the pipeline is full), at each step, a new car (task) is completed. |                   |                              |                     |                     |  |  |
| http://akademi.itu.edu.t                                                            |                   |                              | 2013 - 202          | 1 Feza BUZLUCA 2.2  |  |  |





| Computer Architecture |  |
|-----------------------|--|
|-----------------------|--|

Example (cont'd):

- In this example, the task is decomposed into 3 operations: Reading, multiplication, and addition.
- We assume that arrays are in separate memory modules, which can be read in parallel.
- We start to read elements of array C one clock cycle after reading A and B.

Functioning of the pipeline with three stages:

| 1 diferioring                                                                     | or me pipem    |                | m ee stages.                   |                |                                             |         |
|-----------------------------------------------------------------------------------|----------------|----------------|--------------------------------|----------------|---------------------------------------------|---------|
| Clock cycle                                                                       |                |                | 2. Stage(Multiply)             |                | 3.Stage (Add)                               |         |
|                                                                                   | R1             | R2             | R3                             | R4             | R5                                          | _       |
| 1                                                                                 | A <sub>1</sub> | B <sub>1</sub> | -                              | -              | -                                           | -       |
| 2                                                                                 | A <sub>2</sub> | B <sub>2</sub> | A <sub>1</sub> *B <sub>1</sub> | C <sub>1</sub> | -                                           |         |
| 3                                                                                 | A <sub>3</sub> | B <sub>3</sub> | A2*B2                          | •              | $A_1^*B_1 + C_1$ (Firs                      |         |
| 4                                                                                 | A <sub>4</sub> | $B_4$          | $A_3^*B_3$                     | $C_3$          | $A_2 * B_2 + C_2$ (2nd                      | result) |
| 5                                                                                 | A <sub>5</sub> | B <sub>5</sub> | $A_4^*B_4$                     | $\tilde{C_4}$  | $A_3^*B_3 + C_3$ (3rd                       | result) |
| Note:                                                                             |                | ·              |                                |                |                                             |         |
| durations                                                                         |                | operatio       | ons and the d                  | lata is alw    | ificantly shorter th<br>ays ready to be rea |         |
| • In this case, the pipeline could be designed with two stages which perform only |                |                |                                |                |                                             |         |
|                                                                                   | cal operations |                |                                |                |                                             | ,       |
| http://akademi.itu.ec                                                             |                |                | e                              |                | 2013 - 2021 Feza BUZLUCA                    | 2.5     |
|                                                                                   |                |                |                                |                |                                             |         |

| Computer Archite                                                                                                                                             | cture                                                  |     |    |      |       |       |       |       |                                                   |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------|-----|----|------|-------|-------|-------|-------|---------------------------------------------------|
| 2.2 Space-Time Diagram of a pipeline with four stages                                                                                                        |                                                        |     |    |      |       |       |       |       |                                                   |
| Space-time di<br>processed in v                                                                                                                              |                                                        |     |    |      |       |       | how \ | which | task is currently being                           |
|                                                                                                                                                              |                                                        |     |    |      |       |       |       |       | are the column labels, stages<br>e table entries. |
| Example:<br>(4 stages)                                                                                                                                       |                                                        | Tim | e→ | Cloc | k Cyc | les ( | steps | 5)    |                                                   |
|                                                                                                                                                              |                                                        | 1   | 2  | 3    | 4     | 5     | 6     | 7     |                                                   |
|                                                                                                                                                              | <b>S</b> 1                                             | T1  | T2 | Т3   | T4    | T5    | T6    |       |                                                   |
| Jes                                                                                                                                                          | 52                                                     |     | T1 | T2   | Т3    | T4    | T5    | T6    |                                                   |
| Stages                                                                                                                                                       | 53                                                     |     |    | T1   | T2    | Т3    | T4    | T5    |                                                   |
|                                                                                                                                                              | 54                                                     |     |    |      | T1    | T2    | T3    | T4    |                                                   |
|                                                                                                                                                              |                                                        |     |    |      |       | •     |       |       |                                                   |
| The 1st task (T1) is completed in 4<br>clock cycles (number of stages k=4). After the k <sup>th</sup> cycle, a new task<br>is completed in each clock cycle. |                                                        |     |    |      |       |       |       |       |                                                   |
| Four task                                                                                                                                                    | Four tasks (T4) have been completed in 7 clock cycles. |     |    |      |       |       |       |       |                                                   |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info 2013 - 2021 Feza BUZLUCA 2.6                                                               |                                                        |     |    |      |       |       |       |       |                                                   |

| Compu                                                                 | ter Arc                                                                      | hitectur   | e        | License: https://creativecommons.org/licenses/by-nc-nd/4.0/ |      |       |      |    |                   |                                                                 |
|-----------------------------------------------------------------------|------------------------------------------------------------------------------|------------|----------|-------------------------------------------------------------|------|-------|------|----|-------------------|-----------------------------------------------------------------|
| Space-Time Diagram of a pipeline with four stages, cont'd             |                                                                              |            |          |                                                             |      |       |      |    |                   |                                                                 |
| We could also construct the space-time diagram in an alternative way. |                                                                              |            |          |                                                             |      |       |      |    |                   |                                                                 |
| In the<br>row la                                                      |                                                                              |            |          |                                                             |      |       |      |    |                   | lumn labels, tasks (Ti) are the                                 |
|                                                                       | 5013,                                                                        |            | ruges    | , (01)                                                      | ure  |       | 1010 |    |                   |                                                                 |
|                                                                       |                                                                              |            |          |                                                             |      |       |      | :  |                   |                                                                 |
|                                                                       |                                                                              | Tim        |          | Clock                                                       | Cucl | oc (c | tone |    |                   | st task (T1) is completed in 4<br>cycles (number of stages k=4) |
|                                                                       | $\longrightarrow Clock Cycles (steps) (clock cycles (number of stages k=4))$ |            |          |                                                             |      |       |      |    |                   |                                                                 |
|                                                                       | T1                                                                           | 51         | 52       | 53                                                          | 54*  |       | •    | ,  | A                 | fter the k <sup>th</sup> cycle, a new task                      |
| asks                                                                  | T2                                                                           |            | S1       | 52                                                          | 53   | 544   |      |    |                   | completed in each clock cycle                                   |
| Tas                                                                   | T3                                                                           |            |          | 51                                                          | 52   | 53    | 54   |    | 1                 |                                                                 |
|                                                                       | T4                                                                           |            |          |                                                             | 51   | 52    | 53   | 54 | J                 |                                                                 |
|                                                                       |                                                                              |            |          |                                                             |      |       |      |    |                   |                                                                 |
| Four tasks (T4) have been completed in 7 clock cycles.                |                                                                              |            |          |                                                             |      |       |      |    |                   |                                                                 |
| http://aka                                                            | .demi.itu                                                                    | ı.edu.tr/e | en/buzlu | ca/                                                         |      |       |      | 6  | <u>) () () ()</u> | 3 2013 - 2021 Feza BUZLUCA 2.7                                  |
| http:// ww                                                            |                                                                              |            |          | -                                                           |      |       |      | e  | BY NC             | 2013 - 2021 Feza BUZLUCA 2.7                                    |

| Computer Architecture                                                                                                                                                                                                                                                           |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 2.3 Throughput and Speedup provided by the pipeline                                                                                                                                                                                                                             |
| Since all stages proceed at the same time, the time (delay) required for the <b>slowest stage</b> determines the length of the period of the clock signal (cycle time).                                                                                                         |
| The cycle time (the period of the clock) $\mathbf{t}_{\mathbf{p}}$ can be determined as follows:                                                                                                                                                                                |
| $\begin{split} t_p &= max(\tau_i) + d_r = \tau_M + d_r \\ t_p &: \text{ cycle time } \\ \tau_i &: \text{ time delay of the circuitry in the ith stage } \\ \tau_M &: \text{ maximum stage delay (the slowest stage)} \\ d_r &: \text{ time delay of the register } \end{split}$ |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                                                                                           |

**Computer Architecture** Speedup: k: number of stages in the pipeline t<sub>p</sub>: cycle time n: number of tasks t<sub>n</sub>: time required for a task without pipelining Calculation of the total time required for n tasks: Calculation of the total time required to complete the first task (T1). Time:  $T(1) = k \cdot t_p$ • remaining n-1 tasks require (n-1) cycles. • Total time required for n tasks:  $T(n) = (k+n-1)t_p$ **Speedup:**  $S = \frac{Execution time without the pipeline}{Execution time with the pipeline} S = \frac{n \cdot t_n}{(k+n-1) \cdot t_p}$ If the number of tasks increases significantly:  $n \rightarrow \infty$ ,  $S_{n \rightarrow \infty} = \frac{t_n}{t_p}$ If we assume  $t_n = k \cdot t_p$  , (If it were possible to divide the main task into k equal small operations and ignore the register delays, the cycle time would be  $t_p = t_n / k$ .) S<sub>max</sub> = k (Theoretical maximum speedup) http://akademi.itu.edu.tr/en/buzluca/ 2013 - 2021 Feza BUZLUCA 2.9 http:// www.buzluca.info

| Computer Architecture                                                                                                                                                                             |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Comments on speedup:                                                                                                                                                                              |
| To improve the performance of the pipeline, tasks must be divided into <u>small</u> and <u>balanced</u> operations with equal (or at least similar) durations.                                    |
| If the durations of the operations are short, then the clock cycle $(t_p)$ can be short. Remember: The slowest stage determines the clock cycle.                                                  |
| Effects of increasing the number of stages of a pipeline:                                                                                                                                         |
| Advantage:                                                                                                                                                                                        |
| <ul> <li>If the task can be divided into many small operations, increasing the number of<br/>stages can lower the clock cycle (t<sub>p</sub>), and consequently the speedup increases.</li> </ul> |
| Disadvantages:<br>$S_{\text{max}} = k \text{ (Theoretical)}$                                                                                                                                      |
| <ul> <li>The cost of the pipeline increases. At each stage of the pipeline, there is some<br/>overhead (cost, energy, space) because of registers and additional connections.</li> </ul>          |
| • The completion time of the first task increases. $T(1) = k \cdot t_p$                                                                                                                           |
| <ul> <li>Branch penalties in the instruction pipeline caused by control hazards increase.</li> <li>We will discuss branch penalties in the section "2.5 Pipeline hazards".</li> </ul>             |
| While designing a pipeline, these advantages and disadvantages should be taken into consideration.                                                                                                |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                             |

| Computer Arch | nitecture |
|---------------|-----------|
|               |           |
|               |           |

| Effects of                                                                                                                    | task partitioning on the                                                                                                                              | speedup:                                                               |                      |  |  |  |
|-------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|----------------------|--|--|--|
| faster clock<br>Assume that<br>Assume that                                                                                    | an be partitioned into sn<br>signal (shorter cycle tim<br>we have a task T with a<br>we can decompose this t<br>partition the task into 2             | e) can be used.<br>total duration of 100 ns<br>task in different ways. |                      |  |  |  |
|                                                                                                                               | S1 = 50ns                                                                                                                                             | S2 = 50ns                                                              |                      |  |  |  |
| Т: [                                                                                                                          |                                                                                                                                                       |                                                                        | 7                    |  |  |  |
| If the delay                                                                                                                  | If the delay of the registers is 5 ns, then the clock cycle is $t_p = 50+5 = 55$ ns<br>Case B: We partition the task into 3 <u>unbalanced</u> stages. |                                                                        |                      |  |  |  |
| S1 = 25ns S2 = 25ns S3 = 50ns                                                                                                 |                                                                                                                                                       |                                                                        |                      |  |  |  |
| T:                                                                                                                            |                                                                                                                                                       |                                                                        |                      |  |  |  |
| The clock cyc                                                                                                                 | cle is t <sub>p</sub> = 50+5 = 55 ns (s                                                                                                               | slowest stage τ <sub>M</sub> =50ns )                                   | )                    |  |  |  |
| Although the pipeline has more stages, there is no speed improvement compared to case A, because $t_{\rm p}$ is still 55 ns . |                                                                                                                                                       |                                                                        |                      |  |  |  |
| Besides, the cost of the pipeline has increased.                                                                              |                                                                                                                                                       |                                                                        |                      |  |  |  |
| Also, the completion time of the first task has increased. $T(1) = k \cdot t_p$                                               |                                                                                                                                                       |                                                                        |                      |  |  |  |
| http://akademi.itu.edu<br>http:// www.buzluca.ir                                                                              |                                                                                                                                                       | CO O O O 2013 - 201                                                    | 21 Feza BUZLUCA 2.11 |  |  |  |



| Computer Architecture License: https://creativecommons.org/licenses/by-nc-nd/4.0/        |                                                                                                                                                                                           |                             |                       |  |  |  |  |  |
|------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------|-----------------------|--|--|--|--|--|
| 2.4 Instruction P                                                                        | ipeline (Instructi                                                                                                                                                                        | ion-Level Paralle           | lism)                 |  |  |  |  |  |
| During the execution of e                                                                | During the execution of each instruction the CPU repeats some operations.                                                                                                                 |                             |                       |  |  |  |  |  |
| The processing required f                                                                | or a single instruc                                                                                                                                                                       | ction is called an <b>i</b> | nstruction cycle.     |  |  |  |  |  |
| An instruction cycle is gen<br>decoding, operand fetch, e                                |                                                                                                                                                                                           |                             |                       |  |  |  |  |  |
| The simplest instruction p                                                               | ipeline can be cor                                                                                                                                                                        | istructed with tw           | o stages:             |  |  |  |  |  |
| 1) Fetch and decode inst                                                                 | truction 2) Fet                                                                                                                                                                           | ch operands and e           | execute instruction   |  |  |  |  |  |
| instruction, this time can                                                               | When the main memory is not being accessed during the execution of an instruction, this time can be used to fetch the next instruction in parallel with the execution of the current one. |                             |                       |  |  |  |  |  |
| Example:                                                                                 |                                                                                                                                                                                           |                             |                       |  |  |  |  |  |
| Cycle: 1                                                                                 | 2                                                                                                                                                                                         | 3                           | 4                     |  |  |  |  |  |
| Instr. 1 Fetch, decode                                                                   | Operand, exec.                                                                                                                                                                            |                             | 1                     |  |  |  |  |  |
| Instr. 2                                                                                 | Fetch, decode                                                                                                                                                                             | Operand, exec.              |                       |  |  |  |  |  |
| Instr. 3                                                                                 | Instr. 3 Fetch, decode Operand, exec.                                                                                                                                                     |                             |                       |  |  |  |  |  |
| The potential overlap among instructions is called <b>instruction-level</b> parallelism. |                                                                                                                                                                                           |                             |                       |  |  |  |  |  |
| Remember: To gain more s<br>durations.                                                   | speedup, the pipel                                                                                                                                                                        | ine must have mo            | re stages with short  |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                        |                                                                                                                                                                                           | 000 2013 - 2                | 021 Feza BUZLUCA 2.13 |  |  |  |  |  |

| Computer Architecture                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| <ul> <li>Instruction Pipeline (cont'd)</li> <li>The instruction cycle can be decomposed into 6 operations to gain more speedup:</li> <li>1. Fetch instruction (FI): Read the next expected instruction into a buffer.</li> <li>2. Decode instruction (DI): Determine the opcode and the operand specifiers.</li> <li>3. Calculate addresses of operands (CO): Calculate the effective address.</li> <li>4. Fetch operands (FO): Fetch each operand from memory.</li> <li>5. Execute instruction (EI): Perform the indicated operation.</li> <li>6. Write operand (WO): Store the result in memory.</li> </ul> |
| <ul> <li>Such fine-grained decomposition may not significantly increase the performance because of the following problems :</li> <li>The various stages will be of different durations (unbalanced).</li> <li>Some instructions do not need all stages.</li> <li>Different segments may need the same resources (e.g., memory) at the same time.</li> </ul>                                                                                                                                                                                                                                                   |
| Therefore, some operations can be combined into the same stage so that a pipeline with fewer (for example 4 or 5), balanced stages is constructed. For example, the 80486 had 5 stages.                                                                                                                                                                                                                                                                                                                                                                                                                       |
| There are also processors that include instruction pipelines with more stages.<br>For example, Pentium 4 family processors have a pipeline with 20 stages. In these<br>processors, internal operations are decomposed into microoperations.                                                                                                                                                                                                                                                                                                                                                                   |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         |

# 2.4.1 An (exemplary) instruction pipeline (with 4 stages) 1. FI (Fetch Instruction): Read the next instruction the PC points to into a buffer. 2. DA (Decode, Address): Decode instruction, calculate operand addresses 3. FO (Fetch Operand): Read operands (memory/register) 4. EX (Execution): Perform the operation and update the registers (including the PC in branch/jump instructions) • In order to perform instruction and operand fetch operations at the same time, we assume that the processor has separate instruction and data memories. • Memory-write operations are ignored in these examples. • This an exemplary pipelined CPU. More realistic examples are given in section "2.4.2 An Exemplary RISC Processor with Pipelining". http://akademi.itu.edu.tr/en/buzluca/ 2013 - 2021 Feza BUZLUCA 2.15 http:// www.buzluca.info

| Computer Architecture                                                                                                                                                                                                                                                             |          |       |        |        |       |          |        |                                                 |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|-------|--------|--------|-------|----------|--------|-------------------------------------------------|
| 2.4.:                                                                                                                                                                                                                                                                             | l An     | (exer | nplar  | y) ins | truct | ion p    | ipelin | e (cont'd)                                      |
| A) Ideal Case:                                                                                                                                                                                                                                                                    | No bi    | ranch | es, no | o oper | and c | lepen    | denci  | es in the program                               |
| Timing diagram                                                                                                                                                                                                                                                                    | for t    | he e  | kempl  | lary i | nstru | ction    | pipel  | ine (ideal case):                               |
| Clock cycles<br>Instructions (Tasks)                                                                                                                                                                                                                                              | 1        | 2     | 3      | 4      | 5     | 6        | 7      | The first instruction has been completed.       |
| 1                                                                                                                                                                                                                                                                                 | FI       | DA    | FO     | EX₄    |       |          |        | 4 cycles<br>The pipeline is full.               |
| 2                                                                                                                                                                                                                                                                                 |          | FI    | DA     | FO     | EX    |          |        |                                                 |
| 3                                                                                                                                                                                                                                                                                 |          |       | FI     | DA     | FO    | EX       |        | After just one cycle,<br>the second instruction |
| 4                                                                                                                                                                                                                                                                                 |          |       |        | FI     | DA    | FO       | EX     | has been completed.                             |
| The first instruction was completed in 4 cycles (k=4).<br>After the 4 <sup>th</sup> cycle, a new instruction is completed in each cycle.<br>If the number of instructions approaches infinity, the completion time of an<br>instruction approaches 1 cycle (slide 2.9 "Speedup"). |          |       |        |        |       |          |        |                                                 |
| http://akademi.itu.edu.tr/en<br>http:// www.buzluca.info                                                                                                                                                                                                                          | /buzluca | /     |        |        |       | $\Theta$ |        | 2013 - 2021 Feza BUZLUCA 2.16                   |

| Computer Architecture                                                         |        |        |       |          |        |                                                       |
|-------------------------------------------------------------------------------|--------|--------|-------|----------|--------|-------------------------------------------------------|
| 2.4.1 An (exemp<br>B) Pipeline Hazards (Conflic<br>B.1 Data Conflict (Operand | ts)    |        |       | n pipe   | eline  | (cont'd)                                              |
| The operand of an instruction<br>Example :                                    | on dep | pends  | on th | ne res   | ult o  |                                                       |
| Clock cycles<br>Instructions                                                  | 1      | 2      | 3     | 4        | 5      | R2 is updated.                                        |
| ADD R1, <b>R2</b> ( <b>R2</b> ← R1+R2)                                        | FI     | DA     | FO    | EX       |        | Operand                                               |
| SUB <b>R2</b> , R3 (R3 ← <b>R2</b> -R3)                                       |        | FI     | DA    | F0,      | EX     | dependency                                            |
|                                                                               |        |        |       |          |        | Previous value (not valid)<br>of R2 is being fetched. |
| To prevent the program from applied.                                          | n run  | ning i | ncorr | ectly    | , a so | lution mechanism must be                              |
| For example: The pipeline co<br>instructions can be inserted                  |        | stopp  | ed (s | tall), ( | or N(  | DOP (No Operation)                                    |
| We will discuss possible solu<br>and Solutions".                              | tions  | in th  | e sec | tion "   | 2.5 P  | ipeline Hazards (Conflicts)                           |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info             |        |        | 6     |          |        | 2013 - 2021 Feza BUZLUCA 2.17                         |

| Computer Architecture                                                                                                                                                                                                                                            |                                                                                           |  |  |  |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------------------------|--|--|--|--|
|                                                                                                                                                                                                                                                                  | ary) instruction pipeline (cont'd)                                                        |  |  |  |  |
| <b>B.2 Control Hazards (Branches, Interrupts):</b><br>Since a pipeline processes instructions in parallel, during the processing of a branch instruction, the next instruction in the memory that should be actually skipped also enters the pipeline.           |                                                                                           |  |  |  |  |
| Here, a solution mechanism is necessary; otherwise, the instruction(s) that should be skipped according to the program will also be executed.                                                                                                                    |                                                                                           |  |  |  |  |
| Example:                                                                                                                                                                                                                                                         |                                                                                           |  |  |  |  |
| - 2                                                                                                                                                                                                                                                              | Jnconditional branch (or jump) instruction (BRA / JUMP)                                   |  |  |  |  |
| 2. JUMP Target<br>3. Instruction_3<                                                                                                                                                                                                                              | Next instruction in the memory<br>According to the program, it <b>should be skipped</b> . |  |  |  |  |
| 4. Target Instruction_4 <                                                                                                                                                                                                                                        | Target of the branch (target instruction)                                                 |  |  |  |  |
| During the processing of the unconditional branch instruction JUMP, Instruction_3<br>is also fetched into the pipeline.<br>To prevent the program from running incorrectly, the pipeline must be stopped<br>(stall) or emptied before Instruction_3 is executed. |                                                                                           |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                                                                                                                                                                                                | 2013 - 2021 Feza BUZLUCA 2.18                                                             |  |  |  |  |

| Computer Architectu                                                                                                                                           | re             |                |    | L            | icense                             | : <u>https:</u> | //creativ | ecommons.org/licenses/by-nc-nd/4.0/                                      |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------|----------------|----|--------------|------------------------------------|-----------------|-----------|--------------------------------------------------------------------------|
| a. Unconditiona                                                                                                                                               | ıl Bra         | nch            |    | Step         | s                                  |                 |           | After decoding, the<br>type of the instruction<br>is determined: branch! |
| Clock cycles<br>Instructions                                                                                                                                  | 1              | 2              | 3  | 4            | 5                                  | 6               | 7         | ii                                                                       |
| Instruction 1                                                                                                                                                 | FI             | DA             | FO | EX           |                                    |                 |           | The branch address is<br>fetched (absolute or                            |
| Instruction 2<br>JUMP                                                                                                                                         |                | FI             | DA | FO           | EX.                                |                 |           | relative).                                                               |
| Instruction 3                                                                                                                                                 |                |                | K  | I K          | -                                  |                 |           | <pre> Updating the PC (program counter)</pre>                            |
| Target Instr. 4                                                                                                                                               |                |                |    |              |                                    | FI₄             | DA        | PC = Target<br>(Target of branch)                                        |
| Hazard: This<br>fetched unnec<br>It must not be<br>It will (must)                                                                                             | essar<br>e exe | rily.<br>cuted | •  | It i<br>to s | nch p<br>s nec<br>stall o<br>pipel | essar<br>r emj  | y         | The new instruction<br>after branch operation<br>(Target of branch)      |
| After decoding (identification) of the unconditional branch instruction, one possible solution is to stop the "Fetch Instruction" stage (FI) of the pipeline. |                |                |    |              |                                    |                 |           |                                                                          |
|                                                                                                                                                               |                |                |    |              |                                    |                 |           | arget address is written to<br>I to fetch new instructions.              |
| http://akademi.itu.edu.tr/<br>http:// www.buzluca.info                                                                                                        | en/buzlu       | ca/            |    |              |                                    | 6               |           | 2013 - 2021 Feza BUZLUCA 2.19                                            |

| Computer Architectur                                                                                                                                                                                                                                                                                                                     | e        |      |     |     |     |         |                                                       |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------|------|-----|-----|-----|---------|-------------------------------------------------------|
| <ul> <li>b. Conditional Branch:</li> <li>For a conditional branch instruction, there are two cases:</li> <li>1. condition is false (branch is not taken), 2. condition is true (branch is taken)</li> </ul>                                                                                                                              |          |      |     |     |     |         |                                                       |
| <b>b1.</b> Conditional Branch (if the condition is false):<br>If the condition is not true, it is not necessary to stop or empty the pipeline<br>because the execution will continue with the next instruction.                                                                                                                          |          |      |     |     |     |         |                                                       |
| Clock cycles<br>Instructions                                                                                                                                                                                                                                                                                                             | 1        | 2    | 3   | 4   | 5   | 6       | The previous instruction sets the conditions (flags). |
| Instruction 1                                                                                                                                                                                                                                                                                                                            | FI       | DA   | FO  | EX+ |     |         | PC is not changed. No branching.                      |
| Conditional bra. 2                                                                                                                                                                                                                                                                                                                       |          | FI   | DA  | FO  | EX◀ |         | ······································                |
| Instruction 3                                                                                                                                                                                                                                                                                                                            |          |      | ,FI | DĄ  | FO  | EX∢     | The instruction following the branch is executed.     |
| Without considering the condition,<br>next instruction is fetched.No need to empty<br>No branch penalty                                                                                                                                                                                                                                  |          |      |     |     |     |         |                                                       |
| <ul> <li>Here, the problem is that the previous instruction must be executed to determine if the condition is true or not (depends on the flags of the CPU).</li> <li>If condition is false (branch is not taken), there is no branch penalty.</li> <li>If condition is true, a solution mechanism is necessary (next slide).</li> </ul> |          |      |     |     |     |         |                                                       |
| http://akademi.itu.edu.tr/e<br>http:// www.buzluca.info                                                                                                                                                                                                                                                                                  | en/buzlu | ica/ |     |     |     | $\odot$ | 2013 - 2021 Feza BUZLUCA 2.20                         |



| Computer Architect                                                                                                        | ture                                                              |                                                                                                     |
|---------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| <ul> <li>Instruction</li> <li>This simplimetry</li> <li>Most instruction</li> <li>memory-to</li> <li>There are</li> </ul> | ns are fixed-length<br>fies fetch and deco<br>uctions are registe |                                                                                                     |
| • ADD<br>ADD                                                                                                              | Rs1,Rs2,Rd<br>R3, R4, R12                                         |                                                                                                     |
| • ADD<br>ADD                                                                                                              | Rs,S2,Rd<br>R1, #\$1A, R2                                         | $Rd \leftarrow Rs + S2$ (S2: immediate data)<br>$R2 \leftarrow R1 + \$1A$                           |
| • LDL<br>LDL                                                                                                              | S2(Rs),Rd<br>\$500(R4), R5                                        | $Rd \leftarrow M[Rs + S2]$ Load long (32 bits)<br>$R5 \leftarrow M[R4 + $500]$                      |
| • STL<br>STL                                                                                                              | S2(Rs), Rm<br>\$504(R6), R7                                       | $M[Rs + S2] \leftarrow Rm$ Store long (32 bits)<br>$M[R6 + $504] \leftarrow R7$                     |
| • BRU<br>BRU                                                                                                              | Y<br>\$0A                                                         | $PC \leftarrow PC + Y$ Unconditional branch $PC \leftarrow PC + $0A$ Branch relative (Y: Offset)    |
| <ul> <li>Bcc<br/>BGT</li> </ul>                                                                                           | Y<br>\$0A                                                         | If (cc) then $PC \leftarrow PC + Y$ Conditional branch<br>If greater, then $PC \leftarrow PC + $0A$ |
| http://akademi.itu.edu.<br>http:// www.buzluca.inf                                                                        |                                                                   | 2013 - 2021 Feza BUZLUCA 2.22                                                                       |







| Computer Architecture                                                                                                                                                                                                                                                                                                                                                                                                                          |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Pipelined RISC Alternatives                                                                                                                                                                                                                                                                                                                                                                                                                    |
| <ul> <li>There are different ways of designing pipelined RISC processors.</li> <li>For example;</li> <li>ARM7 has 3 stages <ul> <li>IF: Instruction fetch;</li> <li>DR: Decode and read registers;</li> <li>EX: ALU Operation; access memory (if necessary), write the result to the registers</li> </ul> </li> <li>MIPS R3000 has 5 stages</li> <li>MIPS R4000 has 8 stages (superpipelined)</li> <li>ARM Cortex-A8 has 13 stages.</li> </ul> |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                                                                                                                                                                                                                                                          |

| An Exemplary 5-Stage RISC Pipeline                                                                          |
|-------------------------------------------------------------------------------------------------------------|
| In this course, to explain the concepts, we will use an exemplary five-stage RISC load-store architecture : |
| 1. Instruction fetch (IF):                                                                                  |
| Get instruction from memory, increment PC (depending on the instruction length).                            |
| If instruction length is 4 bytes, PC $\leftarrow$ PC + 4.                                                   |
| 2. Instruction Decode, Read registers (DR)                                                                  |
| Translate opcode into control signals and read registers (operands).                                        |
| 3. Execute (EX)                                                                                             |
| Perform ALU operation, compute jump/branch targets.                                                         |
| 4. Memory (ME)                                                                                              |
| Access memory if needed (only load/store instructions).                                                     |
| 5. Write back (WB)                                                                                          |
| Update register file (write results).                                                                       |
|                                                                                                             |
| http://akademi.itu.edu.tr/en/buzluca/                                                                       |







| Computer Architecture                                                                                                                                                                 | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                                  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| Stage                                                                                                                                                                                 | 3: Execute (EX)                                                                                                              |
| <ul> <li>Read the control bits and<br/>register (DR/EX).</li> </ul>                                                                                                                   | l data (offset/immediate, RA, RB) from the pipeline                                                                          |
|                                                                                                                                                                                       | on.<br>memory addresses for LOAD/STORE instructions.<br>(R4), R5 $R5 \leftarrow M[R4 + $500]$                                |
| • Compute target addresse                                                                                                                                                             | 00 is added to the contents of R4 in the ALU.<br>is for the branch instructions<br>If greater, then $PC \leftarrow PC + $0A$ |
| •                                                                                                                                                                                     | essor, an additional adder is used for target address                                                                        |
| <ul> <li>Decide if the jump/branc<br/>ALU are used)</li> </ul>                                                                                                                        | h should be taken (control bits and flags from the                                                                           |
| <ul> <li>Write the following data         <ul> <li>Control bits</li> <li>Result of the ALU (D) o</li> <li>RB for memory store o</li> <li>Branch target address</li> </ul> </li> </ul> |                                                                                                                              |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                                                                                                                     | 2013 - 2021 Feza BUZLUCA 2.31                                                                                                |







| Computer Architecture                                       |                                                           |       |       |        |        |       |                |        |                                           |
|-------------------------------------------------------------|-----------------------------------------------------------|-------|-------|--------|--------|-------|----------------|--------|-------------------------------------------|
| Timing diagram                                              | for                                                       | the   | exen  | nplar  | y RI   | SC p  | ipelir         | ne (ic | deal case):                               |
| Ideal Case: No b                                            | oranc                                                     | hes,  | no co | onflic | ts     |       |                |        |                                           |
| Clock cycles<br>Instructions                                | 1                                                         | 2     | 3     | 4      | 5      | 6     | 7              | 8      | The first instruction has been completed. |
| 1                                                           | IF                                                        | DR    | ΕX    | ME     | WB.    | (     |                |        | 5 cycles<br>Pipeline is full.             |
| 2                                                           |                                                           | IF    | DR    | EX     | ME     | WB    |                |        | Just after one cycle, the                 |
| 3                                                           |                                                           |       | IF    | DR     | EX     | ME    | WB             |        | second instruction has                    |
| 4                                                           |                                                           |       |       | IF     | DR     | ΕX    | ME             | WB     | been completed.                           |
| The first instruc                                           | tion v                                                    | vas c | ompl  | eted   | in 5   | cycle | <b>s (</b> k : | = 5).  |                                           |
| After the 5 <sup>th</sup> cyc                               | le, a                                                     | new   | instr | uctio  | n is c | ompl  | eted           | in ea  | ach cycle.                                |
|                                                             |                                                           |       |       |        |        |       |                |        | completion time of an                     |
| instruction appro                                           | instruction approaches 1 cycle (see slide 2.9 "Speedup"). |       |       |        |        |       |                |        |                                           |
| IF and ME stages try to access the memory at the same time. |                                                           |       |       |        |        |       |                |        |                                           |
| To solve the reso<br>data are used (Ho                      |                                                           |       |       |        |        | eparc | ite rr         | iemoi  | ries for instruction and                  |
| http://akademi.itu.edu.tr/en<br>http:// www.buzluca.info    | /buzluca                                                  | a/    |       |        |        | 6     |                |        | 2013 - 2021 Feza BUZLUCA 2.35             |

| Computer Architecture                                                                                                                                                                                                       |  |  |  |  |  |  |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|--|
| 2.5 Pipeline Hazards (Conflicts) and Solutions                                                                                                                                                                              |  |  |  |  |  |  |
| There are three types of hazards                                                                                                                                                                                            |  |  |  |  |  |  |
| 1. Resource Conflict (Structural hazard):                                                                                                                                                                                   |  |  |  |  |  |  |
| A resource hazard occurs when two (or more) instructions that are already in the pipeline need the same resource (memory, functional unit).                                                                                 |  |  |  |  |  |  |
| 2. Data Conflict (Hazard)                                                                                                                                                                                                   |  |  |  |  |  |  |
| Data hazards occur when data is used before it is ready.                                                                                                                                                                    |  |  |  |  |  |  |
| 3. Control Hazards (Branch, Jump, Interrupt):                                                                                                                                                                               |  |  |  |  |  |  |
| During the processing of a branch instruction, the next instruction in the memory that should actually be skipped also enters the pipeline.                                                                                 |  |  |  |  |  |  |
| Which <b>target instruction</b> should be fetched into the pipeline is unknown, unless the CPU executes the branch instruction (updating the PC).                                                                           |  |  |  |  |  |  |
| <b>Conditional branch</b> problem: Until the actual execution of the instruction that alters the flag values, the flag values are unknown (greater?, equal?), so it is impossible to determine if the branch will be taken. |  |  |  |  |  |  |
| Stalling solves all these conflicts but it reduces the performance of the system.                                                                                                                                           |  |  |  |  |  |  |
| There are more efficient solutions.                                                                                                                                                                                         |  |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                                       |  |  |  |  |  |  |

|                                                                                    | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                                                                   |
|------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Computer Architecture                                                              |                                                                                                                                                               |
| 2.5.1. Resource Conflict (S                                                        | itructural hazard):                                                                                                                                           |
| A resource hazard occurs when<br>pipeline need the same resource                   | n two (or more) instructions that are already in the ce (memory, functional unit).                                                                            |
| <ul> <li>a) Memory conflict: "An operative performed in parallel with a</li> </ul> | nd read to or write from memory cannot be<br>an instruction fetch."                                                                                           |
| Solutions:                                                                         |                                                                                                                                                               |
| • Instructions must be executive the pipeline (stall). (Perform                    | ted serially rather than in parallel for a portion of<br>nance drops.)                                                                                        |
| • Harvard architecture: Sepa                                                       | rate memories for instructions and data.                                                                                                                      |
| an instruction when main me                                                        | memory: There are times during the execution of<br>mory is not being accessed. This time could be used<br>ction and write it to a queue (instruction buffer). |
| b) Functional unit (ALU, FPU) a<br>Solutions:                                      | conflict.                                                                                                                                                     |
| <ul> <li>Increasing available function</li> </ul>                                  | nal units and using multiple ALUs.                                                                                                                            |
| 2                                                                                  | an be used address calculation and data operations.                                                                                                           |
| •                                                                                  | unit (for example, a floating point unit FPU)                                                                                                                 |
|                                                                                    |                                                                                                                                                               |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                  | 2013 - 2021 Feza BUZLUCA 2.37                                                                                                                                 |
| http:// www.bizinca.into                                                           | BY NO NO                                                                                                                                                      |

| Computer Architecture                                                                                                                                  |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------|--|--|--|--|--|--|--|--|--|--|--|--|
| 2.5.2. Data Conflict (Hazard):                                                                                                                         | rad hafana it is naadu                 |  |  |  |  |  |  |  |  |  |  |  |  |
| Data hazards occur when data is use                                                                                                                    | •                                      |  |  |  |  |  |  |  |  |  |  |  |  |
| because of the use of pipelining.                                                                                                                      | rogram may produce an incorrect result |  |  |  |  |  |  |  |  |  |  |  |  |
| Example:                                                                                                                                               |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| ADD R1, R2, <b>R3</b> R3 ← R1 + R2                                                                                                                     |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| SUB <b>R3</b> , R4, R5 $R5 \leftarrow R3 - R4$<br>written to the                                                                                       |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| Data dependency in the pipeline register file (R3).                                                                                                    |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| Clock cycles<br>Instructions 1 2                                                                                                                       | 2 3 4 5 6                              |  |  |  |  |  |  |  |  |  |  |  |  |
| ADD R1,R2,R3 IF D                                                                                                                                      | DR EX ME WB <sup>4</sup>               |  |  |  |  |  |  |  |  |  |  |  |  |
| SUB <b>R3</b> ,R4,R5 I                                                                                                                                 | IF DR EX ME WB                         |  |  |  |  |  |  |  |  |  |  |  |  |
|                                                                                                                                                        |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| SUB reads R3 before it has been updated.<br>R3 does not contain the result of the<br>previous ADD instruction; it has not been<br>processed in WB yet. |                                        |  |  |  |  |  |  |  |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                                                                                      | 2013 - 2021 Feza BUZLUCA 2.38          |  |  |  |  |  |  |  |  |  |  |  |  |

2.5.2. Data Conflict (cont'd):

There are three types of data hazards:

• Read after write (RAW), or true dependency: An instruction modifies a register or memory location, and a succeeding instruction reads the data in that memory or register location.

A hazard occurs if the read takes place before the write operation is complete.

• Write after read (WAR), or antidependency: An instruction reads a register or memory location, and a succeeding instruction writes to the location.

A hazard occurs if the write operation completes before the read operation takes place.

• Write after write (WAW), or output dependency: Two instructions both write to the same location.

A hazard occurs if the write operations take place in the reverse order of the intended sequence.

http://akademi.itu.edu.tr/en/buzluca/ http:// www.buzluca.info

2013 - 2021 Feza BUZLUCA

2.39

| Computer                        | Architecture                           |       |         |       |         |                  |       |       |        |       |                                         |
|---------------------------------|----------------------------------------|-------|---------|-------|---------|------------------|-------|-------|--------|-------|-----------------------------------------|
|                                 | Solut                                  |       |         |       |         |                  | -     |       |        |       |                                         |
| A) Stall                        | ling, Hardware                         | inte  | erloc   | k (H  | ardw    | are-l            | oase  | d sol | utio   | n):   |                                         |
| register                        |                                        |       |         |       |         |                  |       |       |        |       | s) in the pipeline<br>the pipeline when |
| The inst<br>is solved           |                                        | lses  | the     | hazo  | ard is  | s dela           | ayed  | (not  | fet    | ched  | l) until the conflict                   |
| Example                         | Clock cycles<br>Instructions           | 1     | 2       | 3     | 4       | 5                | 6     | 7     | 8      | 9     | First write to R3,                      |
|                                 | ADD R1,R2, <b>R3</b>                   | IF    | DR      | ΕX    | ME      | WB⁴              |       |       |        |       | then read it.<br>Write and read         |
|                                 | SUB <b>R3</b> ,R4,R5                   |       | IF<br>7 | -     | -       | -                | DR≮   | ĒΧ    | ME     | WB    | in different clock cycles.              |
|                                 |                                        |       |         |       | Ŷ       |                  |       |       |        |       | L                                       |
|                                 | ta conflict is dete<br>DR.Rs1 == DR/EX |       |         |       |         | stalle<br>cles c |       |       |        |       |                                         |
| Stalling t                      | he pipeline:                           |       |         |       |         |                  |       |       |        |       |                                         |
| • IF/DR re                      | egister is disable                     | ed (I | no up   | odate | 2).     |                  |       |       |        |       |                                         |
| • Control                       | bits of the NOC                        | P (   | No O    | perat | tion) i | instru           | uctio | n ar  | e ins  | serte | d into the DR stage                     |
|                                 | is not updated.                        |       | ·       |       | -       |                  |       |       |        |       | -                                       |
| http://akadem<br>http:// www.bi | ni.itu.edu.tr/en/buzluca/              |       |         |       |         | 6                | ) 🛈 🕄 | Θ     | 2013 - | 2021  | Feza BUZLUCA 2.40                       |



### Solutions to Data Hazards (cont'd):

#### Fixing the register file access hazard:

The register file can be accessed in the same cycle for reading and writing. Data can be written in the first half of the cycle (rising edge) and read in the second half (falling edge).

This method reduces the waiting (stalling) time from 3 cycles to 2 cycles.





| Computer Architectur                           | e                                                                                                                                                                                                                                                                                                                                                                                     |             | License        | e: <u>https:</u> | //creati        | vecomr        | mons.org/licenses/by-nc-nd/4.0/                                                                  |  |  |  |  |  |
|------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------|----------------|------------------|-----------------|---------------|--------------------------------------------------------------------------------------------------|--|--|--|--|--|
| the same registe                               | nit detects that<br>er as the source<br>varded result (b                                                                                                                                                                                                                                                                                                                              | the<br>of t | desti<br>he cu | nation<br>rrent  | n of t<br>• ALU | he pr<br>oper | cont'd <b>):</b><br>evious ALU operation is<br>ation, the control logic<br>rather than the value |  |  |  |  |  |
| Example:<br><u>I</u>                           | Clock cycles                                                                                                                                                                                                                                                                                                                                                                          | 1           | 2              | 3                | 4               | 5             |                                                                                                  |  |  |  |  |  |
| ADD R1, R2, <b>R3</b> ;                        | R3←R1 + R2                                                                                                                                                                                                                                                                                                                                                                            | IF          | DR             | ΕX               | ME              | WB            |                                                                                                  |  |  |  |  |  |
| SUB <b>R3</b> , R4, R5; R5←R3 - R4 IF DR EX ME |                                                                                                                                                                                                                                                                                                                                                                                       |             |                |                  |                 |               |                                                                                                  |  |  |  |  |  |
|                                                |                                                                                                                                                                                                                                                                                                                                                                                       |             |                | -                |                 |               |                                                                                                  |  |  |  |  |  |
| fetch<br>This i                                | Previous value (not valid) of R3 is<br>fetched.<br>This invalid value will <b>not be used</b><br>in the EX cycle.<br>This invalid value will <b>not be used</b><br>in the EX cycle.<br>The control unit of the pipeline<br>selects the output of the previous<br>ALU operation ( <i>bypass</i> ) as the<br>input, not the value that has been<br>read in the DR stage (A_select = 0). |             |                |                  |                 |               |                                                                                                  |  |  |  |  |  |
| to stall the pip                               | If it is possible to solve the register conflict by forwarding, it is not necessary to stall the pipeline.<br>The performance does not drop.                                                                                                                                                                                                                                          |             |                |                  |                 |               |                                                                                                  |  |  |  |  |  |
| http://akademi.itu.edu.tr/e                    |                                                                                                                                                                                                                                                                                                                                                                                       | r •         |                | 6                |                 | <b>2</b> 01   | 3 - 2021 Feza BUZLUCA 2.43                                                                       |  |  |  |  |  |





| <b>Solution</b><br>Example: | with forwarding                                                            | <b>)</b> + : | 1 су  | cle s | talli | ng      |                |               |                                                                                                |
|-----------------------------|----------------------------------------------------------------------------|--------------|-------|-------|-------|---------|----------------|---------------|------------------------------------------------------------------------------------------------|
|                             | Sol                                                                        | utio         | n wit | th fo | orwa  | rding   | ) (+s          | talli         | ng)                                                                                            |
|                             | Clock cycles<br>Instructions                                               | 1            | 2     | 3     | 4     | 5       | 6              | 7             |                                                                                                |
| LDL                         | \$500(R4), R1                                                              | IF           | DR    | ΕX    | MĘ    | WB      |                |               |                                                                                                |
| ADD                         | R1, R2, R3                                                                 |              | IF    | -     | DR    | ĒΧ      | ME             | WB            |                                                                                                |
|                             |                                                                            | '            |       |       |       |         |                |               | ·                                                                                              |
| R<br>TI                     | he previous value<br>1 is fetched.<br>his invalid value v<br>the EX cycle. |              |       | -     |       | s:<br>t | elect<br>ne in | rs th<br>put, | ol unit of the pipeline<br>e forwarding path as<br>not the value that<br>read in the DR stage. |
|                             |                                                                            |              |       |       |       |         |                |               |                                                                                                |

Solutions to Data Hazards (cont'd):

C) Inserting NOOP (No Operation) instructions (Software-based):

The effect of this solution is similar to stalling the pipeline.

The  ${\bf compiler}$  inserts NOOP instructions between the instructions that cause the data hazard.

Example:

|      | Clock cycles<br>Instructions         | 1  | 2  | 3  | 4  | 5           | 6    | 7    | 8     |                                        |
|------|--------------------------------------|----|----|----|----|-------------|------|------|-------|----------------------------------------|
|      | ADD R1,R2, <b>R3</b>                 | IF | DR | ΕX | ME | <u>,</u> WB |      |      |       | First write to R3                      |
| Inse | rted by NOOP                         |    | IF | DR | ΕX | ME          | WB   |      |       | in the first half, then read it in the |
|      | compiler NOOP                        |    |    | IF | DR | EX          | ME   | ₩₿   |       | second half.                           |
|      | SUB <b>R3</b> ,R4,R5                 |    |    |    | IF | DR          | ÉX   | ME   | WB    |                                        |
|      | e NOOP is a ma<br>pipeline just like |    |    |    |    | iction      | of t | he p | roces | ssor, it is processed in               |

The performance drops because of the delay caused by the NOOP instructions.

| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info | CO CO CO 2013 - 2021 Feza BUZLUCA 2.47 |
|-------------------------------------------------------------------|----------------------------------------|
|                                                                   |                                        |

| Computer Archit                                                                                                                                                                                                                                                                 | ecture                                                                   |     |                      |        |              |         |                 |        |        |                  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------|-----|----------------------|--------|--------------|---------|-----------------|--------|--------|------------------|--|
| D) Optimize                                                                                                                                                                                                                                                                     | Solutions<br>ed Solution (Soft                                           |     |                      |        | <b>ds</b> (c | ont'd   | ):              |        |        |                  |  |
|                                                                                                                                                                                                                                                                                 | <b>er</b> rearranges the<br>e instructions tha                           |     |                      |        |              |         | ain in          | stru   | ction  | s (if possible)  |  |
| This rearrai                                                                                                                                                                                                                                                                    | This rearrangement must not change the algorithm or cause new conflicts. |     |                      |        |              |         |                 |        |        |                  |  |
| Example:<br>STL \$00(R6), R1 $M[R6 + $00] \leftarrow R1$<br>STL \$04(R6), R2 $M[R6 + $04] \leftarrow R2$<br>ADD R1, R2, <b>R3</b> $R3 \leftarrow R1 + R2$<br>SUB <b>R3</b> , R4, R5 $R5 \leftarrow R3 - R4$<br>Write to<br>R3 in the<br>first half,<br>read it in<br>the second |                                                                          |     |                      |        |              |         |                 |        |        |                  |  |
|                                                                                                                                                                                                                                                                                 | Clock cycles<br>Instructions                                             | 1   | 2                    | 3      | 4            | 5       | 6               | 7      | .8     | the second half. |  |
|                                                                                                                                                                                                                                                                                 | ADD R1,R2, <b>R3</b>                                                     | IF  | DR                   | EX     | ME           | WB∠     |                 | 1      | 1      |                  |  |
| Moved by                                                                                                                                                                                                                                                                        | STL \$00(R6), R1                                                         |     | IF                   | DR     | ΕX           | ME      | WB <sub>.</sub> | ,í     |        |                  |  |
|                                                                                                                                                                                                                                                                                 | STL \$04(R6), R2                                                         |     |                      | IF     | DR           | ΕX      | MÉ              | WB     |        |                  |  |
|                                                                                                                                                                                                                                                                                 | SUB <b>R3</b> ,R4,R5                                                     |     |                      |        | IF           | DR      | EX              | ME     | WB     |                  |  |
| The perform                                                                                                                                                                                                                                                                     | ance is improved.                                                        |     |                      |        |              |         |                 |        |        |                  |  |
| There is no c                                                                                                                                                                                                                                                                   | lelay caused by N                                                        | OOF | <sup>&gt;</sup> inst | ructio | ons (o       | r stall | ling).          |        |        |                  |  |
| http://akademi.itu.eo<br>http:// www.buzluca                                                                                                                                                                                                                                    |                                                                          |     |                      |        |              |         | 2013 - 2        | 2021 F | eza BU | ZLUCA 2.48       |  |

| Computer Architecture                                                     | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                  |
|---------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------|
| 2.5.3. Control Hazards                                                    | (Branches, Interrupts):                                                                                      |
| In the exemplary RISC pro<br>branch/jump instructions:                    | ocessor, the following operations are performed for the                                                      |
| • The target address for<br>Execution (EX) stage (sl                      | the branch (jump) instruction is calculated in the ide 2.32).                                                |
| <ul> <li>The target address is w</li> </ul>                               | ritten to the EX/ME pipeline register.                                                                       |
|                                                                           | nade in the Memory (ME) stage based on the values of<br>ed after the execution in the EX stage (slide 2.32). |
| <ul> <li>After the EX stage, the<br/>address are sent to Stage</li> </ul> | result of the decision (PC_Select) and the target<br>ge 1 (IF).                                              |
| • In the IF stage, the nex<br>PC is updated (slide 2.29                   | (t instruction the PC points to is fetched first, then the<br>9).                                            |
| During these operations, th<br>branch) are fetched into tl                | ne next instructions in sequence (not the target of<br>he pipeline.                                          |
| However, if the branch is t                                               | aken, these instructions should be skipped.                                                                  |
| In this case, either a hard<br>solutions (delayed branch)                 | ware unit must empty the pipeline, or compiler-based<br>must be applied.                                     |
|                                                                           | ons must be stopped before they are processed in the isters of the CPU are changed in that stage.            |
| http://akademi.itu.edu.tr/en/buzluca/                                     | 2013 - 2021 Feza BUZLUCA 2.49                                                                                |

| Computer Architecture                                              |                                             |                 |             |                             |     |  |  |  |  |  |  |  |
|--------------------------------------------------------------------|---------------------------------------------|-----------------|-------------|-----------------------------|-----|--|--|--|--|--|--|--|
| Conditional Branch Hazards                                         | ;:                                          |                 |             |                             |     |  |  |  |  |  |  |  |
| Example:                                                           |                                             |                 |             |                             |     |  |  |  |  |  |  |  |
| 100 SUB R1, R2, R1<br><b>104 BGT \$1C</b><br>108 ADD R1, R1, R2    | $R1 \leftarrow R1 - R2$<br>Branch if greate | r (\$108 + \$1C | C = \$124 7 | Target addres               | ss) |  |  |  |  |  |  |  |
| 10C ADD R3, R4, R2<br>110 STL \$00(R5), R2<br>114 LDL \$0A(R6), R1 |                                             |                 |             |                             |     |  |  |  |  |  |  |  |
| 124 STL \$00(R6), R2 Target of BGT                                 |                                             |                 |             |                             |     |  |  |  |  |  |  |  |
| <b>Remember:</b> Bcc conditional by the last ALU operation.        | branch instructio                           | ons check the   | e flag valu | ies generate                | ed  |  |  |  |  |  |  |  |
| For example, the BGT instru<br>(Negative), V (Overflow), an        |                                             | nparison) che   | cks the f   | lags N                      |     |  |  |  |  |  |  |  |
| After the operation                                                | Overflow (V)                                | Sign (N)        | Zero (Z)    | Comparison                  |     |  |  |  |  |  |  |  |
|                                                                    | X (not important)                           | Positive (0)    | YES (1)     | A=B                         |     |  |  |  |  |  |  |  |
| $\mathbf{R} = \mathbf{A} - \mathbf{B}$                             | NO (0)                                      | Positive (0)    | NO (0)      | A>B                         |     |  |  |  |  |  |  |  |
| the table on the right is                                          | NO (0)                                      | Negative (1)    | NO (0)      | A <b< td=""><td>1</td></b<> | 1   |  |  |  |  |  |  |  |
| used to compare the                                                | YES (1)                                     | Positive (0)    | NO (0)      | A <b< td=""><td>1</td></b<> | 1   |  |  |  |  |  |  |  |
| signed integers.                                                   | YES (1)                                     | Negative (1)    | NO (0)      | A>B                         |     |  |  |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info  |                                             |                 | 3-2021 Feza | BUZLUCA 2.                  | .50 |  |  |  |  |  |  |  |

| The target addr<br>has been calculc<br>the EX/ME regi | ited in EX |           |            |    | is n   | nade                    | nch de<br>(Afte<br>le bra                                   | r EX     | ).          |                       | fror | n the      | e EX | ss is<br>/ME       |
|-------------------------------------------------------|------------|-----------|------------|----|--------|-------------------------|-------------------------------------------------------------|----------|-------------|-----------------------|------|------------|------|--------------------|
|                                                       | Instru     |           | <u></u>    | -  |        | <b></b>                 | ,<br>,                                                      |          | i.          |                       |      | 10 II<br>1 |      | 1 <b>90</b> .<br>1 |
|                                                       | SUB        | R1, R2,   | ΠÌ         | ĬF | DR     |                         | ME                                                          | .1       |             |                       |      |            |      | -                  |
| ·                                                     | BGT        | •         | <b>D</b> 0 |    | IF     | DR                      | EX⇒                                                         | <u>'</u> |             | <u></u>               |      | <b></b>    | Ļ    | -                  |
| These                                                 | ADD        | R1, R1,   | R2         |    |        | ÍF                      | DR                                                          |          | ME          |                       |      |            | ľ    |                    |
| instructions<br>should be                             | ADD        | R3, R4,   | R2         |    |        |                         | IF                                                          | DR       | ΈX          | ME                    | ₩B   | -          | ŀ    |                    |
| skipped.                                              | STL        | \$00(R5)  | , R2       |    |        |                         |                                                             | ĬĘ∍      | ÐR          | EX                    | ME   | WB         | ;    | 1                  |
| Tạrge                                                 | t: STL s   | \$00(R6), | R2         |    |        |                         |                                                             |          | ĮF          |                       |      |            | WВ   | 1                  |
|                                                       |            |           |            |    |        | <u> </u>                | ΎΥ<br>( <sup>1</sup> , ···································· |          | <del></del> |                       |      | •          |      | -                  |
| ompiler-based solution must be                        |            |           |            |    | l of † | dateo<br>he IF<br>Targe |                                                             |          |             | tar <u>a</u><br>3GT i |      |            |      | on                 |

| Conditional Bra<br>Example (cont'                     | d): If t   | he br   | anch is                                   | NO             | T ta  | ken   |                                         |          |          | he t<br>ent f                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |                               |                |       |  |  |
|-------------------------------------------------------|------------|---------|-------------------------------------------|----------------|-------|-------|-----------------------------------------|----------|----------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------|----------------|-------|--|--|
| The target addr<br>has been calculc<br>the EX/ME regi | ated in EX |         |                                           | 0              | is ma |       | h dec<br>Ifter<br>ch"                   |          | T<br>u   | egist<br>his c<br>sed,                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                         | addre<br>becc                 | ess i:<br>ause | s not |  |  |
|                                                       | Instru     | ictions |                                           | ί.             |       |       | , , , , , , , , , , , , , , , , , , , , |          | _; _; IS | 3 NO                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           | 1 10                          | ken.           |       |  |  |
|                                                       | SUB        | R1, R   | 2, R1                                     | ÎF             | DR.   | EX    | ME                                      | WB       |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | 11                            |                |       |  |  |
|                                                       | BGT        | \$1C    |                                           |                | IF    | DR    | EX≥                                     | ME       | WB       | de la constante de la constant | 1                             |                |       |  |  |
|                                                       | ADD        |         |                                           | IF             | DR    | ΕX    | ME                                      | ŴВ       |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                               |                |       |  |  |
|                                                       | ADD        | R3, R   | 4, R2                                     |                |       |       | IF                                      | DR       | ΈX       | ME                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | WB                            |                |       |  |  |
|                                                       | STL        | \$00(F  | 15), R2                                   |                |       |       |                                         | Ĭ₽       | DR       | EX                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | ME                            | WB             | 5     |  |  |
|                                                       | LDL        | \$0A(F  | 86), R1                                   |                |       |       |                                         |          | ĨŁ       | DR                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                             | ΕX                            | ME             | WB    |  |  |
|                                                       |            |         | end of $PC \leftarrow P$<br><b>Not th</b> | end of the lr. |       |       |                                         |          |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                | Next instruction in sequence. |                |       |  |  |
| If the branc                                          | h ia not   | takan   | thong                                     | ic no          | bna   | nch r | anal                                    | !<br>+\/ |          |                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                |                               |                |       |  |  |



| Computer Ar                           | chitecture                                                         |       |                                                                        |        |      |                 |       |                 |        |       |       |           |
|---------------------------------------|--------------------------------------------------------------------|-------|------------------------------------------------------------------------|--------|------|-----------------|-------|-----------------|--------|-------|-------|-----------|
| Reducing 1                            | the branch penalty (                                               | cont' | d):                                                                    |        |      |                 |       |                 |        |       |       |           |
| Conditional                           | <b>branch</b> (cont'd) : If                                        | fbrar | nch i                                                                  | s tak  | ken  |                 |       |                 |        |       |       |           |
| Example:                              | The target address (S<br>been calculated.<br>The branch decision h |       |                                                                        |        | ,    |                 |       | he to<br>ent to |        |       |       |           |
|                                       | Instructions                                                       |       |                                                                        |        |      |                 |       |                 |        |       |       |           |
|                                       | SUB R1, R2,                                                        | R1    | IF                                                                     | DR     | ĘΧ   | ME              | WB    | 1               | ľ      |       |       |           |
|                                       | BGT \$1C                                                           |       |                                                                        | IF     | DR   | ЕX              | ME    | WΒ              |        |       |       |           |
|                                       | uctions ADD R1, R1,                                                | R2    |                                                                        |        | ÍIF  | DRi             | EX    | ME              | WB     |       |       |           |
| should be sl                          | (ipped. ADD R3, R4,                                                | R2    |                                                                        |        |      | IF₩             | ÐR    | EX              | ME     | ₩B    |       |           |
| `` <b>`</b>                           | Target: STL \$00(R6),                                              | R2    |                                                                        | 7      |      |                 | ,IF ⊲ | DR              | ΕX     | ME    | WВ    |           |
|                                       |                                                                    |       |                                                                        |        |      | Y <sub>F,</sub> | ,     |                 |        |       |       |           |
| (emptied by                           | e must be stalled<br>hardware), or a<br>sed solution must be       | t     | The PC is updated at the end of the IF. PC $\leftarrow$ \$124 (Target) |        |      |                 |       |                 |        |       |       |           |
| In the cas                            | e of a stall, the <b>branc</b>                                     | :h pe | nalty                                                                  | / is i | 2 cy | cles            | for   | this            | exe    | mplo  | ary p | oipeline. |
| http://akademi.it<br>http:// www.buzl | tu.edu.tr/en/buzluca/<br>uca.info                                  |       |                                                                        | 6      |      |                 | 2013  | - 202           | I Feza | a BUZ | LUCA  | 2.54      |



| Computer Architecture                                                                                                                                                                                                                     |      |      |       |          |     |      |         |               |                                         |                   |   |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------|------|-------|----------|-----|------|---------|---------------|-----------------------------------------|-------------------|---|
| Reducing the branch penalty (co<br>Unconditional branch (cont'd):                                                                                                                                                                         | ont' | d):  |       |          |     |      |         |               |                                         |                   |   |
| Example:                                                                                                                                                                                                                                  |      |      |       |          |     |      |         |               |                                         |                   |   |
| The target address (\$1<br>been calculated.                                                                                                                                                                                               | 08 + | \$1C | = \$1 | 24) h    | as  |      |         |               |                                         | ress is<br>stage. |   |
| Instructions                                                                                                                                                                                                                              |      | · .  |       |          |     |      |         |               | ,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,,, |                   |   |
| SUB R1, R2, R                                                                                                                                                                                                                             | 1    | IF   | DR    | ΕX       | ME  | WB   |         |               |                                         |                   |   |
| BRU \$1C                                                                                                                                                                                                                                  |      |      | IF    | DR       | EX  | MÉ   | WB      |               |                                         |                   |   |
| Should be skipped. ADD R1, R1, R                                                                                                                                                                                                          | 2    |      |       | IFv      | ÐR  | EX   | ME      | ₩B            | l)                                      |                   |   |
| Target: STL \$00(R6), R                                                                                                                                                                                                                   | 2    |      | >     |          | IF∢ | DR   | ΕX      | ME            | WB                                      |                   |   |
|                                                                                                                                                                                                                                           |      |      |       | /        |     |      |         | · · · · · · · |                                         |                   |   |
| The pipeline must be stalled<br>(emptied by hardware), or a<br>compiler-based solution must be<br>applied.<br>The PC is updated at<br>the end of the IF.<br>$PC \leftarrow $124 (Target)$<br>The target instruction<br>of BRU is fetched. |      |      |       |          |     |      |         |               |                                         |                   |   |
| For the unconditional branch inst<br>moving the address calculation of                                                                                                                                                                    |      |      |       |          |     |      | lty i   | s 1           | cycl                                    | l <b>e</b> after  |   |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                                                                                                                                                                         |      |      | 6     | <u>.</u> | \$Ð | 2013 | 8 - 202 | 1 Feza        | a BUZI                                  | LUCA 2.5          | 6 |



| Computer Architecture                                                                                                                                                                                                     |                                                 |    |          |          |            |       |      |     |      |
|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------|----|----------|----------|------------|-------|------|-----|------|
| Solutions to Control (Branch)                                                                                                                                                                                             | Solutions to Control (Branch) Hazards (cont'd): |    |          |          |            |       |      |     |      |
| <b>B) Inserting NOOP (No Operation) instructions</b> (Software-based):<br>The <b>compiler</b> inserts NOOP instructions after the branch instruction.<br>The effect of this solution is similar to stalling the pipeline. |                                                 |    |          |          |            |       |      |     |      |
| Example: Unconditional branch, address calculation is in DR stage                                                                                                                                                         |                                                 |    |          |          |            |       |      |     |      |
| The target address has been calculated.<br>The target address is sent to the IF stage.                                                                                                                                    |                                                 |    |          |          |            |       |      |     |      |
| Instructions                                                                                                                                                                                                              | 1                                               |    |          |          |            |       | /    |     |      |
| SUB R1, R2, R1                                                                                                                                                                                                            | IF                                              | DR | ΕX       | ME       | WB         |       |      |     |      |
| BRU \$1C                                                                                                                                                                                                                  |                                                 | IF | DR       | EX       | ŴЕ         | WB    |      |     |      |
| Inserted by the compiler. NOOP                                                                                                                                                                                            |                                                 |    | IFy      | DR       | ΕX         | ME    | WB   |     |      |
| Target: STL \$00(R6), R2                                                                                                                                                                                                  |                                                 |    |          | IF 🕫     | <b>D</b> R | ΕX    | ME   | WB  |      |
| The PC is updated at the end of the IF. PC $\leftarrow$ Target                                                                                                                                                            |                                                 |    |          |          |            |       |      |     |      |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                                                                                                                                                         | $\odot$                                         |    | 20<br>ND | )13 - 20 | 21 Fe      | za BU | ZLUC | A 2 | 2.58 |

Computer Architecture B) Inserting NOOP (No Operation) instructions (cont'd): The number of NOOP instructions that need to be inserted depend on the number of stall cycles required. Example: Conditional branch; address calculation and branch decisions are in EX (slide 2.53). In this case, 2 stall cycles are necessary. Therefore, 2 NOOPs are inserted. The target address (\$108 + \$1C = \$124) has The target address is been calculated. sent to the IF stage. The branch decision has been made (In EX). Instructions SUB R1, R2, R1 IF DR EX ME WB BGT \$1C IF DR EX ME WB NOOP DRI EX ME WB IF Inserted by the compiler. NOOP IF DR EX ME WB IF DR EX ME WB Target: STL \$00(R6), R2 The PC is updated at the end The target instruction of the IF.  $\mathsf{PC} \gets \mathsf{Target}$ of BGT is fetched. http://akademi.itu.edu.tr/en/buzluca/ 2013 - 2021 Feza BUZLUCA 2.59 http:// www.buzluca.info

| Computer Architecture                                                                                                                                                                                                                                                                                                                                                                                      |                                                                                            |                    |        |       |            |       |      |                |      |                |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|--------------------|--------|-------|------------|-------|------|----------------|------|----------------|
|                                                                                                                                                                                                                                                                                                                                                                                                            | Solutions to Control (Branch) Hazards (cont'd):<br>C) Optimized Solution (Software-based): |                    |        |       |            |       |      |                |      |                |
| <b>The compiler</b> rearranges the program and moves certain instructions (if possible) to immediately after the branch instruction.                                                                                                                                                                                                                                                                       |                                                                                            |                    |        |       |            |       |      |                |      |                |
| This rearrangement                                                                                                                                                                                                                                                                                                                                                                                         | must not ch                                                                                | lange the algo     | orith  | im or | cau        | se ne | w co | onfli          | cts. |                |
| Example: Uncondi                                                                                                                                                                                                                                                                                                                                                                                           | tional branc                                                                               | h, address co      | alculo | ation | is ir      | DR    | stag | e              |      |                |
| SUB R1, R2, F<br>BRU \$1C<br>ADD R3, R4, R2<br>STL \$00(R6), F                                                                                                                                                                                                                                                                                                                                             | <u>&gt;</u>                                                                                | The tai<br>has bee |        |       |            |       |      | get c<br>the ] |      | ess is<br>age. |
|                                                                                                                                                                                                                                                                                                                                                                                                            | BRU                                                                                        | \$1C               | IF     | Ď₽₄   | EX         | ME    | WВ   |                |      |                |
| Moved by the compi                                                                                                                                                                                                                                                                                                                                                                                         | ler. 📴 SUB                                                                                 | R1, R2, R1         |        | IFy   | DR         | ΕX    | ME   | WB             |      |                |
|                                                                                                                                                                                                                                                                                                                                                                                                            | Target: STL                                                                                | \$00(R6), R2       |        | j.    | $IF_{\pi}$ | DR    | ΕX   | ME             | WB   |                |
| This instruction is fetched<br>before the branch instruction<br>updates the PC.The PC is updated at<br>the end of the IF.<br>PC $\leftarrow$ TargetThe target instruction<br>of BRU is fetched.The program is not changed.PC $\leftarrow$ TargetThe target instruction<br>of BRU is fetched.If the optimized solution is possible, there is no branch penalty.The target instruction<br>of BRU is fetched. |                                                                                            |                    |        |       |            |       |      |                |      |                |
| •                                                                                                                                                                                                                                                                                                                                                                                                          | http://akademi.itu.edu.tr/en/buzluca/                                                      |                    |        |       |            |       |      |                |      |                |

License: https://creativecommons.org/licenses/by-nc-nd/4.0/

## C) Optimized Solution (cont'd):

The number of instructions to be moved depends on the number of necessary stall cycles.

This rearrangement must not change the algorithm or cause new conflicts.

**Example:** Conditional branch, address calculation and branch decisions are in EX. In this case 2 stall cycles are necessary. Therefore, 2 instructions must be moved after the branch instruction.

| These 2 instructions<br>can be moved to<br>immediately after the<br>branch instruction | 0F8 LDL<br>0FC ADD<br>100 SUB<br>104 BGT<br>108 ADD<br>10C ADD<br>110 STL<br>114 LDL<br><br>124 STL | \$00(R5), R7<br>R0, R7, R7<br>R1, R2, R1 •<br>\$1C<br>R1, R1, R2<br>R3, R4, R2<br>\$00(R5), R2<br>\$00(R6), R1<br>\$00(R6), R2 | This instruction cannot<br>be moved after BGT,<br>because it alters the<br>condition bits. |
|----------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------|
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info                      |                                                                                                     |                                                                                                                                | 3 - 2021 Feza BUZLUCA 2.61                                                                 |

| Computer Architecture                                                                                                                                                   |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| Important points about changing the order of the instructions:                                                                                                          |
| <ul> <li>An instruction from before the branch can be placed immediately after the<br/>branch.</li> </ul>                                                               |
| - The branch (condition or address ) must not depend on the moved instruction.                                                                                          |
| - This method (if possible) always improves the performance (compared to inserting NOOP).                                                                               |
| - Especially for conditional branches, this procedure must be applied carefully.                                                                                        |
| If the condition that is tested for the branch is altered by the immediately preceding instruction, then the complier cannot move this instruction to after the branch. |
| Other possibilities:                                                                                                                                                    |
| The compiler can select instructions to move                                                                                                                            |
| <ul> <li>From branch target</li> </ul>                                                                                                                                  |
| <ul> <li>Must be OK to execute moved instruction even if the branch is not taken</li> <li>Improves performance when branch is taken</li> </ul>                          |
| <ul> <li>From fall through (else)</li> </ul>                                                                                                                            |
| - Must be OK to execute moved instruction even if the branch is taken                                                                                                   |
| - Improves performance when branch is not taken                                                                                                                         |
| http://akademi.itu.edu.tr/en/buzluca/<br>http://www.buzluca.info 2013 - 2021 Feza BUZLUCA 2.62                                                                          |

#### Solutions to Control (Branch) Hazards (cont'd):

D) Branch Prediction:

Remember: The existence of branch/jump instructions in the program causes two main problems:

1. The CPU does not know the target instruction to fetch into the pipeline until it calculates the target address of the branch instruction.

 $\text{PC} \gets \text{PC} + \text{offset}$ 

Later stages of the pipeline (not IF stage) carry out this calculation. Options:

- a) If address calculation is in EX and result is sent from EX/ME register to IF stage (slide 2.32), branch penalty = 3 cycles.
- b) If address calculation is in EX and result is directly sent to IF stage (slide 2.53), branch penalty = 2 cycles.
- c) If address calculation is in DR and result is directly sent to IF stage (slide 2.55), branch penalty = 1 cycle (valid for unconditional branch/jump instructions).

To solve this problem, a **branch target table** (slide 2.66) is used to determine the target address in advance.

The branch target table is a cache memory in the IF stage that keeps the addresses of the branch instructions and their target addresses.

| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info | 2013 - 2021 Feza BUZLUCA | 2.63 |
|-------------------------------------------------------------------|--------------------------|------|
|-------------------------------------------------------------------|--------------------------|------|

| Computer Architecture                                                                                                                                                                                    |  |  |  |  |
|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| Branch/jump instructions in the program cause two main problems (cont'd):                                                                                                                                |  |  |  |  |
| 2. Conditional branch problem: Until the previous instruction is actually executed, it is impossible to determine whether the branch will be taken or not because the values of the flags are not known. |  |  |  |  |
| If the branch is not taken, $PC \leftarrow PC$ + 4 (1 instruction = 4 bytes for the exemplary RISC processor )                                                                                           |  |  |  |  |
| If the branch is taken, $PC \leftarrow PC + offset$                                                                                                                                                      |  |  |  |  |
| <ul> <li>a) If the branch decision logic is in ME stage (after EX) (slide 2.32), branch<br/>penalty = 3 cycles.</li> </ul>                                                                               |  |  |  |  |
| <ul> <li>b) If the branch decision logic is in EX (slide 2.53), branch penalty = 2 cycles.</li> <li>To solve this problem, prediction mechanisms are used.</li> </ul>                                    |  |  |  |  |
| When a conditional branch is recognized, a branch prediction mechanism predicts whether the branch will be taken or not.                                                                                 |  |  |  |  |
| According to the prediction, either the next instruction in the memory or the target instruction of the branch is prefetched.                                                                            |  |  |  |  |
| If the prediction is correct, there is no branch penalty.                                                                                                                                                |  |  |  |  |
| In case of misprediction, the pipeline must be stopped and emptied.                                                                                                                                      |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                    |  |  |  |  |

| D) Branch Prediction (cont'd):                                                                                                                          |  |  |  |  |  |
|---------------------------------------------------------------------------------------------------------------------------------------------------------|--|--|--|--|--|
| There are two types of branch prediction mechanisms: static and dynamic.                                                                                |  |  |  |  |  |
| Static branch prediction strategies:                                                                                                                    |  |  |  |  |  |
| <ul> <li>Always <u>predict not taken</u>: Always assumes that the branch will not be taken<br/>and fetches the next instruction in sequence.</li> </ul> |  |  |  |  |  |
| b) Always <u>predict taken</u> : Always predicts that the branch will be taken and<br>fetches the target instruction of the branch.                     |  |  |  |  |  |
| To determine the target of the branch in advance (without calculation), the <b>branch target table</b> is used (slide 2.66).                            |  |  |  |  |  |
| Studies analyzing program behavior have shown that conditional branches are taken more than 50% of the time.                                            |  |  |  |  |  |
| Therefore, always prefetching from the branch target address should give better performance than always prefetching from the sequential path.           |  |  |  |  |  |
|                                                                                                                                                         |  |  |  |  |  |
|                                                                                                                                                         |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                   |  |  |  |  |  |

| Computer Architecture                                                                                                                                        |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------|----------------------------|--|--|--|--|--|
| Branch target table (BTT): Target Instruction prefetch                                                                                                       |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
| "Always predict tak                                                                                                                                          | "Always predict taken" strategy: Always fetches target instruction of the branch.                                                                                         |                        |                            |  |  |  |  |  |
| However, the CPU does <b>not know the target instruction</b> to fetch into the pipeline<br>until it calculates the target address of the branch instruction. |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
| To determine the target of the branch in advance, the <b>branch target table</b> (BTT) is used.                                                              |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
|                                                                                                                                                              | In the <b>branch target table</b> , addresses of recent branch instructions and their target addresses (where they jump) are kept in a cache memory ( <i>Chapter 6</i> ). |                        |                            |  |  |  |  |  |
|                                                                                                                                                              | The BTT makes it possible for the target instruction to be prefetched in the 1. stage (IF) without calculating the branch target address.                                 |                        |                            |  |  |  |  |  |
| There is a separate                                                                                                                                          | row for each brar                                                                                                                                                         | nch instruction that   | has recently run.          |  |  |  |  |  |
| The number of rece                                                                                                                                           | ent branch instruct                                                                                                                                                       | tions stored is limite | d by the size of the table |  |  |  |  |  |
| When a branch instruction runs for the first time, its target address is calculated and written into the BTT.                                                |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
| E                                                                                                                                                            | Branch instruction ad                                                                                                                                                     | dr. Target address     | Example:                   |  |  |  |  |  |
| One row for                                                                                                                                                  | \$A000                                                                                                                                                                    | \$B000                 | <br>¢A000_BCT_Target       |  |  |  |  |  |
| each branch                                                                                                                                                  |                                                                                                                                                                           |                        | \$A000 BGT Target          |  |  |  |  |  |
| instruction that                                                                                                                                             |                                                                                                                                                                           |                        |                            |  |  |  |  |  |
| has recently run.                                                                                                                                            |                                                                                                                                                                           |                        | \$B000 Target ADD          |  |  |  |  |  |
| http://akademi.itu.edu.tr/en/bu<br>http:// www.buzluca.info                                                                                                  | zluca/                                                                                                                                                                    |                        | 3 - 2021 Feza BUZLUCA 2.66 |  |  |  |  |  |

| Computer Architecture                                             | License: https://creativecommons.org/licenses/by-nc-nd/4.0/                                                                  |
|-------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------|
| <b>D) Branch Prediction</b> (cont'd                               | ):                                                                                                                           |
| Dynamic branch prediction st                                      | trategies:                                                                                                                   |
|                                                                   | cord the history of all conditional branch<br>ram to predict whether the condition will be true                              |
|                                                                   | or counters) are associated with each conditional<br>m that reflect the recent history of the                                |
|                                                                   | t in a <b>branch history table</b> - <b>BHT</b> (slide 2.69)<br>about the branch history of the instruction<br>evious runs). |
|                                                                   |                                                                                                                              |
|                                                                   |                                                                                                                              |
|                                                                   |                                                                                                                              |
| http://akademi.itu.edu.tr/en/buzluca/<br>http:// www.buzluca.info | COSC 2013 - 2021 Feza BUZLUCA 2.67                                                                                           |

| Computer Architecture                                                                                                                                                                                                                                                                                                             |     |
|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----|
| 1-bit dynamic prediction scheme:                                                                                                                                                                                                                                                                                                  |     |
| For each conditional branch instruction (i), a single individual <b>prediction bit</b> $(p_i)$ is stored in the branch history table (BHT).                                                                                                                                                                                       | S   |
| The prediction bit ${\bm p}_i$ records only whether the last execution of this instruction resulted in a branch or not.                                                                                                                                                                                                           | (i) |
| If the branch was taken last time, the system predicts that the branch will be taken next time.                                                                                                                                                                                                                                   |     |
| Algorithm:                                                                                                                                                                                                                                                                                                                        |     |
| Fetch the ith conditional branch instruction                                                                                                                                                                                                                                                                                      |     |
| If $(p_i = 0)$ then predict <b>not to take</b> the branch, fetch the next instruction in sequence<br>If $(p_i = 1)$ then predict <b>to take</b> the branch, prefetch the target instruction of the branch<br>If the branch is really taken, then $p_i \leftarrow 1$<br>If the branch is not really taken, then $p_i \leftarrow 0$ |     |
| The <b>initial value of</b> $\mathbf{p}_i$ is determined depending on the case in the first run of the conditional branch instruction.                                                                                                                                                                                            | e   |
| In the first run, the target address is calculated and stored in the BHT.                                                                                                                                                                                                                                                         |     |
| During the calculation of the target address, next instructions in sequence (not target of branch) are fetched into the pipeline. In case of a branch, there will be branch penalty.                                                                                                                                              |     |
| http://akademi.itu.edu.tr/en/buzluca/                                                                                                                                                                                                                                                                                             |     |



| Compute                                                                                                                                                                                                                                                     | er Architecture                                                                                                                                                                                                                                                                                                                                              |                                                                                                                             |  |  |  |  |
|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------------------|--|--|--|--|
| Examp                                                                                                                                                                                                                                                       | le: 1-bit dynamic                                                                                                                                                                                                                                                                                                                                            | prediction scheme and loops:                                                                                                |  |  |  |  |
| Predict                                                                                                                                                                                                                                                     | tion mechanisms o                                                                                                                                                                                                                                                                                                                                            | are advantageous if there are loops in the program.                                                                         |  |  |  |  |
| Exampl                                                                                                                                                                                                                                                      | e:                                                                                                                                                                                                                                                                                                                                                           |                                                                                                                             |  |  |  |  |
| LOOP                                                                                                                                                                                                                                                        | counter ← 100<br><br>                                                                                                                                                                                                                                                                                                                                        | ; register or memory location<br>; instructions in the loop                                                                 |  |  |  |  |
|                                                                                                                                                                                                                                                             | Decrement counte<br>BNZ LOOP<br>                                                                                                                                                                                                                                                                                                                             | r ; counter ← counter - 1<br>; Branch if not zero (conditional branch, it has a p bit)<br>; Next instruction after the loop |  |  |  |  |
| <u>is in th</u><br>In the                                                                                                                                                                                                                                   | <ul> <li>A) We assume that in the beginning of the given piece of code, the BNZ instruction is in the BHT and the value of its p bit is 1 (predict to take the branch).</li> <li>In the first iteration (step) of the loop, the prediction at BNZ will be correct and the pipeline will prefetch the correct instruction (beginning of the loop).</li> </ul> |                                                                                                                             |  |  |  |  |
| The p b                                                                                                                                                                                                                                                     | oit (p=1) is not ch                                                                                                                                                                                                                                                                                                                                          | nanged until the last iteration of the loop.                                                                                |  |  |  |  |
| In the last iteration of the loop, the p bit is still 1, and the prediction is to take the branch; however, as the counter is zero, the program will not jump, and it will instead continue with the next instruction following the branch (misprediction). |                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                             |  |  |  |  |
| The p bit of BNZ is cleared (p $\leftarrow$ 0) because the branch is not taken in the last step.                                                                                                                                                            |                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                             |  |  |  |  |
| As a result, in a loop with 100 iterations, there are 99 correct predictions and only one incorrect prediction.                                                                                                                                             |                                                                                                                                                                                                                                                                                                                                                              |                                                                                                                             |  |  |  |  |
|                                                                                                                                                                                                                                                             | demi.itu.edu.tr/en/buzluca/<br>w.buzluca.info                                                                                                                                                                                                                                                                                                                | COSC 2013 - 2021 Feza BUZLUCA 2.70                                                                                          |  |  |  |  |

**Example:** 1-bit dynamic prediction scheme and loops (cont'd):

**B)** If in the beginning of the given piece of code, the BNZ instruction is not in the BHT, the system cannot make a prediction in the first run.

After the calculation of the target address of the BNZ, the related information is written into the BHT.

During the calculation of the target address, next instructions in sequence (not the target of branch) are fetched into the pipeline.

In the first run, the branch is taken, and the program jumps to the beginning of the loop, so there will be a branch penalty.

The initial value of **p** becomes 1 (predict that the branch will be taken).

The value of p(p = 1) does not change until the last iteration (step) of the loop. In the last iteration of the loop, the p bit is still 1, and the prediction is that the branch will be taken; however, as the counter is zero, the program will not jump, and it will instead continue with the next instruction following the branch (misprediction).

The p bit of BNZ is cleared (p  $\leftarrow$  0).

As a result, in a loop with 100 iterations, in the first iteration, a prediction cannot be made. Then, there are 98 correct predictions and one incorrect prediction. In total, there are 2 branch penalties.

http://akademi.itu.edu.tr/en/buzluca/ http:// www.buzluca.info 2013 - 2021 Feza BUZLUCA

2.71

**Computer Architecture** Problem with the 1-bit dynamic prediction scheme:  $\rightarrow$  LOOP\_EX (Nested loops: the same loop is executed many times) . . . LOOP In nested loops, a one-bit prediction scheme will cause two mispredictions for the inner loop: one in the first iteration, and BNZ LOOP one on exiting . . . BNZ LOOP EX Remember: in the previous example, after exiting the loop, the p bit of the inner BNZ LOOP was 0 ("don't take the branch") (p=0). Now, if the same loop runs again (2nd run), in the first iteration (step), the prediction about the BNZ will be "not to take the branch" (p=0). However, the program will jump to the beginning of the loop (first misprediction). Now, the p bit will be 1 because branch is taken (p  $\leftarrow$  1). Until the last iteration of the loop, predictions will be correct. In the last iteration of the loop, there will be a misprediction as in the previous example (second misprediction). Hence, misprediction will occur **twice** for each full iteration of the inner loop. @ 080 2013 - 2021 Feza BUZLUCA http://akademi.itu.edu.tr/en/buzluca/ 2.72 www.buzluca.inf









- If the instruction is in states 11 or 10, the scheme predicts that the branch will be taken.
- If the instruction is in states 00 or 01, the scheme predicts that the branch will **not** be taken.



| based mechanisms are used<br>includes two nested loops.<br>; Any instruction<br>; Any instruction<br>; Any instruction |
|------------------------------------------------------------------------------------------------------------------------|
| ; Any instruction<br>; Any instruction                                                                                 |
| ; Any instruction<br>; Any instruction                                                                                 |
| ; Any instruction                                                                                                      |
| ; Any instruction                                                                                                      |
|                                                                                                                        |
| <b>B</b> 1 17 1                                                                                                        |
| ; Branch if not zero                                                                                                   |
| ; Instruction after loop2                                                                                              |
|                                                                                                                        |
| ; Branch if not zero                                                                                                   |
| ; Instruction after loop1                                                                                              |
| er of correct predictions<br>BNZ) in the given piece of                                                                |
|                                                                                                                        |

| Co | computer Architecture                                                                                            |                                                                                 |                                                                      |  |
|----|------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------|----------------------------------------------------------------------|--|
| So | Solution:                                                                                                        |                                                                                 |                                                                      |  |
|    |                                                                                                                  | a. Static predic                                                                | tion                                                                 |  |
| i) | <ul> <li>Always predict not taken (For this method, a BTT (branch target table) is<br/>not necessary)</li> </ul> |                                                                                 |                                                                      |  |
|    | BNZ LOOP1                                                                                                        | : There is a correct predic<br>Other predictions are ind<br>Correct : 1         | tion only in the last iteration (exit).<br>:orrect.<br>Incorrect : 9 |  |
|    | BNZ LOOP2                                                                                                        | : There is a correct predic<br>Other predictions are ind<br>Correct : 10x1 = 10 |                                                                      |  |
|    | Total:                                                                                                           | Correct : 11                                                                    | Incorrect : 99                                                       |  |
|    | This method                                                                                                      | is not suitable for loops.                                                      |                                                                      |  |
|    | ://akademi.itu.edu.tr/e<br>:// www.buzluca.info                                                                  | en/buzluca/                                                                     | 2013 - 2021 Feza BUZLUCA 2.77                                        |  |

| Computer Architecture          |                                                                                       |                                                                                |
|--------------------------------|---------------------------------------------------------------------------------------|--------------------------------------------------------------------------------|
|                                | a. Static prediction (con                                                             | nt'd)                                                                          |
| ii-1) Always predi             | <b>ct taken</b> under the assumption                                                  | on that instructions are in the BTT                                            |
|                                | There is a misprediction only<br>Other predictions are correc                         | et.                                                                            |
|                                | Correct: 9                                                                            | Incorrect: 1                                                                   |
|                                | There is a misprediction only<br>Other predictions are correc                         | et.                                                                            |
|                                | Correct : 10x9 = 90                                                                   | Incorrect : 10x1 = 10                                                          |
| Total:                         | Correct: 99                                                                           | Incorrect: 11                                                                  |
| BNZ LOOP1:                     | •                                                                                     | on that instr. are NOT in the BTT<br>y in the first and last iterations.<br>t. |
|                                | Correct: 8                                                                            | Incorrect: 2                                                                   |
| f<br>I                         | irst and last iterations; other<br>n the 2nd -10th runs, there is<br>reration (exit). | a misprediction only in the last                                               |
|                                | Correct : 8+9x9 = 89                                                                  | Incorrect : 2+9x1 = 11                                                         |
| Total:                         | Correct: 97                                                                           | Incorrect: 13                                                                  |
| http://akademi.itu.edu.tr/en/b | ouzluca/                                                                              | 2013 - 2021 Feza BUZLUCA 2.78                                                  |

| Computer Architectur                                    | e License:                                                                                                      | https://creativecommons.org/licenses/by-nc-nd/4.0/       |    |
|---------------------------------------------------------|-----------------------------------------------------------------------------------------------------------------|----------------------------------------------------------|----|
| Solution (cont'd                                        | ):                                                                                                              |                                                          |    |
|                                                         | b. Dynamic prediction with one bit                                                                              |                                                          |    |
| Note: Differen<br>2.69).                                | <b>Note:</b> Different prediction bits are used for each branch instruction (Slides 2.68) 2.69).                |                                                          | 8, |
|                                                         | i) Assumption: In the beginning, instructions are in the BHT, and initial decision is to <b>take the branch</b> |                                                          | ı  |
| BNZ LOOP1                                               | There is a misprediction o<br>predictions are correct.<br>Correct: 9                                            | only in the last iteration (exit). Other<br>Incorrect: 1 |    |
| BNZ LOOP2:                                              | BNZ LOOP2: In the first run of the loop, there is a misprediction only in the last iteration (exit).            |                                                          |    |
|                                                         | Other predictions are cor                                                                                       | rect.                                                    |    |
|                                                         | will not be taken".                                                                                             |                                                          | th |
| Total:                                                  | Correct: 90                                                                                                     | Incorrect: 20                                            |    |
| http://akademi.itu.edu.tr/e<br>http:// www.buzluca.info | n/buzluca/                                                                                                      | 2013 - 2021 Feza BUZLUCA 2.79                            |    |

| Computer Architecture                                    |                                                                      |                                    |
|----------------------------------------------------------|----------------------------------------------------------------------|------------------------------------|
| b. Dyna                                                  | mic prediction with one bit (cont                                    | 'd):                               |
| ii) In the beginni<br>to take the bran                   | ng instructions are NOT in the BF<br>ch                              | 1T, or the initial decision is NOT |
|                                                          | There are mispredictions in the fi<br>Other predictions are correct. | rst and last iterations.           |
|                                                          | Correct: 8                                                           | Incorrect: 2                       |
|                                                          | here are mispredictions in the firs<br>ther predictions are correct. | st and last iterations.            |
|                                                          | Correct: 10×8 = 80                                                   | Incorrect: 10x2 =20                |
| Total:                                                   | Correct: 88                                                          | Incorrect: 22                      |
|                                                          |                                                                      |                                    |
|                                                          |                                                                      |                                    |
|                                                          |                                                                      |                                    |
|                                                          |                                                                      |                                    |
| http://akademi.itu.edu.tr/er<br>http:// www.buzluca.info | /buziuca/                                                            | 2013 - 2021 Feza BUZLUCA 2.80      |

| Computer Architecture                                        |                                                                                       |                          |      |
|--------------------------------------------------------------|---------------------------------------------------------------------------------------|--------------------------|------|
| c. [                                                         | Dynamic prediction with two                                                           | ) bits:                  |      |
|                                                              | he beginning, instructions ar<br><b>the branch</b> , prediction bits                  |                          |      |
|                                                              | nere is a misprediction only in<br>ther predictions are correct<br>Correct: 9         |                          |      |
|                                                              | here is a misprediction only in<br>ther predictions are correct<br>Correct: 10x9 = 90 |                          |      |
| Total:                                                       | Correct: 99                                                                           | Incorrect: 11            |      |
|                                                              |                                                                                       |                          |      |
|                                                              |                                                                                       |                          |      |
|                                                              |                                                                                       |                          |      |
| http://akademi.itu.edu.tr/en/bu:<br>http:// www.buzluca.info | ziuca/                                                                                | 2013 - 2021 Feza BUZLUCA | 2.81 |

| Computer Architectur                                                                                                                                               | e                                                                      |                                                                                                     |
|--------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------|-----------------------------------------------------------------------------------------------------|
| c. Dy                                                                                                                                                              | namic prediction with two bits                                         | (cont'd):                                                                                           |
| ii) In the beginn                                                                                                                                                  | ing, instructions are NOT in the                                       | внт                                                                                                 |
|                                                                                                                                                                    | un of the BNZ instructions, since<br>ons in sequence (not the target o |                                                                                                     |
| Hence, there is a misprediction in the first iteration.                                                                                                            |                                                                        |                                                                                                     |
| After the CPU has decided to branch and the target address has been calculated, information about the BNZ is stored in the BHT, and prediction bits are set to 11. |                                                                        |                                                                                                     |
| BNZ LOOP1: 1                                                                                                                                                       | There are mispredictions in the f                                      | irst and last iterations.                                                                           |
|                                                                                                                                                                    | Correct: 8                                                             | Incorrect: 2                                                                                        |
|                                                                                                                                                                    | in the first run, there are misproterations.                           | edictions in the first and last                                                                     |
| Γ                                                                                                                                                                  | in the last iteration.                                                 | s still "branch will be taken".<br>s, there will be a misprediction only<br>Incorrect: 2 + 9x1 = 11 |
| Total:                                                                                                                                                             | Correct: 97                                                            | Incorrect: 13                                                                                       |
| http://akademi.itu.edu.tr/e<br>http:// www.buzluca.info                                                                                                            | n/buzluca/                                                             | 2013 - 2021 Feza BUZLUCA 2.82                                                                       |