INSTRUCTION PIPELINING (I)

1. The Instruction Cycle
2. Instruction Pipelining
3. Pipeline Hazards
4. Structural Hazards
5. Data Hazards
6. Control Hazards

The Instruction Cycle

- Fetch instruction (FI)
- Decode instruction (DI)
- Fetch operand (FO)
- Calculate operand address (CO)
- Execute instruction (EI)
- Write back operand (WO)

Instruction Pipelining

- Instruction execution is extremely complex and involves several operations which are executed successively (see slide 2). This implies a large amount of hardware, but only one part of this hardware works at a given moment.

- Pipelining is an implementation technique whereby multiple instructions are overlapped in execution. This is solved without additional hardware but only by letting different parts of the hardware work for different instructions at the same time.

- The pipeline organization of a CPU is similar to an assembly line: the work to be done in an instruction is broken into smaller steps (pieces), each of which takes a fraction of the time needed to complete the entire instruction. Each of these steps is a pipe stage (or a pipe segment).

- Pipe stages are connected to form a pipe:

  Stage 1 → Stage 2 → ... → Stage n

- The time required for moving an instruction from one stage to the next: a machine cycle (often this is one clock cycle). The execution of one instruction takes several machine cycles as it passes through the pipeline.

Acceleration by Pipelining

Two stage pipeline: FI: fetch instruction
        EI: execute instruction

Clock cycle: 1 2 3 4 5 6 7 8

Instr. i    FI EI
Instr. i+1   FI EI
Instr. i+2   FI EI
Instr. i+3   FI EI
Instr. i+4   FI EI
Instr. i+5   FI EI
Instr. i+6   FI EI

We consider that each instruction takes execution time $T_{ex}$.

Execution time for the 7 instructions, with pipelining:

$$
(\frac{T_{ex}}{2})^2 \cdot 8 = 4^*T_{ex}
$$
Acceleration by Pipelining (cont’d)

Six stage pipeline (see also slide 2):
- FI: fetch instruction
- FO: fetch operand
- DI: decode instruction
- EI: execute instruction
- CO: calculate operand address
- WO: write operand

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12

Instr. i

```
FI DI CO FO EI MX
```

Instr. i+1

```
FI DI CO FO EI MX
```

Instr. i+2

```
FI DI CO FO EI MX
```

Instr. i+3

```
FI DI CO FO EI MX
```

Instr. i+4

```
FI DI CO FO EI MX
```

Instr. i+5

```
FI DI CO FO EI MX
```

Instr. i+6

```
FI DI CO FO EI MX
```

Execution time for the 7 instructions, with pipelining:
\[(\text{T}_{ex} / 6)^*12 = 2 \text{T}_{ex}\]

- After a certain time (N-1 cycles) all the N stages of the pipeline are working: the pipeline is filled. Now, theoretically, the pipeline works providing maximal parallelism (N instructions are active simultaneously).

Pipeline Hazards

- Pipeline hazards are situations that prevent the next instruction in the instruction stream from executing during its designated clock cycle. The instruction is said to be stalled. When an instruction is stalled, all instructions later in the pipeline than the stalled instruction are also stalled. Instructions earlier than the stalled one can continue. No new instructions are fetched during the stall.

- Types of hazards:
  1. Structural hazards
  2. Data hazards
  3. Control hazards

Structural Hazards

- Structural hazards occur when a certain resource (memory, functional unit) is requested by more than one instruction at the same time.

Instruction ADD R4,X fetches in the FO stage operand X from memory. The memory doesn’t accept another access during that cycle.

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12

```
ADD R4,X
```

Instr. i+1

```
FI DI CO FO EI MX
```

Instr. i+2

```
FI DI CO FO EI MX
```

Instr. i+3

```
FI DI CO FO EI MX
```

Instr. i+4

```
FI DI CO FO EI MX
```

Penalty: 1 cycle

- Certain resources are duplicated in order to avoid structural hazards. Functional units (ALU, FP unit) can be pipelined themselves in order to support several instructions at a time. A classical way to avoid hazards at memory access is by providing separate data and instruction caches.
Data Hazards

- We have two instructions, I1 and I2. In a pipeline the execution of I2 can start before I1 has terminated. If in a certain stage of the pipeline, I2 needs the result produced by I1, but this result has not yet been generated, we have a data hazard.

I1: MUL R2, R3  \( R2 \leftarrow R2 \times R3 \)
I2: ADD R1, R2  \( R1 \leftarrow R1 + R2 \)

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12
MUL R2, R3  \( \text{FI DI CO FO EI NY} \)
ADD R1, R2  \( \text{FI DI CO FO EI NY} \)
Instr. i+2  \( \text{FI DI CO FO EI NY} \)

Before executing its FO stage, the ADD instruction is stalled until the MUL instruction has written the result into R2.

Penalty: 2 cycles

Data Hazards (cont’d)

- Some of the penalty produced by data hazards can be avoided using a technique called forwarding (bypassing).

Control Hazards

- Control hazards are produced by branch instructions.

Unconditional branch

\[ \begin{align*}
& \text{BR TARGET} \\
& \text{TARGET}
\end{align*} \]

After the FO stage of the branch instruction the address of the target is known and it can be fetched

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12
BR TARGET  \( \text{FI DI CO FO EI NY} \)
target  \( \text{FI DI CO FO EI NY} \)
target+1  \( \text{FI DI CO FO EI NY} \)

The instruction following the branch is fetched; before the DI is finished it is not known that a branch is executed. Later the fetched instruction is discarded.

Penalty: 3 cycles

Control Hazards (cont’d)

Conditional branch

\[ \begin{align*}
& \text{ADD R1, R2} \\
& \text{BEZ TARGET} \quad \text{branch if zero instruction i+1}
\end{align*} \]

TARGET

Branch is taken

At this moment, both the condition (set by ADD) and the target address are known.

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12
ADD R1, R2  \( \text{FI DI CO FO EI NY} \)
BEZ TARGET  \( \text{FI DI CO FO EI NY} \)
target  \( \text{FI DI CO FO EI NY} \)

Penalty: 3 cycles

Branch not taken

At this moment the condition is known and instr+i+1 can go on.

Clock cycle → 1 2 3 4 5 6 7 8 9 10 11 12
ADD R1, R2  \( \text{FI DI CO FO EI NY} \)
BEZ TARGET  \( \text{FI DI CO FO EI NY} \)
Instr i+1  \( \text{FI DI CO FO EI NY} \)

Penalty: 2 cycles
Control Hazards (cont’d)

• With conditional branch we have a penalty even if the branch has not been taken. This is because we have to wait until the branch condition is available.

• Branch instructions represent a major problem in assuring an optimal flow through the pipeline. Several approaches have been taken for reducing branch penalties (see slides of the following lecture).

Summary

• Instructions are executed by the CPU as a sequence of steps. Instruction execution can be substantially accelerated by instruction pipelining.

• A pipeline is organized as a succession of N stages. At a certain moment N instructions can be active inside the pipeline.

• Keeping a pipeline at its maximal rate is prevented by pipeline hazards. Structural hazards are due to resource conflicts. Data hazards are produced by data dependencies between instructions. Control hazards are produced as consequence of branch instructions.