The basic concept of pipelining is to break up instruction execution activities into stages that can operate independently. Every instruction passes through the same stages much like an assembly line.
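For example, we could set up the following five stages for a MIPS pipeline: IF (instruction fetch), ID (instruction decode and register read), EX (execute or address calculation), MEM (data memory access), and WB (write back).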
With these pipeline stages, a sequence of instructions can be executed as shown below. Time progresses from left to right. Each horizontal division represents one clock period.
cycle |               | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  |
l.s   | $f0, 0($t1)   | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |
l.s   | $f2, 0($t2)   |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |
mul.s | $f4, $f0, $f2 |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |
add.s | $f6, $f6, $f4 |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |
addi  | $t1, $t1, 4   |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |
addi  | $t2, $t2, 4   |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |
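The staggered layout of such a diagram follows one rule: in an idealized pipeline with no stalls, each instruction enters IF one cycle after the instruction before it. The following is a minimal Python sketch of that rule, not a model of real hardware. The names `ideal_schedule` and `format_rows` are made up for illustration, and hazards are ignored entirely.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def ideal_schedule(num_instructions, stages=STAGES):
    """Return one row per instruction; each row has one cell per clock cycle.

    Assumes an ideal pipeline (no stalls): instruction i (0-based) occupies
    the first stage in cycle i + 1, so its row starts with i empty cells.
    """
    total_cycles = len(stages) + num_instructions - 1 if num_instructions else 0
    schedule = []
    for i in range(num_instructions):
        row = [""] * i + list(stages)
        row += [""] * (total_cycles - len(row))
        schedule.append(row)
    return schedule

def format_rows(labels, schedule):
    """Render the schedule as pipe-separated text rows."""
    width = max(len(label) for label in labels)
    lines = []
    for label, row in zip(labels, schedule):
        cells = " | ".join(cell.ljust(3) for cell in row)
        lines.append(f"{label.ljust(width)} | {cells} |")
    return lines

if __name__ == "__main__":
    program = [
        "l.s   $f0, 0($t1)",
        "l.s   $f2, 0($t2)",
        "mul.s $f4, $f0, $f2",
        "add.s $f6, $f6, $f4",
        "addi  $t1, $t1, 4",
        "addi  $t2, $t2, 4",
    ]
    for line in format_rows(program, ideal_schedule(len(program))):
        print(line)
```

For the six instructions above, the schedule spans 10 cycles: 5 for the first instruction plus 1 for each of the remaining 5.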
As the figures below show, pipelining increases instruction throughput. Notice that after the 5th cycle, the unpipelined execution completes only one instruction every 5 cycles, while the idealized pipelined execution completes one instruction every cycle, or 5 instructions in the same 5 cycles.
Ideally, instruction throughput is increased to 1 instruction per clock. In other words, the clocks per instruction (CPI) factor in the performance equation is reduced from 5.0 to 1.0.
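Unpipelined execution:
cycle   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  |
instr1  | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |     |     |     |     |
instr2  |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |
instr3  |     |     |     |     |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |

Idealized pipelined execution:
cycle   | 1   | 2   | 3   | 4   | 5   | 6   | 7   | 8   | 9   | 10  | 11  | 12  | 13  | 14  | 15  |
instr1  | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |     |     |     |     |
instr2  |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |     |     |     |
instr3  |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |     |     |
instr4  |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |     |
instr5  |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |     |
instr6  |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |     |
instr7  |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |     |
instr8  |     |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |     |
instr9  |     |     |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |     |
instr10 |     |     |     |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |     |
instr11 |     |     |     |     |     |     |     |     |     |     | IF  | ID  | EX  | MEM | WB  |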
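The throughput comparison can be checked with a little arithmetic. The sketch below assumes what the figures assume: every instruction takes 5 cycles unpipelined, and the pipeline never stalls. The function names are illustrative only.

```python
def unpipelined_cycles(n, stages=5):
    """Cycles to finish n instructions when each one runs start to finish alone."""
    return n * stages

def pipelined_cycles(n, stages=5):
    """Cycles to finish n instructions in an ideal pipeline (no stalls).

    The first instruction needs `stages` cycles to drain through the pipe;
    each later instruction completes one cycle after its predecessor.
    """
    return stages + n - 1 if n > 0 else 0

def effective_cpi(n, stages=5):
    """Average clocks per instruction over a run of n instructions."""
    return pipelined_cycles(n, stages) / n

if __name__ == "__main__":
    for n in (1, 11, 100, 10_000):
        print(n, unpipelined_cycles(n), pipelined_cycles(n), effective_cpi(n))
```

In the 15 cycles drawn above, `unpipelined_cycles(3)` and `pipelined_cycles(11)` both come to 15, which matches the 3 and 11 instructions in the two figures. The effective CPI starts above 1.0 because of the cycles spent filling the pipeline, and it approaches 1.0 as the instruction count grows.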
The best starting point for a pipelined implementation is a single-cycle implementation. For example, for a MIPS pipeline you could start with an implementation whose high-level data path is shown in the "Before Pipelining" diagram below.
To implement pipelining, registers are added between stages. These pipelining registers are shown in light green in the "After Pipelining" diagram below. They hold data and control signals that are produced in an early stage for use in later stages.
Signals generated in a stage cannot be held for more than one cycle. A signal that is generated in an early stage and used several stages later must pass through all of the intermediate pipeline registers. For example, a control signal that is produced in the ID stage and used in the WB stage must pass through 3 pipelining registers: the ID/EX registers, the EX/MEM registers, and the MEM/WB registers.
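To make the hand-off concrete, here is a small Python sketch of a control word moving through the three pipelining registers named above. It is a toy model under stated assumptions: the dictionaries standing in for control words, the `clock_edge` and `trace` helpers, and the sample `RegWrite` values are illustrative, and the data path itself is not modeled.

```python
REGISTER_NAMES = ("ID/EX", "EX/MEM", "MEM/WB")

def clock_edge(regs, id_stage_output):
    """Model one clock edge: every pipelining register latches its input.

    ID/EX takes what the ID stage just produced, and each later register
    takes what the register before it was holding during the last cycle.
    """
    return {
        "ID/EX": id_stage_output,
        "EX/MEM": regs["ID/EX"],
        "MEM/WB": regs["EX/MEM"],
    }

def trace(control_words):
    """Feed one control word per cycle, then idle until the pipe drains."""
    regs = {name: None for name in REGISTER_NAMES}
    history = []
    for word in list(control_words) + [None] * len(REGISTER_NAMES):
        regs = clock_edge(regs, word)
        history.append(dict(regs))
    return history

if __name__ == "__main__":
    # Hypothetical control words; only one WB-stage signal is shown.
    words = [{"op": "lw", "RegWrite": 1}, {"op": "sw", "RegWrite": 0}]
    for cycle, regs in enumerate(trace(words), start=1):
        print(cycle, regs)
```

In this model the control word for the first instruction reaches MEM/WB only after the third clock edge, having sat in ID/EX and EX/MEM for one cycle each. No register holds a value for more than one cycle.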
The analogy between a pipeline and an assembly line breaks down in one important respect. Putting together a door for a car does not depend on cars further along in the assembly line.
But there are dependences between instructions. These can be seen in the diagram below where data is passed back from a later stage to an earlier stage. The ones that involve updating the PC (red) are called control hazards. The ones that involve writing data back to registers (purple) are called data hazards.
Both of these dependences are inherent in the instruction set. Compiler writers call them control and data dependences. In both cases the execution of a later instruction depends on the results of an earlier instruction. There are other obstacles, called structural hazards, that arise from the starting point of the pipelining implementation.
Control hazards arise from branches and jumps. They involve signals that are passed from a later stage back to an earlier stage in order to update the PC.
Data hazards arise from instructions producing data that is used in later instructions. They involve signals that are selected by the MemtoReg multiplexer in the WB stage to be written to a register. The register may be read by a later instruction in its ID stage.
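The dependences behind data hazards can be found mechanically by comparing each instruction's destination register with the source registers of the instructions that follow it. The sketch below does this for the six-instruction example from earlier in this section. The tuple encoding and the `raw_dependences` helper are illustrative. The `window` of 3 is an assumption: it supposes no forwarding and a register write that becomes visible only in the cycle after WB.

```python
# Each entry: (text, destination register, source registers).
PROGRAM = [
    ("l.s   $f0, 0($t1)",   "$f0", {"$t1"}),
    ("l.s   $f2, 0($t2)",   "$f2", {"$t2"}),
    ("mul.s $f4, $f0, $f2", "$f4", {"$f0", "$f2"}),
    ("add.s $f6, $f6, $f4", "$f6", {"$f6", "$f4"}),
    ("addi  $t1, $t1, 4",   "$t1", {"$t1"}),
    ("addi  $t2, $t2, 4",   "$t2", {"$t2"}),
]

def raw_dependences(program, window=3):
    """List (producer index, consumer index, register) read-after-write pairs.

    Only consumers at most `window` instructions after the producer are
    reported, since those are the ones that would read the register in ID
    before the producer has written it back (under the stated assumptions).
    """
    found = []
    for i, (_, dest, _) in enumerate(program):
        for j in range(i + 1, min(i + window, len(program) - 1) + 1):
            _, later_dest, later_sources = program[j]
            if dest in later_sources:
                found.append((i, j, dest))
            if later_dest == dest:
                break  # a newer write hides this producer from later readers
    return found

if __name__ == "__main__":
    for producer, consumer, reg in raw_dependences(PROGRAM):
        print(f"{PROGRAM[consumer][0]!r} reads {reg} written by {PROGRAM[producer][0]!r}")
```

For this program the sketch reports three pairs: `mul.s` depends on both `l.s` instructions through `$f0` and `$f2`, and `add.s` depends on `mul.s` through `$f4`.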
Structural hazards are hazards that depend on the starting point for the implementation. For example, if we started with a multicycle implementation, we would have problems in a pipeline because the ALU is used in more than one stage by the same instruction. When executing a branch instruction, a multicycle implementation uses the ALU to increment the PC, compute the branch target address, and compare the two source operands. These uses would prevent other instructions in the pipeline from using the ALU.
Pipelining is one of the primary reasons why RISC processors have a significant speed advantage over CISC processors. If arithmetic and logical instructions can access memory for source or destination operands, then it is much more difficult to break instruction execution down into stages of equal duration. If memory addressing modes are complex, the problem gets harder still. If instructions have varying lengths, it is more difficult to start a new instruction every cycle.
When pipelining is done with a CISC processor, it is done at a different level. The execution of instructions is broken down into smaller parts, which can then be pipelined. In effect, the CISC instructions are translated into a sequence of internal RISC instructions, which are then pipelined. This adds complexity to the processor and generally does not produce as much benefit. To preserve upward compatibility, the Intel 80x86 family of processors, including Pentium processors since the early 1990s, has used this approach.