What is the canonical processor pipeline?
Fetch → Read/Decode → ALU → Memory → Write
Fetch: Use program counter (PC) to get instruction from instruction memory
Read/Decode: Read registers and decode instruction
ALU: Perform arithmetic and logical operations on info read from registers, using logic decoded from instruction
Memory: If the instruction was a load or store, the ALU will have computed the address, and we load/store at that address in data memory
Write: Write result to register
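The five stages above can be sketched as a toy walk-through of one instruction. This is an illustrative model, not any real ISA; the register names and instruction encoding are made up.

```python
# Toy sketch of the five canonical pipeline stages for a single "add".
# Register names and the instruction tuple format are illustrative only.
regs = {"r1": 5, "r2": 7, "r3": 0}
instr_mem = {0: ("add", "r3", "r1", "r2")}   # r3 = r1 + r2

pc = 0
instr = instr_mem[pc]                  # Fetch: use the PC to read instruction memory
op, rd, rs1, rs2 = instr               # Read/Decode: decode the fields...
a, b = regs[rs1], regs[rs2]            # ...and read the source registers
result = a + b if op == "add" else None  # ALU: compute using the decoded op
# Memory: an "add" neither loads nor stores, so this stage does nothing here
regs[rd] = result                      # Write: write the result back to rd
```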
Why would CPI be greater than 1 in a pipeline?
Because of stalls: when a hazard forces the pipeline to wait (or a misprediction forces a flush), instructions take extra cycles, so average CPI rises above 1.
Why might there be a stall in the pipeline?
Because of hazards: a data hazard (e.g., a RAW dependence whose producer hasn't finished), a control hazard (an unresolved branch), or a structural hazard (two instructions needing the same hardware resource).
What is a control dependence?
When execution of instructions depends on the behavior of a branch.
What two factors determine how control dependence will affect CPI in a pipeline?
1. How often branches are mispredicted (branch frequency × predictor accuracy)
2. The misprediction penalty, which grows with pipeline depth
The deeper the pipeline, the better your branch prediction needs to be!
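These two factors combine in the standard textbook CPI model. A small numerical sketch, with all the rates and penalties assumed for illustration:

```python
# Toy CPI model: CPI = base_CPI + branch_freq * mispredict_rate * penalty.
# All constants below are assumed example values, not measurements.
base_cpi = 1.0
branch_freq = 0.2        # assume 20% of instructions are branches
mispredict_rate = 0.05   # assume the predictor is right 95% of the time
penalty = 10             # assume a 10-cycle flush (grows with pipeline depth)

cpi = base_cpi + branch_freq * mispredict_rate * penalty
```

Doubling the penalty (a deeper pipeline) doubles the added term, which is why deeper pipelines demand better predictors.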
Types of data dependencies
RAW (read after write): When an instruction needs to read from a register that a previous instruction is writing to, it can’t move forward until the previous one completes. Also called a “flow” or “true” dependence.
WAW (write after write): When an instruction needs to write to a register that a previous instruction is writing to (if they go out of order, the wrong item will be in the register). Also called an “output” dependence.
WAR (write after read): When an instruction needs to write to a register after another reads from it. Order matters so that the earlier instruction reads the correct data. Also called “anti” dependence because it reverses the order of RAW.
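The three types can be shown with plain assignments standing in for register operations (names are illustrative):

```python
# RAW (true/flow): I2 reads r1 that I1 writes
r1 = 10          # I1: write r1
r2 = r1 + 1      # I2: read r1 -> must see I1's value

# WAW (output): I3 and I4 both write r3; the later write must win
r3 = 2           # I3: write r3
r3 = 5           # I4: write r3 -> final value must be I4's

# WAR (anti): I5 reads r3 before I6 overwrites it
r4 = r3 + 1      # I5: read r3 -> must see 5, not 9
r3 = 9           # I6: write r3
```

Reordering any of these pairs changes the values produced, which is exactly why the ordering constraint exists.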
Which data dependencies are “true” and which “false”?
True: RAW (also called “flow” or “true” dependence)
False: WAW and WAR
Another name for “false” dependence
Name dependence
Is RAR (read after read) a dependence?
Nope! It doesn’t matter what order instructions read in as long as no one is writing to that register
Dependencies vs Hazards
Dependencies are a property of the program alone. Not all dependencies become hazards.
Hazards are a property of the program and the pipeline! If an instruction has a data dependence on an earlier instruction, but the earlier one has already left the pipeline (written its result) by the time the later one reads the register, there’s no hazard. In a simple in-order pipeline, false dependencies never become hazards because writes happen in program order; in an out-of-order machine they can, which is why such machines use register renaming to eliminate them.
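The dependence-vs-hazard distinction can be sketched with a toy rule: a RAW dependence only becomes a hazard if the consumer enters the pipeline before the producer's result is ready. The latency constant is an assumed stand-in for pipeline depth (no forwarding).

```python
# Toy model: a RAW dependence is a hazard only if the consumer is issued
# within `latency` instructions of the producer (assumed, no forwarding).
def raw_is_hazard(producer_pos, consumer_pos, latency=3):
    return consumer_pos - producer_pos < latency

hazard_close = raw_is_hazard(0, 1)    # back-to-back instructions: hazard
no_hazard_far = raw_is_hazard(0, 5)   # producer already wrote back: no hazard
```

Same program-level dependence in both calls; only the distance through the pipeline differs.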
How can we handle hazards?
Stall (insert bubbles until the hazard clears), forward/bypass (route a result from a later pipeline stage directly to the ALU input instead of waiting for writeback), or speculate (e.g., predict branches and flush on a misprediction).
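Forwarding is the most common fix for RAW hazards. A minimal sketch, assuming a classic 5-stage pipeline where the EX/MEM and MEM/WB latches each hold a (destination register, value) pair:

```python
# Minimal forwarding sketch (assumed 5-stage pipeline): if the instruction
# in EX needs a register that an older in-flight instruction is about to
# write, route that value straight to the ALU input.
def alu_operand(reg, regfile, ex_mem, mem_wb):
    # ex_mem / mem_wb are (dest_reg, value) pairs or None
    if ex_mem and ex_mem[0] == reg:
        return ex_mem[1]          # forward from EX/MEM (the newest value)
    if mem_wb and mem_wb[0] == reg:
        return mem_wb[1]          # forward from MEM/WB
    return regfile[reg]           # no hazard: read the register file

regs = {"r1": 0}
val = alu_operand("r1", regs, ("r1", 99), None)   # gets 99, not the stale 0
```

Note the priority: EX/MEM wins over MEM/WB, because the younger in-flight instruction holds the most recent value of the register.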
How does the number of stages affect performance?
As the number of stages increases, the number of hazards increases (and so does the misprediction penalty), which increases CPI.
But the work needed per stage decreases, which decreases the clock cycle time.
How to choose number of stages to optimize performance?
Best performance is achieved by adding stages to decrease the work per stage (shortening the clock cycle), without adding so many that hazards significantly increase CPI. The graph of execution time vs. number of stages is an upward-facing parabola.
In modern processors, the minimum is around 30-40 stages.
Should we optimize pipelines strictly for performance?
No! Consider power consumption too. Power consumption increases roughly linearly with the number of stages, so instead of stopping at the performance-optimal depth, we stop where the added power starts to outweigh the remaining performance gain.
On modern processors, this sweet spot is around 10-15 stages.
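Both trade-offs can be seen in one toy numerical model. Every constant here is assumed for illustration (not a measurement); the point is only the shape: execution time bottoms out at a deep pipeline, while the power-weighted metric bottoms out much shallower.

```python
# Toy model (all numbers assumed): splitting a fixed amount of work across
# more stages shortens the cycle, but hazards raise CPI and each stage
# adds latch overhead and burns extra power.
def exec_time(stages, work=30.0, latch=0.5, base_cpi=1.0, hazard=0.05):
    cycle_time = work / stages + latch       # less work per stage + latch cost
    cpi = base_cpi + hazard * stages         # more stages -> more hazard cycles
    return cpi * cycle_time                  # time per instruction

def power(stages, base=1.0, per_stage=0.1):
    return base + per_stage * stages         # roughly linear in depth

candidates = (5, 10, 15, 20, 35)
perf_best = min(candidates, key=exec_time)                         # deep pipeline wins
eff_best = min(candidates, key=lambda s: exec_time(s) * power(s))  # shallower wins
```

With these (assumed) constants, pure performance favors the 35-stage design while the power-weighted metric favors 10 stages, mirroring the 30-40 vs. 10-15 split above.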