Steps for instruction execution
Use the PC to address instruction memory and fetch the instruction.
Then decode the instruction and read the registers it names.
Use the ALU to calculate an arithmetic result, a load/store memory address, or a branch comparison.
Then access data memory for load/store.
Then update the PC to the branch target address or PC + 4.
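
The steps above can be sketched as a toy interpreter. This is only an illustration under simplifying assumptions: instructions are pre-decoded Python dicts rather than 32-bit words, data memory is a dict keyed by byte address, and the names `step`, `imem` and `state` are hypothetical.

```python
def step(state, imem):
    """Execute one instruction and return the new PC (toy model)."""
    pc = state["pc"]
    instr = imem[pc // 4]                  # fetch: the PC indexes instruction memory
    op = instr["op"]                       # decode
    regs, dmem = state["regs"], state["dmem"]
    a = regs[instr["rs"]]                  # read register operands
    b = regs[instr["rt"]]
    next_pc = pc + 4                       # default PC update
    if op == "add":                        # ALU: arithmetic result
        regs[instr["rd"]] = a + b
    elif op == "lw":                       # ALU: address, then data memory read
        regs[instr["rt"]] = dmem[a + instr["imm"]]
    elif op == "sw":                       # ALU: address, then data memory write
        dmem[a + instr["imm"]] = b
    elif op == "beq":                      # ALU: subtract and check for zero
        if a - b == 0:
            next_pc = pc + 4 + (instr["imm"] << 2)
    else:
        raise ValueError(f"unsupported op {op!r}")
    state["pc"] = next_pc
    return next_pc

# Hypothetical five-instruction program; the last add is skipped by the branch.
imem = [
    {"op": "lw",  "rs": 0, "rt": 1, "imm": 0},
    {"op": "add", "rs": 1, "rt": 1, "rd": 2},
    {"op": "sw",  "rs": 0, "rt": 2, "imm": 4},
    {"op": "beq", "rs": 0, "rt": 0, "imm": 1},
    {"op": "add", "rs": 2, "rt": 2, "rd": 2},
]
state = {"pc": 0, "regs": [0] * 8, "dmem": {0: 21}}
while state["pc"] < 4 * len(imem):
    step(state, imem)
```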

Describe Register Design

Describe Register with Write Control

Describe Clocking Methodology
Combinational logic transforms data during clock cycles:
• Between clock edges;
• Input from state elements, output to state element;
• State elements are latches or registers;
• Longest delay determines clock period.

Define a datapath
Elements that process data and addresses in the CPU.
They are registers, ALUs, muxes, memories, …
Steps for R-format Instruction

Steps for Load/Store Instructions

Steps For Branch Instructions
• Read register operands.
• Compare operands:
  • Use ALU, subtract and check Zero output.
• Calculate target address:
  • Sign-extend displacement;
  • Shift left 2 places (word displacement);
  • Add to PC + 4 (already calculated by instruction fetch).
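
As a minimal sketch of the branch steps, assuming a 16-bit displacement field and 32-bit addresses (the function names are hypothetical):

```python
def sign_extend16(value):
    """Sign-extend a 16-bit field to a Python int."""
    value &= 0xFFFF
    return value - 0x10000 if value & 0x8000 else value

def branch_taken(rs_val, rt_val):
    """Compare operands: subtract and check whether the result is zero."""
    return ((rs_val - rt_val) & 0xFFFFFFFF) == 0

def branch_target(pc, imm16):
    """Sign-extend, shift left 2 (word displacement), add to PC + 4."""
    offset = sign_extend16(imm16) << 2
    return (pc + 4 + offset) & 0xFFFFFFFF
```

For example, `branch_target(0x1000, 3)` adds 12 bytes to PC + 4, while a displacement of `0xFFFF` (that is, -1) branches back to the branch instruction itself.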

Draw Full Datapath without Pipelining

Describe ALU Control

Describe Opcodes For instructions
Control Signals Derived From Instructions
R-type = 0
Load (lw) = 35, store (sw) = 43
Branch on equal (beq) = 4
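
One way to picture how control signals follow from the opcode is a lookup table. The opcode values are the ones listed above; the signal names and settings follow the common textbook single-cycle MIPS design and are an assumption here, not something these notes specify. `None` marks a "don't care" signal.

```python
CONTROL = {
    0:  dict(name="R-type", RegDst=1, ALUSrc=0, MemtoReg=0, RegWrite=1,
             MemRead=0, MemWrite=0, Branch=0, ALUOp=0b10),
    35: dict(name="lw", RegDst=0, ALUSrc=1, MemtoReg=1, RegWrite=1,
             MemRead=1, MemWrite=0, Branch=0, ALUOp=0b00),
    43: dict(name="sw", RegDst=None, ALUSrc=1, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=1, Branch=0, ALUOp=0b00),
    4:  dict(name="beq", RegDst=None, ALUSrc=0, MemtoReg=None, RegWrite=0,
             MemRead=0, MemWrite=0, Branch=1, ALUOp=0b01),
}

def opcode_of(instruction):
    """The opcode is the top 6 bits of a 32-bit MIPS instruction word."""
    return (instruction >> 26) & 0x3F

def control_signals(instruction):
    """Look up the control settings for a 32-bit instruction word."""
    op = opcode_of(instruction)
    if op not in CONTROL:
        raise ValueError(f"unsupported opcode {op}")
    return CONTROL[op]
```

For instance, `opcode_of(0x8D280000)` yields 35, so that word is decoded as a load with `MemRead` and `RegWrite` asserted.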
Steps For implementing Jumps in Datapath

What are the five processing stages?
How is the performance of a single cycle processor evaluated?
The clock period is set by the longest path through the datapath, which is the load instruction. To improve performance, the load instruction must therefore be made as fast as possible.
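
A rough sketch of why load sets the clock period. The unit delays below are illustrative assumptions (in picoseconds), not values from these notes, and the unit names are hypothetical.

```python
# Assumed delay of each datapath unit, in picoseconds.
DELAY_PS = {"instr_mem": 200, "reg_read": 100, "alu": 200,
            "data_mem": 200, "reg_write": 100}

# Units each instruction class passes through in a single-cycle design.
PATHS = {
    "load":     ["instr_mem", "reg_read", "alu", "data_mem", "reg_write"],
    "store":    ["instr_mem", "reg_read", "alu", "data_mem"],
    "r_format": ["instr_mem", "reg_read", "alu", "reg_write"],
    "branch":   ["instr_mem", "reg_read", "alu"],
}

def path_delay(kind):
    """Total delay along the path used by one instruction class."""
    return sum(DELAY_PS[unit] for unit in PATHS[kind])

def clock_period():
    """A single-cycle clock must fit the slowest instruction."""
    return max(path_delay(kind) for kind in PATHS)

def critical_instruction():
    """Instruction class with the longest path."""
    return max(PATHS, key=path_delay)
```

Under these assumed delays every instruction, including a short branch, pays the full load-length cycle.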

Performance of Multicycle Datapath
Clocking Methodology Of MultiCycle

Describe Multicycle Datapath

Describe Pipelined Processor
How does a pipelined processor improve performance
Pipelining improves performance through parallelism: the processor works on different stages of several instructions at the same time, so more instructions complete per unit of time.
What has most recently improved performance in computer architecture?
Although increases in clock speed have provided benefits, the greatest improvements have come from better pipelining and multithreading.

What are the basic principles of processing?
Sequential processing
• Each instruction executes completely before next instruction starts
Pipeline processing
• Instruction i + 1 starts executing before instruction i finishes
• Overlapped execution
• Maximum number of instructions executing simultaneously = Number of pipeline stages
• Increases performance by increasing throughput
• Individual instructions are not executed faster
• Ideal case: no idle components
• Significant issue: hardware in each stage must be independent, so hardware can no longer be shared between stages
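
The overlap described above can be sketched for an ideal pipeline with no stalls. The stage names are the classic MIPS five (IF, ID, EX, MEM, WB), which is an assumption since these notes do not list them; the function names are hypothetical.

```python
STAGES = ["IF", "ID", "EX", "MEM", "WB"]

def stage_of(instr, cycle):
    """Stage occupied by instruction `instr` at `cycle` (both 0-based)."""
    s = cycle - instr                      # instruction i enters IF at cycle i
    return STAGES[s] if 0 <= s < len(STAGES) else None

def in_flight(cycle, n_instr):
    """Map each executing instruction to its stage at the given cycle."""
    return {i: stage_of(i, cycle) for i in range(n_instr)
            if stage_of(i, cycle) is not None}

def diagram(n_instr):
    """Text diagram: one row per instruction, one column per cycle."""
    total_cycles = n_instr + len(STAGES) - 1
    rows = []
    for i in range(n_instr):
        cells = [(stage_of(i, c) or ".").ljust(4) for c in range(total_cycles)]
        rows.append(f"i{i}: " + "".join(cells).rstrip())
    return "\n".join(rows)
```

Calling `print(diagram(4))` shows the staircase pattern, and `in_flight` never returns more entries than there are stages.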
Is an efficient multicycle datapath suitable for pipelining?
No. An efficient multicycle datapath shares many components to maximise efficiency, e.g. the ALU is also used for next-instruction calculations. Shared resources make it unsuitable for pipelining!

How is pipelining implemented in single cycle datapaths?

How does pipelined speedup work?
• If all stages are balanced, i.e. all take the same time (very unlikely!):
  Time between instructions (pipelined) = Time between instructions (non-pipelined) / Number of stages
• If not balanced, the speedup is less!
• Overall speedup due to increased throughput
• Latency (time for each instruction) does not decrease (normally increases).
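
A small sketch of these points, assuming the non-pipelined design takes the sum of all stage times per instruction and the pipelined clock is set by the slowest stage. The stage times used in the example are made-up values in picoseconds.

```python
def time_between_instructions(stage_times_ps, pipelined):
    """Steady-state time between instruction completions."""
    return max(stage_times_ps) if pipelined else sum(stage_times_ps)

def speedup(stage_times_ps):
    """Throughput speedup of the pipelined design over the non-pipelined one."""
    return (time_between_instructions(stage_times_ps, False)
            / time_between_instructions(stage_times_ps, True))

def latency(stage_times_ps, pipelined):
    """Time for one instruction to pass through every stage."""
    if pipelined:
        # Every stage is stretched to the length of the slowest one.
        return len(stage_times_ps) * max(stage_times_ps)
    return sum(stage_times_ps)

balanced = [200, 200, 200, 200, 200]
unbalanced = [200, 100, 200, 200, 100]
```

With the balanced times the speedup equals the number of stages. With the unbalanced times it is lower, and the pipelined latency is longer than the non-pipelined one.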