Week 10 Flashcards

4.8-4.10, 4.14-4.15 (57 cards)

1
Q

control/branch hazards

A

Hazard where pipelined processor doesn’t know which instruction to fetch next because some previous instruction isn’t done yet (even with forwarding)

2
Q

hazard

A

A clock cycle where we can’t get the pipeline to do what we want to do

3
Q

dynamic branch prediction

A
  • Each time a branch instruction executes, record its outcome (taken or not taken) and use it to update the prediction for that branch
  • The history is stored in a branch prediction cache
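
A minimal sketch of the idea in Python, assuming the simplest possible scheme: remember only the last outcome of each branch. The class and function names are hypothetical, and a plain dictionary stands in for the branch prediction cache.

```python
class OneBitPredictor:
    """Predicts that a branch will do whatever it did the last time it executed."""

    def __init__(self, default_taken=False):
        self.table = {}                  # branch address -> last observed outcome
        self.default_taken = default_taken

    def predict(self, branch_addr):
        # A branch that has never been seen gets the default guess.
        return self.table.get(branch_addr, self.default_taken)

    def update(self, branch_addr, taken):
        # Save the real outcome once the branch has been resolved.
        self.table[branch_addr] = taken


def count_mispredictions(predictor, branch_addr, outcomes):
    """Run one branch through a list of real outcomes and count wrong guesses."""
    misses = 0
    for taken in outcomes:
        if predictor.predict(branch_addr) != taken:
            misses += 1
        predictor.update(branch_addr, taken)
    return misses


# A loop branch that is taken nine times and then falls through.
loop_outcomes = [True] * 9 + [False]
print(count_mispredictions(OneBitPredictor(), 0x400, loop_outcomes))
```

With the default shown (not taken), this one-bit scheme mispredicts twice per pass through the loop: once on entry and once on exit.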
4
Q

how many bytes is each instruction when instructions are 32 bits wide?

A
  • 4 bytes
  • good to know for PC-relative branches like CBZ X1, 8 (branch forward 8 instructions, i.e. 32 bytes)
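
The offset arithmetic can be checked with a few lines of Python. This is only a sketch: it assumes the branch offset counts instructions, as the card states, and the names `INSTRUCTION_BYTES` and `branch_target` are made up for illustration.

```python
INSTRUCTION_BYTES = 4  # one 32-bit instruction occupies 4 bytes


def branch_target(pc, offset_in_instructions):
    """Byte address reached by a PC-relative branch whose offset counts instructions."""
    return pc + offset_in_instructions * INSTRUCTION_BYTES


# Byte distance covered by an offset of 8 instructions.
print(branch_target(0x1000, 8) - 0x1000)
```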
5
Q

how to “kill off” an instruction if we no longer need it

A

Replace it with a NOP so it passes through the rest of the pipeline without changing any state

6
Q

control dependence definition

A

determines the ordering of an instruction J with respect to a branch instruction so that J is executed in correct program order and only when it should be

7
Q

control dependence constraints (2 things)

A
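  • an instruction that is control dependent on a branch cannot be moved before the branch (its execution would no longer be controlled by the branch)
  • an instruction that is NOT control dependent on a branch cannot be moved after the branch (its execution would then be controlled by the branch)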
8
Q

To increase performance most processors execute instructions that should not be executed if this can be done ____.

A

without affecting program correctness

9
Q

instruction execution trace

A
  • the correct sequence of instruction execution
  • may be unknowable ahead of time and doesn’t have to be sequential
10
Q

which can cause greater pipeline performance loss, control hazards or data hazards

A
  • control hazards
  • the pipeline doesn’t know soon enough what to fetch, so it may fetch instructions that shouldn’t execute
11
Q

superscalar

A
  • a dynamic multi-issue processor that executes more than one instruction per clock cycle by selecting them during execution
  • helps with control hazards but is hard to scale
  • advantage over VLIW: the hardware guarantees that code executes correctly, however it was scheduled
12
Q

methods to resolve control dependencies (3 things)

A
  • to stall (slowest)
  • get enough circuitry to do more operations at once (superscalar)
  • branch prediction
13
Q

branch prediction

A

Guessing the execution path before it is known, so the pipeline can keep fetching instead of stalling

14
Q

predict branch not taken

A
  • Predicting not taken is the same as continuing to fetch at the default next instruction (PC + 4)
  • when the guess is right the pipeline proceeds as normal; when it is wrong the wrongly fetched instructions must be discarded, which costs cycles
15
Q

predict branch taken

A
  • Predict taken works well for loops (the loop-closing branch is mispredicted only once, when the loop exits)
  • is a rigid approach relying on stereotypical branch behavior
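
A quick way to see the "incorrect once" claim, sketched in Python. It assumes a loop whose closing branch is taken on every iteration except the last; the function name `static_mispredictions` is hypothetical.

```python
def static_mispredictions(outcomes, predict_taken):
    """Count wrong guesses for a fixed (static) prediction over real branch outcomes."""
    return sum(1 for taken in outcomes if taken != predict_taken)


# Loop-closing branch of a 100-iteration loop: taken 99 times, then falls through.
loop_outcomes = [True] * 99 + [False]

print(static_mispredictions(loop_outcomes, predict_taken=True))
print(static_mispredictions(loop_outcomes, predict_taken=False))
```

Under this assumption, always predicting taken is wrong only at loop exit, while always predicting not taken is wrong on every iteration except the last.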
16
Q

branch prediction cache

A
  • used to store info for dynamic branch prediction
  • ALU determines if condition is true (branch) or false (default) when EX resolves a branch instruction
17
Q

branch target buffer (BTB)

A
  • cache in processors that stores target addresses of recently executed branches to improve efficiency by predicting and prefetching instructions
  • usually organized as a cache with tags in the IF stage, making it more costly than a simple prediction buffer
18
Q

correlating predictor

A

Branch predictor that combines local behavior of a particular branch and global information about the behavior of some recent number of executed branches

19
Q

tournament branch predictor

A

a branch predictor with multiple predictions for each branch and a selection mechanism that chooses which predictor to enable for a given branch

20
Q

what exactly is a branch instruction?

A

an opcode bit string that gets interpreted according to instruction format

21
Q

Instruction addresses in memory ____.

A
  • NEVER change
  • good for prediction, can fetch instruction only once and then save value
22
Q

lookup table

A
  • When pipeline remembers address of a branch instruction, a lookup table can reveal when that branch is fetched again
  • When pipeline remembers the computed target address of a branch, then a lookup can provide target address
23
Q

explanation of circuit that searches dictionary in <1 clock cycle

A
  • Add circuit to IF, store taken branch pairs (branch_instr_addr, target_addr) as they occur
  • Search for the branch instruction’s address in the table, then read the target without waiting for it to be computed
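
The two steps above can be sketched in Python as a table of (branch_instr_addr, target_addr) pairs. This is an illustration only: the class and method names are hypothetical, a dictionary stands in for the hardware search, and instructions are assumed to be 4 bytes.

```python
class BranchTargetTable:
    """Remembers taken branches so the fetch stage can reuse their targets."""

    def __init__(self):
        self.targets = {}  # branch instruction address -> target address

    def record_taken(self, branch_instr_addr, target_addr):
        # Called when a branch is resolved as taken.
        self.targets[branch_instr_addr] = target_addr

    def next_fetch_address(self, pc):
        # Hit: fetch from the remembered target. Miss: fall through to PC + 4.
        return self.targets.get(pc, pc + 4)


table = BranchTargetTable()
print(hex(table.next_fetch_address(0x40)))  # no entry yet for this address
table.record_taken(0x40, 0x10)
print(hex(table.next_fetch_address(0x40)))  # entry now present
```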
24
Q

content-addressed memory (CAM)

A
  • doesn’t use pointer to read, uses “value” and parallel search to find location to read
  • takes constant time, has N storage registers, each with a bitwise comparison unit and 1 search register
  • works a lot like a hash table, use key to find value
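
A software model of the structure described above, as a rough sketch. Real hardware compares every stored key with the search register at the same time; the list comprehension below only imitates that. All names are hypothetical.

```python
class ContentAddressedMemory:
    """N storage registers, each compared against one search register."""

    def __init__(self, size):
        self.keys = [None] * size    # the "value" each register is searched by
        self.values = [None] * size  # the data read out on a match

    def store(self, slot, key, value):
        self.keys[slot] = key
        self.values[slot] = value

    def search(self, search_key):
        # One comparison per register; in hardware these happen in parallel.
        match_lines = [stored == search_key for stored in self.keys]
        for slot, matched in enumerate(match_lines):
            if matched:
                return self.values[slot]
        return None  # no register matched


cam = ContentAddressedMemory(4)
cam.store(2, key=0x40, value=0x10)
print(cam.search(0x40), cam.search(0x44))
```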
25
compilers and compiling for ILP (instruction level parallelism)
  • Compiler can find ILP at a higher level of abstraction than hardware
  • Compiler has access to source code, can analyze it, and can use techniques to reduce stalls
26
loop unrolling
  • technique to get more performance from loops that access arrays, in which multiple copies of the loop body are made and instructions from different iterations are scheduled together
  • can reduce dependence-caused stalls if iterations are independent of one another
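
What unrolling looks like at the source level, sketched in Python. A compiler would do this on machine instructions; the unroll factor of 4 and the function names are arbitrary choices for illustration.

```python
def scale_rolled(values, factor):
    """One element per iteration: one index update and one loop test per element."""
    result = [0] * len(values)
    i = 0
    while i < len(values):
        result[i] = values[i] * factor
        i += 1
    return result


def scale_unrolled_by_4(values, factor):
    """Four copies of the loop body per iteration; the copies are independent."""
    n = len(values)
    result = [0] * n
    i = 0
    while i + 4 <= n:
        result[i] = values[i] * factor
        result[i + 1] = values[i + 1] * factor
        result[i + 2] = values[i + 2] * factor
        result[i + 3] = values[i + 3] * factor
        i += 4
    while i < n:  # leftover elements when n is not a multiple of 4
        result[i] = values[i] * factor
        i += 1
    return result


data = list(range(10))
print(scale_unrolled_by_4(data, 3) == scale_rolled(data, 3))
```

The four body copies touch different elements, so a scheduler is free to interleave them, and the loop overhead (index update and test) is paid once per four elements instead of once per element.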
27
latency
time between the input and the output of an instruction
28
static instruction scheduling
  • technique to reduce dependence-caused stalls
  • reorders machine instructions from the obvious compiled order
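
A small sketch of why reordering helps, in Python. It assumes a pipeline with forwarding in which an instruction that uses a loaded register immediately after the load costs one stall cycle; the tuple format and the function name `load_use_stalls` are hypothetical.

```python
def load_use_stalls(program):
    """Count stalls where an instruction reads a register loaded by the one just before it.

    Each instruction is a tuple (opcode, destination, source_registers).
    """
    stalls = 0
    for previous, current in zip(program, program[1:]):
        prev_op, prev_dest, _ = previous
        _, _, curr_sources = current
        if prev_op == "LDUR" and prev_dest in curr_sources:
            stalls += 1
    return stalls


compiled_order = [
    ("LDUR", "X1", ("X0",)),
    ("LDUR", "X2", ("X0",)),
    ("ADD",  "X3", ("X1", "X2")),   # uses X2 right after its load
    ("STUR", None, ("X3", "X0")),
    ("LDUR", "X4", ("X0",)),
    ("ADD",  "X5", ("X1", "X4")),   # uses X4 right after its load
    ("STUR", None, ("X5", "X0")),
]

# Same instructions with the third load moved up, away from its use.
reordered = [compiled_order[i] for i in (0, 1, 4, 2, 3, 5, 6)]

print(load_use_stalls(compiled_order), load_use_stalls(reordered))
```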
29
latency table
  • a practical reference tool used by programmers and system designers to understand and compare the typical delays associated with various operations
  • can help you figure out how to reorder instructions to reduce stalls
30
exception/interrupt
  • an unscheduled event that disrupts program execution
  • used to detect overflow
31
interrupt
  • An exception that comes from outside of the processor
  • Some architectures use the term interrupt for all exceptions
32
vectored interrupt
An interrupt for which the address to which control is transferred is determined by the cause of the exception
33
flush
To discard instructions in a pipeline, usually due to an unexpected event
34
how to reduce the cost of the taken branch? (two things)
  • move the branch address calculation up from the EX stage to the ID stage (calculate all possibilities at once)
  • move up the branch decision (much more difficult)
35
imprecise interrupt/exception
Interrupts/exceptions in pipelined computers that are not associated with the exact instruction that was the cause of the interrupt or exception
36
precise interrupt/exception
An interrupt or exception that is always associated with the correct instruction in pipelined computers
37
basic action processor must do when exception occurs
  • save address of problem instruction in exception link register (ELR)
  • transfer control to OS at some specified address
  • OS handles exception, either stops program or handles it and continues
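
The first two steps can be sketched as a toy model in Python. Everything here is illustrative: the `Cpu` class, the `take_exception` function, the handler address, and the cause code are hypothetical, and recording the cause in a syndrome register is included only because the deck describes the ESR on a later card.

```python
from dataclasses import dataclass

HANDLER_ADDRESS = 0x8000  # hypothetical "specified address" of the OS handler


@dataclass
class Cpu:
    pc: int = 0
    elr: int = 0  # exception link register: address of the problem instruction
    esr: int = 0  # exception syndrome register: cause of the exception


def take_exception(cpu, faulting_pc, cause_code):
    """Save where and why the exception happened, then jump to the OS handler."""
    cpu.elr = faulting_pc
    cpu.esr = cause_code
    cpu.pc = HANDLER_ADDRESS


cpu = Cpu(pc=0x4C)
take_exception(cpu, faulting_pc=0x4C, cause_code=1)
print(hex(cpu.elr), cpu.esr, hex(cpu.pc))
```

After the handler finishes, the OS can either terminate the program or use the saved address to resume it.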
38
exception link register (ELR)
  • A 64-bit register used to hold the address of the instruction when an exception happens
  • needed for vectored interrupt
39
exception syndrome register (ESR)
  • A register used to record the cause of the exception
  • In LEGv8, this register is 32 bits, although some bits are currently unused
40
instruction level parallelism (ILP)
The parallelism among instructions
41
basic blocks
  • a sequence of instructions with a single entry point (the first instruction) and a single exit point (the last instruction)
  • blocks execute sequentially within themselves
42
overhead code
extra code that manages the execution of a task (function calls, etc.) but doesn't influence the result of the intended computation
43
productive code
code that directly carries out the intended computation, as opposed to overhead code
44
speculation
  • an approach where the compiler or processor guesses the outcome of an instruction to remove it as a dependence in executing other instructions
  • ex. assuming a branch is taken so the instructions that come after it can be executed earlier
45
issue slots
  • the positions from which instructions can issue in a given clock cycle, allowing processors to fetch and dispatch multiple instructions per cycle
  • may be determined statically by the compiler or dynamically by the processor
  • the task of multiple issue is determining which issue slots should be used for which instructions
46
Very Long Instruction Word (VLIW)
  • style of instruction set architecture that launches many operations that are defined to be independent in a single wide instruction
  • typically has many separate opcode fields
  • can think of an issue packet as a VLIW
47
multiple issue
  • A scheme whereby multiple instructions are launched in one clock cycle
  • this allows the CPI to be less than 1
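
The CPI claim is simple arithmetic, shown here as a Python sketch. The function names are hypothetical, and the "ideal" figure assumes every issue slot is filled on every cycle, which real code rarely achieves.

```python
def cpi(clock_cycles, instructions_completed):
    """Cycles per instruction for a measured run."""
    return clock_cycles / instructions_completed


def ideal_cpi(issue_width):
    """Best-case CPI when every issue slot is used on every cycle."""
    return 1 / issue_width


# A 2-issue processor that completes 100 instructions in 50 cycles.
print(cpi(50, 100), ideal_cpi(2))
```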
48
static multiple issue
An approach to implementing a multiple-issue processor where many decisions are made by the compiler before execution
49
dynamic multiple issue
An approach to implementing a multiple-issue processor where many decisions are made during execution by the processor
50
single issue
when one instruction is launched per clock cycle
51
issue packet
  • the set of instructions that issues together in one clock cycle
  • the packet may be determined statically by the compiler or dynamically by the processor
52
why you must choose carefully when using speculation
  • can take time to backtrack if the guess is wrong
  • can make exceptions happen that shouldn't
53
how to fix hardware for multiple issue pipelining
  • basically more hardware
  • add more ports in register file for reading/writing
  • add another adder to calculate addresses
54
name dependence/antidependence
An ordering forced by the reuse of a name, typically a register, rather than by a true dependence that carries a value between two instructions
55
register renaming
  • the renaming of registers by the compiler or hardware to remove antidependence (name dependence)
  • used with loop unrolling
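
A toy renamer in Python, to show how an antidependence disappears. This is a sketch under simplifying assumptions: each instruction is just (destination, sources), every destination gets a fresh name, and the `T0`, `T1`, ... names are invented for illustration.

```python
def rename_registers(program):
    """Give every written register a fresh name and redirect later reads to it."""
    latest_name = {}  # original register -> most recent fresh name
    renamed = []
    for index, (dest, sources) in enumerate(program):
        # Reads use the newest name of each source (or the original if never written).
        new_sources = tuple(latest_name.get(reg, reg) for reg in sources)
        new_dest = f"T{index}"
        latest_name[dest] = new_dest
        renamed.append((new_dest, new_sources))
    return renamed


program = [
    ("X1", ("X2", "X3")),  # reads X2
    ("X2", ("X4", "X5")),  # writes X2: antidependence on the instruction above
    ("X6", ("X1", "X2")),  # true dependences on both earlier results
]

for instruction in rename_registers(program):
    print(instruction)
```

After renaming, the second instruction writes a fresh register instead of X2, so it no longer has to wait for the first instruction to read X2; the true dependences of the third instruction are preserved.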
56
use latency
number of clock cycles between a load instruction and an instruction that can use the result of the load without stalling the pipeline
57
BTB (branch target buffer) only holds ____.
  • taken branches
  • place the BTB in the IF stage and then fetch a target instruction as the actual next instruction