What is the scalar product?
a · b = Σ(a_i × b_i) for vectors a and b
What is the serial scalar product algorithm?
Initialize sum = 0, loop adding a[i] * b[i]
What is parallel reduction?
Combine elements via a binary tree: pair up elements, combine each pair, and repeat until one value remains
What is the kernel for parallel reduction?
Each work item computes a partial sum over its slice of the input; the work group then combines the partial sums with a binary tree reduction in local memory
What is barrier synchronization?
barrier(CLK_LOCAL_MEM_FENCE) - every work item in the work group waits until all items in the group have reached the barrier
Why are barriers needed?
Ensure all work items finish writing to local memory before any work item reads the shared data
What is the problem with global barriers?
GPUs cannot synchronize between work groups - no global barrier
What is the solution for cross-work-group reduction?
Use multiple kernel launches, each completing before next starts
What is lockstep execution?
SIMD cores execute the same instruction across all threads in a group simultaneously
What is divergence?
When threads in same warp execute different code paths, causing serialization
What causes divergence?
if/else branches where different threads take different paths
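A sketch of the idea in C: the branchy version would diverge on a GPU when even and odd lanes take different paths, while a select-style rewrite can compile to predicated instructions with no divergence (whether it actually does is compiler-dependent; the function names are illustrative):

```c
/* Divergent version: within a warp, even and odd lanes take different
 * branches, so on a GPU the two paths run serially. */
int divergent(int lane) {
    if (lane % 2 == 0)
        return lane * 2;
    else
        return lane + 1;
}

/* Select-style version: a conditional select can map to predicated
 * instructions, keeping all lanes in lockstep. Same results. */
int branch_free(int lane) {
    return (lane % 2 == 0) ? lane * 2 : lane + 1;
}
```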
What is the performance impact of divergence?
The different paths execute serially, one after another, so parallelism is lost within the divergent warp
What is subgroup size?
Number of threads executing in lockstep (32 for Nvidia warps, 64 for AMD wavefronts)
Why exploit subgroups in reduction?
Once the remaining data fits within a single subgroup, lockstep execution makes explicit synchronization unnecessary
What is the subgroup reduction optimization?
Skip barriers for final reduction steps within subgroup
What is the reduction pattern with subgroups?
Normal reduction with barriers until subgroup size, then lockstep reduction
What is the advantage of subgroup-aware code?
Reduced synchronization overhead in final reduction steps
What is the trade-off with divergence?
Some algorithms accept divergence in exchange for simpler code when the clarity benefit outweighs the serialization cost
What is the key insight about GPU synchronization?
Local synchronization works, global requires multiple kernels
What makes GPU synchronization different from CPU?
Work groups are independent, no global coordination primitives