What is an atomic operation?
An operation that completes as one indivisible step - other threads never observe it half-done
What is the load-compute-store problem?
x = x + 1 is really three steps (load, compute, store); when threads interleave them on the same variable, one thread's store overwrites another's and updates are lost
What is the histogram example?
Counting frequency of values - multiple threads incrementing same counters
What is the problem with naive histogram?
Race conditions cause lost increments
What is the solution?
Use atomic_inc() for thread-safe increments
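The atomic-increment fix can be sketched on the host. This is a minimal sketch, not the kernel itself: it assumes C11 `atomic_fetch_add` as a stand-in for OpenCL's `atomic_inc`, and `NUM_BINS`, `histogram_add` and `histogram` are hypothetical names. The serial loop stands in for a kernel launch; the increment stays correct if the calls run concurrently.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_BINS 4  /* hypothetical bin count for this sketch */

/* Plays the role of one work-item: classify one value and bump its bin
   with an indivisible read-modify-write, so no increment can be lost. */
static void histogram_add(atomic_int bins[NUM_BINS], int value)
{
    assert(value >= 0 && value < NUM_BINS);
    atomic_fetch_add(&bins[value], 1);  /* C11 analog of atomic_inc(&bins[value]) */
}

/* Serial driver standing in for a kernel launch over n work-items. */
static void histogram(atomic_int bins[NUM_BINS], const int *data, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        histogram_add(bins, data[i]);
}
```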
What are OpenCL atomic operations?
atomic_inc, atomic_dec, atomic_add, atomic_min, atomic_max, etc.
What is atomic_inc(p)?
Atomically increments *p, returns old value
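The returned old value is what makes the increment useful beyond counting: each caller receives a distinct number. A minimal host-side sketch, assuming C11 `atomic_fetch_add` as the analog of `atomic_inc`; the `append` helper and its parameters are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>

/* Appends value to out[] and returns the slot it landed in.
   The old counter value returned by the atomic increment is a slot
   index that no other caller can receive. */
static int append(int *out, int capacity, atomic_int *count, int value)
{
    int slot = atomic_fetch_add(count, 1);  /* like: slot = atomic_inc(count) */
    assert(slot < capacity);                /* sketch only: no overflow handling */
    out[slot] = value;
    return slot;
}
```

After all callers finish, `*count` holds the number of elements written.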
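What is atomic_cmpxchg(p, cmp, val)?
Atomically compares *p with cmp; if they are equal, stores val in *p. Returns the old value of *p either way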
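The compare-and-exchange semantics of `atomic_cmpxchg(p, cmp, val)` can be sketched on the host: store `val` only if `*p` equals `cmp`, and return the old value in both cases. This assumes C11 `atomic_compare_exchange_strong` as the underlying primitive; the `cmpxchg` wrapper is a hypothetical helper that mimics the OpenCL call shape.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical helper with the OpenCL call shape: if *p == cmp, store val.
   Either way, return the value *p held before the call. */
static int cmpxchg(atomic_int *p, int cmp, int val)
{
    assert(p);  /* caller must pass a valid pointer */
    int expected = cmp;
    /* On failure C11 writes the current *p into expected;
       on success expected still equals cmp, which was the old value. */
    atomic_compare_exchange_strong(p, &expected, val);
    return expected;
}
```

A caller detects success by checking that the return value equals `cmp`.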
What is the spinlock pattern?
while (atomic_cmpxchg(&lock, 0, 1) == 1); - busy-wait until the swap from 0 (free) to 1 (held) succeeds; release by writing 0 back
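A minimal host-side sketch of the same pattern, assuming C11 atomics in place of `atomic_cmpxchg`. The `spinlock` type and the three functions are hypothetical names; 0 means free and 1 means held, as in the card above.

```c
#include <assert.h>
#include <stdatomic.h>

typedef struct { atomic_int state; } spinlock;  /* 0 = free, 1 = held */

/* One acquisition attempt: succeeds only if the lock was free. */
static int spin_try_lock(spinlock *l)
{
    int expected = 0;
    return atomic_compare_exchange_strong(&l->state, &expected, 1);
}

/* Busy-wait until an attempt succeeds. */
static void spin_lock(spinlock *l)
{
    while (!spin_try_lock(l))
        ;  /* spin */
}

/* Release: only the holder may call this. */
static void spin_unlock(spinlock *l)
{
    assert(atomic_load(&l->state) == 1);
    atomic_store(&l->state, 0);
}
```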
What is the problem with spinlocks on GPU?
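Every waiting work-item hammers the same global-memory word (contention), and work-items that execute in lockstep can diverge so that spinners stall the lock holder, which can deadlock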
What is lock-free programming?
Thread-safe data structures without locks, using atomics
What is the linked list prepend problem?
Several threads link a new node at the list head at once; if two read the same head, one node is overwritten and lost
What is the lock-free solution?
Use atomic_cmpxchg in loop to update head pointer atomically
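The retry loop can be sketched on the host. This is a sketch under assumptions: C11 `atomic_compare_exchange_weak` stands in for `atomic_cmpxchg`, the head is an atomic pointer rather than whatever index or pointer encoding a kernel would use, and `node` and `prepend` are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

typedef struct node {
    int value;
    struct node *next;
} node;

/* Links n in front of the list without taking a lock. */
static void prepend(_Atomic(node *) *head, node *n)
{
    assert(n != NULL);
    node *old = atomic_load(head);
    do {
        n->next = old;  /* link to the head we last saw */
        /* Publish n only if head is still old; on failure old is
           refreshed with the current head and the loop retries. */
    } while (!atomic_compare_exchange_weak(head, &old, n));
}
```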
What is optimistic concurrency?
Assume operation succeeds, retry if conflict detected
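The same read, compute, commit-or-retry shape works for updates that have no built-in atomic. A minimal host-side sketch, assuming C11 `atomic_compare_exchange_weak`; the operation chosen here (`saturating_add`, an add clamped to a limit) is a hypothetical example, not one from the source.

```c
#include <assert.h>
#include <stdatomic.h>

/* Adds delta to *p but never lets the result exceed limit.
   Returns the value that was committed. */
static int saturating_add(atomic_int *p, int delta, int limit)
{
    assert(delta >= 0);
    int seen = atomic_load(p);  /* optimistic read */
    int want;
    do {
        want = seen + delta;    /* compute from the value we saw */
        if (want > limit)
            want = limit;
        /* Commit only if *p is still seen; otherwise seen is refreshed
           with the current value and the computation is redone. */
    } while (!atomic_compare_exchange_weak(p, &seen, want));
    return want;
}
```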
What is the trade-off with atomics?
Lower overhead than locks but limited to simple operations
When to use atomics vs locks?
Atomics for simple operations, locks for complex multi-statement sections
What is the histogram optimization?
Use local memory per work group, then atomic add to global
Why use local memory for histogram?
Reduces contention on global memory locations
What is the local histogram pattern?
Each work group builds local histogram, then atomically adds to global
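The two-phase pattern can be sketched on the host. This is a sketch under assumptions: one call to the hypothetical `group_histogram` plays the role of one work-group, a plain array stands in for `__local` memory, and C11 `atomic_fetch_add` stands in for `atomic_add`. In a real kernel the work-items of a group share the local histogram, so the local increments would also be atomic and barriers would separate the phases.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

#define NUM_BINS 4  /* hypothetical bin count for this sketch */

/* Plays the role of one work-group processing its slice of the input. */
static void group_histogram(atomic_int global_bins[NUM_BINS],
                            const int *slice, size_t n)
{
    int local_bins[NUM_BINS] = {0};  /* stands in for __local memory */

    /* Phase 1: count into the group's private histogram. The group is
       simulated serially here, so a plain increment is enough. */
    for (size_t i = 0; i < n; ++i) {
        assert(slice[i] >= 0 && slice[i] < NUM_BINS);
        local_bins[slice[i]]++;
    }

    /* Phase 2: one atomic add per non-empty bin merges the group's
       result into the shared global histogram. */
    for (int b = 0; b < NUM_BINS; ++b)
        if (local_bins[b] != 0)
            atomic_fetch_add(&global_bins[b], local_bins[b]);
}
```

Global atomics drop from one per input element to at most `NUM_BINS` per group.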
What is the performance benefit?
Less contention, local memory is faster
What is the key insight about GPU atomics?
Essential for coordinating access to shared resources like histograms