Keep active threads together: better!
Threads and memory both behave like this: adjacent threads should touch adjacent addresses so accesses coalesce
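A minimal sketch of the memory side of this note, assuming a simple copy workload. The kernel names, the helper `run_copy`, and the stride value 33 are hypothetical and chosen only for illustration. Both kernels produce the same result; the difference is in which addresses neighbouring threads touch.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Neighbouring threads read neighbouring elements, so accesses can coalesce.
__global__ void copy_coalesced(const float *in, float *out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i];
}

// Neighbouring threads read elements `stride` apart, so reads are scattered.
__global__ void copy_strided(const float *in, float *out, int n, int stride) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) {
        int j = (int)(((long long)i * stride) % n);  // scattered source index
        out[i] = in[j];
    }
}

// Hypothetical helper: fills the input with ones, runs one kernel,
// and returns the sum of the output computed on the host.
double run_copy(bool strided, int n) {
    float *in, *out;
    cudaMallocManaged(&in, n * sizeof(float));
    cudaMallocManaged(&out, n * sizeof(float));
    for (int i = 0; i < n; ++i) { in[i] = 1.0f; out[i] = 0.0f; }

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    if (strided) copy_strided<<<blocks, threads>>>(in, out, n, 33);
    else         copy_coalesced<<<blocks, threads>>>(in, out, n);
    cudaDeviceSynchronize();

    double sum = 0.0;
    for (int i = 0; i < n; ++i) sum += out[i];
    cudaFree(in);
    cudaFree(out);
    return sum;
}

int main() {
    int n = 1 << 20;
    printf("coalesced sum = %.0f\n", run_copy(false, n));
    printf("strided sum   = %.0f\n", run_copy(true, n));
    return 0;
}
```

Timing is left out on purpose. On most GPUs one would expect the strided version to be slower, but the gap depends on the hardware and its caches.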
__managed__ memory
accessible from both CPU and GPU
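A minimal sketch of a `__managed__` variable, assuming a device that supports managed memory. The variable `counter`, the kernel `add_to_counter`, and the helper `run_managed_demo` are hypothetical names. The same variable is written on the host, updated on the device, and read back on the host without any explicit copy.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// One variable visible to both host and device code.
__managed__ int counter = 0;

__global__ void add_to_counter(int value) {
    // Launched with a single thread, so a plain update is safe here.
    counter += value;
}

// Hypothetical helper: host write, device update, host read.
int run_managed_demo(int start, int value) {
    counter = start;                  // host write, no cudaMemcpy needed
    add_to_counter<<<1, 1>>>(value);  // device update
    cudaDeviceSynchronize();          // wait before the host reads again
    return counter;
}

int main() {
    printf("counter = %d\n", run_managed_demo(40, 2));
    return 0;
}
```

The `cudaDeviceSynchronize()` call matters: kernel launches are asynchronous, so the host should not read the variable until the kernel has finished.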
atomic operations
atomicAdd, atomicSub, atomicExch, atomicMin,
atomicMax, atomicInc, atomicDec, atomicCAS,
atomicAnd, atomicOr, atomicXor
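A short sketch of two of these functions in use, assuming many threads update shared counters at once. The kernel `count_even_and_max`, the struct `AtomicResult`, and the helper `run_atomic_demo` are hypothetical. Without the atomics, concurrent read-modify-write updates to the same address could be lost.

```cuda
#include <climits>
#include <cstdio>
#include <cuda_runtime.h>

// Each thread inspects one element and updates two shared results atomically.
__global__ void count_even_and_max(const int *data, int n,
                                   int *even_count, int *max_val) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i >= n) return;
    if (data[i] % 2 == 0) atomicAdd(even_count, 1);  // safe shared counter
    atomicMax(max_val, data[i]);                     // safe shared maximum
}

struct AtomicResult { int even_count; int max_val; };

// Hypothetical helper: fills data with 0..n-1 and runs the kernel.
AtomicResult run_atomic_demo(int n) {
    int *data, *even_count, *max_val;
    cudaMallocManaged(&data, n * sizeof(int));
    cudaMallocManaged(&even_count, sizeof(int));
    cudaMallocManaged(&max_val, sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = i;
    *even_count = 0;
    *max_val = INT_MIN;

    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    count_even_and_max<<<blocks, threads>>>(data, n, even_count, max_val);
    cudaDeviceSynchronize();

    AtomicResult r = {*even_count, *max_val};
    cudaFree(data);
    cudaFree(even_count);
    cudaFree(max_val);
    return r;
}

int main() {
    AtomicResult r = run_atomic_demo(1000);
    printf("even = %d, max = %d\n", r.even_count, r.max_val);
    return 0;
}
```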
why not always use atomic operations?
atomic updates to the same address are serialized, which leads to sequential performance
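One common way to reduce this cost, sketched here under the assumption of an integer sum, is to combine values inside each block first and issue a single atomic per block. The kernel names, the constant `BLOCK`, and the helper `run_sum` are hypothetical. In `sum_naive` every thread contends for the same address; in `sum_blockwise` only one thread per block does.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

constexpr int BLOCK = 256;  // threads per block, must be a power of two here

// Every thread updates the same address: the updates are serialized.
__global__ void sum_naive(const int *data, int n, int *total) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) atomicAdd(total, data[i]);
}

// Reduce in shared memory first, then one atomicAdd per block.
__global__ void sum_blockwise(const int *data, int n, int *total) {
    __shared__ int partial[BLOCK];
    int tid = threadIdx.x;
    int i = blockIdx.x * blockDim.x + tid;
    partial[tid] = (i < n) ? data[i] : 0;
    __syncthreads();
    // Active threads stay contiguous (tid < stride) at every step.
    for (int stride = blockDim.x / 2; stride > 0; stride /= 2) {
        if (tid < stride) partial[tid] += partial[tid + stride];
        __syncthreads();
    }
    if (tid == 0) atomicAdd(total, partial[0]);
}

// Hypothetical helper: sums n ones with either kernel.
int run_sum(bool blockwise, int n) {
    int *data, *total;
    cudaMallocManaged(&data, n * sizeof(int));
    cudaMallocManaged(&total, sizeof(int));
    for (int i = 0; i < n; ++i) data[i] = 1;
    *total = 0;

    int blocks = (n + BLOCK - 1) / BLOCK;
    if (blockwise) sum_blockwise<<<blocks, BLOCK>>>(data, n, total);
    else           sum_naive<<<blocks, BLOCK>>>(data, n, total);
    cudaDeviceSynchronize();

    int result = *total;
    cudaFree(data);
    cudaFree(total);
    return result;
}

int main() {
    int n = 100000;
    printf("naive = %d, blockwise = %d\n", run_sum(false, n), run_sum(true, n));
    return 0;
}
```

Both versions give the same sum. The blockwise one issues one atomic per block instead of one per element, so far fewer updates are serialized on `total`.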