Cache Organization
caches are organized in fixed-size chunks called cache lines
* on most CPUs a cache line is 64 bytes
* data accesses go through cache, which is transparently managed by the CPU
* caches implement a replacement strategy to evict cache lines
Cache Coherency Protocol
CPU manages caches and provides the illusion of a single main memory
using a cache coherency protocol
Performance Implications of Cache Coherency
x86-64 Memory Model
x86-64 implements Total Store Order (TSO), which is a strong memory model
* this means that x86 mostly executes the machine code as given
* loads are not reordered with respect to other loads, writes are not reordered
with respect to other writes
* however, writes are buffered (in order to hide the L1 write latency), and reads
are allowed to bypass writes
* a fence or a locked write operation will flush the write buffer (but will not
flush the cache)
* important benefit from TSO: sequentially consistent loads do not require
fences
Concurrent List-Based Set
Coarse-Grained Locking
Lock Coupling
hold at most two locks at a time
* interleave lock acquisitions/releases pair-wise
* read/write locks allow for concurrent readers
+ easy to implement
+ no restarts
− does not scale
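The hand-over-hand pattern can be sketched on a sorted singly-linked list (a minimal illustration; only insert is shown, plain mutexes instead of the read/write locks mentioned above, and no memory reclamation). The invariant: a node's lock is acquired before the lock on its predecessor is released, so at most two locks are held at a time.

```cpp
#include <mutex>
#include <climits>
#include <cassert>

struct Node {
    int key;
    Node* next;
    std::mutex m;
};

struct List {
    Node head{INT_MIN, nullptr};  // sentinel, never removed

    bool insert(int key) {
        Node* pred = &head;
        pred->m.lock();
        Node* curr = pred->next;
        if (curr) curr->m.lock();
        while (curr && curr->key < key) {
            pred->m.unlock();          // release the older of the two locks
            pred = curr;
            curr = curr->next;
            if (curr) curr->m.lock();  // acquire before releasing: coupling
        }
        bool inserted = !(curr && curr->key == key);
        if (inserted) pred->next = new Node{key, curr};
        if (curr) curr->m.unlock();
        pred->m.unlock();
        return inserted;
    }
};
```

Because every traversal locks every node on its path, concurrent operations queue up behind each other even when they touch disjoint keys, which is why the scheme does not scale.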
Optimistic Lock Coupling
Terminology: Non-Blocking Algorithms
Lock-Free List
Lazy
+ usually only one traversal necessary
+ scalable
− insert/remove have to lock
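A sketch of the lazy-list idea (following the well-known lazy list of Heller et al.; the concrete structure here is an assumption, not from the notes). `contains` takes no locks and usually needs a single traversal; insert/remove lock two nodes and validate, restarting only if validation fails. Deletion is logical first (a marked flag), then physical. Reclamation of unlinked nodes is deliberately omitted.

```cpp
#include <mutex>
#include <atomic>
#include <climits>

struct Node {
    int key;
    std::atomic<Node*> next{nullptr};
    std::atomic<bool> marked{false};  // logically deleted?
    std::mutex m;
    Node(int k, Node* n) : key(k), next(n) {}
};

struct LazyList {
    Node* head = new Node(INT_MIN, new Node(INT_MAX, nullptr));  // sentinels

    bool contains(int key) {          // no locks, one traversal, no restarts
        Node* curr = head;
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }

    bool validate(Node* pred, Node* curr) {
        return !pred->marked.load() && !curr->marked.load()
            && pred->next.load() == curr;
    }

    bool insert(int key) {
        while (true) {                // restart only on failed validation
            Node* pred = head;
            Node* curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }
            std::scoped_lock lock(pred->m, curr->m);
            if (!validate(pred, curr)) continue;
            if (curr->key == key) return false;
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool remove(int key) {
        while (true) {
            Node* pred = head;
            Node* curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }
            std::scoped_lock lock(pred->m, curr->m);
            if (!validate(pred, curr)) continue;
            if (curr->key != key) return false;
            curr->marked.store(true);              // logical delete first ...
            pred->next.store(curr->next.load());   // ... then physical unlink
            return true;
        }
    }
};
```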
Locks
Memory Reclamation for Lock-Free Data Structures
after deleting a node in a lock-free data structure, lock-free readers might
still be accessing that node
* freeing/reusing that node immediately can cause correctness problems
and/or crashes
Reference Counting
Epoch-Based Memory Reclamation
+ fairly low overhead, scales well
+ no need to change data structure code (simply wrap operations)
− a single slow/blocking thread may prevent any memory reclamation
Traditional Hazard Pointers
per-thread pointers announcing which nodes a lock-free operation is accessing
Implementing Shared Concurrent Counters
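One common approach, sketched here under assumed details: shard the counter across cache-line-padded slots so concurrent writers do not contend on a single line. Reads sum all slots, so they are cheap for writers but only approximate while updates are in flight.

```cpp
#include <atomic>
#include <thread>
#include <functional>
#include <cassert>

struct ShardedCounter {
    static constexpr int kShards = 16;
    struct alignas(64) Slot { std::atomic<long> v{0}; };  // avoid false sharing
    Slot slots[kShards];

    void inc() {
        // pick a slot by thread id so a given thread mostly hits "its" line
        size_t id = std::hash<std::thread::id>{}(std::this_thread::get_id());
        slots[id % kShards].v.fetch_add(1, std::memory_order_relaxed);
    }
    long read() const {                 // approximate under concurrent updates
        long sum = 0;
        for (auto& s : slots) sum += s.v.load(std::memory_order_relaxed);
        return sum;
    }
};
```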
ART: lock coupling
ART: optimistic lock coupling
ART with ROWEX
+ wait-free lookups
− much more complicated than OLC, requires invasive changes
B-tree: lock coupling
must detect root node changes (e.g., additional lock)
* structure modification operations may propagate up multiple levels:
* eagerly split full inner nodes during traversal (good solution for fixed-size keys)
* restart operation from the root holding all locks
B-tree: optimistic lock coupling
B-link tree
B-tree Contention Split
for skewed workloads, contention on some locks can become a problem
* contention may be caused by page granularity rather than one very hot entry
* contention split optimization: if the lock cannot be acquired on the first
try, split the node with a certain probability even if it is not full