Cache Organization
caches are organized in fixed-size chunks called cache lines
* on most CPUs a cache line is 64 bytes
* data accesses go through cache, which is transparently managed by the CPU
* caches implement a replacement strategy to evict cache lines
Cache Coherency Protocol
CPU manages caches and provides the illusion of a single main memory
using a cache coherency protocol
Performance Implications of Cache Coherency
x86-64 Memory Model
x86-64 implements Total Store Order (TSO), which is a strong memory model
* this means that x86 mostly executes the machine code as given
* loads are not reordered with respect to other loads, writes are not reordered
with respect to other writes
* however, writes are buffered (in order to hide the L1 write latency), and reads
are allowed to bypass writes
* a fence or a locked write operation will flush the write buffer (but will not
flush the cache)
* important benefit from TSO: sequentially consistent loads do not require
fences
Concurrent List-Based Set
Coarse-Grained Locking
Lock Coupling
hold at most two locks at a time
* interleave lock acquisitions/releases pair-wise
* read/write locks allow for concurrent readers
+ easy to implement
+ no restarts
− does not scale
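The hand-over-hand pattern can be sketched on a sorted singly-linked list (a minimal illustration; only insert is shown, plain mutexes instead of the read/write locks mentioned above, and no memory reclamation). The invariant: a node's lock is acquired before the lock on its predecessor is released, so at most two locks are held at a time.

```cpp
#include <mutex>
#include <climits>
#include <cassert>

struct Node {
    int key;
    Node* next;
    std::mutex m;
};

struct List {
    Node head{INT_MIN, nullptr};  // sentinel, never removed

    bool insert(int key) {
        Node* pred = &head;
        pred->m.lock();
        Node* curr = pred->next;
        if (curr) curr->m.lock();
        while (curr && curr->key < key) {
            pred->m.unlock();          // release the older of the two locks
            pred = curr;
            curr = curr->next;
            if (curr) curr->m.lock();  // acquire before releasing: coupling
        }
        bool inserted = !(curr && curr->key == key);
        if (inserted) pred->next = new Node{key, curr};
        if (curr) curr->m.unlock();
        pred->m.unlock();
        return inserted;
    }
};
```

Because every traversal locks every node on its path, concurrent operations queue up behind each other even when they touch disjoint keys, which is why the scheme does not scale.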
Optimistic Lock Coupling
Terminology: Non-Blocking Algorithms
Lock-Free List
Lazy
+ usually only one traversal necessary
+ scalable
− insert/remove have to lock
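A sketch of the lazy-list idea (following the well-known lazy list of Heller et al.; the concrete structure here is an assumption, not from the notes). `contains` takes no locks and usually needs a single traversal; insert/remove lock two nodes and validate, restarting only if validation fails. Deletion is logical first (a marked flag), then physical. Reclamation of unlinked nodes is deliberately omitted.

```cpp
#include <mutex>
#include <atomic>
#include <climits>

struct Node {
    int key;
    std::atomic<Node*> next{nullptr};
    std::atomic<bool> marked{false};  // logically deleted?
    std::mutex m;
    Node(int k, Node* n) : key(k), next(n) {}
};

struct LazyList {
    Node* head = new Node(INT_MIN, new Node(INT_MAX, nullptr));  // sentinels

    bool contains(int key) {          // no locks, one traversal, no restarts
        Node* curr = head;
        while (curr->key < key) curr = curr->next.load();
        return curr->key == key && !curr->marked.load();
    }

    bool validate(Node* pred, Node* curr) {
        return !pred->marked.load() && !curr->marked.load()
            && pred->next.load() == curr;
    }

    bool insert(int key) {
        while (true) {                // restart only on failed validation
            Node* pred = head;
            Node* curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }
            std::scoped_lock lock(pred->m, curr->m);
            if (!validate(pred, curr)) continue;
            if (curr->key == key) return false;
            pred->next.store(new Node(key, curr));
            return true;
        }
    }

    bool remove(int key) {
        while (true) {
            Node* pred = head;
            Node* curr = pred->next.load();
            while (curr->key < key) { pred = curr; curr = curr->next.load(); }
            std::scoped_lock lock(pred->m, curr->m);
            if (!validate(pred, curr)) continue;
            if (curr->key != key) return false;
            curr->marked.store(true);              // logical delete first ...
            pred->next.store(curr->next.load());   // ... then physical unlink
            return true;
        }
    }
};
```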
Locks
Memory Reclamation for Lock-Free Data Structures
after deleting a node in a lock-free data structure, lock-free readers might
still be accessing that node
* freeing/reusing that node immediately can cause correctness problems
and/or crashes
Reference Counting
Epoch-Based Memory Reclamation
+ fairly low overhead, scales well
+ no need to change data structure code (simply wrap operations)
− a single slow/blocking thread may prevent any memory reclamation
Traditional Hazard Pointers
per-thread pointers announcing which nodes a lock-free operation is accessing
Implementing Shared Concurrent Counters
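One common approach, sketched here under assumed details: shard the counter across cache-line-padded slots so concurrent writers do not contend on a single line. Reads sum all slots, so they are cheap for writers but only approximate while updates are in flight.

```cpp
#include <atomic>
#include <thread>
#include <functional>
#include <cassert>

struct ShardedCounter {
    static constexpr int kShards = 16;
    struct alignas(64) Slot { std::atomic<long> v{0}; };  // avoid false sharing
    Slot slots[kShards];

    void inc() {
        // pick a slot by thread id so a given thread mostly hits "its" line
        size_t id = std::hash<std::thread::id>{}(std::this_thread::get_id());
        slots[id % kShards].v.fetch_add(1, std::memory_order_relaxed);
    }
    long read() const {                 // approximate under concurrent updates
        long sum = 0;
        for (auto& s : slots) sum += s.v.load(std::memory_order_relaxed);
        return sum;
    }
};
```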
ART: lock coupling
ART: optimistic lock coupling
ART with ROWEX
+ wait-free lookups
− much more complicated than OLC, requires invasive changes
B-tree: lock coupling
must detect root node changes (e.g., additional lock)
* structure modification operations may propagate up multiple levels:
* eagerly split full inner nodes during traversal (good solution for fixed-size keys)
* restart operation from the root holding all locks
B-tree: optimistic lock coupling
B-link tree
B-tree Contention Split
for skewed workloads, contention on some locks can become a problem
* contention may be caused by page granularity rather than one very hot entry
* contention split optimization: if the lock cannot be acquired on the first
try, split the node with a certain probability even if it is not full