Define a side channel.
An unintended information leakage path (e.g., timing, power, cache state) that reveals secrets without reading them directly.
What makes caches powerful side channels?
Access latency depends on whether data is cached (hit) or not (miss). Attackers measure timing differences to infer victim memory accesses.
Cache line size on x86_64 (typical)?
64 bytes. The lowest 6 address bits are the line offset.
Page size (typical) and consequence for indexing?
4 KiB pages; the lowest 12 bits are the page offset (includes the 6-bit line offset). Many L1/L2 set-index bits lie within these 12 bits.
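As a concrete illustration of these bit fields, here is a small Python sketch (helper names are mine, not from any library) that extracts the line offset, page offset, and a 64-set L1 index from an address:

```python
LINE_BITS = 6    # log2(64-byte line)
PAGE_BITS = 12   # log2(4 KiB page)

def line_offset(addr):
    return addr & ((1 << LINE_BITS) - 1)

def page_offset(addr):
    return addr & ((1 << PAGE_BITS) - 1)

def set_index(addr, num_sets):
    # The set index sits directly above the line-offset bits.
    return (addr >> LINE_BITS) % num_sets

# For a 64-set L1, the index bits (6..11) lie entirely inside the page
# offset, so virtual and physical addresses agree on the L1 set.
addr = 0x7F3A12345678
assert set_index(addr, 64) == page_offset(addr) >> LINE_BITS
```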
Define cache associativity.
Number of lines per set. An N-way cache can hold N distinct lines per set before evicting.
Number of sets formula.
(#cache bytes) / (line size × associativity).
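A one-line sketch of the formula; the example geometries are assumed for illustration and vary by CPU:

```python
def num_sets(cache_bytes, line_size, ways):
    return cache_bytes // (line_size * ways)

# Assumed example geometries:
assert num_sets(32 * 1024, 64, 8) == 64       # 32 KiB, 8-way L1d
assert num_sets(1024 * 1024, 64, 16) == 1024  # 1 MiB, 16-way L2
```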
Which levels are per-core vs shared?
L1 and L2 are per-core; L3 (LLC) is shared among cores (often split into slices hashed by physical address).
Inclusive vs non-inclusive caches.
Inclusive: L1⊆L2⊆L3; evicting a line from L3 also invalidates any copies in L1/L2. Non-inclusive (or mostly-inclusive): no strict subset relation is guaranteed.
Do loads fill all cache levels?
Typically yes (on inclusive hierarchies): a miss fills L3→L2→L1. Later evictions can leave a line only in L2 or L3.
Define Prime+Probe (high level).
Prime: fill a set. Victim runs. Probe: time reaccesses; slower lines indicate eviction by victim → reveals which set the victim used.
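The mechanics can be shown with a toy LRU set model (a simulation written for illustration, not an attack):

```python
from collections import OrderedDict

class CacheSet:
    """Toy LRU cache set, just enough to illustrate Prime+Probe."""
    def __init__(self, ways):
        self.ways = ways
        self.lines = OrderedDict()  # tag -> None, in LRU order

    def access(self, tag):
        hit = tag in self.lines
        if hit:
            self.lines.move_to_end(tag)      # refresh LRU position
        elif len(self.lines) >= self.ways:
            self.lines.popitem(last=False)   # evict least-recently used
        if not hit:
            self.lines[tag] = None
        return hit                           # True ~ fast probe, False ~ slow

WAYS = 4
s = CacheSet(WAYS)
attacker = [f"A{i}" for i in range(WAYS)]

for t in attacker:        # Prime: fill the set
    s.access(t)
s.access("V")             # Victim evicts the LRU attacker line

# Probe in reverse access order so the probe itself doesn't cascade
# evictions under LRU replacement.
misses = [t for t in reversed(attacker) if not s.access(t)]
assert misses == ["A0"]   # the slow line reveals victim activity in this set
```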
Define Flush+Reload (high level).
Flush: invalidate a specific shared line (e.g., clflush). Victim runs. Reload: time an access to it; fast = the victim touched the line (it is cached again), slow = the victim did not.
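A minimal presence/absence model of the same loop (the shared-line name is hypothetical; a set stands in for "is this line cached"):

```python
cached = set()  # toy model of which lines are currently cached

def flush(line):
    cached.discard(line)        # models clflush

def access(line):
    hit = line in cached        # True ~ fast reload, False ~ slow
    cached.add(line)
    return hit

SHARED = "shared_lib_line"      # hypothetical line in a shared page

flush(SHARED)
access(SHARED)                  # victim path A touches the line
assert access(SHARED) is True   # reload is fast -> victim used it

flush(SHARED)                   # victim path B never touches it
assert access(SHARED) is False  # reload is slow -> victim did not use it
```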
When do you use Flush+Reload?
When attacker and victim share physical pages (shared libraries, shared mmap). Gives high spatial resolution.
When do you use Prime+Probe?
When no shared memory exists. You only need conflicting (same-set) addresses; works cross-process and cross-VM with care.
How to time memory on x86_64?
Use rdtsc/rdtscp bracketed by fences. Example: lfence; rdtsc before the volatile load, then rdtscp; lfence after it.
Why fences around rdtsc?
rdtsc is not serializing: out-of-order execution can hoist the measured load before the first timestamp read or let it complete after the second. Fences pin the measurement window.
Hit vs miss timing (typical ballpark).
L1: ~4–5 cycles; L2: ~10–20; L3: ~30–60+; DRAM: ~100–300+. Measure on your CPU to set thresholds.
How to empirically pick a hit/miss threshold?
Measure cold (after clflush) and warm (repeated-access) latencies; set the threshold near the midpoint of the two medians, or compare the full latency distributions and cluster them.
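A sketch of the midpoint rule on synthetic latency samples (all numbers invented for illustration):

```python
import statistics

def pick_threshold(cold, warm):
    # Midpoint between median miss (flushed) and median hit latency.
    return (statistics.median(cold) + statistics.median(warm)) / 2

cold = [210, 250, 198, 230, 600]  # cycles after clflush; 600 ~ an interrupt
warm = [42, 44, 40, 46, 44]       # cycles for a repeatedly accessed line

thresh = pick_threshold(cold, warm)
is_hit = lambda cycles: cycles < thresh
assert max(warm) < thresh < min(cold)  # medians tolerate the 600 outlier
```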
Define conflicting addresses.
Different virtual/physical addresses mapping to the same cache set (and same slice for LLC), competing for associativity.
How to generate same-set addresses for L1/L2?
Use identical page offsets across many pages (e.g., base + k*4096 + offset where offset is multiple of 64).
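Sketch of the construction; the base address, offset, and L1 geometry are arbitrary example values:

```python
PAGE, LINE = 4096, 64

def same_set_candidates(base, count, offset):
    # Identical page offsets => identical L1/L2 set index,
    # because those index bits lie within the 12-bit page offset.
    assert offset % LINE == 0 and offset < PAGE
    return [base + k * PAGE + offset for k in range(count)]

L1_SETS = 64  # assumed: 32 KiB, 8-way, 64 B lines
addrs = same_set_candidates(0x10000, 16, 0x4C0)
assert len({(a >> 6) % L1_SETS for a in addrs}) == 1  # all in one L1 set
```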
Why does LLC (L3) need extra care?
LLC is physically indexed and sliced by a hash; set index uses bits above page offset. You need physical info or measurement-based grouping.
Simple conflict test idea.
Warm A; hammer candidate B; time A. If A slows (miss) more often than baseline, A and B likely conflict (same set/slice). Repeat and vote.
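The repeat-and-vote step, with a toy noisy oracle standing in for the real timing test (noise rate and trial count are invented; the seed is fixed so the sketch is deterministic):

```python
import random

rng = random.Random(1)

def noisy_miss_test(truly_conflicts, noise=0.1):
    # Stand-in for "warm A; hammer B; time A": right answer ~90% of trials.
    return truly_conflicts ^ (rng.random() < noise)

def conflict_vote(truly_conflicts, trials=101):
    votes = sum(noisy_miss_test(truly_conflicts) for _ in range(trials))
    return votes > trials // 2  # majority vote suppresses the noise

assert conflict_vote(True) is True
assert conflict_vote(False) is False
```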
Eviction set definition.
A set of ≥ associativity addresses mapping to the same set (and slice) so that accessing them evicts any victim line from that set.
Greedy eviction-set refinement.
Build a large candidate pool; remove one address at a time and re-test whether the victim line is still evicted. If it is, the removed address was unnecessary; keep only necessary addresses until ≈ associativity remain.
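The refinement loop, against a toy oracle in which an address's set is addr % 16 (a stand-in for a real timing-based eviction test):

```python
WAYS, VICTIM_SET = 8, 3

def evicts(candidates):
    # Toy oracle: the victim line is evicted iff at least WAYS candidates
    # map to its set (real code would time the victim line instead).
    return sum(1 for a in candidates if a % 16 == VICTIM_SET) >= WAYS

pool = list(range(200))          # large pool; 13 addresses hit the victim set
assert evicts(pool)

work = pool[:]
for a in pool:
    trial = [x for x in work if x != a]
    if evicts(trial):            # still evicts without a?
        work = trial             # then a was unnecessary; drop it

assert len(work) == WAYS and evicts(work)
assert all(a % 16 == VICTIM_SET for a in work)  # a minimal eviction set
```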
Why many samples?
Microarchitectural noise, OS interrupts, and prefetchers add variance. Aggregation (median/mean) yields reliable classification.