What are the 12 Linux/Systems Internals Flows?
The mental model:
- TLB lookups/misses, syscalls, interrupts, and context switches are the heartbeat (always happening).
- Page faults and network/block I/O are the breathing (workload-driven).
- Everything else is “occasional” to “rare”.
From most to least frequent on a busy production server:
- TLB misses — millions/sec to 10s of millions.
- Syscall flow — millions/sec. Every read, write, open, close, epoll_wait. The most frequent transition in the kernel by far.
- Interrupt flow — hundreds of thousands/sec. Timer ticks (250-1000 Hz × cores), NIC interrupts, disk completions, IPIs (inter-processor interrupts). On a busy network box, NAPI (one interrupt covers a batch of packets) reduces the count, but it's still massive.
- Context switch flow — tens of thousands to hundreds of thousands/sec. Every time a process blocks or its timeslice expires. A busy web server: 50K-200K/sec easily.
- Page fault flow — thousands to tens of thousands/sec. Mostly minor faults (COW, lazy allocation, mmap first-touch). Major faults (actual disk I/O) should be rare on a healthy system — if they’re not, something’s wrong.
- Network packet flow — thousands to millions/sec depending on workload. A 10Gbps NIC doing small packets can hit 14M pps. Each packet traverses the full protocol stack.
- Block I/O flow — hundreds to tens of thousands/sec. SSDs handle ~100K-500K IOPS. Mostly hidden by page cache — on a well-cached system, very few reads actually hit disk.
- VFS path walk + page cache flow — thousands/sec. Every open() walks the path. But dcache makes repeated lookups nearly free (just pointer chasing in memory), so the expensive walks (actual directory reads) are much rarer.
- Signal flow — tens to hundreds/sec normally. SIGCHLD from child exits, SIGALRM from timers, the occasional SIGHUP reload. Spikes during process storms. Low frequency in steady state.
- Process lifecycle flow — tens to hundreds/sec. Each HTTP request in Apache prefork = fork+exit. But most modern servers use persistent processes/threads, so this drops. Container orchestration (K8s) adds some.
- Memory reclaim flow — ideally near zero. kswapd wakes occasionally to maintain free page watermarks. If direct reclaim is firing frequently, you’re in trouble. OOM killer is a once-in-a-crisis event.
- Boot flow — once. Ever. Until the next kernel panic or planned reboot. Months apart in well-run production.
+1 Informational:
- Device hotplug flow — rare in production. Maybe network link flap, USB token, cloud disk attach. Datacenter servers: a few events per day or less. Exception: SR-IOV VF creation in cloud environments.
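The "heartbeat" rates above are easy to watch live: `/proc/stat` exposes cumulative context-switch (`ctxt`) and interrupt (`intr`) counters since boot. A minimal sketch (assumes Linux; the field names are Linux-specific):

```python
# Sample system-wide context-switch and interrupt rates from /proc/stat.
# "ctxt" and "intr" are cumulative counters since boot; diffing two
# snapshots over an interval gives the per-second rate.
import time

def read_counters(path="/proc/stat"):
    counters = {}
    with open(path) as f:
        for line in f:
            fields = line.split()
            if fields[0] in ("ctxt", "intr"):
                # for "intr", the first number is the total across all IRQ lines
                counters[fields[0]] = int(fields[1])
    return counters

def rates(interval=1.0):
    before = read_counters()
    time.sleep(interval)
    after = read_counters()
    return {k: (after[k] - before[k]) / interval for k in before}

if __name__ == "__main__":
    for name, rate in rates().items():
        print(f"{name}: {rate:,.0f}/sec")
```

On an idle laptop expect low thousands/sec; on a busy server the numbers match the ranges above.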
Costs of kernel transitions
Shortcut to remember: ns → μs → ms — three bands, each roughly a 100x cliff apart.
Orders of magnitude to remember:
TLB miss: ~10ns
Syscall: ~100ns (10x TLB miss)
Interrupt: ~1-5μs (10-50x syscall; hardirq + softirq)
Ctx switch (threads): ~1-5μs (10-50x syscall) / (processes): ~3-8μs (CR3 write + TLB flush)
Minor fault: ~1-5μs (same ballpark as context switch)
Maj. fault SSD: ~0.1ms (100x context switch)
Maj. fault HDD: ~10ms (100x SSD)
Interview-friendly version:
- TLB miss and Syscalls are nanoseconds
- Interrupts, Context switches and Minor faults are microseconds
- Major faults are milliseconds.
Six orders of magnitude between the cheapest (~10ns TLB miss) and the most expensive (~10ms HDD major fault).
The KPTI tax:
- Before Meltdown (2018), a trivial syscall was ~50ns. KPTI added a page table switch on every kernel entry/exit, roughly doubling syscall cost.
- Spectre mitigations (retpolines, IBRS) added more.
- Modern syscalls are ~2-3x more expensive than pre-2018. This is why the vDSO matters — gettimeofday() and clock_gettime() never enter the kernel at all; they read a shared page mapped into userspace.
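A rough sketch of the vDSO gap — timing clock_gettime() (vDSO-backed) against getpid() (a real syscall). Caveat: Python's interpreter overhead inflates both absolute numbers, so treat this as a relative comparison, not a microbenchmark:

```python
# Compare a vDSO-backed call (clock_gettime never enters the kernel)
# against a real syscall (getpid). Absolute values vary with CPU and
# Meltdown/Spectre mitigations; the point is the gap.
import os
import time

def bench(fn, n=200_000):
    """Average cost of fn() in nanoseconds per call."""
    start = time.perf_counter()
    for _ in range(n):
        fn()
    return (time.perf_counter() - start) / n * 1e9

if __name__ == "__main__":
    vdso_ns = bench(lambda: time.clock_gettime(time.CLOCK_MONOTONIC))
    syscall_ns = bench(os.getpid)
    print(f"clock_gettime (vDSO): ~{vdso_ns:.0f} ns/call")
    print(f"getpid (syscall):     ~{syscall_ns:.0f} ns/call")
```

For the true gap, use a C microbenchmark or `strace -c` to confirm clock_gettime makes no syscalls at all.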
TLB miss + Page table walk flow
Freq: 0 - Cost: ~10ns (10 cycles @1GHz); if the PTE is absent, add a minor/major fault on top
TLB misses: millions to tens of millions/sec (10-100x more frequent than syscalls)
The page-table walk is a multi-level lookup tree, much like a filesystem walk through indirect blocks. Before the full walk, the hardware consults the Page Walk Cache (PWC), then L1/L2/L3/RAM for each level it still needs.
When the CPU executes an instruction that accesses a virtual address and the TLB misses, it walks the tree. Huge pages shorten the walk:
- 4KB pages: CR3 → PGD → PUD → PMD → PTE (4 levels)
- 2MB pages: CR3 → PGD → PUD → PMD (3 levels, PMD → 2MB frame)
- 1GB pages: CR3 → PGD → PUD (2 levels)
Each walk is fast, but at millions of misses per second it adds up. That's why hugepages (2MB) matter for DBs.
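The walk levels above are just bit-slicing of the virtual address. A sketch of how a 48-bit x86-64 VA splits into the four 9-bit table indices (4KB pages); a 2MB huge page stops at the PMD, so bits 20-0 become the in-page offset:

```python
# Split a 48-bit x86-64 virtual address into 4-level page-walk indices
# (4KB pages). Each table level is indexed by 9 bits = 512 entries.
def walk_indices(vaddr):
    return {
        "pgd": (vaddr >> 39) & 0x1FF,   # bits 47-39
        "pud": (vaddr >> 30) & 0x1FF,   # bits 38-30
        "pmd": (vaddr >> 21) & 0x1FF,   # bits 29-21
        "pte": (vaddr >> 12) & 0x1FF,   # bits 20-12
        "offset": vaddr & 0xFFF,        # bits 11-0, offset within the 4KB page
    }

if __name__ == "__main__":
    print(walk_indices(0x7F3A12345678))
```

Four dependent memory loads per miss in the worst case — the same shape as resolving a deep path one directory at a time.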
Syscall (mode switch) flow
Freq: 1 - Steps: 3, Cost: ~100ns (10x TLB miss)
Syscall (mode switch) flow (millions/s):
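A way to poke the syscall path by hand, sketched via libc's syscall(2) wrapper — note the number 39 is getpid on x86-64 only; other architectures use different numbers:

```python
# Invoke the syscall machinery directly through libc's syscall(2) wrapper,
# bypassing the usual getpid() stub. Assumes x86-64 Linux.
import ctypes
import os

libc = ctypes.CDLL(None, use_errno=True)
SYS_getpid = 39  # x86-64 only; see asm/unistd_64.h

def raw_getpid():
    # triggers the full mode-switch flow: user -> kernel -> user
    return libc.syscall(SYS_getpid)

if __name__ == "__main__":
    print("raw syscall getpid:", raw_getpid(), "== os.getpid():", os.getpid())
```

`strace -c` on any program shows which syscalls dominate — usually read/write/epoll_wait, matching the frequency claim above.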
Interrupt flow
Freq: 2 - Steps: 3, Cost: ~1-5μs (10-50x syscall)
Interrupt flow (hundreds of thousands/s)
Context switch flow
Freq: 3 - Steps: 3, Cost: ~1-5μs (10-50x syscall)
Context switch flow (tens of thousands to hundreds of thousands/sec)
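A process can see its own switch counts via getrusage(2): voluntary switches mean it blocked (I/O, sleep), involuntary mean the scheduler preempted it. Sketch:

```python
# Read this process's context-switch counters via getrusage(2).
# ru_nvcsw = voluntary (blocked), ru_nivcsw = involuntary (preempted).
import resource
import time

def ctx_switches():
    ru = resource.getrusage(resource.RUSAGE_SELF)
    return ru.ru_nvcsw, ru.ru_nivcsw

if __name__ == "__main__":
    v0, i0 = ctx_switches()
    time.sleep(0.01)      # sleeping blocks -> at least one voluntary switch
    v1, i1 = ctx_switches()
    print(f"voluntary: +{v1 - v0}, involuntary: +{i1 - i0}")
```

A high involuntary ratio suggests CPU contention; a high voluntary ratio suggests an I/O-bound workload.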
Page fault flow
Freq: 4 - Steps: 3, Cost: ~1-5μs (minor); ~100μs to 10ms (major)
Page fault flow (thousands to tens of thousands/sec)
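The mmap first-touch case is easy to demonstrate: anonymous mmap reserves address space lazily, so the first write to each page traps into the minor-fault path once. Sketch:

```python
# Show lazy allocation: mmap'd anonymous memory is not populated until
# first touch, and each first touch costs one minor fault.
import mmap
import resource

def minor_faults():
    return resource.getrusage(resource.RUSAGE_SELF).ru_minflt

if __name__ == "__main__":
    npages = 256
    page = mmap.PAGESIZE
    before = minor_faults()
    buf = mmap.mmap(-1, npages * page)   # address space reserved, no frames yet
    for i in range(npages):
        buf[i * page] = 1                # first touch -> one minor fault each
    after = minor_faults()
    print(f"minor faults for {npages} first touches: ~{after - before}")
    buf.close()
```

Compare with `ps -o min_flt,maj_flt` on a live process: maj_flt staying near zero is the "healthy system" signal mentioned above.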
Network packet flow
Freq: 5 - Steps: 4, Cost: ~5-15μs (NIC to buffer)
Network packet receive flow (thousands to millions/sec)
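Even over loopback (no physical NIC, no hard interrupt), each datagram still traverses the full protocol stack twice — down the send path, up the receive path. A minimal sketch:

```python
# One UDP datagram over loopback: down through UDP/IP on send,
# back up through IP/UDP into the receiving socket's buffer.
import socket

def loopback_echo(payload=b"ping"):
    rx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    rx.bind(("127.0.0.1", 0))              # port 0 = kernel picks a free port
    tx = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    tx.sendto(payload, rx.getsockname())   # send path: syscall -> stack -> queue
    data, _addr = rx.recvfrom(1024)        # receive path: queue -> syscall
    tx.close()
    rx.close()
    return data

if __name__ == "__main__":
    print(loopback_echo())
```

At 14M pps there is no time for one syscall per packet on the receive side — hence NAPI batching and, at the extreme, kernel-bypass (DPDK/XDP-style) designs.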
Block I/O flow
Freq: 6 - Steps: 4, Cost: ~3-10μs (doorbell) + 50μs-15ms (device)
Block I/O flow (hundreds to tens of thousands/sec)
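The claim that the page cache hides most reads is checkable: `/proc/diskstats` counts I/Os that actually reached the device. A parsing sketch (field positions per the kernel's Documentation/admin-guide/iostats):

```python
# Per-device completed read/write counts from /proc/diskstats.
# Each line: major minor name reads_completed ... writes_completed ...
def parse_diskstats(text):
    stats = {}
    for line in text.splitlines():
        f = line.split()
        if len(f) >= 11:
            # f[2] = device, f[3] = reads completed, f[7] = writes completed
            stats[f[2]] = {"reads": int(f[3]), "writes": int(f[7])}
    return stats

if __name__ == "__main__":
    with open("/proc/diskstats") as fh:
        for dev, s in parse_diskstats(fh.read()).items():
            if not dev.startswith(("loop", "ram")):
                print(f"{dev}: {s['reads']:,} reads, {s['writes']:,} writes")
```

Diff two snapshots to get IOPS; on a well-cached box the read rate stays far below the application's logical read rate.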
VFS path walk flow
Freq: 7 - Steps: 3, Cost: ~1-3μs (cached); plus block I/O if the dentry/page isn't cached
VFS path walk flow (thousands/sec)
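The dcache effect shows up in a crude timing loop: after the first lookup, repeated stat() on the same path is pure in-memory pointer chasing. A sketch (numbers are machine-dependent, so treat them as illustrative):

```python
# Time repeated stat() on one path. After the first walk primes the
# dcache, each call is a dcache-hot lookup -- no directory reads.
import os
import time

def stat_cost_ns(path, n=50_000):
    start = time.perf_counter()
    for _ in range(n):
        os.stat(path)
    return (time.perf_counter() - start) / n * 1e9

if __name__ == "__main__":
    print(f"{os.devnull}: ~{stat_cost_ns(os.devnull):.0f} ns/stat (dcache-hot)")
```

To see cold walks, drop caches first (`echo 2 > /proc/sys/vm/drop_caches`, root only) and watch the first call get much slower.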
Signal flow
Freq: 8 - Steps: 3, Cost: ~1-2μs (similar to interrupts; excludes handler runtime)
Signal flow (tens to hundreds/sec)
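The delivery path in miniature: install a handler, raise the signal at ourselves, and observe that the handler runs when the process returns to user mode. Sketch:

```python
# Minimal signal round-trip: kill(2) marks the signal pending in the
# target's task struct; the handler runs on the next return to user mode.
import os
import signal

caught = []

def handler(signum, frame):
    caught.append(signum)

signal.signal(signal.SIGUSR1, handler)     # install userspace handler

if __name__ == "__main__":
    os.kill(os.getpid(), signal.SIGUSR1)   # syscall -> pending -> delivery
    print("caught:", caught)
```

Note CPython adds its own indirection (the C-level handler just sets a flag; the Python handler runs at the next bytecode boundary), but the kernel-side flow is the same.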
Process lifecycle flow
How and when does a new process start executing?
Freq: 9 - Steps: 4, Cost: process ctx switch ~3-8μs vs thread ~1-3μs
Process lifecycle flow (tens to hundreds/sec)
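The core cycle — fork, exit, reap — in a few lines (fork clones the process with COW page tables; waitpid collects the zombie's exit status, which is also when SIGCHLD lands in the parent):

```python
# Minimal process lifecycle: fork -> child exits -> parent reaps.
import os

def fork_and_reap():
    pid = os.fork()
    if pid == 0:
        os._exit(7)                    # child: exit immediately, no cleanup
    _, status = os.waitpid(pid, 0)     # parent: reap the zombie
    return os.waitstatus_to_exitcode(status)

if __name__ == "__main__":
    print("child exit code:", fork_and_reap())
```

Apache prefork pays this per request; persistent worker models pay it once per worker, which is why the steady-state rate is so low.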
Memory reclaim flow
kswapd reclaims in the background; direct reclaim blocks the calling process — that's where stalls come from.
Freq: 10 - Steps: 3, Cost: ~10-100μs to ~1-10ms
Memory reclaim flow (ideally near zero)
Direct reclaim (clean): ~10-100μs ← process stalls here
Direct reclaim (dirty): ~1-10ms ← disaster territory
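The kswapd-vs-direct split is visible in `/proc/vmstat`: `pgscan_kswapd` is background reclaim, `pgscan_direct` is processes stalling. A parsing sketch (counter names vary a bit across kernel versions):

```python
# Pull reclaim counters out of /proc/vmstat. A growing pgscan_direct
# means processes are stalling in the reclaim path.
def parse_vmstat(text, keys=("pgscan_kswapd", "pgscan_direct", "pgsteal_kswapd")):
    out = {}
    for line in text.splitlines():
        name, _, value = line.partition(" ")
        if name in keys:
            out[name] = int(value)
    return out

if __name__ == "__main__":
    with open("/proc/vmstat") as f:
        for k, v in parse_vmstat(f.read()).items():
            print(f"{k}: {v:,}")
```

Healthy box: pgscan_direct near zero and flat. If it climbs under load, you're in the "disaster territory" above.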
Boot flow
Freq: 11 - once per uptime
Boot flow (once)
Device hotplug flow
Just informational
Device hotplug flow (most likely zero)
Removal is the reverse: hardware signals detach → driver’s ->remove() called → tear down interfaces → release resources → remove sysfs entries → udevd cleans up /dev/ nodes.