Virtual_Memory_ch41-42_48-50 Flashcards

(12 cards)

1
Q

What is mmap()? What are the different types of mappings and their use cases?

malloc, file-reading, IPC

(Ch 49)

A

mmap() = map a region of virtual address space to either a file or just raw memory. Instead of read/write syscalls, you access content directly through pointers. The kernel handles faulting pages in and out via the page cache.

File-backed Types:
(1) Private file mapping (MAP_PRIVATE): COW copy of file. Writes don’t affect file. Used by: loading executables, read-only shared libraries.
(2) Shared file mapping (MAP_SHARED): changes visible to other processes, written back to file. Used by: shared memory, memory-mapped I/O.

Anonymous Types:
(3) Private (MAP_PRIVATE|MAP_ANONYMOUS): private zero-filled memory. Used by: malloc() for large allocations.
(4) Shared (MAP_SHARED|MAP_ANONYMOUS): shared between parent/child after fork. Used by: IPC without named object.

Key syscalls: mmap(), munmap(), mprotect(), msync().

2
Q

What’s the difference between anonymous memory and file-backed memory? How do they behave under swap pressure?

vm.swappiness

(Ch 49)

A

Anonymous memory has no backing file (no name) - created by malloc/mmap(MAP_ANONYMOUS), includes heap, stack, and private data. File-backed memory maps a file into memory - changes can be written back to disk.

Under memory pressure:
(1) File-backed clean pages can be immediately discarded (re-read from file) because they have a faster recovery path.
(2) File-backed dirty pages must be written back first.
(3) Anonymous pages must be written to swap space - more expensive since swap is typically slower.

vm.swappiness controls the behaviour:
swappiness=0 — almost never swap anonymous pages, strongly prefer dropping file cache;
swappiness=60 (default) — moderate balance, still prefers file-backed eviction;
swappiness=100 — treat anonymous and file-backed pages equally;
swappiness=200 — recent kernels (5.8+, with cgroup v2 memory controller): prefer swapping anonymous pages over dropping file cache.

file-backed + MAP_PRIVATE means “I want my own copy, don’t touch the original file.” If you wanted to change the file, you’d use MAP_SHARED.

Google enables zswap by default: pages are compressed into a RAM pool before actually being swapped to disk.

3
Q

What does memory overcommit mean? Why does Linux allow it? Walk through the different modes.

vm.overcommit_memory

(Ch 49)

A

The historical trade-off with limited memory: refuse to run the program at all, or run it with the calculated risk that it might later be OOM-killed?

Memory overcommit allows Linux to promise more virtual memory than physical RAM + swap available. Modes:

(0, default) Heuristic - the kernel uses heuristics (free pages, swap pages, reclaimable memory) to allow ‘reasonable’ overcommit while rejecting obviously excessive requests.
(1) Always overcommit - never refuse memory requests (malloc always succeeds).
(2) Never overcommit - strict accounting, only allocate what’s backed by RAM+swap.

Linux allows overcommit because most allocated memory is never fully touched (often only 20-50% of it is): sparse arrays (most data is 0), lazy initialization, and locality of reference (spatial, temporal) (pg 118).

The OOM killer handles the case when memory actually runs out.

4
Q

What is the OOM killer? Describe the activation path, how it scores processes, and how to protect critical services.

oom_score / oom_score_adj

(Ch 49)

A

Reminder: vm.overcommit_memory controls what happens at mmap/brk time — when a process asks for memory.

Mode 0 is heuristic (allow reasonable overcommit), mode 1 is always allow, mode 2 is strict (never overcommit beyond swap + ratio of physical RAM). That’s the allocation policy.

The OOM killer doesn’t fire because an overcommit ratio was reached. It fires on the page fault path. A process has been promised memory via mmap, and now it actually touches a page (lazy allocation).

  • Page fault → need physical page
  • alloc_pages → try to get free page
  • kswapd wakes (if free < low watermark)
  • Direct reclaim if kswapd not fast enough:
    • Scan LRU lists (or MGLRU generations on 6.1+)
    • Drop clean file pages (free immediately)
    • Flush dirty pages → write to disk → then free
    • Swap anonymous pages to swap device
  • OOM killer — only when all above fails, select victim based on oom_score, send SIGKILL.

Scoring: /proc/PID/oom_score (0-2000, kernel-computed based on memory usage proportion). /proc/PID/oom_score_adj (-1000 to +1000, user-tunable). -1000 = never kill (protect critical services like DB). Kernel picks highest oom_score. Children’s memory counted. Prefers killing fewest processes to free most memory.

0-2000 since Linux 5.9.

5
Q

What is mprotect()? How does it relate to memory protection?

(Ch 50)

A

mprotect(addr, len, prot): changes protection on memory pages.

prot flags: PROT_NONE (no access), PROT_READ, PROT_WRITE, PROT_EXEC. Can be ORed together.

Use cases:
(1) Guard pages - PROT_NONE pages to detect stack overflow.
(2) JIT compilation - allocate with PROT_WRITE, fill code, change to PROT_READ|PROT_EXEC.
(3) Copy-on-write implementation.
(4) Memory debugging - make regions read-only to catch unwanted writes.

Accessing memory violating protection → SIGSEGV. W^X (write XOR execute): security policy preventing memory from being both writable and executable simultaneously.

6
Q

What are mlock() and mlockall()? When would you use them?

what is sensitive?

(Ch 50)

A

mlock(addr, len): locks pages in RAM, prevents swapping.
mlockall(flags): locks all current (MCL_CURRENT) and/or future (MCL_FUTURE) pages. munlock()/munlockall(): unlock.

Use cases:
(1) Real-time applications - avoid page fault latency during critical sections.
(2) Security - prevent sensitive data (keys, passwords) from being written to swap.
(3) Performance-critical code - ensure hot pages stay resident.

Constraints: requires CAP_IPC_LOCK capability or sufficient RLIMIT_MEMLOCK. Locked pages count against RLIMIT_MEMLOCK. Excessive locking can starve other processes. Check with mincore() whether pages are resident. Best practice: lock only what’s necessary, unlock when done.

7
Q

Compare fsync(), fdatasync(), and msync() – when would you choose one over the others for data durability?

data vs metadata vs mmap

(Ch 13, 49)

A

write() → data lands in page cache as dirty pages → write() returns → data is NOT on disk. The sync family exists to flush those dirty pages to the storage device.

Trade-off on durability needs vs performance - fdatasync() is often the best compromise.

fsync(fd) - flushes file data AND metadata (size, timestamps, etc.) to disk. Most comprehensive, slowest. Use when: metadata changes matter (new file, size change).

fdatasync(fd) - flushes file data and only metadata needed to retrieve data (like file size). Skips unnecessary metadata updates. Use when: only data integrity matters, not access times. Faster than fsync().

msync(addr, len, flags) - syncs memory-mapped regions to the backing file. MS_SYNC = synchronous (blocks), MS_ASYNC = schedule for later. Use when: working with mmap’d files. But be careful: recovering from an msync() error is harder than from a write() error, because you cannot tell which pages failed to reach disk.

Page cache diagram: https://lambdafunc.medium.com/understanding-the-linux-page-cache-a-beginners-guide-0f1cc8bdb04d

8
Q

How are shared libraries (.so files) mapped into process memory?

(Ch 41, 42)

A

Shared libraries are memory-mapped into the process address space using mmap() with MAP_PRIVATE; the read-only code pages still share physical frames across processes through the page cache. Key points:
(1) Location: mapped between heap and stack in a region called the ‘memory mapping segment’.
(2) Sharing: multiple processes map the same physical pages for code (text segment), saving RAM.
(3) Copy-on-write: data sections are MAP_PRIVATE, so modifications are process-private.
(4) Dynamic linker (ld.so): loads libraries at runtime, resolves symbols.
(5) Position Independent Code (PIC): libraries compiled with -fPIC can be loaded at any address.

View mappings: cat /proc/PID/maps shows library locations. ldd binary shows library dependencies. Benefits: reduced memory usage, shared code pages, easier updates (replace .so without recompiling).

9
Q

What are /proc/PID/maps and /proc/PID/smaps? How do you read them and what extra info does smaps provide?

A

/proc/PID/maps and /proc/PID/smaps show a process’s virtual memory mappings and their details.

Use: understand memory layout, find loaded libraries, debug memory issues. /proc/PID/smaps has detailed stats per region.

maps (overview) - Each line:
address, perms, offset, dev, inode, pathname. Example: 00400000-00452000 r-xp 00000000 08:01 393449 /bin/bash.

  • address: start-end virtual addresses.
  • perms: r(read)w(write)x(execute)p(private)/s(shared).
  • offset: file offset (0 for anonymous).
  • dev: major:minor of device.
  • inode: inode number (0 for anonymous).
  • pathname: file path, [heap], [stack], [vdso], [vsyscall], or blank for anonymous.

Common regions:
- text segment (r-xp) - executable code.
- data/bss (rw-p) - initialized/uninitialized data.
- heap (rw-p [heap]) - malloc memory.
- stack (rw-p [stack]) - thread stack.
- libraries (r-xp, r--p, rw-p) - .so files mapped.
- mmap regions - shared memory, mapped files.

smaps (per-mapping detail):
Size, Rss, Pss, Shared_Clean, Shared_Dirty, Private_Clean, Private_Dirty, Referenced, Anonymous, Swap.

Essential for diagnosing which mappings consume actual physical memory vs shared pages. smaps_rollup gives process-wide totals.

10
Q

How is shared memory made accessible to processes via the VFS?

(Ch 14)

A

Shared memory goes through VFS via pseudo-filesystems (tmpfs, shmfs) — the kernel reuses the file/inode/dentry abstractions even though there’s no real disk.

The key difference from disk-backed files: these pages have no filesystem to write back to, so under memory pressure they can only go to swap. This is why shared memory and tmpfs consume RAM (or swap), and why df /dev/shm shows memory usage, not disk usage.

SRE relevance: This is why /dev/shm filling up kills applications using POSIX shared memory — it’s just tmpfs running out of space. And why SysV shared memory segments (ipcs -m) survive process death — the inode persists until ipcrm or reboot. Also why you see shared memory in /proc/{pid}/maps as /dev/shm/… or /SYSV… entries — it’s all files.

11
Q

What is the lookup flow on memory access?

(add)

A

Memory access lookup flow:
CPU instruction accesses virtual address:

  1. MMU checks TLB → hit? Done (~1-4 cycles, pipelined, effectively invisible)
  2. TLB miss → MMU reads CR3, walks 4 levels of page tables in memory (usually cached in L1/L2/L3)
  3. Pgtable entry found and valid → populate TLB, retry access
  4. Pgtable entry invalid/not present/perms → CPU raises #PF exception → page fault handler

Cost of Step 2+3:
- ~10-30 cycles if page table entries are in L1/L2 cache
- Up to ~200 cycles if in L3
- Up to ~400-600 cycles if hitting DRAM at each level (4 DRAM accesses × ~100-150 cycles each)

Cost of Step 4:
- Hardware trap mechanism (privilege switch, IDT lookup, stack swap): ~100-200 cycles
- Minor fault total: ~1,000-10,000 cycles
- Major fault total: ~3,000,000-10,000,000 cycles. A single disk read is 50-100μs on SSD, 5-10ms on spinning disk. At 3GHz that’s 150K-30M cycles. Catastrophic for latency.

12
Q

What are the 3 types of CPU exceptions?

characteristics: recoverable, intentional, unrecoverable

(add)

A

Intel’s manual actually distinguishes three types of exceptions:

Faults — retryable. The faulting instruction is restarted after the handler runs. Page fault is a fault — after the kernel maps the page, the CPU retries the same load/store instruction.

Traps — non-retryable, RIP points to the next instruction. int 3 (breakpoint) is a trap.

Aborts — unrecoverable. Double fault, machine check. Usually fatal.
