Performance and Optimization Flashcards

(16 cards)

1
Q

How do you frame SQL Server performance tuning in an interview — what planes do you triage, and where does infrastructure fit for a cloud SRE?

A

Three planes: instance (resources and engine defaults: max server memory, MAXDOP, tempdb, storage), schema (indexes, keys, types, partitioning, statistics strategy), query / application (T-SQL shape, sargability, ORM patterns, parameter sniffing, transaction length). A fourth layer — platform / infrastructure (VM size, RDS parameter groups, network RTT) — is fair to call out when waits point at IO or CPU limits outside the engine. Panel line: “I classify where the lever lives — config, structures, or workload — and I fold in capacity when metrics say it’s not SQL-only.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
2
Q

What are wait stats, which DMVs do you use, and why don’t waits alone tell the whole story?

A

Waits show what sessions or the instance are waiting on (disk, scheduler, locks, latches, network). Instance-wide: sys.dm_os_wait_stats (reset on restart or DBCC SQLPERF); live sessions: sys.dm_exec_requests (wait_type, wait_time, last_wait_type). Dominant waits hint at IO (PAGEIOLATCH_*, WRITELOG), CPU/parallelism (CXPACKET, CXCONSUMER, SOS_SCHEDULER_YIELD), memory (RESOURCE_SEMAPHORE), locks (LCK_M_*). You still pair waits with queries, plans, and counters — waits are a compass, not a diagnosis by themselves.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
3
Q

How do you use sys.dm_db_missing_index_* without index sprawl?

A

Missing-index DMVs (and Query Store hints in tools) suggest candidates where seeks/scans could be cheaper — but every index has write, storage, and maintenance cost; duplicate or narrow indexes can hurt. Correlate suggestions with top resource queries, execution plans (missing-index XML), cardinality, and selectivity. Prefer fewer well-chosen indexes over accepting every DMV row.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
4
Q

What do MAXDOP and cost threshold for parallelism control, and what goes wrong when they’re mis-set?

A

MAXDOP caps how many processors parallel work may use (scope rules vary by version/edition context). cost threshold for parallelism is the estimated cost above which the optimizer may parallelize. Too high MAXDOP → overhead and waits like CXPACKET / CXCONSUMER; too low → underused CPU on large analytics-style work. Many orgs set MAXDOP from core/NUMA guidance; MAXDOP 1 appears for some OLTP-heavy small VMs or targeted hints — validate with Query Store and actual plans, not folklore alone.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
5
Q

How do you reason about SQL Server memory on a busy host — buffer pool, max server memory, and signals like PLE?

A

The buffer pool caches data pages; pressure often shows as IO waits or lazy writer activity. max server memory should leave headroom for the OS, other instances, and non-buffer use (plan cache, threads, linked servers, etc.); min server memory limits sudden shrinkage. PLE (page life expectancy) is a rough signal — working set and storage speed matter; don’t treat a single PLE number as an SLA. On Windows, LPIM (Locked Pages In Memory) reduces paging of SQL’s working set (privilege + policy).

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
6
Q

Why does tempdb matter for performance, and what operational choices reduce contention?

A

tempdb is shared: sorts, hashes, worktables, row versions for RCSI / snapshot isolation, and internal objects. Allocation and metadata contention still appears on busy systems despite version improvements. Use enough data files with same initial size and growth on fast storage; follow current Books Online guidance for file count per version (historically “one per logical CPU up to 8” was a starting point — say you follow current Microsoft guidance). Avoid long-lived user objects in tempdb unless intentional.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
7
Q

Define blocking in SQL Server terms and when it becomes an incident-level problem.

A

Blocking is normal short-term: session B waits on a lock session A holds in a conflicting mode until A releases (commit/rollback) or compatibility allows progress. It’s a problem when waits are long or cascade (many sessions behind one head blocker). Common drivers: long transactions, missing/wrong indexes (scans → more/larger locks), lock escalation, over-strong isolation (REPEATABLE READ / SERIALIZABLE). Panel line: “I find the head blocker, what it’s running, and how long the transaction is open — then I reduce lock duration and footprint.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
8
Q

Which DMVs and patterns do you use to diagnose blocking in production?

A

sys.dm_exec_requests: blocking_session_id, wait_type, wait_time. sys.dm_tran_locks for who holds what. sys.dm_exec_sql_text and sys.dm_exec_query_plan for blocked and blocker sessions. Legacy sp_who2 active or Activity Monitor for a quick picture. Often the fix is app-side (transaction boundaries, batch design) as much as an index change.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
9
Q

What mitigations do you propose for chronic blocking — beyond “add an index”?

A

Root cause: shorten transactions; commit in smaller safe units; fix hotspot updates; avoid SELECT * inside long transactions. Indexes that turn scans into seeks narrow locks and duration. Isolation: where business rules allow, options like READ COMMITTED SNAPSHOT at the database level reduce reader/writer blocking — with awareness of tempdb version store cost. Retries with backoff for transient blocking, not as a band-aid for 30-minute blockers.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
10
Q

Blocking vs deadlock — shape, how SQL Server resolves each, and what the application sees.

A

Blocking: usually a chain (A blocks B, B blocks C…); the waiter waits until the holder finishes. Deadlock: a cycle of lock waits (A→B→A or longer); the engine detects it and picks a deadlock victim, rolls back that work (error 1205). Apps often retry 1205; blocking usually surfaces as slow queries or timeouts. Panel line:Deadlock is circular; SQL kills one victim. Blocking is linear until someone finishes.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
11
Q

What causes deadlocks in real systems, and how do you inspect them?

A

Causes: inconsistent lock order (Proc1 updates A then B, Proc2 updates B then A), same rows touched in different order from different code paths, heavy HOLDLOCK / serializable range locks. Use Extended Events xml_deadlock_report (or SSMS deadlock graph) for resources, processes, and victim; error logs may summarize; Query Store / plan handles tie statements to recurrence.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
12
Q

How do you mitigate deadlocks without only telling apps to retry forever?

A

Standardize lock order across procedures for the same object set (documented ordering: e.g. Orders then OrderLines). Smaller transactions; indexes so locks are narrow and short. RCSI / snapshot can reduce some reader/writer deadlock classes — writer/writer cycles still happen. Retry 1205 for idempotent operations is normal — still fix a frequent graph.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
13
Q

When do you escalate a performance issue to application / query owners vs keep it in the DBA/SRE lane?

A

Escalate when the fix is application behavior: N+1 queries, fat WHERE, implicit conversions killing sargability, ORM cartesian patterns, transaction scope, retry storms, cache design — you prove with plans and stats, but code wins. Stay platform-owned for instance config, storage, maintenance (stats, integrity), parameterization / plan stability (RECOMPILE vs forced plans), resource governor if used.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
14
Q

Name isolation levels and lock escalation at a level you’d use in a senior panel — why they tie to performance.

A

Isolation: READ UNCOMMITTED, READ COMMITTED (default), REPEATABLE READ, SERIALIZABLE, plus snapshot models (ALLOW_SNAPSHOT_ISOLATION, READ_COMMITTED_SNAPSHOT). Stronger isolation → more / longer-held locks → more blocking risk. Lock escalation: SQL may promote many row locks to page or table, increasing blast radius; heavy partitioning or LOCK_ESCALATION options matter in edge cases. You choose isolation to match correctness needs, not defaults by accident.

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
15
Q

Give a tight closing line for a DEEP&T-style reliability interview tying waits, planes, and partnership.

A

“I use waits and plans to classify CPU vs IO vs locks; I own instance and schema levers and propose concrete changes; I partner with engineering when the workload shape or transaction design has to change — with metrics and repro so the conversation stays evidence-based at scale.”

How well did you know this?
1
Not at all
2
3
4
5
Perfectly
16
Q

What is a DMV?

A

A DMV (dynamic management view) is a server-state catalog view in the sys schema (names like sys.dm_*) that exposes live or cumulative internal metrics: sessions, waits, locks, plans, IO, memory, and more. DMFs (dynamic management functions) are the same family but parameterized (e.g. pass a session or plan handle). They are read-only, often lighter than full traces for ad hoc diagnosis, and reset / scope rules differ by object (e.g. sys.dm_os_wait_stats resets on restart or DBCC SQLPERF). Panel line: “I use DMVs first to see what the engine is doing now and to tie waits to sessions and plans.”