Before you patch production SQL Server (AG, FCI, or standalone), what process and safety steps do you insist on?
Test on a clone or lower environment; read release notes for regressions; validate agent jobs, linked servers, SSIS, and logins after the change. For AG/FCI, follow a rolling order (secondaries first, then planned failover, then former primary — details in clustering runbooks). On RDS, lean on maintenance windows, Multi-AZ rolling behavior, and snapshots. On EC2, pair changes with backup/snapshot and post-patch validation queries. Document and automate what repeats.
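The post-patch validation step is worth scripting so it runs the same way every time. A minimal sketch of a check-runner pattern; the check names and `run_checks` helper are hypothetical, and real checks would query the instance (agent jobs, linked servers, logins, SSIS) rather than return stubs:

```python
# Minimal post-patch validation harness (hypothetical helper names).
# Each check is a callable returning True on pass; real implementations
# would query the instance via sqlcmd/pyodbc instead of these stubs.
from typing import Callable

def run_checks(checks: dict[str, Callable[[], bool]]) -> list[str]:
    """Run every check and return the names of the ones that failed."""
    return [name for name, check in checks.items() if not check()]

# Stubbed checks standing in for real post-patch queries.
checks = {
    "agent_jobs_enabled":   lambda: True,   # e.g. inspect msdb.dbo.sysjobs
    "linked_servers_reach": lambda: True,   # e.g. sp_testlinkedserver per server
    "no_orphaned_logins":   lambda: True,   # e.g. orphaned-user report per DB
}

failed = run_checks(checks)
print("all clear" if not failed else f"failed: {failed}")
```

The payoff is a single pass/fail artifact per patch event, which is what "document and automate what repeats" looks like in practice.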
For Always On with two or more replicas, is rolling patching “zero downtime”? How do you answer in a panel?
You can get very low downtime for writes with a listener and planned failover, but true zero for every client, every read path, and every in-flight transaction is not a safe promise. During failover, sessions may disconnect briefly and in-flight work may roll back or need retry; while one node is down you may have reduced redundancy (e.g. two-node AG = no spare if the other fails). Panel line: “I call it rolling / minimal downtime — seconds for writes if apps reconnect; I drain readable secondaries before patch; I don’t guarantee zero for all connections.”
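App-side retry is what turns the failover blip into "seconds for writes if apps reconnect." A minimal sketch of retry-with-backoff around a transient disconnect; the exception type and delays are illustrative, not any specific driver's API:

```python
import time

class TransientDbError(Exception):
    """Stand-in for a driver's transient connection error during failover."""

def with_retry(op, attempts=5, base_delay=0.5):
    """Retry op() on transient errors with linear backoff; re-raise when exhausted."""
    for i in range(attempts):
        try:
            return op()
        except TransientDbError:
            if i == attempts - 1:
                raise
            time.sleep(base_delay * (i + 1))  # back off while the listener repoints

# Simulate a write that fails twice during failover, then succeeds on reconnect.
calls = {"n": 0}
def write():
    calls["n"] += 1
    if calls["n"] < 3:
        raise TransientDbError("connection reset during failover")
    return "committed"

print(with_retry(write, base_delay=0.01))  # prints "committed" after two retries
```

In-flight transactions that rolled back still need idempotent or re-submittable work on the app side; retry only covers the reconnect, not lost writes.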
What does rolling patch on a multi-replica AG usually look like operationally?
Patch every secondary first (one at a time or in parallel, depending on capacity and risk appetite). Each restart removes that replica's read capacity until it rejoins; redirect reporting if needed. Once the secondaries are done, perform a manual failover to a patched node that is healthy and caught up (mind sync mode for RPO). Patch the former primary last. Tune DNS/listener settings, MultiSubnetFailover, and app retry resiliency to absorb the failover blip.
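The sequencing rule (all secondaries, then failover, then the ex-primary) is simple enough to encode so the plan is generated rather than hand-typed. A sketch with hypothetical data shapes:

```python
def patch_plan(replicas: dict[str, str]) -> list[str]:
    """Return an ordered rolling-patch plan: secondaries first, former primary last.
    replicas maps replica name -> role ('PRIMARY' or 'SECONDARY')."""
    secondaries = sorted(n for n, r in replicas.items() if r == "SECONDARY")
    primary = next(n for n, r in replicas.items() if r == "PRIMARY")
    plan = [f"patch {s}" for s in secondaries]
    # Fail over to a patched node; in real life, gate on health and sync state.
    plan.append(f"failover {primary} -> {secondaries[0]}")
    plan.append(f"patch {primary}")
    return plan

for step in patch_plan({"sql1": "PRIMARY", "sql2": "SECONDARY", "sql3": "SECONDARY"}):
    print(step)
```

The real gate before the failover step is replica health and synchronization state, not just "was patched"; the sketch leaves that as a comment.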
In-place upgrade vs new server (side-by-side) + migrate — contrast rollback, downtime shape, and risk.
In-place: run Setup on the existing instance (within supported hop matrix). Rollback is hard — uninstall/rebuild or restore from backup; often one long maintenance window on that host; all-or-nothing risk on the live instance. Side-by-side: new OS + instance, move data (backup/restore, AG/replication/log shipping, etc.). Rollback is easier — keep old server until sign-off; cutover can be short if you synced ahead. Prefer side-by-side for big jumps, OS refresh, unknown legacy, or when you want a blue/green mental model.
When would you choose in-place vs side-by-side for a major SQL Server change?
In-place when the upgrade path is supported, the estate is simple, you’ve proven the build in lower envs, and you accept painful rollback — still with solid backups. Side-by-side when you need new Windows/hardware/EBS layout, longer validation without touching prod daily, shorter final cutover, easier escape hatch, multi-app phased moves, or provider/cloud moves. Panel line: “In-place for supported + understood risk; side-by-side for clean stack, parallel test, or easy fail-back.”
After any upgrade path, what database-level follow-ups do you plan beyond “it starts”?
Revisit COMPATIBILITY_LEVEL deliberately (not necessarily “latest” on day one). Establish or refresh Query Store baseline and regression tests. Watch waits, plan shape, and deprecated feature warnings. Align monitoring and runbooks with the new version’s behaviors.
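The Query Store regression pass reduces to comparing per-query baseline vs post-upgrade averages. A minimal sketch; the ratio and noise floor are illustrative thresholds, and the inputs stand in for aggregated Query Store runtime stats:

```python
def find_regressions(baseline_ms, current_ms, ratio=1.5, floor_ms=10.0):
    """Flag queries whose average duration grew by more than `ratio`,
    ignoring noise below `floor_ms`. Inputs map query_id -> avg duration (ms).
    Returns query_id -> slowdown factor."""
    flagged = {}
    for qid, base in baseline_ms.items():
        cur = current_ms.get(qid)
        if cur is None or cur < floor_ms:
            continue  # missing after upgrade, or too fast to matter
        if cur > base * ratio:
            flagged[qid] = round(cur / base, 2)
    return flagged

baseline = {101: 12.0, 102: 40.0, 103: 5.0}
current  = {101: 13.0, 102: 95.0, 103: 30.0}
print(find_regressions(baseline, current))  # flags 102 (2.38x) and 103 (6.0x)
```

Flagged queries are then candidates for plan forcing or rewrite before you consider bumping COMPATIBILITY_LEVEL further.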
Can Always On replicas run different SQL Server major versions indefinitely?
No. Mixed versions in one AG are only supported as a short-lived state during a documented rolling major-version upgrade (or related patterns like distributed availability groups where appropriate). An upgraded primary cannot ship logs to older secondaries — so you upgrade secondaries first, fail over to a synchronous upgraded replica when SYNCHRONIZED, then upgrade the ex-primary. Do not leave version skew as steady state.
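The guardrail ("only fail over to an upgraded, synchronous, SYNCHRONIZED replica") is worth asserting in code before the cutover. A sketch with hypothetical state fields mirroring what you would read from replica DMVs:

```python
def safe_failover_target(replicas, new_major):
    """Pick a failover target for a rolling major-version upgrade: it must
    already run the new major version, be synchronous-commit, and report
    SYNCHRONIZED. Returns None if no replica qualifies.
    replicas: list of dicts with name/version/sync_mode/sync_state."""
    for r in replicas:
        if (r["version"] >= new_major
                and r["sync_mode"] == "SYNCHRONOUS_COMMIT"
                and r["sync_state"] == "SYNCHRONIZED"):
            return r["name"]
    return None

replicas = [
    {"name": "sql2", "version": 16, "sync_mode": "SYNCHRONOUS_COMMIT",
     "sync_state": "SYNCHRONIZED"},
    {"name": "sql3", "version": 15, "sync_mode": "ASYNCHRONOUS_COMMIT",
     "sync_state": "SYNCHRONIZING"},
]
print(safe_failover_target(replicas, new_major=16))  # → sql2
```

If this returns None, the right move is to wait (or fix sync), not to force the failover and accept data loss.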
During a rolling major-version upgrade on AG, what happens to readability on secondaries mid-transition?
After the first failover to a newer SQL Server, databases go through the upgrade step — Microsoft notes there are no readable secondaries for those DBs during that upgrade period. Treat the window as restricted for reporting and finish the sequence quickly per supported guidance.
On RDS for SQL Server Multi-AZ, who runs Setup, and what is the high-level engine upgrade sequence AWS describes?
AWS orchestrates the change — you don’t RDP and run Setup yourself. For Multi-AZ (mirroring or Always On–style under the hood), a major upgrade is roughly: remove secondary → upgrade it → restore transaction log backups accumulated while disconnected → rejoin and sync → failover to upgraded side → upgrade former primary and re-pair. User-visible impact is roughly failover-length disconnects plus any crash recovery — plan app retries and MultiSubnetFailover to the listener endpoint where applicable.
RDS SQL Server — how do major vs minor engine upgrades differ in who initiates and automation?
Major version upgrades are manual — you modify the instance to the new major after testing (e.g. snapshot restore / clone to non-prod per AWS guidance). Minor upgrades are backward-compatible; you either apply when you choose or enable automatic minor version upgrade so eligible minors land in the maintenance window — useful for security/CU-style churn at fleet scale.
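RDS maintenance windows use a `ddd:hh24:mi-ddd:hh24:mi` UTC format. A sketch that checks whether a given moment falls inside one, useful when scheduling around auto minor upgrades at fleet scale; pure parsing, no AWS calls:

```python
DAYS = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]

def _to_minutes(tok: str) -> int:
    """Convert 'ddd:hh:mm' to minutes since Monday 00:00 UTC."""
    day, hh, mm = tok.split(":")
    return DAYS.index(day) * 1440 + int(hh) * 60 + int(mm)

def in_window(window: str, day: str, hhmm: str) -> bool:
    """True if day+time (UTC) falls in an RDS-style window like
    'sun:05:00-sun:07:00'. Handles windows that wrap past the week boundary."""
    start_s, end_s = window.split("-")
    start, end = _to_minutes(start_s), _to_minutes(end_s)
    t = _to_minutes(f"{day}:{hhmm}")
    if start <= end:
        return start <= t < end
    return t >= start or t < end  # window wraps from sun back into mon

print(in_window("sun:05:00-sun:07:00", "sun", "06:30"))  # → True
print(in_window("sun:05:00-sun:07:00", "mon", "06:30"))  # → False
```

This kind of helper lets fleet tooling answer "will this instance pick up the pending minor tonight?" without calling the API per instance.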
What RDS gives you for rollback after an engine upgrade, and what won’t work?
With backup retention > 0, RDS typically takes two DB snapshots — before and after the upgrade. You cannot downgrade the engine in place on the same instance; rollback means restore the pre-upgrade snapshot to a new instance (or your org’s PITR/snapshot workflow). Plan connection strings and cutover accordingly.
Does RDS Blue/Green apply to RDS for SQL Server the way it does for some other engines?
No — per AWS’s published engine matrix, Blue/Green deployments are not supported for RDS for SQL Server. For a “green” cutover you use patterns like restore snapshot to new instance, DMS, or application-level migration — not native RDS blue/green for SQL Server.
What AWS-recommended operational habits reduce pain during RDS SQL Server upgrades?
Schedule during low write volume; avoid long transactions and heavy log churn so catch-up is faster; tune log and VLF health ahead of time. Expect parameter group and option group work on majors (often new groups, not blind reuse). Review the documented read replica behavior for your engine version, since replicas may be tied to the primary's upgrade sequencing.
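The "VLF health" item is checkable ahead of time. A sketch of the per-growth VLF rule SQL Server 2014+ uses as commonly documented (a growth of at most 1/8 of the current log size adds 1 VLF, otherwise the legacy 4/8/16 rule by growth size); treat the exact thresholds as an assumption to verify against your version:

```python
def vlfs_added(current_log_mb: float, growth_mb: float) -> int:
    """Estimate VLFs created by one log autogrowth event
    (SQL Server 2014+ algorithm as commonly documented; verify
    against your version's actual behavior)."""
    if growth_mb <= current_log_mb / 8:
        return 1            # small growth relative to the log: single VLF
    if growth_mb < 64:
        return 4            # legacy rule: tiny growth
    if growth_mb <= 1024:
        return 8            # legacy rule: 64 MB .. 1 GB growth
    return 16               # legacy rule: growth over 1 GB

# A 10 GB log growing in 64 MB steps adds 1 VLF per growth;
# the same growth on a small 256 MB log falls back to the legacy rule.
print(vlfs_added(10240, 64), vlfs_added(256, 64))  # → 1 8
```

Fewer, saner VLFs mean faster crash recovery and faster catch-up after the upgrade's disconnect window, which is exactly what the AWS guidance is driving at.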
Tie it together: how does patching/upgrades intersect with backup/restore discipline for a Sr. DBRE?
Pre-change backups or snapshots anchor rollback and forensics; AG rolling and RDS rolling both assume healthy replication and log catch-up — same muscle as RPO/RTO thinking. Side-by-side upgrades often lean on backup/restore, log shipping, or sync patterns you already operate. Panel line: “I don’t separate change from recoverability — every major move has a tested back-out path and validated cutover checks.”