What is the RADIO framework for System Design interviews?
Requirements (5-10 min: functional=what system does + non-functional=scale-latency-availability-consistency) + Architecture (10 min: high-level diagram - identify major components) + Data model (5-10 min: entities - schema - DB choice) + Interface/API (5 min: key endpoints + request/response) + Optimizations (15-20 min: deep dive on bottlenecks - caching - sharding - trade-offs). Drive the conversation. State assumptions. Clarify before designing. Never jump to solutions without requirements
What questions do you ask at the start of a System Design interview?
Functional: Who are the users? What are the core features? What does the system do? What does it NOT need to do? Non-functional: How many users (DAU/MAU)? Read-heavy or write-heavy? What is acceptable latency? What availability SLA is required (99.9% vs 99.999%)? Is strong consistency required or eventual consistency OK? What is the data retention period? Any geographic distribution? Mobile or web or both? Starting with requirements shows senior engineering thinking
What numbers should every System Design candidate memorize?
Latency: L1 cache=1ns + L2 cache=10ns + RAM=100ns + SSD random read=100µs + HDD=10ms + same-DC network=1ms + cross-continent=150ms. Storage: 1 char=1B + 1 tweet=300B + 1 photo=1MB + 1 video-min=10MB + 1 song=5MB. Throughput: 1 server handles 1K-10K req/s. Scale math: 100M DAU * 10 actions/day = 1B req/day = ~11K req/s. Data volume: 1M users * 1KB/user = 1GB. 1B * 1KB = 1TB. Bandwidth: 1Gbps = 125MB/s
How do you estimate scale in a System Design interview?
DAU (Daily Active Users) -> requests per second: DAU * actions_per_day / 86400. Storage: users * data_per_user * retention_period. Bandwidth: req/s * avg_response_size. Example: Twitter 100M DAU * 10 tweets/day = 1B tweets/day = 11.5K writes/sec. Read:write = 100:1 -> 1.15M reads/sec. Storage: 1B tweets/day * 300 bytes = 300GB/day = 110TB/year. Always round numbers. Show your math. Peak traffic = 2-3x average
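The scale math on this card can be sketched as a quick back-of-envelope helper. This is a sketch using the card's own assumptions (100M DAU, 10 tweets/day, 300 bytes/tweet, 100:1 read:write, peak = 3x average); none of these numbers come from real Twitter data.

```python
# Back-of-envelope scale estimation using the card's Twitter-style assumptions.
SECONDS_PER_DAY = 86_400

def requests_per_second(dau: int, actions_per_day: float) -> float:
    """DAU * actions_per_day / 86400."""
    return dau * actions_per_day / SECONDS_PER_DAY

dau = 100_000_000                                   # 100M daily active users (assumed)
writes_per_day = dau * 10                           # 10 tweets/day/user -> 1B writes/day
write_qps = requests_per_second(dau, 10)            # ~11.5K writes/sec
read_qps = write_qps * 100                          # read:write = 100:1 -> ~1.15M reads/sec
storage_per_day_gb = writes_per_day * 300 / 1e9     # 300 bytes/tweet -> 300 GB/day
storage_per_year_tb = storage_per_day_gb * 365 / 1e3  # ~110 TB/year
peak_write_qps = write_qps * 3                      # peak = 2-3x average

print(f"writes/sec ~{write_qps:,.0f}, reads/sec ~{read_qps:,.0f}")
print(f"storage/day ~{storage_per_day_gb:.0f} GB, /year ~{storage_per_year_tb:.1f} TB")
```

In an interview you would do this rounding out loud; the helper just confirms the arithmetic.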
What is the difference between Latency and Throughput?
Latency: time to complete ONE request (milliseconds). Lower is better. P50 (median) + P95 + P99 (tail latency) are key metrics. Throughput: number of requests completed per unit time (requests/second). Higher is better. They interact: high throughput often increases latency (queuing). Low latency requires fast processing per request. Design target: minimize latency for user-facing APIs (P99 < 100ms) + maximize throughput for batch processing
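P50/P95/P99 are just percentiles over sampled latencies. A minimal nearest-rank sketch (real systems use streaming histograms such as HDRHistogram rather than sorting; the sample latencies below are made up):

```python
import math

def percentile(samples, p):
    """Nearest-rank percentile: the smallest sample covering p% of all samples."""
    s = sorted(samples)
    rank = math.ceil(p / 100 * len(s))  # 1-based rank
    return s[max(rank, 1) - 1]

# Hypothetical per-request latencies in ms: mostly fast, two slow outliers
latencies_ms = [12, 15, 11, 90, 14, 13, 250, 16, 14, 12]
p50 = percentile(latencies_ms, 50)  # 14  -> typical user experience
p95 = percentile(latencies_ms, 95)  # 250 -> tail dominated by the outlier
p99 = percentile(latencies_ms, 99)  # 250
print(p50, p95, p99)
```

Note how a single slow request drags P95/P99 far from the median, which is why tail latency gets its own targets.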
What is SLA - SLO and SLI?
SLI (Service Level Indicator): actual measurement (request latency + error rate + availability percentage). SLO (Service Level Objective): internal target for SLI (P99 latency < 200ms + error rate < 0.1% + availability > 99.9%). SLA (Service Level Agreement): external contract with customers including penalties for violations. Hierarchy: SLO stricter than SLA (internal targets must be tighter than customer commitments). Error budget: 100% - SLO = allowed downtime (99.9% SLO = 8.7h error budget/year)
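The error-budget arithmetic is simple enough to verify directly (the SLO values below are the card's examples):

```python
def error_budget_hours(slo_percent, period_hours=365 * 24):
    """Allowed downtime per period: (100% - SLO) as a fraction of the period."""
    return (100 - slo_percent) / 100 * period_hours

print(round(error_budget_hours(99.9), 2))   # 8.76 -> the "~8.7h/year" on the card
print(round(error_budget_hours(99.99), 2))  # 0.88h -> ~52 min/year
```

Once the budget is spent, teams typically freeze risky releases until reliability recovers.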
What are the main non-functional requirements to consider in System Design?
Availability (what % uptime: 99.9%=8.7h downtime/yr + 99.99%=52min + 99.999%=5min). Scalability (handle 10x growth without redesign). Latency (P99 response time target). Throughput (requests per second). Consistency (strong vs eventual). Durability (data never lost: replication + backups). Fault Tolerance (continue operating despite failures). Security (auth + encryption + rate limiting). Maintainability (easy to change + monitor). Cost efficiency
MEMORY BOOSTER: System design approach
RADIO: Requirements -> Architecture -> Data model -> Interface/API -> Optimizations. Ask first: DAU + read/write ratio + latency requirement + consistency need + availability SLA. Scale math: DAU * actions / 86400 = req/s. Latency hierarchy: L1(1ns) < RAM(100ns) < SSD(100µs) < network(1ms) < HDD(10ms) < cross-continent(150ms). SLI=measurement. SLO=internal target. SLA=customer contract. Peak = 2-3x average. Always state trade-offs. Drive the conversation
What are the core building blocks of every large-scale system?
DNS (domain -> IP resolution) + CDN (static content + edge caching) + Load Balancer (distribute traffic + health checks) + API Gateway (auth + rate limit + routing) + Stateless App Servers (horizontally scalable) + Cache (Redis/Memcached - reduce DB load) + Message Queue (Kafka/SQS - async + decoupling) + Primary Database (writes) + Read Replicas (scale reads) + Object Storage (S3 - files/images/videos) + Search Engine (Elasticsearch) + Monitoring (metrics + logs + traces)
What is a Load Balancer and what algorithms does it use?
Load Balancer distributes incoming traffic across multiple servers. Algorithms: Round Robin (rotate through servers - simple) + Weighted Round Robin (more traffic to powerful servers) + Least Connections (route to server with fewest active connections - best for long-lived connections) + IP Hash (same client always goes to same server - session stickiness) + Random. Types: Layer 4 (TCP/UDP - fast - no content inspection) + Layer 7 (HTTP - content-based routing - can read headers/cookies/URLs). Health checks remove failed servers automatically
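Two of these algorithms fit in a few lines; a minimal single-process sketch (server names are placeholders, and a real load balancer would add health checks, weights, and concurrency safety):

```python
import itertools

class RoundRobin:
    """Rotate through servers in order."""
    def __init__(self, servers):
        self._cycle = itertools.cycle(servers)
    def pick(self):
        return next(self._cycle)

class LeastConnections:
    """Route to the server with the fewest active connections."""
    def __init__(self, servers):
        self.active = {s: 0 for s in servers}
    def pick(self):
        server = min(self.active, key=self.active.get)
        self.active[server] += 1
        return server
    def release(self, server):
        self.active[server] -= 1

rr = RoundRobin(["a", "b", "c"])
print([rr.pick() for _ in range(4)])  # ['a', 'b', 'c', 'a']

lc = LeastConnections(["a", "b"])
lc.pick()  # 'a' gains an active connection, so the next pick goes to 'b'
```

Least Connections shines with long-lived connections, where Round Robin would pile new traffic onto already-busy servers.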
What is a CDN (Content Delivery Network)?
CDN is a globally distributed network of edge servers that cache content close to users. Reduces latency (serve from nearest PoP) + reduces origin server load + improves availability. Content types: static (images - CSS - JS - videos - rarely change) + dynamic (can be cached with short TTL or with edge compute). Cache invalidation: TTL expiry + manual purge + versioned URLs (main.v2.js). Examples: CloudFront + Akamai + Cloudflare. CDN absorbs DDoS attacks. Use CDN for: any content served globally + large file downloads + streaming
What is an API Gateway and what does it do?
API Gateway is a single entry point for all client requests. Handles: authentication + authorization + rate limiting + SSL termination + request routing to microservices + request/response transformation + load balancing + logging + caching + protocol translation (HTTP->gRPC) + API versioning + circuit breaking. Examples: AWS API Gateway + Kong + NGINX + Apigee. Prevents clients knowing internal service topology. Single place to enforce cross-cutting concerns. Potential bottleneck: must be highly available + scaled independently
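Gateway rate limiting is commonly implemented as a token bucket; a minimal single-node sketch (the capacity and refill rate are arbitrary, and a real gateway keeps a bucket per client, often in Redis, rather than one global bucket):

```python
import time

class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate` tokens per second."""
    def __init__(self, capacity, rate):
        self.capacity = capacity
        self.rate = rate
        self.tokens = capacity
        self.last = time.monotonic()

    def allow(self):
        now = time.monotonic()
        # Refill proportionally to elapsed time, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False

bucket = TokenBucket(capacity=3, rate=1.0)  # burst of 3, then 1 req/s sustained
results = [bucket.allow() for _ in range(5)]
print(results)  # first 3 allowed, next 2 rejected until tokens refill
```

The burst allowance is the key difference from a fixed-window counter, which can let 2x the limit through at window boundaries.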
What is the difference between Forward Proxy and Reverse Proxy?
Forward Proxy: sits in front of CLIENTS. Client -> Forward Proxy -> Internet. Use cases: anonymize clients + bypass geo-restrictions + corporate internet filtering + cache responses for internal clients. Reverse Proxy: sits in front of SERVERS. Internet -> Reverse Proxy -> Servers. Use cases: load balancing + SSL termination + caching + DDoS protection + hide internal server topology. Nginx and HAProxy act as reverse proxies. CDN is a distributed reverse proxy
What is DNS and how does it work?
DNS (Domain Name System) translates domain names to IP addresses. Hierarchy: root servers -> TLD servers (.com .org) -> authoritative name servers. DNS resolution: browser checks local cache -> OS cache -> ISP resolver -> root -> TLD -> authoritative. TTL controls cache duration. DNS record types: A (domain->IPv4) + AAAA (domain->IPv6) + CNAME (alias to another domain) + MX (mail server) + TXT (verification). DNS-based load balancing: multiple A records for same domain. Failover: change A record when server fails
What is the difference between TCP and UDP?
TCP: connection-oriented + reliable (ACK + retransmit) + ordered delivery + flow control + congestion control + higher latency. Use for: web (HTTP/HTTPS) + email + file transfer + databases. UDP: connectionless + unreliable (no ACK) + unordered + no flow control + lower latency + lower overhead. Use for: live video streaming (prefer fresh frames over retransmits) + gaming + DNS + VoIP. HTTP/3 uses QUIC (UDP-based) for lower latency with reliability built in
What is HTTP/1.1 vs HTTP/2 vs HTTP/3?
HTTP/1.1: one request at a time per connection (pipelining rarely works) + HOL blocking + plain-text headers. HTTP/2: multiplexing (multiple concurrent requests per connection) + header compression (HPACK) + server push + binary protocol + still uses TCP (TCP-level HOL blocking). HTTP/3: built on QUIC (UDP-based) + eliminates TCP HOL blocking + faster connection setup (0-RTT) + connection migration (phone switches WiFi->cellular seamlessly). Modern browsers support HTTP/2 and HTTP/3. gRPC uses HTTP/2
What is WebSocket and when do you use it?
WebSocket provides full-duplex bidirectional communication over a single TCP connection. Starts as HTTP upgrade request. Use when: real-time updates needed (chat + live feed + gaming + collaborative editing + stock prices + notifications). Unlike HTTP polling (client asks repeatedly): WebSocket server can push data anytime. Challenges: stateful connections (harder to scale horizontally - need sticky sessions or pub-sub for server fan-out) + connection management at scale. Alternatives: Server-Sent Events (SSE - one-way push - simpler)
What is Long Polling - SSE and WebSocket differences?
Short Polling: client requests every N seconds (simple + wasteful + high latency). Long Polling: client requests -> server holds connection until data available -> client immediately re-requests. Moderate latency + HTTP compatible + higher server connections. SSE (Server-Sent Events): server pushes stream to client over HTTP + one-way (server->client only) + auto-reconnect built-in + simple. WebSocket: full-duplex (both directions) + lowest latency + most complex. Use SSE for: news feeds + notifications. Use WebSocket for: chat + gaming + collaborative tools
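SSE is just a text framing over a long-lived HTTP response: `id:`/`event:`/`data:` lines terminated by a blank line. A sketch of the server-side framing (the field names follow the SSE event-stream format; the payload values are made up):

```python
def sse_frame(data, event=None, event_id=None):
    """Serialize one Server-Sent Event; clients auto-reconnect and resume from the last id."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads become multiple data: lines
    lines.extend(f"data: {line}" for line in data.splitlines())
    return "\n".join(lines) + "\n\n"  # blank line terminates the event

print(sse_frame("price=101.5", event="tick", event_id="42"))
# id: 42
# event: tick
# data: price=101.5
```

A server streams these frames over a single HTTP response; no upgrade handshake is needed, which is why SSE is the simpler choice for one-way feeds.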
MEMORY BOOSTER: Core building blocks
Every system: DNS + CDN + LB + API Gateway + Stateless Servers + Cache + Queue + DB + Object Storage + Monitoring. LB algorithms: Round Robin + Least Connections + IP Hash + Weighted. CDN: cache at edge (TTL + versioned URLs for invalidation). API Gateway: auth + rate limit + route + transform + log. Reverse Proxy = in front of servers (LB + SSL + cache). Forward Proxy = in front of clients. TCP = reliable ordered. UDP = fast unreliable. HTTP/2 = multiplexing. HTTP/3 = QUIC. WebSocket = full-duplex real-time
What are the differences between SQL and NoSQL databases?
SQL (Relational): ACID transactions + strong consistency + complex joins + normalized schema + fixed structure + scales vertically first (horizontal scaling is harder) + examples: PostgreSQL - MySQL - Oracle. NoSQL: BASE (Basically Available Soft-state Eventually consistent) + horizontal scaling + flexible schema + simple access patterns + types: Document(MongoDB) + Key-Value(Redis/DynamoDB) + Wide-Column(Cassandra/HBase) + Graph(Neo4j). Choose SQL for: financial transactions + complex relationships. Choose NoSQL for: massive scale + simple access patterns + flexible schema + high write throughput
When do you choose each type of NoSQL database?
Key-Value (Redis/DynamoDB): O(1) get/put by key + session storage + caching + shopping cart + user preferences + leaderboards. Document (MongoDB/Firestore): JSON documents + flexible schema + content management + user profiles + catalogs. Wide-Column (Cassandra/HBase): time-series + high write throughput + IoT data + activity logs + partition key determines data location. Graph (Neo4j/Neptune): relationships are first-class + social networks + recommendation engines + fraud detection + knowledge graphs. Search (Elasticsearch): full-text search + log analytics + faceted search
What is database sharding and how does it work?
Sharding (horizontal partitioning) splits data across multiple database instances. Each shard holds a subset of data. Shard key determines which shard stores a record. Strategies: Range-based (shard by user_id 1-1M on shard1 - 1M-2M on shard2 - can cause hot spots) + Hash-based (hash(user_id) % num_shards - even distribution but hard to range query) + Directory-based (lookup service maps key to shard - flexible but extra hop). Challenges: cross-shard joins (avoid by denormalization) + rebalancing when adding shards (consistent hashing helps)
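Hash-based routing and a consistent-hash ring can both be sketched briefly. A sketch under stated assumptions: the node names and the 100 virtual nodes per server are illustrative, and MD5 stands in for whatever hash a real system uses.

```python
import bisect
import hashlib

def shard_for(key, num_shards):
    """Hash-based sharding: even distribution, but adding a shard remaps most keys."""
    h = int(hashlib.md5(key.encode()).hexdigest(), 16)
    return h % num_shards

class ConsistentHashRing:
    """Minimal ring: adding/removing a node only remaps its neighboring keys."""
    def __init__(self, nodes, vnodes=100):
        # Virtual nodes smooth out the distribution across physical nodes
        self._ring = sorted(
            (self._hash(f"{node}#{i}"), node)
            for node in nodes for i in range(vnodes)
        )
        self._keys = [h for h, _ in self._ring]

    @staticmethod
    def _hash(s):
        return int(hashlib.md5(s.encode()).hexdigest(), 16)

    def node_for(self, key):
        # Walk clockwise to the first virtual node at or after the key's hash
        idx = bisect.bisect(self._keys, self._hash(key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["db1", "db2", "db3"])
print(ring.node_for("user:42"))  # same key always maps to the same node
```

With plain `hash % num_shards`, going from 4 to 5 shards remaps ~80% of keys; the ring remaps only ~1/N of them, which is why it matters for rebalancing.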
What is database replication and what are the types?
Replication copies data to multiple nodes for availability + durability + read scaling. Types: Single-Leader (master-slave): writes go to leader - replicated async to followers - followers serve reads - leader failure requires failover. Multi-Leader: multiple leaders accept writes - conflict resolution needed - good for geo-distributed writes. Leaderless (Dynamo-style): any node accepts writes - quorum reads/writes (W+R>N) - eventual consistency - high availability. Synchronous vs Async replication: sync = stronger consistency but higher write latency
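The leaderless quorum rule W + R > N is worth sanity-checking with numbers (N=3, W=2, R=2 is a typical Dynamo-style configuration; the values here are examples, not a recommendation):

```python
def quorum_ok(n, w, r):
    """W + R > N guarantees every read quorum overlaps every write quorum."""
    return w + r > n

print(quorum_ok(3, 2, 2))  # True  -> reads always see the latest acknowledged write
print(quorum_ok(3, 1, 1))  # False -> read and write sets may not overlap: stale reads
print(quorum_ok(5, 3, 3))  # True  -> tolerates 2 node failures for both reads and writes
```

Tuning W down favors write availability, tuning R down favors read latency; the inequality shows exactly when that trade-off sacrifices consistency.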
What is the difference between Read Replicas and Multi-AZ?
Read Replicas: async replication + replicas are readable + scale reads horizontally + can be in different regions + NOT for HA (no automatic failover). Use for: heavy read workloads + analytics + reporting + geographic read distribution. Multi-AZ (AWS): synchronous replication + standby is NOT readable + automatic failover when primary fails + same region. Use for: high availability + disaster recovery. Strategy: use BOTH - Multi-AZ for HA + Read Replicas for read scaling