CDN Flashcards

(10 cards)

1
Q

At a high level, what is a CDN, and what core problems does it solve for large-scale systems?

A

A CDN is a geographically distributed network of edge servers that caches and delivers content closer to users. It reduces latency and origin bandwidth costs and improves availability, primarily for static assets but also for some dynamic content.

2
Q

How does a CDN decide which edge server a user request should be routed to? Name at least two mechanisms and explain them briefly.

A

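-DNS-based routing → the CDN's DNS resolves the domain to an edge IP chosen by the approximate location of the user's DNS resolver (or the client subnet, where supported)
-Anycast routing → many edges announce the same IP, and BGP delivers the request to the “nearest” edge in network topology terms, which is not always the geographically closest one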
-Load-aware routing → shifts traffic away from overloaded edges
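
A minimal sketch of how proximity and load-aware routing could combine, assuming the routing layer knows each edge's coordinates and current load. The `Edge` class, the `pick_edge` function, the 0.9 load threshold, and the sample edges are all hypothetical; real CDNs route on measured network latency rather than raw geographic distance.

```python
import math
from dataclasses import dataclass


@dataclass
class Edge:
    name: str
    lat: float
    lon: float
    load: float  # fraction of capacity in use, 0.0 to 1.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points."""
    r = 6371.0  # mean Earth radius in km
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * r * math.asin(math.sqrt(a))


def pick_edge(edges, user_lat, user_lon, max_load=0.9):
    """Return the nearest edge below max_load; fall back to the least-loaded edge."""
    healthy = [e for e in edges if e.load < max_load]
    if not healthy:
        return min(edges, key=lambda e: e.load)
    return min(healthy, key=lambda e: haversine_km(user_lat, user_lon, e.lat, e.lon))


# Hypothetical edge locations and loads, for illustration only.
EDGES = [
    Edge("fra", 50.11, 8.68, load=0.95),    # Frankfurt, overloaded
    Edge("ams", 52.37, 4.90, load=0.40),    # Amsterdam
    Edge("nyc", 40.71, -74.01, load=0.20),  # New York
]
```

For a user near Berlin (52.52, 13.40), Frankfurt is the closest edge, but it is over the load threshold, so `pick_edge` returns Amsterdam instead.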

3
Q

What happens when a user requests an object that is not present in the edge cache? Walk me through the request flow.

A

This is called a cache miss, and the canonical flow is:
-User request hits the edge server
-Edge checks cache → miss
-Edge forwards request to the origin server (or parent/regional cache)
-Origin returns the object
-Edge stores the object (respecting TTL/cache headers)
-Edge serves the response to the user

Key words interviewers love hearing:
origin server
cache miss
TTL / cache headers
parent cache (optional but impressive)
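
The miss flow above can be sketched as a small in-memory edge cache. This is an illustration under assumptions, not any CDN's implementation: `EdgeCache`, `fetch_from_origin`, and the fixed per-cache TTL are hypothetical, and a real edge would derive freshness from the origin's cache headers.

```python
import time


class EdgeCache:
    """Toy edge cache: serve from memory on a hit, fetch upstream on a miss."""

    def __init__(self, fetch_from_origin, ttl_seconds=60.0, clock=time.monotonic):
        self._fetch = fetch_from_origin  # callable: key -> body (origin or parent cache)
        self._ttl = ttl_seconds
        self._clock = clock
        self._store = {}  # key -> (body, expires_at)

    def get(self, key):
        """Return (body, status) where status is 'HIT' or 'MISS'."""
        now = self._clock()
        entry = self._store.get(key)
        if entry is not None and entry[1] > now:
            return entry[0], "HIT"
        # Miss or expired entry: go upstream, store the result, then serve it.
        body = self._fetch(key)
        self._store[key] = (body, now + self._ttl)
        return body, "MISS"
```

The first `get` for a key reports a miss and calls the origin; repeat calls within the TTL are hits and never touch the origin.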

4
Q

Why is cache invalidation considered one of the hardest problems in CDNs? Name two common strategies to handle it and their tradeoffs.

A

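Invalidation is hard because copies of an object live on many edges at once, so a change at the origin cannot reach all of them instantly without paying in freshness, latency, or origin load. Two common strategies, with tradeoffs: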
TTL-based expiration
-CDN treats content as expired after a fixed time and refetches it on the next request
-✅ Simple, scalable
-❌ Can serve stale data until expiry, or cause extra origin load if TTL is too short

Revalidation / origin check
-CDN checks with the origin using a conditional request (e.g., If-None-Match with an ETag, or If-Modified-Since with Last-Modified)
-✅ Fresher content
-❌ Adds latency and increases origin traffic
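
A minimal sketch of ETag-based revalidation, assuming the origin derives the ETag from a hash of the content. The function names are hypothetical, and the comparison is simplified: real HTTP allows several ETags in If-None-Match and distinguishes weak from strong validators.

```python
import hashlib


def make_etag(body: bytes) -> str:
    """Content-derived ETag (quoted, as HTTP requires)."""
    return '"' + hashlib.sha256(body).hexdigest()[:16] + '"'


def origin_respond(body: bytes, if_none_match=None):
    """Return (status, headers, body) for a conditional GET."""
    etag = make_etag(body)
    if if_none_match == etag:
        return 304, {"ETag": etag}, b""  # unchanged: no body is sent
    return 200, {"ETag": etag}, body


def revalidate(cached_body: bytes, cached_etag: str, current_origin_body: bytes):
    """Edge-side revalidation: keep the cached copy on 304, replace it on 200."""
    status, headers, body = origin_respond(current_origin_body, cached_etag)
    if status == 304:
        return cached_body, cached_etag, "revalidated"
    return body, headers["ETag"], "refreshed"
```

A 304 still costs a round trip to the origin, but it avoids resending the body, which is where the savings come from for large objects.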

Other strategies you could optionally mention:
Explicit cache purge / invalidation APIs (fast but operationally risky)
Versioned URLs (best for static assets)
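
Versioned URLs sidestep invalidation: when the content changes, the URL changes, so old cached copies are simply never requested again. A rough sketch, assuming a build step that fingerprints each asset; `versioned_url` and the 8-character digest length are illustrative choices.

```python
import hashlib
from pathlib import PurePosixPath


def versioned_url(path: str, content: bytes, digest_len: int = 8) -> str:
    """Insert a short content hash before the file extension."""
    digest = hashlib.sha256(content).hexdigest()[:digest_len]
    p = PurePosixPath(path)
    # "/static/app.js" becomes "/static/app.<digest>.js"
    return str(p.with_name(f"{p.stem}.{digest}{p.suffix}"))
```

Because the name is tied to the content, such assets can be cached with a very long TTL; only the HTML that references them needs a short one.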

5
Q

Why do CDNs dramatically improve latency, even when the cached object is relatively small?

A

Primary reasons CDNs reduce latency
1) Reduced physical distance
Edge servers are closer → fewer network hops → lower RTT
2) Faster TCP/TLS setup
Handshakes are expensive; shorter RTT = faster connection setup
3) Network path optimization
CDN edges are typically well-peered and connected over high-quality backbone links, so traffic spends less time on congested public internet paths
4) Parallel fetching (secondary benefit)
Browsers fetch many assets concurrently from nearby edges
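
A back-of-the-envelope calculation shows why RTT dominates for small objects. The sketch assumes a fresh connection with a 1-RTT TCP handshake, a 1-RTT TLS 1.3 handshake (TLS 1.2 needs 2), and one round trip for the request and response. It ignores DNS, server processing time, and transfer time, and the 150 ms and 20 ms RTT values are made-up examples.

```python
def time_to_first_byte_ms(rtt_ms: float, tls_round_trips: int = 1) -> float:
    """Rough time to first byte on a fresh connection, counting round trips only."""
    tcp_round_trips = 1      # SYN / SYN-ACK
    request_round_trips = 1  # HTTP request out, first response byte back
    return rtt_ms * (tcp_round_trips + tls_round_trips + request_round_trips)


far_origin_ms = time_to_first_byte_ms(150)  # distant origin: 3 round trips at 150 ms
near_edge_ms = time_to_first_byte_ms(20)    # nearby edge: 3 round trips at 20 ms
```

Under these assumptions the distant origin takes 450 ms before the first byte arrives and the nearby edge takes 60 ms, regardless of how small the object is.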

6
Q

CDNs often favor availability over strong consistency. Why is this tradeoff acceptable, and in what situations might it not be?

A

CDNs prioritize availability because serving slightly stale content is usually acceptable for static assets and dramatically improves user experience and reliability. In distributed systems with network partitions, enforcing strong consistency would require blocking or failing requests, which is worse for most web workloads.

When this tradeoff is not acceptable:
-Authentication / authorization data
-Financial or transactional data
-Personalized or real-time content (shopping carts, stock prices)
-Security-sensitive content (revoked access, signed URLs)

In those cases, content is often:
-Not cached
-Cached with very short TTLs
-Served directly from origin or via authenticated edge logic
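
These choices are usually expressed through the Cache-Control response header. A minimal sketch mapping content categories to header values: the category names and the `cache_control_for` helper are hypothetical and the TTLs are examples, while the directives themselves are standard HTTP.

```python
def cache_control_for(kind: str) -> str:
    """Pick a Cache-Control value for a (hypothetical) content category."""
    policies = {
        # Fingerprinted assets never change under the same URL: cache for a year.
        "versioned_static": "public, max-age=31536000, immutable",
        # Tolerates brief staleness: cache for a short time only.
        "short_lived": "public, max-age=30",
        # Auth, financial, or per-user data: never store in shared caches.
        "sensitive": "private, no-store",
    }
    # Unknown categories default to the safest option.
    return policies.get(kind, "no-store")
```

Defaulting to `no-store` for unknown categories errs on the side of correctness: a missed caching opportunity is cheaper than serving one user's data to another.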

7
Q

CDNs are traditionally used for static assets. How can modern CDNs support dynamic or personalized content without breaking correctness?

Give 2–3 techniques.

A

CDNs increasingly act as programmable edge platforms, not just static file caches.

Key Techniques
1) Edge compute (serverless at the edge)
Run lightweight logic close to the user
Examples: auth checks, header rewriting, A/B tests
Keeps dynamic logic off the origin

2) Partial caching
Cache the static portions of a page
Fetch personalized data separately (e.g., user profile, recommendations)

3) Cache key customization
Cache based on headers, cookies, device type, locale
Avoids serving the wrong version to the wrong user

4) Short TTL + revalidation
Cache dynamic responses briefly
Balance freshness with latency

5) Edge-side includes (ESI)
Assemble pages at the edge from cached + dynamic fragments
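
Cache key customization (technique 3) can be sketched as a function that normalizes the URL and adds selected request headers to the key. Everything here is an assumption for illustration: the `cache_key` helper, the choice to vary on Accept-Language, and the list of ignored tracking parameters.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit


def cache_key(url, headers,
              vary_on=("accept-language",),
              ignored_params=("utm_source", "utm_medium", "utm_campaign")):
    """Build a normalized cache key from the URL plus selected request headers."""
    parts = urlsplit(url)
    # Drop parameters that do not affect the response, then sort for stability.
    params = sorted(
        (k, v)
        for k, v in parse_qsl(parts.query, keep_blank_values=True)
        if k not in ignored_params
    )
    # Header names are case-insensitive, so compare them in lowercase.
    lowered = {k.lower(): v for k, v in headers.items()}
    varied = "|".join(f"{h}={lowered.get(h, '').strip().lower()}" for h in vary_on)
    return f"{parts.netloc.lower()}{parts.path}?{urlencode(params)}#{varied}"
```

Two requests that differ only in parameter order or tracking parameters share one cache entry, while requests with different languages get separate entries. Adding too many dimensions to the key (for example a full cookie header) fragments the cache and lowers the hit ratio.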

8
Q

Name three security benefits provided by CDNs and explain each briefly.

A

DDoS protection
-CDNs absorb and distribute massive traffic spikes
-Shields the origin server from volumetric attacks

Attack surface reduction
-Origin servers are hidden behind the CDN
-Attackers can’t easily discover or directly target the origin IP

Web Application Firewall (WAF)
-CDNs inspect traffic at the edge
-Block common attacks (SQL injection, XSS, bots) before they reach the backend

Optional bonus points:
-TLS termination & certificate management
-Bot mitigation and rate limiting
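
Rate limiting at the edge is often explained with a token bucket. The sketch below is one common way to implement the idea, not a description of any particular CDN's limiter; the `TokenBucket` class and its parameters are hypothetical, and a real edge would keep one bucket per client IP or API key.

```python
import time


class TokenBucket:
    """Allow bursts up to `capacity`, refilling at `rate_per_sec` tokens per second."""

    def __init__(self, rate_per_sec, capacity, clock=time.monotonic):
        self.rate = rate_per_sec
        self.capacity = capacity
        self.tokens = float(capacity)  # start with a full bucket
        self._clock = clock
        self._last = clock()

    def allow(self) -> bool:
        """Consume one token if available; return False to reject the request."""
        now = self._clock()
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self._last) * self.rate)
        self._last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Requests rejected here never reach the origin, which is the same shielding effect that the DDoS and WAF points describe.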

9
Q

What are the main costs or downsides of using a CDN, and when might you avoid one?

A

Main costs / downsides
-Monetary cost
CDNs charge for bandwidth, requests, and features
Can be expensive at high scale or for large assets

-Cache complexity
Cache invalidation is hard
Risk of serving stale or incorrect content

-Reduced control & observability
Debugging becomes harder
Behavior depends on CDN configuration and vendor quirks

-Not ideal for highly dynamic or write-heavy workloads
Low cache hit ratio
Benefits diminish when content changes constantly

When you might avoid a CDN
-Internal tools
-Low-traffic apps
-Highly personalized, real-time systems
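
The effect of a low cache hit ratio can be estimated with simple arithmetic: the origin still serves the miss fraction of all traffic. A small sketch, assuming the hit ratio is measured in bytes; `origin_traffic_gb` and the traffic figures in the note are made up.

```python
def origin_traffic_gb(total_gb: float, hit_ratio: float) -> float:
    """Traffic that still reaches the origin: the miss fraction of the total."""
    if not 0.0 <= hit_ratio <= 1.0:
        raise ValueError("hit_ratio must be between 0 and 1")
    return total_gb * (1.0 - hit_ratio)
```

With 1000 GB of monthly traffic, a 95% hit ratio leaves about 50 GB for the origin, while a 20% hit ratio leaves 800 GB. In the second case you pay the CDN for nearly all the traffic and still serve most of it yourself.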

10
Q
A