Design Distributed Message Queue
Non-functional requirements
- High throughput or low latency, configurable based on the use case
- Scalable, system should be distributed in nature
- Persistent and durable
Apache Kafka uses topics and partitions for efficient data processing. Topics are data categories to which records are published; consumers subscribe to these topics. For scalability and performance, topics are divided into partitions, allowing parallel data processing and fault tolerance. Partitions enable multiple consumers to read concurrently and are replicated across nodes for resilience against failures
Each partition is an ordered, immutable sequence of records, where order is maintained only within the partition, not across the entire topic. This partitioning mechanism enables Kafka to handle a high volume of data efficiently, as multiple producers can write to different partitions simultaneously, and multiple consumers can read from different partitions in parallel
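A minimal sketch of how a producer maps a keyed record to a partition: hash the key and take it modulo the partition count. Kafka's actual default partitioner uses murmur2 (and spreads keyless records differently); the hash and partition count below are only illustrative.

```python
# Sketch: mapping a record key to a partition (conceptual, not Kafka's exact
# murmur2-based partitioner). Records with the same key always land in the
# same partition, which is what preserves per-key ordering.
import hashlib

NUM_PARTITIONS = 8  # hypothetical partition count for the topic

def choose_partition(key: bytes, num_partitions: int = NUM_PARTITIONS) -> int:
    digest = hashlib.md5(key).digest()
    return int.from_bytes(digest[:4], "big") % num_partitions

print(choose_partition(b"user-42"))   # same key -> same partition every time
print(choose_partition(b"user-1337"))
```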
At-least-once delivery (Kafka default)
Producer sends a message synchronously or asynchronously with a response callback, setting acks=1 or acks=all, to make sure messages are delivered to the broker. If the message delivery fails or times out, the producer keeps retrying.
Consumer fetches the message and commits the offset only after the data is successfully processed. If the consumer fails to process the message, it will re-consume the message so there won’t be data loss. On the other hand, if a consumer processes the message but fails to commit the offset to the broker, the message will be re-consumed when the consumer restarts, resulting in duplicates.
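A sketch of that at-least-once consumer pattern using the kafka-python client; the broker address, topic, group id, and process() function are assumptions for illustration.

```python
# Sketch of the at-least-once pattern with the kafka-python client.
# Assumes a broker at localhost:9092 and a hypothetical process() function.
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "orders",                      # hypothetical topic name
    bootstrap_servers="localhost:9092",
    group_id="order-workers",
    enable_auto_commit=False,      # commit manually, only after processing
)

def process(record):
    ...  # should be idempotent: a crash after processing but before the commit
         # means this record will be delivered and processed again

for record in consumer:
    process(record)
    consumer.commit()              # commit the offset only after successful processing
```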
Design Unique ID Generator
ID will be made up of:
- Sign bit: 1 bit, reserved for future use cases
- Timestamp: 41 bits, milliseconds since the Unix epoch or a custom epoch. 2 ^ 41 = 2,199,023,255,552 ms, which gives us ~69 years (2,199,023,255,552 ms / 1000 / 3600 / 24 / 365 ≈ 69.7 years)
- Data center ID: 5 bits, 2 ^ 5 = 32 data centers
- Machine ID: 5 bits, 2 ^ 5 = 32 machines / data center
- Sequence number: 12 bits. For each machine/process, the sequence number is incremented by 1. The number is reset every millisecond. 2 ^ 12 = 4096 possible values every millisecond
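A sketch of packing these fields into a 64-bit ID with bit shifts. The bit widths come from the list above; the custom epoch value is an assumption (any fixed past timestamp works).

```python
# Sketch: 41-bit timestamp | 5-bit datacenter | 5-bit machine | 12-bit sequence.
# The custom epoch is an assumed constant; the sign bit stays 0 for ~69 years.
import time

CUSTOM_EPOCH_MS = 1_288_834_974_657  # assumed custom epoch; any fixed past time works

def make_id(datacenter_id: int, machine_id: int, sequence: int) -> int:
    timestamp = int(time.time() * 1000) - CUSTOM_EPOCH_MS
    return (timestamp << 22) | (datacenter_id << 17) | (machine_id << 12) | sequence

print(make_id(datacenter_id=3, machine_id=7, sequence=0))
```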
Future discussions
- Clock synchronization. We're assuming each ID generation server has the same clock. This is not necessarily true across multiple machines. Consider Network Time Protocol (NTP) as a solution
Design Search Auto Complete
Non-Functional Requirements
- Fast response time
- Relevant results, sorted by popularity/frequency
- High availability
High Level Design
- Data gathering service: aggregates queries and asynchronously processes their historical frequency
- Real-time query service: given a prefix, return the top k results
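A minimal sketch of the real-time query path, assuming the data gathering service has already materialized a prefix-to-top-k mapping; the structure and names below are hypothetical.

```python
# Minimal sketch of the query service, assuming the ranking happened offline
# in the data gathering pipeline and was materialized as prefix -> top-k lists.
TOP_K_BY_PREFIX = {
    "be": ["best buy", "bee", "beer"],
    "bes": ["best buy", "best western"],
}

def autocomplete(prefix: str, k: int = 5) -> list[str]:
    # O(1) lookup at query time; no ranking work on the hot path
    return TOP_K_BY_PREFIX.get(prefix.lower(), [])[:k]

print(autocomplete("bes"))
```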
Design URL Shortener
Two primary components:
(1) GET /api/v1/<short_url>
return 301 with the long URL in the Location header
301 = permanent redirect; browsers cache it, so later requests skip our server
302 = temporary redirect; every request still hits our server, which is better for analytics
(2) POST /api/v1/url
longURL -> hash function -> hash value (short url)
The hash value is drawn from 62 characters [0-9, a-z, A-Z], so how long does it need to be?
62 ^ 1 = 62 possible values
62 ^ 2 = 3,844
62 ^ 7 = 3.5 trillion
We can use a well-known hash function (e.g., MD5, SHA-1), but we'll need to take the first 7 chars and perform a DB lookup to detect and resolve collisions
2nd approach: use base conversion to represent the same number in a different base. This needs a unique ID generator, but an auto-incrementing DB column can serve as one
base62(11157) -> 2TX
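A sketch of that base-62 conversion; with the 0-9a-zA-Z alphabet it reproduces the example above.

```python
# Sketch of base-62 conversion with the 0-9a-zA-Z alphabet, so base62(11157) == "2TX".
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def base62(n: int) -> str:
    if n == 0:
        return ALPHABET[0]
    digits = []
    while n > 0:
        n, rem = divmod(n, 62)
        digits.append(ALPHABET[rem])
    return "".join(reversed(digits))

print(base62(11157))  # -> "2TX"
```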
Design a Rate Limiter
Rate Limiter: limits/throttles the number of client requests allowed to be sent over a time period
Non functional requirements
- Low latency
- Distributed
HTTP 429: Too Many Requests
Use response headers to communicate rate limit attributes to the client (e.g., remaining, limit, retry-after)
Types of rate limiting rules
- Number of requests / second
Algorithms for rate limiting:
- Token bucket. Simple, well understood, used by Stripe and Amazon. A bucket has a pre-defined capacity and tokens are replenished periodically (say once per second or once per minute, whatever the time period is). Each request consumes one token, so if there aren't enough tokens we drop the request (see the sketch after this list)
- How many buckets do we need? Examples include a bucket per IP address, a global bucket across all requests, or a bucket per endpoint (e.g., POST vs GET)
- Leaking bucket: requests enter a FIFO queue and are processed at a fixed rate; when the queue is full, new requests are dropped
- Fixed window counter: divide the timeline into fixed-size windows and keep a counter per window; requests over the limit within a window are dropped
- Sliding window
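A minimal in-process token bucket sketch (the first algorithm in the list). The capacity and refill rate are assumptions, and a distributed limiter would keep this state in Redis rather than in local memory.

```python
# Minimal token bucket sketch; capacity and refill rate are assumed values.
import time

class TokenBucket:
    def __init__(self, capacity: int = 10, refill_rate: float = 1.0):
        self.capacity = capacity          # max tokens the bucket can hold
        self.refill_rate = refill_rate    # tokens added per second
        self.tokens = float(capacity)
        self.last_refill = time.monotonic()

    def allow(self) -> bool:
        now = time.monotonic()
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last_refill) * self.refill_rate)
        self.last_refill = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False                      # not enough tokens: drop or 429 the request

bucket = TokenBucket()
print([bucket.allow() for _ in range(12)])  # first 10 pass, then throttled
```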
Counters should be stored in a distributed Redis cluster. Why Redis?
- Fast, in memory
- Supports time-based expiration
- Includes atomic INCR and EXPIRE
Why not a DB? Disk-based random access is too slow
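A sketch of a fixed-window counter on Redis using the atomic INCR and EXPIRE mentioned above (redis-py client; the limit, window size, and key scheme are assumptions).

```python
# Sketch: fixed-window rate limiting with Redis INCR + EXPIRE (redis-py client).
# In production you'd wrap the two calls in a Lua script to make them fully atomic.
import time
import redis

r = redis.Redis(host="localhost", port=6379)
LIMIT = 100          # assumed max requests per window
WINDOW_SECONDS = 60  # assumed window size

def is_allowed(client_id: str) -> bool:
    key = f"rate:{client_id}:{int(time.time()) // WINDOW_SECONDS}"
    count = r.incr(key)                 # atomic increment of this window's counter
    if count == 1:
        r.expire(key, WINDOW_SECONDS)   # let the counter expire with its window
    return count <= LIMIT
```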
Design a news feed
Understand scope of problem
Two primary workflows
1) Publishing to the feed
2) Building the news feed for a user and reading it
APIs
1) POST /api/v1/me/feed {content, auth}
2) GET /api/v1/me/feed
1) Fanout on write. News feed is pre-computed at write time
Pros: Fetching news feed is fast
Cons: For inactive users, producing the feed wastes resources. Also the hot-spot problem, in which servers are taxed fanning out posts from popular individuals with many followers
2) Fanout on read. Feed is generated at read time
Pros: Limits compute on servers, doesn’t waste resources for inactive users
Cons: Fetching the news feed is slow because it’s not pre-computed
A solution? Take a hybrid approach. Pre-compute the feed for most users, but for celebrities with many followers let their content be pulled on demand at read time
1) Fetch friend_ids (can use a graph db)
2) Send friends list and new post_id to kafka
3) Workers fetch the data from Kafka and append <user_id, post_id> rows to the feeds table. This feeds table serves as the news feed cache; we store only post_id so that we're not storing expensive content (e.g., images)
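A sketch of step 3, the fanout-on-write worker. A plain dict stands in for the feeds table/cache, and the per-user cap is an assumption; in the real system the message would come off Kafka and the rows would go into the cache.

```python
# Sketch of the fanout worker: append the new post_id to each friend's cached feed.
from collections import defaultdict

feeds_cache: dict[int, list[int]] = defaultdict(list)  # news feed cache: post IDs only

def fanout_on_write(friend_ids: list[int], post_id: int) -> None:
    for friend_id in friend_ids:
        feeds_cache[friend_id].insert(0, post_id)   # newest first
        del feeds_cache[friend_id][1000:]           # assumed cap per cached feed

fanout_on_write(friend_ids=[101, 102, 103], post_id=555)
print(feeds_cache[101])  # [555]
```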
Design a KV store
There is no perfect design; every design trades off among the following
- Read, write, and memory usage
- Consistency vs availability
We’ll design a system that can:
- Store big data
- Key/value pair is small: less than 10 KB
- High availability: Responds quickly, even during failures
- High scalability: Can scale to support larger data sets
Single server:
- Store everything in memory and use a hash table
- Optimizations that can be added:
1) Data compression
2) Only hold frequently used data, spill rest to disk
Distributed KV store
Data Partition
- Infeasible to fit entire data set on single server
- Partition data across multiple servers evenly but must try to minimize data movement when nodes are added/removed (consistent hashing, servers and keys use same hash function on ring)
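A minimal consistent hashing sketch: servers and keys share one hash function and sit on the same ring, and a key is owned by the first server clockwise from its position. Virtual nodes, which smooth out the distribution, are omitted here.

```python
# Sketch of a consistent hash ring: adding/removing a node only remaps the keys
# between that node and its neighbor, minimizing data movement.
import bisect
import hashlib

def ring_hash(value: str) -> int:
    return int(hashlib.md5(value.encode()).hexdigest(), 16)

servers = ["node-a", "node-b", "node-c"]
ring = sorted((ring_hash(s), s) for s in servers)
positions = [pos for pos, _ in ring]

def server_for(key: str) -> str:
    idx = bisect.bisect(positions, ring_hash(key)) % len(ring)  # first server clockwise
    return ring[idx][1]

print(server_for("user:42"))
```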
Data Replication
- Replicate data asynchronously over N servers where N is configurable
- Since data is replicated across multiple nodes, it must be kept in sync across replicas. A write quorum of size W means that for a write operation to be considered successful, it must be acknowledged by W replicas (a read quorum R is defined the same way for reads)
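A sketch of the W-of-N write path; the replica call below is a local stand-in for what would be an RPC to each replica that owns the key.

```python
# Sketch of a write quorum: the write succeeds only if at least W of N replicas ack it.
N = 3  # assumed replicas per key
W = 2  # assumed acks required for a successful write

def send_to_replica(replica: int, key: str, value: str) -> bool:
    return True  # stand-in: pretend the replica acknowledged the write

def quorum_write(key: str, value: str) -> bool:
    acks = sum(send_to_replica(r, key, value) for r in range(N))
    return acks >= W

print(quorum_write("user:42", "alice"))
```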
Failure detection
- Each node maintains a membership list
- Each node periodically sends heartbeats to random nodes, which in turn propagate to others
- If a node's heartbeat has not been received within a predefined period, that node is considered offline
Design Google Maps
What features to focus on?
- Location updates, navigation ETA, and map rendering
How large is the road data and do we have access to it?
- Yes. Several TBs
Consider traffic conditions for accurate time estimates
Consider different travel modes such as driving, walking, bus
Map Projection:
- The process of translating the points from a 3D globe to a 2D plane
- Many different projections exist, each with strengths and limitations; all of them distort the actual geometry in some way
Geohashing
- An encoding system that encodes a geographic area into a short string of letters and digits
- Recursively divides the grid into subgrid squares
- 0th division: a single square that represents the entire earth
- 1st division: the square is cut into 4 quadrants, labeled:
01 11
00 10
- 2nd division: cut each square into 4 again -> 16 squares now
- Geohashing can be used for map tiling
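A sketch of that recursive subdivision: at each level the current cell is split into four quadrants labeled with a longitude bit followed by a latitude bit, matching the 01 11 / 00 10 diagram above (real geohash then interleaves the bits and base32-encodes them).

```python
# Sketch of quadrant subdivision: each level appends "<lng_bit><lat_bit>" for the
# quadrant containing the point, so a longer string means a smaller cell.
def geohash_bits(lat: float, lng: float, levels: int = 6) -> str:
    lat_lo, lat_hi = -90.0, 90.0
    lng_lo, lng_hi = -180.0, 180.0
    out = []
    for _ in range(levels):
        lng_mid = (lng_lo + lng_hi) / 2
        lat_mid = (lat_lo + lat_hi) / 2
        lng_bit = 1 if lng >= lng_mid else 0
        lat_bit = 1 if lat >= lat_mid else 0
        out.append(f"{lng_bit}{lat_bit}")
        lng_lo, lng_hi = (lng_mid, lng_hi) if lng_bit else (lng_lo, lng_mid)
        lat_lo, lat_hi = (lat_mid, lat_hi) if lat_bit else (lat_lo, lat_mid)
    return "".join(out)

print(geohash_bits(37.7749, -122.4194))  # San Francisco
```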
Map rendering
- Foundational concept is tiling
- World is broken up into smaller tiles and client only downloads relevant tiles for the area, then stitches them together
- Distinct sets of tiles at different zoom levels
- Client only needs tiles for the zoom level of the map viewport
- Prevents consuming too much bandwidth
- For example, a single tile could be 256x256 pixel image
Road data processing for navigation algorithms
- Most routing algos use A* pathfinding
- They operate on a graph data structure
- Intersections are nodes, roads are edges
- Routing algos can use a concept similar to map tiles: partition the road graph into subgraphs
- Load subgraphs on demand
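A compact A* sketch on such a graph: intersections are nodes with coordinates, roads are weighted edges, and straight-line distance is the heuristic. The tiny graph here is made up purely for illustration.

```python
# Sketch of A* routing: nodes are intersections, edges are roads with costs,
# heuristic is straight-line distance (admissible, so the result is optimal).
import heapq
import math

coords = {"A": (0, 0), "B": (1, 0), "C": (1, 1), "D": (2, 1)}                    # intersections
edges = {"A": [("B", 1.0)], "B": [("C", 1.0), ("D", 2.5)], "C": [("D", 1.0)], "D": []}

def heuristic(u: str, v: str) -> float:
    (x1, y1), (x2, y2) = coords[u], coords[v]
    return math.hypot(x2 - x1, y2 - y1)

def a_star(start: str, goal: str):
    frontier = [(heuristic(start, goal), 0.0, start, [start])]
    best = {start: 0.0}
    while frontier:
        _, cost, node, path = heapq.heappop(frontier)
        if node == goal:
            return cost, path
        for nxt, weight in edges[node]:
            new_cost = cost + weight
            if new_cost < best.get(nxt, float("inf")):
                best[nxt] = new_cost
                heapq.heappush(frontier,
                               (new_cost + heuristic(nxt, goal), new_cost, nxt, path + [nxt]))
    return None

print(a_star("A", "D"))  # (3.0, ['A', 'B', 'C', 'D'])
```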
Back of the envelope
- What are the storage requirements for the entire collection of map tile images?
Zoom 0: 1 tile
Zoom 1: 4 tiles
Zoom 2: 16 tiles
….
Zoom 21: ~4.4 trillion tiles (4^21)
After roughly 20 zoom levels, you hit practicality limits. Adding more doesn’t really help the user
Assume each tile is a 256x256-pixel compressed PNG, which means an image size of about 100 KB
Total size is roughly 440 PB. Keep in mind ~90% of the earth's surface is uninhabited, therefore roughly 50 PB in practice
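The arithmetic behind that estimate (the deepest zoom level dominates the total):

```python
# Back-of-the-envelope tile storage: zoom 21 alone gives the headline number.
tiles_at_zoom_21 = 4 ** 21          # ~4.4 trillion tiles
tile_size_bytes = 100_000           # 256x256 compressed PNG, ~100 KB
total = tiles_at_zoom_21 * tile_size_bytes
print(round(total / 1e15))          # ~440 PB for all zoom-21 tiles
print(round(total * 0.1 / 1e15))    # ~44 PB after dropping the ~90% uninhabited area
```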
3 big features
1. Location service
2. Navigation service
3. Map Rendering
Geocoding database
- DB which stores places and their corresponding lat/lng pairs
Design Google Drive
Non-functional requirements
- Reliable. Data loss is unacceptable
- Fast sync speeds
- Minimizing bandwidth usage
- Highly available
Design YouTube
What are the important features?
- Video upload
- Watch a video
- Do we need to support international users? Yes
- Encryption required? Yes
- Max file size for video upload? 1GB
- Can we leverage some existing cloud services? Yes
High level design
- Videos are stored in a CDN (AWS CloudFront, Cloudflare). When you press play, the video is streamed from the CDN
- API Servers. Everything else. Feed recommendation, generating video upload URL, updating meta DB, user signup, etc
Transcoding:
- Video encoding. Translate a video into other formats and bitrates to provide the best possible playback for a given device and bandwidth
- Two components
1) Container. Holds the video content, and its format is reflected in the file extension, e.g., .avi, .mov, .mp4
2) Codecs. Compression and decompression algorithms. Reduce the video size while preserving quality, eg H.264, VP9, HEVC
File Upload Flow
- Videos are uploaded to S3
- Transcoding servers fetch the video and transcode it into the best possible formats given device and bandwidth constraints
- Once transcoding complete, send to CDN
- Completion handler updates the metadata DB
Video Streaming Flow
- We’re streaming the video from the CDN, not downloading to local device
- We will use a standard streaming protocol (eg MPEG-DASH, Apple HLS, etc)
- We will serve the video from the edge which is closest to the user
Safety Optimization: Presigned upload URLs
1. Client makes an HTTP POST /upload to API servers to receive a pre-signed URL
2. Client uploads the video using the pre-signed URL to S3
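A sketch of step 1 on the API server side, assuming S3 with boto3; the bucket name, key scheme, and expiry are assumptions, and AWS credentials are assumed to be configured.

```python
# Sketch: API server generates a pre-signed S3 PUT URL that the client uploads to directly.
import boto3

s3 = boto3.client("s3")

def create_upload_url(video_id: str) -> str:
    return s3.generate_presigned_url(
        "put_object",
        Params={"Bucket": "raw-videos", "Key": f"uploads/{video_id}.mp4"},  # assumed names
        ExpiresIn=3600,  # URL valid for one hour
    )
```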