How should you track API calls for a rate limiter?
Where can you add caching for an API rate limiter?
Use a write-back cache: update all counters and timestamps in the cache only, and write them to permanent storage later
Least Recently Used can be a reasonable cache eviction policy for this system
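The two answers above can be sketched together as follows. This is a minimal illustration, not a production design: the class name `WriteBackCounterCache`, the `record_call` and `flush` methods, and the plain dict standing in for permanent storage are all assumptions made for the example.

```python
from collections import OrderedDict


class WriteBackCounterCache:
    """LRU cache of rate-limit counters that defers writes to the backing store."""

    def __init__(self, capacity, store):
        self.capacity = capacity
        self.store = store            # hypothetical stand-in for permanent storage
        self.entries = OrderedDict()  # key -> (count, last_seen), most recent last
        self.dirty = set()            # keys changed in cache but not yet persisted

    def record_call(self, key, now):
        """Count one API call for `key`, touching only the cache."""
        if key in self.entries:
            count, _ = self.entries[key]
        else:
            # Cache miss: load the last persisted counter, if any.
            count, _ = self.store.get(key, (0, None))
        self.entries[key] = (count + 1, now)
        self.entries.move_to_end(key)  # mark as most recently used
        self.dirty.add(key)
        if len(self.entries) > self.capacity:
            self._evict_lru()
        return count + 1

    def _evict_lru(self):
        """Drop the least recently used entry, persisting it first if dirty."""
        old_key, value = self.entries.popitem(last=False)
        if old_key in self.dirty:
            self.store[old_key] = value
            self.dirty.discard(old_key)

    def flush(self):
        """Write every dirty entry back to storage, e.g. on a timer."""
        for key in self.dirty:
            self.store[key] = self.entries[key]
        self.dirty.clear()
```

The trade-off of writing back lazily is that counters updated since the last flush are lost if the cache node fails, which is usually acceptable for rate limiting.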
Should you rate limit by IP or User ID?
Hybrid: combine both, since each has a weakness on its own
For authenticated endpoints, rate limiting by IP is not desirable because multiple users can share a single public IP (as in an internet cafe), so one heavy user could throttle everyone behind it
However, if we rate limit only by user ID, a malicious actor could exhaust the login API's limit for a user by repeatedly entering wrong credentials, locking out the legitimate user
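One way to read the hybrid approach is to key unauthenticated requests (such as login attempts) by IP and authenticated requests by user ID. The sketch below assumes that split, and it also assumes a simple fixed-window counter; `FixedWindowLimiter` and `HybridLimiter` are hypothetical names, and the notes do not prescribe this particular algorithm.

```python
from collections import defaultdict


class FixedWindowLimiter:
    """Allow at most `limit` calls per key in each fixed time window."""

    def __init__(self, limit, window_seconds):
        self.limit = limit
        self.window = window_seconds
        # (key, window index) -> calls seen; old windows are never purged
        # here, which a real limiter would need to handle.
        self.counts = defaultdict(int)

    def allow(self, key, now):
        bucket = (key, int(now // self.window))
        if self.counts[bucket] >= self.limit:
            return False
        self.counts[bucket] += 1
        return True


class HybridLimiter:
    """Limit by IP when the caller is unknown, by user ID once authenticated."""

    def __init__(self, per_ip, per_user):
        self.per_ip = per_ip
        self.per_user = per_user

    def allow(self, ip, user_id, now):
        if user_id is None:
            # Unauthenticated (e.g. login): failed attempts use up the
            # caller's IP budget, not the targeted account's budget.
            return self.per_ip.allow(ip, now)
        # Authenticated: users sharing one public IP get separate budgets.
        return self.per_user.allow(user_id, now)
```

With this split, an attacker hammering the login endpoint is throttled on their own IP, while users behind a shared IP do not throttle one another.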
How would you implement a search index for Twitter search?
Do you need to add caching to Twitter Search, and if so why and how?
The search index tells you which rows are relevant, but you will still need to go fetch the full records of those rows afterwards
To deal with hot tweets, we can introduce a cache (such as Memcached) in front of our database
Before hitting the backend database, application servers can quickly check whether the cache already has that tweet
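The check-the-cache-first flow described above can be sketched as follows. Plain dicts stand in for both Memcached and the database, and `TweetStore`, `get_tweet`, `fetch_results`, and `db_reads` are hypothetical names used only for illustration.

```python
class TweetStore:
    """Fetch full tweet records, checking a cache before the database."""

    def __init__(self, database, cache=None):
        self.database = database                     # stand-in for the backend database
        self.cache = {} if cache is None else cache  # stand-in for Memcached
        self.db_reads = 0                            # counts database lookups, for illustration

    def get_tweet(self, tweet_id):
        tweet = self.cache.get(tweet_id)
        if tweet is not None:
            return tweet                             # cache hit: no database access
        self.db_reads += 1
        tweet = self.database.get(tweet_id)
        if tweet is not None:
            self.cache[tweet_id] = tweet             # populate the cache for later requests
        return tweet

    def fetch_results(self, tweet_ids):
        """Resolve IDs returned by the search index into full records."""
        tweets = (self.get_tweet(tweet_id) for tweet_id in tweet_ids)
        return [tweet for tweet in tweets if tweet is not None]
```

A real deployment would also bound the cache size and choose an eviction policy; this sketch omits both.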
Where would you add load balancing in Twitter Search?
- Between application servers and database servers