Why is sharding/data partitioning used?
Because sometimes the only viable option for an application, in terms of cost and scaling, is to add more servers rather than move to a more powerful server.
What are the two sharding methods? Quickly explain.
What is dictionary-based (directory-based) sharding?
It is a technique of extracting the sharding logic to a lookup service. This moves the complexity away from the app, which queries the lookup service to know where to store/get data from.
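A minimal sketch of the lookup-service idea, assuming an in-memory directory. The class name `ShardDirectory`, its methods, and the round-robin assignment rule are hypothetical and only illustrate how the application asks the directory instead of computing the shard itself.

```python
class ShardDirectory:
    """Hypothetical lookup service: maps each key to the shard that owns it."""

    def __init__(self, shards):
        self.shards = list(shards)
        self.assignments = {}  # key -> shard name

    def locate(self, key):
        # Unseen keys are assigned round-robin (an assumption for this sketch);
        # known keys always resolve to the shard already recorded for them.
        if key not in self.assignments:
            shard = self.shards[len(self.assignments) % len(self.shards)]
            self.assignments[key] = shard
        return self.assignments[key]

    def move(self, key, shard):
        # Rebalancing only changes the directory entry; the app code is untouched.
        self.assignments[key] = shard


directory = ShardDirectory(["shard-a", "shard-b"])
first = directory.locate("user:1")
second = directory.locate("user:2")
```

Because the mapping lives in one place, a key can be moved to another shard without changing any application logic.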
Cite three sharding/partition criteria.
Cite three challenges of sharding.
What does the locality of reference principle say?
Recently requested data is likely to be requested again.
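One common way to exploit this principle is a least-recently-used (LRU) cache, which keeps recent items and evicts the ones not touched for the longest time. The sketch below is illustrative only; `LRUCache` and its capacity are assumptions, not something the notes prescribe.

```python
from collections import OrderedDict


class LRUCache:
    """Keeps the most recently used items, up to a fixed capacity."""

    def __init__(self, capacity):
        self.capacity = capacity
        self.items = OrderedDict()

    def get(self, key):
        if key not in self.items:
            return None
        self.items.move_to_end(key)  # mark as most recently used
        return self.items[key]

    def put(self, key, value):
        self.items[key] = value
        self.items.move_to_end(key)
        if len(self.items) > self.capacity:
            self.items.popitem(last=False)  # evict the least recently used item


cache = LRUCache(capacity=2)
cache.put("a", 1)
cache.put("b", 2)
cache.get("a")     # "a" is now the most recent entry
cache.put("c", 3)  # capacity exceeded, so "b" is evicted
```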
What is a “distributed cache”?
The cache layer is composed of many nodes, each of which stores a piece of the overall cache in memory.
Usually, consistent hashing is used to determine which node to query for the data.
What are the three main schemes for cache invalidation during write?
What are indexes used for?
To improve the performance of read operations.
What is the drawback of using indexes?
Write operations become slower, because every write must also update the index.
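The trade-off can be sketched with a toy in-memory table. `UserTable` and its field names are hypothetical; a real database index (for example a B-tree) is more involved, but the shape of the cost is the same: reads skip the scan, writes do extra work.

```python
class UserTable:
    """Toy table with an index on the email column."""

    def __init__(self):
        self.rows = []
        self.by_email = {}  # index: email -> position in self.rows

    def insert(self, name, email):
        self.rows.append({"name": name, "email": email})
        # The extra write that every insert pays to keep the index current.
        self.by_email[email] = len(self.rows) - 1

    def find_scan(self, email):
        # Without an index: inspect rows one by one.
        for row in self.rows:
            if row["email"] == email:
                return row
        return None

    def find_indexed(self, email):
        # With an index: jump straight to the row.
        pos = self.by_email.get(email)
        return None if pos is None else self.rows[pos]


table = UserTable()
table.insert("Ada", "ada@example.com")
table.insert("Bob", "bob@example.com")
```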
What is a proxy server?
A proxy server is an intermediary piece of hardware/software that sits between the client and the back-end server.
Give 3 uses for a proxy server.
Request logging
Request filtering
Batch several requests into one
What are queues used for?
To enable asynchronous communication between systems.
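As a rough single-process illustration, the producer below hands work to a queue and moves on, while a worker thread processes it independently. A real system would typically use a message broker between separate services; the doubling "work" and the `None` stop sentinel are assumptions made for the sketch.

```python
import queue
import threading

tasks = queue.Queue()
results = []


def worker():
    while True:
        item = tasks.get()
        if item is None:  # sentinel value: no more work
            tasks.task_done()
            break
        results.append(item * 2)  # stand-in for slow processing
        tasks.task_done()


thread = threading.Thread(target=worker)
thread.start()

for n in [1, 2, 3]:
    tasks.put(n)  # the producer does not wait for processing to finish

tasks.put(None)
tasks.join()   # block until every queued item has been handled
thread.join()
```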
What is redundancy?
Redundancy means duplicating critical data or services with the intention of increasing the reliability of the system.
What does the CAP theorem state?
The CAP theorem states that it is impossible for a distributed software system to simultaneously provide more than two of the following three guarantees (CAP): Consistency, Availability, and Partition tolerance.
How is Consistency achieved in a distributed system?
Reads are not allowed until all nodes are updated.
How is Availability achieved in distributed systems?
Data is replicated across multiple servers.
What does partition tolerance mean?
It means that a system continues to work despite message loss or partial failure.
How is Partition Tolerance achieved in distributed systems?
Data is sufficiently replicated across combinations of nodes and networks to keep the system up through intermittent outages.
What is Consistent Hashing?
It is a technique for hashing that minimizes the impact of adding/removing buckets in the hash table.
How does Consistent Hashing work?
It works by assigning the buckets to positions in the hash value space (imagine a circle from 0 to N, with the buckets positioned on that circle).
When we compute hash(key), the result is a position on the circle. The bucket that will be used is the next bucket found by moving around the circle from that position.
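The circle described above can be sketched as a sorted list of bucket positions. This is a simplified illustration: `HashRing`, `ring_position`, the MD5-based hash, and the ring size of 2**32 are all assumptions, and it places each bucket at a single position (no virtual nodes) and ignores position collisions.

```python
import bisect
import hashlib


def ring_position(value, ring_size=2**32):
    # Map any string to a position on the circle [0, ring_size).
    digest = hashlib.md5(value.encode("utf-8")).hexdigest()
    return int(digest, 16) % ring_size


class HashRing:
    def __init__(self, buckets=()):
        self.positions = []  # sorted positions of buckets on the circle
        self.owners = {}     # position -> bucket name
        for bucket in buckets:
            self.add(bucket)

    def add(self, bucket):
        pos = ring_position(bucket)
        bisect.insort(self.positions, pos)
        self.owners[pos] = bucket

    def remove(self, bucket):
        pos = ring_position(bucket)
        self.positions.remove(pos)
        del self.owners[pos]

    def lookup(self, key):
        # First bucket at or after the key's position, wrapping past the end.
        pos = ring_position(key)
        idx = bisect.bisect_left(self.positions, pos) % len(self.positions)
        return self.owners[self.positions[idx]]


ring = HashRing(["cache-1", "cache-2", "cache-3"])
owner = ring.lookup("user:42")
```

When a bucket is added, the only keys that change owner are the ones that now land on the new bucket; every other key stays where it was.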
What is the disadvantage of Ajax polling?
The client has to poll the server at a fixed rate, and many of the responses will be empty, creating unnecessary HTTP overhead.
What is WebSocket?
It is a communications protocol that supports full-duplex conversation between the browser and the web server.
What are Server Sent Events (SSE)?
It is a technology over HTTP that is used to maintain a long-running connection to the server that allows the server to send a stream of messages back to the client.
It does not allow the client to send messages (simplex).
The server responds with the MIME type “text/event-stream”.
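For illustration, each message in an event stream is a block of `field: value` lines terminated by a blank line. The helper below, `format_sse`, is a hypothetical name that only builds that text; it does not open a connection or set response headers.

```python
def format_sse(data, event=None, event_id=None):
    """Build one event-stream message as text."""
    lines = []
    if event_id is not None:
        lines.append(f"id: {event_id}")
    if event is not None:
        lines.append(f"event: {event}")
    # Multi-line payloads are sent as several consecutive data lines.
    for part in str(data).split("\n"):
        lines.append(f"data: {part}")
    # A blank line marks the end of the message.
    return "\n".join(lines) + "\n\n"


message = format_sse("price updated", event="tick", event_id=7)
```

A server would write such blocks to the open response one after another, and the browser would dispatch each as it arrives.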