Describe a stateless server for a distributed service.
A stateless server stores no information about its clients, so it doesn’t know:
- which files are open on which clients
- which clients have cached which files
- which clients are reading and which are writing
Every request must therefore be self-contained.
Describe a stateful server for a distributed service design.
A stateful server stores information about its clients: which files are being accessed and with which types of accesses, which clients have cached which files, and which clients have read or written a file.
What are the tradeoffs of a stateful server?
Pros:
- Requests can be shorter, since the server already knows the context (e.g., open files and file positions).
- Better performance: the server can do read-ahead and can manage client caches to keep them consistent.
Cons:
- Server state is lost on a crash, so recovery is complicated.
- Per-client tables consume server memory, limiting scalability.
- A crashed client leaves orphaned state behind on the server.
What are the tradeoffs of a stateless server?
PROS
- Fault tolerant: the server can crash and reboot with no state to recover; clients simply retry.
- Scalable: no per-client tables grow on the server.
CONS
- Every request must be self-contained, so messages are longer.
- The server cannot call back to clients (e.g., to invalidate caches), which makes consistency and file locking harder.
What are the tradeoffs between replication and partitioning in DFS?
REPLICATION
Pros:
- fault tolerant because, if a machine fails, others still make the files available.
- highly available because any machine can service any request.
Cons:
- NOT scalable (need to increase capacity on each machine)
PARTITIONED
Pros:
- highly scalable: if you need to support more files, just add more machines!
Cons:
- NOT fault tolerant: if a machine fails, its portion of the files becomes unavailable.
- Load can be unbalanced: all requests for a hot file land on the single machine that holds it.
What is an alternative to replication and partitioning in DFS?
An alternative is a system of peers in which every machine both maintains files and services requests (blurring distinction between servers and clients). Each peer handles some portion of the load, typically for files local to it.
What are the tradeoffs involved in caching on DFS?
A compromise between upload/download and true remote file access is to allow caching (with prefetching).
PROS:
- Most accesses are served from the local cache: fast, with no network traffic.
- Server load is reduced, and prefetching can hide latency for sequential access.
CONS:
- Cached copies can become stale, so we need a cache-consistency protocol.
- Dirty cached data can be lost if a client crashes before writing it back.
What are two extreme DFS models?
At one extreme, we have the upload/download model. When the client wants to modify a file, it downloads the whole thing, modifies, then uploads it back to the server.
At the other extreme, we have true remote file access. The file remains on the server and every single operation has to be sent over the network to the server. The client does not use local caching or buffering.
Describe how the empirical data in the Sprite caching paper motivates the Sprite design
33% of all file accesses are writes. This implies that caching could help performance! 2/3 of accesses are reads and therefore would benefit. But what about the writes? Caching is good, but write-through wouldn’t be sufficient because it doesn’t benefit the write operations in any way. So would session semantics help?
75% of files are open less than 0.5 seconds and 90% are open less than 10 seconds. This implies session semantics are no good! We’d have to update most files after only 1/2 second and almost all of them within 10 seconds. Too much overhead!
20-30% of new data is deleted within 30 seconds and 50% is deleted within 5 minutes. This means write-back on close is unnecessary: we’d just be writing data to the server that will get deleted anyway.
All of the decisions so far are unfriendly to concurrent access (no write-through and no session semantics). But it turns out that file sharing is rare on their system! This means we don’t have to optimize for it (though we still support it).
What are the final design decisions for the Sprite system?
Cache with write-back every 30 seconds. Data younger than 30 seconds is likely to be modified again soon, i.e., the client is still working on it. Note that after 30 seconds, we’re past the point where 20-30% of data gets deleted anyway.
When a client opens a file, the server checks whether another client is working on it and, if so, retrieves the dirty blocks.
This requires that all open calls go to the server, so directories cannot be cached.
When concurrent writes occur (however rare), Sprite disables caching and all writes are serialized on the server.
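The delayed write-back policy above can be sketched as follows (a minimal sketch with hypothetical names, not Sprite’s actual implementation):

```python
import time

WRITEBACK_DELAY = 30  # seconds: Sprite's delayed write-back interval


class CachedFile:
    """Per-file client cache entry (illustrative names only)."""

    def __init__(self):
        self.dirty_blocks = {}  # block number -> (data, time of last write)

    def write_block(self, num, data, now=None):
        now = time.time() if now is None else now
        self.dirty_blocks[num] = (data, now)

    def blocks_to_flush(self, now=None):
        """Return blocks whose last write is at least 30 seconds old.

        Younger blocks stay cached: the client is likely still working
        on them, and 20-30% of new data is deleted within 30 seconds
        anyway, so flushing them early would be wasted work."""
        now = time.time() if now is None else now
        return {n: d for n, (d, t) in self.dirty_blocks.items()
                if now - t >= WRITEBACK_DELAY}


f = CachedFile()
f.write_block(0, b"old", now=0)
f.write_block(1, b"new", now=25)
f.blocks_to_flush(now=40)  # only block 0 (40s old); block 1 is just 15s old
```

A periodic daemon would call `blocks_to_flush` and send the result to the server.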
What are Sprite’s sharing semantics?
Sequential write sharing: caching and sequential semantics
Concurrent write sharing: no caching at all. Cost of this is low since sharing is rare.
What kind of data structures are needed on the server and client sides to support the Sprite system?
Per file, the client tracks:
- whether the file is cacheable (the server can revoke this on concurrent writes)
- a version number, checked at open time to detect stale cached blocks
- which cached blocks are dirty, and when, for the 30-second write-back
Per file, server tracks:
- the current version number (incremented on each open for writing)
- whether the file is cacheable
- the last writer, i.e., the client holding the newest dirty blocks
- which clients have the file open for reading and for writing
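One way to picture these structures and the open-time check they support (hypothetical field and function names; a simplified sketch, not Sprite’s actual code):

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class ClientFileState:
    """Per-file state on a Sprite client (illustrative fields)."""
    cacheable: bool = True       # server revokes this on concurrent writes
    version: int = 0             # compared at open time to detect stale blocks
    dirty_blocks: set = field(default_factory=set)  # awaiting 30s write-back


@dataclass
class ServerFileState:
    """Per-file state on the Sprite server (illustrative fields)."""
    version: int = 0             # bumped on every open-for-write
    cacheable: bool = True
    last_writer: Optional[str] = None  # client holding the newest dirty blocks
    open_readers: set = field(default_factory=set)
    open_writers: set = field(default_factory=set)


def server_open(state: ServerFileState, client: str, mode: str) -> bool:
    """Simplified open-time check; returns whether caching is allowed.

    Every open goes through the server (which is why directories
    cannot be cached); the server would also ask last_writer for its
    dirty blocks here."""
    if mode == "write":
        if state.open_writers or state.open_readers:
            state.cacheable = False  # concurrent write sharing: serialize at server
        state.version += 1
        state.last_writer = client
        state.open_writers.add(client)
    else:
        if state.open_writers:
            state.cacheable = False
        state.open_readers.add(client)
    return state.cacheable
```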
What is a common compromise between partitioning and replication?
A common compromise is to partition the files among machines, then replicate each partition.
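Concretely, this compromise can be sketched with hash-based partition assignment (hypothetical server names and replica counts; one of several possible schemes):

```python
import hashlib

NUM_PARTITIONS = 4  # illustrative numbers, not from the text
REPLICAS = 3


def partition_of(path: str) -> int:
    """Hash the file path to pick its partition (one common scheme)."""
    digest = hashlib.sha256(path.encode()).digest()
    return int.from_bytes(digest[:4], "big") % NUM_PARTITIONS


def servers_for(path: str) -> list:
    """Any replica of the file's partition can serve it: we keep
    partitioning's scalability and replication's fault tolerance."""
    p = partition_of(path)
    return [f"server-{p}-{r}" for r in range(REPLICAS)]
```

Adding capacity means adding partitions; when one machine fails, the other replicas of its partition keep those files available.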
What are the pros/cons of the upload/download model?
PROS
- Modifications are done locally, hence quickly, with no network overhead.
CONS
- The whole file is transferred even if only a small piece is needed.
- The client must have enough local storage for the entire file.
- Consistency suffers when multiple clients download, modify, and upload the same file.