Distributed file systems allow ________ to access _______ systems on ________ servers
applications
file
remote
There are two ways for a client to access a file in a DFS (distributed file system). What are their names?
1. remote access model
2. upload/download model
What is the remote access model for a DFS (distributed file system)?
- Every time a client wants to read or write the file, it must issue an RPC (or other request) to the server
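A minimal Python sketch of the remote access model. The `FileServer` class is a hypothetical in-process stand-in for the remote server (its names are illustrative, not a real DFS API). The point is that every read or write is a separate request to the server, so the request counter grows with each operation.

```python
class FileServer:
    """Hypothetical stand-in for a remote file server."""

    def __init__(self):
        self.files = {}      # file name -> bytearray of contents
        self.requests = 0    # number of requests (RPCs) received

    def read(self, name, offset, length):
        self.requests += 1
        return bytes(self.files[name][offset:offset + length])

    def write(self, name, offset, data):
        self.requests += 1
        buf = self.files.setdefault(name, bytearray())
        # Pad with zero bytes if the write starts past the current end.
        if len(buf) < offset:
            buf.extend(b"\x00" * (offset - len(buf)))
        buf[offset:offset + len(data)] = data


server = FileServer()
server.write("notes.txt", 0, b"hello world")   # request 1
first = server.read("notes.txt", 0, 5)         # request 2
second = server.read("notes.txt", 6, 5)        # request 3
```

The file never leaves the server here; the client only ever sees the byte ranges it asked for.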
What is the upload/download model for a DFS (distributed file system)?
What does NFS stand for? When was it created, and by which organization? Does ecelinux use NFS?
Network File System
Created by Sun Microsystems
in 1984
Yes, ecelinux uses NFSv4
In NFS, which side has an RPC stub: the client, the server, or both?
Both the client and the server have their own RPC stubs
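A rough sketch of what the two stubs do, assuming a hypothetical `read_file` procedure and JSON as the message format (chosen only for readability; it is not the encoding NFS uses). The client stub marshals the call into a message, and the server stub unmarshals it, calls the real procedure, and marshals the reply.

```python
import json


def read_file(name):
    """Hypothetical server-side procedure."""
    return f"contents of {name}"


PROCEDURES = {"read_file": read_file}


def server_stub(message: bytes) -> bytes:
    # Unmarshal the request, dispatch to the real procedure, marshal the result.
    request = json.loads(message.decode("utf-8"))
    result = PROCEDURES[request["proc"]](*request["args"])
    return json.dumps({"result": result}).encode("utf-8")


def client_stub(proc, *args):
    # Marshal the call so it looks like a local function call to the caller.
    message = json.dumps({"proc": proc, "args": list(args)}).encode("utf-8")
    reply = server_stub(message)  # stands in for the network round trip
    return json.loads(reply.decode("utf-8"))["result"]


reply = client_stub("read_file", "a.txt")
```

In a real deployment the direct call to `server_stub` would be replaced by sending the bytes over the network.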
What is the step-by-step process of the client trying to access a chunk of a file on the remote server?
NFS supports client-side caching. What is the motivation behind this?
What are the caches used for in NFS?
Whenever caches are used in NFS, the cache holds modifications that have been made to a specific file.
When are file modifications propagated to server after sitting in the cache?
What issue arises if this takes place in a distributed NFS with replication?
Describe the “delegation of authority” mechanism used in NFS for upload/download of files
What is the purpose of this delegation? What does it mean in the context of two clients trying to access the file?
Delegation in step 2 ensures that only one client can modify a specific file at a time, since that client holds the authority granted by the server. Other clients cannot access the file until the server recalls the delegation
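A minimal sketch of the exclusivity that delegation provides, assuming a hypothetical `DelegatingServer` class (the names and the grant/recall interface are illustrative, not the actual NFS protocol operations). One client holds the delegation for a file, and a second client is refused until the server recalls it.

```python
class DelegatingServer:
    """Hypothetical server that delegates authority over a file to one client."""

    def __init__(self):
        self.holder = {}   # file name -> client currently holding the delegation

    def request(self, client, name):
        """Grant the delegation if no other client holds it."""
        current = self.holder.get(name)
        if current is None or current == client:
            self.holder[name] = client
            return True
        return False

    def recall(self, name):
        """Take the delegation back and return the previous holder (or None)."""
        return self.holder.pop(name, None)


server = DelegatingServer()
a_first_try = server.request("client_a", "report.txt")    # granted
b_first_try = server.request("client_b", "report.txt")    # refused, A holds it
recalled_from = server.recall("report.txt")               # server recalls from A
b_second_try = server.request("client_b", "report.txt")   # granted after recall
```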
NFS uses RPCs internally. One optimization made to NFS was the addition of compound procedures.
What does a file read between a client and a server in NFS look like with and without compound procedures?
How is the latency decreased with compound procedures?
In NFS, whenever the client asks the server to read a file, it must first perform a LOOKUP and then perform a READ on the file
Without compound procedures:
1. client makes a LOOKUP call in one network request
2. client makes a READ call in a second network request
With compound procedures:
1. client makes a LOOKUP and a READ call in the same network request
Therefore, there is only one network request with compound procedures but two without them, so the latency is reduced
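The difference in network requests can be sketched by counting round trips. The `NfsServer` class below is a hypothetical in-process model (not the real NFS wire format); `call` represents one request per operation, while `compound` carries several operations in a single request and feeds each result into the next operation.

```python
class NfsServer:
    """Hypothetical model of a server that counts network round trips."""

    def __init__(self, files):
        self.files = files          # path -> contents
        self.round_trips = 0
        self.ops = {"LOOKUP": self._lookup, "READ": self._read}

    def _lookup(self, path):
        if path not in self.files:
            raise FileNotFoundError(path)
        return path                 # the path doubles as the file handle here

    def _read(self, handle):
        return self.files[handle]

    def call(self, op, arg):
        """One operation per network request."""
        self.round_trips += 1
        return self.ops[op](arg)

    def compound(self, op_names, arg):
        """Several operations in one network request."""
        self.round_trips += 1
        value = arg
        for name in op_names:
            value = self.ops[name](value)
        return value


plain = NfsServer({"/a.txt": "data"})
handle = plain.call("LOOKUP", "/a.txt")
data_plain = plain.call("READ", handle)          # plain.round_trips is now 2

batched = NfsServer({"/a.txt": "data"})
data_compound = batched.compound(["LOOKUP", "READ"], "/a.txt")  # 1 round trip
```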
An NFS server generally exports only a part of its local file system to the remote client.
What does a client typically do to its local file structure to integrate this part from the server?
The client imports this segment, and adds this portion of the server’s file system to its local file system
The remote file segment is mounted onto the client under a certain path
Suppose a client imports a directory which contains a subdirectory which was imported from another remote host.
How does this client access the nested directory in terms of imports?
If a client imports a directory from server A which contains another imported directory from server B, then the client will import the nested directory DIRECTLY from server B
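A small sketch of how a client might map a local path to the server that actually holds it, assuming a hypothetical mount table and server names (`serverA`, `serverB`) chosen for illustration. The longest matching mount point wins, so the nested directory is served directly by server B rather than through server A.

```python
def resolve(mounts, path):
    """Return (server, remote_path) using the longest matching mount point."""
    best = None
    for mount_point in mounts:
        if path == mount_point or path.startswith(mount_point + "/"):
            if best is None or len(mount_point) > len(best):
                best = mount_point
    if best is None:
        return ("local", path)
    server, export = mounts[best]
    return (server, export + path[len(best):])


# Hypothetical mount table: local mount point -> (server, exported directory)
mounts = {
    "/mnt/a": ("serverA", "/export"),
    "/mnt/a/shared": ("serverB", "/data"),   # nested import, fetched from B directly
}

from_a = resolve(mounts, "/mnt/a/docs/x.txt")      # ("serverA", "/export/docs/x.txt")
from_b = resolve(mounts, "/mnt/a/shared/y.txt")    # ("serverB", "/data/y.txt")
local = resolve(mounts, "/home/user/z.txt")        # ("local", "/home/user/z.txt")
```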
A large scale DFS may distribute files across multiple servers in order to manage very large files.
What are the two ways of doing this?
In a large scale DFS which distributes files across numerous servers by storing files in chunks (just like sharding in databases), how can this result in improved throughput?
When the server is the bottleneck of the system, partitioning files allows the load to be balanced across numerous servers, thereby improving throughput
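A sketch of chunk placement, assuming fixed-size chunks and round-robin assignment (round-robin is an assumption made for illustration; real systems may place chunks differently). With the chunks spread evenly, reads of different chunks land on different servers instead of queuing on one.

```python
from collections import Counter


def place_chunks(file_size, chunk_size, servers):
    """Map each chunk index to a server, round-robin."""
    num_chunks = -(-file_size // chunk_size)   # ceiling division
    return {i: servers[i % len(servers)] for i in range(num_chunks)}


# Hypothetical sizes: a 10-unit file split into 3-unit chunks gives 4 chunks.
placement = place_chunks(file_size=10, chunk_size=3, servers=["s0", "s1"])
load = Counter(placement.values())   # chunks held per server
```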
In the Google File System (GFS), describe the following:
Why does GFS (Google file system) distribute the files across numerous chunk servers?
What other famous file system is an open-source implementation of GFS?
Distributing the files across numerous chunk servers provides fault tolerance in software
HDFS (Hadoop distributed file system) is an open-source implementation of GFS
What is a Google file system (GFS) made up of?
In GFS (Google file system), the master node’s metadata about chunks is ______ in main memory and ______ are logged to local storage
cached
updates
How does the master node in GFS (Google file system) keep the metadata consistent with the state of the chunk servers?
The master periodically polls the chunk servers to keep the metadata consistent
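A minimal sketch of the polling idea, assuming a hypothetical `Master` class and a dict standing in for the chunk servers' reports (none of these names come from GFS itself). On each poll the master rebuilds its in-memory view of which server holds which chunk from what the servers report.

```python
class Master:
    """Hypothetical master keeping chunk locations in main memory."""

    def __init__(self):
        self.chunk_locations = {}   # chunk id -> set of chunk server names

    def poll(self, chunk_servers):
        """Rebuild chunk locations from what each chunk server reports."""
        fresh = {}
        for server_name, chunk_ids in chunk_servers.items():
            for chunk_id in chunk_ids:
                fresh.setdefault(chunk_id, set()).add(server_name)
        self.chunk_locations = fresh


master = Master()
reports = {"cs1": {"c1", "c2"}, "cs2": {"c2"}}
master.poll(reports)
before = master.chunk_locations["c2"]    # {"cs1", "cs2"}

reports["cs1"] = {"c1"}                  # cs1 no longer holds chunk c2
master.poll(reports)
after = master.chunk_locations["c2"]     # {"cs2"}
```

In practice `poll` would run on a timer; here it is called by hand to show the metadata catching up with the chunk servers.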
In GFS (Google file system), what are the steps for the client to read data from a file?
What is the step-by-step mechanism by which GFS (Google file system) updates data in a given file?
In a centralized sharing setting, what are the semantics of file sharing? (two points)
Under what condition can these same semantics be achieved in a DFS?
Centralized file sharing semantics:
These semantics can be achieved in a DFS as long as there is only one file server and the files are not cached
When a cached file is modified in a DFS, it is ________ but ________ to propagate the changes _______ to the file server. Instead, the changes are propagated after the file is closed
possible
impractical
immediately
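A sketch of propagating changes on close, assuming a hypothetical `CachedFile` class and a plain dict standing in for the file server. Writes touch only the cached copy, so the server still holds the old contents until `close` uploads the file.

```python
class CachedFile:
    """Hypothetical client-side cached copy of a file."""

    def __init__(self, server_files, name):
        self.server_files = server_files          # dict standing in for the file server
        self.name = name
        self.data = server_files.get(name, "")    # download on open

    def write(self, text):
        self.data += text                         # modify only the cached copy

    def close(self):
        self.server_files[self.name] = self.data  # propagate changes on close


server_files = {"log.txt": "a"}
f = CachedFile(server_files, "log.txt")
f.write("b")
seen_before_close = server_files["log.txt"]   # still "a"
f.close()
seen_after_close = server_files["log.txt"]    # now "ab"
```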