Module 6 - Distributed File Systems Flashcards by Alex Jabbour

Distributed file systems allow ________ to access _______ systems on ________ servers

applications
file
remote

There are two ways for a client to access a file in a DFS (distributed file system). What are their names?

1. Remote access model
2. Upload/download model

What is the remote access model for a DFS (distributed file system)?

The file always lives on the server

- Anytime a client wants to read or write, it needs to issue an RPC (or a request)

What is the upload/download model for a DFS (distributed file system)?

The file is managed by the server, which transfers a copy of the file to the client
Once the file is received, the client locally performs reads and writes on it
When the client is done accessing the file, the new version of the file is then transferred back to the server
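
The two access models can be contrasted with a small in-memory sketch. This is only an illustration: `FileServer`, its methods, and the file name are hypothetical, and "requests" are plain method calls rather than real RPCs. It counts how many requests reach the server for the same edit under each model.

```python
class FileServer:
    """Toy in-memory server; counts every request it receives."""

    def __init__(self):
        self.files = {}
        self.requests = 0

    def read(self, name, offset, length):
        self.requests += 1
        return self.files[name][offset:offset + length]

    def write(self, name, offset, data):
        self.requests += 1
        old = self.files.get(name, b"")
        self.files[name] = old[:offset] + data + old[offset + len(data):]

    def download(self, name):
        self.requests += 1
        return self.files[name]

    def upload(self, name, content):
        self.requests += 1
        self.files[name] = content


def remote_access_edit(server, name):
    # Remote access model: every read and write is a separate request.
    head = server.read(name, 0, 5)
    server.write(name, 0, head.upper())
    server.write(name, 5, b"!")


def upload_download_edit(server, name):
    # Upload/download model: fetch a copy, edit it locally, send it back.
    local = bytearray(server.download(name))
    local[0:5] = bytes(local[0:5]).upper()
    local[5:6] = b"!"
    server.upload(name, bytes(local))


remote = FileServer()
remote.files["a.txt"] = b"hello world"
remote_access_edit(remote, "a.txt")

updown = FileServer()
updown.files["a.txt"] = b"hello world"
upload_download_edit(updown, "a.txt")
```

Both servers end up with the same file content, but the remote access server handled three requests and the upload/download server handled two (one download, one upload), however many local edits were made in between.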

What does NFS stand for? When was it created, and by which organization? Does ecelinux use NFS?

Network File System

Created by Sun Microsystems in 1984

Yes, ecelinux uses NFSv4

In NFS, does the server have an RPC stub? or the client? or both?

Both the client and the server have their own RPC stubs

What is the step-by-step process of the client trying to access a chunk of a file on the remote server?

1. Client makes a system call to the kernel and specifies the path of the file it is trying to access
2. The call turns into a request which passes through the VFS (virtual file system) client layer and then the NFS client
3. The NFS client uses its RPC client stub to make a call to the RPC server stub, which triggers execution in the NFS server program
4. The NFS server program makes a call to the VFS server layer, which fetches the file from the server’s file system
5. The fetched file is returned to the client through the RPC stubs and then propagated back to the client’s VFS
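
The layering described above can be traced with a chain of plain function calls. This is a rough sketch, not how NFS is implemented: all function names and the path are hypothetical, and the "network" between the two stubs is just a direct call.

```python
trace = []
SERVER_DISK = {"/export/a.txt": b"file data"}  # stands in for the server's file system


def vfs_server_read(path):
    trace.append("VFS server")
    return SERVER_DISK[path]


def nfs_server_read(path):
    trace.append("NFS server")
    return vfs_server_read(path)


def rpc_server_stub(request):
    trace.append("RPC server stub")
    return nfs_server_read(request["path"])


def rpc_client_stub(path):
    trace.append("RPC client stub")
    # A real stub would marshal the request and send it over the network.
    return rpc_server_stub({"op": "read", "path": path})


def nfs_client_read(path):
    trace.append("NFS client")
    return rpc_client_stub(path)


def vfs_client_read(path):
    trace.append("VFS client")
    return nfs_client_read(path)


def system_call_read(path):
    trace.append("system call")
    return vfs_client_read(path)


data = system_call_read("/export/a.txt")
```

After the call, `trace` lists the layers in the order the request crossed them, and `data` is the value that travelled back up the same chain.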

NFS supports client-side caching. What is the motivation behind this?

What are the caches used for in NFS?

Caching reduces communication between the client and the server
The cache is used to hold UPDATES to a file

Whenever caches are used in NFS, the cache holds modifications that have been made to a specific file.

When are file modifications propagated to server after sitting in the cache?

What issue arises if this took place in a distributed NFS with replication?

File modifications are flushed back to the server whenever the client closes the file
In a distributed NFS, this could lead to inconsistencies in files across replicas
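
A minimal sketch of flush-on-close caching, assuming a toy `Server` and `CachedFile` (both hypothetical, with method calls standing in for RPCs). It shows that the server's copy stays stale while updates sit in the client cache, which is the window in which other copies of the file can disagree.

```python
class Server:
    def __init__(self):
        self.files = {"notes.txt": "v1"}
        self.rpcs = 0

    def fetch(self, name):
        self.rpcs += 1
        return self.files[name]

    def store(self, name, content):
        self.rpcs += 1
        self.files[name] = content


class CachedFile:
    """Client-side handle: writes stay in the cache until close()."""

    def __init__(self, server, name):
        self.server = server
        self.name = name
        self.cache = server.fetch(name)  # one RPC on open
        self.dirty = False

    def read(self):
        return self.cache                # served locally, no RPC

    def write(self, content):
        self.cache = content             # update is held in the cache
        self.dirty = True

    def close(self):
        if self.dirty:
            self.server.store(self.name, self.cache)  # flush on close
            self.dirty = False


server = Server()
f = CachedFile(server, "notes.txt")
f.write("v2")
f.write("v3")
stale_on_server = server.files["notes.txt"]  # server has not seen the writes yet
f.close()
```

Two writes and a close cost two RPCs in total (one fetch, one store), and the server only sees the final version once the file is closed.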

Describe the “delegation of authority” mechanism used in NFS for upload/download of files

What is the purpose of this delegation? What does it mean in the context of two clients trying to access the file?

1. Client asks server for the file
2. Server delegates authority over the file to the client
3. Server recalls the delegation
4. Client returns the file

Delegation in step 2 ensures that only one client can modify a specific file at a time (since it has the authority from the server). Other clients cannot access it until the server recalls the delegation
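
The delegate/recall cycle can be sketched as follows. The class and method names are hypothetical and the model is deliberately simplified (one file, one delegation at a time, recall triggered by a second client's open); it is not the NFSv4 wire protocol.

```python
class DelegatingServer:
    def __init__(self, content):
        self.content = content
        self.holder = None  # client currently holding the delegation

    def open(self, client):
        # Recall an outstanding delegation before granting a new one.
        if self.holder is not None and self.holder is not client:
            self.holder.recall(self)
        self.holder = client
        return self.content

    def give_back(self, client, content):
        if self.holder is client:
            self.content = content
            self.holder = None


class Client:
    def __init__(self, name):
        self.name = name
        self.local_copy = None

    def open(self, server):
        self.local_copy = server.open(self)

    def append(self, text):
        self.local_copy += text  # local edit, no server contact

    def recall(self, server):
        # The server recalled the delegation: return the file.
        server.give_back(self, self.local_copy)
        self.local_copy = None


server = DelegatingServer("base")
alice, bob = Client("alice"), Client("bob")
alice.open(server)
alice.append("+alice")
bob.open(server)  # triggers the recall of alice's delegation
```

When `bob` opens the file, the server first recalls the delegation from `alice`, so `bob` receives the file with her changes already applied.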

NFS uses RPCs internally. An optimization to the NFS product was adding compound procedures.

What does a file read between a client and a server in NFS look like with and without compound procedures?

How is the latency decreased with compound procedures?

In NFS, whenever the client asks the server to read a file, it must first perform a LOOKUP and then a READ on the file

Without compound procedures:

1. Client makes a LOOKUP network request and gets a response
2. Client makes a READ network request and gets another response

With compound procedures:

1. Client makes a LOOKUP and a READ call in the same network request

With compound procedures there is one network round trip instead of two, so latency is reduced
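
The saving can be made concrete by counting round trips in a toy server. `NfsServer`, its file handle `7`, and the path are hypothetical; each call to `call` or `compound` stands for one network request.

```python
class NfsServer:
    def __init__(self):
        self.handles = {"/docs/a.txt": 7}  # path -> file handle
        self.data = {7: b"contents"}       # file handle -> file data
        self.round_trips = 0

    def _lookup(self, path):
        return self.handles[path]

    def _read(self, handle):
        return self.data[handle]

    def call(self, op, arg):
        # One operation per network request.
        self.round_trips += 1
        return self._lookup(arg) if op == "LOOKUP" else self._read(arg)

    def compound(self, path):
        # LOOKUP and READ are carried in a single network request.
        self.round_trips += 1
        return self._read(self._lookup(path))


plain = NfsServer()
handle = plain.call("LOOKUP", "/docs/a.txt")
data_plain = plain.call("READ", handle)

batched = NfsServer()
data_compound = batched.compound("/docs/a.txt")
```

Both clients obtain the same data, but `plain.round_trips` is 2 and `batched.round_trips` is 1. If network delay dominates, the read latency drops from roughly two round-trip times to one.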

An NFS server generally exports only a part of its local file system to the remote client.

What does a client typically do to its local file structure to integrate this part from the server?

The client imports this segment, and adds this portion of the server’s file system to its local file system

The remote file segment is mounted onto the client under a certain path

Suppose a client imports a directory which contains a subdirectory which was imported from another remote host.

How does this client access the nested directory in terms of imports?

If a client imports a directory from server A which contains another imported directory from server B, then the client will import the nested directory DIRECTLY from server B
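
One way to picture mounting and nested imports is a client-side mount table with longest-prefix matching. This is an illustrative sketch only: the mount points, host names, and export paths are made up, and a real client keeps this state in the kernel.

```python
# Client-side mount table: mount point -> (host, exported path on that host).
mount_table = {
    "/remote/projects": ("serverA", "/export/projects"),
    # Nested directory: mounted directly from server B, not via server A.
    "/remote/projects/shared": ("serverB", "/export/shared"),
}


def resolve(path):
    """Map a client path to (host, path on that host)."""
    # The longest matching mount point wins, so nested mounts take priority.
    for mount_point in sorted(mount_table, key=len, reverse=True):
        if path == mount_point or path.startswith(mount_point + "/"):
            host, exported = mount_table[mount_point]
            return host, exported + path[len(mount_point):]
    return "local", path


in_a = resolve("/remote/projects/report.txt")
in_b = resolve("/remote/projects/shared/lib.c")
local = resolve("/home/me/todo.txt")
```

Paths under the nested mount point resolve to server B directly, paths elsewhere under `/remote/projects` resolve to server A, and everything else stays local.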

A large scale DFS may distribute files across multiple servers in order to manage very large files.

What are the two ways of doing this?

Keep all chunks of each file on a single server (the chunks of a file are not partitioned across servers)
Split the chunks of a file across numerous servers (just like sharding in databases)

In a large scale DFS which distributes files across numerous servers by storing files in chunks (just like sharding in databases), how can this result in improved throughput?

In the case where the server is the bottleneck of the system, the partitioned files allow load to be balanced across numerous servers, thereby improving throughput
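
The load-balancing effect can be seen by counting how many chunks of one file each server would have to serve. The sketch assumes a tiny chunk size, three hypothetical servers, and round-robin placement; real systems choose placement differently.

```python
from collections import Counter

CHUNK_SIZE = 4  # bytes; tiny so the example is easy to follow
SERVERS = ["s0", "s1", "s2"]


def split_into_chunks(data):
    return [data[i:i + CHUNK_SIZE] for i in range(0, len(data), CHUNK_SIZE)]


def place_whole_file(chunks):
    # Option 1: every chunk of the file lives on the same server.
    return ["s0" for _ in chunks]


def place_striped(chunks):
    # Option 2: chunks are spread round-robin across the servers (an assumption).
    return [SERVERS[i % len(SERVERS)] for i in range(len(chunks))]


chunks = split_into_chunks(b"abcdefghijklmnopqrstuvwx")  # 24 bytes -> 6 chunks
whole_load = Counter(place_whole_file(chunks))
striped_load = Counter(place_striped(chunks))
```

Reading the whole file sends all six chunk requests to `s0` in the first layout, but only two to each server in the striped layout, so the three servers can work in parallel.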

In the Google File System (GFS), describe the following:

The master node
The chunk servers
The underlying file system existing in each chunk server

Master node stores metadata about the files (size, path, access rights) and chunks, and serves it to the client
The chunk servers each store chunks (which could be replicas) of the overall file system, with no metadata
The underlying file system on each chunk server is a Linux file system

Why does GFS (Google file system) distribute the files across numerous chunk servers?

What other famous file system is an open-source implementation of GFS?

Distributing the files across numerous chunk servers provides fault tolerance in software

HDFS (Hadoop distributed file system) is an open-source implementation of GFS

What is a Google file system (GFS) made up of?

Master node
GFS client
Collection of chunk servers

In GFS (Google file system), the master node’s metadata about chunks is ______ in main memory and ______ are logged to local storage

cached

updates

How does the master node in GFS (Google file system) keep the meta-data consistent with the state of the chunk servers?

The master periodically polls the chunk servers to keep the meta-data consistent

In GFS (Google file system), what are the steps for the client to read data from a file?

Client sends the file name and chunk index to the master
The master responds with a contact address for accessing the requested chunk
The client then pulls data directly from a chunk server, bypassing the master
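
The read path above can be sketched with a master that only answers location queries and chunk servers that hold the data. All names, addresses, and contents are hypothetical, and the sketch leaves out details of the real system such as replication.

```python
class Master:
    def __init__(self):
        # (file name, chunk index) -> contact address of a chunk server
        self.chunk_locations = {("log.txt", 0): "cs1", ("log.txt", 1): "cs2"}

    def locate(self, file_name, chunk_index):
        # The master returns metadata only; it never handles file data.
        return self.chunk_locations[(file_name, chunk_index)]


class ChunkServer:
    def __init__(self, chunks):
        self.chunks = chunks

    def read_chunk(self, file_name, chunk_index):
        return self.chunks[(file_name, chunk_index)]


master = Master()
chunk_servers = {
    "cs1": ChunkServer({("log.txt", 0): b"first "}),
    "cs2": ChunkServer({("log.txt", 1): b"second"}),
}


def gfs_read(file_name, chunk_index):
    # Ask the master where the chunk lives, then read it from that chunk server.
    address = master.locate(file_name, chunk_index)
    return chunk_servers[address].read_chunk(file_name, chunk_index)


content = gfs_read("log.txt", 0) + gfs_read("log.txt", 1)
```

Because the data flows directly between client and chunk server, the master's work per read is a single small lookup.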

What is the step-by-step mechanism by which GFS (Google file system) updates data in a given file?

1. A client contacts the nearest chunk server holding the data and pushes its updates to that server
2. This server pushes the update to the next closest server holding the data (secondary), and so on, in a pipelined fashion until all replicas have received the data
3. The primary chunk server assigns a sequence number to the update operation and passes it on to the secondary chunk servers (bypassing the master)
4. The primary replica informs the client that the update is complete
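
A rough sketch of the update pipeline, separating the data push from the ordering step. The `Replica` and `Primary` classes, the chain order, and the `"done"` status are assumptions made for illustration; failure handling is omitted entirely.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.next = None      # next closest replica in the chain
        self.buffered = None  # data received but not yet applied
        self.applied = []     # (sequence number, data) in apply order

    def push(self, data):
        # Buffer the data, then forward it along the chain.
        self.buffered = data
        if self.next is not None:
            self.next.push(data)

    def apply(self, seq):
        self.applied.append((seq, self.buffered))
        self.buffered = None


class Primary(Replica):
    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_seq = 0

    def commit(self):
        # Assign a sequence number and pass it on to the secondaries.
        seq = self.next_seq
        self.next_seq += 1
        self.apply(seq)
        for secondary in self.secondaries:
            secondary.apply(seq)
        return "done"  # reported back to the client


s1, s2 = Replica("s1"), Replica("s2")
primary = Primary("p", [s1, s2])
# Assumed topology: the client's nearest replica is s1; the chain is s1 -> p -> s2.
s1.next, primary.next = primary, s2
s1.push(b"update")
status = primary.commit()
```

Every replica ends up applying the same data under the same sequence number, which is what keeps concurrent updates in a consistent order across replicas.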

In a centralized sharing setting, what are the semantics of file sharing? (two points)

Under what condition can these same semantics be achieved in a DFS?

Centralized file sharing semantics:

Operations are strictly ordered in time
Application can ALWAYS read its own writes

These semantics can be achieved in a DFS as long as there is only one file server and files are not cached

When a cached file is modified in a DFS, it is ________ but ________ to propagate the changes _______ to the file server. Instead, they are made after the file is closed

possible
impractical
immediately

In file sharing semantics, let's use the session semantics method.
1. When a client makes a modification to a file in a DFS (without closing it), what is the visibility of this modification?
2. When do the changes get propagated to the other clients viewing the files?
3. Which party determines the final version of the file?

1. The modifications are only visible to the process that modified that file
2. The modifications are only made visible to other clients when the file is closed
3. The final version of the file is determined by the last client that closes that file
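
The three points above can be checked with two sessions on the same file. `SessionServer` and `Session` are hypothetical names, and the model assumes each session works on a private whole-file copy taken at open.

```python
class SessionServer:
    def __init__(self, content):
        self.content = content


class Session:
    def __init__(self, server):
        self.server = server
        self.copy = server.content  # private copy taken at open

    def write(self, content):
        self.copy = content         # visible only to this session

    def read(self):
        return self.copy

    def close(self):
        self.server.content = self.copy  # changes become visible at close


server = SessionServer("original")
a, b = Session(server), Session(server)
a.write("from A")
seen_by_b = b.read()          # b does not see a's unclosed write
b.write("from B")
a.close()
after_a_close = server.content
b.close()                     # b closes last, so its version wins
```

Session `b` never sees `a`'s write while both files are open, the server holds `"from A"` after `a` closes, and `"from B"` after `b` closes last.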

The semantics of file sharing in a DFS can be defined in numerous ways
1. What does NFS use?
2. What about HDFS?

1. NFS uses session semantics
2. HDFS uses immutable files but supports an append function so that logs can be made

What does UNIX file sharing semantics describe?

Every operation on a file is instantly visible to all processes

What does session semantics describe?

No changes are visible to other processes until the file is closed

What does immutable file sharing semantics describe?

No updates are possible. Makes it very simple for sharing and replication

What does transaction file sharing semantics describe?

All changes occur atomically
