Week 10: Distributed File Systems Flashcards

(15 cards)

1
Q

What are the desirable features of a DFS?

A

Transparency
- Access
- Location
- Concurrency
- Failure
- Replication
- Migration

Heterogeneity - file service should be provided across different OS and hardware
Scalability - the service should handle both small and large volumes of data

2
Q

What is a CFS?

A

A clustered file system is not a single server with a set of clients; it is a cluster of servers that work together to provide a high-performance file service to the clients.

3
Q

What are the steps in a read operation in GFS?

A
  1. Client request - the client issues a request to read a file from GFS. The request specifies the filename and the byte range to be read.
  2. Communicating with the master server - the client contacts the GFS master to obtain metadata about the file. It asks for the chunk locations, i.e. which chunk servers hold replicas of the required file chunks.
  3. Contacting the chunk servers - once the client knows the chunk locations, it directly contacts the closest chunk server to minimize network latency.
  4. Reading the data - the client sends a read request to the chosen chunk server, specifying the chunk handle and the byte range within the chunk. The chunk server reads the requested data and sends it back to the client.
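
The four steps above can be sketched as a small in-memory simulation. This is only an illustration under stated assumptions: the class names `Master` and `ChunkServer`, the `distance` field used to pick the "closest" replica, and the function `gfs_read` are all hypothetical and are not part of any real GFS API.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB, the GFS chunk size quoted in this deck


class ChunkServer:
    """Hypothetical chunk server holding chunk data in memory."""

    def __init__(self, name, distance):
        self.name = name
        self.distance = distance  # assumed stand-in for network distance
        self.chunks = {}          # chunk handle -> bytes

    def read(self, handle, start, length):
        return self.chunks[handle][start:start + length]


class Master:
    """Hypothetical master: stores metadata only, never file data."""

    def __init__(self):
        self.files = {}      # filename -> ordered list of chunk handles
        self.locations = {}  # chunk handle -> list of ChunkServer replicas

    def lookup(self, filename, chunk_index):
        handle = self.files[filename][chunk_index]
        return handle, self.locations[handle]


def gfs_read(master, filename, offset, length, chunk_size=CHUNK_SIZE):
    """Step 1: the caller supplies a filename and a byte range."""
    data = b""
    while length > 0:
        chunk_index, start = divmod(offset, chunk_size)
        # Step 2: ask the master for the chunk handle and replica locations.
        handle, replicas = master.lookup(filename, chunk_index)
        # Step 3: pick the closest replica.
        closest = min(replicas, key=lambda server: server.distance)
        # Step 4: read the byte range from that chunk server.
        wanted = min(length, chunk_size - start)
        piece = closest.read(handle, start, wanted)
        if not piece:
            break  # reached the end of the stored data
        data += piece
        offset += len(piece)
        length -= len(piece)
    return data
```

Note that the master is only consulted for metadata; the file bytes travel directly between the client and the chunk server.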
4
Q

What are the steps in a write operation in GFS?

A
  1. Client asks the master which chunk server holds the current lease for the chunk and where the other replicas are located
  2. Master replies with identity of primary and locations of secondary replicas
  3. Client pushes data to all replicas
  4. Once all replicas ack receiving the data, the client sends a write request to the primary. The primary assigns consecutive serial numbers to all mutations it receives, providing serialization, and applies the mutations in serial number order.
  5. Primary forwards write request to all secondary replicas. They apply mutations in the same order.
  6. Secondary replicas reply to primary indicating they have completed operation.
  7. Primary replies to the client with success or error message.
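
A minimal sketch of steps 3 to 7, assuming the client has already learned the primary and secondaries from the master (steps 1 and 2). The names `Replica`, `Primary`, and `client_write` are hypothetical and only illustrate how serial numbers give every replica the same mutation order.

```python
class Replica:
    """Hypothetical chunk replica."""

    def __init__(self, name):
        self.name = name
        self.buffer = {}  # data id -> bytes pushed but not yet applied
        self.chunk = []   # applied mutations as (serial, data) pairs

    def push(self, data_id, data):
        # Step 3: data is pushed to every replica and acknowledged.
        self.buffer[data_id] = data
        return True

    def apply(self, serial, data_id):
        # Apply a buffered mutation under the serial number chosen by the primary.
        self.chunk.append((serial, self.buffer.pop(data_id)))
        return True


class Primary(Replica):
    """Hypothetical primary: the replica currently holding the lease."""

    def __init__(self, name, secondaries):
        super().__init__(name)
        self.secondaries = secondaries
        self.next_serial = 0

    def write(self, data_id):
        # Step 4: assign the next serial number and apply locally.
        serial = self.next_serial
        self.next_serial += 1
        self.apply(serial, data_id)
        # Steps 5 and 6: forward to secondaries and collect their replies.
        acks = [s.apply(serial, data_id) for s in self.secondaries]
        # Step 7: report the outcome to the client.
        return "success" if all(acks) else "error"


def client_write(primary, data_id, data):
    replicas = [primary] + primary.secondaries
    if not all(r.push(data_id, data) for r in replicas):
        return "error"
    return primary.write(data_id)
```

Because only the primary hands out serial numbers, two clients writing at the same time still produce the same order on every replica.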
5
Q

Explain what is an NFS? What are its advantages?

A

Distributed file system protocol originally developed by Sun Microsystems (1984)

A client-server application that lets users on a client computer view, store, and update files over a computer network as if they were on local storage.

Advantages
- easy sharing of data across clients
- centralized admin (backups on multiple servers)
- security (behind firewall)

6
Q

What is GFS?

A

Google File System is a scalable distributed file system designed to provide reliable access to data using large clusters of commodity hardware (it is also a CFS)

7
Q

What is HDFS?

A

Hadoop Distributed File System was developed from the ideas of GFS and shares many of the same motivations.

Can store many big files (MBs to GBs)

two types of reads
- large streaming reads
- small random reads
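once written, files are seldom modified
- HDFS follows a write-once, read-many model: appends are supported, but random writes are not (GFS supports them but does not require them to be efficient)
sustained high bandwidth is more important than low latency
128 MB default block size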
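
As a worked example of the block size, the sketch below splits a file into 128 MB blocks. The function name `block_layout` is hypothetical; the only facts used are the 128 MB default and that the last block holds just the remaining bytes.

```python
BLOCK_SIZE = 128 * 2**20  # 128 MB default block size


def block_layout(file_size, block_size=BLOCK_SIZE):
    """Return the size in bytes of each block a file would be split into."""
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block is only as large as needed
    return sizes
```

For example, a 300 MB file becomes two full 128 MB blocks plus one 44 MB block, and a 1 GB file becomes exactly eight blocks.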

8
Q

Compare HDFS to GFS

A

Platform

HDFS - Cross platform (linux, windows, mac)
GFS - Linux

Development

HDFS - Java
GFS - C++

Chunk Size

HDFS - 128 MB
GFS - 64 MB

Node

HDFS - NameNode / DataNodes
GFS - Master Node and Chunk Server

Log

HDFS - Editlog
GFS - Operation log

Write Operation

HDFS - no more than one writer at a time
GFS - multiple clients can write or append to the same file at a time

File deletion

HDFS - Deleted files are moved into a trash folder and later removed by garbage collection.
GFS - Deleted files are not reclaimed immediately; they are renamed to a hidden name in the namespace and removed by garbage collection after three days (a configurable interval).

9
Q

In GFS files are divided into _____ chunks

A. Variable-size
B. Fixed-size
C. Both fixed and variable
D. None of the above

A

B. Fixed-size
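
Because chunks are fixed-size, a client can turn a byte offset into a chunk index with simple arithmetic. A minimal sketch, assuming the 64 MB chunk size used elsewhere in this deck; the function name `locate` is hypothetical.

```python
CHUNK_SIZE = 64 * 2**20  # 64 MB fixed chunk size


def locate(offset, chunk_size=CHUNK_SIZE):
    """Map a byte offset in a file to (chunk index, offset within that chunk)."""
    return divmod(offset, chunk_size)
```

For example, byte offset 200 MB falls in chunk index 3 (the fourth chunk), 8 MB into that chunk, since 200 = 3 * 64 + 8.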

10
Q

Which group of the following operations are supported by GFS architecture?

A. Read, write, create, delete
B. Update, append, read, write
C. Snapshot, append, update, delete
D. None of the above

A

A. Read, write, create, delete

11
Q

Which one of the following is a CFS?

A. NFS
B. HDFS
C. CIFS
D. Google Big Table

A

B. HDFS

NFS is a client-server network file system protocol, not a CFS
CIFS is a dialect of the SMB file-sharing protocol, not a CFS
Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads, not a file system.

12
Q

Which of the following statements is correct about HDFS?

A. HDFS needs to run on expensive commodity hardware and cannot deal with failures

B. HDFS is designed to handle large files and its block size is 128 MB by default.

C. HDFS supports multi-users to write one file simultaneously.

D. In HDFS, low latency is more important than high sustained bandwidth

A

Answer: B

A. HDFS needs to run on expensive commodity hardware and cannot deal with failures. (Incorrect: HDFS is expected to run on CHEAP (not expensive) commodity hardware and is designed to tolerate failures.)

B. HDFS is designed to handle large files and its block size is 128 MB by default. (Correct.)

C. HDFS supports multi-users to write one file simultaneously. (Incorrect: HDFS does not support multi-user writes, which is different from GFS.)

D. In HDFS, low latency is more important than high sustained bandwidth. (Incorrect: HDFS prioritizes high sustained bandwidth over low latency.)

13
Q

What are the design motivations of HDFS?

A

- runs on many inexpensive commodity machines, where failures are common
- many big files
- two types of reads: large streaming, small random
- high sustained bandwidth more important than low latency.

14
Q

What are the components of HDFS architecture?

A
  • HDFS client
  • NameNode
  • FsImage
  • Secondary NameNode
  • DataNodes
  • Local disks
15
Q

What are the differences among DFS, NFS and CFS

A

DFS - Distributed File System: a file system whose components are spread across multiple systems. NFS is inherently a distributed file system as well, because the client component is on a different system than the underlying physical storage and its management.

NFS - Network File System: files are not local. They are served over a network, with the physical storage units and their management hosted by a different entity.

CFS - Clustered File System: built by pooling several discrete components, typically multiple servers and multiple disks, that work together to provide a unified namespace. A client is not aware of the physical boundaries that make up the file system.
