What are the desirable features of a DFS?
Transparency
- Access
- Location
- Concurrency
- Failure
- Replication
- Migration
Heterogeneity - file service should be provided across different OS and hardware
Scalability - the service should handle both small and large volumes of data
What is a CFS?
A clustered file system is not a single server with a set of clients; it is a cluster of servers that all work together to provide high-performance service to the clients.
What are the steps in a read operation in GFS?
What are the steps in a write operation in GFS?
Explain what is an NFS? What are its advantages?
Distributed file system protocol originally developed by Sun Microsystems (1984)
Client-server application: users on a client computer can view, store, and update files over a computer network as if they were on local storage.
Advantages
- easy sharing of data across clients
- centralized admin (backups on multiple servers)
- security (behind firewall)
What is GFS?
Google File System is a scalable distributed file system designed to provide reliable access to data using large clusters of commodity hardware (also a CFS)
What is HDFS?
Hadoop Distributed File System was developed from the ideas of GFS and shares many of the same motivations.
Can store many big files (MBs to GBs)
two types of reads
- large streaming reads
- small random reads
once written files are seldom modified
- random writes are supported but do not have to be efficient
sustained high bandwidth more important than low latency
128 MB block size
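As a rough illustration of the 128 MB block size, the sketch below splits a file of a given length into block lengths. The name `block_lengths` is made up for these notes and is not part of any Hadoop API; it assumes the last block simply holds whatever bytes are left over.

```python
BLOCK_SIZE = 128 * 1024 * 1024  # 128 MB, the block size noted above


def block_lengths(file_size, block_size=BLOCK_SIZE):
    """Return the length in bytes of each block for a file of file_size bytes."""
    if file_size < 0:
        raise ValueError("file_size must be non-negative")
    full_blocks, remainder = divmod(file_size, block_size)
    lengths = [block_size] * full_blocks
    if remainder:
        lengths.append(remainder)  # last block holds only the leftover bytes
    return lengths


if __name__ == "__main__":
    mb = 1024 * 1024
    print(len(block_lengths(1024 * mb)))  # 1 GB file: 8 full blocks
    print(len(block_lengths(300 * mb)))   # 300 MB file: 128 + 128 + 44 MB, so 3 blocks
```

A 1 GB file maps to 8 blocks and a 300 MB file to 3, which is why large files keep the number of blocks (and the metadata that tracks them) small.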
Compare HDFS to GFS
Platform
HDFS - Cross-platform (Linux, Windows, Mac)
GFS - Linux
Development
HDFS - Java
GFS - C++
Chunk Size
HDFS - 128 MB
GFS - 64 MB
Node
HDFS - NameNode / DataNodes
GFS - Master Node and Chunk Server
Log
HDFS - EditLog
GFS - Operation log
Write Operation
HDFS - no more than one writer at a time
GFS - can have multiple writers to a file at a time (replicas)
File deletion
HDFS - Deleted files are renamed into a particular folder and later removed by garbage collection.
GFS - Deleted files are not reclaimed immediately; they are renamed to a hidden name in the namespace and deleted after 3 days if not in use.
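The GFS-style lazy deletion described above can be sketched as a toy in-memory namespace. Everything here is hypothetical and for illustration only: the class `LazyDeleteNamespace`, the `.deleted.` naming scheme, and the `now` parameter (used so the 3-day grace period can be tested without waiting) are assumptions, not the real GFS interface.

```python
import time

GRACE_PERIOD_SECONDS = 3 * 24 * 60 * 60  # 3 days, as in the GFS note above


class LazyDeleteNamespace:
    """Toy namespace: delete renames to a hidden name, garbage collection reclaims later."""

    def __init__(self):
        self.files = {}   # visible name -> data
        self.hidden = {}  # hidden name -> (data, deletion timestamp)

    def create(self, name, data):
        self.files[name] = data

    def delete(self, name, now=None):
        """Hide the file instead of reclaiming its storage right away."""
        now = time.time() if now is None else now
        data = self.files.pop(name)
        hidden_name = f".deleted.{name}.{int(now)}"  # hypothetical naming scheme
        self.hidden[hidden_name] = (data, now)
        return hidden_name

    def garbage_collect(self, now=None):
        """Remove hidden files older than the grace period; return their names."""
        now = time.time() if now is None else now
        expired = [h for h, (_, ts) in self.hidden.items()
                   if now - ts > GRACE_PERIOD_SECONDS]
        for h in expired:
            del self.hidden[h]
        return expired


if __name__ == "__main__":
    ns = LazyDeleteNamespace()
    ns.create("report.txt", b"data")
    hidden = ns.delete("report.txt", now=0)
    print(ns.garbage_collect(now=60))                        # still within 3 days: []
    print(ns.garbage_collect(now=GRACE_PERIOD_SECONDS + 1))  # reclaimed: [hidden]
```

Until the garbage collector runs past the grace period, the data still exists under the hidden name, which is what makes the deletion "lazy".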
In GFS files are divided into _____ chunks
A. Variable-size
B. Fixed-size
C. Both fixed and variable
D. None of the above
B. Fixed-size
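Because chunks are fixed-size, a byte offset in a file maps to a chunk with simple arithmetic. The sketch below assumes the 64 MB GFS chunk size from the comparison above; the function name `locate` is made up for these notes.

```python
CHUNK_SIZE = 64 * 1024 * 1024  # 64 MB, the GFS chunk size from the comparison


def locate(offset, chunk_size=CHUNK_SIZE):
    """Map a byte offset in a file to (chunk index, offset inside that chunk)."""
    if offset < 0:
        raise ValueError("offset must be non-negative")
    return divmod(offset, chunk_size)


if __name__ == "__main__":
    mb = 1024 * 1024
    index, within = locate(200 * mb)
    print(index, within // mb)  # byte 200 MB falls in chunk 3, 8 MB into it
```

With variable-size chunks this lookup would need a per-file table of chunk boundaries; fixed-size chunks reduce it to one division.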
Which group of the following operations are supported by GFS architecture?
A. Read, write, create, delete
B. Update, append, read, write
C. Snapshot, append, update, delete
D. None of the above
A. Read, write, create, delete
Which one of the following belongs to a CFS?
A. NFS
B. HDFS
C. CIFS
D. Google Big Table
B. HDFS
NFS is different from CFS
CIFS is a file-sharing protocol (a dialect of SMB), not a CFS
Bigtable is a fully managed wide-column and key-value NoSQL database service for large analytical and operational workloads.
Which of the following statements is correct about HDFS?
A. HDFS needs to run on expensive commodity hardware and cannot deal with failures
B. HDFS is to handle large files and block size is 128 MB by default.
C. HDFS supports multi-users to write one file simultaneously.
D. In HDFS, low latency is more important than high sustained bandwidth
Answer: B
A. HDFS needs to run on expensive commodity hardware and cannot deal with failures. (No, HDFS is expected to run on cheap, not expensive, commodity hardware where failures are common.)
B. HDFS is to handle large files and block size is 128 MB by default. (Yes, HDFS is designed for large files, and the default block size is 128 MB.)
C. HDFS supports multi-users to write one file simultaneously. (No, HDFS does not support multi-user writes, which is different from GFS.)
D. In HDFS, low latency is more important than high sustained bandwidth. (No, low latency is not prioritized in the HDFS design; high sustained bandwidth is.)
What are the design motivations of HDFS?
- runs on many inexpensive commodity machines, where failures are common
- many big files
- two types of reads: large streaming, small random
- high sustained bandwidth more important than low latency.
What are the components of HDFS architecture?
What are the differences among DFS, NFS and CFS?
DFS - A distributed file system has its components spread across multiple systems. NFS is inherently a distributed file system as well: the client component is on a different system than the underlying physical storage or its management.
NFS - Files are not local. They are served over a network, with the physical storage units and their management hosted by a different entity.
CFS - Built by pooling several discrete components, typically multiple servers and multiple disks, working together to provide a unified namespace. A client is not aware of the physical boundaries that make up the file system.