What is the NameNode
The term in the HDFS layer of the Hadoop framework for the metadata server on the master node. The NameNode runs on the master node, which also typically hosts the ResourceManager / JobTracker daemon.
Works alongside YARN to track which DataNodes (slave nodes) have available resources in HDFS
What are the 5 pillars of Hadoop?
1) Data Management
2) Data Access
3) Data Governance and Integration
4) Security
5) Operations
When should you not use the Hadoop framework?
❑ Low-latency data access: quick access to small parts of data.
❑ Multiple data modifications: Hadoop is a better fit only if we are primarily concerned with reading data, not writing it.
❑ Lots of small files: Hadoop is a better fit in scenarios where we have a few large files.
Where is job tracker stored
on the Namenode
What does the Master node hold
NameNode (HDFS) and ResourceManager (Map-Reduce)
where is Yarn located
Yet another resource negotiator (YARN) is located on the name-node
What are the largest challenges (per the powerpoint) facing the big data space?
❑ Lack of skilled staff
❑ Data governance issues – With so much data available, it becomes even more critical to have a framework in place for deciding what data belongs in the system. However, just 30% of the companies surveyed by TDWI said that data governance teams were heavily involved in Big Data management.
❑ Organizational readiness – As with business intelligence, successfully analyzing Big Data takes more than just installing software and other tools. The entire organization needs to be on the same page, and there must be a clearly articulated strategy built around actual business goals.
What are the 7 Hadoop file formats?
What is YARN?
A framework for job scheduling and cluster
resource management. It is the resource-management layer of
Hadoop.
What is MapReduce? Is it the storage or processing layer of Hadoop?
A YARN-based system for parallel processing of large data sets. It is the data processing layer of Hadoop.
What is the Hadoop HDFS get syntax?
hdfs dfs -get [-crc] {source} {local destination}
❑ Hadoop HDFS get Command Description
This HDFS fs command copies the file or directory in HDFS identified by the source to the local file system path identified by the local destination. If the source is a pattern, every matching file in HDFS is copied to the local destination. (Concatenating the matching files into one single merged local file is done by the related getmerge command.)
❑ Hadoop HDFS get Command Example:
hdfs dfs -get /user/dataflair/dir2/sample /home/dataflair/Desktop
What are the read/write file commands in Hadoop?
hdfs dfs -text {file_name} # prints the file (decompressing if needed) as text
hdfs dfs -cat /hadoop/test # cat command: print file contents
hdfs dfs -appendToFile {local source} {destination} # appends a local file to an HDFS file
How do you copy a file between the local file system and HDFS?
hdfs dfs -copyFromLocal {local source} {HDFS destination} # local -> HDFS
hdfs dfs -put {local source} {HDFS destination} # local -> HDFS
hdfs dfs -get {HDFS source} {local destination} # HDFS -> local
hdfs dfs -copyToLocal {HDFS source} {local destination} # HDFS -> local
Create a directory in a specified HDFS location. This command does not fail even if the directory already exists.
hdfs dfs -mkdir -p {destination e.g: /hadoop2}
What are the three stages of MapReduce? what order do they go in?
A MapReduce program executes in three stages, in order: the map stage, the shuffle stage, and the reduce stage.
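The three stages can be sketched locally with a plain shell pipeline (a simplified word-count sketch, not a real cluster job; the sample input and the use of sort/awk to stand in for Hadoop's shuffle and reduce are illustrative assumptions):

```shell
# Word count via the three MapReduce stages, simulated locally.
# Map stage: split each line into words and emit <word, 1> pairs.
# Shuffle stage: `sort` brings identical keys together, as Hadoop's
#   shuffle does between mappers and reducers.
# Reduce stage: sum the counts for each word.
printf 'hello world\nhello hadoop\n' \
  | tr -s ' ' '\n' \
  | awk '{print $0 "\t1"}' \
  | sort \
  | awk -F '\t' '{count[$1] += $2} END {for (w in count) print w, count[w]}' \
  | sort
# -> hadoop 1
#    hello 2
#    world 1
```

This is essentially the Hadoop Streaming model: any program that reads lines on stdin and writes key/value lines on stdout can act as a mapper or reducer, with the framework performing the shuffle in between.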