What are the steps involved in MapReduce?
Input
Split
Mapping
Shuffling
Reducing
Output
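The six steps above can be sketched as a word count in plain Python. This is only an in-memory illustration, not Hadoop code: the function names (split_input, map_phase, shuffle, reduce_phase, word_count) and the sample text are hypothetical, and a real job would run the map and reduce steps on different machines.

```python
from collections import defaultdict

def split_input(text, num_splits):
    """Split: divide the input lines into roughly equal chunks."""
    lines = text.splitlines()
    size = max(1, -(-len(lines) // num_splits))  # ceiling division
    return [lines[i:i + size] for i in range(0, len(lines), size)]

def map_phase(split):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in split for word in line.split()]

def shuffle(mapped_splits):
    """Shuffling: group all values by key across every mapper's output."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped

def reduce_phase(grouped):
    """Reducing: collapse each key's list of values into one total."""
    return {key: sum(values) for key, values in grouped.items()}

def word_count(text, num_splits=2):
    """Run Input -> Split -> Mapping -> Shuffling -> Reducing -> Output."""
    splits = split_input(text, num_splits)
    mapped = [map_phase(s) for s in splits]
    grouped = shuffle(mapped)
    return reduce_phase(grouped)

if __name__ == "__main__":
    # Input: a small hypothetical text of three lines.
    text = "deer bear river\ncar car river\ndeer car bear"
    # Output: one (word, count) line per key.
    for word, count in sorted(word_count(text).items()):
        print(word, count)
```

Each function corresponds to one step, so the intermediate (key, value) pairs can be inspected by calling the phases one at a time.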
What is Hive?
Hive's query language, HiveQL (HQL), is similar to SQL. Its tables are stored on HDFS as flat files.
It was developed by Facebook and is open source. It provides the SQL abstraction needed to integrate SQL-like queries into the underlying Java, without implementing them in the low-level Java API.
What is Pig?
Pig scripts are written in Pig Latin, a dataflow language that is a bit like Perl. Pig was developed by Yahoo and can execute its Hadoop jobs in MapReduce, Tez, or Spark. It can be extended with user-defined functions (UDFs), which can be written in a variety of languages.
What are the differences between Hive and Pig?
Hive
Used by analysts
Used for reporting
Declarative SQL-like language
Works on the server side of a cluster
For structured data
Pig
Used by programmers and researchers
Used for programming
Procedural data-flow language
Works on the client side of a cluster
For semi-structured data
What are the MapReduce Architecture Components?
Q1: Which of the following maps input key-value (k,v) pairs to a set of intermediate k,v pairs?
A. Mapper
B. Reducer
C. Both A and B
A. Mapper
Q2: What is the correct sequence of data flow in MapReduce?
a. InputFormat
b. Mapper
c. Combiner
d. Reducer
e. Partitioner
f. OutputFormat
A. abcdfe
B. abcedf
C. acdefb
D. abcdef
B. abcedf
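The order in answer B (Mapper, then Combiner, then Partitioner, then Reducer) can be traced in a small Python sketch. Everything here is an illustration under stated assumptions: each input line stands in for one split, toy_partitioner is a made-up deterministic rule rather than Hadoop's default, and run_job is a hypothetical driver.

```python
from collections import defaultdict

def mapper(line):
    # Mapper: one (word, 1) pair per word in the input record.
    return [(word, 1) for word in line.split()]

def combiner(pairs):
    # Combiner: local pre-aggregation of a single mapper's output,
    # so fewer pairs have to be sent on to the reducers.
    totals = defaultdict(int)
    for key, value in pairs:
        totals[key] += value
    return list(totals.items())

def toy_partitioner(key, num_reducers):
    # Partitioner: picks the reducer for a key. This byte-sum rule is a
    # hypothetical stand-in chosen only because it is deterministic.
    return sum(key.encode("utf-8")) % num_reducers

def reducer(key, values):
    # Reducer: final aggregation of all values that share a key.
    return key, sum(values)

def run_job(lines, num_reducers=2):
    # One bucket of grouped values per reducer.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for line in lines:  # each line plays the role of one input split
        for key, value in combiner(mapper(line)):
            buckets[toy_partitioner(key, num_reducers)][key].append(value)
    output = {}
    for bucket in buckets:
        for key in sorted(bucket):
            out_key, total = reducer(key, bucket[key])
            output[out_key] = total
    return output

if __name__ == "__main__":
    print(run_job(["a a b", "b a"]))
```

Because the combiner runs before partitioning, the split "a a b" sends two pairs onward instead of three, while the final totals are unchanged.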
Q3: The total number of partitioners is equal to
A. The number of reducers
B. The number of mappers
C. The number of combiners
D. All of the above
A. The number of reducers
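Answer A follows from how keys are assigned: every key is mapped to a partition index in the range 0 to (number of reducers - 1), so there is exactly one partition per reducer. The Python sketch below mirrors the rule used by Hadoop's default HashPartitioner, (hashCode & Integer.MAX_VALUE) % numReduceTasks. As an assumption for illustration, the key hash emulates Java's String.hashCode(); real Hadoop key types such as Text may compute their hash differently. The helper names are hypothetical.

```python
def java_string_hash(s):
    """Emulate Java's String.hashCode() as an unsigned 32-bit value.

    Assumption: keys contain only Basic Multilingual Plane characters,
    so each Python character corresponds to one Java char.
    """
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    return h

def hash_partition(key, num_reducers):
    # Clear the sign bit, then take the remainder, so the result is
    # always a valid reducer index.
    return (java_string_hash(key) & 0x7FFFFFFF) % num_reducers

def partition_all(keys, num_reducers):
    # Build one partition (list of keys) per reducer.
    partitions = [[] for _ in range(num_reducers)]
    for key in keys:
        partitions[hash_partition(key, num_reducers)].append(key)
    return partitions

if __name__ == "__main__":
    for index, part in enumerate(partition_all(["deer", "bear", "river", "car"], 3)):
        print(index, part)
```

Changing num_reducers changes the number of partitions with it, and a given key always lands in the same partition no matter which mapper emitted it.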