Week 11 - MapReduce Flashcards

(8 cards)

1
Q

What are the steps involved in MapReduce?

A

Input
Split
Mapping
Shuffling
Reducing
Output
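
The six steps above can be sketched as a single-process word count. This is only an illustration, not Hadoop code: the function names (split_input, map_phase, shuffle, reduce_phase, word_count) and the sample text are assumptions made up for this card.

```python
from collections import defaultdict


def split_input(text, num_splits):
    """Split: divide the input lines into num_splits chunks."""
    lines = text.splitlines()
    return [lines[i::num_splits] for i in range(num_splits)]


def map_phase(lines):
    """Mapping: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for line in lines for word in line.split()]


def shuffle(mapped_splits):
    """Shuffling: group intermediate values by key across all splits."""
    grouped = defaultdict(list)
    for pairs in mapped_splits:
        for key, value in pairs:
            grouped[key].append(value)
    return grouped


def reduce_phase(grouped):
    """Reducing: sum the grouped values for each key."""
    return {key: sum(values) for key, values in grouped.items()}


def word_count(text, num_splits=2):
    splits = split_input(text, num_splits)       # Input -> Split
    mapped = [map_phase(s) for s in splits]      # Mapping
    grouped = shuffle(mapped)                    # Shuffling
    return reduce_phase(grouped)                 # Reducing -> Output


counts = word_count("Deer Bear River\nCar Car River\nDeer Car Bear")
print(counts)
```

In a real cluster each split is mapped on a different node and the shuffle moves data over the network; here everything runs in one process so the order of the steps is easy to follow.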

2
Q

What is Hive?

A

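Hive queries are written in HiveQL (HQL), a query language similar to SQL. Its tables are stored on HDFS as flat files.
Developed by Facebook and now open source. It provides the SQL abstraction needed to integrate SQL-like queries into the underlying Java, so queries do not have to be implemented in the low-level Java API.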

3
Q

What is Pig?

A

Pig scripts are written in Pig Latin, a data-flow language that is a bit like Perl. Developed by Yahoo, Pig can execute its Hadoop jobs in MapReduce, Tez, or Spark. It can be extended with user-defined functions (UDFs), which can be written in a variety of languages.

4
Q

What are the differences between Hive and Pig?

A

Hive

Used by analysts
Used for reporting
Declarative SQL-like language
Works on the server side of a cluster
For structured data

Pig

Used by programmers and researchers
Used for programming
Procedural data-flow language
Works on the client side of a cluster
For semi-structured data

5
Q

What are the MapReduce architecture components?

A
  • Job client
  • Job tracker: monitors resources and coordinates jobs; tracks the health of all task trackers (transferring jobs to other nodes once failures are found); monitors the execution percentage of jobs and resource availability.
  • Task tracker: periodically sends a heartbeat with resource information and job execution status to the JobTracker; receives and executes commands from the JobTracker (start new tasks or kill existing tasks).
  • Task: map task and reduce task
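
The heartbeat relationship between the job tracker and the task trackers can be sketched as a toy simulation. This is not the Hadoop API: the class and method names, the report fields, and the timeout value are all assumptions chosen for illustration.

```python
class TaskTracker:
    """Toy task tracker: runs tasks and reports its state via heartbeats."""

    def __init__(self, name, free_slots):
        self.name = name
        self.free_slots = free_slots
        self.running = []

    def heartbeat(self):
        # Resource information plus execution status, as sent to the job tracker.
        return {"tracker": self.name,
                "free_slots": self.free_slots,
                "running": list(self.running)}

    def execute(self, command, task):
        # Commands received from the job tracker: start or kill a task.
        if command == "start":
            self.running.append(task)
            self.free_slots -= 1
        elif command == "kill" and task in self.running:
            self.running.remove(task)
            self.free_slots += 1


class JobTracker:
    """Toy job tracker: records heartbeats and reassigns tasks on failure."""

    def __init__(self, timeout=3):
        self.timeout = timeout      # hypothetical: max silence before a node counts as failed
        self.last_seen = {}         # tracker name -> time of last heartbeat
        self.assignments = {}       # tracker name -> tasks it last reported

    def receive_heartbeat(self, report, now):
        self.last_seen[report["tracker"]] = now
        self.assignments[report["tracker"]] = report["running"]

    def failed_trackers(self, now):
        return [name for name, seen in self.last_seen.items()
                if now - seen > self.timeout]

    def reassign(self, failed_name, target):
        # Transfer the failed node's tasks to a healthy task tracker.
        for task in self.assignments.pop(failed_name, []):
            target.execute("start", task)
        self.last_seen.pop(failed_name, None)


jt = JobTracker(timeout=3)
a, b = TaskTracker("node-a", 2), TaskTracker("node-b", 2)
a.execute("start", "map-0")
b.execute("start", "map-1")
jt.receive_heartbeat(a.heartbeat(), now=0)
jt.receive_heartbeat(b.heartbeat(), now=0)
jt.receive_heartbeat(b.heartbeat(), now=5)   # node-a has gone silent
failed = jt.failed_trackers(now=5)
for name in failed:
    jt.reassign(name, b)
```

After this run the task that node-a was executing has been restarted on node-b, which mirrors the "transfer jobs to other nodes once failures are found" behaviour described on the card.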
6
Q

Which of the following maps input k,v pairs to a set of intermediate k,v pairs?

A. Mapper
B. Reducer
C. Both A and B

A

A. Mapper

7
Q

Q2: What is the correct sequence of data flow in MapReduce?

a. InputFormat
b. Mapper
c. Combiner
d. Reducer
e. Partitioner
f. OutputFormat

A. abcdfe
B. abcedf
C. acdefb
D. abcdef

A

B. abcedf
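
The order in answer B can be traced in a small in-memory sketch. It is a simplification under stated assumptions: every function name is made up for this card, each input line is treated as its own map task so the combiner has something local to aggregate, and zlib.crc32 stands in for a real hash function.

```python
import zlib
from collections import Counter, defaultdict


def input_format(text):
    """a. InputFormat: turn raw input into (line number, line) records."""
    return list(enumerate(text.splitlines()))


def mapper(record):
    """b. Mapper: emit an intermediate (word, 1) pair per word."""
    _, line = record
    return [(word, 1) for word in line.split()]


def combiner(pairs):
    """c. Combiner: pre-aggregate one map task's output locally."""
    local = Counter()
    for key, value in pairs:
        local[key] += value
    return list(local.items())


def partitioner(key, num_reducers):
    """e. Partitioner: choose which reducer receives this key."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


def reducer(key, values):
    """d. Reducer: merge all values that arrived for one key."""
    return key, sum(values)


def output_format(results):
    """f. OutputFormat: write tab-separated key/value lines."""
    return "\n".join(f"{key}\t{value}" for key, value in sorted(results))


def run(text, num_reducers=2):
    partitions = [defaultdict(list) for _ in range(num_reducers)]
    for record in input_format(text):                        # a
        for key, value in combiner(mapper(record)):          # b, c
            target = partitioner(key, num_reducers)          # e
            partitions[target][key].append(value)
    results = [reducer(key, values)                          # d
               for part in partitions
               for key, values in part.items()]
    return output_format(results)                            # f


report = run("a b a\nb c\na")
print(report)
```

The combiner runs before the partitioner because it works on a single map task's local output, while the partitioner decides where that already-combined output is sent.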

8
Q

Q3: The total number of partitioners is equal to

A. The number of reducers
B. The number of mappers
C. The number of combiners
D. All of the above

A

A. The number of reducers
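
Why the two counts match can be seen from a hash-style partition function: it takes the key's hash modulo the number of reducers, so it can only ever produce that many distinct partitions. The sketch below is an assumption-laden illustration; the function name, the sample keys, and the use of zlib.crc32 in place of Java's hashCode are all made up for this card.

```python
import zlib


def partition(key, num_reducers):
    """Return a partition index in the range 0 .. num_reducers - 1."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers


num_reducers = 3
keys = ["apple", "banana", "cherry", "apple", "date", "banana"]

# Every key lands in exactly one partition, and repeated keys always
# land in the same one, so one reducer sees all values for a key.
assignment = {key: partition(key, num_reducers) for key in keys}
print(assignment)
```

Changing num_reducers changes the number of possible partition indices with it, which is the relationship the card is testing.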
