Which technology is MapReduce a part of?
Hadoop. Hadoop consists of HDFS and MapReduce.
What is the data format of the input in MapReduce?
(Key, Value) pairs, of arbitrary serializable types, that should fit in memory
What strategy should be employed when cluster components fail during computation in MapReduce?
Split the computation into many small parallel tasks. If a task fails to deliver its result, restart only that task rather than the whole job.
Where does the data come from in MapReduce?
The distributed file system (HDFS in Hadoop)
What are the 4 steps in the Map task?
What is the main operation of the Shuffle (Master controller) task?
It keeps track of the (key, value) pairs in the output of all Map tasks. It then performs a distributed group-by-key operation, which outputs each key together with the list of its values.
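The group-by-key step can be sketched on a single machine as follows. This is an illustration only: the function name shuffle and the sample data are assumptions, and a real shuffle partitions keys across many nodes.

```python
from collections import defaultdict


def shuffle(map_outputs):
    """Group the (key, value) pairs from all Map tasks by key."""
    groups = defaultdict(list)
    for task_output in map_outputs:      # one list of pairs per Map task
        for key, value in task_output:
            groups[key].append(value)
    return dict(groups)


# Hypothetical output of two Map tasks.
map_outputs = [
    [("w1", 1), ("w2", 1)],
    [("w3", 1), ("w2", 1), ("w3", 1)],
]
grouped = shuffle(map_outputs)
print(grouped)
```

Each key now appears once, paired with the list of every value emitted for it, which is the input a Reduce task expects.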
What 3 qualities define the Reduce task?
What are the 3 switch levels of data acquisition from HDFS to the MapReduce task, in order of fastest to slowest?
In general terms, what does the MapReduce task do?
It aggregates all data entries that share the same key into a single, user-defined (key, value) pair, where the value is often a count. Example: given the input “w1, w2, w3, w2, w3, w3, w3”, the output could be “(w1, 1), (w2, 2), (w3, 4)”.
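The word-count example above can be reproduced with a small in-memory sketch. The names map_fn, reduce_fn and map_reduce, and the input key "doc1", are hypothetical; this is not the Hadoop API, only an illustration of the map, group-by-key and reduce phases.

```python
from collections import defaultdict


def map_fn(_key, text):
    """Map: emit (word, 1) for every word in the input value."""
    for word in text.split(", "):
        yield (word, 1)


def reduce_fn(key, values):
    """Reduce: collapse all values of one key into a single count."""
    return (key, sum(values))


def map_reduce(records):
    groups = defaultdict(list)
    for key, value in records:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)   # group by key (shuffle)
    # Keys are sorted only to make the output order predictable.
    return [reduce_fn(k, vs) for k, vs in sorted(groups.items())]


result = map_reduce([("doc1", "w1, w2, w3, w2, w3, w3, w3")])
print(result)  # [('w1', 1), ('w2', 2), ('w3', 4)]
```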