Name the four data processing methods
a) batch, for processing massive datasets all at once
b) periodic, for unpredictable workloads, where batches are processed at irregular times (for example, once enough data has accumulated)
c) near real-time, for small bursts of data that must be collected and processed within minutes
d) real-time, for tiny bursts of data that must be processed continuously
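A minimal sketch of how three of these methods differ in shape, assuming a simple list of numeric readings and a running sum as the "processing" step. The function names are hypothetical, and near real-time is simplified to fixed-size micro-batches (real systems usually batch by time window, within minutes). Periodic processing is omitted because it looks like batch processing that runs at irregular times.

```python
from typing import Iterable, Iterator, List


def batch_process(records: List[int]) -> int:
    """Batch: wait until the whole dataset is collected, then process it in one pass."""
    return sum(records)


def micro_batches(stream: Iterable[int], size: int) -> Iterator[List[int]]:
    """Near real-time (simplified): group the stream into small batches.

    Assumption: batches are cut by record count, not by a time window.
    """
    batch: List[int] = []
    for record in stream:
        batch.append(record)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:  # flush the final, possibly smaller, batch
        yield batch


def real_time_process(stream: Iterable[int]) -> Iterator[int]:
    """Real-time: update the result as each record arrives."""
    total = 0
    for record in stream:
        total += record
        yield total  # emit a running total after every record


if __name__ == "__main__":
    readings = [3, 1, 4, 1, 5]
    print(batch_process(readings))                          # 14
    print([sum(b) for b in micro_batches(readings, 2)])     # [4, 5, 5]
    print(list(real_time_process(readings)))                # [3, 4, 8, 9, 14]
```

The batch version produces one answer only after all data is present, while the real-time version produces an up-to-date answer after every record.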
Name the four Hadoop modules
a) Common (or Core)
b) the Hadoop Distributed File System (or HDFS)
c) Yet Another Resource Negotiator (or YARN)
d) MapReduce
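To make the MapReduce model concrete, here is a minimal in-memory sketch of the classic word-count job in plain Python. It is not Hadoop's Java API: the function names are hypothetical, and the "shuffle" step stands in for the grouping that the framework performs between the map and reduce phases across the cluster.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line: str) -> Iterator[Tuple[str, int]]:
    # Map: emit a (word, 1) pair for every word in one input line.
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    # Shuffle: group all emitted values by key.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    # Reduce: combine all values for one key into a single result.
    return key, sum(values)


def word_count(lines: Iterable[str]) -> Dict[str, int]:
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["the cat sat", "The dog sat"]))
    # {'the': 2, 'cat': 1, 'sat': 2, 'dog': 1}
```

Because each map call and each reduce call only sees its own input, Hadoop can run them in parallel on different nodes, with HDFS supplying the input splits and YARN scheduling the tasks.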
Name the difference in purpose between Hive and Presto
Hive is optimised for query throughput, whereas Presto is optimised for interactivity (low-latency queries)