Processing: 24% (EMR, Spark, Hive, Lambda, Glue, ECS) Flashcards by Kate McKenzie

Name the four data processing methods

a) batch, for processing massive datasets all at once
b) periodic, for unpredictable workloads
c) near real-time, for small bursts of data that must be collected and processed within minutes
d) real-time, for tiny bursts of data that must be processed continually

Name the four Hadoop modules

a) Common (or Core)
b) the Hadoop Distributed File System (or HDFS)
c) Yet Another Resource Negotiator (or YARN)
d) MapReduce
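
As a memory aid for the MapReduce module, here is a minimal sketch of the map, shuffle and reduce steps in plain Python. It runs on one machine and does not use Hadoop's API; the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) and the sample input are hypothetical, chosen only for illustration.

```python
from collections import defaultdict


def map_phase(line):
    # Map: emit one (word, 1) pair for every word in the input line.
    return [(word.lower(), 1) for word in line.split()]


def shuffle_phase(pairs):
    # Shuffle: group all emitted values by key, so each key is reduced once.
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    # Reduce: collapse the values collected for one key into a single result.
    return key, sum(values)


def word_count(lines):
    # Illustrative single-process driver; on a real cluster the map and
    # reduce tasks would run in parallel on different nodes.
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


# Hypothetical sample input, not taken from the deck.
counts = word_count(["hive presto hive", "spark hive"])
print(counts)
```

In Hadoop itself the same three steps are spread across the cluster: HDFS holds the input blocks and YARN schedules the map and reduce tasks.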

Name the difference in purpose between Hive and Presto

Hive is optimised for query throughput, whereas Presto is optimised for interactive, low-latency queries

Be able to:

a) determine appropriate data processing solution requirements
b) design a solution for transforming and preparing data for analysis
c) automate and operationalize a data processing solution

(3 cards)