What is scaling up?
Is scaling up a short- or long-term fix?
Short-term fix
What is scaling out?
Adding servers for parallel computing
Using lots of small machines in a cluster
Is scaling out a short- or long-term fix?
Long-term, as more servers can be added when needed
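As a rough illustration of scaling out, the sketch below splits a dataset into one chunk per node, lets each node work on its own chunk independently, then combines the partial results. It is a minimal sketch under stated assumptions: the "nodes" are simulated as plain function calls in a single process (on a real cluster each chunk would run on a separate machine), and `partition`, `node_work` and `scale_out` are hypothetical names, not part of any framework.

```python
from typing import List, Sequence

def partition(data: Sequence[int], num_nodes: int) -> List[List[int]]:
    """Split the dataset into one chunk per (hypothetical) node."""
    chunks: List[List[int]] = [[] for _ in range(num_nodes)]
    for i, item in enumerate(data):
        chunks[i % num_nodes].append(item)  # round-robin assignment
    return chunks

def node_work(chunk: List[int]) -> int:
    """Work one node does on its own chunk (here: a sum of squares)."""
    return sum(x * x for x in chunk)

def scale_out(data: Sequence[int], num_nodes: int) -> int:
    # Simulated sequentially; each call stands in for a separate machine.
    partials = [node_work(chunk) for chunk in partition(data, num_nodes)]
    # Combine the partial results into the final answer.
    return sum(partials)

data = list(range(1, 101))
print(scale_out(data, num_nodes=4))   # 338350
print(scale_out(data, num_nodes=10))  # 338350
```

Adding nodes only makes each chunk smaller; the combined result stays the same, which is why capacity can keep growing by adding machines.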
When is scaling up a good option?
Name a batch-only framework
Apache Hadoop
Name two stream-only frameworks
Name two hybrid frameworks
Datasets in batch processing are typically:
Batch processing is well suited for
When is batch processing not appropriate?
When processing time is important
Batch processing involves
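A minimal sketch of the batch model, assuming a word count as the job: the whole bounded dataset is stored before processing starts, and the result is produced only once every record has been read. `batch_word_count` is a hypothetical name used for illustration, not Hadoop's API.

```python
from typing import Dict, List

def batch_word_count(records: List[str]) -> Dict[str, int]:
    """Count words over a complete, bounded dataset in a single run."""
    counts: Dict[str, int] = {}
    for line in records:  # every record is already stored before the job starts
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts  # nothing is emitted until the whole dataset has been processed

dataset = ["big data batch", "batch jobs run to completion", "big batch"]
result = batch_word_count(dataset)
print(result["batch"], result["big"])  # 3 2
```

Because the answer appears only after the full pass, this model suits large, complete datasets but not cases where processing time is important.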
Stream processing systems
Datasets in stream processing are considered:
Unbounded
In stream processing, what does the total dataset refer to?
The total amount of data that has entered the system so far
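The stream model can be sketched the same way, again assuming a word count as the job. Records arrive one at a time from a possibly endless source, state is updated per record, and each emitted count covers only the data that has entered the system so far. `stream_word_count` is a hypothetical name; `itertools.cycle` stands in for an unbounded source and `itertools.islice` stops the demo after a few results.

```python
from itertools import cycle, islice
from typing import Dict, Iterable, Iterator, Tuple

def stream_word_count(records: Iterable[str]) -> Iterator[Tuple[str, int]]:
    """Update counts one record at a time and emit each running count immediately."""
    counts: Dict[str, int] = {}
    for line in records:  # the input may never end (unbounded dataset)
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
            # This count reflects only the total dataset seen so far.
            yield word, counts[word]

# cycle() repeats forever, standing in for an unbounded event source.
events = cycle(["big data", "data stream"])
for word, seen_so_far in islice(stream_word_count(events), 6):
    print(word, seen_so_far)
```

Unlike the batch version, results are available while data is still arriving, so there is no "end of input" to wait for.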
Hybrid frameworks attempt to offer
A general solution for data processing
For Apache Hadoop describe the following:
For Apache Spark describe the following:
Apache Spark: benefits of in-memory processing
Runs up to 100x faster than Hadoop MapReduce when processing in memory
Runs up to 10x faster than traditional MapReduce when processing on disk
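To make the in-memory benefit concrete, the toy sketch below runs an iterative job that needs the full dataset on every pass. One source re-reads the file from disk each time, roughly how chained MapReduce jobs exchange data through storage; the other reads once and serves later passes from memory, in the spirit of caching an RDD in Spark. This is plain Python, not Spark's API: `DiskSource`, `CachedSource` and `iterative_job` are hypothetical names, and the sketch counts disk reads rather than timing anything, so it does not reproduce the speed-up figures quoted above.

```python
import os
import tempfile
from typing import List, Optional

def write_dataset(path: str, values: List[int]) -> None:
    """Store one integer per line in a text file."""
    with open(path, "w") as f:
        f.write("\n".join(str(v) for v in values))

class DiskSource:
    """Re-reads the whole dataset from disk on every pass."""
    def __init__(self, path: str) -> None:
        self.path = path
        self.disk_reads = 0

    def read(self) -> List[int]:
        self.disk_reads += 1
        with open(self.path) as f:
            return [int(line) for line in f]

class CachedSource(DiskSource):
    """Reads from disk once, then serves every later pass from memory."""
    def __init__(self, path: str) -> None:
        super().__init__(path)
        self._cache: Optional[List[int]] = None

    def read(self) -> List[int]:
        if self._cache is None:
            self._cache = super().read()  # the only disk access
        return self._cache

def iterative_job(source: DiskSource, iterations: int) -> int:
    """A toy iterative job: every iteration needs the full dataset again."""
    total = 0
    for _ in range(iterations):
        total += sum(source.read())
    return total

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "data.txt")
    write_dataset(path, list(range(10)))
    disk, cached = DiskSource(path), CachedSource(path)
    print(iterative_job(disk, 5), disk.disk_reads)      # 225 5
    print(iterative_job(cached, 5), cached.disk_reads)  # 225 1
```

Both sources give the same answer; the difference is how many times the job goes back to storage, which is the cost in-memory processing avoids for iterative workloads.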
Data sharing is slow in MapReduce due to
Ways to create RDDs