scale-up (vertical scaling)
upgrade existing machine (SMP,MPP)
scale-out (horizontal scaling)
adding more machines to network, unlimited scaling
Symmetric Multi-processing (SMP)
traditional PC, multiple processors share same RAM and storage
Massively Parallel Processing (MPP)
Each processor has its own dedicated RAM and storage (vendor lock-in)
Cluster Architecture
many computers connected to work as a single system.
how are nodes connected in cluster
usually gigabit ethernet, 8-64 per rack
Only pro of MPP over cluster
faster message passing between nodes. Ideal for single, vertical solutions like data warehousing
commodity hardware
standardized, market priced hardware
distributed computing
tasks split into smaller units processed simultaneously across machines
challenges of distributed computing (4) (ARFA)
Assigning SPLIT tasks,
Resource Allocation, Fault-tolerance, Aggregating (results)
solution to challenges of distributed computing
Use a framework to hide complexity of distributed computing from developers
Grid computing
What is HPC?
high-performance computing (GPU intensive)
What is NIST reference architecture for?
How a Big Data system should be designed
5 Components of NIST Reference Architecture
What happens in application component?
Programs are written to process data from collection to visualization and user access
What happens in Data Provider component besides just data collection? (4) (Suck My Ass!)
4 components professor adds to NIST
Batch vs Stream
Running analytical algorithms on:
Lamda Architecture
Two pipelines:
1. Cold path (batch)
2. Hot path (real-time)
Kappa Architecture
One hot path (real-time)