What is big data?
A collection of large and complex data sets that are difficult to process using common database management tools or traditional data processing applications
What are the 4 Vs of big data?
Volume -> Data at rest
Velocity -> Data in Motion
Variety -> Data in many forms
Veracity -> Data in doubt
What are the two types of scaling? (the ability of a system to adapt to increased demands)
Horizontal scaling
Vertical scaling
What are the advantages and disadvantages of horizontal scaling?
Advantages:
- increases performance in small steps as needed
- financial investment is relatively small
- can scale out as much as needed
Disadvantages:
What are the advantages and disadvantages of vertical scaling?
Advantages:
- most software can easily take advantage of vertical scaling
- easy to install hardware within a single machine
Disadvantages:
- requires substantial financial investment
- system has to be more powerful to handle future workloads
- cannot be scaled up any further once a certain limit is reached
What are horizontal scaling platforms?
Peer-to-peer networks
Apache Hadoop
What are vertical scaling platforms?
Multicore processors
High performance computing (HPC) clusters
Graphics processing units
What is a peer-to-peer network?
Drawbacks:
- communication is a major bottleneck
What is Apache Hadoop?
Open-source software for storing and processing large datasets
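To make "processing large datasets" on a horizontally scaled platform more concrete, here is a minimal sketch of the map-and-reduce style of processing that Hadoop is known for. It is plain Python and does not use any Hadoop API; the function names and the idea of simulating nodes as list partitions are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(lines, num_nodes):
    """Deal the lines out round-robin, one partition per simulated node."""
    return [lines[i::num_nodes] for i in range(num_nodes)]


def map_word_counts(partition):
    """Map step: a node counts the words in its own partition only."""
    counts = Counter()
    for line in partition:
        counts.update(line.lower().split())
    return counts


def reduce_word_counts(partial_counts):
    """Reduce step: merge the per-node counts into one result."""
    return reduce(lambda a, b: a + b, partial_counts, Counter())


def word_count(lines, num_nodes=3):
    partitions = split_into_partitions(lines, num_nodes)
    # On a real cluster each partition would be processed on a different machine;
    # here the "nodes" are simulated one after another in a single process.
    partials = [map_word_counts(p) for p in partitions]
    return reduce_word_counts(partials)
```

Calling word_count(["big data", "data in motion"], num_nodes=2) gives the same counts as running it with one node. That is the point of the design: adding nodes changes how the work is divided, not the result.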
What are high performance computing (HPC) clusters?
Also known as supercomputers, with thousands of processing cores
Built with powerful hardware optimized for speed and throughput
What are multicore CPUs?
One machine with dozens of processing cores
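As a rough illustration of how software can use the cores of a single machine, the sketch below splits a computation into chunks and hands them to one worker process per core. It uses only Python's standard library; the helper names (split_into_chunks, sum_of_squares, parallel_sum_of_squares) are hypothetical and chosen for this example.

```python
import os
from concurrent.futures import ProcessPoolExecutor


def split_into_chunks(numbers, num_chunks):
    """Split the input into at most num_chunks pieces of roughly equal size."""
    numbers = list(numbers)
    size = max(1, -(-len(numbers) // num_chunks))  # ceiling division
    return [numbers[i:i + size] for i in range(0, len(numbers), size)]


def sum_of_squares(chunk):
    """The work done by one core: sum the squares of its chunk."""
    return sum(n * n for n in chunk)


def parallel_sum_of_squares(numbers, workers=None):
    """Run sum_of_squares on every chunk, one worker process per core."""
    workers = workers or os.cpu_count() or 1
    chunks = split_into_chunks(numbers, workers)
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(sum_of_squares, chunks))
```

When run as a script, the call to parallel_sum_of_squares should sit under an if __name__ == "__main__": guard, since worker processes may re-import the module on some platforms. The speed-up is bounded by the number of cores in the one machine, which is the vertical scaling limit noted earlier.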