Some of the challenges of creating big data applications
Best approach to scaling problems?
Use multiple database servers and spread the table across all servers.
Each server will have a subset of the data.
Scaling using multiple databases. How?
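One common way to spread a table across servers is to hash each row's key and use the hash to pick a server. The sketch below is only an illustration of that idea: the server names, the `shard_for` helper, and the use of a video view-count table are assumptions for the example, and in-memory dictionaries stand in for the real databases.

```python
import hashlib

# Hypothetical server names; a real deployment would hold connection details.
SERVERS = ["db-0", "db-1", "db-2", "db-3"]

def shard_for(key: str, servers=SERVERS) -> str:
    """Pick the server that owns `key` by hashing the key."""
    digest = hashlib.md5(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(servers)
    return servers[index]

# In-memory stand-ins for the databases: each holds only a subset of the rows.
shards = {server: {} for server in SERVERS}

def increment_views(video_id: str, amount: int = 1) -> None:
    """Route the update to the single server that owns this video."""
    shard = shards[shard_for(video_id)]
    shard[video_id] = shard.get(video_id, 0) + amount

def get_views(video_id: str) -> int:
    """Read the count from the server that owns this video."""
    return shards[shard_for(video_id)].get(video_id, 0)
```

With this scheme the application must know the mapping from key to server, and changing the number of servers changes where most keys live, which is one reason sharding adds complexity.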
Fault-tolerance issues
With many database servers, it becomes common for the hard drive in one of them to fail
Our system is not resilient to hardware errors
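A rough calculation shows why failures become frequent as servers are added. The sketch assumes each server's disk fails independently with the same probability over some period; the 2% figure is an illustrative assumption, not a measured rate.

```python
def prob_at_least_one_failure(num_servers: int, per_disk_failure_prob: float) -> float:
    """Probability that at least one of `num_servers` disks fails,
    assuming independent failures with equal probability."""
    return 1.0 - (1.0 - per_disk_failure_prob) ** num_servers

# Illustrative assumption: each disk has a 2% chance of failing in the period.
ASSUMED_FAILURE_PROB = 0.02

for n in (1, 10, 100, 1000):
    p = prob_at_least_one_failure(n, ASSUMED_FAILURE_PROB)
    print(f"{n:>5} servers: {p:.1%} chance of at least one disk failure")
```

Even with a small per-disk probability, the chance that some disk in the cluster fails grows quickly with the number of servers.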
Data corruption issues
At some point we deploy code with a bug: instead of incrementing each video's view count by one, our code increments it by two. We notice the mistake only 24 hours later.
Now we have corrupted data: every video watched in the past 24 hours has an inflated view count. How do we solve this?
Our system is not resilient to human errors
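The following sketch illustrates why the damage is hard to undo when only the running total is stored. The `apply_views` helper and the video identifier are hypothetical; the point is that two different histories can leave exactly the same stored value.

```python
def apply_views(counts: dict, video_id: str, num_views: int, increment: int) -> dict:
    """Update the stored counter in place, keeping no record of individual views."""
    counts[video_id] = counts.get(video_id, 0) + num_views * increment
    return counts

# History A: 10 real views, all recorded by the buggy code (+2 each).
history_a = apply_views({}, "video-1", num_views=10, increment=2)

# History B: 20 real views, all recorded by the correct code (+1 each).
history_b = apply_views({}, "video-1", num_views=20, increment=1)

# Both histories end with the same stored count, so the counter alone
# does not tell us how much of it came from the bug.
print(history_a == history_b)  # prints True
```

Because the stored count does not say which increments were wrong, correcting it requires information the table no longer holds.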
The desired properties of Big Data systems are related both to
Complexity and scalability
Complexity
generally used to characterize something with many parts where those parts interact with each other in multiple ways
Scalability
ability to maintain performance in the face of increasing data or load by adding resources to the system
A big data system must
Desired properties of a Big Data system