You decide to build a simple web analytics application to better understand the behavior of your users.
Requirements:
R1 - The application should track the number of views for any video you tell it to track
-> The application’s web server gets a message every time a tracked video is watched
R2 - Additionally, the application should be able to tell you at any point what the top 100 videos are by number of views
Desired Properties of a Big Data System
Batch Layer
Speed Layer
Serving Layer
Hadoop File System is the open source alternative to Google File System
(batch layer)
How distributed file systems work?
Hadoop MapReduce
The Split-Apply-Combine Approach
Each “Apply” operation can be performed independently of other “Apply” operations
Big Data infrastructure as a service
Elastic clouds
Examples of suppliers elastic clouds
Many machine learning tools can be used on top of this infrastructure (elastic clouds)
All of these tools implement state-of-the-art ML algorithms out of the box. Some of them are also extensible, i.e., you can implement your own algorithms
A Client / Server Model
1 Clients connect to a server over the Internet
2 Clients perform a request
3 Server issues response
4 Clients display response