S3
Advantages and disadvantages of data lakes
Adv: many sources, schema defined at analysis time (schema-on-read), lower cost than data warehouse solutions, tolerant of low-quality data
Disadv: unsuitable for transactional systems, needs cataloguing before analysis
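Sketch of the schema-on-read and low-quality-data points above, in plain Python. The record layout, field names (`device`, `temp`, `temperature`) and the `read_with_schema` helper are made up for illustration; they are not from any AWS service or API.

```python
import json

# Hypothetical raw records as they might land in a data lake: nothing is
# enforced on write, so field names vary and some lines are malformed.
RAW_LINES = [
    '{"device": "a1", "temp": "21.5", "ts": 1}',
    '{"device": "b2", "temperature": 19, "ts": 2}',
    '{"device": "c3", "ts": 3}',
    'not valid json',
]


def read_with_schema(lines):
    """Apply a schema at read time: device (str), temp (float or None)."""
    rows = []
    for line in lines:
        try:
            record = json.loads(line)
        except json.JSONDecodeError:
            # Tolerate malformed records instead of failing the whole read.
            continue
        # Accept either field name and coerce strings to float.
        raw_temp = record.get("temp", record.get("temperature"))
        temp = float(raw_temp) if raw_temp is not None else None
        rows.append({"device": record["device"], "temp": temp})
    return rows


rows = read_with_schema(RAW_LINES)
for row in rows:
    print(row)
```

The same raw lines could be read later with a different schema (e.g. keeping `ts`), which is the point of deferring the schema to analysis time.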
Simple ML workflow
Security in S3
AWS Glue
Glue ETL capability
Glue jobs system
Glue FindMatches
Database Migration Service (DMS)
Athena
QuickSight
Kinesis
- E.g. lots of video data from few sources, or small amounts of data from many sources (IoT)
Kinesis Video Streams
Kinesis Data Streams
Kinesis Data Firehose
Kinesis Data Analytics
Glue vs Kinesis
Sample architecture from IoT device
Sample architecture from video camera
EMR
Elastic MapReduce
Apache Spark
EC2 for Machine Learning