For how many hours is data available in the rolling time window that a Kinesis Data Stream uses?
24 hours by default (can be increased for additional cost: to 7 days, or up to 365 days with long-term retention)
How many MB per second does a single shard in Kinesis allow for ingestion and consumption?
1 MB/s for ingestion
2 MB/s for consumption
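Note: the per-shard limits above can be turned into a quick sizing calculation. The following is a minimal sketch, assuming the limits are per-second rates and that throughput is the only sizing factor; the function name `shards_needed` and the sample numbers are hypothetical, not part of any AWS API.

```python
import math

# Per-shard limits as quoted above, interpreted as per-second rates.
INGEST_MB_PER_SHARD = 1.0
CONSUME_MB_PER_SHARD = 2.0


def shards_needed(ingest_mb_per_s: float, consume_mb_per_s: float) -> int:
    """Return the smallest shard count that covers both ingestion and consumption."""
    for_ingest = math.ceil(ingest_mb_per_s / INGEST_MB_PER_SHARD)
    for_consume = math.ceil(consume_mb_per_s / CONSUME_MB_PER_SHARD)
    # A stream always has at least one shard.
    return max(1, for_ingest, for_consume)


# Hypothetical workload: 3.5 MB/s written, 9 MB/s read.
print(shards_needed(3.5, 9.0))  # prints 5
```

In this example consumption is the bottleneck: 9 MB/s at 2 MB/s per shard needs 5 shards, while ingestion alone would need only 4.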
How many Shards does a Kinesis Stream have when newly created?
1
What’s the size of a single Kinesis Data Record?
1 MB
How quickly is data delivered using Kinesis Firehose?
Near-Real-Time: anywhere between 1 and 60 seconds, depending on the ingestion rate (i.e. how quickly the 1 MB buffer it uses fills up).
What are the 11 valid destinations for Kinesis Firehose?
How quickly is data delivered through Kinesis Streams?
In Real-Time (~ 200 ms)
Not to be confused with Kinesis Firehose, which delivers in Near-Real-Time only!
What’s the right product to use when (potentially complex) real-time SQL processing is required?
Kinesis Data Analytics
What are the six third-party big data products that Amazon EMR provides as a managed service?
Is Amazon EMR a Multi-AZ or Single-AZ product?
Single-AZ
What compute products can be used with Amazon EMR (i.e. which compute products are used to run EMR)?
EC2 & EKS
What’s the master node used for with Amazon EMR?
What are core nodes used for with Amazon EMR?
Note: losing a core node means losing its HDFS data and its task tracking => core nodes should not be run on Spot instances!
Note #2: Multi-node clusters have at least one core node.
What are task nodes used for with Amazon EMR?
Note: ideal to be run on Spot instances
What’s EMRFS?
S3-based file system for EMR. Can be used to store the results of EMR workloads so that they outlive the cluster, which makes EMR more resilient.
What’s the right product to use when you want to directly query S3 data via Redshift?
Redshift Spectrum
Is Amazon Redshift a Multi-AZ or Single-AZ product?
Single-AZ
What’s the role of the Leader Node in Amazon Redshift?
Receive query input and distribute it to Compute nodes for execution
If you want to customize the network options for Amazon Redshift, what do you need to enable?
Enhanced VPC Routing
At which intervals are automatic snapshots taken with Amazon Redshift?
Every ~8 hours or every ~5 GB of data changes per node, whichever comes first
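Note: the "whichever threshold is hit first" rule can be sketched as a simple check. This is only an illustrative model of the rule as stated above, assuming exact thresholds of 8 hours and 5 GB per node; it is not how Redshift is implemented, and the name `snapshot_due` is hypothetical.

```python
# Thresholds taken from the answer above (treated as exact for illustration).
SNAPSHOT_INTERVAL_HOURS = 8
SNAPSHOT_CHANGE_GB_PER_NODE = 5


def snapshot_due(hours_since_last: float, changed_gb_per_node: float) -> bool:
    """Return True once either the time or the data-change threshold is reached."""
    return (
        hours_since_last >= SNAPSHOT_INTERVAL_HOURS
        or changed_gb_per_node >= SNAPSHOT_CHANGE_GB_PER_NODE
    )


print(snapshot_due(3, 1.2))  # neither threshold reached
print(snapshot_due(3, 6.0))  # data-change threshold reached
print(snapshot_due(9, 0.5))  # time threshold reached
```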
What are valid data sources for Amazon Redshift (name 7)?
Amazon S3
Amazon RDS
Amazon DynamoDB
Amazon EMR
AWS Glue
AWS Data Pipeline
SSH-enabled host on Amazon EC2 or on-premises
What retention periods are available for automatic snapshots taken with Amazon Redshift?
Anywhere from 1 day (default) to 35 days.
What are valid data sources for AWS Batch?
What’s the right product to use for long-running (> 15 minutes) compute tasks?
NOT AWS Lambda!
Use AWS Batch, EC2, or ECS instead, for example.
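Note: the 15-minute cutoff can be written down as a small decision helper. This is a minimal sketch that looks at expected runtime only and ignores every other factor (memory, cost, packaging); the function name `suggest_compute` is hypothetical.

```python
LAMBDA_MAX_SECONDS = 15 * 60  # Lambda's maximum execution time: 15 minutes


def suggest_compute(expected_runtime_seconds: float) -> str:
    """Suggest a compute option based only on the expected task runtime."""
    if expected_runtime_seconds <= LAMBDA_MAX_SECONDS:
        return "AWS Lambda"
    return "AWS Batch, EC2, or ECS"


print(suggest_compute(10 * 60))      # 10-minute task
print(suggest_compute(2 * 60 * 60))  # 2-hour task
```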