AWS Services Flashcards

(44 cards)

1
Q

Amazon Data Firehose (formerly Amazon Kinesis Data Firehose)

A
  • Fully managed service
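  • Delivers near real-time streaming data to destinations like Amazon S3, Redshift, and Amazon OpenSearch Service (formerly Elasticsearch)
  • Not the lowest-latency solution: it buffers records before delivery, and for Redshift it stages data in S3 first and then loads it with a COPY
  • Can convert JSON input into Apache Parquet or ORC (CSV must first be transformed to JSON, e.g. with a Lambda function)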
  • Serverless (no infra to be managed)
2
Q

Kinesis (aka Kinesis Data Streams)

A
  • Real-time data processing and analysis; in provisioned mode, requires manual scaling and provisioning of shards (an on-demand capacity mode is also available)
  • AWS's managed counterpart to Apache Kafka
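Kinesis Data Streams routes each record by hashing its partition key with MD5 into a 128-bit integer and sending it to the shard whose hash key range contains that value. The Python sketch below illustrates the idea; it assumes the shards split the hash space into equal ranges, and the helper names are hypothetical, not part of any AWS SDK.

```python
import hashlib
from typing import List, Tuple

HASH_SPACE = 2 ** 128  # MD5 hash keys range over 0 .. 2**128 - 1


def hash_key(partition_key: str) -> int:
    """128-bit hash key derived from a partition key via MD5."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def even_ranges(shard_count: int) -> List[Tuple[int, int]]:
    """ASSUMPTION: shards split the hash space into equal, contiguous,
    inclusive ranges (no resharding has happened)."""
    size = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def shard_for(partition_key: str, shard_count: int) -> int:
    """0-based index of the shard whose range contains the key's hash."""
    key = hash_key(partition_key)
    for index, (start, end) in enumerate(even_ranges(shard_count)):
        if start <= key <= end:
            return index
    raise ValueError("hash key outside the hash space")  # cannot happen


for user in ["user-1", "user-2", "user-3", "user-1"]:
    print(user, "-> shard", shard_for(user, shard_count=4))
```

Because the same key always lands on the same shard, one very popular partition key can create a hot shard no matter how many shards the stream has.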
3
Q

Kinesis Features (Focus, Data Storage, Scaling, Processing, Replay, Use Cases)

A

Focus: Real-time data ingestion, processing, and analysis.

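Data Storage: Stores data for a configurable retention period (default 24 hours, up to 365 days).

Scaling: In provisioned mode, requires manual scaling through shard management; on-demand mode adjusts capacity automatically.

Processing: Allows for custom processing with sub-second latency.

Replay: Supports replay; consumers can re-read records from any point within the retention period.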

Use Cases: Building real-time dashboards, processing clickstream data, and building personalized recommendations.

4
Q

Kinesis Data Firehose Features (Focus, Data Storage, Scaling, Processing, Replay, Use Cases)

A

Focus: Delivering streaming data to various destinations (S3, Redshift, Amazon OpenSearch Service, etc.).

Data Storage: Does not retain data for consumers to read; it buffers records briefly and delivers them to the specified destinations.

Scaling: Fully managed, automatic scaling.

Processing: Near real-time processing with configurable buffer sizes and intervals.

Replay: Does not support replay capability.

Use Cases: Loading data into data warehouses, building data lakes, and sending data to analytics platforms.
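Firehose's buffering rule (deliver when either the buffer size or the buffer interval is reached, whichever comes first) can be modeled in a few lines. This is a toy simulation with a hypothetical `BufferSimulator` class, not an AWS API; real Firehose buffer hints are configured in MiB and seconds, and sizes here are in bytes only to keep the example small.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class BufferSimulator:
    """Toy model of Firehose-style buffering (hypothetical, not an AWS API).

    A batch is delivered as soon as EITHER its size reaches size_limit_bytes
    OR interval_seconds have passed since its first record arrived.
    """
    size_limit_bytes: int
    interval_seconds: float
    deliveries: List[Tuple[str, List[bytes]]] = field(default_factory=list)
    _batch: List[bytes] = field(default_factory=list)
    _batch_bytes: int = 0
    _batch_started: Optional[float] = None

    def tick(self, now: float) -> None:
        """Advance the clock; deliver the batch if the interval has elapsed."""
        if (self._batch_started is not None
                and now - self._batch_started >= self.interval_seconds):
            self._deliver("interval")

    def put(self, record: bytes, now: float) -> None:
        self.tick(now)  # an expired batch is delivered before adding more
        if self._batch_started is None:
            self._batch_started = now
        self._batch.append(record)
        self._batch_bytes += len(record)
        if self._batch_bytes >= self.size_limit_bytes:
            self._deliver("size")

    def _deliver(self, reason: str) -> None:
        self.deliveries.append((reason, self._batch))
        self._batch, self._batch_bytes, self._batch_started = [], 0, None


sim = BufferSimulator(size_limit_bytes=10, interval_seconds=60)
sim.put(b"aaaa", now=0)     # 4 bytes buffered
sim.put(b"bbbbbb", now=1)   # reaches 10 bytes -> delivered for "size"
sim.put(b"cc", now=5)       # starts a new batch at t=5
sim.tick(now=65)            # 60 s later -> delivered for "interval"
print(sim.deliveries)
# [('size', [b'aaaa', b'bbbbbb']), ('interval', [b'cc'])]
```

The trade-off is visible in the model: larger buffers mean fewer, bigger deliveries but more waiting, which is why Firehose is near real-time rather than real-time.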

5
Q

Lex

A

Service for building conversational interfaces for applications using voice and text.

6
Q

Polly

A

Amazon Polly is a fully managed text-to-speech service that generates voice on demand, converting text into an audio stream.

7
Q

Rekognition

A

Identifies a wide range of objects, scenes, and activities within images and videos.

8
Q

Comprehend

A

NLP service that uses machine learning to extract insights from text

9
Q

EMR

A

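Amazon EMR (formerly Elastic MapReduce): managed cluster platform for running big data frameworks such as Apache Hadoop and Apache Spark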

10
Q

Redshift

A
  • petabyte-scale cloud data warehouse
  • designed for online analytical processing (OLAP) workloads and enables businesses to analyze large datasets efficiently and cost-effectively using standard SQL queries and business intelligence tools.
11
Q

OLAP

A
  • OnLine Analytical Processing
  • Analyze large, multidimensional databases
  • Allows users to quickly and efficiently extract, summarize, and analyze data from various perspectives, facilitating informed business decisions
12
Q

OLTP

A
  • OnLine Transaction Processing
  • Focused on day-to-day transactions
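The OLTP/OLAP contrast shows up in the shape of the queries: many small writes versus one large summarizing read. This sketch uses Python's built-in `sqlite3` purely as a stand-in database (in AWS, the OLTP side would typically be a service such as Amazon RDS and the OLAP side Redshift); the `orders` table and its rows are made up.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)"
)


# OLTP-style work: many small transactions, each touching a few rows.
def place_order(conn, region, product, amount):
    with conn:  # commits on success, rolls back on exception
        conn.execute(
            "INSERT INTO orders (region, product, amount) VALUES (?, ?, ?)",
            (region, product, amount),
        )


for row in [("EU", "book", 12.0), ("EU", "pen", 2.5),
            ("US", "book", 15.0), ("US", "book", 11.0)]:
    place_order(conn, *row)

# OLAP-style work: one query that scans and summarizes many rows,
# viewed from several perspectives (here by region and product).
summary = conn.execute(
    """
    SELECT region, product, COUNT(*) AS n_orders, SUM(amount) AS revenue
    FROM orders
    GROUP BY region, product
    ORDER BY region, product
    """
).fetchall()
print(summary)
# [('EU', 'book', 1, 12.0), ('EU', 'pen', 1, 2.5), ('US', 'book', 2, 26.0)]
```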
13
Q

Lake Formation

A
  • A managed service that makes it easy to set up, secure, and manage your data lakes
  • Data governance
14
Q

MSK

A

Managed Streaming for Apache Kafka

15
Q

Redshift Streaming Ingestion

A
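  • Lowest-latency way to ingest data into Redshift from real-time sources such as Kinesis Data Streams and Amazon MSK
  • Lower latency than Data Firehose, because data flows from the stream straight into Redshift (a materialized view) instead of being staged in S3 first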
16
Q

Redshift Spectrum

A

Redshift Spectrum is a Redshift feature that lets you query data in Amazon S3 without loading it into Redshift tables. Redshift Spectrum does not move data from S3 into Redshift.

17
Q

Why isn’t Amazon Data Firehose the lowest latency solution?

A

It buffers records before delivering them, and when Redshift is the destination it stages the data in S3 first and then copies it into Redshift.

18
Q

FSx for Lustre

A

A fully managed, high-performance file system optimized for compute-intensive workloads such as high-performance computing (HPC), machine learning, and video processing

19
Q

AWS IQ

A

A freelancing platform designed to help customers quickly find, engage, and pay AWS Certified third-party experts for on-demand project work

20
Q

DeepLens

A
  • Discontinued
  • Deep learning-enabled video camera for creating and deploying deep learning models at the edge
  • Developers could use pre-built models or create their own to perform tasks such as object detection, image classification, and facial recognition
  • Hardware for experimentation; not designed to replace security cameras
21
Q

RTSP

A
  • Real Time Streaming Protocol
  • Commonly exposed by IP cameras; RTSP feeds can be ingested into Amazon Kinesis Video Streams
22
Q

Kinesis Video Streams

A
  • Securely stream video from connected devices to AWS for analytics, ML, playback, and other processing
  • Autoscales to support millions of devices
23
Q

Lambda Functions

A

  • Serverless and event-driven
  • Execute single, discrete tasks in response to events such as an image upload to S3 or a new item in a DynamoDB table (via DynamoDB Streams)
  • Maximum execution duration of 15 minutes, so not suitable for long-running jobs like large ETL
  • Effective for simple, short-lived tasks
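A minimal sketch of the "image upload to S3" case: a Python Lambda handler that reads the bucket and object key from each record of an S3 event notification. The bucket and key values are hypothetical, the sample event is trimmed to the fields used, and the actual per-object work is left as a placeholder.

```python
from urllib.parse import unquote_plus


def lambda_handler(event, context):
    """Handle an S3 "object created" notification, one short task per object."""
    processed = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded (e.g. spaces become '+'), so decode them.
        key = unquote_plus(record["s3"]["object"]["key"])
        # Placeholder for the real work, such as generating a thumbnail.
        processed.append(f"s3://{bucket}/{key}")
    # S3 ignores the return value; returning it just makes testing easy.
    return {"processed": processed}


# Trimmed-down sample event with hypothetical bucket and key names.
sample_event = {
    "Records": [
        {"s3": {"bucket": {"name": "my-photo-uploads"},
                "object": {"key": "raw/cat+photo.jpg"}}}
    ]
}
result = lambda_handler(sample_event, None)
print(result)  # {'processed': ['s3://my-photo-uploads/raw/cat photo.jpg']}
```

Each invocation does one bounded piece of work and exits, which is what keeps it well inside the 15-minute limit.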

24
Q

Step Functions

A
  • Serverless workflow orchestration service that coordinates multiple AWS services
  • Designed to manage complex, long-running workflows that may involve multiple steps, branching logic, error handling, and human interventions
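Step Functions workflows are written in the Amazon States Language (ASL), a JSON format. The sketch below builds a small definition in Python that shows a `Choice` state for branching plus `Retry` and `Catch` for error handling; the workflow itself and the Lambda ARNs are hypothetical placeholders. The resulting JSON string is what you would pass as the `definition` argument of `create_state_machine` on boto3's `stepfunctions` client.

```python
import json

# Hypothetical Lambda ARNs; replace with real function ARNs.
VALIDATE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:validate-order"
CHARGE_ARN = "arn:aws:lambda:us-east-1:123456789012:function:charge-card"

state_machine = {
    "Comment": "Order workflow with branching, retries and error handling",
    "StartAt": "ValidateOrder",
    "States": {
        "ValidateOrder": {"Type": "Task", "Resource": VALIDATE_ARN, "Next": "IsValid"},
        "IsValid": {  # branching logic on the task's output
            "Type": "Choice",
            "Choices": [
                {"Variable": "$.valid", "BooleanEquals": True, "Next": "ChargeCard"}
            ],
            "Default": "Rejected",
        },
        "ChargeCard": {
            "Type": "Task",
            "Resource": CHARGE_ARN,
            # Retry failed task runs with exponential backoff ...
            "Retry": [{
                "ErrorEquals": ["States.TaskFailed"],
                "IntervalSeconds": 2,
                "MaxAttempts": 3,
                "BackoffRate": 2.0,
            }],
            # ... and route anything still failing to a failure state.
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "PaymentFailed"}],
            "Next": "Done",
        },
        "Rejected": {"Type": "Fail", "Error": "InvalidOrder"},
        "PaymentFailed": {"Type": "Fail", "Error": "PaymentError"},
        "Done": {"Type": "Succeed"},
    },
}

definition = json.dumps(state_machine, indent=2)
print(definition)
```

Keeping retries and failure routing in the state machine, rather than inside each function, is what lets the individual Lambda tasks stay short and simple.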
25
Custom Entity Recognition
Extends Comprehend so it can identify new entity types beyond its preset generic entity types
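One way to supply training data for a custom entity recognizer is an entity list: a CSV file with `Text` and `Type` columns, stored in S3 alongside the training documents. The sketch writes such a file with Python's `csv` module; the entity types and values are hypothetical, and a real training list needs many more examples per type.

```python
import csv
import io

# Hypothetical custom entity types and example mentions.
ENTITIES = [
    ("SKU-10492", "PRODUCT_CODE"),
    ("SKU-88310", "PRODUCT_CODE"),
    ("Acme Turbo Blender", "PRODUCT_NAME"),
]


def build_entity_list(entities):
    """Return entity-list CSV text: a Text,Type header, then one row per entity."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["Text", "Type"])
    writer.writerows(entities)
    return buf.getvalue()


entity_list_csv = build_entity_list(ENTITIES)
print(entity_list_csv)
```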
26
Elastic Inference
  • Attaches low-cost, GPU-powered inference acceleration to Amazon EC2 and SageMaker instances or Amazon ECS tasks, reducing the cost of running deep learning inference by up to 75%
  • Supports TensorFlow, Apache MXNet, PyTorch, and ONNX models
  • You choose the instance type that best fits the application's overall compute, memory, and storage needs, then separately configure just the amount of inference acceleration you need
  • Deprecated: no longer available to new customers
27
Inference Pipeline
  • A linear sequence of two to fifteen containers that process requests for inferences on data
  • Used to manage the different stages of inference
  • Can preprocess data for real-time predictions (Glue can only be used to preprocess data for batch jobs)
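Conceptually, an inference pipeline chains containers so that each container's response becomes the next container's request. This plain-Python sketch models each container as a function; the `InferencePipeline` class and the three stages are hypothetical illustrations, not SageMaker APIs, and the class enforces the two-to-fifteen container bound from the card.

```python
from typing import Any, Callable, List

Stage = Callable[[Any], Any]


class InferencePipeline:
    """Toy stand-in for an inference pipeline (hypothetical, not a SageMaker API).

    Stages run in a fixed linear order; the output of stage i is the input
    of stage i + 1.
    """
    MIN_STAGES, MAX_STAGES = 2, 15

    def __init__(self, stages: List[Stage]):
        if not self.MIN_STAGES <= len(stages) <= self.MAX_STAGES:
            raise ValueError("a pipeline needs between 2 and 15 containers")
        self.stages = stages

    def predict(self, request: Any) -> Any:
        for stage in self.stages:
            request = stage(request)
        return request


# Hypothetical stages: real-time preprocessing, model, postprocessing.
def preprocess(csv_line: str) -> list:
    return [float(x) for x in csv_line.split(",")]


def model(features: list) -> float:
    return 0.5 * features[0] + 2.0 * features[1]  # stand-in for a trained model


def postprocess(score: float) -> dict:
    return {"score": score, "label": "high" if score > 5 else "low"}


pipeline = InferencePipeline([preprocess, model, postprocess])
print(pipeline.predict("4,2"))  # {'score': 6.0, 'label': 'high'}
```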
28
Mapping templates
API Gateway feature that makes it possible for the REST API to be integrated directly with an Amazon SageMaker runtime endpoint, thereby avoiding the use of any intermediate compute resource (such as AWS Lambda or Amazon ECS containers) to invoke the endpoint
29
AWS Panorama
  • Machine learning appliance and SDK that brings computer vision to **on-premises** cameras so they can make **predictions locally** with **high accuracy and low latency**
  • With the AWS Panorama Appliance, you can automate tasks that have traditionally required human inspection, improving visibility into potential issues
  • Evaluate manufacturing quality, identify bottlenecks in industrial processes, and monitor workplace security, even in environments with limited or no internet connectivity
  • Lets you leverage existing IP cameras
30
How do you stream video data?
Kinesis Video Streams (Kinesis *Data* Streams can't stream video)
31
Athena
enables users to analyze data directly in Amazon S3 using standard SQL. It is designed for ad-hoc analysis and reporting, allowing users to quickly explore large datasets without the need to load or transform the data into a traditional database
32
AWS Batch
Manages compute resources for batch jobs. Not used for ingestion or transformation.
33
Kinesis Analytics
  • Renamed Amazon Managed Service for Apache Flink
  • Provides Lambda blueprints for common preprocessing use cases, such as transforming data (e.g. gzip to JSON)
  • Can run real-time queries
34
Apache Flink Studio HOTSPOTS
detects dense regions in dataset
35
How can you customize how Polly pronounces specific words?
  • Pronunciation lexicons
  • Let you customize how Amazon Polly synthesizes speech by providing rules for how to pronounce specific words or phrases
  • XML files in the W3C Pronunciation Lexicon Specification (PLS) format
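A lexicon maps written forms (`<grapheme>`) to spoken substitutes (`<alias>`); PLS also supports `<phoneme>` entries for exact pronunciations. The sketch below generates a minimal lexicon as a string; the entries are hypothetical examples. The XML would then be uploaded with Polly's `PutLexicon` API and referenced by name through the `LexiconNames` parameter of `SynthesizeSpeech`.

```python
from xml.sax.saxutils import escape


def build_lexicon(aliases, lang="en-US"):
    """Return a minimal PLS lexicon mapping each grapheme to a spoken alias."""
    lexemes = "".join(
        "  <lexeme>\n"
        f"    <grapheme>{escape(grapheme)}</grapheme>\n"
        f"    <alias>{escape(alias)}</alias>\n"
        "  </lexeme>\n"
        for grapheme, alias in aliases.items()
    )
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<lexicon version="1.0"\n'
        '    xmlns="http://www.w3.org/2005/01/pronunciation-lexicon"\n'
        '    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"\n'
        '    xsi:schemaLocation="http://www.w3.org/2005/01/pronunciation-lexicon\n'
        '      http://www.w3.org/TR/2007/CR-pronunciation-lexicon-20071212/pls.xsd"\n'
        f'    alphabet="ipa" xml:lang="{lang}">\n'
        f"{lexemes}"
        "</lexicon>\n"
    )


# Hypothetical entries: expand an acronym and an ampersand abbreviation.
lexicon_xml = build_lexicon({
    "W3C": "World Wide Web Consortium",
    "R&D": "research and development",
})
print(lexicon_xml)
```

Escaping matters here: a raw `&` in a grapheme would make the XML invalid, so `escape` turns it into `&amp;`.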
36
AWS IoT Greengrass V2
- enables machine learning inference **at the edge**, allowing the trained model to label images locally in remote environments with intermittent internet connectivity
37
AWS DMS
Database Migration Service
38
Amazon Redshift ML
makes it easy for data analysts and database developers to create, train, and apply machine learning models using familiar SQL commands in Amazon Redshift data warehouses.
39
Amazon Augmented AI
  • Also known as Amazon A2I
  • Lets you build the **workflows that are required for human review** of machine learning predictions
  • Amazon Textract is directly integrated with Amazon A2I, so low-confidence results from Textract's AnalyzeDocument API operation can easily be sent for human review
40
Amazon Managed Service for Apache Flink
Formerly Amazon Kinesis Data Analytics; lets you process and analyze streaming data with Apache Flink applications written in Java, Scala, Python, or SQL. You can quickly author and run code against streaming sources to perform **time-series analytics, feed real-time dashboards, and create real-time metrics**.
41
What's the I/O capacity of a single shard in AWS Kinesis Data Streams?
One shard provides a capacity of 1 MB/sec data input and 2 MB/sec data output. One shard can support up to 1000 PUT records per second.
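These per-shard limits give a quick way to size a provisioned stream: compute the shard count each limit requires and take the largest. A sketch, assuming traffic is spread evenly across shards and that all consumers share the 2 MB/sec read limit (consumers using enhanced fan-out each get their own 2 MB/sec per shard, which this ignores).

```python
import math

# Per-shard limits from the card (provisioned mode, shared-throughput reads).
WRITE_MB_PER_SEC = 1.0
WRITE_RECORDS_PER_SEC = 1000
READ_MB_PER_SEC = 2.0


def shards_needed(write_mb_s: float, records_s: float, read_mb_s: float) -> int:
    """Smallest shard count that satisfies every per-shard limit at once."""
    return max(
        1,
        math.ceil(write_mb_s / WRITE_MB_PER_SEC),
        math.ceil(records_s / WRITE_RECORDS_PER_SEC),
        math.ceil(read_mb_s / READ_MB_PER_SEC),
    )


# 3.5 MB/s written as 2,500 records/s, read at 5 MB/s in total.
print(shards_needed(3.5, 2500, 5.0))  # 4: the 1 MB/s write limit dominates
```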
42
AWS Snowball
Snowball Edge is a device with on-board storage and compute power for select AWS capabilities. Snowball Edge can process data locally, run edge-computing workloads, and transfer data to or from the AWS Cloud.
43
Does SageMaker use ALBs?
No
44
SageMaker Canvas
  • No-code ML model building
  • Time series forecasting is offered as one of the supported model types