Chapter 9 - Data Ingestion, Transformation, and Analytics Flashcards

(20 cards)

1
Q

What are valid use cases for transforming data when importing it into a data lake?
(Select three.)

A) Imposing consistent timestamps

B) Removing corrupted data

C) Creating a schema

D) Removing duplicate data

E) Visualizing data

A

A, B, D. Data transformation includes changing the contents or format of data. Creating a
schema is something you do with structured data, and a data lake doesn’t impose or require
a schema. Visualizing data is something that occurs with data that’s already stored in the
data lake.
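As an illustration of the three transformations in this answer, here is a minimal Python sketch that normalizes timestamps to UTC, drops corrupted records, and removes duplicates. The record layout (`id` and `ts` fields), the function names, and the rule that a timestamp without a time zone is treated as UTC are all assumptions made for the example; they are not part of any AWS service.

```python
from datetime import datetime, timezone


def normalize_timestamp(value):
    """Return an ISO 8601 UTC string for an epoch number or ISO string, else None."""
    try:
        if isinstance(value, (int, float)):
            parsed = datetime.fromtimestamp(value, tz=timezone.utc)
        else:
            parsed = datetime.fromisoformat(value)
            if parsed.tzinfo is None:
                # Assumption for this sketch: naive timestamps are already UTC.
                parsed = parsed.replace(tzinfo=timezone.utc)
        return parsed.astimezone(timezone.utc).isoformat()
    except (TypeError, ValueError, OverflowError, OSError):
        return None


def clean(records):
    """Impose consistent timestamps, drop corrupted records, drop duplicates."""
    seen = set()
    cleaned = []
    for record in records:
        ts = normalize_timestamp(record.get("ts"))
        if ts is None or record.get("id") is None:
            continue  # corrupted: missing id or unparseable timestamp
        key = (record["id"], ts)
        if key in seen:
            continue  # duplicate once timestamps are in a consistent form
        seen.add(key)
        cleaned.append({"id": record["id"], "ts": ts})
    return cleaned
```

Note that the duplicate check only works after the timestamps share one format, which is why normalization runs first.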

2
Q

What AWS Data Lake transform detects duplicate data?

A) MatchFinder

B) FindMatches ML

C) Elastic MapReduce

D) Spark

A

B. FindMatches ML is the name of the transform that detects duplicate data. MatchFinder
doesn’t exist. Spark is a big data framework. Elastic MapReduce allows searching and sorting
large data sets.

3
Q

What’s the most efficient way to import data from an on-premises SQL database into an
AWS Data Lake?

A) Dump the database into an S3 bucket and then import the data into the data lake.

B) Import the data into RDS and then into the data lake.

C) Use the Glue Connector.

D) Use the JDBC connector.

A

D. Connecting to the SQL database using the JDBC connector and importing the data
directly into the data lake is the most efficient solution. The Glue Connector isn’t real. The
other options are feasible but inefficient.

4
Q

What protocols does AWS Transfer Family support? (Choose two.)

A) SFTP

B) SMB

C) FTP

D) CIFS

E) HTTPS

A

A, C. AWS Transfer Family supports SFTP, FTPS, and FTP. It doesn’t support CIFS/SMB or
HTTPS for file transfer.

5
Q

AWS Transfer Family can be used to transfer files to or from which of the following?
(Choose two.)

A) EBS

B) EFS

C) RDP

D) S3

E) DynamoDB

A

B, D. AWS Transfer Family can transfer files to or from EFS or S3. It doesn’t support transfers with EBS or DynamoDB. RDP is not a file storage system.

6
Q

What technology does AWS Glue use to search large data sets and perform data
transformation?

A) Amazon Athena

B) Apache Spark

C) Apache Elephant Stack

D) AWS Data Lake

A

B. AWS Glue uses the Apache Spark big data framework to search and transform data.

7
Q

What is the difference between a data lake and a data warehouse? (Choose two.)

A) A data warehouse can store unstructured data.

B) A data warehouse is a relational database.

C) A data lake requires structured data.

D) A data lake can store unstructured, schema-less data.

A

B, D. A data warehouse is a relational OLAP database. A data lake can store both structured
and unstructured, nonrelational data.

8
Q

Which of the following can AWS Data Lake import from? (Choose two.)

A) EBS

B) ELB

C) CloudFront

D) IAM

E) CloudWatch

A

B, C. AWS Data Lake can import data from ELB and CloudFront, but none of the others.

9
Q

Which of the following can analyze data in an AWS Data Lake? (Choose two.)

A) Amazon EMS

B) Athena

C) Redshift Spectrum

D) Redshift

E) S3

A

B, C. Athena and Redshift Spectrum can analyze data in an AWS Data Lake. Redshift and
S3 cannot. Amazon EMS doesn’t exist.

10
Q

Which of the following is not an appropriate use of AWS Glue?

A) Searching data

B) Ingesting real-time streaming data

C) Preparing data for analysis

D) Transforming data

A

B. Kinesis is more appropriate for ingesting real-time streaming data. AWS Glue is well
suited to the other three tasks.

11
Q

You’re developing an application to predict future weather patterns based on RADAR
images. Which of the following Kinesis services is the best choice to support this application?

A) Kinesis Data Streams

B) Kinesis Video Streams

C) Kinesis Data Firehose

D) Kinesis ML

A

B. Kinesis Video Streams is designed to work with time-indexed data such as RADAR
images. Kinesis ML doesn’t exist.

12
Q

You’re streaming image data to Kinesis Data Streams and need to retain the data for 30 days.
How can you do this? (Choose two.)

A) Create a Kinesis Data Firehose delivery stream.

B) Increase the stream retention period to 14 days.

C) Specify an S3 bucket as the destination.

D) Specify CloudWatch Logs as the destination.

A

A, C. A 14-day retention period still falls short of the required 30 days, so create a
Kinesis Data Firehose delivery stream that receives data from the Kinesis Data Stream and
sends the data to an S3 bucket.
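The Firehose-to-S3 arrangement in this answer could be expressed as the request parameters below. This is a sketch only: the helper name `build_delivery_stream_request`, the stream name, and every ARN are hypothetical placeholders, and the code only builds a dictionary; it makes no AWS call. The keys follow the parameter names of the boto3 Firehose `create_delivery_stream` operation.

```python
def build_delivery_stream_request(name, stream_arn, bucket_arn, role_arn):
    """Build parameters for a Firehose delivery stream that reads from a
    Kinesis Data Stream and writes to an S3 bucket."""
    return {
        "DeliveryStreamName": name,
        # The delivery stream consumes an existing Kinesis Data Stream.
        "DeliveryStreamType": "KinesisStreamAsSource",
        "KinesisStreamSourceConfiguration": {
            "KinesisStreamARN": stream_arn,
            "RoleARN": role_arn,
        },
        # S3 is the destination, so data outlives the stream's retention period.
        "ExtendedS3DestinationConfiguration": {
            "RoleARN": role_arn,
            "BucketARN": bucket_arn,
        },
    }


# Hypothetical placeholder values, for illustration only.
request = build_delivery_stream_request(
    name="image-archive",
    stream_arn="arn:aws:kinesis:us-east-1:111122223333:stream/images",
    bucket_arn="arn:aws:s3:::example-image-archive",
    role_arn="arn:aws:iam::111122223333:role/example-firehose-role",
)
```

With boto3 installed and credentials configured, a dictionary like this could be passed as `boto3.client("firehose").create_delivery_stream(**request)`. How long the objects are kept in S3 is then governed by the bucket, not by the stream.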

13
Q

Which of the following Kinesis services requires you to specify a destination for the stream?

A) Kinesis Video Streams

B) Kinesis Data Streams

C) Kinesis Data Firehose

D) Kinesis Data Warehouse

A

C. Kinesis Data Firehose requires you to specify a destination for a delivery stream. Kinesis
Video Streams and Kinesis Data Streams use a producer-consumer model that allows consumers to subscribe to a stream. There is no such thing as Kinesis Data Warehouse.

14
Q

You’re running an on-premises application that frequently writes to a log file. You want
to stream this log file to a Kinesis Data Stream. How can you accomplish this with the
least effort?

A) Use the CloudWatch Logs Agent.

B) Use the Amazon Kinesis Agent.

C) Write a script that uses the Kinesis Producer Library.

D) Move the application to an EC2 instance.

A

B. The Amazon Kinesis Agent can automatically stream the contents of a file to Kinesis.
There’s no need to write any custom code or move the application to EC2. The CloudWatch
Logs Agent can’t send logs to a Kinesis Data Stream.

15
Q

When deciding whether to use SQS or Kinesis Data Streams to ingest data, which of the
following should you take into account?

A) The frequency of data

B) The total amount of data

C) The number of consumers that need to receive the data

D) The order of data

A

C. SQS and Kinesis Data Streams are similar, but SQS is designed to temporarily hold a
small message until a single consumer processes it, whereas Kinesis Data Streams is designed
to provide durable storage and playback of large data streams to multiple consumers.

16
Q

You want to send streaming log data into Amazon Redshift. Which of the following services
should you use? (Choose two.)

A) SQS with a standard queue

B) Kinesis Data Streams

C) Kinesis Data Firehose

D) SQS with a FIFO queue

A

B, C. You should stream the log data to Kinesis Data Streams and then have Kinesis Data
Firehose consume the data and stream it to Redshift.

17
Q

Which of the following is not an appropriate use case for Kinesis?

A) Stock feeds

B) Facial recognition

C) Static website hosting

D) Videoconferencing

A

C. Kinesis is for streaming data such as stock feeds and video. Static websites are not streaming data.

18
Q

You need to push 2 MB per second through a Kinesis Data Stream. How many shards do you
need to configure?

A) 1

B) 2

C) 4

D) 8

A

B. Shards determine the capacity of a Kinesis Data Stream. A single shard supports writes of
up to 1 MB per second, so you’d need two shards to write 2 MB per second.
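The arithmetic behind this answer can be written as a short Python sketch. The function and constant names are made up for the example; the 1 MB per second figure is the per-shard write limit stated in the answer.

```python
import math

WRITE_MB_PER_SHARD = 1  # per-shard write limit, as stated in the answer


def shards_for_writes(write_mb_per_second):
    """Smallest shard count whose combined write capacity covers the rate."""
    return math.ceil(write_mb_per_second / WRITE_MB_PER_SHARD)


print(shards_for_writes(2))  # 2
```

Rounding up matters for rates that are not whole multiples of the limit: 2.5 MB per second would need three shards.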

19
Q

Multiple consumers are receiving a Kinesis Data Stream at a total rate of 3 MB per second.
You plan to add more consumers and need the stream to support reads of at least 5 MB per
second. How many shards do you need to add?

A) 1

B) 2

C) 3

D) 4

A

A. Shards determine the capacity of a Kinesis Data Stream. Each shard supports 2 MB of
reads per second. Because consumers are already receiving a total of 3 MB per second, you
must already have at least two shards configured, supporting a total of 4 MB per second.
Supporting 5 MB per second requires three shards, so you need to add just one more.
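The reasoning in this answer can be checked with a small Python sketch. The names are hypothetical, and the sketch assumes the stream currently has the minimum number of shards needed for its observed read rate, as the answer does.

```python
import math

READ_MB_PER_SHARD = 2  # per-shard read limit, as stated in the answer


def shards_for_reads(read_mb_per_second):
    """Smallest shard count whose combined read capacity covers the rate."""
    return math.ceil(read_mb_per_second / READ_MB_PER_SHARD)


def shards_to_add(current_read_mb, target_read_mb):
    """Extra shards needed, assuming the stream currently has the minimum
    shard count for its observed read rate."""
    current = shards_for_reads(current_read_mb)  # 3 MB/s -> 2 shards
    needed = shards_for_reads(target_read_mb)    # 5 MB/s -> 3 shards
    return max(0, needed - current)


print(shards_to_add(3, 5))  # 1
```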

20
Q

Which of the following does Kinesis Data Firehose not support?

A) Videoconferencing

B) Transforming video metadata

C) Converting CSV to JSON

D) Redshift

A

A. Kinesis Data Firehose is designed to funnel streaming data to big data applications, such
as Redshift or Hadoop. It’s not designed for videoconferencing.