What is the maximum size of an object in an S3 bucket?
The maximum object size is 5 TB.
How do you upload a file larger than 5 GB?
You must use “multi-part upload”
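As a rough sketch of what a multi-part upload client has to do, the code below plans the byte ranges for each part of a large object. The 100 MB default part size is an assumption for illustration; the actual upload would use the S3 CreateMultipartUpload/UploadPart APIs (e.g., via boto3), which accept parts of 5 MB to 5 GB and at most 10,000 parts per object.

```python
# Sketch: split a large object into multi-part upload chunks.
# Part size of 100 MB is an assumed default; S3 allows 5 MB - 5 GB
# per part and up to 10,000 parts per object.

def plan_parts(object_size, part_size=100 * 1024 * 1024):
    """Return (start, end) byte offsets for each part, end exclusive."""
    parts = []
    start = 0
    while start < object_size:
        end = min(start + part_size, object_size)
        parts.append((start, end))
        start = end
    return parts
```

Note that for a full 5 TB object, a client has to grow the part size beyond 100 MB to stay under the 10,000-part limit.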
Is Amazon S3 strongly consistent? What does that mean?
Yes, it is.
After a successful write of a new object (new PUT) or an overwrite or delete of an existing object (overwrite PUT or DELETE):
• any subsequent read request immediately receives the latest version of the object (read-after-write consistency)
• any subsequent list request immediately reflects the changes (list consistency)
What are the S3 Storage Classes?
• Amazon S3 Standard - General Purpose
• Amazon S3 Standard-Infrequent Access (IA)
• Amazon S3 One Zone-Infrequent Access
• Amazon S3 Intelligent Tiering
— Automatically moves objects between two access tiers based on changing access patterns
• Amazon Glacier
• Amazon Glacier Deep Archive
— Time to retrieve object: Standard (12 hours) / Bulk (48 hours) / Minimum storage duration of 180 days
(Slide 115)
What are the S3 Lifecycle Rules?
Transition actions
— Define when objects are transitioned to another storage class (e.g., move objects to the Standard-IA class 60 days after creation)
Expiration actions
— Configure objects to expire (be deleted) after some time
— Can be used to delete old versions of files
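A minimal sketch of a lifecycle configuration covering both rule types, written as the Python dict a boto3 caller would pass to `put_bucket_lifecycle_configuration`. The rule ID, prefix, and 365-day expiration are assumed examples; the 60-day Standard-IA transition matches the rule above.

```python
# Sketch: S3 lifecycle configuration with a transition action and an
# expiration action. Rule ID, prefix, and the 365-day expiration are
# hypothetical; the 60-day Standard-IA transition follows the notes.
lifecycle_config = {
    "Rules": [
        {
            "ID": "archive-then-expire",      # hypothetical rule name
            "Filter": {"Prefix": "logs/"},    # hypothetical prefix
            "Status": "Enabled",
            "Transitions": [
                {"Days": 60, "StorageClass": "STANDARD_IA"}
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
# With boto3 this would be applied via:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="my-bucket", LifecycleConfiguration=lifecycle_config)
```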
How does S3 Performance work?
Amazon S3 automatically scales to high request rates, with 100–200 ms latency.
Your application can achieve at least 3,500
PUT/COPY/POST/DELETE and 5,500 GET/HEAD requests per second per prefix in a bucket.
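Because those limits are per prefix, aggregate throughput scales with the number of prefixes you spread requests over. A tiny sketch of that arithmetic:

```python
# Per-prefix S3 request-rate baselines (from the notes above).
PUT_PER_PREFIX = 3500   # PUT/COPY/POST/DELETE requests per second
GET_PER_PREFIX = 5500   # GET/HEAD requests per second

def aggregate_get_rate(num_prefixes):
    """Spreading reads across N prefixes multiplies the GET baseline."""
    return num_prefixes * GET_PER_PREFIX
```

For example, spreading reads across four prefixes gives a baseline of 22,000 GET/HEAD requests per second.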
How does upload in S3 work?
We have two options:
• Multi-Part upload:
— recommended for files > 100 MB, required for files > 5 GB
— can help parallelize uploads (speeds up transfers)
• S3 Transfer Acceleration:
— increases transfer speed by transferring the file to an AWS edge location, which forwards the data to the S3 bucket in the target region
— compatible with multi-part upload
How does Download in S3 work?
You can use S3 byte-range fetches:
• Parallelize GETs by requesting
specific byte ranges
• Better resilience in case of failures
• Can be used to speed up downloads
• Can be used to retrieve only partial data (for example, the head of a file)
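A byte-range fetch is just a GET with an HTTP `Range` header. The sketch below builds the headers a client might send for a parallelized download; the 1 KB "head of file" range is an assumed example.

```python
# Sketch: build HTTP Range headers for parallel byte-range GETs.
# The Range header format is "bytes=start-end" with an inclusive end.

def range_headers(object_size, chunk_size):
    """One 'bytes=start-end' header value per parallel GET."""
    headers = []
    for start in range(0, object_size, chunk_size):
        end = min(start + chunk_size, object_size) - 1
        headers.append(f"bytes={start}-{end}")
    return headers

# Retrieving only the head of a file is a single small range
# (1 KB here is an arbitrary example):
head_of_file = "bytes=0-1023"
```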
How does S3 Encryption work?
There are 4 methods of encrypting objects in S3
• SSE-S3: encrypts S3 objects using keys handled & managed by AWS
— AES-256 encryption type
— Must set header: “x-amz-server-side-encryption”: “AES256”
• SSE-KMS: leverage AWS Key Management Service to manage
encryption keys
— KMS Advantages: user control + audit trail
— Must set header: “x-amz-server-side-encryption”: ”aws:kms”
• SSE-C: when you want to manage your own encryption keys
• Client Side Encryption
— Customer fully manages the keys and encryption cycle
(slide 128)
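The server-side modes above are selected per request via headers. A minimal sketch of the header sets, using the header names quoted in the notes; the KMS key alias is hypothetical, and the optional key-id header is an assumption (if omitted, SSE-KMS uses the default S3 KMS key).

```python
# Sketch: request headers selecting server-side encryption modes.
# Header names/values follow the notes; the KMS key alias is hypothetical.
sse_s3_headers = {"x-amz-server-side-encryption": "AES256"}

sse_kms_headers = {
    "x-amz-server-side-encryption": "aws:kms",
    # Optional: pin a specific KMS key (hypothetical alias):
    "x-amz-server-side-encryption-aws-kms-key-id": "alias/my-app-key",
}
```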
How does S3 Security Access work?
User based
• IAM policies - which API calls should be allowed for a specific user from IAM console
Resource Based
• Bucket Policies – bucket-wide rules from the S3 console – allows cross-account access
• Object Access Control List (ACL) – finer grain
• Bucket Access Control List (ACL) – less common
Note: an IAM principal can access an S3 object if
• the user’s IAM permissions ALLOW it OR the resource policy ALLOWS it
• AND there’s no explicit DENY
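The evaluation rule in that note can be written out as a one-line predicate:

```python
# Sketch of the access-evaluation rule above: access is granted if
# (IAM allows OR the resource policy allows) AND there is no explicit deny.

def s3_access_allowed(iam_allow, resource_allow, explicit_deny):
    return (iam_allow or resource_allow) and not explicit_deny
```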
How does S3 Security work?
Can be user based (IAM policies) or resource based (bucket policies).
Networking - Supports VPC Endpoints
Logging and Audit - S3 Access Logs can be stored in other S3 bucket / API calls can be logged in AWS CloudTrail
User Security - MFA Delete / Pre-Signed URLs: URLs that are valid only for a limited time (e.g., a premium video service for logged-in users)
How does DynamoDB partitioning work?
• WCU and RCU are spread evenly between partitions
How do DynamoDB Conditional Writes work?
• A write (PutItem, UpdateItem, DeleteItem) can include a condition expression; the write succeeds only if the condition evaluates to true
How does DynamoDB Batch Writing work? What are the benefits?
• BatchWriteItem puts or deletes up to 25 items in one call
• Fewer API round trips, so faster bulk writes
How does DynamoDB Batch Reading work? What are the benefits?
• BatchGetItem retrieves up to 100 items in one call
• Items are retrieved in parallel, minimizing latency
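Since BatchWriteItem accepts at most 25 put/delete requests per call, a bulk loader has to chunk its items. A sketch of that chunking (the limits are DynamoDB facts; the item list is arbitrary):

```python
# Sketch: chunk items into DynamoDB batch-sized calls.
# BatchWriteItem accepts at most 25 put/delete requests per call;
# BatchGetItem accepts at most 100 keys per call.
BATCH_WRITE_LIMIT = 25

def chunk_for_batch_write(items, limit=BATCH_WRITE_LIMIT):
    """Split items into lists no longer than the batch-write limit."""
    return [items[i:i + limit] for i in range(0, len(items), limit)]
```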
DynamoDB – Query
• Query items by partition key (required equality condition), optionally with a sort key condition
• Returns up to 1 MB of data – use pagination to keep on reading
DynamoDB - Scan
• Scan the entire table and then filter out data (inefficient)
• Returns up to 1 MB of data – use pagination to keep on reading
• Consumes a lot of RCU
• Limit impact using Limit or reduce the size of the result and pause
• For faster performance, use parallel scans:
• Multiple instances scan multiple partitions at the same time
• Increases the throughput and RCU consumed
• Limit the impact of parallel scans just like you would for Scans
• Can use a ProjectionExpression + FilterExpression (no change to RCU)
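A parallel scan works by giving each worker a `Segment` number out of `TotalSegments`. The sketch below builds the per-worker scan parameters; the table name is hypothetical, and in practice each kwargs dict would be passed to a DynamoDB `scan` call by a separate worker.

```python
# Sketch: build per-worker parameters for a DynamoDB parallel scan.
# Each worker scans one segment; TotalSegments splits the table.

def parallel_scan_kwargs(table_name, total_segments):
    return [
        {"TableName": table_name,       # hypothetical table name below
         "Segment": segment,
         "TotalSegments": total_segments}
        for segment in range(total_segments)
    ]
```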
What is Glue?
Serverless discovery and definition of table definitions and schemas (S3 data lakes, RDS, …)
Custom ETL jobs – fully managed, trigger-driven, on a schedule, or on demand.
How do Glue and S3 partitions work?
Glue crawler will extract partitions based on how your S3 data is organized
Think up front about how you will be querying your data lake in S3
Example: devices send sensor data every hour
Do you query primarily by time ranges?
• If so, organize your buckets as yyyy/mm/dd/device
Do you query primarily by device?
• If so, organize your buckets as device/yyyy/mm/dd
(slide 188)
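The sensor-data example above comes down to choosing a key prefix order. A small sketch of the two layouts (device/date/file names are hypothetical):

```python
# Sketch: two S3 key layouts for the sensor-data example.
# Put the partition you query by most often first in the key.

def key_time_first(device, yyyy, mm, dd, name):
    """Layout for querying primarily by time range."""
    return f"{yyyy}/{mm}/{dd}/{device}/{name}"

def key_device_first(device, yyyy, mm, dd, name):
    """Layout for querying primarily by device."""
    return f"{device}/{yyyy}/{mm}/{dd}/{name}"
```

A Glue crawler pointed at either layout would pick up the path components as partition columns.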
How do Glue and Hive work together?
• Hive lets you run SQL-like queries from EMR
• The Glue Data Catalog can serve as a Hive “metastore”
• You can also import a Hive metastore into Glue