Choosing the Right Database — What questions should you ask?
When evaluating AWS databases, consider:
Workload type: Read-heavy, write-heavy, or balanced?
Throughput needs: Will throughput stay steady, or scale up and fluctuate through the day?
Data volume: How much data? How fast will it grow? Average object size?
Access patterns: How is the data accessed?
Durability needs: Is this the system of record?
Latency & concurrency: Required response time? Number of users?
Data model: Structured? Semi-structured? Need joins?
Schema flexibility: Strong schema vs. flexible?
Use case: Reporting? Search? RDBMS vs. NoSQL?
Licensing: Any cost considerations? Should you move to a cloud-native DB like Aurora?
AWS Database Types — What options exist and what are they for?
AWS offers multiple managed database categories:
RDBMS (SQL / OLTP):
RDS, Aurora — best for structured data, joins, transactions.
NoSQL:
DynamoDB (~JSON key-value),
ElastiCache (in-memory key/value),
Neptune (graph),
DocumentDB (MongoDB-compatible),
Keyspaces (Cassandra).
Object Storage:
S3 (large objects), Glacier (archival & backups).
Data Warehouse / Analytics:
Redshift (OLAP), Athena, EMR.
Search:
OpenSearch — full-text search, unstructured queries.
Graph:
Amazon Neptune — relationship-heavy data.
Ledger:
QLDB — immutable, cryptographically verifiable history.
Time Series:
Timestream — optimized for time-indexed data.
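The category list above can be turned into a small lookup helper, a study aid for matching a workload category to candidate services. This mapping is a mnemonic built from the list above, not an official AWS taxonomy:

```python
# Study-aid mapping from workload category to the AWS services above.
# Not an official AWS mapping -- just a mnemonic for the categories listed.
CATEGORY_TO_SERVICES = {
    "oltp": ["RDS", "Aurora"],
    "key-value": ["DynamoDB", "ElastiCache"],
    "document": ["DocumentDB"],
    "wide-column": ["Keyspaces"],
    "graph": ["Neptune"],
    "object": ["S3", "Glacier"],
    "olap": ["Redshift", "Athena", "EMR"],
    "search": ["OpenSearch"],
    "ledger": ["QLDB"],
    "time-series": ["Timestream"],
}

def suggest(category: str) -> list[str]:
    """Return candidate services for a workload category."""
    return CATEGORY_TO_SERVICES.get(category.lower(), [])

print(suggest("graph"))  # ['Neptune']
```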
Amazon RDS — What does it provide?
Amazon RDS gives you fully managed relational databases:
Supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and IBM Db2.
Choose instance size, EBS volume type, and storage auto-scaling.
High availability via Multi-AZ and Read Replicas.
Security: IAM, Security Groups, KMS encryption, SSL in transit.
Backups: automated PITR (up to 35 days) + manual snapshots.
Managed maintenance, including patching (may cause downtime).
Supports IAM authentication and Secrets Manager integration.
RDS Custom allows deeper OS-level customization (Oracle & SQL Server).
Best for relational datasets, SQL queries, and transactional workloads.
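The PITR behavior above (restore to any point within the retention window, up to 35 days) can be sketched with a simple validity check. The function name is illustrative; this only models the window arithmetic, not the restore itself:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 35  # maximum automated-backup retention for RDS

def restore_target_valid(target: datetime, now: datetime) -> bool:
    """True if `target` falls inside the PITR window: within the last
    RETENTION_DAYS and not in the future."""
    return now - timedelta(days=RETENTION_DAYS) <= target <= now

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(restore_target_valid(datetime(2024, 5, 20, tzinfo=timezone.utc), now))  # True
print(restore_target_valid(datetime(2024, 4, 1, tzinfo=timezone.utc), now))   # False (61 days back)
```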
Amazon Aurora — What makes it different from RDS?
Amazon Aurora is a high-performance, cloud-native relational database:
MySQL- and PostgreSQL-compatible, with better performance (AWS claims up to 5× MySQL and 3× PostgreSQL throughput).
Storage & compute separated:
Storage: 6 replicas across 3 AZs, auto-healing, auto-scaling.
Compute: cluster of DB instances with read replica auto-scaling.
Cluster endpoints for writers vs. readers.
Same security & maintenance model as RDS.
Backup & restore features similar to RDS.
Aurora Serverless: automatic scaling for variable workloads.
Aurora Global Database: secondary regions with up to 16 read replicas each; typical cross-region replication lag under 1 second.
Aurora Machine Learning: integrates with SageMaker & Comprehend.
Database cloning: rapid, copy-on-write, low-cost environment duplication.
Use case: same as RDS but with higher performance, lower maintenance, and more features.
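The writer/reader cluster endpoints above enable read/write splitting in application code. A minimal routing sketch, where the endpoint hostnames are illustrative placeholders (Aurora reader endpoints contain "cluster-ro"):

```python
# Sketch of read/write splitting against Aurora's cluster endpoints.
# Hostnames below are placeholders, not real clusters.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Route SELECT statements to the reader endpoint, everything else
    (INSERT/UPDATE/DELETE/DDL) to the writer endpoint."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return READER_ENDPOINT if is_read else WRITER_ENDPOINT

print(endpoint_for("SELECT * FROM orders"))           # reader endpoint
print(endpoint_for("INSERT INTO orders VALUES (1)"))  # writer endpoint
```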
Amazon ElastiCache — What is it and why use it?
Amazon ElastiCache provides fully managed in-memory caching:
Managed Redis and Memcached (similar to RDS, but for caches).
Sub-millisecond latency, very high performance.
Choose cache instance types (e.g., cache.m6g.large).
Supports:
Sharding / cluster mode (Redis Cluster; Memcached multi-node partitioning)
Multi-AZ with automatic failover (Redis)
Read replicas for read scaling (Redis)
Security: IAM, Security Groups, KMS encryption.
Used to reduce load on databases for read-heavy workloads.
Helps make applications stateless.
AWS handles maintenance, patching, recovery, monitoring, config.
Typically requires application code changes to integrate caching.
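The application-code change mentioned above is usually the cache-aside (lazy loading) pattern: check the cache first, fall back to the database on a miss, then populate the cache. A sketch with a plain dict standing in for the Redis/Memcached cluster and a placeholder function standing in for the real database query:

```python
import time

# Cache-aside (lazy loading) sketch. A dict stands in for ElastiCache;
# slow_db_query stands in for a round trip to the backing database.
cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 60.0

def slow_db_query(key: str) -> str:
    return f"row-for-{key}"  # placeholder for a real DB query

def get(key: str) -> str:
    hit = cache.get(key)
    if hit and time.monotonic() - hit[0] < TTL_SECONDS:
        return hit[1]                       # cache hit: no DB load
    value = slow_db_query(key)              # cache miss: query the DB once...
    cache[key] = (time.monotonic(), value)  # ...then populate the cache
    return value

print(get("user:42"))  # first call: miss, goes to the DB
print(get("user:42"))  # second call: served from cache
```

The TTL keeps stale entries from living forever; write-through (updating the cache on every DB write) is the usual alternative trade-off.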
Amazon DynamoDB — What are its key features?
Amazon DynamoDB is a fully managed, serverless NoSQL database:
Proprietary AWS service, single-digit millisecond latency.
Capacity modes:
Provisioned (with auto-scaling)
On-demand
Can serve as a key/value store; with TTL, it can even replace ElastiCache for session data.
Highly available, Multi-AZ by default; reads/writes decoupled; supports transactions.
DAX: in-memory read cache with microsecond latency.
Security via IAM (auth & authorization).
Event processing:
DynamoDB Streams → Lambda, Kinesis Data Streams.
Global Tables: active-active multi-region replication.
Backups: PITR (35 days) + on-demand backups.
Export to S3 (any point within the PITR window) and import from S3, neither consuming RCU/WCU.
Flexible for rapid schema evolution.
Use cases: serverless apps, small JSON-like items (max item size 400 KB), distributed cache.
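The TTL-based session-store idea above comes down to storing an expiry timestamp (epoch seconds) in a designated attribute; DynamoDB deletes items once that time passes. A sketch of such an item's shape, with illustrative attribute names:

```python
import time

# Sketch of a DynamoDB-style session item using the TTL feature:
# DynamoDB deletes items whose TTL attribute (epoch seconds) has passed.
# Attribute names ("pk", "ttl") are illustrative, not mandated.
def make_session_item(session_id: str, user: str, ttl_seconds: int = 3600) -> dict:
    return {
        "pk": f"SESSION#{session_id}",          # partition key
        "user": user,
        "ttl": int(time.time()) + ttl_seconds,  # expiry, enforced by DynamoDB
    }

item = make_session_item("abc123", "alice")
print(item["pk"])                 # SESSION#abc123
print(item["ttl"] > time.time())  # True: not yet expired
```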
Amazon S3 — What should you know?
Amazon S3 is a massively scalable object storage service:
Key/value store for objects (not ideal for many very small objects).
Serverless, virtually unlimited scalability; max object size 5 TB.
Storage tiers:
S3 Standard
S3 Standard-IA
S3 Intelligent-Tiering
Glacier (Flexible Retrieval), Glacier Deep Archive
Use lifecycle policies to transition data.
Features:
Versioning
Encryption (SSE-S3, SSE-KMS, SSE-C, client-side)
Replication (CRR / SRR)
MFA Delete
Access Logs
Access Points, Object Lambda
CORS
Performance:
Multi-part upload
S3 Transfer Acceleration
S3 Select
Automation: S3 Event Notifications → SNS, SQS, Lambda, EventBridge.
Use cases: static files, data lake storage, backups, big-object key/value store, website hosting.
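The lifecycle transitions above (Standard → Standard-IA → Glacier → expiration) are expressed as a lifecycle configuration document. A sketch of its JSON shape, with a made-up rule that archives objects under a "logs/" prefix:

```python
import json

# Sketch of an S3 lifecycle configuration (the shape accepted by the
# PutBucketLifecycleConfiguration API): objects under "logs/" move to
# Standard-IA after 30 days, Glacier after 90, and expire after 365.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",            # illustrative rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
print(json.dumps(lifecycle, indent=2))
```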
Amazon DocumentDB — What is it and when to use it?
Amazon DocumentDB is AWS’s managed MongoDB-compatible document database:
Similar to how Aurora re-implements MySQL/PostgreSQL, DocumentDB re-implements MongoDB.
Designed to store, query, and index JSON documents.
Fully managed and highly available, with replication across 3 AZs.
Storage auto-grows in 10 GB increments.
Scales automatically to serve millions of requests per second.
Suitable for document workloads requiring nested, flexible, semi-structured data.
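The nested, semi-structured data above is the core of the document model: one record holds its related data inline instead of spreading it across joined tables. A sketch of such a document plus a tiny dotted-path lookup mimicking MongoDB-style field access ("customer.address.city"); all field names are made up:

```python
# A nested JSON document of the kind DocumentDB stores natively.
# Field names are illustrative.
order = {
    "_id": "order-1001",
    "customer": {"name": "Alice", "address": {"city": "Seattle"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

def get_path(doc: dict, path: str):
    """Resolve a MongoDB-style dotted path against a nested document."""
    for part in path.split("."):
        doc = doc[part]
    return doc

print(get_path(order, "customer.address.city"))  # Seattle
```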
Amazon Neptune — What is it used for?
Amazon Neptune is a fully managed graph database:
Built for highly connected datasets: social networks, knowledge graphs, fraud graphs, recommendations.
Supports graph models and query languages (Gremlin, openCypher, SPARQL).
Highly available with replication across multiple AZs.
Designed for complex, relationship-focused queries with millisecond latency.
Can store billions of relationships and query them efficiently.
Ideal where understanding links between entities is critical.
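The "relationship-focused query" idea above is a graph traversal, e.g. two hops along a "follows" edge (in Gremlin, roughly g.V('alice').out('follows').out('follows')). A conceptual in-memory sketch of that traversal, with a made-up social graph; this only illustrates the idea, not how Neptune executes it:

```python
# Toy "follows" graph as an adjacency list (illustrative data).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
}

def friends_of_friends(user: str) -> set[str]:
    """Vertices exactly two 'follows' hops away from `user`."""
    return {fof for f in follows.get(user, []) for fof in follows.get(f, [])}

print(sorted(friends_of_friends("alice")))  # ['dave', 'erin']
```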
Amazon Neptune Streams — What does it provide?
Neptune Streams captures a real-time, ordered log of changes to graph data:
Records every create / update / delete operation.
Delivered immediately after the write, in strict order with no duplicates.
Accessible through an HTTP REST API.
Supports use cases like:
Triggering notifications on graph updates
Syncing graph data into other stores (S3, OpenSearch, ElastiCache)
Cross-region replication
Enables downstream systems to react to graph changes in real time.
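The "sync graph data into other stores" use case above boils down to replaying an ordered change log against a downstream copy. A sketch of that replay loop; the record shape (commit number + operation) is illustrative, not the exact Neptune Streams payload:

```python
# Sketch of consuming an ordered change log (Neptune Streams style):
# applying records in commit order keeps a downstream store in sync.
# Record fields are illustrative.
records = [
    {"commitNum": 1, "op": "ADD", "data": {"id": "v1", "label": "person"}},
    {"commitNum": 2, "op": "ADD", "data": {"id": "v2", "label": "person"}},
    {"commitNum": 3, "op": "REMOVE", "data": {"id": "v1", "label": "person"}},
]

replica: dict[str, dict] = {}  # downstream copy (e.g. OpenSearch, ElastiCache)
for rec in sorted(records, key=lambda r: r["commitNum"]):
    if rec["op"] == "ADD":
        replica[rec["data"]["id"]] = rec["data"]
    elif rec["op"] == "REMOVE":
        replica.pop(rec["data"]["id"], None)

print(sorted(replica))  # ['v2'] -- v1 was added, then removed
```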
Amazon Keyspaces — What is it and why use it?
Amazon Keyspaces is a serverless, fully managed Apache Cassandra–compatible database:
Built on open-source Cassandra semantics and uses CQL (Cassandra Query Language).
Serverless, automatically scales up/down with traffic.
Highly available, with data replicated 3× across AZs.
Single-digit millisecond latency at any scale; supports thousands of requests/sec.
Capacity options: on-demand or provisioned with auto-scaling.
Integrated with encryption, backups, and Point-In-Time Recovery (PITR) up to 35 days.
Great for IoT, time-series data, and workloads needing massive write scalability.
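The CQL mentioned above looks like SQL but models Cassandra's partition/clustering key layout. Illustrative statements for the IoT use case (keyspace, table, and column names are made up); held as strings here so the sketch runs without a live cluster:

```python
# Illustrative CQL statements of the kind you run against Keyspaces.
# ((device_id), reading_time) = partition key + clustering column, so all
# readings for one device live together, newest first.
CREATE_TABLE = """
CREATE TABLE iot.sensor_readings (
    device_id text,
    reading_time timestamp,
    temperature double,
    PRIMARY KEY ((device_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
"""

INSERT = """
INSERT INTO iot.sensor_readings (device_id, reading_time, temperature)
VALUES (?, ?, ?);
"""

print(CREATE_TABLE.strip().splitlines()[0])
```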
Amazon Timestream — What is it optimized for?
Amazon Timestream is a serverless, fast, scalable time-series database:
Automatically scales compute and storage as data grows.
Handles trillions of events per day with low latency.
Up to 1,000× faster and 1/10th the cost of relational DBs for time-series workloads.
Supports multi-measure records and scheduled queries.
Uses SQL-compatible querying.
Tiered storage: recent data in-memory; historical data in cost-optimized storage.
Built-in time-series analytics functions for near real-time insights.
Encryption in transit and at rest.
Ideal for IoT telemetry, operational monitoring, and real-time analytics.
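The SQL-compatible querying and time-series functions above include bin() for time bucketing and ago() for relative windows. An illustrative query (database, table, and measure names are made up), held as a string since running it needs a live Timestream database:

```python
# Illustrative Timestream SQL: average temperature per device in 1-minute
# bins over the last hour, using the built-in bin() and ago() functions.
# Database/table/measure names are made up.
QUERY = """
SELECT device_id,
       bin(time, 1m) AS minute,
       avg(measure_value::double) AS avg_temp
FROM "iot_db"."sensor_table"
WHERE measure_name = 'temperature'
  AND time > ago(1h)
GROUP BY device_id, bin(time, 1m)
ORDER BY minute DESC
"""
print(QUERY.strip().splitlines()[0])
```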
Amazon Timestream — Architecture Overview
Timestream integrates seamlessly with AWS services for ingest, processing, and analytics:
Ingest sources:
AWS IoT
Kinesis Data Streams
Amazon MSK
Lambda
Prometheus
Processing & analytics:
Kinesis Data Analytics for Apache Flink
SageMaker
QuickSight
Access:
JDBC connections
Direct SQL queries
Storage tiering:
Recent data kept in-memory for fast queries
Historical data stored in cost-efficient tier
Used for large-scale time-series pipelines with analytics, ML, dashboards, and streaming ingestion.
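The multi-measure records mentioned earlier pack several measures (e.g. cpu and memory) into one record instead of one record per measure. A sketch of that record shape as I understand the WriteRecords API to expect it (values passed as strings); dimension and measure names are illustrative:

```python
import time

# Sketch of a Timestream multi-measure record for the WriteRecords API.
# Names ("device_id", "cpu", "memory") are illustrative; values are strings
# as the API expects.
def make_record(device_id: str, cpu: float, memory: float) -> dict:
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": "metrics",
        "MeasureValueType": "MULTI",
        "MeasureValues": [
            {"Name": "cpu", "Value": str(cpu), "Type": "DOUBLE"},
            {"Name": "memory", "Value": str(memory), "Type": "DOUBLE"},
        ],
        "Time": str(int(time.time() * 1000)),  # epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }

rec = make_record("sensor-1", 73.5, 4096.0)
print(rec["MeasureValueType"])  # MULTI
```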