Choosing the Right Database — What questions should you ask?
When evaluating AWS databases, consider:
Workload type: Read-heavy, write-heavy, or balanced?
Throughput needs: Will throughput stay steady, or scale up and fluctuate through the day?
Data volume: How much data? How fast will it grow? Average object size?
Access patterns: How is the data accessed?
Durability needs: Is this the system of record?
Latency & concurrency: Required response time? Number of users?
Data model: Structured? Semi-structured? Need joins?
Schema flexibility: Strong schema vs. flexible?
Use case: Reporting? Search? RDBMS vs. NoSQL?
Licensing: Any cost considerations? Should you move to a cloud-native DB like Aurora?
AWS Database Types — What options exist and what are they for?
AWS offers multiple managed database categories:
RDBMS (SQL / OLTP):
RDS, Aurora — best for structured data, joins, transactions.
NoSQL:
DynamoDB (~JSON key-value),
ElastiCache (in-memory key/value),
Neptune (graph),
DocumentDB (MongoDB-compatible),
Keyspaces (Cassandra).
Object Storage:
S3 (large objects), Glacier (archival & backups).
Data Warehouse / Analytics:
Redshift (OLAP), Athena, EMR.
Search:
OpenSearch — full-text search, unstructured queries.
Graph:
Amazon Neptune — relationship-heavy data.
Ledger:
QLDB — immutable, cryptographically verifiable history.
Time Series:
Timestream — optimized for time-indexed data.
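The category list above can be turned into a small lookup helper, a study aid for matching a workload category to candidate services. This mapping is a mnemonic built from the list above, not an official AWS taxonomy:

```python
# Study-aid mapping from workload category to the AWS services above.
# Not an official AWS mapping -- just a mnemonic for the categories listed.
CATEGORY_TO_SERVICES = {
    "oltp": ["RDS", "Aurora"],
    "key-value": ["DynamoDB", "ElastiCache"],
    "document": ["DocumentDB"],
    "wide-column": ["Keyspaces"],
    "graph": ["Neptune"],
    "object": ["S3", "Glacier"],
    "olap": ["Redshift", "Athena", "EMR"],
    "search": ["OpenSearch"],
    "ledger": ["QLDB"],
    "time-series": ["Timestream"],
}

def suggest(category: str) -> list[str]:
    """Return candidate services for a workload category."""
    return CATEGORY_TO_SERVICES.get(category.lower(), [])

print(suggest("graph"))  # ['Neptune']
```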
Amazon RDS — What does it provide?
Amazon RDS gives you fully managed relational databases:
Supports PostgreSQL, MySQL, MariaDB, Oracle, SQL Server, and IBM Db2.
Choose instance size, EBS volume type, and storage auto-scaling.
High availability via Multi-AZ and Read Replicas.
Security: IAM, Security Groups, KMS encryption, SSL in transit.
Backups: automated PITR (up to 35 days) + manual snapshots.
Managed maintenance, including patching (may cause downtime).
Supports IAM authentication and Secrets Manager integration.
RDS Custom allows deeper OS-level customization (Oracle & SQL Server).
Best for relational datasets, SQL queries, and transactional workloads.
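The PITR behavior above (restore to any point within the retention window, up to 35 days) can be sketched with a simple validity check. The function name is illustrative; this only models the window arithmetic, not the restore itself:

```python
from datetime import datetime, timedelta, timezone

RETENTION_DAYS = 35  # maximum automated-backup retention for RDS

def restore_target_valid(target: datetime, now: datetime) -> bool:
    """True if `target` falls inside the PITR window: within the last
    RETENTION_DAYS and not in the future."""
    return now - timedelta(days=RETENTION_DAYS) <= target <= now

now = datetime(2024, 6, 1, tzinfo=timezone.utc)
print(restore_target_valid(datetime(2024, 5, 20, tzinfo=timezone.utc), now))  # True
print(restore_target_valid(datetime(2024, 4, 1, tzinfo=timezone.utc), now))   # False (61 days back)
```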
Amazon Aurora — What makes it different from RDS?
Amazon Aurora is a high-performance, cloud-native relational database:
MySQL- and PostgreSQL-compatible, with better performance (AWS claims up to 5× MySQL and 3× PostgreSQL throughput).
Storage & compute separated:
Storage: 6 replicas across 3 AZs, auto-healing, auto-scaling.
Compute: cluster of DB instances with read replica auto-scaling.
Cluster endpoints for writers vs. readers.
Same security & maintenance model as RDS.
Backup & restore features similar to RDS.
Aurora Serverless: automatic scaling for variable workloads.
Aurora Global Database: secondary regions with up to 16 read replicas each; typical cross-region replication lag under 1 second.
Aurora Machine Learning: integrates with SageMaker & Comprehend.
Database cloning: rapid, copy-on-write, low-cost environment duplication.
Use case: same as RDS but with higher performance, lower maintenance, and more features.
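The writer/reader cluster endpoints above enable read/write splitting in application code. A minimal routing sketch, where the endpoint hostnames are illustrative placeholders (Aurora reader endpoints contain "cluster-ro"):

```python
# Sketch of read/write splitting against Aurora's cluster endpoints.
# Hostnames below are placeholders, not real clusters.
WRITER_ENDPOINT = "mycluster.cluster-abc123.us-east-1.rds.amazonaws.com"
READER_ENDPOINT = "mycluster.cluster-ro-abc123.us-east-1.rds.amazonaws.com"

def endpoint_for(sql: str) -> str:
    """Route SELECT statements to the reader endpoint, everything else
    (INSERT/UPDATE/DELETE/DDL) to the writer endpoint."""
    is_read = sql.lstrip().upper().startswith("SELECT")
    return READER_ENDPOINT if is_read else WRITER_ENDPOINT

print(endpoint_for("SELECT * FROM orders"))           # reader endpoint
print(endpoint_for("INSERT INTO orders VALUES (1)"))  # writer endpoint
```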
Amazon ElastiCache — What is it and why use it?
Amazon ElastiCache provides fully managed in-memory caching:
Managed Redis and Memcached (similar to RDS, but for caches).
Sub-millisecond latency, very high performance.
Choose cache instance types (e.g., cache.m6g.large).
Supports:
Sharding / cluster mode (Redis Cluster; Memcached multi-node partitioning)
Multi-AZ with automatic failover (Redis)
Read replicas for read scaling (Redis)
Security: IAM, Security Groups, KMS encryption.
Used to reduce load on databases for read-heavy workloads.
Helps make applications stateless.
AWS handles maintenance, patching, recovery, monitoring, config.
Typically requires application code changes to integrate caching.
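The application-code change mentioned above is usually the cache-aside (lazy loading) pattern: check the cache first, fall back to the database on a miss, then populate the cache. A sketch with a plain dict standing in for the Redis/Memcached cluster and a placeholder function standing in for the real database query:

```python
import time

# Cache-aside (lazy loading) sketch. A dict stands in for ElastiCache;
# slow_db_query stands in for a round trip to the backing database.
cache: dict[str, tuple[float, str]] = {}
TTL_SECONDS = 60.0

def slow_db_query(key: str) -> str:
    return f"row-for-{key}"  # placeholder for a real DB query

def get(key: str) -> str:
    hit = cache.get(key)
    if hit and time.monotonic() - hit[0] < TTL_SECONDS:
        return hit[1]                       # cache hit: no DB load
    value = slow_db_query(key)              # cache miss: query the DB once...
    cache[key] = (time.monotonic(), value)  # ...then populate the cache
    return value

print(get("user:42"))  # first call: miss, goes to the DB
print(get("user:42"))  # second call: served from cache
```

The TTL keeps stale entries from living forever; write-through (updating the cache on every DB write) is the usual alternative trade-off.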
Amazon DynamoDB — What are its key features?
Amazon DynamoDB is a fully managed, serverless NoSQL database:
Proprietary AWS service, single-digit millisecond latency.
Capacity modes:
Provisioned (with auto-scaling)
On-demand
Can serve as a key/value store; with TTL, it can even replace ElastiCache for session data.
Highly available, Multi-AZ by default; reads/writes decoupled; supports transactions.
DAX: in-memory read cache with microsecond latency.
Security via IAM (auth & authorization).
Event processing:
DynamoDB Streams → Lambda, Kinesis Data Streams.
Global Tables: active-active multi-region replication.
Backups: PITR (35 days) + on-demand backups.
Export to S3 (any point within the PITR window) and import from S3, neither consuming RCU/WCU.
Flexible for rapid schema evolution.
Use cases: serverless apps, small JSON-like items (max item size 400 KB), distributed cache.
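The TTL-based session-store idea above comes down to storing an expiry timestamp (epoch seconds) in a designated attribute; DynamoDB deletes items once that time passes. A sketch of such an item's shape, with illustrative attribute names:

```python
import time

# Sketch of a DynamoDB-style session item using the TTL feature:
# DynamoDB deletes items whose TTL attribute (epoch seconds) has passed.
# Attribute names ("pk", "ttl") are illustrative, not mandated.
def make_session_item(session_id: str, user: str, ttl_seconds: int = 3600) -> dict:
    return {
        "pk": f"SESSION#{session_id}",          # partition key
        "user": user,
        "ttl": int(time.time()) + ttl_seconds,  # expiry, enforced by DynamoDB
    }

item = make_session_item("abc123", "alice")
print(item["pk"])                 # SESSION#abc123
print(item["ttl"] > time.time())  # True: not yet expired
```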
Amazon S3 — What should you know?
Amazon S3 is a massively scalable object storage service:
Key/value store for objects (not ideal for many very small objects).
Serverless, virtually unlimited scalability; max object size 5 TB.
Storage tiers:
S3 Standard
S3 Standard-IA
S3 Intelligent-Tiering
Glacier (Flexible Retrieval), Glacier Deep Archive
Use lifecycle policies to transition data.
Features:
Versioning
Encryption (SSE-S3, SSE-KMS, SSE-C, client-side)
Replication (CRR / SRR)
MFA Delete
Access Logs
Access Points, Object Lambda
CORS
Performance:
Multi-part upload
S3 Transfer Acceleration
S3 Select
Automation: S3 Event Notifications → SNS, SQS, Lambda, EventBridge.
Use cases: static files, data lake storage, backups, big-object key/value store, website hosting.
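The lifecycle transitions above (Standard → Standard-IA → Glacier → expiration) are expressed as a lifecycle configuration document. A sketch of its JSON shape, with a made-up rule that archives objects under a "logs/" prefix:

```python
import json

# Sketch of an S3 lifecycle configuration (the shape accepted by the
# PutBucketLifecycleConfiguration API): objects under "logs/" move to
# Standard-IA after 30 days, Glacier after 90, and expire after 365.
lifecycle = {
    "Rules": [
        {
            "ID": "archive-logs",            # illustrative rule name
            "Status": "Enabled",
            "Filter": {"Prefix": "logs/"},
            "Transitions": [
                {"Days": 30, "StorageClass": "STANDARD_IA"},
                {"Days": 90, "StorageClass": "GLACIER"},
            ],
            "Expiration": {"Days": 365},
        }
    ]
}
print(json.dumps(lifecycle, indent=2))
```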
Amazon DocumentDB — What is it and when to use it?
Amazon DocumentDB is AWS’s managed MongoDB-compatible document database:
Similar to how Aurora re-implements MySQL/PostgreSQL, DocumentDB re-implements MongoDB.
Designed to store, query, and index JSON documents.
Fully managed and highly available, with replication across 3 AZs.
Storage auto-grows in 10 GB increments.
Scales automatically to serve millions of requests per second.
Suitable for document workloads requiring nested, flexible, semi-structured data.
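The nested, semi-structured data above is the core of the document model: one record holds its related data inline instead of spreading it across joined tables. A sketch of such a document plus a tiny dotted-path lookup mimicking MongoDB-style field access ("customer.address.city"); all field names are made up:

```python
# A nested JSON document of the kind DocumentDB stores natively.
# Field names are illustrative.
order = {
    "_id": "order-1001",
    "customer": {"name": "Alice", "address": {"city": "Seattle"}},
    "items": [{"sku": "A1", "qty": 2}, {"sku": "B7", "qty": 1}],
}

def get_path(doc: dict, path: str):
    """Resolve a MongoDB-style dotted path against a nested document."""
    for part in path.split("."):
        doc = doc[part]
    return doc

print(get_path(order, "customer.address.city"))  # Seattle
```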
Amazon Neptune — What is it used for?
Amazon Neptune is a fully managed graph database:
Built for highly connected datasets: social networks, knowledge graphs, fraud graphs, recommendations.
Supports graph models and query languages (Gremlin, openCypher, SPARQL).
Highly available with replication across multiple AZs.
Designed for complex, relationship-focused queries with millisecond latency.
Can store billions of relationships and query them efficiently.
Ideal where understanding links between entities is critical.
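The "relationship-focused query" idea above is a graph traversal, e.g. two hops along a "follows" edge (in Gremlin, roughly g.V('alice').out('follows').out('follows')). A conceptual in-memory sketch of that traversal, with a made-up social graph; this only illustrates the idea, not how Neptune executes it:

```python
# Toy "follows" graph as an adjacency list (illustrative data).
follows = {
    "alice": ["bob", "carol"],
    "bob": ["dave"],
    "carol": ["dave", "erin"],
}

def friends_of_friends(user: str) -> set[str]:
    """Vertices exactly two 'follows' hops away from `user`."""
    return {fof for f in follows.get(user, []) for fof in follows.get(f, [])}

print(sorted(friends_of_friends("alice")))  # ['dave', 'erin']
```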
Amazon Neptune Streams — What does it provide?
Neptune Streams captures a real-time, ordered log of changes to graph data:
Records every create / update / delete operation.
Delivered immediately after the write, in strict order with no duplicates.
Accessible through an HTTP REST API.
Supports use cases like:
Triggering notifications on graph updates
Syncing graph data into other stores (S3, OpenSearch, ElastiCache)
Cross-region replication
Enables downstream systems to react to graph changes in real time.
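The "sync graph data into other stores" use case above boils down to replaying an ordered change log against a downstream copy. A sketch of that replay loop; the record shape (commit number + operation) is illustrative, not the exact Neptune Streams payload:

```python
# Sketch of consuming an ordered change log (Neptune Streams style):
# applying records in commit order keeps a downstream store in sync.
# Record fields are illustrative.
records = [
    {"commitNum": 1, "op": "ADD", "data": {"id": "v1", "label": "person"}},
    {"commitNum": 2, "op": "ADD", "data": {"id": "v2", "label": "person"}},
    {"commitNum": 3, "op": "REMOVE", "data": {"id": "v1", "label": "person"}},
]

replica: dict[str, dict] = {}  # downstream copy (e.g. OpenSearch, ElastiCache)
for rec in sorted(records, key=lambda r: r["commitNum"]):
    if rec["op"] == "ADD":
        replica[rec["data"]["id"]] = rec["data"]
    elif rec["op"] == "REMOVE":
        replica.pop(rec["data"]["id"], None)

print(sorted(replica))  # ['v2'] -- v1 was added, then removed
```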
Amazon Keyspaces — What is it and why use it?
Amazon Keyspaces is a serverless, fully managed Apache Cassandra–compatible database:
Built on open-source Cassandra semantics and uses CQL (Cassandra Query Language).
Serverless, automatically scales up/down with traffic.
Highly available, with data replicated 3× across AZs.
Single-digit millisecond latency at any scale; supports thousands of requests/sec.
Capacity options: on-demand or provisioned with auto-scaling.
Integrated with encryption, backups, and Point-In-Time Recovery (PITR) up to 35 days.
Great for IoT, time-series data, and workloads needing massive write scalability.
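The CQL mentioned above looks like SQL but models Cassandra's partition/clustering key layout. Illustrative statements for the IoT use case (keyspace, table, and column names are made up); held as strings here so the sketch runs without a live cluster:

```python
# Illustrative CQL statements of the kind you run against Keyspaces.
# ((device_id), reading_time) = partition key + clustering column, so all
# readings for one device live together, newest first.
CREATE_TABLE = """
CREATE TABLE iot.sensor_readings (
    device_id text,
    reading_time timestamp,
    temperature double,
    PRIMARY KEY ((device_id), reading_time)
) WITH CLUSTERING ORDER BY (reading_time DESC);
"""

INSERT = """
INSERT INTO iot.sensor_readings (device_id, reading_time, temperature)
VALUES (?, ?, ?);
"""

print(CREATE_TABLE.strip().splitlines()[0])
```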
Amazon Timestream — What is it optimized for?
Amazon Timestream is a serverless, fast, scalable time-series database:
Automatically scales compute and storage as data grows.
Handles trillions of events per day with low latency.
Up to 1,000× faster and 1/10th the cost of relational DBs for time-series workloads.
Supports multi-measure records and scheduled queries.
Uses SQL-compatible querying.
Tiered storage: recent data in-memory; historical data in cost-optimized storage.
Built-in time-series analytics functions for near real-time insights.
Encryption in transit and at rest.
Ideal for IoT telemetry, operational monitoring, and real-time analytics.
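The SQL-compatible querying and time-series functions above include bin() for time bucketing and ago() for relative windows. An illustrative query (database, table, and measure names are made up), held as a string since running it needs a live Timestream database:

```python
# Illustrative Timestream SQL: average temperature per device in 1-minute
# bins over the last hour, using the built-in bin() and ago() functions.
# Database/table/measure names are made up.
QUERY = """
SELECT device_id,
       bin(time, 1m) AS minute,
       avg(measure_value::double) AS avg_temp
FROM "iot_db"."sensor_table"
WHERE measure_name = 'temperature'
  AND time > ago(1h)
GROUP BY device_id, bin(time, 1m)
ORDER BY minute DESC
"""
print(QUERY.strip().splitlines()[0])
```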
Amazon Timestream — Architecture Overview
Timestream integrates seamlessly with AWS services for ingest, processing, and analytics:
Ingest sources:
AWS IoT
Kinesis Data Streams
Amazon MSK
Lambda
Prometheus
Processing & analytics:
Kinesis Data Analytics for Apache Flink
SageMaker
QuickSight
Access:
JDBC connections
Direct SQL queries
Storage tiering:
Recent data kept in-memory for fast queries
Historical data stored in cost-efficient tier
Used for large-scale time-series pipelines with analytics, ML, dashboards, and streaming ingestion.
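The multi-measure records mentioned earlier pack several measures (e.g. cpu and memory) into one record instead of one record per measure. A sketch of that record shape as I understand the WriteRecords API to expect it (values passed as strings); dimension and measure names are illustrative:

```python
import time

# Sketch of a Timestream multi-measure record for the WriteRecords API.
# Names ("device_id", "cpu", "memory") are illustrative; values are strings
# as the API expects.
def make_record(device_id: str, cpu: float, memory: float) -> dict:
    return {
        "Dimensions": [{"Name": "device_id", "Value": device_id}],
        "MeasureName": "metrics",
        "MeasureValueType": "MULTI",
        "MeasureValues": [
            {"Name": "cpu", "Value": str(cpu), "Type": "DOUBLE"},
            {"Name": "memory", "Value": str(memory), "Type": "DOUBLE"},
        ],
        "Time": str(int(time.time() * 1000)),  # epoch milliseconds
        "TimeUnit": "MILLISECONDS",
    }

rec = make_record("sensor-1", 73.5, 4096.0)
print(rec["MeasureValueType"])  # MULTI
```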