AWS Module 8.3: Amazon Redshift Flashcards

(10 cards)

1
Q

Q1

What problem does Amazon Redshift solve compared to traditional data warehouses?

Q2

Why are traditional data warehouses considered inefficient in modern analytics environments?

Q3

What type of workloads is Amazon Redshift specifically designed for?

A

A1

It provides a fast, scalable, cost-effective data warehouse for analysing large datasets.

A2

Because they are expensive, slow to set up, and difficult to scale.

A3

Complex analytical queries over large (petabyte-scale) datasets.

2
Q

Q4

What is a key difference between OLTP systems and Redshift workloads?

Q5

Why is Redshift not suitable for transactional systems?

Q6

What does “columnar storage” in Redshift improve?

A

A4

OLTP → small frequent transactions
Redshift → large analytical queries

A5

Because it is optimised for analytics, not real-time transactions.

A6

Query performance and compression efficiency
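
To make the columnar-storage answer concrete, here is a minimal Python sketch. It is not how Redshift is implemented internally. The sample table, the column names, and the `run_length_encode` helper are all assumptions made for illustration, and run-length encoding stands in as one simple example of a compression scheme.

```python
from itertools import groupby

# Hypothetical sample table: (order_id, region, amount)
rows = [
    (1, "EU", 120.0),
    (2, "EU", 80.0),
    (3, "EU", 45.5),
    (4, "US", 300.0),
    (5, "US", 10.0),
]

# Column layout: the values of each column are stored together,
# instead of storing each complete record together.
column_store = {
    "order_id": [r[0] for r in rows],
    "region": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}


def run_length_encode(values):
    """Collapse consecutive repeated values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


# Query performance: an aggregate over one column reads only that column.
total_amount = sum(column_store["amount"])

# Compression: similar values sit next to each other, so they encode compactly.
encoded_region = run_length_encode(column_store["region"])

print(total_amount)    # 555.5
print(encoded_region)  # [('EU', 3), ('US', 2)]
```

The aggregate never touches `order_id` or `region`, and five region values shrink to two pairs. Those are the two benefits named in A6.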

3
Q

Q7

Explain the role of the leader node in Redshift.

Q8

Explain the role of compute nodes in Redshift.

Q9

What happens if the leader node fails?

A

A7

It manages queries, parses SQL, and coordinates execution.

A8

They execute query tasks and process data.

A9

Query coordination fails → the cluster cannot accept or process queries until the leader node is restored.

4
Q

Q10

How does Redshift achieve parallel processing?

Q11

Why is parallel processing important in data analytics?

Q12

What is the main advantage of distributing queries across compute nodes?

A

A10

By splitting queries and distributing tasks across multiple nodes.

A11

It speeds up processing of large datasets.

A12

Faster query execution and reduced latency
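
The split-and-merge pattern behind A10 to A12 can be sketched in plain Python. This is a toy model under stated assumptions, not Redshift code: `leader_node_sum` and `compute_node_sum` are hypothetical names, the data is made up, and Python threads only imitate independent nodes.

```python
from concurrent.futures import ThreadPoolExecutor


def compute_node_sum(slice_of_rows):
    """Each 'compute node' aggregates only its own slice of the data."""
    return sum(slice_of_rows)


def leader_node_sum(values, node_count):
    """The 'leader' splits the work, fans it out, and merges partial results."""
    # Deal the rows out round-robin so every node gets a similar share.
    slices = [values[i::node_count] for i in range(node_count)]
    with ThreadPoolExecutor(max_workers=node_count) as pool:
        partial_sums = list(pool.map(compute_node_sum, slices))
    return sum(partial_sums), partial_sums


sales = list(range(1, 101))  # hypothetical data: the integers 1..100
total, partials = leader_node_sum(sales, node_count=4)

print(partials)  # [1225, 1250, 1275, 1300]
print(total)     # 5050
```

Each worker scans a quarter of the rows, and the leader only combines four small partial results. That is why adding nodes can shorten query time on large datasets.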

5
Q

Q13

A query is slow even though multiple nodes are available. What is a likely issue?

Q14

How does data distribution affect Redshift performance?

Q15

Why is poor data distribution a common performance bottleneck?

A

A13

Uneven workload distribution (data skew) or poor data partitioning across nodes

A14

It determines how data is spread across nodes for processing

A15

Because some nodes become overloaded while others are idle
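
The bottleneck described in A13 to A15 can be simulated with a small Python sketch. Everything in it is an assumption for illustration: the key values, the four-node cluster, and the modulo rule in `assign_node`, which is a simple stand-in for a real hash function.

```python
from collections import Counter


def assign_node(key, node_count):
    """Map a distribution key to a node (modulo as a stand-in for hashing)."""
    return key % node_count


def rows_per_node(keys, node_count):
    """Count how many rows land on each node."""
    counts = Counter(assign_node(key, node_count) for key in keys)
    return [counts.get(node, 0) for node in range(node_count)]


def skew_ratio(per_node):
    """Busiest node's row count divided by the mean row count per node."""
    return max(per_node) / (sum(per_node) / len(per_node))


NODE_COUNT = 4
even_keys = list(range(1000))               # unique key per row, e.g. an order id
skewed_keys = [7] * 900 + list(range(100))  # 90% of rows share one customer id

even_load = rows_per_node(even_keys, NODE_COUNT)
skewed_load = rows_per_node(skewed_keys, NODE_COUNT)

print(even_load, skew_ratio(even_load))      # [250, 250, 250, 250] 1.0
print(skewed_load, skew_ratio(skewed_load))  # [25, 25, 25, 925] 3.7
```

With the skewed key, one node holds 925 of the 1000 rows while the other three sit nearly idle. The query finishes only when the busiest node does, so extra nodes add little.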

6
Q

Q16

How does Redshift support scalability?

Q17

What is meant by “elastic scaling” in Redshift?

Q18

Why is scalability critical for analytics workloads?

A

A16

By adding/removing compute nodes

A17

Ability to adjust cluster size based on demand

A18

Because data volume and query complexity increase over time

7
Q

Q19

What administrative tasks are automated in Redshift?

Q20

Why is automation important in managed data warehouses?

Q21

What role does monitoring play in Redshift optimisation?

A

A19

Scaling, backups, monitoring, maintenance

A20

Reduces manual effort and operational overhead

A21

Identifies performance issues and resource usage

8
Q

Q22

How does Redshift ensure data security?

Q23

What is the difference between encryption at rest and in transit?

Q24

Why is security especially important in data warehouses?

A

A22

Through encryption and access controls

A23

At rest → stored data
In transit → data being transferred

A24

Because sensitive business and analytical data is stored there

9
Q

Q25

Why is Redshift's compatibility with SQL-based tools important?

Q26

How does Redshift integrate with business intelligence tools?

Q27

What is JDBC and why is it relevant?

A

A25

Allows use of existing tools and skills

A26

Through SQL connections to BI tools

A27

Java Database Connectivity: a standard Java API that lets external tools connect to Redshift
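
As a small illustration of what a JDBC-based tool needs in order to connect, the sketch below builds a connection URL in the `jdbc:redshift://endpoint:port/database` form. The helper name `redshift_jdbc_url`, the endpoint, and the database name are hypothetical. Port 5439 is Redshift's default.

```python
def redshift_jdbc_url(endpoint, database, port=5439):
    """Build a JDBC URL of the form jdbc:redshift://endpoint:port/database."""
    return f"jdbc:redshift://{endpoint}:{port}/{database}"


# Hypothetical cluster endpoint and database name, for illustration only.
url = redshift_jdbc_url(
    "examplecluster.abc123xyz789.us-west-2.redshift.amazonaws.com",
    "dev",
)

print(url)
```

A BI tool or SQL client is typically given a URL like this plus credentials. The URL identifies the cluster endpoint and database that the tool's JDBC driver should connect to.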

10
Q

Q28

A company wants to analyse massive historical datasets cheaply. Why is Redshift suitable?

Q29

Why is Redshift ideal for SaaS applications?

Q30

What is the biggest limitation of Redshift compared to DynamoDB?

A

A28

Because it handles large-scale analytics efficiently and at lower cost than a traditional data warehouse

A29

Because it scales analytics as demand grows

A30

It is not designed for low-latency real-time queries
