Q1
What problem does Amazon Redshift solve compared to traditional data warehouses?
Q2
Why are traditional data warehouses considered inefficient in modern analytics environments?
Q3
What type of workloads is Amazon Redshift specifically designed for?
A1
It provides a fast, scalable, cost-effective data warehouse for analysing large datasets.
A2
Because they are expensive, slow to set up, and difficult to scale.
A3
Complex analytical queries on large-scale (petabyte) data.
Q4
What is a key difference between OLTP systems and Redshift workloads?
Q5
Why is Redshift not suitable for transactional systems?
Q6
What does “columnar storage” in Redshift improve?
A4
OLTP → many small, frequent read/write transactions
Redshift → fewer, large analytical queries
A5
Because it is optimised for analytics, not real-time transactions.
A6
Query performance and compression efficiency
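A6 can be illustrated with a minimal sketch in plain Python (made-up data, not a real schema): storing a column's values contiguously compresses better than interleaving them record by record, which is the intuition behind Redshift's columnar storage.

```python
import zlib

# Hypothetical 10,000-row table with a low-cardinality "region" column
# and a varying "amount" column (illustrative values only).
rows = [(("north", "south", "east", "west")[i % 4], str(i * i % 9973))
        for i in range(10_000)]

# Row-oriented layout: the columns' values are interleaved record by record.
row_layout = "|".join(f"{region},{amount}" for region, amount in rows).encode()

# Columnar layout: each column is stored contiguously.
col_layout = ("|".join(region for region, _ in rows)
              + "#" + "|".join(amount for _, amount in rows)).encode()

row_size = len(zlib.compress(row_layout))
col_size = len(zlib.compress(col_layout))
print(col_size < row_size)  # long runs of similar values compress better
```

The long run of repeated region names in the columnar layout compresses to almost nothing, which is why columnar storage improves both compression efficiency and query performance (only the needed columns are read).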
Q7
Explain the role of the leader node in Redshift.
Q8
Explain the role of compute nodes in Redshift.
Q9
What happens if the leader node fails?
A7
It manages queries, parses SQL, and coordinates execution.
A8
They execute query tasks and process data.
A9
Query coordination fails → cluster cannot process queries properly.
Q10
How does Redshift achieve parallel processing?
Q11
Why is parallel processing important in data analytics?
Q12
What is the main advantage of distributing queries across compute nodes?
A10
By splitting queries and distributing tasks across multiple nodes.
A11
It speeds up processing of large datasets.
A12
Faster query execution and reduced latency
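The divide-and-combine pattern in A10–A12 can be sketched in plain Python. This is a toy stand-in for Redshift's massively parallel execution, not its actual engine: the "leader" splits the work, the "compute nodes" each aggregate their slice, and the partial results are combined.

```python
from concurrent.futures import ThreadPoolExecutor

def partial_sum(chunk):
    # Each "compute node" aggregates only its own slice of the data.
    return sum(chunk)

data = list(range(1_000_000))  # stand-in for a large fact-table column
n_nodes = 4
chunks = [data[i::n_nodes] for i in range(n_nodes)]  # distribute rows

# The "leader" farms out the partial aggregations and combines the results.
with ThreadPoolExecutor(max_workers=n_nodes) as pool:
    total = sum(pool.map(partial_sum, chunks))

print(total == sum(data))  # the distributed result matches the serial one
```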
Q13
A query is slow even though multiple nodes are available. What is a likely issue?
Q14
How does data distribution affect Redshift performance?
Q15
Why is poor data distribution a common performance bottleneck?
A13
Uneven data distribution (skew), often caused by a poor distribution key
A14
It determines how data is spread across nodes for processing
A15
Because some nodes become overloaded while others are idle
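A toy illustration of the skew described in A13–A15 (pure Python, hypothetical key values): when rows are distributed by a "hot" key, one slice receives almost all of them and becomes the bottleneck while the others sit idle.

```python
from collections import Counter

N_SLICES = 4

# 90% of rows share one hot key value; the rest are spread out.
keys = [7] * 900 + list(range(100))

# Key-based distribution: each row lands on the slice its key maps to.
placement = Counter(key % N_SLICES for key in keys)

counts = [placement[s] for s in range(N_SLICES)]
print(counts)  # → [25, 25, 25, 925]

# One slice holds the hot key's rows; the query is only as fast as
# its most overloaded slice.
print(max(counts) > 5 * min(counts))  # → True
```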
Q16
How does Redshift support scalability?
Q17
What is meant by “elastic scaling” in Redshift?
Q18
Why is scalability critical for analytics workloads?
A16
By adding/removing compute nodes
A17
Ability to adjust cluster size based on demand
A18
Because data volume and query complexity increase over time
Q19
What administrative tasks are automated in Redshift?
Q20
Why is automation important in managed data warehouses?
Q21
What role does monitoring play in Redshift optimisation?
A19
Scaling, backups, monitoring, maintenance
A20
Reduces manual effort and operational overhead
A21
Identifies performance issues and resource usage
Q22
How does Redshift ensure data security?
Q23
What is the difference between encryption at rest and in transit?
Q24
Why is security especially important in data warehouses?
A22
Through encryption and access controls
A23
At rest → stored data
In transit → data being transferred
A24
Because sensitive business and analytical data is stored there
Q25
Why is Redshift's compatibility with SQL-based tools important?
Q26
How does Redshift integrate with business intelligence tools?
Q27
What is JDBC and why is it relevant?
A25
Allows use of existing tools and skills
A26
Through SQL connections to BI tools
A27
Java Database Connectivity, a standard Java API that lets external tools connect to databases such as Redshift
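A27's point can be made concrete by the shape of the connection URL that BI tools pass to the Redshift JDBC driver. The endpoint, port, and database below are placeholders, not a real cluster; the sketch is in Python for consistency with the other examples, although JDBC itself is used from Java-based tools.

```python
# All values are illustrative placeholders, not a real cluster.
host = "example-cluster.abc123.eu-west-1.redshift.amazonaws.com"
port = 5439            # Redshift's default port
database = "analytics"

# BI tools hand a URL of this shape to the Redshift JDBC driver.
jdbc_url = f"jdbc:redshift://{host}:{port}/{database}"
print(jdbc_url)
```

Because the URL and driver follow the standard JDBC contract, any JDBC-aware tool can query Redshift without Redshift-specific client code.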
Q28
A company wants to analyse massive historical datasets cheaply. Why is Redshift suitable?
Q29
Why is Redshift ideal for SaaS applications?
Q30
What is the biggest limitation of Redshift compared to DynamoDB?
A28
Because it handles large-scale analytics efficiently at lower cost
A29
Because it scales analytics as demand grows
A30
It is not designed for low-latency real-time queries