Intro to Apache Spark Flashcards

(13 cards)

1
Q

A unified analytics engine for large-scale data processing, providing a consistent interface across workloads such as SQL, streaming, and machine learning.

A

Apache Spark

2
Q

Provides the foundation for all Spark applications, handling memory management, fault recovery, scheduling, and task distribution.

A

Spark Core Engine

3
Q

The brain of a Spark application, responsible for planning and coordinating execution.

A

Driver

4
Q

Manages cluster resources and allocates them to the Driver (internal).

A

Cluster Manager (Master)

5
Q

Nodes in the cluster that host Executors.

A

Worker

6
Q

Processes on Worker nodes that execute tasks assigned by the Driver.

A

Executors
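
As a rough illustration of how the Driver and Executors relate, the toy model below hands tasks to executors in round-robin order. It is plain Python, not the Spark API: `Executor`, `assign_tasks`, and the round-robin policy are hypothetical simplifications (Spark's real scheduler also weighs factors such as data locality and free cores).

```python
from dataclasses import dataclass, field


@dataclass
class Executor:
    """Hypothetical stand-in for an executor process on a worker node."""
    executor_id: str
    tasks: list = field(default_factory=list)


def assign_tasks(task_ids, executors):
    """Toy driver logic: hand out tasks to executors in round-robin order."""
    for i, task_id in enumerate(task_ids):
        executors[i % len(executors)].tasks.append(task_id)
    return {e.executor_id: e.tasks for e in executors}


# Two executors share five tasks.
executors = [Executor("exec-0"), Executor("exec-1")]
assignment = assign_tasks(range(5), executors)
print(assignment)  # {'exec-0': [0, 2, 4], 'exec-1': [1, 3]}
```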

7
Q

Groups of tasks that can be executed in parallel, separated from one another by shuffle boundaries.

A

Stages

8
Q

The individual units of work executed by Executors.

A

Tasks
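
To make the stage and task cards concrete, the sketch below splits a chain of transformations into stages wherever a shuffle is needed and counts one task per partition in each stage. It is a simplified pure-Python model, not Spark code: `WIDE_OPS`, `split_into_stages`, the example pipeline, and the partition count are illustrative assumptions.

```python
# Transformations that require a shuffle (wide dependencies); an illustrative subset.
WIDE_OPS = {"groupByKey", "reduceByKey", "join", "repartition"}


def split_into_stages(ops):
    """Start a new stage at every wide operation, since its input must be shuffled first."""
    stages = [[]]
    for op in ops:
        if op in WIDE_OPS and stages[-1]:
            stages.append([])
        stages[-1].append(op)
    return stages


pipeline = ["map", "filter", "reduceByKey", "map"]
stages = split_into_stages(pipeline)

num_partitions = 4  # assumed partition count, kept the same for every stage here
tasks_per_stage = [num_partitions for _ in stages]  # one task per partition

print(stages)           # [['map', 'filter'], ['reduceByKey', 'map']]
print(tasks_per_stage)  # [4, 4]
```

In this model the narrow operations `map` and `filter` stay in one stage, while `reduceByKey` forces a second stage; each stage then runs as four parallel tasks.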

9
Q

This Spark UI provides per-application monitoring through the SparkSession, offering details on progress, DAG visualization, resource usage, and more.

A

Application UI

10
Q

This Spark UI gives a cluster-wide view for monitoring multiple applications, showing the health status of nodes and overall resource allocation.

A

Master UI

11
Q

Interactive clusters that support notebooks, jobs, and dashboards with configurable auto-termination.

A

All-Purpose Clusters

12
Q

Ephemeral clusters that start when a job runs and terminate automatically upon completion, optimized for non-interactive workloads.

A

Job Clusters

13
Q

Optimized clusters for SQL query performance with instant startup and auto-scaling to balance cost and performance.

A

SQL Warehouses
