Spark Flashcards

(10 cards)

1
Q

Common features of Hadoop and Spark

A

Stateless (memoryless)
No randomization (operations are deterministic)

2
Q

Hadoop features

A

Disk-based
If a machine fails, a replica is used

3
Q

Spark features

A

Uses RDDs
Stores data in memory
Lazy execution
Lineage tracking
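
To make lazy execution and lineage tracking concrete, here is a minimal plain-Python sketch. It does not use Spark; `ToyRDD` and its methods are hypothetical names invented for illustration, and the real RDD machinery is far more involved. The idea shown: transformations only record a step, and nothing is computed until an action asks for a result.

```python
class ToyRDD:
    """Hypothetical toy stand-in for an RDD. This is NOT Spark's API."""

    def __init__(self, source, lineage=None):
        self._source = source                    # zero-argument callable returning an iterator
        self.lineage = lineage or ["parallelize"]

    @classmethod
    def parallelize(cls, data):
        data = list(data)
        return cls(lambda: iter(data))

    def map(self, fn):
        # Transformation: records the step and returns a new ToyRDD; computes nothing.
        return ToyRDD(lambda: (fn(x) for x in self._source()),
                      self.lineage + ["map"])

    def filter(self, pred):
        # Transformation: also deferred.
        return ToyRDD(lambda: (x for x in self._source() if pred(x)),
                      self.lineage + ["filter"])

    def collect(self):
        # Action: replays the recorded chain and materializes the result.
        return list(self._source())


calls = []

def double(x):
    calls.append(x)          # lets us observe when the work actually happens
    return x * 2

rdd = ToyRDD.parallelize([1, 2, 3, 4]).map(double).filter(lambda x: x > 4)
calls_before_action = len(calls)   # still 0: only the plan exists so far
result = rdd.collect()             # the action triggers the computation
print(rdd.lineage, calls_before_action, result)
```

Because the chain of steps is kept rather than the data, calling `collect()` again simply recomputes the result from the source, which is the intuition behind using lineage to rebuild lost data.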

4
Q

Transformation operations

A

filter
mapPartitions and other maps
sample
reduceByKey
distinct
groupByKey

5
Q

Action operations

A

reduce
aggregate
takeSample
countByKey
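
As a rough illustration of what these actions hand back, the sketch below computes local, plain-Python equivalents on a small list of key-value pairs. It is not Spark code: the data, the two-way split into `partitions`, and all variable names are assumptions made for the example. The `aggregate` part mirrors the general shape of a zero value, a per-partition step, and a step that combines partial results.

```python
from collections import Counter
from functools import reduce
import random

pairs = [("a", 1), ("b", 2), ("a", 3), ("b", 4), ("a", 5)]

# reduce: fold all values into a single result
total = reduce(lambda x, y: x + y, (v for _, v in pairs))

# aggregate: fold each partition from a zero value, then combine the partials
partitions = [pairs[:2], pairs[2:]]                     # assumed split, for illustration
seq_op = lambda acc, kv: (acc[0] + kv[1], acc[1] + 1)   # (running sum, running count)
comb_op = lambda a, b: (a[0] + b[0], a[1] + b[1])
partials = [reduce(seq_op, part, (0, 0)) for part in partitions]
sum_count = reduce(comb_op, partials, (0, 0))

# takeSample: a fixed-size random sample returned as a local list
sample = random.Random(0).sample(pairs, 2)

# countByKey: number of elements per key, returned as a local dictionary
counts = dict(Counter(k for k, _ in pairs))

print(total, sum_count, counts, sample)
```

In each case the result is an ordinary local value rather than another distributed dataset, which is what separates an action from a transformation.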

6
Q

Wide dependency

A

reduceByKey (an output partition can depend on several parent partitions, so a shuffle is required)
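
The sketch below is one way to picture why a reduce-by-key step is a wide dependency. It is a plain-Python toy, not Spark's implementation: `reduce_by_key`, the character-sum partitioner, and the sample data are all assumptions made for illustration. It records which input partitions feed each output partition.

```python
from collections import defaultdict


def reduce_by_key(partitions, fn, num_out):
    """Toy reduce-by-key over lists of (str key, value) pairs. NOT Spark's API."""
    # Step 1: combine values per key inside each input partition (no data movement).
    combined = []
    for part in partitions:
        local = {}
        for k, v in part:
            local[k] = fn(local[k], v) if k in local else v
        combined.append(local)

    # Step 2 (shuffle): route every key to an output partition and merge there.
    out = [dict() for _ in range(num_out)]
    parents = defaultdict(set)            # output partition -> input partitions read
    for i, local in enumerate(combined):
        for k, v in local.items():
            j = sum(map(ord, k)) % num_out    # deterministic toy partitioner
            out[j][k] = fn(out[j][k], v) if k in out[j] else v
            parents[j].add(i)
    return out, dict(parents)


partitions = [
    [("a", 1), ("b", 2)],
    [("a", 3), ("c", 4)],
    [("b", 5), ("a", 6)],
]
out, parents = reduce_by_key(partitions, lambda x, y: x + y, num_out=2)
print(out)
print(parents)
```

Here the output partition holding key `a` has to read from all three input partitions, whereas a `filter` or `map` would only ever read from the single partition it sits on.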

7
Q

Difference between the Spark RDD and Spark SQL filter functions

A

The Spark SQL filter function is more efficient

Spark SQL can inspect the filter condition and optimize around it, while an RDD filter is an opaque function whose logic Spark cannot see
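
A small plain-Python sketch of the distinction, under stated assumptions: `GreaterThan` and `columns_needed` are hypothetical names, not part of Spark, and a real optimizer does much more than this. The point is only that a structured condition can be examined before any data is read, while an arbitrary function cannot.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreaterThan:
    """Hypothetical structured condition: column > value."""
    column: str
    value: float

    def evaluate(self, row):
        return row[self.column] > self.value


def columns_needed(predicate):
    """Return the columns a predicate reads, or None if that cannot be known."""
    if isinstance(predicate, GreaterThan):
        return {predicate.column}     # visible: only this column is required
    return None                       # opaque function: every full row must be passed in


rows = [
    {"name": "Ann", "age": 34},
    {"name": "Bob", "age": 25},
    {"name": "Cy", "age": 41},
]

opaque = lambda row: row["age"] > 30      # RDD-style: just a function
structured = GreaterThan("age", 30)       # SQL-style: a description of the condition

via_function = [r for r in rows if opaque(r)]
via_expression = [r for r in rows if structured.evaluate(r)]
print(columns_needed(opaque), columns_needed(structured))
```

Both filters keep the same rows; the difference is that only the structured form tells the engine which column it touches and what comparison it performs.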

8
Q

Key difference between Spark RDD and Spark SQL

A

Spark SQL can achieve higher performance because it understands the logic of the operations

Spark SQL employs the Catalyst query optimizer

Spark SQL is aware of the data schema, while a Spark RDD is schema-agnostic

9
Q

Relational operations

A

filter
projection
cross product
aggregation
union
intersection
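
The operations listed above read like relational-style operations. As a hedged illustration only, the sketch below applies each one to two tiny relations modeled as Python sets of tuples; the data and variable names are made up for the example, and none of this is Spark code.

```python
from itertools import product

# Two toy relations with the same (id, label) layout.
r = {(1, "a"), (2, "b"), (3, "c")}
s = {(2, "b"), (4, "d")}

filtered = {t for t in r if t[0] >= 2}     # filter: keep rows matching a condition
projected = {(t[1],) for t in r}           # projection: keep only the label column
cross = set(product(r, s))                 # cross product: every row of r paired with every row of s
aggregated = sum(t[0] for t in r)          # aggregation: collapse a column to one value
union = r | s                              # union: rows in either relation
intersection = r & s                       # intersection: rows in both relations

print(len(cross), aggregated, sorted(union), intersection)
```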

10
Q
A