Spark Flashcards by Sophia Chavez

Common features in Hadoop and Spark

Stateless / memoryless operations
No randomization (operations are deterministic)
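
One reason these two properties matter: a stateless, deterministic task can simply be re-run after a failure and is guaranteed to reproduce the lost output. The sketch below is plain Python, not Hadoop or Spark code, and the function names are hypothetical. It contrasts a safe task with one that keeps hidden state and uses randomness.

```python
import random

_calls = 0  # hidden state kept between calls: exactly what a task should avoid


def stateless_task(partition):
    """Output depends only on the input partition, so a re-run is safe."""
    return [x * 2 for x in partition]


def stateful_random_task(partition):
    """Depends on hidden state and randomness, so a re-run can disagree."""
    global _calls
    _calls += 1
    return [x * 2 + _calls + random.random() for x in partition]


partition = [3, 1, 2]

# Simulate a lost result: the framework just runs the task again.
print(stateless_task(partition) == stateless_task(partition))  # True
print(stateful_random_task(partition) == stateful_random_task(partition))  # False
```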

Hadoop features

Disk-based
If a machine fails, data is read from a replica
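
A plain-Python sketch of the replica idea, assuming a simplified round-robin block placement. Real HDFS placement is rack-aware, and `place_blocks`, `read_block`, and the node names are hypothetical, not HDFS APIs. Each block lives on several machines, so a read just skips a failed one.

```python
# Hypothetical toy model of HDFS-style block replication (not the HDFS API).
REPLICATION = 3  # HDFS's default replication factor


def place_blocks(blocks, nodes, replication=REPLICATION):
    """Put each block on `replication` distinct nodes (needs replication <= len(nodes))."""
    return {
        block: [nodes[(i + k) % len(nodes)] for k in range(replication)]
        for i, block in enumerate(blocks)
    }


def read_block(block, placement, alive):
    """Read from the first live replica, skipping failed machines."""
    for node in placement[block]:
        if node in alive:
            return node
    raise OSError(f"all replicas of {block} are unavailable")


nodes = ["node1", "node2", "node3", "node4"]
placement = place_blocks(["blk_0", "blk_1"], nodes)
alive = set(nodes) - {"node1"}  # node1 crashes
print(placement["blk_0"])                     # ['node1', 'node2', 'node3']
print(read_block("blk_0", placement, alive))  # node2
```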

Spark features

Uses RDDs (Resilient Distributed Datasets)
Stores data in memory
Lazy execution
Lineage tracking
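
A toy model of lazy execution and lineage tracking in plain Python. `ToyRDD` is a hypothetical class, not Spark's RDD: transformations such as `map` and `filter` only record a new step in the lineage, and nothing runs until an action (`collect`) replays that lineage from the source.

```python
class ToyRDD:
    """Hypothetical toy RDD: transformations only record lineage; actions compute."""

    def __init__(self, source=None, parent=None, op=None, name="source"):
        self.source = source  # input data (only for the root node)
        self.parent = parent  # the RDD this one was derived from
        self.op = op          # function turning the parent's data into ours
        self.name = name

    def map(self, f):
        return ToyRDD(parent=self, op=lambda xs: [f(x) for x in xs], name="map")

    def filter(self, pred):
        return ToyRDD(parent=self, op=lambda xs: [x for x in xs if pred(x)], name="filter")

    def lineage(self):
        """Names of the steps from the source to this RDD."""
        steps, node = [], self
        while node is not None:
            steps.append(node.name)
            node = node.parent
        return steps[::-1]

    def collect(self):
        """Action: replay the lineage from the source to produce the data."""
        if self.parent is None:
            return list(self.source)
        return self.op(self.parent.collect())


evaluated = []  # records when work actually happens (for demonstration only)


def square(x):
    evaluated.append(x)
    return x * x


base = ToyRDD(source=[1, 2, 3, 4])
evens = base.map(square).filter(lambda x: x % 2 == 0)
print(evaluated)        # [] (nothing has run yet: lazy)
print(evens.lineage())  # ['source', 'map', 'filter']
print(evens.collect())  # [4, 16]
print(evaluated)        # [1, 2, 3, 4]
```

Because `collect` rebuilds the data from the source, a lost intermediate result costs only a recomputation, which is what lineage buys. Real Spark additionally lets you keep computed partitions in memory with `cache()` or `persist()`.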

Transformation operations

filter
mapPartitions and other map variants
sample
reduceByKey
distinct
groupByKey
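
A plain-Python sketch of what several of these transformations compute on data split into two partitions. The helper names mirror the Spark methods, but they are hypothetical local functions that ignore distribution and laziness. `reduce_by_key` combines values inside each partition before merging, which is the behavior that distinguishes reduceByKey from groupByKey.

```python
from collections import defaultdict
from operator import add

# Two "partitions" of (key, value) pairs.
partitions = [[("a", 1), ("b", 2), ("a", 3)], [("b", 4), ("c", 5), ("a", 1)]]


def filter_parts(parts, pred):
    """Like filter: keep matching records, partition by partition."""
    return [[r for r in p if pred(r)] for p in parts]


def map_partitions(parts, f):
    """Like mapPartitions: f receives a whole partition as an iterator."""
    return [list(f(iter(p))) for p in parts]


def group_by_key(parts):
    """Like groupByKey: ship every value for a key to one place as a list."""
    groups = defaultdict(list)
    for p in parts:
        for k, v in p:
            groups[k].append(v)
    return dict(groups)


def reduce_by_key(parts, f):
    """Like reduceByKey: combine inside each partition first, then merge partials."""
    merged = {}
    for p in parts:
        local = {}
        for k, v in p:
            local[k] = f(local[k], v) if k in local else v
        for k, v in local.items():
            merged[k] = f(merged[k], v) if k in merged else v
    return merged


def distinct(parts):
    """Like distinct: unique records across all partitions (sorted here for display)."""
    return sorted({r for p in parts for r in p})


print(filter_parts(partitions, lambda kv: kv[1] > 2))  # [[('a', 3)], [('b', 4), ('c', 5)]]
print(map_partitions(partitions, lambda it: [sum(v for _, v in it)]))  # [[6], [10]]
print(group_by_key(partitions))        # {'a': [1, 3, 1], 'b': [2, 4], 'c': [5]}
print(reduce_by_key(partitions, add))  # {'a': 5, 'b': 6, 'c': 5}
print(distinct(partitions))  # [('a', 1), ('a', 3), ('b', 2), ('b', 4), ('c', 5)]
```

The per-partition combine step is why reduceByKey usually moves much less data than groupByKey followed by a reduction.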

Action operations

reduce
aggregate
takeSample
countByKey
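
Actions return a result to the driver, which is what forces the lazy transformations to run. A plain-Python sketch of the semantics of `reduce`, `aggregate`, and `countByKey` over two partitions. The function names are hypothetical stand-ins, not Spark calls. `aggregate` takes a zero value, a function that folds elements within a partition, and a function that merges the per-partition results.

```python
from collections import Counter
from functools import reduce

partitions = [[1, 2, 3], [4, 5]]


def rdd_reduce(parts, f):
    """Like reduce: fold each non-empty partition, then fold the partial results."""
    return reduce(f, [reduce(f, p) for p in parts if p])


def rdd_aggregate(parts, zero, seq_op, comb_op):
    """Like aggregate: seq_op folds within a partition, comb_op merges partitions."""
    partials = [reduce(seq_op, p, zero) for p in parts]
    return reduce(comb_op, partials, zero)


def count_by_key(pairs):
    """Like countByKey: the result is a plain dict returned to the driver."""
    return dict(Counter(k for k, _ in pairs))


print(rdd_reduce(partitions, lambda a, b: a + b))  # 15

# Sum and count in one pass, e.g. to compute a mean.
total, n = rdd_aggregate(
    partitions,
    (0, 0),
    lambda acc, x: (acc[0] + x, acc[1] + 1),
    lambda a, b: (a[0] + b[0], a[1] + b[1]),
)
print(total / n)  # 3.0
print(count_by_key([("a", 1), ("b", 2), ("a", 3)]))  # {'a': 2, 'b': 1}
```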

Wide dependency

Example: reduceByKey (each output partition depends on many input partitions, so the data must be shuffled)
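
A plain-Python sketch contrasting a narrow dependency with the wide one that reduceByKey creates. Spark hash-partitions keys by default; here `key % num_out` on integer keys stands in for that so the output is deterministic, and the function names are hypothetical. Each function also reports which input partitions every output partition read from.

```python
def map_values(parts, f):
    """Narrow dependency: output partition i is built from input partition i only."""
    out = [[(k, f(v)) for k, v in p] for p in parts]
    deps = [{i} for i in range(len(parts))]
    return out, deps


def shuffle_by_key(parts, num_out):
    """Wide dependency: each record goes to partition key % num_out, so an
    output partition may need data from every input partition."""
    out = [[] for _ in range(num_out)]
    deps = [set() for _ in range(num_out)]
    for i, p in enumerate(parts):
        for k, v in p:
            j = k % num_out  # stand-in for hash partitioning on integer keys
            out[j].append((k, v))
            deps[j].add(i)
    return out, deps


parts = [[(0, "a"), (1, "b")], [(2, "c"), (3, "d")]]
print(map_values(parts, str.upper)[1])  # [{0}, {1}]
shuffled, deps = shuffle_by_key(parts, 2)
print(shuffled)  # [[(0, 'a'), (2, 'c')], [(1, 'b'), (3, 'd')]]
print(deps)      # [{0, 1}, {0, 1}]
```

A reduceByKey would then reduce each shuffled partition locally. Because every output partition depends on all inputs, Spark ends a stage at each such shuffle boundary.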

Difference between the Spark RDD and Spark SQL filter functions

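The Spark SQL filter function is generally more efficient

Spark SQL understands the filter's logic (the condition is an expression it can analyze and optimize), while an RDD filter receives an opaque function whose logic Spark cannot inspect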
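
A toy illustration of why an inspectable condition can be optimized while an opaque function cannot. It assumes a hypothetical engine that keeps a per-partition maximum for each column (similar in spirit to the min/max statistics formats like Parquet store). `GreaterThan` and `run_filter` are hypothetical, not Spark's API, and Spark SQL's real optimizations, such as predicate pushdown, go well beyond this.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GreaterThan:
    """A declarative predicate: an engine can read its column and value."""
    column: str
    value: int

    def __call__(self, row):
        return row[self.column] > self.value


# Each partition carries max-value statistics for its "age" column.
partitions = [
    {"max_age": 25, "rows": [{"age": 20}, {"age": 25}]},
    {"max_age": 60, "rows": [{"age": 40}, {"age": 60}]},
]


def run_filter(parts, pred):
    """Return (matching rows, number of partitions actually scanned)."""
    result, scanned = [], 0
    for p in parts:
        # Only an inspectable predicate lets the engine skip a partition unread.
        if isinstance(pred, GreaterThan) and p["max_" + pred.column] <= pred.value:
            continue
        scanned += 1
        result.extend(r for r in p["rows"] if pred(r))
    return result, scanned


print(run_filter(partitions, lambda r: r["age"] > 30))  # ([{'age': 40}, {'age': 60}], 2)
print(run_filter(partitions, GreaterThan("age", 30)))   # ([{'age': 40}, {'age': 60}], 1)
```

Both calls return the same rows; only the declarative version avoids reading the partition that cannot match.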

Key differences between Spark RDD and Spark SQL

Spark SQL can achieve higher performance because it understands the logic of the operations

Spark SQL uses the Catalyst query optimizer

Spark SQL is aware of the data schema, while Spark RDDs are schema-agnostic
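
A toy sketch of two things a known schema enables: rejecting a bad column name before any data is read, and reading only the columns a query needs (column pruning). The columnar `table` and the `select` function are hypothetical, not Spark's API. An RDD of arbitrary objects gives Spark no such information, so whole records have to be processed.

```python
# Hypothetical columnar table with a declared schema (not Spark's API).
schema = {"name": str, "age": int, "city": str}
table = {
    "name": ["Ann", "Bo", "Cy"],
    "age": [31, 25, 47],
    "city": ["Oslo", "Rome", "Lima"],
}


def select(table, schema, wanted):
    """Schema-aware projection: returns (rows, columns actually read)."""
    unknown = [c for c in wanted if c not in schema]
    if unknown:
        # Caught during analysis, before any data is read.
        raise KeyError(f"unknown columns: {unknown}")
    read = {c: table[c] for c in wanted}  # column pruning: other columns are never read
    rows = [dict(zip(wanted, values)) for values in zip(*read.values())]
    return rows, list(read)


rows, read = select(table, schema, ["name", "age"])
print(rows[0])  # {'name': 'Ann', 'age': 31}
print(read)     # ['name', 'age']
```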

Relational operations

filter
projection
cross product
aggregation
union
intersection
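
A plain-Python sketch of these six operations on rows stored as dicts; all function names are hypothetical. In Spark's DataFrame API the counterparts include `filter`/`where`, `select`, `crossJoin`, `groupBy(...).agg(...)`, `union` (which keeps duplicates, like SQL UNION ALL), and `intersect`. The example also shows that a join is a filter applied to a cross product.

```python
from itertools import product

people = [{"name": "Ann", "dept": 1}, {"name": "Bo", "dept": 2}, {"name": "Cy", "dept": 1}]
depts = [{"dept_id": 1, "title": "Eng"}, {"dept_id": 2, "title": "Ops"}]


def select_where(rows, pred):  # filter (selection): keep rows matching pred
    return [r for r in rows if pred(r)]


def project(rows, cols):  # projection: keep only some columns
    return [{c: r[c] for c in cols} for r in rows]


def cross(left, right):  # cross product: every left row paired with every right row
    return [{**left_row, **right_row} for left_row, right_row in product(left, right)]


def count_by(rows, col):  # aggregation: COUNT(*) ... GROUP BY col
    counts = {}
    for r in rows:
        counts[r[col]] = counts.get(r[col], 0) + 1
    return counts


def union_all(a, b):  # union keeping duplicates (SQL UNION ALL)
    return a + b


def intersect(a, b):  # distinct rows present in both inputs (SQL INTERSECT)
    out = []
    for r in a:
        if r in b and r not in out:
            out.append(r)
    return out


# A join is a filter over a cross product, followed here by a projection.
joined = project(
    select_where(cross(people, depts), lambda r: r["dept"] == r["dept_id"]),
    ["name", "title"],
)
print(joined)
# [{'name': 'Ann', 'title': 'Eng'}, {'name': 'Bo', 'title': 'Ops'}, {'name': 'Cy', 'title': 'Eng'}]
print(count_by(people, "dept"))  # {1: 2, 2: 1}
names = project(people, ["name"])
print(intersect(names, [{"name": "Bo"}, {"name": "Dee"}]))  # [{'name': 'Bo'}]
print(len(union_all(names, names)))  # 6
```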

(10 cards)