Structured Streaming Flashcards

(7 cards)

1
Q

Spark Streaming (DStreams)

A

▪ Extends the core RDD API to support streaming
▪ Legacy API; not recommended for new applications

2
Q

Structured Streaming

A

▪ Extends the DataFrame API (Spark SQL)
▪ More efficient; the recommended way to do streaming in Spark
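A minimal PySpark sketch of what "extends the DataFrame API" looks like in practice. It assumes pyspark is installed and that the caller passes in an existing SparkSession; the function name `start_rate_count` and the `parity` column are made up for illustration. The `rate` source is Spark's built-in test source that emits `timestamp` and `value` columns.

```python
def start_rate_count(spark):
    """Start a streaming count over Spark's built-in "rate" test source."""
    # Imported lazily so this module loads even where pyspark is absent.
    from pyspark.sql import functions as F

    # A streaming DataFrame with columns (timestamp, value).
    stream = spark.readStream.format("rate").option("rowsPerSecond", 5).load()

    # The same DataFrame operations a batch job would use:
    # bucket values into even/odd and count each bucket.
    counts = stream.groupBy((F.col("value") % 2).alias("parity")).count()

    # Print the running counts to the console after every micro-batch.
    return (
        counts.writeStream
        .outputMode("complete")
        .format("console")
        .start()
    )
```

Possible usage, assuming a local session: create `spark` with `SparkSession.builder.getOrCreate()`, call `query = start_rate_count(spark)`, let it run with `query.awaitTermination(30)`, then call `query.stop()`.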

3
Q

Output Mode: Complete

A

▪ Produce the entire result table after each batch
▪ Mostly useful for debugging and for small outputs

4
Q

Output Mode: Append

A

▪ Produce only new output records
▪ Records already emitted cannot be changed

5
Q

Output Mode: Update

A

▪ Produce both new records and modified
records
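To compare the three output modes (complete, append, update) side by side, here is a plain-Python simulation. It is not Spark code: the function `emit` and the dictionaries standing in for the result table before and after a batch are hypothetical, and the sketch ignores deleted rows.

```python
def emit(previous, current, mode):
    """Return the rows one trigger would write, given the result table
    before (previous) and after (current) the batch was processed."""
    added = {k: v for k, v in current.items() if k not in previous}
    changed = {
        k: v for k, v in current.items()
        if k in previous and previous[k] != v
    }
    if mode == "complete":
        # Entire result table, every time.
        return dict(current)
    if mode == "update":
        # New rows plus rows whose value was modified.
        return {**added, **changed}
    if mode == "append":
        # Only new rows; a row that was already emitted must never change.
        if changed:
            raise ValueError("append mode cannot revise rows already emitted")
        return added
    raise ValueError(f"unknown output mode: {mode}")
```

With `previous = {"a": 1, "b": 2}` and `current = {"a": 1, "b": 3, "c": 1}`, complete mode returns all three rows, update mode returns only `b` and `c`, and append mode raises because `b` changed. In real Spark the mode is chosen with `writeStream.outputMode(...)`.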

6
Q

Timestamps

A
▪ In some cases, data comes with its own timestamps (e.g., weather data, crimes/incidents)
▪ Structured Streaming allows developers to use these timestamps (event time)
▪ The assumption is that the data is already (partially) sorted by this timestamp
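The idea of using the data's own timestamps can be sketched in plain Python: each record is assigned to a fixed-width window by the time it carries, not by the order in which it arrives. This is an illustration only, not Spark code; `tumbling_window_counts` and the sample weather events are hypothetical.

```python
from collections import defaultdict
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def tumbling_window_counts(events, width):
    """Count events per fixed-width window, keyed by each event's own timestamp."""
    counts = defaultdict(int)
    for event_time, _payload in events:
        # Round the event time down to the start of its window.
        index = (event_time - EPOCH) // width
        window_start = EPOCH + index * width
        counts[window_start] += 1
    return dict(counts)

# The 12:07 reading arrives after the 12:12 one, yet it is still
# counted in the 12:00 window because its own timestamp says so.
events = [
    (datetime(2024, 1, 1, 12, 1), "rain"),
    (datetime(2024, 1, 1, 12, 12), "wind"),
    (datetime(2024, 1, 1, 12, 7), "rain"),
]
counts = tumbling_window_counts(events, timedelta(minutes=10))
```

In PySpark the corresponding grouping is usually written with `pyspark.sql.functions.window` over the event-time column.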
7
Q

Watermarking

A
▪ The input data might not be totally ordered
▪ For example, when merging multiple ordered streams
▪ Why not enforce a total ordering? Spark would have to keep waiting for records that might still arrive late
▪ Watermarking is a Structured Streaming feature that tells Spark how late out-of-order data may arrive and still be accepted
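A simplified plain-Python model of a watermark, assuming timestamps are plain numbers (say, minutes) and the threshold is checked per record. Real Spark updates the watermark once per micro-batch and applies it to stateful operations such as windowed aggregations, so treat this only as a sketch of the idea; `split_by_watermark` is a hypothetical name.

```python
def split_by_watermark(event_times, max_delay):
    """Split event times, given in arrival order, into (accepted, dropped)."""
    max_seen = None
    accepted, dropped = [], []
    for t in event_times:
        # The watermark trails the newest event time seen so far by max_delay.
        watermark = None if max_seen is None else max_seen - max_delay
        if watermark is not None and t < watermark:
            dropped.append(t)  # later than the tolerated delay
        else:
            accepted.append(t)
            max_seen = t if max_seen is None else max(max_seen, t)
    return accepted, dropped

# With a tolerated delay of 10, the record stamped 98 is out of order
# but still accepted; the record stamped 95 arrives after 120 was seen
# (watermark 110) and is dropped.
accepted, dropped = split_by_watermark([100, 105, 98, 120, 112, 95], max_delay=10)
```

In PySpark the threshold is declared on the streaming DataFrame with `withWatermark`, for example `df.withWatermark("event_time", "10 minutes")`, where the column name `event_time` is an assumption.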