Spark Streaming (DStreams)
▪ Extends the core RDD API to support streaming
▪ Legacy API; not recommended for new applications
Structured Streaming
▪ Extends the DataFrame API (Spark SQL)
▪ Generally more efficient; the recommended API
Streaming in Spark
Output Mode: Complete
▪ Outputs the entire result after each batch
▪ Mostly useful for debugging and for small outputs
Output Mode: Append
▪ Outputs only new records
▪ Records that were already output cannot change
Output Mode: Update
▪ Outputs both new records and modified records
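The difference between the three output modes can be sketched in plain Python, without Spark. This is only a toy simulation under stated assumptions: the batches are small in-memory lists, the running word count stands in for a streaming aggregation, and the helper names run_complete_and_update and run_append are made up for this illustration and are not part of any Spark API.

```python
from collections import Counter


def run_complete_and_update(batches):
    """Simulate a running word count over micro-batches (hypothetical helper).

    Returns two lists with one entry per batch: what Complete mode and
    what Update mode would each emit after that batch.
    """
    totals = Counter()  # aggregation state kept across batches
    complete_out, update_out = [], []
    for batch in batches:
        changed = set(batch)  # keys that are new or modified in this batch
        totals.update(batch)  # add this batch's counts to the running totals
        # Complete: the whole result, every time
        complete_out.append(dict(totals))
        # Update: only the rows that are new or whose count changed
        update_out.append({word: totals[word] for word in changed})
    return complete_out, update_out


def run_append(batches):
    """Simulate Append mode on a stateless query (hypothetical helper).

    Each input record yields one output record that is never revised,
    so every batch emits only its own new rows.
    """
    return [[word.upper() for word in batch] for batch in batches]


batches = [["a", "b", "a"], ["b", "c"]]
complete, update = run_complete_and_update(batches)
appended = run_append(batches)

print(complete)  # full table after each batch
print(update)    # only new or changed rows after each batch
print(appended)  # only the new rows of each batch
```

After the second batch, Complete re-emits the count for "a" even though it did not change, while Update emits only "b" (modified) and "c" (new). Append never revisits earlier rows, which is why the sketch uses a stateless query for it.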
Timestamps
Watermarking