Spark Streaming: Understanding StreamingContext

Spark Streaming wasn't the first streaming architecture. Over time, multiple technologies have been developed to address various real-time processing needs. One of the first popular stream processors was Twitter Storm, which was used in many businesses.

Spark includes a streaming library, Spark Streaming, which has grown into one of the most widely used stream processing technologies today. This is mainly because Spark Streaming holds some significant advantages over the other technologies, the most important being that the Spark Streaming APIs are integrated with the Spark core API. Spark Streaming is also integrated with Spark ML, Spark SQL, and GraphX.

Because of all of these integrations, Spark is a powerful and versatile streaming technology.

This tutorial has been taken from Big Data Analytics with Hadoop 3 written by Sridhar Alla and published by Packt.

Note that you can find more information here on Spark Streaming, Flink, Heron (Twitter Storm's successor), and Samza and their various features; for example, their ability to handle events while minimizing latency. Spark Streaming, however, consumes data and processes it in microbatches. The minimum size of a microbatch is 500 milliseconds.
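To make the microbatch idea concrete, here is a minimal sketch in plain Python, not Spark code. It assumes each event is a hypothetical (timestamp in milliseconds, payload) pair and groups events into fixed-width intervals. The function name micro_batches and the sample events are invented for illustration, and the 500 millisecond interval is simply the minimum mentioned above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Assumed batch interval; 500 ms is the minimum mentioned in the text.
BATCH_INTERVAL_MS = 500


def micro_batches(
    events: Iterable[Tuple[int, str]],
    interval_ms: int = BATCH_INTERVAL_MS,
) -> List[Tuple[int, List[str]]]:
    """Group (timestamp_ms, payload) events into fixed-width time intervals.

    Returns a list of (batch_start_ms, payloads) sorted by batch start.
    """
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    buckets: Dict[int, List[str]] = defaultdict(list)
    for timestamp_ms, payload in events:
        # Integer division maps each event to the start of its interval.
        batch_start = (timestamp_ms // interval_ms) * interval_ms
        buckets[batch_start].append(payload)
    return sorted(buckets.items())


# Hypothetical sample events: (arrival time in ms, payload).
events = [(0, "a"), (120, "b"), (499, "c"), (500, "d"), (1310, "e")]
batches = micro_batches(events)
for start, payloads in batches:
    print(f"batch [{start}, {start + BATCH_INTERVAL_MS}) ms -> {payloads}")
```

With these sample events, the first three fall into the interval starting at 0 ms, "d" falls into the one starting at 500 ms, and "e" into the one starting at 1000 ms. This sketch only illustrates grouping by time interval. It skips intervals that contain no events and says nothing about how Spark Streaming schedules or executes its batches.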

Spark Streaming works by creating batches of events at certain time intervals, as ...


Read More on Datafloq
