
Spark Streaming

The basic abstraction of Spark Streaming is the discretized stream (DStream): a continuous stream of data represented as a sequence of RDDs arriving at discrete time intervals.

Sources that can be used in Spark Streaming include Kafka, Flume, Twitter, and TCP sockets, and DStreams can also be derived from existing DStreams. Output can be saved to sinks such as file systems (for example HDFS) and databases, or pushed to other external storage systems.
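
A minimal sketch of these ideas in Scala, spark-shell style: sc is the already-available SparkContext, and the localhost:9999 text source is a placeholder.

import org.apache.spark.streaming.{Seconds, StreamingContext}

// The batch interval (10 seconds here) is fixed when the StreamingContext is created.
val ssc = new StreamingContext(sc, Seconds(10))

// A DStream created from a source: a plain-text TCP socket.
val lines = ssc.socketTextStream("localhost", 9999)

// New DStreams derived from an existing DStream via transformations.
val words  = lines.flatMap(_.split(" "))
val counts = words.map(w => (w, 1)).reduceByKey(_ + _)

counts.print()          // output operation: print a few elements of every micro-batch

ssc.start()             // nothing is processed until the context is started
ssc.awaitTermination()

The same pattern applies whichever source is used; only the line that creates the input DStream changes.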

The basic programming abstraction of Spark Streaming is _.

DStreams--rgt

Which among the following can act as a data source for Spark Streaming?
All the options--rgt

DStreams are internally a collection of _.


RDD--rgt

HDFS cannot be a sink for Spark Streaming.


False--rgt

We cannot configure Twitter as a data source system for Spark Streaming.


False--rgt

Spark Streaming can be used for real-time processing of data.


True--rgt

DStreams cannot be created directly from sources such as Kafka and Flume.
False--rgt
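
For instance, with the spark-streaming-kafka-0-10 integration on the classpath, a DStream can be created directly from Kafka. This is a sketch continuing the StreamingContext ssc above; the broker address, group id, and topic name are placeholders.

import org.apache.kafka.common.serialization.StringDeserializer
import org.apache.spark.streaming.kafka010.KafkaUtils
import org.apache.spark.streaming.kafka010.LocationStrategies.PreferConsistent
import org.apache.spark.streaming.kafka010.ConsumerStrategies.Subscribe

val kafkaParams = Map[String, Object](
  "bootstrap.servers" -> "localhost:9092",          // placeholder broker
  "key.deserializer"  -> classOf[StringDeserializer],
  "value.deserializer" -> classOf[StringDeserializer],
  "group.id"          -> "streaming-demo",           // placeholder group id
  "auto.offset.reset" -> "latest",
  "enable.auto.commit" -> (false: java.lang.Boolean)
)

// Each record in the resulting DStream is a Kafka ConsumerRecord[String, String].
val kafkaStream = KafkaUtils.createDirectStream[String, String](
  ssc,
  PreferConsistent,
  Subscribe[String, String](Array("events"), kafkaParams)   // placeholder topic
)

val keyValues = kafkaStream.map(record => (record.key, record.value))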

Internally, a DStream is represented as a sequence of _ arriving at discrete time intervals.

RDD--rgt

Spark Streaming converts the input data streams into ______.


micro-batches--rgt

DStreams can be created from an existing DStream.


True--rgt

How can a DStream be created?


Both ways--rgt

Block Management units in the worker node report to ____.


Block Management Master in the Driver--rgt

Choose the correct statement.


All the options--rgt

Block Management Master keeps track of _


Block id--rgt

ssc.start() is the entry point for a Streaming application.


True--rgt

The receiver divides the stream into blocks and keeps them in memory.
True--rgt

Starting point of a streaming application is _.


ssc.start()--rgt

When is a batch interval defined?


creation of Streaming context--rgt

Sliding Interval is the interval at which sliding of the window area occurs.
True--rgt

Which among the following needs to be a multiple of batch interval?


All the options--rgt

Which among the following is true about Window Operations?


All the options--rgt

There can be multiple DStreams in a single window.


True--rgt

What is a Window Duration/Size?


Interval at which a certain fold operation is done on top of DStreams.--rgt
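
A sketch of a windowed count, continuing the word-count example above; the 30-second window and 10-second slide are arbitrary choices, but both must be multiples of the batch interval.

import org.apache.spark.streaming.Seconds

// pairs is built from the lines DStream of the earlier sketch.
val pairs = lines.flatMap(_.split(" ")).map(word => (word, 1))

// reduceByKeyAndWindow(reduceFunc, windowDuration, slideDuration):
// window = 30s, slide = 10s, both multiples of the 10s batch interval.
val windowedCounts =
  pairs.reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))

windowedCounts.print()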


Which among the following is true about Spark Streaming?

All the options

reduceByKey is a _.

Transformation

With Spark Streaming, the incoming data is split into micro batches.

True--correct

What is the strategy taken in order to prevent loss of the incoming stream?

Data is replicated in different nodes
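
That replication is governed by the storage level of the receiver. A sketch making the default explicit, continuing the example above (host and port are placeholders):

import org.apache.spark.storage.StorageLevel

// MEMORY_AND_DISK_SER_2 (the default for socketTextStream) keeps each received
// block serialized, spills to disk if memory is short, and replicates it to 2 nodes.
val replicatedLines =
  ssc.socketTextStream("localhost", 9999, StorageLevel.MEMORY_AND_DISK_SER_2)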

What does saveAsTextFiles(prefix, [suffix]) do?

Save this DStream's contents as text files--correct
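
For example, continuing the word-count sketch, each micro-batch of counts can be written out with a prefix and suffix (the HDFS path below is a placeholder):

// Writes each micro-batch of `counts` to files named "hdfs:///tmp/wordcounts-<batch time in ms>.txt".
counts.saveAsTextFiles("hdfs:///tmp/wordcounts", "txt")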

MLlib and Spark SQL can work on top of the data ingested via Spark Streaming.

True--correct
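
One common pattern, sketched below for the Spark SQL side (the column and view names are made up), is to convert each batch's RDD to a DataFrame inside foreachRDD and query it; an MLlib model can likewise be applied to each batch's RDD.

import org.apache.spark.sql.SparkSession

counts.foreachRDD { rdd =>
  // Reuse (or lazily create) a SparkSession for this batch's RDD.
  val spark = SparkSession.builder.config(rdd.sparkContext.getConf).getOrCreate()
  import spark.implicits._

  // Turn the (word, count) pairs into a DataFrame and query them with SQL.
  val df = rdd.toDF("word", "total")
  df.createOrReplaceTempView("word_counts")
  spark.sql("SELECT word, total FROM word_counts ORDER BY total DESC LIMIT 10").show()
}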

What is a batch Interval?

Interval at which a certain operation is done on top of DStreams.

Who is responsible for keeping track of the Block Ids?


Block Management Master in the Driver--correct

Which among the following are Basic Sources of Spark Streaming?

Kafka--correct

Which among the following can act as a data sink for Spark Streaming?

All the options

Which of the following transformations can be applied to a DStream?

All the options--correct
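
A few of those transformations, continuing the sketch above (the derived DStream names are arbitrary):

val upper    = lines.map(_.toUpperCase)                   // map
val nonEmpty = lines.filter(_.nonEmpty)                   // filter
val combined = upper.union(nonEmpty)                      // union of two DStreams
val sorted   = counts.transform(rdd => rdd.sortByKey())   // arbitrary RDD-to-RDD transform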

Benefits of Discretized Stream Processing are ___.

All the options

DStreams are _.

Collection of RDD

What is a Sliding Interval?

Interval at which sliding of the window area occurs.

DStreams are internally _.

Collection of RDD

DStream represents a continuous stream of data.

True--correct

Receiver receives data from the streaming sources at the start of _.

Streaming Context

Batch interval is configured at _.

it is 10 Seconds by default
