Kafka Stream vs Apache Storm? - apache-kafka

I know that Kafka can be used as a message queue and can also be connected to Storm for real-time data processing. I want to know: is there any advantage of using the Kafka Streams API over Storm?

Related

Kafka stream vs Kafka Consumer and Java Stream APIs

I am new to Kafka Streams. I wanted to know what benefits one gets by using the Kafka Streams API instead of using the standard Kafka Producer/Consumer APIs and doing the stream processing in the consumer with the Java Stream API.
The Java Stream API has nothing to do with the Kafka APIs. You can call consumerRecords.forEach on the Consumer poll iterator, or call the producer's send method from a Java stream, and that's about it.
The Kafka Streams API lets you more easily map/filter/branch/join topic data, and gives you access to persistent StateStores. It also has foreach and peek methods, which I assume is what you're referring to. With the Consumer API you have more direct control over when to commit your offsets than with any Kafka Streams method.
On the producer side, you cannot use the Kafka Streams API to produce original data into a topic; it only processes data that is already in Kafka.
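To make the distinction concrete, here is a minimal, hypothetical sketch using only java.util.stream on values that could have come out of a Consumer poll. The record values are made up for illustration; none of this involves the Kafka Streams library, which is exactly the point.

```java
import java.util.List;
import java.util.stream.Collectors;

public class PlainJavaStreams {
    public static void main(String[] args) {
        // Pretend these values came out of consumer.poll(); the Java Stream
        // API just operates on the in-memory collection and knows nothing
        // about offsets, partitions, or state stores.
        List<String> recordValues = List.of("error:disk", "info:boot", "error:net");

        List<String> errors = recordValues.stream()
                .filter(v -> v.startsWith("error:"))      // analogous to a Kafka Streams filter()
                .map(v -> v.substring("error:".length())) // analogous to a map()
                .collect(Collectors.toList());

        System.out.println(errors); // [disk, net]
    }
}
```

The same per-record transformations are expressible either way; what Kafka Streams adds on top is the Kafka-aware machinery (repartitioning, joins, state stores, fault tolerance) that plain Java streams cannot provide.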

Process messages pushed through Kafka

I haven't used Kafka before and wanted to know: if messages are published through Kafka, what are the possible ways to capture that info?
Are "consumers" the only way to receive that info from Kafka, or can REST APIs also be used here?
While reading up, I found that Kafka also needs ZooKeeper running.
I don't need to publish info, just process data received from the Kafka publisher.
Any pointers will help.
Kafka is a distributed streaming platform that allows you to process streams of records in near real-time.
Producers publish records/messages to Topics in the cluster.
Consumers subscribe to Topics and process those messages as they are available.
The Kafka docs are an excellent place to get up to speed on the core concepts: https://kafka.apache.org/intro
Are "consumers" the only way to receive that info from Kafka, or can REST APIs also be used here?
Kafka has its own TCP-based protocol; there is no native HTTP client (assuming that's what you actually mean by REST).
Consumers are the only way to get and subsequently process data. However, plenty of external tooling exists so that you barely have to write any code to work on that data if you don't want to.
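For reference, a minimal consumer loop looks like the sketch below. The broker address, group id, and topic name are placeholders, and it assumes the kafka-clients library plus a running broker; adapt it to your setup.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class MinimalConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // placeholder broker address
        props.put("group.id", "my-processing-app");       // placeholder group id
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(List.of("events")); // placeholder topic name
            while (true) {
                // Fetch whatever records are available, then process each one
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}
```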

Consuming Kafka messages by two separate applications (storm and spark streaming)

We have developed an ingestion application using Storm that consumes Kafka messages (some time-series sensor data) and saves those messages into Cassandra. We use a NiFi workflow to do this.
I am now going to develop a separate Spark Streaming application that needs to consume those Kafka messages as a source. I wonder whether there would be a problem with two applications interacting with one Kafka channel. Should I duplicate the Kafka messages in NiFi to another channel so my Spark Streaming application can use them? That would be an overhead, though.
From Kafka documentation:
If all the consumer instances have different consumer groups, then each record will be broadcast to all the consumer processes.
Which in your case means that your second application just has to use another consumer group, so that the two applications will receive the same messages.
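Concretely, the only change needed is the group.id each application is configured with (the names below are illustrative):

```properties
# Storm/NiFi ingestion application
group.id=cassandra-ingest

# New Spark Streaming application: a different group id means it
# independently receives every message from the same topic
group.id=spark-analytics
```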

Kafka Stream API vs Consumer API

I need to read from a specific Kafka topic, do a VERY short processing on the message and pass it on to a different Kafka cluster.
Currently, I'm using a consumer that is also a producer on the other Kafka cluster.
However, the Streams API supposedly offers a more lightweight, high-throughput option.
So the questions are:
Assuming my processing code doesn't require much horsepower, is the Streams API better?
Does the Streams API support writing to a different Kafka cluster?
What are the Streams API's cons compared to the Consumer API?
Unfortunately, Kafka Streams doesn't currently support writing to a different Kafka cluster.
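Since Kafka Streams can only read and write within a single cluster, the usual pattern is exactly what you are doing: one process that consumes from the source cluster and produces to the target cluster. A hedged sketch of that bridge follows; broker addresses, topic names, and the transformation are placeholders, and it assumes the kafka-clients library:

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ClusterBridge {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "source-cluster:9092"); // placeholder
        consumerProps.put("group.id", "bridge");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "target-cluster:9092"); // placeholder
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("source-topic")); // placeholder
            while (true) {
                for (ConsumerRecord<String, String> record : consumer.poll(Duration.ofMillis(500))) {
                    // the "very short processing" step, illustrated arbitrarily
                    String transformed = record.value().toUpperCase();
                    producer.send(new ProducerRecord<>("target-topic", record.key(), transformed));
                }
            }
        }
    }
}
```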

Flafka (HTTP -> Flume -> Kafka -> Spark Streaming)

I have a use case for real-time streaming: we will use Kafka (0.9) as the message buffer and Spark Streaming (1.6) for stream processing (HDP 2.4). We will receive ~80-90K events/sec over HTTP. Can you please suggest a recommended architecture for ingesting the data into Kafka topics, which will then be consumed by Spark Streaming?
We are considering flafka architecture.
Is Flume listening on HTTP and sending to Kafka (Flafka) a good option for real-time streaming?
Please share other possible approaches if any.
One approach could be Kafka Connect. Look for a source connector that fits your needs, or develop a custom one.
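A Kafka Connect source would be configured roughly like the standalone-worker fragment below. The connector class and every property name here are illustrative, not a real published connector; substitute whichever HTTP source connector you adopt or write.

```properties
# Hypothetical connect-standalone connector config; all values are placeholders
name=http-events-source
connector.class=com.example.HttpSourceConnector
tasks.max=4
topic=events
```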