What are the advantages of using Apache Storm's KafkaBolt (in Storm 1.2.2) instead of using the Kafka producer APIs directly from a bolt in the topology to publish to downstream Kafka topics?
On the Spring Cloud website (https://spring.io/projects/spring-cloud-stream), the available binder options are listed, and among them are the Apache Kafka and the Kafka Streams binders.
What's the difference between them?
For what purpose should we choose between these two?
The Apache Kafka binder is used for basic Kafka client usage, i.e. the consumer/producer APIs.
The Kafka Streams binder is built upon the base Apache Kafka binder and adds the ability to use the Kafka Streams API.
The Kafka Streams API is a lightweight client library that gives you the functionality to process data from one or more Kafka topics into other Kafka topics; it allows you to transform, enrich, filter, join, aggregate, and more...
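For illustration, a minimal Kafka Streams sketch along those lines; the topic names, application id, and the filter/uppercase logic are made-up examples, not anything from the question:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class FilterTransformExample {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "filter-transform-example"); // hypothetical app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");        // assumed broker address
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, String> input = builder.stream("input-topic");  // hypothetical source topic
        input.filter((key, value) -> value != null && !value.isEmpty()) // drop empty records
             .mapValues(value -> value.toUpperCase())                   // transform each value
             .to("output-topic");                                       // hypothetical sink topic

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```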
The Apache Kafka Binder implementation maps each destination to an Apache Kafka topic. The consumer group maps directly to the same Apache Kafka concept. Partitioning also maps directly to Apache Kafka partitions.
The binder currently uses the Apache Kafka kafka-clients version 2.3.1. This client can communicate with older brokers (see the Kafka documentation), but certain features may not be available. For example, with versions earlier than 0.11.x.x, native headers are not supported. Also, 0.11.x.x does not support the autoAddPartitions property.
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.3/reference/html/spring-cloud-stream-binder-kafka.html#_apache_kafka_binder
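As a rough sketch of the plain Kafka binder in the functional model, a consumer bean wired to a topic purely through configuration; the function name, topic, and group below are assumptions:

```java
import java.util.function.Consumer;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class PlainBinderExample {
    public static void main(String[] args) {
        SpringApplication.run(PlainBinderExample.class, args);
    }

    // Bound by convention to the destination configured as, e.g.:
    // spring.cloud.stream.bindings.logOrder-in-0.destination=orders   (Kafka topic, placeholder)
    // spring.cloud.stream.bindings.logOrder-in-0.group=order-loggers  (consumer group, placeholder)
    @Bean
    public Consumer<String> logOrder() {
        return payload -> System.out.println("received: " + payload);
    }
}
```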
Spring Cloud Stream includes a binder implementation designed explicitly for Apache Kafka Streams binding. With this native integration, a Spring Cloud Stream "processor" application can directly use the Apache Kafka Streams APIs in the core business logic.
The Kafka Streams binder implementation builds on the foundations provided by the Spring for Apache Kafka project.
Kafka Streams binder provides binding capabilities for the three major types in Kafka Streams - KStream, KTable and GlobalKTable.
Kafka Streams applications typically follow a model in which records are read from an inbound topic, business logic is applied, and the transformed records are then written to an outbound topic. Alternatively, a Processor application with no outbound destination can be defined as well.
https://docs.spring.io/spring-cloud-stream-binder-kafka/docs/3.1.3/reference/html/spring-cloud-stream-binder-kafka.html#_kafka_streams_binder
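A minimal sketch of such a processor using the Kafka Streams binder; the function name and the uppercase transform are assumptions, and the inbound/outbound topics are bound via configuration:

```java
import java.util.function.Function;
import org.apache.kafka.streams.kstream.KStream;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class StreamsBinderExample {
    public static void main(String[] args) {
        SpringApplication.run(StreamsBinderExample.class, args);
    }

    // Inbound/outbound topics are configured as, e.g.:
    // spring.cloud.stream.bindings.uppercase-in-0.destination=words    (inbound topic, placeholder)
    // spring.cloud.stream.bindings.uppercase-out-0.destination=shouts  (outbound topic, placeholder)
    @Bean
    public Function<KStream<String, String>, KStream<String, String>> uppercase() {
        // The business logic is expressed directly with the Kafka Streams API
        return input -> input.mapValues(String::toUpperCase);
    }
}
```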
I wanted to use Apache Spark Structured Streaming with Kafka. Spark Structured Streaming supports Kafka 0.10 and above, but my Kafka cluster uses version 0.8.2.1. I want to replicate some of the topics from the current Kafka 0.8.2.1 cluster to a new Kafka cluster based on 2.2.0.
To do this I tried using the kafka-console-consumer from the Kafka 2.2.0 cluster to listen to messages from the 0.8.2.1 cluster, and piped its output into the kafka-console-producer on the 2.2.0 cluster. But that didn't work: the kafka-console-consumer was not able to receive any messages from the 0.8.2.1 cluster.
As of now I have solved this problem by reading the data from the Kafka 0.8.2.1 cluster using the Java client APIs and writing it to the newer Kafka cluster (2.2.0) using the client APIs.
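For reference, a rough sketch of that kind of bridge, assuming the old 0.8 high-level consumer API on the read side (a 2.x KafkaConsumer cannot talk to 0.8 brokers) and a 2.x producer on the write side; the ZooKeeper/broker addresses, topic, and group id are placeholders, and both the 0.8 core jar and a 2.x kafka-clients jar would have to coexist on the classpath:

```java
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.Properties;
import kafka.consumer.ConsumerConfig;
import kafka.consumer.KafkaStream;
import kafka.javaapi.consumer.ConsumerConnector;
import kafka.message.MessageAndMetadata;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ClusterBridge {
    public static void main(String[] args) {
        // Old 0.8 high-level consumer: bootstraps via ZooKeeper, not brokers
        Properties consumerProps = new Properties();
        consumerProps.put("zookeeper.connect", "old-zk:2181"); // placeholder
        consumerProps.put("group.id", "cluster-bridge");       // placeholder
        consumerProps.put("auto.offset.reset", "smallest");
        ConsumerConnector consumer =
                kafka.consumer.Consumer.createJavaConsumerConnector(new ConsumerConfig(consumerProps));
        Map<String, List<KafkaStream<byte[], byte[]>>> streams =
                consumer.createMessageStreams(Collections.singletonMap("my-topic", 1));
        KafkaStream<byte[], byte[]> stream = streams.get("my-topic").get(0);

        // New producer from a 2.x kafka-clients jar, pointed at the 2.2.0 cluster
        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "new-broker:9092"); // placeholder
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
        KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(producerProps);

        // Copy every record from the old cluster to the same topic on the new one
        for (MessageAndMetadata<byte[], byte[]> record : stream) {
            producer.send(new ProducerRecord<>("my-topic", record.key(), record.message()));
        }
    }
}
```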
Can anyone suggest some better ways to mirror two Kafka clusters running different versions of Kafka?
I am trying to integrate MongoDB and Storm-Kafka. The Kafka producer produces data from MongoDB, but the consumer side fails to fetch it.
Kafka version: 0.10.*
Storm version: 1.2.1
Do I need to add any functionality on the consumer side?
What is the difference between the KafkaSpout and KafkaBolt objects? Usually KafkaSpout is used for reading data from Kafka producers, but why do we use KafkaBolt?
Bolts in Storm write data. Spouts are Kafka consumers. You read from the broker directly, not from producers.
For example, you can use a spout to read data from anywhere, transform it within the topology, and then set up a KafkaBolt to produce the results into Kafka, as in the sketch below.
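A minimal sketch wiring the two together in a Storm 1.2.x topology with storm-kafka-client; the broker address and topic names are placeholders:

```java
import java.util.Properties;
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class SpoutToBoltTopology {
    public static void main(String[] args) throws Exception {
        // KafkaSpout: consumes from the source topic
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "source-topic").build();

        // KafkaBolt: produces tuples into a downstream topic
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092");
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        KafkaBolt<String, String> bolt = new KafkaBolt<String, String>()
                .withProducerProperties(producerProps)
                .withTopicSelector(new DefaultTopicSelector("sink-topic"))
                // the spout emits "key"/"value" fields by default, so map those to the outgoing record
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<>("key", "value"));

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig));
        builder.setBolt("kafka-bolt", bolt).shuffleGrouping("kafka-spout");

        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("spout-to-bolt", new Config(), builder.createTopology());
        Thread.sleep(60_000);
        cluster.shutdown();
    }
}
```

Here the spout's output is mapped straight into the outgoing Kafka record; in a real topology you would usually put one or more processing bolts between the spout and the KafkaBolt.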
I am using the Kafka client library that comes with Kafka 0.11.0.1. I noticed that KafkaConsumer no longer needs ZooKeeper to be configured. Does that mean the ZooKeeper server will automatically be located via the Kafka bootstrap servers?
Since Kafka 0.9, the KafkaConsumer implementation stores offset commits and consumer group information in the Kafka brokers themselves. This eliminates the client-side ZooKeeper dependency and increases the scalability of consumers.
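A minimal consumer sketch for a 0.11-era client showing that only bootstrap.servers is configured, with no ZooKeeper address anywhere; the broker address, group id, and topic are placeholders:

```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoZkConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // brokers only, no zookeeper.connect
        props.put("group.id", "example-group");           // group state is kept in the brokers
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("example-topic"));
            while (true) {
                // Offsets are committed back to Kafka (the __consumer_offsets topic), not ZooKeeper
                ConsumerRecords<String, String> records = consumer.poll(100);
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("%s -> %s%n", record.key(), record.value());
                }
            }
        }
    }
}
```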