both "Kafka Spout" and "Kafka Consumer" do retrieve data from the Kafka Brokers, the spout so far i know is for communicating with Storm, and the Consumer is with whatever else.
-but still, what is the difference technically?
-or what would be the difference between If i pulled out the data using a Consumer then receive it using a "Storm Spout" and between if i just used a "Kafka Spout" then add it to my Storm Topology Builder's setSpout(); function
-and when to use Consumer, or a Kafka Spout
A/the "Kafka Spout" is a Storm-specific adapter to read data from Kafka into a Storm topology. Behind the scenes, the Kafka spout actually uses Kafka's built-in "Kafka consumer" client.
Technically, the difference is that the Kafka spout is a kind of a Storm-aware "wrapper" on top of Kafka's consumer client.
In Storm, you should typically always use the included Kafka spout (see https://github.com/apache/storm/tree/master/external/storm-kafka or, for a spout implementation that uses Kafka's so-called "new" consumer client, https://github.com/apache/storm/tree/master/external/storm-kafka-client). It would be a very rare case to implement your own -- perhaps the most likely case would be if there is a bug in the existing Kafka spout that you need to work around until the Storm project fixes the bug upstream.
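For illustration, a minimal sketch of wiring the storm-kafka-client spout into a topology via setSpout(); the broker address, topic name, and component ids are placeholders:

import org.apache.storm.kafka.spout.KafkaSpout;
import org.apache.storm.kafka.spout.KafkaSpoutConfig;
import org.apache.storm.topology.TopologyBuilder;

public class SensorTopology {
    public static void main(String[] args) {
        // KafkaSpoutConfig configures the consumer client embedded in the spout
        // (bootstrap servers and topic here; group id, offset strategy etc. can be added).
        KafkaSpoutConfig<String, String> spoutConfig =
                KafkaSpoutConfig.builder("localhost:9092", "events").build();

        TopologyBuilder builder = new TopologyBuilder();
        // The Kafka spout is registered like any other spout; Storm drives the
        // polling, acking and failing of tuples for you.
        builder.setSpout("kafka-spout", new KafkaSpout<>(spoutConfig), 1);
        // builder.setBolt("process", new MyProcessingBolt()).shuffleGrouping("kafka-spout");
    }
}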
I want to replace an existing JMS message consumer in an EJB with an Apache Kafka message consumer, but I cannot figure out how to configure the Kafka consumer within the EJB configuration.
Kafka is not a traditional messaging solution (comparable to, say, RabbitMQ), but it can be used as one.
You will need to translate your JMS concepts (topics, queues) into Kafka topics (which are closer to JMS topics).
Also, given that consumers have a configurable, storable start offset, you will need to define these policies for your consumers; a sketch follows below.
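As a rough sketch (not an EJB integration, just the plain Kafka client), this is what a consumer with an explicit start-offset policy looks like; the broker address, group id, and topic name are assumptions:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class OrderEventsConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");  // assumed broker address
        props.put("group.id", "order-processing");         // roughly the role of a JMS durable subscriber
        props.put("auto.offset.reset", "earliest");        // start-offset policy when the group has no committed offset
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("orders")); // hypothetical topic name
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    System.out.printf("offset=%d value=%s%n", record.offset(), record.value());
                }
            }
        }
    }
}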
I run a system comprising an InfluxDB instance, a Kafka broker, and data sources (sensors) producing time-series data. The purpose of the broker is to protect the database from inbound event overload and to serve as a format-agnostic platform for ingesting data. The data is transferred from Kafka to InfluxDB via Apache Camel routes.
I would like to use Kafka as an intermediate message buffer in case a Camel route crashes or becomes unavailable, which is the most frequent failure in the system. So far I have not managed to configure Kafka in such a way that inbound messages remain available for later consumption.
How do I configure it properly?
Messages are retained in Kafka topics according to the topic's retention policies (you can choose between time and byte-size limits), as described in the Topic Configurations. With
cleanup.policy=delete
retention.ms=-1
the messages in a Kafka topic will never be deleted.
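For example, these settings can be applied when the topic is created; a sketch using the Java AdminClient (topic name, partition/replication counts, and broker address are assumptions):

import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;

public class CreateBufferTopic {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address

        Map<String, String> configs = new HashMap<>();
        configs.put("cleanup.policy", "delete");
        configs.put("retention.ms", "-1"); // never expire records by time

        // Hypothetical buffer topic with 3 partitions and replication factor 1.
        NewTopic topic = new NewTopic("sensor-data", 3, (short) 1).configs(configs);

        try (AdminClient admin = AdminClient.create(props)) {
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}

The same configuration can also be set with the kafka-topics / kafka-configs command-line tools.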
Then your Camel consumer will be able to re-read all messages (offsets) if you select a new consumer group or reset the offsets of the existing consumer group. Otherwise, your Camel consumer may auto-commit the offsets (check the corresponding consumer configuration), and it will then not be possible to re-read them for the same consumer group.
To limit the consumption rate of the Camel consumer, you can adjust options such as maxPollRecords or fetchMaxBytes, which are described in the docs; see the sketch below.
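A rough Java DSL sketch of such a route, assuming the camel-kafka component; the topic, broker, and group names are placeholders, and the log endpoint merely stands in for the real InfluxDB leg of the route:

import org.apache.camel.builder.RouteBuilder;

public class SensorIngestRoute extends RouteBuilder {
    @Override
    public void configure() {
        from("kafka:sensor-data"
                + "?brokers=localhost:9092"
                + "&groupId=influx-writer"
                + "&autoOffsetReset=earliest" // a fresh group starts from the oldest retained offset
                + "&maxPollRecords=100")      // limit how many records each poll returns
            .to("log:ingest?level=INFO");     // placeholder for the InfluxDB endpoint
    }
}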
What is the difference between the KafkaSpout and KafkaBolt objects? Usually KafkaSpout is used for reading data from Kafka producers, but why would we use KafkaBolt?
Bolts in Storm write (or otherwise process) data, while spouts read it; the KafkaSpout is a Kafka consumer. You read from the brokers directly, not from the producers.
For example, you can use a spout to read anything, transform that data within the topology, and then set up a KafkaBolt to produce data into Kafka, as sketched below.
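A minimal sketch of the producing side with the KafkaBolt shipped in storm-kafka-client; the broker address, topic name, tuple field names, and component ids are assumptions:

import java.util.Properties;
import org.apache.storm.kafka.bolt.KafkaBolt;
import org.apache.storm.kafka.bolt.mapper.FieldNameBasedTupleToKafkaMapper;
import org.apache.storm.kafka.bolt.selector.DefaultTopicSelector;
import org.apache.storm.topology.TopologyBuilder;

public class KafkaBoltExample {
    public static void main(String[] args) {
        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        producerProps.put("key.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer",
                "org.apache.kafka.common.serialization.StringSerializer");

        // The bolt acts as a Kafka producer: the "key"/"message" fields of each
        // incoming tuple are written to the (hypothetical) "enriched-events" topic.
        KafkaBolt<String, String> kafkaBolt = new KafkaBolt<String, String>()
                .withProducerProperties(producerProps)
                .withTopicSelector(new DefaultTopicSelector("enriched-events"))
                .withTupleToKafkaMapper(new FieldNameBasedTupleToKafkaMapper<>("key", "message"));

        TopologyBuilder builder = new TopologyBuilder();
        // builder.setSpout("kafka-spout", ...);   read from Kafka
        // builder.setBolt("transform", ...);      transform within the topology
        builder.setBolt("to-kafka", kafkaBolt).shuffleGrouping("transform");
    }
}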
I am using Kafka 2, and it looks like exactly-once is possible with
Kafka Streams
Kafka read/transform/write transactional producer
Kafka Connect
All of the above work between topics (both the source and the destination are topics).
Is it possible to have exactly-once semantics with other destinations?
The sources and destinations (sinks) of Connect are not only topics, but the connector you use determines the delivery semantics, and not all of them are exactly-once.
For example, a JDBC source connector polling a database might miss some records.
Sink connectors coming out of Kafka will send every message from a topic, but it is up to the downstream system to acknowledge that delivery.
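Not Connect itself, but as an illustration of what that downstream responsibility looks like: one common pattern for effectively-once delivery into an external store is to make the write idempotent and tie it to the record's offset. A rough sketch against a hypothetical PostgreSQL table (a unique constraint on topic/partition/offset is assumed, as are the connection details):

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class IdempotentJdbcSink {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // assumed broker address
        props.put("group.id", "jdbc-sink");
        props.put("enable.auto.commit", "false");         // the sink table is the source of truth
        props.put("key.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer",
                "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
             Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/sink")) {
            db.setAutoCommit(false);
            consumer.subscribe(Collections.singletonList("events")); // hypothetical topic

            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> r : records) {
                    // Keyed on (topic, partition, offset): a replay after a crash
                    // cannot insert the same record twice.
                    try (PreparedStatement ps = db.prepareStatement(
                            "INSERT INTO events(topic, kafka_partition, kafka_offset, payload) "
                                    + "VALUES (?, ?, ?, ?) ON CONFLICT DO NOTHING")) {
                        ps.setString(1, r.topic());
                        ps.setInt(2, r.partition());
                        ps.setLong(3, r.offset());
                        ps.setString(4, r.value());
                        ps.executeUpdate();
                    }
                }
                db.commit();           // data becomes durable here
                consumer.commitSync(); // best effort; duplicates after a crash are absorbed by the upsert
            }
        }
    }
}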
I am using the Kafka client library that comes with Kafka 0.11.0.1. I noticed that the KafkaConsumer no longer needs ZooKeeper to be configured. Does that mean the ZooKeeper server will automatically be located via the Kafka bootstrap servers?
Since Kafka 0.9, the KafkaConsumer implementation stores offset commits and consumer-group information in the Kafka brokers themselves. This eliminates the ZooKeeper dependency for consumers and increases their scalability.
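For illustration, a minimal consumer configuration under that model; the broker addresses and group id are placeholders, and note that there is no zookeeper.connect property at all:

import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;

public class NoZookeeperConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        // Only the brokers are listed; the consumer never talks to ZooKeeper.
        props.put("bootstrap.servers", "broker1:9092,broker2:9092");
        props.put("group.id", "my-group"); // offsets are committed to the internal __consumer_offsets topic
        props.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        props.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // subscribe() and poll() as usual
        }
    }
}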