Will there be any data loss while upgrading the Kafka client from 0.8.0 to 0.10.0.1? - apache-kafka

We are planning to upgrade our Kafka client from 0.8.0 to 0.10.0.1. In 0.8.0 the consumer stores its offsets in ZooKeeper, whereas in 0.10.0.1 they are stored in the broker. If we start the 0.10.0.1 consumer with the same group and client ID that we used on 0.8.0, will the new consumer fetch messages from where the old consumer stopped? If data loss is going to happen, can we migrate the offsets from ZooKeeper to the broker and then start our new consumer?

You can continue storing offsets in ZooKeeper on 0.10. In fact, if you only upgrade the client binaries, you won't see any change in the offset commit behavior. Where you will have to start thinking about migrating data and offsets is when you move to the new consumer API in your application. At that point you will need to stop your old application instance based on the old API, check the offsets stored in ZooKeeper, and then start the new consumer API implementation from those offsets to avoid data loss or duplication.
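As a rough illustration of the "check the offsets stored in ZooKeeper" step, here is a minimal Python sketch. It assumes the usual old high-level consumer layout, where each committed offset lives in a znode at /consumers/<group>/offsets/<topic>/<partition> and the payload is the offset written as decimal text. The helper names and the sample payloads are hypothetical, and reading the znodes themselves (with whatever ZooKeeper client you use) is left out.

```python
def zk_offset_path(group, topic, partition):
    """Znode path under which the old high-level consumer keeps one committed offset."""
    return "/consumers/{}/offsets/{}/{}".format(group, topic, partition)


def parse_zk_offset(raw):
    """The znode payload is the offset as decimal text; accept bytes or str."""
    if isinstance(raw, bytes):
        raw = raw.decode("utf-8")
    return int(raw.strip())


def build_seek_plan(group, topic, znode_payloads):
    """Map partition -> (znode path, starting offset) from {partition: payload}."""
    return {
        partition: (zk_offset_path(group, topic, partition), parse_zk_offset(raw))
        for partition, raw in znode_payloads.items()
    }


if __name__ == "__main__":
    # Hypothetical payloads, as if read from ZooKeeper after the old consumer was stopped.
    plan = build_seek_plan("billing", "orders", {0: b"1500", 1: b"1498"})
    for partition, (path, offset) in sorted(plan.items()):
        print(partition, path, offset)
```

The offsets in the plan are what the new consumer would be positioned at (for example with seek() on each assigned partition) before it starts polling. It is worth verifying on a test topic that the stored value is the position you expect to resume from.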

Related

Can we update a consumer offset in kafka 0.10?

I am using an older version of Kafka, 0.10. Is there any way I can update the consumer offset for a topic to an arbitrary number?
In Kafka 0.10, I don't think there was a tool to easily update a consumer offset.
You basically have 2 options:
Use the tools from a more recent Kafka version. Nowadays, consumer offsets can be updated using either the kafka-consumer-groups.sh tool or the AdminClient (only in trunk at the moment, it will be in Kafka 2.5).
Write a small application that starts a consumer and calls commitSync() to update its consumer offsets, like in ConsumerGroupCommand.resetOffsets().
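For the first option, a small Python sketch of how the kafka-consumer-groups.sh invocation could be assembled. The flags used (--reset-offsets, --to-offset, --dry-run, --execute, and the topic:partition form of --topic) come from the tool as shipped with newer Kafka releases. The function name, broker address, group and topic are placeholders, and the tool is assumed to be on the PATH.

```python
def reset_offsets_command(bootstrap_server, group, topic, partition, offset, execute=False):
    """Build the argv for resetting one partition of a group to a fixed offset.

    With execute=False the tool only prints the planned change (--dry-run).
    """
    argv = [
        "kafka-consumer-groups.sh",
        "--bootstrap-server", bootstrap_server,
        "--group", group,
        "--topic", "{}:{}".format(topic, partition),
        "--reset-offsets",
        "--to-offset", str(offset),
    ]
    argv.append("--execute" if execute else "--dry-run")
    return argv


if __name__ == "__main__":
    # Placeholder values; print the command instead of running it.
    print(" ".join(reset_offsets_command("localhost:9092", "my-group", "my-topic", 0, 1234)))
```

The tool refuses to reset offsets while the group still has active members, so the consumers should be stopped first. The argv list can be handed to subprocess.run once the dry-run output looks right.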

Apache Kafka spout is not working on Consumer Side

I am trying to integrate MongoDB and Storm-Kafka. The Kafka producer produces data from MongoDB, but the consumer side fails to fetch it.
Kafka version: 0.10.*
Storm version: 1.2.1
Do I need to add any functionality in the consumer?

Apache Kafka: why does the producer connect to the broker, but the consumer connects to ZooKeeper?

The old version suggests that the consumer connects to ZooKeeper, and the new version suggests connecting to the broker? Someone from a community replied to me that in the old version a topic's offsets are stored in ZK, and in the new version in Kafka itself. Is this the answer to this question?
Older versions of Kafka, i.e. before 0.9, store offsets in ZooKeeper.
Newer versions of Kafka store offsets in an internal Kafka topic called __consumer_offsets.
The newer versions still provide the option to store offsets in ZooKeeper.
With offsets stored in Kafka, consumers only need to talk to the brokers and do not need to rely on ZooKeeper.
If there are many consumers simultaneously reading from Kafka, the read/write load on ZooKeeper may exceed its capacity, making ZooKeeper a bottleneck.
Check this for more information:
https://github.com/SOHU-Co/kafka-node/issues/502

Kafka consumer api (no zookeeper configuration)

I am using the Kafka client library that comes with Kafka 0.11.0.1. I noticed that KafkaConsumer no longer needs ZooKeeper to be configured. Does that mean the ZooKeeper server will automatically be located by the Kafka bootstrap server?
Since Kafka 0.9 the KafkaConsumer implementation stores offset commits and consumer group information in the Kafka brokers themselves. This eliminates the ZooKeeper dependency and increases the scalability of the consumers.
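To make the difference concrete, here is a small Python sketch that contrasts the connection settings of the two consumer generations. The property names (zookeeper.connect, bootstrap.servers, group.id, key.deserializer, value.deserializer) are the standard ones. The helper names, host names and group are placeholders chosen for illustration.

```python
STRING_DESERIALIZER = "org.apache.kafka.common.serialization.StringDeserializer"


def old_consumer_config(zookeeper_connect, group_id):
    """Settings for the old (pre-0.9 style) high-level consumer: it is pointed at ZooKeeper."""
    return {"zookeeper.connect": zookeeper_connect, "group.id": group_id}


def new_consumer_config(bootstrap_servers, group_id):
    """Settings for KafkaConsumer: it is pointed at brokers only."""
    return {
        "bootstrap.servers": bootstrap_servers,
        "group.id": group_id,
        "key.deserializer": STRING_DESERIALIZER,
        "value.deserializer": STRING_DESERIALIZER,
    }


def mentions_zookeeper(config):
    """True if any property key refers to ZooKeeper."""
    return any(key.startswith("zookeeper.") for key in config)


if __name__ == "__main__":
    print(mentions_zookeeper(old_consumer_config("zk1:2181", "demo")))      # True
    print(mentions_zookeeper(new_consumer_config("broker1:9092", "demo")))  # False
```

In other words, the new consumer is never told where ZooKeeper is. It talks to the brokers listed in bootstrap.servers, and group coordination and offset storage are handled on the broker side.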

In Storm, how to migrate offsets to store in Kafka?

I've been having all sorts of instabilities related to Kafka and offsets. Things like workers crashing on startup with exceptions related to invalid offsets, and other things I don't understand.
I read that it is recommended to migrate offsets to be stored in Kafka instead of Zookeeper. I found the below in the Kafka documentation:
Migrating offsets from ZooKeeper to Kafka
Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
1. Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config.
2. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
3. Set dual.commit.enabled=false in your consumer config.
4. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set offsets.storage=zookeeper.
http://kafka.apache.org/documentation.html#offsetmigration
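For an application that uses the old high-level consumer directly, the two config states in the quoted steps could be sketched as follows. Only offsets.storage and dual.commit.enabled come from the quoted documentation. The function names and the base config values are hypothetical, and the rolling bounce between the phases is an operational step that this Python snippet does not perform.

```python
def migration_phases(base_config):
    """Return the consumer config for each phase of the ZooKeeper -> Kafka migration.

    Phase 1 commits to both stores; phase 2 commits to Kafka only.
    A rolling bounce of the consumers is expected after applying each phase.
    """
    phase1 = dict(base_config)
    phase1.update({"offsets.storage": "kafka", "dual.commit.enabled": "true"})
    phase2 = dict(phase1)
    phase2["dual.commit.enabled"] = "false"
    return [phase1, phase2]


def render_properties(config):
    """Render a config dict as sorted key=value lines, as in a .properties file."""
    return "\n".join("{}={}".format(key, config[key]) for key in sorted(config))


if __name__ == "__main__":
    # Hypothetical base settings for an old-style consumer.
    base = {"group.id": "my-group", "zookeeper.connect": "zk1:2181"}
    for number, phase in enumerate(migration_phases(base), start=1):
        print("# phase {}".format(number))
        print(render_properties(phase))
```

Running with dual commit first means both stores hold current offsets while old and new instances coexist during the bounce.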
But, again, I don't understand what this is instructing me to do. I don't see anywhere in my topology config where I configure where offsets are stored. Is it buried in the cluster yaml?
Any advice on whether storing offsets in Kafka, rather than ZooKeeper, is a good idea? And how can I perform this change?
At the time of this writing Storm's Kafka spout (see documentation/README at https://github.com/apache/storm/tree/master/external/storm-kafka) only supports managing consumer offsets in ZooKeeper. That is, all current Storm versions (up to 0.9.x and including 0.10.0 Beta) still rely on ZooKeeper for storing such offsets. Hence you should not perform the ZK->Kafka offset migration you referenced above because Storm isn't compatible yet.
You will need to wait until the Storm project -- specifically, its Kafka spout -- supports managing consumer offsets via Kafka (instead of ZooKeeper). And yes, in general it is better to store consumer offsets in Kafka rather than ZooKeeper, but alas Storm isn't there yet.
Update November 2016:
The situation in Storm has improved in the meantime. There's now a new, second Kafka spout that is based on Kafka's new 0.10 consumer client, which stores consumer offsets in Kafka (and not in ZooKeeper): https://github.com/apache/storm/tree/master/external/storm-kafka-client.
However, at the time I am writing this, there are still several issues being reported by users on the storm-user mailing list (such as "Urgent help! kafka-spout stops fetching data after running for a while"), so I'd use this new Kafka spout with care, and only after thorough testing.