Rolling update of kafka consumers from v0.8.2 -> 0.10.2 - apache-kafka

We have v0.8.2 consumers that we are in the process of updating to v0.10.2. These are high-volume applications, so we normally do rolling updates. The problem is that the v0.8 consumers commit offsets to ZooKeeper while the v0.10 consumers commit to Kafka. In our testing, running a mix of v0.8 and v0.10 consumers results in messages being consumed twice. See a more detailed writeup of the problem here: http://www.search-hadoop.com/m/Kafka/uyzND1ymwxk17UWdj?subj=Re+kafka+0+10+offset+storage+on+zookeeper
Has anyone found a workaround so that we can update from v0.8 to v0.10 consumers without taking an outage?

Two things here that I would suggest.
Do the version upgrade and the API upgrade in separate steps. You can do a rolling upgrade to 0.10 and continue using the old consumer API, still committing offsets to ZooKeeper. Then, in a second step, do your API update. This reduces the number of moving parts per step.
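As a sketch of what "no change in the first step" means in config terms (host names and the group name here are placeholders, not from the question):

```properties
# Old (Scala) consumer configuration, left unchanged across the broker upgrade.
# ZooKeeper offset storage is the old consumer's default; it is stated
# explicitly here for clarity.
zookeeper.connect=zk1:2181,zk2:2181,zk3:2181
group.id=my-consumer-group
offsets.storage=zookeeper
```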
When doing the API update, you'll have to do some kind of full switchover because, as you've observed, the consumer group info and offsets will now be stored in Kafka. One thing that can help here is the offset-migration procedure in the Kafka documentation:
https://kafka.apache.org/documentation/#offsetmigration
However, the consumer groups are coordinated differently (the old consumer coordinates groups through ZooKeeper no matter where you commit offsets), so at some point you will need to stop all old consumers and start up your new API implementation. Migrating your offsets first should make this a very short outage with no data loss: just the time it takes to bring down all the old consumers and start up the new ones.
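The offset migration in the Kafka documentation linked above boils down to a change in the old consumer's configuration, roughly:

```properties
# Step 1 of the documented migration: the old consumer commits offsets to
# Kafka while continuing to commit to ZooKeeper as well.
offsets.storage=kafka
dual.commit.enabled=true
# After a rolling bounce and a health check, the ZooKeeper commits are
# disabled in a second rolling bounce:
# dual.commit.enabled=false
```

Once the group's offsets exist in Kafka, the final switchover to the new consumer API only has to stop the old consumers and start the new ones.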

Related

How to measure kafka reBalance duration

Is there a tool available to measure Kafka rebalancing duration, or to check any intermediate status?
We have observed many times that a specific consumer gets stuck forever during a Kafka rebalance; we never waited for it to finish.
There was a bug in Apache Kafka 2.4.1, related exactly to infinite consumer rebalances, which has since been fixed; MSK ships 2.4.1.1 with the fix.
If your MSK cluster is running a later version, you can open a support case.

Kafka broker occasionally takes much longer than usual to load logs on startup

We are observing that Kafka brokers occasionally take much more time to load logs on startup than usual. Much longer in this case means 40 minutes instead of at most 1 minute. This happens during a rolling restart following the procedure described by Confluent, and after the broker has reported that controlled shutdown was successful.
Kafka Setup
Confluent Platform 5.5.0
Kafka Version 2.5.0
3 Replicas (minimum 2 in sync)
Controlled broker shutdown enabled
1TB of AWS EBS for Kafka log storage
Other potentially useful information
We make extensive use of Kafka Streams
We use exactly-once processing and transactional producers/consumers
Observations
It is not always the same broker that takes a long time.
It does not only occur when the broker is the active controller.
A log partition that loads quickly (15ms) can take a long time (9549 ms) for the same broker a day later.
We experienced this issue before on Kafka 2.4.0 but after upgrading to 2.5.0 it did not occur for a few weeks.
Does anyone have an idea what could be causing this? Or what additional information would be useful to track down the issue?

Kafka cluster migration over clouds, how to ensure consumers consume from right offsets when offsets are managed by us?

For migration of Kafka clusters from AWS to Azure, the challenge is that we are using our own custom offset management for consumers. If I replicate the ZK nodes with offsets, Kafka MirrorMaker will change those offsets. Is there any way to ensure the offsets stay the same so that the migration is smooth?
I think the problem might be your custom management. Without more details on this, it's hard to give suggestions.
The problem I see with trying to copy offsets at all is this: you consume from cluster A, topic T, offset 1000. You copy that topic to a brand-new cluster B, which starts topic T at offset 0. Consumers starting at offset 1000 will simply fail in this scenario, or, if at least 1000 messages have already been mirrored, will effectively skip that data.
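The two failure modes above can be sketched in plain Python (this is an illustration of the offset arithmetic, not a Kafka client call):

```python
# Illustrative sketch: why naively reusing a source-cluster offset on a
# freshly created destination cluster misbehaves. Offsets are positions in a
# cluster-local log, and destination logs start at offset 0.

def resume_position(src_offset, dst_log_end_offset):
    """What happens if a consumer seeks to src_offset on the destination,
    where dst_log_end_offset messages have been mirrored so far."""
    if src_offset > dst_log_end_offset:
        # The offset does not exist yet on the destination: the fetch fails
        # with an out-of-range error and auto.offset.reset applies instead.
        return "out-of-range"
    # The seek "works", but offset N on the destination is only the same
    # message as offset N on the source if mirroring started from the
    # source's offset 0 with no gaps or reordering.
    return "seeks-but-may-skip-or-reread"

print(resume_position(1000, 0))      # brand-new cluster: seek fails
print(resume_position(1000, 5000))   # 5000 messages mirrored: seek lands
```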
With newer versions of Kafka (post 0.10), MirrorMaker uses the __consumer_offsets topic, not ZooKeeper, since it's built on the newer Java clients.
As for replication tools, uber/uReplicator uses ZooKeeper for offsets.
There are other tools that manage offsets differently, such as Comcast/MirrorTool or salesforce/mirus via the Kafka Connect Framework.
And the enterprise supported tool would be Confluent Replicator, which has unique ways of handling cluster failover and migrations.

Will there any data loss while upgrading kafka client from 0.8.0 to 0.10.0.1?

We are planning to upgrade the Kafka client from 0.8.0 to 0.10.0.1. Since consumer offsets are stored in ZooKeeper in 0.8.0 but in the broker in 0.10.0.1, if we start a consumer in 0.10.0.1 with the same group and client id as in 0.8.0, will the new consumer fetch messages from where the old consumer stopped consuming? If data loss is going to happen, can we try migrating the offsets from ZooKeeper to the broker and then start our new consumer?
You can continue storing offsets in ZooKeeper on 0.10. In fact, if you just upgrade the client binaries, you won't see any change in the offset commit behavior. Where you will have to start thinking about migration of data and offsets is when you move to using the new consumer API in your application. At that point you will need to stop your old application instance based on the old API, check the offsets stored in ZooKeeper, and then start the new consumer API implementation from those offsets to avoid data loss or duplication.
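If you first migrate the offsets into Kafka (per the offset-migration section of the Kafka documentation), the new consumer API implementation can, in principle, resume under the same group name. A hypothetical minimal config sketch (host names and group name are placeholders):

```properties
# New consumer API (0.10.x) configuration sketch.
bootstrap.servers=broker1:9092,broker2:9092
# Reuse the group name the old consumer used, so migrated offsets apply.
group.id=my-consumer-group
enable.auto.commit=true
# Fallback used only if no committed offset exists for the group.
auto.offset.reset=earliest
```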

In Storm, how to migrate offsets to store in Kafka?

I've been having all sorts of instabilities related to Kafka and offsets: workers crashing on startup with exceptions about invalid offsets, and other things I don't understand.
I read that it is recommended to migrate offsets to be stored in Kafka instead of Zookeeper. I found the below in the Kafka documentation:
Migrating offsets from ZooKeeper to Kafka: Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
1. Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config.
2. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
3. Set dual.commit.enabled=false in your consumer config.
4. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set offsets.storage=zookeeper.
http://kafka.apache.org/documentation.html#offsetmigration
But, again, I don't understand what this is instructing me to do. I don't see anywhere in my topology config where I configure where offsets are stored. Is it buried in the cluster yaml?
Any advice on if storing offsets in Kafka, rather than Zookeeper, is a good idea? And how I can perform this change?
At the time of this writing Storm's Kafka spout (see documentation/README at https://github.com/apache/storm/tree/master/external/storm-kafka) only supports managing consumer offsets in ZooKeeper. That is, all current Storm versions (up to 0.9.x and including 0.10.0 Beta) still rely on ZooKeeper for storing such offsets. Hence you should not perform the ZK->Kafka offset migration you referenced above because Storm isn't compatible yet.
You will need to wait until the Storm project -- specifically, its Kafka spout -- supports managing consumer offsets via Kafka (instead of ZooKeeper). And yes, in general it is better to store consumer offsets in Kafka rather than ZooKeeper, but alas Storm isn't there yet.
Update November 2016:
The situation in Storm has improved in the meantime. There's now a new, second Kafka spout that is based on Kafka's new 0.10 consumer client, which stores consumer offsets in Kafka (and not in ZooKeeper): https://github.com/apache/storm/tree/master/external/storm-kafka-client.
However, at the time I am writing this, there are still several issues being reported by users on the storm-user mailing list (such as "Urgent help! kafka-spout stops fetching data after running for a while"), so I'd use this new Kafka spout with care, and only after thorough testing.