MirrorMaker 2: translate consumer offsets

I am working on Kafka mirroring and I am using MM2. I am able to start the mirroring process, and all topics and data are replicated from the source to the target cluster.
I need to start the consumers in the target cluster from where they left off in the source cluster. I came across RemoteClusterUtils.translateOffsets, which translates consumer offsets from the remote cluster to the local one.
Looking at the code, I can see that it just consumes the checkpoint topic and returns the offsets for the consumer group we provide; it does not commit them.
Do we need to explicitly commit the offsets (e.g. using OffsetCommitRequest) before starting the consumers in the target cluster, or is there some other way to do this?
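For what it's worth, before Kafka 2.7's automated offset sync the usual pattern is indeed to translate and then commit explicitly, typically via the Admin API rather than a raw OffsetCommitRequest. A minimal sketch, assuming placeholder bootstrap addresses, a source-cluster alias of "source", and a group "my-group":

```java
import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class TranslateAndCommit {
    public static void main(String[] args) throws Exception {
        // Placeholder address of the TARGET cluster, which hosts the
        // checkpoint topic written by MirrorMaker 2.
        Map<String, Object> props = new HashMap<>();
        props.put("bootstrap.servers", "target-cluster:9092");

        // Reads the checkpoint topic and returns translated offsets for the
        // group; it does NOT commit anything on its own.
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        props, "source", "my-group", Duration.ofSeconds(30));

        // Commit the translated offsets explicitly so consumers in the
        // target cluster resume from them (the group must be inactive).
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "target-cluster:9092");
        try (Admin admin = Admin.create(adminProps)) {
            admin.alterConsumerGroupOffsets("my-group", translated).all().get();
        }
    }
}
```

This requires the connect-mirror-client and kafka-clients jars on the classpath, and a reachable target cluster.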

Related

Make consumers in the consumer group idempotent while using mirror maker

I am trying to migrate from one Kafka cluster to another using MirrorMaker 2.0. I am using Kafka 2.8.1, which has good support for consumer-group replication and offset sync.
I have encountered a scenario where I want to migrate a topic along with its producers and consumers from the source cluster to the target cluster.
Example:
I have topic A which I want to migrate, it has 3 partitions
source -> topic = A
destination -> topic = source.A
topic A is replicated as source.A in target cluster
I have a consumer group topic-a-reader-group at the source cluster with 3 consumers. It is replicated at the destination cluster with the same name, and I have created 1 consumer in this group at the destination cluster.
Now when I produce messages to topic A at the source cluster, they are consumed by the 3 consumers at the source cluster. The messages are also replicated to the target cluster, where they are consumed by the consumer in the consumer group running there. In short, each message is read by both the source consumers and the target consumer: a duplicate read.
I want only one consumer to get each message, without duplication across source and destination. In my application, I can't just turn off the consumers at the source and move them to the target cluster (it is a time-critical application). So I want to keep the consumers at both source and target running actively for some time, and turn off the source consumers after some duration.
Is there any idempotence property for consumer groups that lets only one consumer group read a message in a mirroring scenario, without duplication across the source and target clusters?
Please suggest any other approach to move consumers from the source to the target cluster without downtime.
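As an aside on the source.A name above: it comes from MM2's DefaultReplicationPolicy, which simply prefixes a remote topic with the source cluster alias and a dot. A tiny stand-alone illustration of that convention (the helper below is my own, not Kafka's API):

```java
public class RemoteTopicNaming {
    // Reproduces the naming convention of MM2's DefaultReplicationPolicy:
    // a replicated topic is prefixed with "<sourceClusterAlias>.".
    public static String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return sourceClusterAlias + "." + topic;
    }

    public static void main(String[] args) {
        System.out.println(formatRemoteTopic("source", "A")); // prints "source.A"
    }
}
```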

Kafka MirrorMaker2 automated consumer offset sync

I am using MirrorMaker2 for DR.
Kafka 2.7 should support
automated consumer offset sync
Here is the YAML file I am using (I use Strimzi to create it).
All source cluster topics are replicated in destination cluster.
A ...checkpoint.internal topic is also created in the destination cluster, and it contains all the synced source cluster offsets. BUT I don't see these offsets being translated into the destination cluster's __consumer_offsets topic, which means that when I start a consumer (same consumer group) in the destination cluster, it will start reading messages from the beginning.
My expectation is that after enabling automated consumer offset sync, all consumer offsets from the source cluster are translated and stored in the __consumer_offsets topic in the destination cluster.
Can someone please clarify if my expectation is correct and if not how it should work.
The sync.group.offsets.enabled setting is for MirrorCheckpointConnector.
I'm not entirely sure how Strimzi runs MirrorMaker 2, but I think you need to set it like this:
checkpointConnector:
  config:
    checkpoints.topic.replication.factor: 1
    sync.group.offsets.enabled: "true"
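For reference, outside Strimzi the equivalent settings would go into the MM2 properties file passed to connect-mirror-maker.sh; a sketch assuming cluster aliases source and target:

```properties
# Example mm2.properties fragment; aliases "source" and "target" are placeholders.
clusters = source, target
source->target.enabled = true
# Have the checkpoint connector commit translated offsets to the target's
# __consumer_offsets topic (available since Kafka 2.7).
source->target.sync.group.offsets.enabled = true
source->target.sync.group.offsets.interval.seconds = 60
```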

Kafka cluster migration over clouds, how to ensure consumers consume from right offsets when offsets are managed by us?

For a migration of Kafka clusters from AWS to Azure, the challenge is that we are using our own custom offset management for consumers. If I replicate the ZooKeeper nodes holding the offsets, MirrorMaker will change those offsets. Is there any way to ensure the offsets stay the same so that the migration is smooth?
I think the problem might be your custom management. Without more details on this, it's hard to give suggestions.
The problem I see with trying to copy offsets at all: you consume from cluster A, topic T, offset 1000. You copy this to a brand-new cluster B, where topic T starts at offset 0. Consumers starting at offset 1000 will simply fail in this scenario, or, if at least 1000 messages have already been mirrored, they will effectively skip that data.
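Incidentally, this offset mismatch is exactly what MirrorMaker 2 later addressed with offset-sync records: it records pairs of (upstream offset, downstream offset) for the same message, and translates a committed offset relative to the nearest sync point at or below it. A simplified stand-alone illustration of that idea (the class and method here are my own, and unlike real MM2 this assumes offsets advance in lockstep after a sync point):

```java
import java.util.Map;
import java.util.TreeMap;

public class OffsetTranslation {
    // Sync points map an upstream offset to the downstream offset of the
    // same record, as MM2's offset-sync topic does conceptually.
    private final TreeMap<Long, Long> syncPoints = new TreeMap<>();

    public void addSyncPoint(long upstream, long downstream) {
        syncPoints.put(upstream, downstream);
    }

    // Translate an upstream committed offset using the nearest sync point
    // at or below it; returns -1 if no sync point applies.
    public long translate(long upstreamOffset) {
        Map.Entry<Long, Long> entry = syncPoints.floorEntry(upstreamOffset);
        if (entry == null) return -1;
        return entry.getValue() + (upstreamOffset - entry.getKey());
    }

    public static void main(String[] args) {
        OffsetTranslation t = new OffsetTranslation();
        // Upstream offset 1000 corresponds to downstream offset 0 on the
        // freshly mirrored cluster.
        t.addSyncPoint(1000, 0);
        System.out.println(t.translate(1250)); // prints 250
    }
}
```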
With newer versions of Kafka (post 0.10), MirrorMaker uses the __consumer_offsets topic, not ZooKeeper, since it's built on the newer Java clients.
As for replication tools, uber/uReplicator uses ZooKeeper for offsets.
There are other tools that manage offsets differently, such as Comcast/MirrorTool or salesforce/mirus via the Kafka Connect Framework.
And the enterprise supported tool would be Confluent Replicator, which has unique ways of handling cluster failover and migrations.

Kafka 0.8.2.1 Mirror Maker

I'm having an issue with Kafka Mirror Maker.
I stopped the mirror maker for 30 minutes due to a cluster upgrade, and since the restart of the cluster the mirror maker has not been able to consume data from the source cluster.
I see that the lag of the mirror maker's consumer group is very high, so I'm looking at parameters to change in order to increase the mirror maker's buffer size.
I've tried changing the mirror maker's consumer group, and in that case it restarts consuming data from the latest messages. When I try to restart the process from the last saved offsets, I see a peak of consumed data, but the mirror maker is not able to commit offsets; in fact the log gets stuck at the line INFO kafka.tools.MirrorMaker$: Committing offsets and no more lines are shown after it.
I think the problem is related to the huge amount of data to process.
I'm running a cluster with Kafka 0.8.2.1 with this configuration:
dual.commit.enabled=false
offsets.storage=zookeeper
auto.offset.reset=largest
Thanks in advance

In Storm, how to migrate offsets to store in Kafka?

I've been having all sorts of instabilities related to Kafka and offsets: things like workers crashing on startup with exceptions related to invalid offsets, and other things I don't understand.
I read that it is recommended to migrate offsets to be stored in Kafka instead of ZooKeeper. I found the following in the Kafka documentation:
Migrating offsets from ZooKeeper to Kafka: Kafka consumers in earlier releases store their offsets by default in ZooKeeper. It is possible to migrate these consumers to commit offsets into Kafka by following these steps:
1. Set offsets.storage=kafka and dual.commit.enabled=true in your consumer config.
2. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
3. Set dual.commit.enabled=false in your consumer config.
4. Do a rolling bounce of your consumers and then verify that your consumers are healthy.
A roll-back (i.e., migrating from Kafka back to ZooKeeper) can also be performed using the above steps if you set offsets.storage=zookeeper.
http://kafka.apache.org/documentation.html#offsetmigration
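Spelled out as consumer config fragments, the quoted procedure alternates between two settings:

```properties
# Step 1: commit to both ZooKeeper and Kafka while migrating
offsets.storage=kafka
dual.commit.enabled=true

# Step 3: after a rolling bounce, commit to Kafka only
offsets.storage=kafka
dual.commit.enabled=false
```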
But, again, I don't understand what this is instructing me to do. I don't see anywhere in my topology config where I configure where offsets are stored. Is it buried in the cluster yaml?
Any advice on if storing offsets in Kafka, rather than Zookeeper, is a good idea? And how I can perform this change?
At the time of this writing Storm's Kafka spout (see documentation/README at https://github.com/apache/storm/tree/master/external/storm-kafka) only supports managing consumer offsets in ZooKeeper. That is, all current Storm versions (up to 0.9.x and including 0.10.0 Beta) still rely on ZooKeeper for storing such offsets. Hence you should not perform the ZK->Kafka offset migration you referenced above because Storm isn't compatible yet.
You will need to wait until the Storm project -- specifically, its Kafka spout -- supports managing consumer offsets via Kafka (instead of ZooKeeper). And yes, in general it is better to store consumer offsets in Kafka rather than ZooKeeper, but alas Storm isn't there yet.
Update November 2016:
The situation in Storm has improved in the meantime. There's now a new, second Kafka spout that is based on Kafka's new 0.10 consumer client, which stores consumer offsets in Kafka (and not in ZooKeeper): https://github.com/apache/storm/tree/master/external/storm-kafka-client.
However, at the time I am writing this, there are still several issues being reported by users on the storm-user mailing list (such as "Urgent help! kafka-spout stops fetching data after running for a while"), so I'd use this new Kafka spout with care, and only after thorough testing.