Kafka MirrorMaker2 automated consumer offset sync - apache-kafka

I am using MirrorMaker2 for DR.
Kafka 2.7 should support automated consumer offset sync.
Here is the yaml file I am using (I use strimzi for creating it)
All source cluster topics are replicated in destination cluster.
Also, the ...checkpoint.internal topic is created in the destination cluster and contains all synced source cluster offsets, BUT I don't see these offsets being translated into the destination cluster's __consumer_offsets topic. This means that when I start a consumer (same consumer group) in the destination cluster, it will start reading messages from the beginning.
My expectation is that, after enabling automated consumer offset sync, all consumer offsets from the source cluster are translated and stored in the __consumer_offsets topic in the destination cluster.
Can someone please clarify whether my expectation is correct and, if not, how it should work?

The sync.group.offsets.enabled setting is for MirrorCheckpointConnector.
I'm not entirely sure how Strimzi runs MirrorMaker 2 but I think you need to set it like:
checkpointConnector:
  config:
    checkpoints.topic.replication.factor: 1
    sync.group.offsets.enabled: "true"
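For context, here is a rough sketch of where that block sits in a Strimzi KafkaMirrorMaker2 resource. The resource name, cluster aliases, bootstrap addresses, and interval values are placeholders I made up, and the apiVersion depends on your Strimzi release, so treat this as an illustration rather than a tested manifest:

```yaml
# Sketch only: names, addresses, and intervals are assumptions.
apiVersion: kafka.strimzi.io/v1beta2   # older Strimzi releases use a different version
kind: KafkaMirrorMaker2
metadata:
  name: my-mm2                         # hypothetical name
spec:
  version: 2.7.0
  replicas: 1
  connectCluster: "target"             # MM2 runs against the target cluster
  clusters:
    - alias: "source"
      bootstrapServers: source-kafka:9092   # placeholder address
    - alias: "target"
      bootstrapServers: target-kafka:9092   # placeholder address
  mirrors:
    - sourceCluster: "source"
      targetCluster: "target"
      sourceConnector:
        config:
          replication.factor: 1
          offset-syncs.topic.replication.factor: 1
      checkpointConnector:
        config:
          checkpoints.topic.replication.factor: 1
          # The setting from the answer belongs to the checkpoint connector:
          sync.group.offsets.enabled: "true"
          # Optional tuning; both are given in seconds.
          sync.group.offsets.interval.seconds: 60
          emit.checkpoints.interval.seconds: 60
      topicsPattern: ".*"
      groupsPattern: ".*"
```

As far as I know, MM2 only writes translated offsets for groups that have no active consumers connected to the target cluster, so check __consumer_offsets on the destination while the group is idle there.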

Related

Make consumers in the consumer group idempotent while using mirror maker

I am trying to perform a Kafka cluster migration from one cluster to another using MirrorMaker 2.0. I am using Kafka 2.8.1, which has good support for consumer-group replication and offset sync.
I have encountered a scenario where I want to migrate a topic, along with its producers and consumers, from the source cluster to the target cluster.
Example:
I have topic A which I want to migrate, it has 3 partitions
source -> topic = A
destination -> topic = source.A
Topic A is replicated as source.A in the target cluster.
I have a consumer group, topic-a-reader-group, created at the source cluster with 3 consumers. It is replicated at the destination cluster with the same name, and I have created 1 consumer in this group at the destination cluster.
Now, when I produce messages to topic A at the source cluster, they are consumed by the 3 consumers at the source cluster. Each message is also replicated to the target cluster, where it is consumed by the consumer in the consumer group running there. In effect, every message is read by both a source consumer and the target consumer, which is a duplicate read.
I want only one consumer to get each message, with no duplication between source and destination. In my application, I can't just turn off the consumers at the source and move them to the target cluster (it is a time-critical application). So I want to keep the consumers at both source and target running for some time, and turn off the source consumers after a period in which both are active.
Is there any idempotence property for consumer groups that lets only one consumer group read a message in a MirrorMaker scenario, without it being duplicated across the source and target clusters?
Please suggest any other approaches for moving consumers from the source to the target cluster without downtime.

Kafka Connect best practices for topic compaction

I am using Debezium, which makes use of Kafka Connect.
Kafka Connect exposes a couple of topics that need to be created:
OFFSET_STORAGE_TOPIC
This environment variable is required when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector offsets. The topic should have many partitions, be highly replicated (e.g., 3x or more) and should be configured for compaction.
STATUS_STORAGE_TOPIC
This environment variable should be provided when running the Kafka Connect service. Set this to the name of the Kafka topic where the Kafka Connect services in the group store connector status. The topic can have multiple partitions, should be highly replicated (e.g., 3x or more) and should be configured for compaction.
Does anyone have any specific recommended compaction configs for these topics?
e.g.
is it enough to set just:
cleanup.policy: compact
unclean.leader.election.enable: true
or also:
min.compaction.lag.ms: 60000
segment.ms: 1800000
min.cleanable.dirty.ratio: 0.01
delete.retention.ms: 100
The defaults should be fine. Connect will create and configure those topics on its own unless you pre-create them with your own settings.
These are the only cases I can think of where you would adjust the compaction settings:
a Connect group's data lingering in the topic longer than you want. For example, a source connector doesn't start immediately after a long downtime because it is still processing the offsets topic
your Connect cluster doesn't accurately report its state, or the tasks do not rebalance appropriately (because the status topic is in a bad state)
The (compacted) __consumer_offsets topic is what sink connectors use. It is configured separately, and its settings apply to all consumers, not only Connect.
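To illustrate "Connect will create those topics on its own": in a plain distributed worker, the internal topics are named and sized through worker properties, and Connect creates any that are missing as compacted topics. A minimal sketch, assuming a standard connect-distributed.properties file (I believe the Debezium container's environment variables map onto these keys; the group id and topic names below are examples, and the partition counts and replication factors are the usual defaults):

```properties
# Sketch of the worker settings behind the internal topics.
# Names are examples; the numbers are the usual Connect defaults.
group.id = connect-cluster

# Connector offsets (source connectors): many partitions, compacted.
offset.storage.topic = connect-offsets
offset.storage.replication.factor = 3
offset.storage.partitions = 25

# Connector and task status: a few partitions, compacted.
status.storage.topic = connect-status
status.storage.replication.factor = 3
status.storage.partitions = 5

# Connector configurations: always a single partition, compacted.
config.storage.topic = connect-configs
config.storage.replication.factor = 3
```

If you pre-create the topics yourself instead, cleanup.policy=compact is the setting that matters; the other values in the question are tuning.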

Disable mirrormaker2 offset-sync topics on source kafka cluster

We're using MirrorMaker2 to replicate some topics from one kerberized kafka cluster to another kafka cluster (strictly unidirectional). We don't control the source kafka cluster and we're given only access to describe and read specific topics that are to be consumed.
MirrorMaker2 creates and maintains a topic (mm2-offset-syncs) in the source cluster to encode cluster-to-cluster offset mappings for each topic-partition being replicated and also creates an AdminClient in the source cluster to handle ACL/Config propagation. Because MM2 needs authorization to create and write to these topics in the source cluster, or to perform operations through AdminClient, I'm trying to understand why/if we need these mechanisms in our scenario.
My question is:
In a strictly unidirectional scenario, what is the usefulness of this source-cluster offset-sync topic to Mirrormaker?
If indeed it's superfluous, is it possible to disable it or operate mm2 without access to create/produce to this topic?
If ACL and Config propagation is disabled, is it safe to assume that the AdminClient is not used for anything else?
In the MirrorMaker code, the offset-sync topic is created by MirrorSourceConnector as soon as it starts and is then maintained by MirrorSourceTask. The same applies to the AdminClient in MirrorSourceConnector.
I have found no way to toggle off these features but honestly I might be missing something in my line of thought.
There is an option introduced in Kafka 3.0 that makes MM2 create and use the mm2-offset-syncs topic in the target cluster instead of the source cluster.
This is thanks to KIP-716: https://cwiki.apache.org/confluence/display/KAFKA/KIP-716%3A+Allow+configuring+the+location+of+the+offset-syncs+topic+with+MirrorMaker2
JIRA issue and pull request:
https://issues.apache.org/jira/browse/KAFKA-12379
https://github.com/apache/kafka/pull/10221
Tim Berglund mentioned KIP-716 in the Kafka 3.0 release video: https://www.youtube.com/watch?v=7SDwWFYnhGA&t=462s
So, to make MM2 operate on the mm2-offset-syncs topic in the target cluster, you should:
set the option src->dst.offset-syncs.topic.location = target
manually create the mm2-offset-syncs.dst.internal topic in the target cluster
start MM2
src and dst are example aliases; replace them with yours.
Keep in mind: if the mm2-offset-syncs.dst.internal topic is not created manually in the target cluster, MM2 still tries to create this topic in the source cluster.
In a one-directional replication setup this topic is useless, because it is empty all the time, but MM2 requires it anyway.
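The first step above could look roughly like this in an mm2.properties file for the standalone MM2 driver. The bootstrap addresses and the topic filter are placeholders I made up, and the two sync.topic.* lines are only there because the question mentions disabling ACL and config propagation; this is a sketch, not a verified configuration:

```properties
# Sketch only: addresses and the topic filter are assumptions.
clusters = src, dst
src.bootstrap.servers = src-kafka:9092
dst.bootstrap.servers = dst-kafka:9092

# One-directional flow from src to dst.
src->dst.enabled = true
src->dst.topics = .*

# KIP-716 (Kafka 3.0+): keep the offset-syncs topic in the target cluster.
# The default value is "source".
src->dst.offset-syncs.topic.location = target

# Turn off ACL and topic-config propagation from the source cluster.
src->dst.sync.topic.acls.enabled = false
src->dst.sync.topic.configs.enabled = false
```

Comments have to sit on their own lines, because a trailing # in a properties file becomes part of the value.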

Mirror Maker 2 Translate Consumer offsets

I am working on Kafka mirroring and I am using MM2. I am able to start the mirror process, and all topics and data are replicated from the source to the target cluster.
I need to start the consumers in the target cluster from where they left off in the source cluster. I came across RemoteClusterUtils.translateOffsets, which translates consumer offsets from remote to local.
On checking the code, I can see that it just consumes the checkpoint topic and returns the offsets for the consumer group we provide; it does not commit them.
Do we need to explicitly commit the offsets using OffsetCommitRequest and then start the consumers in the target cluster, or is there some other way to do this?

What is the best method to migrate __consumer_offsets to a new cluster?

I'm in the process of migrating a big cluster to a new datacenter. I'm using the MirrorMaker tool to mirror the topics. I have a requirement to migrate the __consumer_offsets topic to the new cluster. What is the procedure to move this topic?
My consumer properties
bootstrap.servers=<server_dns>:9092
exclude.internal.topics=false
client.id=mirror_maker_consumer_all
group.id=mirror_maker_consumer_all
producer.properties
bootstrap.servers=<bootstrap_servers>:9092
acks = 1
batch.size = 10000
timeout.ms = 3000
client.id=mirror_maker_consumer_offsets
I'm running mirror maker with num.streams=10 and whitelist='.*'
With MirrorMaker you can't mirror that topic. Offsets across clusters often differ so the data in that topic does not make sense in the new cluster.
MirrorMaker2 addresses this issue and is able to replicate offsets between clusters. MirrorMaker2 is now the recommended tool and replaces the old MirrorMaker.
See the MirrorMaker2 README for details about its features and how to run it: https://github.com/apache/kafka/tree/trunk/connect/mirror