Kafka MirrorMaker 2 Offset Replication Not Working

We are testing a DR scenario for Kafka. We have 2 Kafka clusters in separate regions and are using MirrorMaker 2 to replicate the topics and messages.
Topics and messages replicate fine, but we are observing that consumer offsets are not replicated.
For example:
1. Produce 10 messages from a producer pointed at the Kafka cluster in region 1.
2. Consume 5 messages with a consumer pointed at the Kafka cluster in region 1.
3. Stop the consumer pointed at region 1.
4. Start a consumer pointed at region 2.
5. Consume the messages.
The expectation here is that the region 2 consumer should consume from offset 6, but it starts consuming from offset 0.
Below is the properties file:
clusters = primary, secondary
# primary cluster information
primary.bootstrap.servers = test1-primary.com:9094,test2-primary.com.apttuscloud.io:9094,test3-primary.com:9094
primary.security.protocol= SASL_SSL
primary.ssl.truststore.password= dummypassword
primary.ssl.truststore.location= /opt/bitnami/kafka/config/certs/kafka.truststore.jks
primary.ssl.keystore.password= dummypassword
primary.ssl.keystore.location= /opt/bitnami/kafka/config/certs/kafka.keystore.jks
primary.ssl.endpoint.identification.algorithm=
primary.sasl.mechanism= PLAIN
primary.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dummyuser" password="dummypassword";
# secondary cluster information
secondary.bootstrap.servers = test1-secondary.com:9094,test2-secondary.com.apttuscloud.io:9094,test3-secondary.com:9094
secondary.security.protocol= SASL_SSL
secondary.ssl.truststore.password= dummypassword
secondary.ssl.truststore.location= /opt/bitnami/kafka/config/certs/kafka.truststore.jks
secondary.ssl.keystore.password= dummypassword
secondary.ssl.keystore.location= /opt/bitnami/kafka/config/certs/kafka.keystore.jks
secondary.ssl.endpoint.identification.algorithm=
secondary.sasl.mechanism=PLAIN
secondary.sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="dummyuser" password="dummypassword";
# Topic Configuration
primary->secondary.enabled = true
primary->secondary.topics = .*
secondary->primary.enabled = true
secondary->primary.topics = .*
############################# Internal Topic Settings #############################
# The replication factor for mm2 internal topics "heartbeats", "B.checkpoints.internal" and
# "mm2-offset-syncs.B.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3
checkpoints.topic.replication.factor= 3
heartbeats.topic.replication.factor= 3
offset-syncs.topic.replication.factor= 3
# The replication factor for connect internal topics "mm2-configs.B.internal", "mm2-offsets.B.internal" and
# "mm2-status.B.internal"
# For anything other than development testing, a value greater than 1 is recommended to ensure availability such as 3.
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
replication.factor = 3
refresh.topics.enabled = true
sync.topic.configs.enabled = true
refresh.topics.interval.seconds = 10
topics.blacklist = .*[\-\.]internal, .*\.replica, __consumer_offsets
groups.blacklist = console-consumer-.*, connect-.*, __.*
primary->secondary.emit.heartbeats.enabled = true
primary->secondary.emit.checkpoints.enabled = true
Please note that some confidential values have been replaced with dummy values.

With MirrorMaker 2.5, offsets are not automatically translated when moving consumers between clusters.
So upon starting consumers on the other cluster, they need to use RemoteClusterUtils.translateOffsets() to find their committed offsets in that cluster.
In Kafka 2.7 (expected November 2020), MirrorMaker 2 can translate offsets automatically; see https://cwiki.apache.org/confluence/display/KAFKA/KIP-545%3A+support+automated+consumer+offset+sync+across+clusters+in+MM+2.0
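For illustration, here is a minimal sketch of that pre-2.7 approach (the group name "my-group" and the single bootstrap address are placeholders, and the SASL_SSL settings from the properties file above are omitted for brevity). It reads the MM2 checkpoints on the secondary cluster, translates the group's primary-cluster offsets onto the replicated topics, and commits them there:

import java.time.Duration;
import java.util.HashMap;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.consumer.OffsetAndMetadata;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.serialization.ByteArrayDeserializer;
import org.apache.kafka.connect.mirror.RemoteClusterUtils;

public class OffsetTranslationSketch {
    public static void main(String[] args) throws Exception {
        // Connection properties for the *target* (secondary) cluster, where the
        // primary.checkpoints.internal topic lives.
        Map<String, Object> targetProps = new HashMap<>();
        targetProps.put("bootstrap.servers", "test1-secondary.com:9094");
        // ... plus the same SASL_SSL settings as in the MM2 properties file

        // Translate the offsets that "my-group" committed on the primary cluster
        // into offsets on the secondary cluster (topics named "primary.<topic>").
        Map<TopicPartition, OffsetAndMetadata> translated =
                RemoteClusterUtils.translateOffsets(
                        targetProps, "primary", "my-group", Duration.ofSeconds(30));

        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "test1-secondary.com:9094");
        consumerProps.put("group.id", "my-group");
        consumerProps.put("key.deserializer", ByteArrayDeserializer.class.getName());
        consumerProps.put("value.deserializer", ByteArrayDeserializer.class.getName());

        try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(consumerProps)) {
            // Commit the translated offsets so the group resumes from them on the secondary cluster.
            consumer.assign(translated.keySet());
            consumer.commitSync(translated);
        }
    }
}

After this one-off translation step, a consumer on the secondary cluster with the same group.id continues from where the group left off on the primary cluster instead of starting at offset 0.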

Related

If some brokers go down while producing messages, does that cause any exception on the Kafka producer side?

I am testing the following scenario.
I am producing messages to a sink, which is a Kafka cluster with three brokers.
If brokers go down, does the producing side have any issue because of the broker outage?
When I tested this locally with Flink, I generated messages and sank them to Kafka with three brokers. When I reduced the number of brokers to 2, there were no problems. And obviously, when all the brokers go down, the producer-side app throws an exception.
So, based on these facts, I think the producer-side app can stay alive without any errors as long as at least one broker remains. Is my assumption correct?
Below is my producer-side configuration:
acks = 1
batch.size = 16384
compression.type = lz4
connections.max.idle.ms = 540000
delivery.timeout.ms = 120000
enable.idempotence = false
key.serializer = class org.apache.kafka.common.serialization.ByteArraySerializer
linger.ms = 0
partitioner.class = class org.apache.kafka.clients.producer.internals.DefaultPartitioner
receive.buffer.bytes = 32768
reconnect.backoff.max.ms = 1000
reconnect.backoff.ms = 50
request.timeout.ms = 30000
The replication factor is two and I have three partitions for each topic.
Any help will be appreciated.
Thanks.
It all depends on your requirements and your producer configuration. With your current configuration, yes, you can have 2 out of 3 brokers alive and your producer will continue as normal.
This is because you have acks=1, which means only the partition leader has to acknowledge the message before it is considered successful; the followers don't have to acknowledge it.
You should also check whether you have changed min.insync.replicas at the broker or topic level. The default is 1, meaning only 1 in-sync replica is needed for a broker to allow acks=all requests.
Side note: you have replication=2; I'd change this so partitions are replicated across all 3 brokers.
I'm not sure I fully understood your question, but in the Kafka client API some exceptions are retriable (like "not leader for partition", or an unreachable/unknown host).
So your producer will retry until it hits the first of these two limits:
retries: https://kafka.apache.org/documentation/#producerconfigs_retries
delivery.timeout.ms: https://kafka.apache.org/documentation/#producerconfigs_delivery.timeout.ms
With the default values:
retries = over 2 billion (Integer.MAX_VALUE)
delivery.timeout.ms = 2 minutes
your producer will keep retrying for up to 2 minutes and then fail the send.
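To make that concrete, here is a minimal sketch (broker addresses and the topic name are placeholders, not taken from the question): retriable errors are retried internally by the client, and once delivery.timeout.ms expires the send's callback completes with an exception, typically a TimeoutException:

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class ProducerRetrySketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092"); // placeholders
        props.put(ProducerConfig.ACKS_CONFIG, "1");                    // leader-only acks, as in the question
        props.put(ProducerConfig.DELIVERY_TIMEOUT_MS_CONFIG, 120000);  // default: 2 minutes
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class.getName());

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            ProducerRecord<byte[], byte[]> record =
                    new ProducerRecord<>("test-topic", "hello".getBytes());
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    // Reached only after delivery.timeout.ms of failed (re)tries,
                    // e.g. when no partition leader is reachable.
                    exception.printStackTrace();
                } else {
                    System.out.printf("Wrote to %s-%d@%d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
            producer.flush();
        }
    }
}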

How to set consumer config values for Kafka MirrorMaker 2 (2.6.1)?

I am attempting to use MirrorMaker 2 to replicate data between AWS Managed Kafka (MSK) clusters in 2 different AWS regions - one in eu-west-1 (CLOUD_EU) and the other in us-west-2 (CLOUD_NA), both running Kafka 2.6.1. For testing I am currently trying to replicate topics one way only, from EU -> NA.
I am starting a MirrorMaker connect cluster using ./bin/connect-mirror-maker.sh and a properties file (included below).
This works fine for topics with small messages on them, but one of my topics has binary messages up to 20 MB in size. When I try to replicate that topic I get an error every 30 seconds:
[2022-04-21 13:47:05,268] INFO [Consumer clientId=consumer-29, groupId=null] Error sending fetch request (sessionId=INVALID, epoch=INITIAL) to node 2: {}. (org.apache.kafka.clients.FetchSessionHandler:481)
org.apache.kafka.common.errors.DisconnectException
When logging in DEBUG to get more information we get
[2022-04-21 13:47:05,267] DEBUG [Consumer clientId=consumer-29, groupId=null] Disconnecting from node 2 due to request timeout. (org.apache.kafka.clients.NetworkClient:784)
[2022-04-21 13:47:05,268] DEBUG [Consumer clientId=consumer-29, groupId=null] Cancelled request with header RequestHeader(apiKey=FETCH, apiVersion=11, clientId=consumer-29, correlationId=35) due to node 2 being disconnected (org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient:593)
It gets stuck in a loop, constantly disconnecting with a request timeout every 30s and then trying again.
Looking at this, I suspect the problem is that request.timeout.ms is at its default (30s) and it times out trying to read the topic with many large messages.
I followed the guide at https://github.com/apache/kafka/tree/trunk/connect/mirror to try to configure the consumer properties. However, no matter what I set, the timeout for the consumer remains fixed at the default, confirmed both by Kafka printing its config in the log and by timing the interval between the disconnect messages. E.g. I set:
CLOUD_EU.consumer.request.timeout.ms=120000
in the properties file that I start MM2 with.
Based on various guides I have found while looking at this, I have also tried:
CLOUD_EU.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.override.request.timeout.ms=120000
CLOUD_EU.cluster.consumer.override.request.timeout.ms=120000
None of these have worked.
How can I change the consumer request.timeout.ms setting? The log is approximately 10,000 lines long, but everywhere the ConsumerConfig is logged it shows request.timeout.ms = 30000.
Properties file I am using:
# specify any number of cluster aliases
clusters = CLOUD_EU, CLOUD_NA
# connection information for each cluster
CLOUD_EU.bootstrap.servers = kafka.eu-west-1.amazonaws.com:9092
CLOUD_NA.bootstrap.servers = kafka.us-west-2.amazonaws.com:9092
# enable and configure individual replication flows
CLOUD_EU->CLOUD_NA.enabled = true
CLOUD_EU->CLOUD_NA.topics = METRICS_ATTACHMENTS_OVERSIZE_EU
CLOUD_NA->CLOUD_EU.enabled = false
replication.factor=3
tasks.max = 1
############################# Internal Topic Settings #############################
checkpoints.topic.replication.factor=3
heartbeats.topic.replication.factor=3
offset-syncs.topic.replication.factor=3
offset.storage.replication.factor=3
status.storage.replication.factor=3
config.storage.replication.factor=3
############################ Kafka Settings ###################################
# CLOUD_EU cluster over writes
CLOUD_EU.consumer.request.timeout.ms=120000
CLOUD_EU.consumer.session.timeout.ms=150000

Why does the Kafka MirrorMaker target topic contain only half of the original messages?

I want to copy all messages from a topic in a Kafka cluster, so I ran Kafka MirrorMaker. However, it seems to have copied only roughly half of the messages from the source cluster (I checked that there is no consumer lag on the source topic). I have 2 brokers in the source cluster; does this have anything to do with it?
This is the source cluster config:
log.retention.ms=1814400000
transaction.state.log.replication.factor=2
offsets.topic.replication.factor=2
auto.create.topics.enable=true
default.replication.factor=2
min.insync.replicas=1
num.io.threads=8
num.network.threads=5
num.partitions=1
num.replica.fetchers=2
replica.lag.time.max.ms=30000
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
socket.send.buffer.bytes=102400
unclean.leader.election.enable=true
zookeeper.session.timeout.ms=18000
The source topic has 4 partitions and is not compacted. The Mirrormaker config is:
mirrormaker-consumer.properties
bootstrap.servers=broker1:9092,broker2:9092
group.id=picturesGroup3
auto.offset.reset=earliest
mirrormaker-producer.properties
bootstrap.servers=localhost:9092
max.in.flight.requests.per.connection=1
retries=2000000000
acks=all
max.block.ms=2000000000
Below are the stats from Kafdrop on the source cluster topic:
Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated
0 | 13659 | 17768 | 4109 | 1 | 1 | 1 | - | Yes | No
1 | 13518 | 17713 | 4195 | 2 | 2 | 2 | - | Yes | No
2 | 13664 | 17913 | 4249 | 1 | 1 | 1 | - | Yes | No
3 | 13911 | 18072 | 4161 | 2 | 2 | 2 | - | Yes | No
and these are the stats for the target topic after Mirrormaker run:
Partition | First Offset | Last Offset | Size | Leader Node | Replica Nodes | In-sync Replica Nodes | Offline Replica Nodes | Preferred Leader | Under-replicated
0 | 2132 | 4121 | 1989 | 1 | 1 | 1 | - | Yes | No
1 | 2307 | 4217 | 1910 | 1 | 1 | 1 | - | Yes | No
2 | 2379 | 4294 | 1915 | 1 | 1 | 1 | - | Yes | No
3 | 2218 | 4083 | 1865 | 1 | 1 | 1 | - | Yes | No
As you can see from the Size column, only roughly half of the source messages are in the target topic. What am I doing wrong?
I realized that the issue happened because I was copying data from a cluster with 2 brokers to a cluster with 1 broker, so I assume MirrorMaker 1 only copied data from one broker of the original cluster. When I configured the target cluster to have 2 brokers, all of the messages were copied.
Regarding the advice of @OneCricketeer to use MirrorMaker 2: this also worked, however it took me a while to get to the correct configuration file:
clusters = source, dest
source.bootstrap.servers = sourcebroker1:9092,sourcebroker2:9092
dest.bootstrap.servers = destbroker1:9091,destbroker2:9092
topics = .*
groups = mm2topic
source->dest.enabled = true
offsets.topic.replication.factor=1
offset.storage.replication.factor=1
auto.offset.reset=latest
In addition, MirrorMaker 2 can be found in the connect container of this KafkaConnect project (enter the container and the connect-mirror-maker.sh executable will be in the /kafka/bin directory).
A major downside of the MirrorMaker 2 solution is that it adds a source-cluster prefix to the topic names in the target cluster (in my case the new names would require changing application code). The prefix can't be removed through plain MirrorMaker 2 configuration, so the only way is to implement a custom Java class as explained here (a rough sketch is included below).
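For reference, MirrorMaker 2 chooses its naming scheme through the replication.policy.class setting, so one option is a tiny policy class that keeps topic names unchanged. The sketch below is illustrative only (package and class name are made up); newer Kafka releases ship an org.apache.kafka.connect.mirror.IdentityReplicationPolicy that does this properly, including the methods that parse topic names back out, so prefer it where available. Note that without the prefix MM2 can no longer tell replicated topics apart from local ones, so only use this for one-way flows.

// Hypothetical example; enable it with
//   replication.policy.class = com.example.mirror.NoPrefixReplicationPolicy
package com.example.mirror;

import org.apache.kafka.connect.mirror.DefaultReplicationPolicy;

public class NoPrefixReplicationPolicy extends DefaultReplicationPolicy {
    // DefaultReplicationPolicy returns "<sourceClusterAlias>.<topic>";
    // returning the topic unchanged keeps the original name on the target cluster.
    @Override
    public String formatRemoteTopic(String sourceClusterAlias, String topic) {
        return topic;
    }
}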

__consumer_offsets is unable to sync

I am using MM2 with the properties below.
The source (A) and sink (B) clusters each have their own separate ZooKeeper.
I consume some data from topic test in source A, then I stop the consumer and start the mirroring process.
When I point a consumer with the same group id at the sink, it starts consuming from the beginning. I am expecting it to start in the sink from where it left off in the source.
###############
A.bootstrap.servers = localhost:9092
B.bootstrap.servers = localhost:9093
A->B.enabled = true
A->B.topics = test
#B->A.enabled = true
#B->A.topics = .*
checkpoints.topic.replication.factor=1
heartbeats.topic.replication.factor=1
offset-syncs.topic.replication.factor=1
offset.storage.replication.factor=1
status.storage.replication.factor=1
config.storage.replication.factor=1
Since Kafka 2.7, MirrorMaker can automatically mirror consumer group offsets by setting sync.group.offsets.enabled=true.
In your example:
A->B.sync.group.offsets.enabled=true
Before 2.7, MirrorMaker does not automatically commit consumer group offsets, and you need to use RemoteClusterUtils to do the offset translation yourself.
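As an illustration of what that gives you (the group id is a made-up placeholder; the bootstrap address and topic come from the question): once A->B.sync.group.offsets.enabled=true is set and the group is idle on the source, MM2 periodically commits the translated offsets for the group on cluster B, so a plain consumer on B with the same group.id resumes from them. Under the default replication policy the mirrored topic on B is named A.test:

import java.time.Duration;
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class ResumeOnSinkSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9093"); // cluster B, from the question
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");                // same group id used on cluster A (assumed name)
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // auto.offset.reset only applies if no committed/synced offset exists yet.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            // "A.test" is the replicated topic name under the DefaultReplicationPolicy.
            consumer.subscribe(Collections.singletonList("A.test"));
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
            records.forEach(r -> System.out.printf("%s-%d@%d: %s%n",
                    r.topic(), r.partition(), r.offset(), r.value()));
        }
    }
}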

Problems with Amazon MSK default configuration and publishing with transactions

Recently we started testing our Kafka connectors against MSK, Amazon's managed Kafka service. Publishing records seems to work fine, however not when transactions are enabled.
Our cluster consists of 2 brokers (because we have 2 zones) using the default MSK configuration. We are creating our Java Kafka producer using the following properties:
bootstrap.servers=x.us-east-1.amazonaws.com:9094,y.us-east-1.amazonaws.com:9094
client.id=kafkautil
max.block.ms=5000
request.timeout.ms=5000
security.protocol=SSL
transactional.id=transactions
However when the producer was started with the transactional.id setting which enables transactions, the initTransactions() method hangs:
producer = new KafkaProducer<Object, Object>(kafkaProperties);
if (kafkaProperties.containsKey(ProducerConfig.TRANSACTIONAL_ID_CONFIG)) {
    // this hangs
    producer.initTransactions();
}
Looking at the log output we see streams of the following, and it didn't seem like it ever timed out.
TransactionManager - Enqueuing transactional request (type=FindCoordinatorRequest,
coordinatorKey=y, coordinatorType=TRANSACTION)
TransactionManager - Request (type=FindCoordinatorRequest, coordinatorKey=y,
coordinatorType=TRANSACTION) dequeued for sending
NetworkClient - Found least loaded node z:9094 (id: -2 rack: null) connected with no
in-flight requests
Sender - Sending transactional request (type=FindCoordinatorRequest, coordinatorKey=y,
coordinatorType=TRANSACTION) to node z (id: -2 rack: null)
NetworkClient - Sending FIND_COORDINATOR {coordinator_key=y,coordinator_type=1} with
correlation id 424 to node -2
NetworkClient - Completed receive from node -2 for FIND_COORDINATOR with
correlation id 424, received {throttle_time_ms=0,error_code=15,error_message=null,
coordinator={node_id=-1,host=,port=-1}}
TransactionManager LogContext.java:129 - Received transactional response
FindCoordinatorResponse(throttleTimeMs=0, errorMessage='null',
error=COORDINATOR_NOT_AVAILABLE, node=:-1 (id: -1 rack: null)) for request
(type=FindCoordinatorRequest, coordinatorKey=xxx, coordinatorType=TRANSACTION)
As far as I can determine, the brokers are up and each of the hosts in the bootstrap.servers property is reachable. If I connect to each of them and publish without transactions, it works.
Any idea what we are missing?
However when the producer was started with the transactional.id setting which enables transactions, the initTransactions() method hangs:
This turned out to be a problem with the default AWS MSK properties and the number of brokers. If you create a Kafka cluster with fewer than 3 brokers, the following settings will need to be adjusted.
The following settings should be set (I think) to the number of brokers:
Property | Kafka Default | AWS Default | Description
default.replication.factor | 1 | 3 | Default replication factor for automatically created topics.
min.insync.replicas | 1 | 2 | Minimum number of replicas that must acknowledge a write for the write to be considered successful.
offsets.topic.replication.factor | 3 | 3 | Replication factor for the internal topic that stores consumer offsets.
transaction.state.log.replication.factor | 3 | 3 | Replication factor for the transaction topic.
Here's the Kafka docs on broker properties.
Because we have 2 brokers, we ended up with:
default.replication.factor=2
min.insync.replicas=2
offsets.topic.replication.factor=2
transaction.state.log.replication.factor=2
This seemed to resolve the issue. IMHO this is a real problem with AWS MSK and its default configuration. They should auto-generate the default configuration and tune it based on the number of brokers in the cluster.
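If it is useful, here is a small sketch (broker endpoint and broker id are placeholders) of checking what an MSK broker actually reports for these settings with the Java AdminClient before enabling transactions:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;

public class CheckBrokerConfigSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "x.us-east-1.amazonaws.com:9094"); // placeholder
        props.put("security.protocol", "SSL");

        try (AdminClient admin = AdminClient.create(props)) {
            // Broker id "1" is a placeholder; list broker ids with admin.describeCluster() if unsure.
            ConfigResource broker = new ConfigResource(ConfigResource.Type.BROKER, "1");
            Config config = admin.describeConfigs(Collections.singleton(broker))
                                 .all().get().get(broker);
            for (String name : new String[] {
                    "default.replication.factor",
                    "min.insync.replicas",
                    "offsets.topic.replication.factor",
                    "transaction.state.log.replication.factor"}) {
                System.out.println(name + " = " + config.get(name).value());
            }
        }
    }
}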