How to handle source topic deletion in Mirror Maker 2? - apache-kafka

When I try to delete a topic in the source cluster that MM2 is trying to replicate, it starts throwing the below error continuously. While this is expected, the error doesn't stop and continues forever causing huge log files on my system. Is there a way to have MM2 handle source topic deletion gracefully?
[2022-05-12 14:42:57,473] WARN [Consumer clientId=consumer-4, groupId=null] Received unknown topic or partition error in fetch for partition sourcetopic-0 (org.apache.kafka.clients.consumer.internals.Fetcher:1250)
PS: I am running MM2 in dedicated cluster mode

You would need to write a small service that, once a topic is deleted, updates the MirrorMaker configuration so it no longer includes that topic, and then restarts the consumer (the restart happens automatically when you submit the updated configuration to the Connect REST API).
MM2 has no way of knowing whether a topic should still be consumed, because its configuration is static.
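A minimal sketch of that approach in Java, assuming the mirror connector is managed through a Connect REST endpoint that accepts PUT /connectors/{name}/config. The base URL, the connector name, and the MirrorTopicUpdater helper are hypothetical, and the sketch assumes "topics" holds a plain comma-separated list rather than regex patterns. A dedicated-mode MM2 deployment may not expose this endpoint at all, in which case the same filtered list would have to go into mm2.properties followed by a restart.

```java
import java.net.URI;
import java.net.http.HttpRequest;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

// Hypothetical helper: rebuilds a connector config without deleted topics.
final class MirrorTopicUpdater {

    // Returns a copy of the config whose "topics" value no longer lists the deleted topics.
    // Assumption: "topics" is a comma-separated list of literal names, not regex patterns.
    static Map<String, String> withoutDeletedTopics(Map<String, String> config, Set<String> deleted) {
        Map<String, String> updated = new LinkedHashMap<>(config);
        String remaining = List.of(config.getOrDefault("topics", "").split(","))
                .stream()
                .map(String::trim)
                .filter(t -> !t.isEmpty() && !deleted.contains(t))
                .collect(Collectors.joining(","));
        updated.put("topics", remaining);
        return updated;
    }

    // Serializes a flat string map as a JSON object (escapes backslashes and quotes only).
    static String toJson(Map<String, String> config) {
        return config.entrySet().stream()
                .map(e -> "\"" + escape(e.getKey()) + "\":\"" + escape(e.getValue()) + "\"")
                .collect(Collectors.joining(",", "{", "}"));
    }

    private static String escape(String s) {
        return s.replace("\\", "\\\\").replace("\"", "\\\"");
    }

    // Builds (but does not send) the request that replaces the connector configuration.
    static HttpRequest updateRequest(String connectBaseUrl, String connectorName, Map<String, String> config) {
        return HttpRequest.newBuilder(URI.create(connectBaseUrl + "/connectors/" + connectorName + "/config"))
                .header("Content-Type", "application/json")
                .PUT(HttpRequest.BodyPublishers.ofString(toJson(config)))
                .build();
    }
}
```

Sending the request with java.net.http.HttpClient should cause Connect to restart the connector's tasks with the reduced topic list, which is the automatic restart mentioned above.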

Related

Clarification on topic metadata sync in MM2.0 logs

In MM2.0 logs, I often see the following:
Resetting the last seen epoch of topic-partition to 0 since the associated topicId changed from null to XXXXX (org.apache.kafka.clients.Metadata:402)
I can see that this is part of the metadata sync process between mirror maker and the Kafka broker but wanted to clarify a couple of points:
Does this happen for all topic partitions present in the Kafka broker, or only for the topics listed in mm2.properties? I'm curious because I see this log even for topics that are not in the mm2.properties file but do exist on the broker.
If it happens for all topic partitions present in the broker, will this become a concern for MirrorMaker replication performance if the number of topic partitions on the broker were to increase drastically?

Kafka Streams consumer skipping a few offsets, no log compaction enabled

kafka server version: 3.2.0
kstreams version : 2.7.2
I have a producer writing to topic foo, and I can see the offsets it produces in the logs.
We have a Kafka Streams application reading from the same topic foo. What I am observing is that the consumer skips offsets, sometimes 30 to 40 of them at a time. I am printing the offset in the process method using ProcessorContext.offset().
Skipped offsets seem to be very common. Will using ProcessorContext.offset() result in every offset being printed?
Some points:
No Kafka rebalance has occurred.
No restarts of the container.
We have 3 state stores defined in the Streams application, and the changelog topics have a replication factor of 1.
We had a Kafka broker outage about 3 weeks ago, where a few brokers were down for an extended period of time. I don't know how that would affect the messages I should be consuming today.
We have NOT set processing.guarantee, so the default should be AT_LEAST_ONCE. We do not have transactions enabled, so the skipped offsets cannot be transactional messages.
The log statement that prints the offset is the first line in the process method.
Question:
What internal Kafka Streams logs can I check to see whether messages are consumed?
Is there any reason why the messages could be skipped?

Duplicate messages when using kafka mirrormaker at the time of problems on the source cluster

We have a remote Kafka cluster that belongs to an external service, from which we pull data into our internal Kafka cluster using MirrorMaker.
The following situation occurred: on the external service's side, one of the cluster brokers went down for technical reasons.
The following appeared in the mirrormaker logs:
...
ERROR [Consumer clientId=XXX-1, groupId=YYY] Offset commit failed on partition PARTITION_NAME at offset 123456: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
WARN Failed to commit offsets because the consumer group has rebalanced and assigned partitions to another instance. If you see this regularly, it could indicate that you need to either increase the consumer's session.timeout.ms or reduce the number of records handled on each iteration with max.poll.records (kafka.tools.MirrorMaker$)
...
Next, the consumers reconnected to the live nodes in the cluster and continued to read messages.
The problem is that, because the broker went down on the external Kafka side, messages could be read but their offsets could not be committed. For this reason, after the rebalance the messages were read again and duplicates appeared in our internal cluster.
Are there any ways that would help in this situation to avoid duplicates in the internal cluster? (except for those indicated in the log warning.)
Maybe there are some consumer configuration parameters that would help to solve problems with duplicates.

Kafka consumer restart when it is reading from beginning of topic

I am new to Kafka.
Let's say I have one Kafka topic, topic1 (replication factor 1, 1 partition), and one consumer (a Java process) reading topic1 from the beginning (earliest). The consumer runs fine for some time, but later it hangs for some reason and is killed by an admin.
If I restart the consumer, it will read from the beginning again, leading to data duplication. How should I handle this use case?
NOTE: I am aware that if the consumer is written to read from latest, we will not get duplicated data. Is there any other solution?
Consumers will only reset to the beginning when auto.offset.reset=earliest and there is no stored position to resume from, that is, when
you have auto commits disabled and don't manually commit offsets,
and you also don't manually seek the consumer upon startup (offsets can be tracked externally from Kafka and restored with a seek).
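The rule above can be modelled in a few lines of plain Java. This is a simplified stand-in for what happens on startup, not the client's actual code: StartPosition, resolve, and the offsets used are hypothetical, and cases such as auto.offset.reset=none or an out-of-range committed offset are left out.

```java
import java.util.OptionalLong;

// Simplified model of where a consumer starts reading a partition.
final class StartPosition {

    // committed:   last offset committed for this group and partition, if any.
    // resetPolicy: value of auto.offset.reset ("earliest" or "latest").
    static long resolve(OptionalLong committed, String resetPolicy, long logStart, long logEnd) {
        if (committed.isPresent()) {
            // A committed offset always wins; auto.offset.reset is not consulted.
            return committed.getAsLong();
        }
        switch (resetPolicy) {
            case "earliest": return logStart;
            case "latest":   return logEnd;
            default: throw new IllegalArgumentException("policy not modelled: " + resetPolicy);
        }
    }

    public static void main(String[] args) {
        // First start, nothing committed: read from the beginning.
        System.out.println(resolve(OptionalLong.empty(), "earliest", 0L, 500L));
        // Restart after the group committed offset 320: resume there, no full re-read.
        System.out.println(resolve(OptionalLong.of(320L), "earliest", 0L, 500L));
    }
}
```

In this model a killed consumer re-reads at most the records processed since its last commit, so keeping commits enabled (automatic or manual) is what prevents the full replay described in the question.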

Duplicate message consumption in Kafka due to auto-downscaling/deletion of pods

Background
We have a simple producer/consumer style application with Kafka as the message broker and Consumer Processes running as Kubernetes pods. We have defined two topics namely the in-topic and the out-topic. A set of consumer pods that belong to the same consumer group read messages from the in-topic, perform some work and finally write out the same message (key) to the out-topic once the work is complete.
Issue Description
We noticed that there are duplicate messages being written out to the out-topic by the consumers that are running in the Kubernetes pods. To rephrase, two different consumers are consuming the same messages from the in-topic twice and thus publishing the same message twice to the out-topic as well. We analyzed the issue and can safely conclude that this issue only occurs when pods are auto-downscaled/deleted by Kubernetes.
In fact, an interesting observation we have is that if any message is read by two different consumers from the in-topic (and thus published twice in the out-topic), the given message is always the last message consumed by one of the pods that was downscaled. In other words, if a message is consumed twice, the root cause is always the downscaling of a pod.
We can conclude that a pod is getting downscaled after a consumer writes the message to the out-topic but before Kafka can commit the offset to the in-topic.
Consumer configuration
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "3600000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
Zookeeper/broker logs :
[2021-04-07 02:42:22,708] INFO [GroupCoordinator 0]: Preparing to rebalance group PortfolioEnrichmentGroup14 in state PreparingRebalance with old generation 1 (__consumer_offsets-17) (reason: removing member PortfolioEnrichmentConsumer13-9aa71765-2518-493f-a312-6c1633225015 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2021-04-07 02:42:23,331] INFO [GroupCoordinator 0]: Stabilized group PortfolioEnrichmentGroup14 generation 2 (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
[2021-04-07 02:42:23,335] INFO [GroupCoordinator 0]: Assignment received from leader for group PortfolioEnrichmentGroup14 for generation 2 (kafka.coordinator.group.GroupCoordinator)
What we tried
Looking at the logs, it was clear that the rebalance takes place because of the heartbeat expiration. We added the following configuration parameters to increase the heartbeat interval and the session timeout:
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "900000");
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "512");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");
However, this did not solve the issue. Looking at the broker logs, we can confirm that the issue is due to the downscaling of pods.
Question : What could be causing this behavior where a message is consumed twice when a pod gets downscaled?
Note : I already understand the root cause of the issue; however, considering that a consumer is a long lived process running in an infinite loop, how and why is Kubernetes downscaling/killing a pod before the consumer commits the offset? How do I tell Kubernetes not to remove a running pod from a consumer group until all Kafka commits are completed?
"What could be causing this behavior where a message is consumed twice when a pod gets downscaled?"
You have provided the answer already yourself: "[...] that a pod is getting downscaled after a consumer writes the message to the out-topic but before Kafka can commit the offset to the in-topic."
As the message was processed but its offset was not committed, another pod re-processes the same message after the downscaling happens. Remember that adding or removing a consumer from a consumer group always triggers a rebalance. You now have first-hand experience of why this should be avoided as much as feasible. Depending on the Kafka version, a rebalance will cause every single consumer in the consumer group to stop consuming until the rebalance is done.
To solve your issue, I see two options:
Only remove running pods out of the Consumer Group when they are idle
Reduce the consumer configuration auto.commit.interval.ms to 1 as this defaults to 5 seconds. This will only work if you set enable.auto.commit to true.
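The second option amounts to a consumer configuration along the lines below, written with plain string keys so the fragment does not depend on the client library. The group name is the one that appears in the broker logs in the question; the bootstrap address and the FastAutoCommitConfig class name are placeholders.

```java
import java.util.Properties;

// Hypothetical holder for the consumer settings discussed above.
final class FastAutoCommitConfig {
    static Properties build() {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");      // placeholder address
        props.put("group.id", "PortfolioEnrichmentGroup14");   // group seen in the broker logs
        props.put("enable.auto.commit", "true");               // required for the interval to apply
        props.put("auto.commit.interval.ms", "1");             // default is 5000 (5 seconds)
        return props;
    }
}
```

Auto commit is driven by calls to poll(), so a shorter interval narrows the window between processing a record and committing its offset but does not close it.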
If you want your consumer to commit its offsets before exiting, you need to handle the exit signal in your consumer. Many languages support this. For Java, have a look at the thread "How to finish kafka consumer safety? (Is there meaning to call thread#join inside shutdownHook?)".
That being said, please note that there is no 100% guarantee of achieving exactly-once this way. Your process can be killed forcefully by the OS before it is given any time to run exit cleanup (kill -9 <process_id>).
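The signal handling can be sketched in plain Java as follows. OffsetSource is a hypothetical stand-in for the handful of consumer operations the loop needs (with the real client these would roughly correspond to poll, commitSync, and close), and GracefulWorker is an illustrative name, not part of any library. The idea is that the shutdown hook only asks the loop to stop and then waits, so the loop always finishes the record it is working on and commits it before the JVM exits.

```java
import java.util.List;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.atomic.AtomicBoolean;

// Hypothetical abstraction over the consumer operations used by the loop.
interface OffsetSource {
    List<Long> poll();             // offsets of the next batch, possibly empty
    void commit(long nextOffset);  // offset of the next record to read
    void close();
}

final class GracefulWorker implements Runnable {
    private final OffsetSource source;
    private final AtomicBoolean running = new AtomicBoolean(true);
    private final CountDownLatch stopped = new CountDownLatch(1);

    GracefulWorker(OffsetSource source) {
        this.source = source;
    }

    @Override
    public void run() {
        try {
            while (running.get()) {
                for (long offset : source.poll()) {
                    process(offset);
                    // Commit right after processing so a stop request never
                    // leaves a processed-but-uncommitted record behind.
                    source.commit(offset + 1);
                }
            }
        } finally {
            source.close();
            stopped.countDown();
        }
    }

    void process(long offset) {
        // Placeholder: the real work (e.g. writing to the out-topic) goes here.
    }

    // Asks the loop to stop and blocks until it has committed and closed.
    void shutdown() {
        running.set(false);
        try {
            stopped.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
    }

    // Thread suitable for Runtime.getRuntime().addShutdownHook(...).
    static Thread hookFor(GracefulWorker worker) {
        return new Thread(worker::shutdown);
    }
}
```

Registering the hook with Runtime.getRuntime().addShutdownHook(GracefulWorker.hookFor(worker)) lets it run when the JVM receives SIGTERM, which is the signal Kubernetes sends before the pod's termination grace period expires. With a real KafkaConsumer, the hook would typically also call wakeup() to break out of a blocking poll. None of this helps against kill -9, as noted above.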