Duplicate messages when using Kafka MirrorMaker during problems on the source cluster - apache-kafka

We have a remote Kafka cluster that belongs to an external service, from which we pull data into our internal Kafka cluster using MirrorMaker.
The following situation occurred: on the external service's side, one of the cluster's brokers went down for technical reasons.
The following appeared in the MirrorMaker logs:
...
ERROR [Consumer clientId=XXX-1, groupId=YYY] Offset commit failed on partition PARTITION_NAME at offset 123456: The coordinator is not aware of this member. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
WARN Failed to commit offsets because the consumer group has rebalanced and assigned partitions to another instance. If you see this regularly, it could indicate that you need to either increase the consumer's session.timeout.ms or reduce the number of records handled on each iteration with max.poll.records (kafka.tools.MirrorMaker$)
...
The consumers then reconnected to the remaining live brokers in the cluster and continued reading messages.
The problem is that, because of the broker failure on the external Kafka side, messages could be read but their offsets could not be committed. As a result, after the rebalance the same messages were read again and duplicates appeared in our internal cluster.
Is there any way to avoid duplicates in the internal cluster in this situation (apart from the settings mentioned in the log warning)?
Perhaps there are some consumer configuration parameters that would help with the duplicate problem.
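(Not part of the original question.) Since MirrorMaker gives at-least-once delivery in this setup, one common mitigation is to deduplicate on the consumer side of the internal cluster. Below is a minimal sketch assuming messages carry a unique key; the broker address, topic name, and in-memory key store are all hypothetical:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import java.time.Duration;
import java.util.Collections;
import java.util.HashSet;
import java.util.Properties;
import java.util.Set;

public class DedupConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "internal-kafka:9092"); // hypothetical address
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "dedup-example");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        // In-memory set of already-processed keys; a real setup would use a persistent store.
        Set<String> seenKeys = new HashSet<>();

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("mirrored-topic")); // hypothetical topic
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                for (ConsumerRecord<String, String> record : records) {
                    if (seenKeys.add(record.key())) {       // add() returns false if the key was already seen
                        System.out.printf("processing %s -> %s%n", record.key(), record.value());
                    }
                }
                consumer.commitSync(); // commit only after the batch has been handled
            }
        }
    }
}

This keeps duplicates out of downstream processing even though the mirrored topic itself still contains them.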

Related

Weird Kafka Consumer Group Re-balancing

My Kafka topic has two partitions and a single Kafka consumer. I have deployed my consumer application (Spring Kafka) on AWS. In the logs I see the Kafka consumer re-balance from time to time. It is not frequent. From what I have observed so far, the re-balancing occurs while the consumer is listening to the topic and idle. I would appreciate it if someone could explain this behavior. I have posted some logs here.
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Request joining group due to: group is already rebalancing
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Revoke previously assigned partitions order-response-v3-qa-0, order-response-v3-qa-1
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Revoke previously assigned partitions order-response-v3-qa-0, order-response-v3-qa-1
b2b-group: partitions revoked: [order-response-v3-qa-0, order-response-v3-qa-1
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] (Re-)joining group
Re-balancing is a feature that automatically compensates for uneven workloads as well as topology changes (e.g., adding or removing brokers). This is achieved via a background process that continuously checks a variety of metrics to determine if and when a rebalance should occur.
You can go through the link below for more details:
https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2
Kafka starts a rebalancing if a consumer joins or leaves a group. Below are various reasons why that can or will happen.
A consumer joins a group:
Application Start/Restart — If we deploy an application (or restart it), a new consumer joins the group
Application scale-up — We are creating new pods/application instances
A consumer leaves a group:
max.poll.interval.ms exceeded — polled records not processed in time
session.timeout.ms exceeded — no heartbeats sent, likely because of an application crash or a network error
Consumer shuts down
Pod relocation — Kubernetes relocates pods sometimes, e.g. if nodes are removed via kubectl drain or the cluster is scaled down. The consumer shuts down (leaves the group) and is restarted again on another node (joins the group).
Application scale-down
If you would like to understand this in more depth, here is an excellent article I have read:
https://medium.com/bakdata/solving-my-weird-kafka-rebalancing-problems-c05e99535435
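For illustration, here is how the timeouts mentioned above might be tuned on a plain Java consumer; this is only a sketch, and the values are examples rather than recommendations:

import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "b2b-group");
// Maximum time allowed between poll() calls before the consumer is considered failed.
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
// How long the coordinator waits without heartbeats before removing the member.
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
// Heartbeats should be sent well within the session timeout (typically about a third of it).
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "15000");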

Duplicate message consumption in Kafka due to auto-downscaling/deletion of pods

Background
We have a simple producer/consumer style application with Kafka as the message broker and Consumer Processes running as Kubernetes pods. We have defined two topics namely the in-topic and the out-topic. A set of consumer pods that belong to the same consumer group read messages from the in-topic, perform some work and finally write out the same message (key) to the out-topic once the work is complete.
Issue Description
We noticed that there are duplicate messages being written out to the out-topic by the consumers that are running in the Kubernetes pods. To rephrase, two different consumers are consuming the same messages from the in-topic twice and thus publishing the same message twice to the out-topic as well. We analyzed the issue and can safely conclude that this issue only occurs when pods are auto-downscaled/deleted by Kubernetes.
In fact, an interesting observation we have is that if any message is read by two different consumers from the in-topic (and thus published twice in the out-topic), the given message is always the last message consumed by one of the pods that was downscaled. In other words, if a message is consumed twice, the root cause is always the downscaling of a pod.
We can conclude that a pod is getting downscaled after a consumer writes the message to the out-topic but before Kafka can commit the offset to the in-topic.
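(Illustrative sketch only, not the application's actual code.) The consume, process, produce loop looks roughly like this; the duplicate window is the gap between producing to the out-topic and the next automatic offset commit for the in-topic:

import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.time.Duration;
import java.util.Collections;

class WorkerLoop {
    // Consumer/producer construction and doWork() are omitted; all names are illustrative.
    static void run(KafkaConsumer<String, String> consumer, KafkaProducer<String, String> producer) {
        consumer.subscribe(Collections.singletonList("in-topic"));
        while (true) {
            // With enable.auto.commit=true, offsets of previously returned records are
            // committed in the background during poll(), every auto.commit.interval.ms.
            ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
            for (ConsumerRecord<String, String> record : records) {
                String result = doWork(record.value());   // perform some work
                producer.send(new ProducerRecord<>("out-topic", record.key(), result));
            }
            producer.flush();
            // If the pod is killed here (after producing but before the offsets are committed),
            // the same records are redelivered to another consumer after the rebalance.
        }
    }

    static String doWork(String value) { return value; }  // placeholder for the real work
}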
Consumer configuration
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "3600000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG,"org.apache.kafka.common.serialization.StringDeserializer");
props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
Zookeeper/broker logs:
[2021-04-07 02:42:22,708] INFO [GroupCoordinator 0]: Preparing to rebalance group PortfolioEnrichmentGroup14 in state PreparingRebalance with old generation 1 (__consumer_offsets-17) (reason: removing member PortfolioEnrichmentConsumer13-9aa71765-2518-493f-a312-6c1633225015 on heartbeat expiration) (kafka.coordinator.group.GroupCoordinator)
[2021-04-07 02:42:23,331] INFO [GroupCoordinator 0]: Stabilized group PortfolioEnrichmentGroup14 generation 2 (__consumer_offsets-17) (kafka.coordinator.group.GroupCoordinator)
[2021-04-07 02:42:23,335] INFO [GroupCoordinator 0]: Assignment received from leader for group PortfolioEnrichmentGroup14 for generation 2 (kafka.coordinator.group.GroupCoordinator)
What we tried
Looking at the logs, it was clear that rebalancing takes place because of heartbeat expiration. We added the following configuration parameters to increase the heartbeat interval and the session timeout:
props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "10000");
props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "latest");
props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "900000");
props.put(ConsumerConfig.MAX_PARTITION_FETCH_BYTES_CONFIG, "512");
props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "1");
However, this did not solve the issue. Looking at the broker logs, we can confirm that the issue is due to the downscaling of pods.
Question: What could be causing this behavior where a message is consumed twice when a pod gets downscaled?
Note: I already understand the root cause of the issue; however, considering that a consumer is a long-lived process running in an infinite loop, how and why is Kubernetes downscaling/killing a pod before the consumer commits the offset? How do I tell Kubernetes not to remove a running pod from a consumer group until all Kafka commits are completed?
"What could be causing this behavior where a message is consumed twice when a pod gets downscaled?"
You have already provided the answer yourself: "[...] that a pod is getting downscaled after a consumer writes the message to the out-topic but before Kafka can commit the offset to the in-topic."
As the message was processed but its offset not committed, another pod re-processes the same message after the downscaling happens. Remember that adding or removing a consumer from a consumer group always triggers a rebalance. You now have first-hand experience of why this should generally be avoided as much as feasible. Depending on the Kafka version, a rebalance will cause every single consumer in the group to stop consuming until the rebalance is done.
To solve your issue, I see two options:
Only remove running pods from the consumer group when they are idle.
Reduce the consumer configuration auto.commit.interval.ms to 1 (it defaults to 5 seconds). This will only work if you set enable.auto.commit to true.
If you want your consumer to commit its offsets before exiting, you need to handle the exit signal in your consumer. Most languages support this. Have a look at this thread on how to do it in Java: How to finish kafka consumer safety? (Is there meaning to call thread#join inside shutdownHook?).
That being said, please note that there is no 100% guarantee of achieving exactly-once. Your process can be killed forcefully by the OS before it even gets time to run any exit cleanup (kill -9 <process_id>).
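As an illustration of that graceful-shutdown approach, here is a sketch of the usual wakeup/shutdown-hook pattern for the Java consumer (not the original application's code; the topic, group, and broker address are placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.ConsumerRecords;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.errors.WakeupException;
import java.time.Duration;
import java.util.Collections;
import java.util.Properties;

public class GracefulConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "graceful-example");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringDeserializer");
        props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "false");

        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        Thread mainThread = Thread.currentThread();

        // On SIGTERM (e.g. a Kubernetes pod shutdown), interrupt poll() and wait for cleanup to finish.
        Runtime.getRuntime().addShutdownHook(new Thread(() -> {
            consumer.wakeup();
            try {
                mainThread.join();
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        }));

        try {
            consumer.subscribe(Collections.singletonList("in-topic"));
            while (true) {
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofMillis(500));
                records.forEach(r -> System.out.println(r.key() + " -> " + r.value()));
                consumer.commitSync(); // commit what was just processed
            }
        } catch (WakeupException e) {
            // Expected during shutdown; fall through to the final commit and close.
        } finally {
            consumer.commitSync();
            consumer.close();
        }
    }
}

For this to help in Kubernetes, the pod's termination grace period must be long enough for the hook to finish; as noted above, a kill -9 bypasses it entirely.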

How client will automatically detect a new leader when the primary one goes down in Kafka?

Consider the below scenario:
I have a Kafka broker cluster (localhost:9002, localhost:9003, localhost:9004, localhost:9005).
Let's say localhost:9002 is my primary (leader) for the cluster.
Now my producer is producing data and sending it to the broker (localhost:9002).
If my primary broker (localhost:9002) goes down, a new leader will be elected with the help of ZooKeeper or some other consensus algorithm (consider localhost:9003 is now the new leader).
So, in the above scenario, can someone please explain to me how the Kafka client (producer) will get notified about the new broker configuration (localhost:9003), and how it will connect to the new leader and start producing data again?
Kafka clients automatically receive the necessary metadata from the cluster on each request when reading from or writing to a topic, including in the case of a leadership change.
In general, the client sends an initial (read/write) request to one of the bootstrap servers listed in the configuration bootstrap.servers. This initial request (hence "bootstrap") returns details about which broker hosts the leader of the topic partition, so that the client can then communicate directly with that broker. Each individual broker holds the metadata for the entire cluster, including knowledge of the partition leaders on the other brokers.
Now, if one of your brokers goes down and the leadership of a topic partition switches, your producer will be notified about it through that mechanism.
There is a KafkaProducer configuration called metadata.max.age.ms which you can modify to update metadata on your producer even if there is no leadership change happening:
"Controls how long the producer will cache metadata for a topic that's idle. If the elapsed time since a topic was last produced to exceeds the metadata idle duration, then the topic's metadata is forgotten and the next access to it will force a metadata fetch request."
Just a few notes on your question:
The term "Kafka broker cluster" does not really exists. You have a Kafka cluster containing one or multiple Kafka brokers.
You do not have a broker as a "primary(leader) for the cluster" but you have for each TopicPartition a leader. Maybe you mean the Controller which is located on one of the brokers within your cluster.

Confluent.Kafka.KafkaException: Broker: Specified group generation id is not valid

Environment
3-node Kafka Cluster
Amazon MSK
v2.3
1 topic
6 partitions
1 consumer group with 2 consumers
Running in Kubernetes
Confluent .NET SDK 1.2.2
Except for bootstrap.servers and group.id, all of the default settings.
Problem
First, one of my consumers encounters the following exception.
Confluent.Kafka.KafkaException: Broker: Specified group generation id is not valid
at Confluent.Kafka.Impl.SafeKafkaHandle.Commit(IEnumerable`1 offsets)
at Confluent.Kafka.Consumer`2.Commit(IEnumerable`1 offsets)
The exception is trapped and the consumer is supposed to retry, but instead the app sits idle. The container is still up and running, but not consuming any more messages.
What's weirder is that the broker never reassigns that consumer's partitions so the consumer lag on those partitions begins to grow. It seems like the consumer is both alive (since the broker is not reassigning its partitions) and dead (since it cannot commit its offset or consume more messages). If we intervene and manually restart the consumers then the partitions are reassigned and the situation goes back to normal.
I'm not entirely sure what to make of the exception above. Google doesn't offer much. The most relevant lead I have is this issue in GitHub, which involves a broker restarting. To my knowledge, that is not happening in my situation. Any assistance would be greatly appreciated.
At least I have found a solution that works for me.
In my code I did a manual commit and set EnableAutoCommit = false.
Somehow it was possible for a commit to be executed twice for the same offset. I removed the manual commits in the consumer and set EnableAutoCommit = true.
After that it worked.
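The question uses the Confluent .NET client; purely for illustration, the analogous change in the Java client would look like the following sketch (the broker address and group id are placeholders):

import org.apache.kafka.clients.consumer.ConsumerConfig;
import java.util.Properties;

Properties props = new Properties();
props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-group");
// Let the client commit offsets automatically instead of calling commit() in application code,
// so the same offset is never committed twice by the application.
props.put(ConsumerConfig.ENABLE_AUTO_COMMIT_CONFIG, "true");
props.put(ConsumerConfig.AUTO_COMMIT_INTERVAL_MS_CONFIG, "5000");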

Apache Kafka Topic Partition Message Handling

I'm a bit confused on the Topic partitioning in Apache Kafka. So I'm charting down a simple use case and I would like to know what happens in different scenarios. So here it is:
I have a Topic T that has 4 partitions TP1, TP2, TP3 and TP4.
Assume that I have 8 messages M1 to M8. Now when my producer sends these messages to the topic T, how will they be received by the Kafka broker under the following scenarios:
Scenario 1: There is only one Kafka broker instance that has Topic T with the aforementioned partitions.
Scenario 2: There are two Kafka broker instances, with each node having the same Topic T with the aforementioned partitions.
Now, assuming that Kafka broker instance 1 goes down, how will the consumers react? I'm assuming that my consumer was reading from broker instance 1.
I'll answer your questions by walking you through partition replication, because you need to learn about replication to understand the answer.
A single broker is considered the "leader" for a given partition. All produces and consumes occur with the leader. Replicas of the partition are replicated to a configurable number of other brokers. The leader handles replicating a produce to the other replicas. Other replicas that are caught up to the leader are called "in-sync replicas." You can configure what "caught up" means.
A message is only made available to consumers when it has been committed to all in-sync replicas.
If the leader for a given partition fails, the Kafka controller will elect a new leader from the list of in-sync replicas, and consumers will begin consuming from this new leader. Consumers will have a few milliseconds of added latency while the new leader is elected. A new controller will also be elected automatically if the controller fails (this adds more latency, too).
If the topic is configured with no replicas, then when the leader of a given partition fails, consumers can't consume from that partition until the broker that was the leader is brought back online. Or, if it is never brought back online, the data previously produced to that partition will be lost forever.
To answer your question directly:
Scenario 1: if replication is configured for the topic, and there exists an in-sync replica for each partition, a new leader will be elected, and consumers will only experience a few milliseconds of latency because of the failure.
Scenario 2: now that you understand replication, I believe you'll see that this scenario is Scenario 1 with a replication factor of 2.
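For context, the replication factor is a per-topic setting chosen when the topic is created. Here is a hedged sketch using the Java AdminClient, with values matching Scenario 2 (the broker address is a placeholder):

import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import java.util.Collections;
import java.util.Map;
import java.util.Properties;

public class CreateTopicT {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

        try (Admin admin = Admin.create(props)) {
            // Topic T: 4 partitions, each replicated to 2 brokers (as in Scenario 2).
            NewTopic topic = new NewTopic("T", 4, (short) 2)
                    .configs(Map.of("min.insync.replicas", "1"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }
    }
}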
You may also be interested to learn about acks in the producer.
In the producer, you can configure acks such that the produce is acknowledged when:
the message is put on the producer's socket buffer (acks=0)
the message is written to the log of the lead broker (acks=1)
the message is written to the log of the lead broker, and replicated to all other in-sync replicas (acks=all)
Further, you can configure the minimum number of in-sync replicas required to commit a produce. Then, if not enough in-sync replicas exist given this configuration, the produce will fail. You can build your producer to handle this failure in different ways: buffer, retry, do nothing, block, etc.
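Below is a minimal sketch of a producer configured with the strongest of those settings (acks=all), retrying transient failures such as not-enough-replicas errors while a broker is down; the topic name and broker address are placeholders:

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;
import java.util.Properties;

public class AcksAllProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, "org.apache.kafka.common.serialization.StringSerializer");
        // Wait until the leader and all in-sync replicas have the record.
        props.put(ProducerConfig.ACKS_CONFIG, "all");
        // Retry transient errors (e.g. NotEnoughReplicas) instead of failing immediately.
        props.put(ProducerConfig.RETRIES_CONFIG, "5");

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            producer.send(new ProducerRecord<>("T", "key", "value"), (metadata, exception) -> {
                if (exception != null) {
                    System.err.println("Produce failed: " + exception);
                } else {
                    System.out.printf("Wrote to %s-%d at offset %d%n",
                            metadata.topic(), metadata.partition(), metadata.offset());
                }
            });
        }
    }
}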