Does Kafka Consumer reconnect and resubscribe to topics after cluster goes down and comes back up

A Kafka consumer using librdkafka (the high-level consumer) was connected to a Kafka cluster, subscribed to 10 topics, and consuming data. There was an assign-partitions event.
Then there was a network issue due to which the cluster was not reachable. The consumer lost its connection with the group coordinator and heartbeats got stuck. There was a revoked-partitions event, on which the code calls unassign on the consumer.
When the cluster came back up, the consumer was not consuming any data, although it is in a while-true loop calling consume with a timeout of 1 sec.
Does the consumer need to resubscribe to the topics again once it is reconnected to the cluster? What is a reliable way to detect in code that the consumer is connected to the cluster?

Does the consumer need to resubscribe to the topics again once it is reconnected to the cluster?
Yes. New group members will cause a rebalance amongst existing members, and they need to resubscribe.
What is a reliable way to detect in code that the consumer is connected to the cluster?
You could describe the consumer group and see if there are active clients for the consumer group you are interested in.
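A minimal sketch of that check with the Java AdminClient (the broker address and group id are placeholders, and the original consumer uses librdkafka, which exposes similar group-describing calls):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.ConsumerGroupDescription;

public class GroupLivenessCheck {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Describe the consumer group and inspect its state and member list.
            ConsumerGroupDescription desc = admin
                    .describeConsumerGroups(Collections.singletonList("my-group")) // placeholder group id
                    .all().get()
                    .get("my-group");
            System.out.println("Group state: " + desc.state());
            System.out.println("Active members: " + desc.members().size());
        }
    }
}

If the describe call itself times out, the cluster is unreachable; if it succeeds but shows no active members, the consumer has dropped out of the group and resubscribing is a reasonable recovery step.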

Related

Kafka consumers in consumer group not resuming messages after restart

Hope you are having a good day.
I have an issue with Kafka consumers on Kubernetes. I am running 3 replicas inside a consumer group.
I have a topic with 3 partitions and 3 brokers, with the offsets replication factor set to 3. The offset reset policy for the consumer group is set to earliest.
When I start the consumer group, everything works fine, with each consumer replica taking a different partition and processing the data.
Issue: when a consumer replica inside the consumer group "abc-consumer-group" restarts, or when a broker (the leader) restarts, it does not resume from the point where it stopped. It states that it is up to date and has no messages to process.
Any suggestions on where to look?
I tried increasing the rebalance, heartbeat, and session timeouts at the broker level, with no luck.
And yes, whenever a consumer is added to or removed from the consumer group, rebalancing is taken care of by Kafka. I do see it happening, but the consumers are still not resuming messages. They state there is nothing to process.

Weird Kafka Consumer Group Re-balancing

My Kafka topic has two partitions and a single Kafka consumer. I have deployed my consumer application (Spring Kafka) in AWS. In the logs I see the Kafka consumer re-balancing from time to time. It is not frequent. From my observations so far, the re-balancing occurs when the consumer is listening to the topic and idle. I would appreciate it if someone could explain this behavior. I have posted some logs here.
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Request joining group due to: group is already rebalancing
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Revoke previously assigned partitions order-response-v3-qa-0, order-response-v3-qa-1
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] Revoke previously assigned partitions order-response-v3-qa-0, order-response-v3-qa-1
b2b-group: partitions revoked: [order-response-v3-qa-0, order-response-v3-qa-1]
[Consumer clientId=consumer-b2b-group-1, groupId=b2b-group] (Re-)joining group
Re-balancing is a feature that automatically optimizes uneven workloads as well as topology changes (e.g., adding or removing brokers). This is achieved via a background process that continuously checks a variety of metrics to determine if and when a rebalance should occur.
You can go through the link below for further reading:
https://medium.com/streamthoughts/apache-kafka-rebalance-protocol-or-the-magic-behind-your-streams-applications-e94baf68e4f2
Kafka starts a rebalance if a consumer joins or leaves a group. Below are the various reasons why that can or will happen.
A consumer joins a group:
Application Start/Restart — If we deploy an application (or restart it), a new consumer joins the group
Application scale-up — We are creating new pods/application
A consumer leaves a group:
max.poll.interval.ms exceeded — polled records not processed in time
session.timeout.ms exceeded — no heartbeats sent, likely because of an application crash or a network error (both timeouts appear in the config sketch after this list)
Consumer shuts down
Pod relocation — Kubernetes relocates pods sometimes, e.g. if nodes are removed via kubectl drain or the cluster is scaled down. The consumer shuts down (leaves the group) and is restarted again on another node (joins the group).
Application scale-down
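Those two timeouts (plus how much work each poll pulls in) are ordinary consumer configuration properties. A minimal sketch in Java, with illustrative values rather than recommendations for any particular workload:

import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RebalanceTunedConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "b2b-group");
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // Max time between poll() calls before the consumer is considered failed and a rebalance starts.
        props.put(ConsumerConfig.MAX_POLL_INTERVAL_MS_CONFIG, "300000");
        // Max time without heartbeats before the broker evicts the consumer from the group.
        props.put(ConsumerConfig.SESSION_TIMEOUT_MS_CONFIG, "45000");
        // How often heartbeats are sent; keep this well below session.timeout.ms.
        props.put(ConsumerConfig.HEARTBEAT_INTERVAL_MS_CONFIG, "15000");
        // Fewer records per poll leaves more headroom to finish processing within max.poll.interval.ms.
        props.put(ConsumerConfig.MAX_POLL_RECORDS_CONFIG, "100");
        KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props);
        consumer.close();
    }
}

In a Spring Kafka application the same keys can typically be supplied under spring.kafka.consumer.properties in the application configuration.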
If you would like to understand this in more depth, here is one of the best articles I have read:
https://medium.com/bakdata/solving-my-weird-kafka-rebalancing-problems-c05e99535435

Is it possible to kill a consumer from the Kafka server?

When I check for consumer lag, it shows that a particular consumer-id is running from a particular host consuming from a topic.
But when I go to that host there is no such consumer running.
How do I kill this consumer-id, so that I can reset the consumer offsets for the group that it's part of?
Kafka server version: 0.11.0.1
Kafka client version (stuck): 0.10.0.2
This consumer-id got stuck in the first place because it was not able to consume messages, as some of the messages in Kafka had headers.
I've already tried the following:
Consuming from a different host and a different Kafka version; it consumes messages, but the consumer-id and host do not change.
Restarting the Kafka broker which is the leader for that topic.
Changing security groups to prevent the host from connecting to my broker.
Perhaps what you see is not a consumer id but a consumer group; see the Kafka docs on consumer config to learn about the difference.
Kafka uses consumer groups to keep track of the last consumed message (the consumer offset), so when talking about consumer lag this is probably the explanation.
This means there is no consumer running and you only need to get rid of the consumer offset for this group. See e.g. How do I delete a Kafka Consumer Group to reset offsets?
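If the group really has no live members, one way to clear its committed offsets is to delete the group itself. A sketch with the Java AdminClient (the broker address and group name are placeholders; this needs a broker new enough to support group deletion, and the call fails if any consumer is still actively part of the group):

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;

public class DeleteStaleGroup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        try (AdminClient admin = AdminClient.create(props)) {
            // Deleting the group removes its committed offsets; the call fails if the group still has members.
            admin.deleteConsumerGroups(Collections.singletonList("my-stuck-group")) // placeholder group name
                 .all().get();
        }
    }
}

The linked answer covers the equivalent command-line tooling.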

Apache Kafka Cluster Consumer Fail Over

I have a setup of a 3-node ZooKeeper ensemble and a 3-broker cluster. When one of the brokers in the cluster goes down, the producer does not give any error, but the consumers throw an error saying that...
Marking coordinator dead for the group... Discovered coordinator for the group.
To my knowledge, as long as any one broker is available in the cluster, I should not stop consuming messages.
But as of now, with server.1, server.2, and server.3, if my server.2 goes down, all my consumers stop consuming messages.
What are the exact parameters to set to achieve failover for producers as well as consumers?
if my server.2 goes down, all my consumers stop consuming messages.
For starters, disable unclean leader election on the brokers, and create your topics with --replication-factor=3 and a configuration of min.insync.replicas=2.
To ensure that a producer has at least two durable writes (as set by min.insync.replicas), set acks=all.
Then, if any broker fails, and assuming the leader election does not hit any error, producers and consumers should seamlessly reconnect to the new leaders of the TopicPartitions.
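A sketch of those settings using the Java clients, with the topic name and broker addresses as placeholders (unclean leader election can be disabled per topic, as here, or broker-wide):

import java.util.Collections;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.AdminClientConfig;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.StringSerializer;

public class DurableTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties adminProps = new Properties();
        adminProps.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        try (AdminClient admin = AdminClient.create(adminProps)) {
            // 3 partitions, replication factor 3, writes need 2 in-sync replicas, no unclean elections.
            NewTopic topic = new NewTopic("orders", 3, (short) 3)
                    .configs(Map.of("min.insync.replicas", "2",
                                    "unclean.leader.election.enable", "false"));
            admin.createTopics(Collections.singletonList(topic)).all().get();
        }

        Properties producerProps = new Properties();
        producerProps.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker1:9092,broker2:9092,broker3:9092");
        producerProps.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        producerProps.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
        // acks=all: the leader waits for all in-sync replicas (at least min.insync.replicas) to acknowledge.
        producerProps.put(ProducerConfig.ACKS_CONFIG, "all");
        KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps);
        producer.close();
    }
}

With replication factor 3, min.insync.replicas=2, and acks=all, any single broker can fail and producers still get acknowledged writes, while consumers follow the newly elected leaders.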

What happens to consumer groups in Kafka if the entire cluster goes down?

We have a consumer service that is always trying to read data from a topic using a consumer group. Due to redeployments, our Kafka cluster is periodically brought down and recreated.
Whenever the cluster comes back up, we observe that although the previous topics are picked up (probably from ZooKeeper), the previous consumer groups are not recreated. Because of this, our running consumer process, which was created with a previous consumer group, gets stuck and never comes out.
Is this how the behavior of the consumer groups should be or is there a configuration we need to enable somewhere?
Any help is greatly appreciated.
Kafka brokers keep a cache of healthy consumers and consumer groups; if the entire cluster is destroyed and recreated, it no longer has knowledge of those consumers and groups, including their offsets. The consumers will have to reconnect and re-establish the group and offsets from the beginning of the topic.
Operationally, it makes more sense to keep the Kafka cluster running long-term and do version upgrades in a rolling fashion, so you don't interrupt the service.
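Whether a consumer that finds no committed offsets starts from the beginning of the topic or only reads new messages is controlled by auto.offset.reset. A sketch, with the broker, group, and topic names as placeholders:

import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerConfig;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.common.serialization.StringDeserializer;

public class RecreatedClusterConsumer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder broker
        props.put(ConsumerConfig.GROUP_ID_CONFIG, "my-service-group");        // group will be recreated empty
        props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
        // With no committed offsets on the rebuilt cluster: "earliest" re-reads the topic from the
        // beginning, "latest" (the default) starts from new messages only.
        props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
            consumer.subscribe(Collections.singletonList("my-topic"));        // placeholder topic
        }
    }
}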