How does Kafka commit work in a consumer group? - apache-kafka

I'm very new to Apache Kafka and am facing an issue.
If one consumer has auto-commit enabled and that machine has consumed some data, new consumers in the same group do not receive those messages.
I have 3 consumers and 1 producer, with each consumer running on a different machine. For testing purposes, I consume messages on a single machine with auto-commit enabled. When I then start the other two consumer machines, they do not receive those messages.
Since the other two consumers did not receive those messages, I want them to be delivered to these two consumers as well.
So, how does Kafka commit work for a consumer group?
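The behaviour described above can be illustrated with a toy in-memory model. This is only a sketch and not the Kafka client API: `ToyBroker`, `poll`, and the group names are hypothetical, and the model assumes a single partition and that a group with no committed offset starts from the beginning (comparable to auto.offset.reset=earliest). The point it tries to show is that the committed position is stored per group, not per consumer, so a second consumer in the same group continues from the committed offset, while a consumer in a different group sees every message.

```python
# Toy model only (hypothetical names, not the Kafka client API).
class ToyBroker:
    def __init__(self, messages):
        self.log = list(messages)   # a single-partition log
        self.committed = {}         # group id -> next offset to read

    def poll(self, group_id):
        """Return the group's unread messages and commit the new position."""
        start = self.committed.get(group_id, 0)   # assumption: start at 0 if nothing committed
        batch = self.log[start:]
        self.committed[group_id] = len(self.log)  # roughly what auto-commit ends up doing
        return batch


broker = ToyBroker(["m0", "m1", "m2"])
first = broker.poll("group-a")    # consumer on machine 1
second = broker.poll("group-a")   # consumer on machine 2, same group
other = broker.poll("group-b")    # consumer in a different group
print(first, second, other)
```

In this model, the second consumer in "group-a" receives nothing, while the consumer in "group-b" receives all three messages. That matches the usual advice of giving each consumer its own group id if every consumer should see every message.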

Related

Does Kafka Consumer reconnect and resubscribe to topics after cluster goes down and comes back up

A Kafka consumer using librdkafka (the high-level consumer) connected to a Kafka cluster, subscribed to 10 topics, and was consuming data. There was an assign-partitions event.
There was a network issue that made the cluster unreachable. The consumer lost its connection to the group coordinator and heartbeats got stuck. There was a revoke-partitions event, in which the code calls unassign on the consumer.
When the cluster came back up, the consumer was not consuming any data, although it is in a while-true loop calling consume with a timeout of 1 sec.
Does the consumer need to resubscribe to the topics once it is reconnected to the cluster? What is a reliable way to detect in code that the consumer is connected to the cluster?
Does the consumer need to resubscribe to the topics once it is reconnected to the cluster?
Yes. New group members will cause a rebalance amongst existing members, and they need to resubscribe.
What is a reliable way to detect in code that the consumer is connected to the cluster?
You could describe the consumer group and see whether there are active clients for the consumer group you are interested in.

Is it possible to kill a consumer from the Kafka server?

When I check for consumer lag, it shows that a particular consumer-id is running from a particular host consuming from a topic.
But when I go to that host there is no such consumer running.
How do I kill this consumer-id so that I can reset the consumer offset for the group it is part of?
Kafka server version: 0.11.0.1
Kafka client version (stuck): 0.10.0.2
This consumer-id got stuck in the first place because it was unable to consume messages, as some messages in Kafka had headers.
I've already tried the following:
Consuming from a different host and a different Kafka version: it consumes messages, but the consumer-id and host do not change.
Restarting the Kafka broker that is the leader for that topic.
Changing security groups to prevent the host from connecting to my broker.
Perhaps what you see is not a consumer id but a consumer group; see the consumer config section of the Kafka docs to learn about the difference.
Kafka uses consumer groups to keep track of the last consumed message (the consumer offset), so this is probably why a consumer lag is still reported.
This means there is no consumer running and you only need to get rid of the consumer offset for this group. See e.g. How do I delete a Kafka Consumer Group to reset offsets?
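As a rough sketch of why lag can still be reported when no consumer is running: lag is derived from the offset stored for the group, not from a live client. The function name and the sample offsets below are made up for illustration, and partitions without a committed offset are assumed to start at 0.

```python
def consumer_lag(log_end_offsets, committed_offsets):
    """Per-partition lag = log end offset minus the group's committed offset."""
    return {
        partition: end - committed_offsets.get(partition, 0)  # assumption: 0 if never committed
        for partition, end in log_end_offsets.items()
    }


# Hypothetical numbers: partition -> offset
log_end = {0: 120, 1: 80}
committed = {0: 100, 1: 80}   # stored for the group, even if no member is alive

lag = consumer_lag(log_end, committed)
print(lag)
```

Nothing in this calculation depends on a consumer being connected, which is consistent with the explanation above that only the group's stored offsets need to be removed or reset.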

Two Kafka consumers in the same group and one partition

I have a Kafka topic with one partition. When I created the consumers, I forgot to specify the group.id. There are 2 different consumers for the same partition.
I can receive messages, and my Flink source extends FlinkKafkaConsumer011.
Ideally only one consumer should receive each message, since they are in the same group, but in my case both consumers are receiving messages, and I'm not sure why.
To test, I restarted one of my jobs. The newly started consumer does not pick up where it left off, seemingly because the other consumer is committing offsets.
Flink 1.6.0
Kafka 0.11.2

Kafka consumer is not reading from only one partition out of 4

I was using Kafka 0.9 and recently migrated to Kafka 1.0, but the client I am using is still 0.9. Regardless, I am facing a problem where our consumers intermittently stop consuming from one or two of the partitions.
I have 5 consumers reading from 24 partitions; these are consumer JVM threads created by an application deployed on a single server. Frequently, one of the consumers (threads) stops reading from one of the partitions it is assigned.
E.g., one consumer thread would be reading from partitions 1, 2, 3, and 4. It stops reading from partition 1, and lag builds up. I have to restart the consumer for it to start picking up messages from that partition again.
I want to understand the issue here.
My consumer configuration:
session.timeout.ms=150000
request.timeout.ms=300000
max.partition.fetch.bytes=153600

Messages sent to all consumers with the same consumer group name

I have the following consumer code:
from kafka.client import KafkaClient
from kafka.consumer import SimpleConsumer

# Connect to the local broker
kafka = KafkaClient("localhost", 9092)
consumer = SimpleConsumer(kafka, "my-group", "my-topic")
# Seek to the end of the topic (whence=2) so only new messages are read
consumer.seek(0, 2)
for message in consumer:
    print(message)
kafka.close()
Then I produce messages with this script:
bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-topic
The thing is that when I start the consumers as two different processes, each process receives the new messages. However, I want each message to be delivered to only one consumer, not broadcast.
The Kafka documentation (https://kafka.apache.org/documentation.html) says:
If all the consumer instances have the same consumer group, then this
works just like a traditional queue balancing load over the consumers.
I see that the group for these consumers is the same: my-group.
How can I make each new message be read by exactly one consumer instead of being broadcast?
The consumer-group API was not officially supported until Kafka v0.8.1 (released Mar 12, 2014). For earlier server versions, consumer groups do not work correctly. And as of this post, the kafka-python library does not attempt to send group offset data:
https://github.com/mumrah/kafka-python/blob/c9d9d0aad2447bb8bad0e62c97365e5101001e4b/kafka/consumer.py#L108-L115
It's hard to tell from the example above what your Zookeeper configuration is, or whether there is one at all. You'll need a Zookeeper cluster to persist the consumer group information, i.e. which offset the consumers in each group have consumed up to.
A solid example is here:
Official Kafka documentation - Consumer Group Example
This should not happen. Make sure that both consumers are registered under the same consumer group in the Zookeeper znodes. Each message sent to a topic should be consumed exactly once per consumer group, so only one consumer in the group should receive the message, which is not what you are experiencing. What version of Kafka are you using?
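To make the "one consumer out of everyone in the group" idea concrete, here is a simplified sketch of how the partitions of a topic might be divided among the members of a group. It is a plain round-robin written for illustration, not Kafka's actual assignor, and `assign_partitions` and the member names are hypothetical. Since each partition is owned by exactly one member, each message is read by only one consumer in the group.

```python
def assign_partitions(partitions, members):
    """Simplified round-robin: each partition goes to exactly one group member."""
    members = sorted(members)
    assignment = {member: [] for member in members}
    for index, partition in enumerate(sorted(partitions)):
        owner = members[index % len(members)]
        assignment[owner].append(partition)
    return assignment


four_partitions = assign_partitions(range(4), ["c1", "c2"])
one_partition = assign_partitions([0], ["c1", "c2"])
print(four_partitions)
print(one_partition)
```

With a single partition and two members, one member ends up with no partitions at all in this sketch, so only one process in the group would be reading.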