kafka topic partition rebalance notification - apache-kafka

I'm using Kafka 0.8.1.1
Is there any API (a callback, for example) through which a consumer can find out which partitions it has lost or has newly been assigned?

I'm using the Kafka 0.9.1 API, which has an interface ConsumerRebalanceListener with two methods:
public void onPartitionsRevoked(Collection<TopicPartition> partitions)
where partitions is the list of partitions that were assigned to the consumer on the last rebalance, and
public void onPartitionsAssigned(Collection<TopicPartition> partitions)
where partitions is the list of partitions that are now assigned to the consumer (may include partitions previously assigned to the consumer).
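To answer the "lost or newly added" part directly, one option is to remember the previous assignment and compare it with the collection passed to onPartitionsAssigned. The sketch below is only an illustration of that set comparison: AssignmentTracker and its method names are hypothetical, partitions are modeled as plain strings such as "events-0" so that it runs without the Kafka client, and it assumes onPartitionsAssigned receives the full new assignment rather than only the increment.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical helper, not part of the Kafka API.
final class AssignmentTracker {
    private Set<String> current = new TreeSet<>();
    private Set<String> lost = new TreeSet<>();
    private Set<String> added = new TreeSet<>();

    // Call with the collection received in onPartitionsAssigned
    // (assumed here to be the complete new assignment).
    public void onAssigned(Collection<String> assigned) {
        Set<String> next = new TreeSet<>(assigned);
        lost = new TreeSet<>(current);
        lost.removeAll(next);      // held before, not held now
        added = new TreeSet<>(next);
        added.removeAll(current);  // held now, not held before
        current = next;
    }

    public Set<String> lost() { return lost; }
    public Set<String> added() { return added; }

    public static void main(String[] args) {
        AssignmentTracker tracker = new AssignmentTracker();
        tracker.onAssigned(Arrays.asList("events-0", "events-1"));
        tracker.onAssigned(Arrays.asList("events-1", "events-2"));
        System.out.println("lost=" + tracker.lost() + " added=" + tracker.added());
    }
}
```

In a real listener the same comparison would be done on Collection<TopicPartition> inside onPartitionsAssigned.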

Related

Is Consumer Offset managed at Consumer group level or at the individual consumer inside that consumer group?

I am trying to find out whether offsets exist at the consumer group level as well. Is the consumer offset managed at the consumer group level, or by each individual consumer inside that consumer group in Kafka?
Offsets are tracked at ConsumerGroup level.
Imagine you have 4 consumer threads in one ConsumerGroup consuming from one topic with 4 partitions. If you now stop all 4 threads and restart just a single one with the same group, that one thread will know where all 4 threads left off and will continue from there.
"you are saying one offset (basically a shared int/long value) will be shared/updated by all the consumers in a consumer group?"
Yes, this is correct. Remember that a single partition of a topic can be read by only one consumer thread within a group. Two consumer threads of the same ConsumerGroup will never consume a single topic partition at the same time. The offsets of consumer groups are stored in an internal Kafka topic called __consumer_offsets. In this topic you have key/value pairs, where the key is the concatenation of
ConsumerGroup
Topic
Partition within Topic
and the value is the offset. This internal __consumer_offsets topic is available to all consumers, so the information is shared.
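The key/value layout described above can be pictured with a small in-memory model. This is only a toy sketch: GroupOffsetStore and its methods are hypothetical names, and the real __consumer_offsets topic is a compacted Kafka topic with its own binary format, not a Java map.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory model of "key = (group, topic, partition), value = offset".
final class GroupOffsetStore {
    private final Map<List<Object>, Long> offsets = new HashMap<>();

    private static List<Object> key(String group, String topic, int partition) {
        return Arrays.<Object>asList(group, topic, partition);
    }

    // Records the offset for one partition on behalf of the whole group.
    public void commit(String group, String topic, int partition, long offset) {
        offsets.put(key(group, topic, partition), offset);
    }

    // Returns the committed offset, or -1 if this group has none for the partition.
    public long committed(String group, String topic, int partition) {
        return offsets.getOrDefault(key(group, topic, partition), -1L);
    }

    public static void main(String[] args) {
        GroupOffsetStore store = new GroupOffsetStore();
        // Four threads of group "g1" each commit for their own partition.
        for (int p = 0; p < 4; p++) {
            store.commit("g1", "events", p, 100L + p);
        }
        // A single restarted thread in the same group can look up all four.
        for (int p = 0; p < 4; p++) {
            System.out.println("partition " + p + " resumes at " + store.committed("g1", "events", p));
        }
    }
}
```

Because the consumer's identity is not part of the key, any member of the group that is later assigned a partition finds the same offset, while a different group sees nothing for that partition.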

Kafka Consumer multi tenancy

I am new to writing Kafka consumers. In my scenario I have two consumers running under the same group id, and the topic has two partitions.
Suppose that:
Consumer 1===>Linked to ====>Partition 1
Consumer 2===>Linked to ====>Partition 2
In case my consumer-2 goes down, how can I ensure that my consumer-1 re-reads all the events that came to partition 2? I just came across setConsumerRebalanceListener, so I have set it on my container properties, and in the onPartitionsAssigned method I call consumer.seekToBeginning(consumer.assignment()).
Is this correct? Does this line mean my consumer-1 will also read all the events from partition 2 when consumer-2 is down and partition 2 is reassigned to consumer-1?
I would also appreciate it if someone could share some good links where I can read the basics of ConsumerRebalanceListener.
public ConcurrentKafkaListenerContainerFactory<String, MultiTenancyOrgDataMessage> kafkaListenerContainerFactory() {
    LOG.debug("ConcurrentKafkaListenerContainerFactory executing");
    ConcurrentKafkaListenerContainerFactory<String, MultiTenancyOrgDataMessage> factory = new ConcurrentKafkaListenerContainerFactory<>();
    factory.setConsumerFactory(consumerFactory());
    factory.getContainerProperties().setAckMode(AckMode.MANUAL_IMMEDIATE);
    factory.getContainerProperties().setConsumerRebalanceListener(new ConsumerAwareRebalanceListener() {
        @Override
        public void onPartitionsAssigned(Consumer<?, ?> consumer, Collection<TopicPartition> partitions) {
            consumer.seekToBeginning(consumer.assignment()); // read topic from beginning on service restart
        }
    });
    return factory;
}
This is what committing is for - if consumer 2 goes down then any records it has consumed but not committed will be picked up by consumer 1 after a rebalance.
This is one reason that Kafka supports at-least-once semantics: after a rebalance, consumer 1 will pick up from the last committed offset and hence may process records that were already successfully processed by consumer 2 if it died before committing.
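A small worked example may make the replay window concrete. The sketch below is a plain-Java simulation, not Kafka client code: AtLeastOnceDemo and its methods are hypothetical, and it assumes the usual convention that a committed offset is the next offset to read.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simulation of what a new partition owner reads after a rebalance.
final class AtLeastOnceDemo {
    private static List<Long> range(long from, long toExclusive) {
        List<Long> out = new ArrayList<>();
        for (long o = from; o < toExclusive; o++) {
            out.add(o);
        }
        return out;
    }

    // Offsets the new owner reads: from the last committed offset up to the log end.
    public static List<Long> redelivered(long lastCommitted, long logEnd) {
        return range(lastCommitted, logEnd);
    }

    // Offsets handled twice: processed by the failed consumer but never committed.
    public static List<Long> processedTwice(long lastCommitted, long processedUpTo) {
        return range(lastCommitted, processedUpTo);
    }

    public static void main(String[] args) {
        // Consumer 2 processed offsets 0..7 but only committed offset 5 before dying.
        System.out.println("consumer 1 reads " + redelivered(5, 10));
        System.out.println("seen twice       " + processedTwice(5, 8));
    }
}
```

Here consumer 1 reads offsets 5 through 9, of which 5, 6 and 7 had already been processed by consumer 2. Nothing is lost, but processing needs to tolerate those duplicates.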
An example of why you might use a ConsumerRebalanceListener is to deal with pausing across a rebalance - I have written about this at https://chrisg23.blogspot.com/2020/02/why-is-pausing-kafka-consumer-so.html?m=1

What Happens when there is only one partition in Kafka topic and multiple consumers?

I have a Kafka topic with only one partition, and I don't understand what will happen in the following cases. How will messages be delivered to consumers?
If all consumers are in same group
If all consumers are in different group
I am not sure if consumers will receive unique messages or duplicate ones.
Each consumer is assigned one or more partitions of a topic, and each consumer belongs to a consumer group. Below are the two scenarios:
When all consumers belong to the same group: each partition is assigned to only one consumer in the group. If there is only one partition, only one consumer will get the messages, while the other consumers will be idle.
When all consumers belong to different consumer groups: each consumer will get the messages from all partitions. Partition assignment is done per consumer group.
It depends on the consumer groups. Consumers within the same consumer group don't read the data again from the same partitions once the read offsets have been committed.
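The two scenarios can be illustrated with a toy assignment function. This is a sketch under simplifying assumptions: GroupAssignmentDemo is a hypothetical name, and the round-robin rule is a stand-in for Kafka's configurable assignment strategies, not a reproduction of any of them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical model: spreads partitions over the members of ONE consumer group.
final class GroupAssignmentDemo {
    public static Map<String, List<Integer>> assign(List<String> members, int partitionCount) {
        Map<String, List<Integer>> result = new LinkedHashMap<>();
        for (String member : members) {
            result.put(member, new ArrayList<>());
        }
        if (members.isEmpty()) {
            return result;
        }
        // Each partition goes to exactly one member of the group.
        for (int p = 0; p < partitionCount; p++) {
            result.get(members.get(p % members.size())).add(p);
        }
        return result;
    }

    public static void main(String[] args) {
        // Same group: three consumers share one partition.
        System.out.println("same group: " + assign(Arrays.asList("c1", "c2", "c3"), 1));
        // Different groups: assignment is computed separately for each group.
        for (String consumer : Arrays.asList("c1", "c2", "c3")) {
            System.out.println("own group:  " + assign(Collections.singletonList(consumer), 1));
        }
    }
}
```

With three consumers in one group and a single partition, c1 gets partition 0 and c2 and c3 get empty lists. When each consumer is in its own group, every one of them is assigned partition 0 and so receives every message.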

Kafka Consumer seekToBeginning

I did not specify a partition when publishing to the Kafka topic.
ProducerRecord(String topic, K key, V value)
In the consumer, I would like to go to the beginning.
seekToBeginning(Collection<TopicPartition> partitions)
Is it possible to seek to beginning without using a partition? Does Kafka assign a default partition?
https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/producer/ProducerRecord.html
https://kafka.apache.org/0102/javadoc/org/apache/kafka/clients/consumer/KafkaConsumer.html
When producing, if you don't explicitly specify a partition, the producer will pick one automatically from your topic.
In your consumer, if you are subscribed to your topic, you can seek to the start of all the partitions your consumer is currently assigned to using:
consumer.seekToBeginning(consumer.assignment())
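To show what "pick one automatically" can look like, here is a toy partition chooser. It is an assumption-laden sketch: ToyPartitioner is a hypothetical class, and String.hashCode() stands in for the hash that Kafka's default partitioner applies to the serialized key bytes, so the partition numbers will not match a real cluster.

```java
// Hypothetical stand-in for a producer-side partitioner; not Kafka's implementation.
final class ToyPartitioner {
    private int nextKeyless = 0;

    // Records with a key always map to the same partition;
    // records without a key are spread across partitions in turn.
    public int partitionFor(String key, int numPartitions) {
        if (key == null) {
            int partition = nextKeyless % numPartitions;
            nextKeyless = (nextKeyless + 1) % numPartitions;
            return partition;
        }
        return Math.floorMod(key.hashCode(), numPartitions);
    }

    public static void main(String[] args) {
        ToyPartitioner partitioner = new ToyPartitioner();
        System.out.println("user-1 -> " + partitioner.partitionFor("user-1", 3));
        System.out.println("user-1 -> " + partitioner.partitionFor("user-1", 3));
        System.out.println("no key -> " + partitioner.partitionFor(null, 3));
        System.out.println("no key -> " + partitioner.partitionFor(null, 3));
    }
}
```

Since records can land on any partition of the topic, seeking over consumer.assignment() rather than a single hard-coded partition is what rewinds everything the consumer currently owns.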

kafka consumer reads the same message

I have a single Topic with 5 partitions.
I have 5 threads, each creating a Consumer
All consumers are in the same consumer group, set using group.id.
I also gave each consumer a different and unique client.id
I see that 2 consumers are reading and processing the same message.
Should kafka handle this?
How do I troubleshoot it?
Consumers within the same group should not receive the same messages. The partitions should be split across all consumers and at any time Kafka's consumer group logic ensures only 1 consumer is assigned to each partition.
The exception is if 1 consumer crashes before it's able to commit its offset. In that case, the new consumer that gets assigned the partition will re-consume from the last committed offset.
You can use the consumer group tool kafka-consumer-groups that comes with Kafka to check the partitions assigned to each consumer in your group.