I am trying the Kafka quickstart (https://kafka.apache.org/quickstart). I have deployed 3 brokers and created a topic.
➜ kafka_2.10-0.10.1.0 bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic my-replicated-topic
Topic:my-replicated-topic PartitionCount:1 ReplicationFactor:3 Configs:
Topic: my-replicated-topic Partition: 0 Leader: 2 Replicas: 2,0,1 Isr: 2,1,0
Then I test the producer with "bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic" and the consumer with "bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --from-beginning --topic my-replicated-topic".
The producer and consumer work well. If I kill server 1 or server 2, they keep working properly. But if I kill server 0 and type a message in the producer terminal, the consumer can't read new messages.
When I kill server 0, the consumer prints these logs:
[2017-06-23 17:29:52,750] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-23 17:29:52,974] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
(the same WARN then repeats roughly every 100 ms)
Then I restart server 0, and the consumer prints the messages along with some WARN logs:
hhhh
hello
[2017-06-23 17:32:32,795] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
[2017-06-23 17:32:32,902] WARN Auto offset commit failed for group console-consumer-97540: Offset commit failed with a retriable exception. You should retry committing offsets. (org.apache.kafka.clients.consumer.internals.ConsumerCoordinator)
This confuses me. Why is server 0 so special? It isn't even the leader of my topic's partition.
I also noticed that the server log of broker 0 has a lot of output like this:
[2017-06-23 17:32:33,640] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from [__consumer_offsets,23] in 38 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,641] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,26] (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,646] INFO [Group Metadata Manager on Broker 0]: Finished loading offsets from [__consumer_offsets,26] in 4 milliseconds. (kafka.coordinator.GroupMetadataManager)
[2017-06-23 17:32:33,646] INFO [Group Metadata Manager on Broker 0]: Loading offsets and group metadata from [__consumer_offsets,29] (kafka.coordinator.GroupMetadataManager)
but the logs of server 1 and server 2 don't have that content.
Can somebody explain this to me? Thanks very much!
Solved:
The replication factor of the __consumer_offsets topic is the root cause. It's a known issue: issues.apache.org/jira/browse/KAFKA-3959
kafka-console-producer defaults to acks=1, which is not fault tolerant at all. Add the flag or config parameter to set acks=all, and if both your topic and the __consumer_offsets topic were created with a replication factor of 3, your test will work.
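For example, with the quickstart setup the producer can be started like this (recent console producer versions accept --producer-property; older versions expose a --request-required-acks flag instead):

bin/kafka-console-producer.sh --broker-list localhost:9092 --topic my-replicated-topic --producer-property acks=all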
The servers share the load of managing consumer groups.
Usually each independent consumer has a unique consumer group ID, and you use the same group ID when you want to split the consuming work across multiple consumers.
That being said: being the leader of a partition only makes a broker responsible for coordinating replication of that partition. The partition leader has nothing to do (directly) with the broker that is currently managing the group ID and offset commits for a specific consumer!
So, when you subscribe, a broker is designated as the coordinator that handles the offset commits for your group, and this has nothing to do with the leader election for your topic's partition.
Shut down that coordinator and your group's consumption may stall until the Kafka cluster stabilizes again (the group management moves to another broker, or the node comes back; I am not expert enough to tell you exactly how the failover happens).
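For context, the coordinator for a group is the leader of the __consumer_offsets partition that the group ID hashes to (abs of the Java hashCode of the group ID, modulo the partition count, 50 by default). A quick way to inspect where that management lives, assuming the quickstart's ZooKeeper on localhost:2181:

# list the leaders of the __consumer_offsets partitions; the leader of
# the partition your group ID hashes to is your group's coordinator
bin/kafka-topics.sh --describe --zookeeper localhost:2181 --topic __consumer_offsets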
Probably, the topic __consumer_offsets has broker 0 as its only replica.
To confirm this, verify the topic __consumer_offsets:
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets
Topic: __consumer_offsets PartitionCount: 50 ReplicationFactor: 1 Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets Partition: 0 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 1 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 2 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 3 Leader: 0 Replicas: 0 Isr: 0
Topic: __consumer_offsets Partition: 4 Leader: 0 Replicas: 0 Isr: 0
...
Topic: __consumer_offsets Partition: 49 Leader: 0 Replicas: 0 Isr: 0
Notice the "Replicas: 0 Isr: 0". This is why the consumer stops getting messages when you stop broker 0.
To correct this, you need to reassign the replicas of the __consumer_offsets partitions so they include the other brokers.
Create a json file like this (config/inc-replication-factor-consumer_offsets.json):
{"version":1,
"partitions":[
{"topic":"__consumer_offsets", "partition":0, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":1, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":2, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":3, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":4, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":5, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":6, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":7, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":8, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":9, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":10, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":11, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":12, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":13, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":14, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":15, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":16, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":17, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":18, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":19, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":20, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":21, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":22, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":23, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":24, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":25, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":26, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":27, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":28, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":29, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":30, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":31, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":32, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":33, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":34, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":35, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":36, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":37, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":38, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":39, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":40, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":41, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":42, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":43, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":44, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":45, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":46, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":47, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":48, "replicas":[0, 1, 2]},
{"topic":"__consumer_offsets", "partition":49, "replicas":[0, 1, 2]}
]
}
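Writing all 50 entries by hand is tedious; a short shell loop can generate the same file (a sketch, assuming the default 50 partitions and broker IDs 0, 1, 2):

# generate the reassignment JSON for all 50 __consumer_offsets partitions
{
  echo '{"version":1,'
  echo ' "partitions":['
  for p in $(seq 0 49); do
    sep=","; [ "$p" -eq 49 ] && sep=""
    echo "  {\"topic\":\"__consumer_offsets\", \"partition\":$p, \"replicas\":[0, 1, 2]}$sep"
  done
  echo ' ]'
  echo '}'
} > config/inc-replication-factor-consumer_offsets.json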
Execute the following command:
kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file config/inc-replication-factor-consumer_offsets.json --execute
(on older Kafka versions this tool takes --zookeeper localhost:2181 instead of --bootstrap-server)
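The same tool can report when the reassignment has completed, via its --verify option:

kafka-reassign-partitions.sh --bootstrap-server localhost:9092 --reassignment-json-file config/inc-replication-factor-consumer_offsets.json --verify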
Confirm the "Replicas":
kafka-topics.sh --bootstrap-server localhost:9092 --describe --topic __consumer_offsets
Topic: __consumer_offsets PartitionCount: 50 ReplicationFactor: 3 Configs: compression.type=producer,cleanup.policy=compact,segment.bytes=104857600
Topic: __consumer_offsets Partition: 0 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets Partition: 1 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets Partition: 2 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
Topic: __consumer_offsets Partition: 3 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
...
Topic: __consumer_offsets Partition: 49 Leader: 0 Replicas: 0,1,2 Isr: 0,2,1
Now you can stop broker 0 alone, produce some messages, and see that the consumer still receives them.
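For example, following the quickstart's own fault-tolerance test (this assumes the quickstart layout: broker 0 started with config/server.properties on port 9092, brokers 1 and 2 on 9093/9094; point the clients at a surviving broker):

ps aux | grep 'config/server.properties'
kill -9 <pid from the previous command>
bin/kafka-console-producer.sh --broker-list localhost:9093 --topic my-replicated-topic
bin/kafka-console-consumer.sh --bootstrap-server localhost:9093 --from-beginning --topic my-replicated-topic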