Kafka Broker leader change without effect

I have 3 Kafka brokers, with 3 partitions:
broker.id 1001: 10.18.0.73:9092 LEADER
broker.id 1002: 10.18.0.73:9093
broker.id 1005: 10.18.0.73:9094
Zookeeper is set up at 127.0.0.1:2181
Launch with:
1001 -> .\kafka-server-start.bat ..\..\config\server.properties
1002 -> .\kafka-server-start.bat ..\..\config\server1.properties
1005 -> .\kafka-server-start.bat ..\..\config\server2.properties
This is server.properties
broker.id=-1
listeners=PLAINTEXT://10.18.0.73:9092
advertised.listeners=PLAINTEXT://10.18.0.73:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=10.18.0.73:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
advertised.port=9092
advertised.host.name=10.18.0.73
port=9092
This is server1.properties
broker.id=-1
listeners=PLAINTEXT://10.18.0.73:9093
advertised.listeners=PLAINTEXT://10.18.0.73:9093
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs4
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
advertised.port=9093
advertised.host.name=10.18.0.73
port=9093
This is server2.properties
broker.id=-1
listeners=PLAINTEXT://10.18.0.73:9094
advertised.listeners=PLAINTEXT://10.18.0.73:9094
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs2
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
advertised.port=9094
advertised.host.name=10.18.0.73
port=9094
All files are in the folder C:\kafka_2.12-2.4.0\config
Run All
Run Producer
.\kafka-console-producer.bat --broker-list 10.18.0.73:9092,10.18.0.73:9093,10.18.0.73:9094 --topic clinicaleventmanager
Run Consumer
.\kafka-console-consumer.bat --bootstrap-server 10.18.0.73:9092,10.18.0.73:9093,10.18.0.73:9094 --topic clinicaleventmanager
I send a test message.
It is received OK!
Now, I shut down broker 1001 (the leader).
The new leader is 1002.
In the consumer, this message appeared for about a second, I imagine for the time needed to elect the new leader:
[2020-01-16 15:33:35,802] WARN [Consumer clientId=consumer-console-consumer-56669-1, groupId=console-consumer-56669] Connection to node 2147482646 (/10.18.0.73:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
If I try to send another message, it is not read by the consumer.
The new leader, 1002, does not appear to be sending messages.
Why?
If I restart broker 1001, everything works.
Thanks

First, Kafka never "sends" (pushes) messages; the consumer asks for them.
Second, it would seem you've changed nothing but the listeners, port, and log dir.
You don't explicitly create any topic, so you end up with the defaults of one partition and one replica, both for your topic and for the internal consumer offsets topic.
If a partition's only replica is on the broker you stopped, then no other process can read (or write) that partition, regardless of which broker is the controller.
So, change the offsets (and transactions) replication factor to 3 and try again.
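A minimal sketch of that change, assuming the cluster is rebuilt from scratch (the internal topics are created with these settings only on a fresh cluster; an existing __consumer_offsets topic would need a partition reassignment instead). In each of the three properties files:
# replicate the internal topics across all three brokers
offsets.topic.replication.factor=3
transaction.state.log.replication.factor=3
# assumption: tolerate one broker being down for transactions
transaction.state.log.min.isr=2
# assumption: make auto-created topics replicated as well
default.replication.factor=3
Better still, create the topic explicitly with three replicas instead of relying on auto-creation:
.\kafka-topics.bat --bootstrap-server 10.18.0.73:9092 --create --topic clinicaleventmanager --partitions 3 --replication-factor 3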

Related

Kafka multi-datacenter solution

I have 2 services (1 producer that writes 15,000 messages to a Kafka topic, and 1 consumer that reads those messages from that topic), and I have a stretched 3-DC Kafka cluster (the 3 DCs are within the same city, so latency is low).
To imitate a 2-DC failure, I simultaneously shut down 2 Kafka brokers (systemctl kill through Ansible), so I have only 1 Kafka broker up and running. I have acks=all, ISR=3 and min ISR=3, so in theory, if even 1 broker is down, all writes to Kafka should stop.
But in my case my service writes to Kafka with only 1 node alive!
Why does this happen?
Here's my /etc/kafka/server.properties:
zookeeper.connect=192.168.1.11:2181,192.168.1.12:2181,192.168.1.13:2181
log.dirs=/var/lib/kafka/data
broker.id=0
group.initial.rebalance.delay.ms=0
log.retention.check.interval.ms=30000
log.retention.hours=3
log.roll.hours=1
log.segment.bytes=1073741824
num.io.threads=16
num.network.threads=8
num.partitions=1
num.recovery.threads.per.data.dir=2
offsets.topic.replication.factor=3
socket.receive.buffer.bytes=1024000
socket.request.max.bytes=104857600
socket.send.buffer.bytes=1024000
transaction.state.log.min.isr=3
transaction.state.log.replication.factor=3
zookeeper.connection.timeout.ms=10000
delete.topic.enable=True
replica.fetch.max.bytes=5242880
max.message.bytes=5242880
message.max.bytes=5242880
default.replication.factor=3
min.insync.replicas=3
replica.fetch.wait.max.ms=200
replica.lag.time.max.ms=1000
advertised.listeners=PLAINTEXT://192.168.1.11:9092
unclean.leader.election=false
acks=all
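Two hedged things to check in this file: acks=all is a producer setting and has no effect in a broker's server.properties, and the broker property is spelled unclean.leader.election.enable, so the unclean.leader.election line above is silently ignored. To see what is actually in force for the topic, something like:
# replication factor and current ISR per partition
bin/kafka-topics --zookeeper 192.168.1.11:2181 --describe --topic <your-topic>
# per-topic overrides of min.insync.replicas, if any
bin/kafka-configs --zookeeper 192.168.1.11:2181 --describe --entity-type topics --entity-name <your-topic>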

Kafka Broker Issue (Replica Manager with max size)

I am seeing the following errors in my Kafka environment. It works for a few hours and then chokes.
20200224;21:01:38: [2020-02-24 21:01:38,615] ERROR [ReplicaManager broker=0] Error processing fetch with max size 1048576 from consumer on partition SANDBOX.BROKER.NEWORDER-0: (fetchOffset=211886, logStartOffset=-1, maxBytes=1048576, currentLeaderEpoch=Optional.empty) (kafka.server.ReplicaManager)
20200224;21:01:38: org.apache.kafka.common.errors.CorruptRecordException: Found record size 0 smaller than minimum record overhead (14) in file /data/tmp/kafka-topic-logs/SANDBOX.BROKER.NEWORDER-0/00000000000000000000.log.
20200224;21:05:48: [2020-02-24 21:05:48,711] INFO [GroupMetadataManager brokerId=0] Removed 0 expired offsets in 1 milliseconds. (kafka.coordinator.group.GroupMetadataManager)
20200224;21:10:22: [2020-02-24 21:10:22,204] INFO [GroupCoordinator 0]: Member xxxxxxxx_011-9e61d2c9-ce5a-4231-bda1-f04e6c260dc0-StreamThread-1-consumer-27768816-ee87-498f-8896-191912282d4f in group yyyyyyyyy_011 has failed, removing it from the group (kafka.coordinator.group.GroupCoordinator)
Setup:
1. Kafka broker (kafka_2.12-2.1.1)
2. Zookeeper
Config for Kafka:
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
group.initial.rebalance.delay.ms=0
delete.topic.enable=true
log.dirs=/data/tmp/kafka-topic-logs
num.partitions=1
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.flush.interval.ms=1000
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=localhost:2181
zookeeper.connection.timeout.ms=6000
Zookeeper config:
dataDir=/data/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
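The CorruptRecordException points at a damaged segment file. As a diagnostic sketch, Kafka's bundled DumpLogSegments tool can inspect the segment named in the error (path taken from the log above):
# dump the suspect segment; a corrupt batch will fail to decode
bin/kafka-run-class.sh kafka.tools.DumpLogSegments --deep-iteration --files /data/tmp/kafka-topic-logs/SANDBOX.BROKER.NEWORDER-0/00000000000000000000.log
Note also that log.dirs points under /data/tmp; if anything periodically cleans temporary paths, segments can be truncated in place, which is one plausible source of a zero-length record.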

Why did the Kafka cluster report the error "Number of alive brokers '0' does not meet the required replication factor"?

I have 2 Kafka brokers and 1 Zookeeper. The brokers' config (server.properties files):
Broker 1:
auto.create.topics.enable=true
broker.id=1
delete.topic.enable=true
group.initial.rebalance.delay.ms=0
listeners=PLAINTEXT://5.1.2.3:9092
log.dirs=/opt/kafka_2.12-2.1.0/logs
log.retention.check.interval.ms=300000
log.retention.hours=168
log.segment.bytes=1073741824
max.message.bytes=105906176
message.max.bytes=105906176
num.io.threads=8
num.network.threads=3
num.partitions=10
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
replica.fetch.max.bytes=105906176
socket.receive.buffer.bytes=102400
socket.request.max.bytes=105906176
socket.send.buffer.bytes=102400
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
zookeeper.connect=5.1.3.6:2181
zookeeper.connection.timeout.ms=6000
Broker 2:
auto.create.topics.enable=true
broker.id=2
delete.topic.enable=true
group.initial.rebalance.delay.ms=0
listeners=PLAINTEXT://18.4.6.6:9092
log.dirs=/opt/kafka_2.12-2.1.0/logs
log.retention.check.interval.ms=300000
log.retention.hours=168
log.segment.bytes=1073741824
max.message.bytes=105906176
message.max.bytes=105906176
num.io.threads=8
num.network.threads=3
num.partitions=10
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
replica.fetch.max.bytes=105906176
socket.receive.buffer.bytes=102400
socket.request.max.bytes=105906176
socket.send.buffer.bytes=102400
transaction.state.log.min.isr=1
transaction.state.log.replication.factor=1
zookeeper.connect=5.1.3.6:2181
zookeeper.connection.timeout.ms=6000
If I query Zookeeper like this:
echo dump | nc zook_IP 2181
I get:
SessionTracker dump:
Session Sets (3):
0 expire at Sun Jan 04 03:40:27 MSK 1970:
1 expire at Sun Jan 04 03:40:30 MSK 1970:
0x1000bef9152000b
1 expire at Sun Jan 04 03:40:33 MSK 1970:
0x1000147d4b40003
ephemeral nodes dump:
Sessions with Ephemerals (2):
0x1000147d4b40003:
/controller
/brokers/ids/2
0x1000bef9152000b:
/brokers/ids/1
Looks fine, but it doesn't work. Zookeeper sees 2 brokers, but the first Kafka broker reports this error:
ERROR [KafkaApi-1] Number of alive brokers '0' does not meet the required replication factor '1' for the offsets topic (configured via 'offsets.topic.replication.factor'). This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
We also use kafka_exporter for Prometheus, and it logs this error:
Cannot get oldest offset of topic Some.TOPIC partition 9: kafka server: Request was for a topic or partition that does not exist on this broker." source="kafka_exporter.go:296
Please help! Where is the mistake in my config?
Are your clocks working? Zookeeper thinks it's 1970:
Sun Jan 04 03:40:27 MSK 1970
You may want to look at the rest of the logs or see if Kafka and Zookeeper are actively running and ports are open.
In your first message, after starting a fresh cluster you see this, so it's not a true error:
This error can be ignored if the cluster is starting up and not all brokers are up yet. (kafka.server.KafkaApis)
The properties you show, though, have listeners on entirely different subnets, and you're not using advertised.listeners.
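A minimal sketch of what that could look like, assuming 5.1.2.3 and 18.4.6.6 are the addresses that clients and the other broker can actually reach:
# broker 1
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://5.1.2.3:9092
# broker 2
listeners=PLAINTEXT://0.0.0.0:9092
advertised.listeners=PLAINTEXT://18.4.6.6:9092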
A changed broker.id may also cause this problem. Clean up the Kafka metadata under ZooKeeper; note: the Kafka data will be lost.
I got this error message in this situation:
Cluster talking in SSL
Every broker is a container
Updated the certificate with new password inside ALL brokers
Rolling update
After the first broker rebooted, it spammed this error message, and the controller broker reported that "a new broker connected but password verification failed".
Solutions (the broker settings involved are sketched after this list):
Set the new certificate's password to the old password
Take your entire cluster down, then bring it up all at once
(not tested yet) Change the certificate on one broker, reboot it, then move on to the next broker until all of them have been rebooted (ending with the controller)
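For reference, a sketch of the broker-side SSL settings whose passwords must stay consistent during such a roll; paths and values here are placeholders, not taken from the question:
# server.properties (per broker)
ssl.keystore.location=/etc/kafka/ssl/broker.keystore.jks
ssl.keystore.password=CHANGEME
ssl.key.password=CHANGEME
ssl.truststore.location=/etc/kafka/ssl/broker.truststore.jks
ssl.truststore.password=CHANGEME
A mid-roll mismatch between an updated broker and the rest of the cluster breaks the inter-broker SSL handshake, which is why restarting everything at once (or reusing the old password) resolves it.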

Kafka 2.12 startup fails: 'kafka-logs' is not in the form of topic-partition

Here is a detailed log:
Configuration file:
[root@mast-1 ~]# grep '^[a-zA-Z]' /opt/kafka/config/server.properties
broker.id=1
listeners=PLAINTEXT://10.0.0.11:9092
num.network.threads=6
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/opt/kafka/logs/
num.partitions=6
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=60
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=10.0.0.11:2181,10.0.0.12:2181,10.0.0.14:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
ERROR There was an error in one of the threads during logs loading: org.apache.kafka.common.KafkaException: Found directory /opt/kafka_2.12-2.0.1/logs/kafka-logs, 'kafka-logs' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion). Kafka's log directories (and children) should only contain Kafka topic data. (kafka.log.LogManager)
[2018-11-30 04:16:29,108] ERROR [KafkaServer id=1] Fatal error during KafkaServer startup. Prepare to shutdown (kafka.server.KafkaServer)
org.apache.kafka.common.KafkaException: Found directory /opt/kafka_2.12-2.0.1/logs/kafka-logs, 'kafka-logs' is not in the form of topic-partition or topic-partition.uniqueId-delete (if marked for deletion).
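The config sets log.dirs=/opt/kafka/logs/, and the error shows that directory also contains a kafka-logs subfolder, so the broker finds a child that is not a topic-partition directory. Since Kafka's data directories (and their children) may only contain topic data, one hedged fix is to point log.dirs at a dedicated data directory; the path below is an example, not taken from the question:
# server.properties: keep topic data separate from application logs
log.dirs=/opt/kafka/data
Create the directory before restarting the broker:
mkdir -p /opt/kafka/data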

Multi node kafka cluster: Producer and Consumer not working

I have a Kafka cluster consisting of two machines. This is my server.properties:
broker.id=2
listeners=PLAINTEXT://:9092
advertised.listeners=PLAINTEXT://a.b.c.d:9092
num.network.threads=3
num.io.threads=8
socket.send.buffer.bytes=102400
socket.receive.buffer.bytes=102400
socket.request.max.bytes=104857600
log.dirs=/tmp/kafka-logs
num.partitions=2
num.recovery.threads.per.data.dir=1
offsets.topic.replication.factor=1
transaction.state.log.replication.factor=1
transaction.state.log.min.isr=1
log.retention.hours=168
log.segment.bytes=1073741824
log.retention.check.interval.ms=300000
zookeeper.connect=a.b.c.d:2181,a.b.c.e:2181
zookeeper.connection.timeout.ms=6000
group.initial.rebalance.delay.ms=0
And this is my zookeeper.properties:
dataDir=/tmp/zookeeper
clientPort=2181
maxClientCnxns=0
tickTime=2000
server.1=a.b.c.d:2888:3888
server.2=a.b.c.e:2888:3888
initLimit=20
syncLimit=10
a.b.c.d = The IPs these machines have, e.g. 192.168.....
I start the zookeeper server on both machines using:
bin/zookeeper-server-start config/zookeeper.properties
I then start the Kafka servers on both nodes. After this, I am able to create a new topic and get its details using --describe. However, I am unable to read with the consumer or write with the producer. I run them with:
bin/kafka-console-consumer --bootstrap-server a.b.c.d:9092,a.b.c.e:9092 --topic randomTopic --from-beginning
bin/kafka-console-producer --broker-list a.b.c.d:9092,a.b.c.e:9092 --topic randomTopic
When I run the producer, the prompt (>) appears and I can write into it. However, the consumer does not receive anything and its screen remains blank.
How do I make the consumer read the data in the topic, and the producer able to write data to it?
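A couple of hedged checks that usually narrow this down (commands as bundled with Kafka; addresses taken from the question). First confirm that both brokers actually registered: if both machines kept the same broker.id=2, only one of them can join the cluster. Then check which broker leads each partition:
# list registered broker ids in Zookeeper (expect two distinct ids)
bin/zookeeper-shell a.b.c.d:2181 ls /brokers/ids
# show partition leaders, replicas and ISR for the topic
bin/kafka-topics --zookeeper a.b.c.d:2181,a.b.c.e:2181 --describe --topic randomTopic
If only one id shows up, give each machine a unique broker.id and restart.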