Kafka: Unable to produce and consume events

While trying to set up Kafka on two replica boxes and one master box, I hit a strange condition where I was unable to consume from or produce to a topic. I am using MirrorMaker to sync data between the replicas and the master (replica <--> master), and I am getting the following logs, unendingly:
[2016-08-26 14:28:33,897] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:43,515] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:45,118] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:46,721] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:48,324] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:49,927] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
[2016-08-26 14:28:53,029] WARN Bootstrap broker localhost:9092 disconnected (org.apache.kafka.clients.NetworkClient)
The only way I could recover was by restarting Kafka, which produced logs like these:
[2016-08-26 14:30:54,856] WARN Found a corrupted index file, /tmp/kafka-logs/__consumer_offsets-43/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-08-26 14:30:54,856] INFO Recovering unflushed segment 0 in log __consumer_offsets-43. (kafka.log.Log)
[2016-08-26 14:30:54,857] INFO Completed load of log __consumer_offsets-43 with log end offset 0 (kafka.log.Log)
[2016-08-26 14:30:54,860] WARN Found a corrupted index file, /tmp/kafka-logs/__consumer_offsets-26/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
[2016-08-26 14:30:54,860] INFO Recovering unflushed segment 0 in log __consumer_offsets-26. (kafka.log.Log)
[2016-08-26 14:30:54,861] INFO Completed load of log __consumer_offsets-26 with log end offset 0 (kafka.log.Log)
[2016-08-26 14:30:54,864] WARN Found a corrupted index file, /tmp/kafka-logs/__consumer_offsets-35/00000000000000000000.index, deleting and rebuilding index... (kafka.log.Log)
While this was happening, the producer failed with:
ERROR Error when sending message to topic dr_ubr_analytics_limits with key: null, value: 1 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
This is my test phase, so I was able to restart and recover from the master box, but I want to know what caused this issue and how it can be avoided. Is there a way to debug this issue?
Trying to achieve the following setup via Kafka: [diagram not included]
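A starting point for debugging (a sketch; it assumes the broker is expected on localhost:9092 and ZooKeeper on localhost:2181, and it reuses the topic name from the producer error above):

# Which broker ids are currently registered in ZooKeeper?
bin/zookeeper-shell.sh localhost:2181 ls /brokers/ids

# Which broker leads each partition, and is the ISR complete?
bin/kafka-topics.sh --zookeeper localhost:2181 --describe --topic dr_ubr_analytics_limits

If /brokers/ids comes back empty, the broker has lost its ZooKeeper registration, which would match the repeated "Bootstrap broker disconnected" warnings. Note also that the data lives under /tmp/kafka-logs; many systems periodically clean /tmp, which is one way index files end up corrupted or missing.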

Related

Steps to delete data inside the Kafka Topic on Windows?

I am working on Spring Batch and Apache Kafka integration. Before posting this question I searched the web ("Is there a way to delete all the data from a topic, or delete the topic before every run?") for a better solution, but did not find one.
I am using Kafka version 2.11.
I want to delete all data under the topic without stopping either ZooKeeper or Kafka. How can we do that?
The commands below cause a lot of issues on Windows:
C:\kafka_2.11-2.3.1\bin\windows>kafka-topics.bat --zookeeper localhost:2181 --delete --topic customers
Topic customers is marked for deletion.
Note: This will have no impact if delete.topic.enable is not set to true.
C:\kafka_2.11-2.3.1\bin\windows>kafka-topics.bat --zookeeper localhost:2181 --delete --topic test
C:\kafka_2.11-2.3.1\bin\windows>kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic customers --from-beginning
[2020-04-21 10:25:02,812] WARN [Consumer clientId=consumer-1, groupId=console-consumer-65075] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-21 10:25:04,886] WARN [Consumer clientId=consumer-1, groupId=console-consumer-65075] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-21 10:25:06,996] WARN [Consumer clientId=consumer-1, groupId=console-consumer-65075] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-21 10:25:09,267] WARN [Consumer clientId=consumer-1, groupId=console-consumer-65075] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
[2020-04-21 10:25:11,744] WARN [Consumer clientId=consumer-1, groupId=console-consumer-65075] Connection to node -1 (localhost/127.0.0.1:9092) could not be established. Broker may not be available. (org.apache.kafka.clients.NetworkClient)
Processed a total of 0 messages
Terminate batch job (Y/N)?
^C
C:\kafka_2.11-2.3.1\bin\windows>
I am using Kafka version 2.11.
There is no Kafka 2.11. Your command prompt says kafka_2.11-2.3.1: hence, you are using Kafka 2.3.1. The 2.11 part is the Scala version that was used during compilation.
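You can confirm this from the same prompt; the command-line tools support a --version flag since Kafka 2.0 (KIP-278):

C:\kafka_2.11-2.3.1\bin\windows>kafka-topics.bat --version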
Note: This will have no impact if delete.topic.enable is not set to true.
Did you check your broker configs to see whether delete.topic.enable is set to true? If yes, you should be able to delete a topic without stopping ZooKeeper or the brokers. Note, though, that deleting topics is async, i.e., when your command returns the topic is not deleted yet; it will take some time until the deletion is executed.
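If you only want to purge the data while keeping the topic, a commonly used workaround is to temporarily shrink the topic's retention (a sketch, using the customers topic and the paths from your prompt):

C:\kafka_2.11-2.3.1\bin\windows>kafka-configs.bat --zookeeper localhost:2181 --alter --entity-type topics --entity-name customers --add-config retention.ms=1000

Wait for the retention check to delete the old segments (it runs every log.retention.check.interval.ms, 5 minutes by default), then restore the original setting:

C:\kafka_2.11-2.3.1\bin\windows>kafka-configs.bat --zookeeper localhost:2181 --alter --entity-type topics --entity-name customers --delete-config retention.ms

Be aware that segment deletion on Windows can run into the same file-locking problems as topic deletion.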

Kafka send to azure event hub

I've set up Kafka on my machine and I'm trying to set up MirrorMaker to consume from a local topic and mirror it to an Azure Event Hub, but so far I've been unable to do it and I get the following error:
ERROR Error when sending message to topic dev-eh-kafka-test with key: null, value: 5 bytes with error: (org.apache.kafka.clients.producer.internals.ErrorLoggingCallback)
org.apache.kafka.common.errors.TimeoutException: Failed to update metadata after 60000 ms.
After some time I realized that this must be the producer part, so I tried to use the kafka-console-producer tool directly against the Event Hub and got the same error.
Here is my producer settings file:
bootstrap.servers=dev-we-eh-feed.servicebus.windows.net:9093
compression.type=none
max.block.ms=0
# for event hub
sasl.mechanism=PLAIN
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://dev-we-eh-feed.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=*****";
Here is the command to spin the producer:
kafka-console-producer.bat --broker-list dev-we-eh-feed.servicebus.windows.net:9093 --topic dev-eh-kafka-test
My event hub namespace has an event hub named dev-eh-kafka-test.
Has anyone been able to do it? Eventually the idea would be to SSL this with a certificate but first I need to be able to do the connection.
I tried using both Apache Kafka 1.1.1 and Confluent Kafka 4.1.3 (because this is the version the client is using).
==== UPDATE 1
Someone showed me how to get more logs, and this seems to be the detailed version of the error:
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initialize connection to node dev-we-eh-feed.servicebus.windows.net:9093 (id: -1 rack: null) for sending metadata request (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initiating connection to node dev-we-eh-feed.servicebus.windows.net:9093 (id: -1 rack: null) (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Created socket with SO_RCVBUF = 32768, SO_SNDBUF = 102400, SO_TIMEOUT = 0 to node -1 (org.apache.kafka.common.network.Selector)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Completed connection to node -1. Fetching API versions. (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Initiating API versions fetch from node -1. (org.apache.kafka.clients.NetworkClient)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Connection with dev-we-eh-feed.servicebus.windows.net/51.144.238.23 disconnected (org.apache.kafka.common.network.Selector)
java.io.EOFException
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:124)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:559)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:495)
at org.apache.kafka.common.network.Selector.poll(Selector.java:424)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:460)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.base/java.lang.Thread.run(Thread.java:830)
[2020-02-28 17:32:08,010] DEBUG [Producer clientId=console-producer] Node -1 disconnected. (org.apache.kafka.clients.NetworkClient)
So here is the configuration that worked (it seems I was missing client.id).
Also, it seems you cannot choose the destination topic; it must have the same name as the source...
bootstrap.servers=dev-we-eh-feed.servicebus.windows.net:9093
client.id=mirror_maker_producer
request.timeout.ms=60000
sasl.mechanism=PLAIN
security.protocol=SASL_SSL
sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required username="$ConnectionString" password="Endpoint=sb://dev-we-eh-feed.servicebus.windows.net/;SharedAccessKeyName=RootManageSharedAccessKey;SharedAccessKey=******";
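For reference, the MirrorMaker invocation that goes with these files might look like the following (the file names, and a consumer.properties pointing at the local source cluster, are assumptions):

kafka-mirror-maker.bat --consumer.config consumer.properties --producer.config producer.properties --whitelist dev-eh-kafka-test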

Shutdown broker because all log dirs have failed

[2019-10-29 10:09:36,903] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-46,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-36,__consumer_offsets-42,topic-0,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-11,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-39,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-10 and stopped moving logs for partitions because they are in the failed log directory C:\tmp\kafka-logs. (kafka.server.ReplicaManager)
[2019-10-29 10:09:36,908] INFO Stopping serving logs in dir C:\tmp\kafka-logs (kafka.log.LogManager)
[2019-10-29 10:09:36,952] ERROR Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed (kafka.log.LogManager)
I have started ZooKeeper, Kafka, and the producer. But when I try to consume data, this error appears immediately on Windows.
Command: .\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic topic
I had a similar issue and had to do some trial and error. What I eventually did was disable the other JRE versions and leave only one enabled (the attached image is not reproduced here). This seems to have resolved my problem, since my broker doesn't crash anymore.

kafka_2.12-2.3.0 broker shutdown in windows 10

Below is the error I am getting in the console while trying to start the Kafka server with the kafka-server-start command in a command prompt.
ERROR Error while creating log for kafka_example-0 in dir C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs (kafka.server.LogDirFailureChannel)
java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:188)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:174)
at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:240)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:240)
INFO [ReplicaManager broker=0] Stopping serving replicas in dir C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs (kafka.server.ReplicaManager)
ERROR [ReplicaManager broker=0] Error while making broker the leader for partition Topic: kafka_example; Partition: 0; Leader: None; AllReplicas: ; InSyncReplicas: in dir None (kafka.server.ReplicaManager)
org.apache.kafka.common.errors.KafkaStorageException: Error while creating log for kafka_example-0 in dir C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs
Caused by: java.io.IOException: The requested operation cannot be performed on a file with a user-mapped section open
at java.io.RandomAccessFile.setLength(Native Method)
at kafka.log.AbstractIndex.$anonfun$resize$1(AbstractIndex.scala:188)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.log.AbstractIndex.resize(AbstractIndex.scala:174)
at kafka.log.AbstractIndex.$anonfun$trimToValidSize$1(AbstractIndex.scala:240)
at scala.runtime.java8.JFunction0$mcZ$sp.apply(JFunction0$mcZ$sp.java:23)
at kafka.utils.CoreUtils$.inLock(CoreUtils.scala:253)
at kafka.log.AbstractIndex.trimToValidSize(AbstractIndex.scala:240)
at kafka.log.LogSegment.recover(LogSegment.scala:397)
at kafka.log.Log.recoverSegment(Log.scala:493)
at kafka.log.Log.recoverLog(Log.scala:608)
at kafka.log.Log.$anonfun$loadSegments$3(Log.scala:568)
INFO Replica loaded for partition --from-beginning-0 with initial high watermark 0 (kafka.cluster.Replica)
INFO [Partition --from-beginning-0 broker=0] --from-beginning-0 starts at Leader Epoch 0 from offset 0. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
INFO [GroupMetadataManager brokerId=0] Scheduling loading of offsets and group metadata from __consumer_offsets-0 (kafka.coordinator.group.GroupMetadataManager)
INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions Set(__consumer_offsets-0, --from-beginning-0, Kafka_Example-0) (kafka.server.ReplicaFetcherManager)
INFO [ReplicaAlterLogDirsManager on broker 0] Removed fetcher for partitions Set(__consumer_offsets-0, --from-beginning-0, Kafka_Example-0) (kafka.server.ReplicaAlterLogDirsManager)
INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-0,--from-beginning-0,Kafka_Example-0 and stopped moving logs for partitions because they are in the failed log directory C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs. (kafka.server.ReplicaManager)
INFO Stopping serving logs in dir C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs (kafka.log.LogManager)
ERROR Shutdown broker because all log dirs in C:\Users\user11\Softwares\kafka_2.12-2.3.0\kafka_logs have failed (kafka.log.LogManager)
On my local machine I am using Java 8; the Kafka version is the one mentioned in the subject.

Missing messages on Kafka's compacted topic

I have a topic that is compacted:
/opt/kafka/bin/kafka-topics.sh --zookeeper localhost --describe --topic myTopic
Topic:myTopic PartitionCount:1 ReplicationFactor:1 Configs:cleanup.policy=compact
There are no messages on it:
/opt/kafka/bin/kafka-console-consumer.sh --bootstrap-server localhost:9092 --topic myTopic --from-beginning --property print-key=true
^CProcessed a total of 0 messages
Both the earliest and the latest offsets on the only partition are 12, though:
/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic myTopic --time -2
myTopic:0:12
/opt/kafka/bin/kafka-run-class.sh kafka.tools.GetOffsetShell --broker-list localhost:9092 --topic myTopic --time -1
myTopic:0:12
I wonder what could have happened to these 12 messages? The number is correct; I was expecting them to be there, but for some reason they're gone.
As far as I understand, even if all 12 messages had the same key, I should have seen at least one of them; that's how compaction works.
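For example, this is the behavior I expect (a sketch with the console producer; parse.key and key.separator are console-producer properties):

/opt/kafka/bin/kafka-console-producer.sh --broker-list localhost:9092 --topic myTopic --property parse.key=true --property key.separator=:
k1:v1
k1:v2
k2:v1

After the cleaner runs, consuming from the beginning should still return k1:v2 and k2:v1 (the latest value per key), never an empty topic.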
The topic in question was created as compacted. The only weird thing that might have happened during that time is that the Kafka instance lost its Zookeeper data completely. Is it possible that it also caused the data loss?
To rephrase the last question: can something bad happen with the physical data on Kafka if I remove all the Kafka-related ZNodes on Zookeeper?
In addition, here are some logs from Kafka startup.
[2019-04-30 12:02:16,510] WARN [Log partition=myTopic-0, dir=/var/lib/kafka] Found a corrupted index file corresponding to log file /var/lib/kafka/myTopic-0/00000000000000000000.log due to Corrupt index found, index file (/var/lib/kafka/myTopic-0/00000000000000000000.index) has non-zero size but the last offset is 0 which is no greater than the base offset 0.}, recovering segment and rebuilding index files... (kafka.log.Log)
[2019-04-30 12:02:16,524] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Completed load of log with 1 segments, log start offset 0 and log end offset 12 in 16 ms (kafka.log.Log)
[2019-04-30 12:35:34,530] INFO Got user-level KeeperException when processing sessionid:0x16a6e1ea2000001 type:setData cxid:0x1406 zxid:0xd11 txntype:-1 reqpath:n/a Error Path:/config/topics/myTopic Error:KeeperErrorCode = NoNode for /config/topics/myTopic (org.apache.zookeeper.server.PrepRequestProcessor)
[2019-04-30 12:35:34,535] INFO Topic creation Map(myTopic-0 -> ArrayBuffer(0)) (kafka.zk.AdminZkClient)
[2019-04-30 12:35:34,547] INFO [ReplicaFetcherManager on broker 0] Removed fetcher for partitions myTopic-0
(kafka.server.ReplicaFetcherManager)
[2019-04-30 12:35:34,580] INFO [Partition myTopic-0 broker=0] No checkpointed highwatermark is found for partition myTopic-0 (kafka.cluster.Partition)
[2019-04-30 12:35:34,580] INFO Replica loaded for partition myTopic-0 with initial high watermark 0 (kafka.cluster.Replica)
[2019-04-30 12:35:34,580] INFO [Partition myTopic-0 broker=0] myTopic-0 starts at Leader Epoch 0 from offset 12. Previous Leader Epoch was: -1 (kafka.cluster.Partition)
And the messages were indeed removed:
[2019-04-30 12:39:24,199] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Found deletable segments with base offsets [0] due to retention time 10800000ms breach (kafka.log.Log)
[2019-04-30 12:39:24,201] INFO [Log partition=myTopic-0, dir=/var/lib/kafka] Rolled new log segment at offset 12 in 2 ms. (kafka.log.Log)
NoNode for /config/topics/myTopic
Kafka no longer knows this topic exists or that it should be compacted, which seems evident from the log cleaner logs:
due to retention time 10800000ms breach
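If the topic's configuration was lost, the compaction policy can be re-applied (a sketch; it assumes the topic still exists on the broker):

/opt/kafka/bin/kafka-configs.sh --zookeeper localhost:2181 --alter --entity-type topics --entity-name myTopic --add-config cleanup.policy=compact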
So yes, ZooKeeper is very important. But so is gracefully shutting down a broker with kafka-server-stop; otherwise, forcibly killing the process or the host machine can leave you with corrupted partition segments.
I'm not entirely sure what conditions would lead to this:
the last offset is 0 which is no greater than the base offset 0
But assuming you had a full cluster and the topic had a replication factor higher than 1, you could hope that at least one replica remained healthy.
The way to recover a broker with a corrupted index/partition would be to stop the Kafka process, delete the corrupted partition folder from disk, restart Kafka on that machine, and then let it replicate back from a healthy instance.
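A minimal sketch of that procedure (the log dir matches the logs above; the script locations and the existence of a healthy replica elsewhere in the cluster are assumptions):

# Stop the broker gracefully
/opt/kafka/bin/kafka-server-stop.sh

# Remove the corrupted partition directory (log dir taken from the logs above)
rm -rf /var/lib/kafka/myTopic-0

# Restart the broker; it should re-fetch the partition from a healthy replica
/opt/kafka/bin/kafka-server-start.sh -daemon /opt/kafka/config/server.properties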