Wrong Kafka consumer_offset - apache-kafka

I am currently using the Confluent Platform community license. I started ZooKeeper, Kafka and the Schema Registry, all running in local mode. However, when the Schema Registry is started for the first time, the 50 partitions of the __consumer_offsets topic are created (__consumer_offsets-0 to __consumer_offsets-49). Those partition directories are stored in kafka-logs, and when I try to start the services again, the startup fails. To be more precise: ZooKeeper works, but Kafka fails with the error:
"ERROR Shutdown broker because all log dirs have failed".
As suggested in some other posts, I deleted the data directory referenced in the zookeeper.properties file and the log.dirs directory referenced in the server.properties file. After doing this I can start Kafka again without any error, but the 50 __consumer_offsets partitions are created again as soon as the Schema Registry starts, and after stopping Kafka and trying to start it again, it fails with the same error.
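For reference, the settings I mean look roughly like this (the server.properties path is the one from my logs; the zookeeper.properties key is actually dataDir, and its path here is only illustrative):

# server.properties
log.dirs=/mnt/c/Users/Username/Desktop/Big_Data/confluent-6.0.0/kafka-logs

# zookeeper.properties
dataDir=/tmp/zookeeper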
Any help is greatly appreciated. :)
EDIT:
Above that error there is another error saying:
"ERROR Failed to clean up log for _schemas-0 in dir /mnt/c/Users/Username/Desktop/Big_Data/confluent-6.0.0/kafka-logs due to IOException (kafka.server.LogDirFailureChannel) java.io.IOException: Invalid argument"
and also two warnings:
"WARN [ReplicaManager broker=0] Stopping serving replicas in dir /mnt/c/Users/Username/Desktop/Big_Data/confluent-6.0.0/kafka-logs (kafka.server.ReplicaManager)"
and
"WARN [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-22, ... (all of the 50 offsets are then listed)"

Related

Kafka: Producer threads get stuck

I have an Apache NiFi workflow that streams data into Kafka. My Kafka cluster is made up of 5 nodes and uses SSL for encryption.
When a lot of data is going through, my Kafka producer (PublishKafkaRecord) freezes and stops working. I have to restart the processor, and I am getting thread errors.
I am using Confluent Kafka 5.3.1.
I am seeing these errors in the Kafka logs:
ERROR Uncaught exception in scheduled task 'transactionalID-expiration' (kafka.utils.KafkaScheduler)
Retrying leaderEpoch request for partitions XXX-0 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
Could not find offset index file corresponding to log file XXX/*.log recovering segment and rebuilding index files (kafka.log.Log)
ERROR when handling request: .... __transaction_state
ERROR TransactionMetadata (... ) failed: this should not happen (kafka.coordinator.transaction.TransactionMetadata)
I cannot pinpoint the actual error.
How can I fix the threads getting stuck in Kafka?

Kafka server crashing when Confluent KSQL is used on the Windows 10 Linux Subsystem

When I use Confluent KSQL to create a stream and, after resetting the offset to earliest, try to query the stream, I see the Kafka server crash. I am using Windows 10 and I have tried both Ubuntu and Debian as the WSL distribution.
I have tried to clear the log folder with sudo rm -fr /tmp/confl* and to restart the Confluent Platform using confluent local start, but Kafka does not start.
Below is the error I am seeing in confluent local log kafka:
INFO [Transaction State Manager 0]: Loading transaction metadata from __transaction_state-8 at epoch 0 (kafka.coordinator.transaction.TransactionStateManager)
[2020-06-26 11:27:26,208] **ERROR Error while renaming dir for _confluent-ksql-default_transient_1143297338875599674_1593157827320-Aggregate-Aggregate-Materialize-changelog-0 in log dir /tmp/confluent.HBnj6u7x/kafka/data (kafka.server.LogDirFailureChannel)
java.nio.file.AccessDeniedException: /tmp/confluent.HBnj6u7x/kafka/data/_confluent-ksql-default_transient_1143297338875599674_1593157827320-Aggregate-Aggregate-Materialize-changelog-0 -> /tmp/confluent.HBnj6u7x/kafka/data/_confluent-ksql-default_transient_1143297338875599674_1593157827320-Aggregate-Aggregate-Materialize-changelog-0.355fe6c61afa41609e74e252e3dbac92-delete**
[2020-06-26 11:27:26,287] WARN Stopping serving logs in dir /tmp/confluent.HBnj6u7x/kafka/data (kafka.log.LogManager)
[2020-06-26 11:27:26,292] **ERROR Shutdown broker because all log dirs in /tmp/confluent.HBnj6u7x/kafka/data have failed (kafka.log.LogManager)**
[2020-06-26 11:27:26,294] INFO [Transaction State Manager 0]: Completed loading transaction metadata from __transaction_state-44 for coordinator epoch 0 (kafka.coordinator.transaction.TransactionStateManager)
[2020-06-26 11:27:26,295] INFO [Transaction State Manager 0]: Loading transaction metadata from __transaction_state-27 at epoch 0 (kafka.coordinator.transaction.TransactionStateManager)
The java.nio.file.AccessDeniedException error message tells me that something is off with KSQL's ability to read/write files on the filesystem provided by WSL. This might be related to a common problem of WSL not providing proper Linux permissions on the filesystem. A possible fix is described in this blog post.
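For example, one commonly suggested form of that fix (this is my assumption about what the blog describes, and it only applies if the data directory lives on a Windows-backed mount such as /mnt/c) is to enable DrvFs metadata so that Linux permission bits are honored, by creating /etc/wsl.conf inside the distribution with the following content:

[automount]
enabled = true
options = "metadata"

After running wsl --shutdown from Windows and starting the distribution again, files under /mnt/c should carry proper Linux permissions.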
Regardless, I would encourage you to try ksqlDB (the community version of Confluent KSQL), which provides ready-to-use Docker-based scripts, so you can abstract away these filesystem problems and jump straight to the coding part.
ksqlDB Quickstart

Shutdown broker because all log dirs have failed

[2019-10-29 10:09:36,903] INFO [ReplicaManager broker=0] Broker 0 stopped fetcher for partitions __consumer_offsets-30,__consumer_offsets-8,__consumer_offsets-4,__consumer_offsets-27,__consumer_offsets-7,__consumer_offsets-46,__consumer_offsets-33,__consumer_offsets-23,__consumer_offsets-49,__consumer_offsets-36,__consumer_offsets-42,topic-0,__consumer_offsets-17,__consumer_offsets-48,__consumer_offsets-11,__consumer_offsets-14,__consumer_offsets-20,__consumer_offsets-0,__consumer_offsets-39,__consumer_offsets-45,__consumer_offsets-1,__consumer_offsets-26,__consumer_offsets-29,__consumer_offsets-10 and stopped moving logs for partitions because they are in the failed log directory C:\tmp\kafka-logs. (kafka.server.ReplicaManager)
[2019-10-29 10:09:36,908] INFO Stopping serving logs in dir C:\tmp\kafka-logs (kafka.log.LogManager)
[2019-10-29 10:09:36,952] ERROR Shutdown broker because all log dirs in C:\tmp\kafka-logs have failed (kafka.log.LogManager)
I have started ZooKeeper, Kafka and the producer. But when I try to consume data, this error appears immediately on Windows.
command: .\bin\windows\kafka-console-consumer.bat --bootstrap-server localhost:9092 --topic topic
I had a similar issue and had to resort to trial and error. What I eventually did was disable the other JRE versions and leave only one enabled (see the attached image). This seems to have resolved my problem, since my broker doesn't crash anymore.

Apache Kafka: lots of WARN logs showing "Unexpected error"

I am running Kafka with 2 brokers in a cluster.
But I keep getting the WARN log below.
I checked all my systems and there is no host using the IP 10.8.7.1.
By the way, there are more IPs in these warnings; do they come from ZooKeeper or the brokers?
If I shut down one of the Kafka brokers, there are fewer of these WARN logs.
I am not familiar with Kafka and ZooKeeper; I am just getting started and studying.
Any ideas?
Kafka version: 1.0.1
The WARN log looks like the one below (I get this kind of log about every 10 seconds):
[2018-04-19 09:13:08,342] WARN [SocketServer brokerId=0] Unexpected error from /10.8.7.1; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 369295616 larger than 104857600)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:132)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:545)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:483)
at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
at kafka.network.Processor.poll(SocketServer.scala:551)
at kafka.network.Processor.run(SocketServer.scala:468)
at java.lang.Thread.run(Thread.java:748)
One possible cause is that a Kafka producer on 10.8.7.1 is attempting to send roughly 0.37 GB of data in a single request instead of streaming it in smaller batches. You may have to track down that Kafka producer and see what's going on.
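If the traffic turns out to be legitimate, note that the 104857600-byte limit in the warning is the broker's socket.request.max.bytes setting (100 MB by default), and on the producer side the request size is governed by max.request.size and batch.size. A rough sketch of the knobs to inspect (values shown are the defaults, purely for illustration, not a recommendation):

# broker: server.properties
socket.request.max.bytes=104857600

# producer configuration
max.request.size=1048576
batch.size=16384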
Hope this helps.

When does Kafka change the leader?

My services that work with Kafka had been running for a year with no spontaneous leader changes.
But for the last 2 weeks this has started happening quite often.
Kafka log on that:
[2015-09-27 15:35:14,826] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic] (kafka.server.ReplicaFetcherManager)
[2015-09-27 15:35:14,830] INFO Truncating log myTopic-0 to offset 11520979. (kafka.log.Log)
[2015-09-27 15:35:14,845] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 713276 from client ReplicaFetcherThread-0-2 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:14,857] WARN [Replica Manager on Broker 2]: Fetch request with correlation id 256685 from client mirrormaker-1 on partition [myTopic,0] failed due to Leader not local for partition [myTopic,0] on broker 2 (kafka.server.ReplicaManager)
[2015-09-27 15:35:20,171] INFO [ReplicaFetcherManager on broker 2] Removed fetcher for partitions [myTopic,0] (kafka.server.ReplicaFetcherManager)
What can cause the leader to switch? If there is info on this in the Kafka documentation, please just point me to the link; I have failed to find it.
System configuration
kafka version: kafka_2.10-0.8.2.1
os: Red Hat Enterprise Linux Server release 6.5 (Santiago)
server.properties (differs from default):
broker.id=001
socket.send.buffer.bytes=1048576
socket.receive.buffer.bytes=1048576
socket.request.max.bytes=104857600
log.flush.interval.messages=10000
log.flush.interval.ms=1000
log.retention.bytes=-1
controlled.shutdown.enable=true
auto.create.topics.enable=false
It looks like the lead broker is down for that partition. It might be that the data directory (log.dirs) configured in server.properties is out of space and the broker is not able to accommodate any more data.
Also, what is the replication factor of the topic and how many brokers are in the cluster?
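A quick way to check that, assuming log.dirs points at something like /var/kafka-logs (replace with your actual path):

df -h /var/kafka-logs
du -sh /var/kafka-logs/*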
I am assuming you have one topic and one partition with a replication factor of 2, which is not a good configuration for optimal Kafka performance and consumers.
Your logs are not clear enough to explain the leader switch. The main issue may be that your topic has only one leader because it has only one partition, and that single log is getting bigger day by day. Kafka internally does rebalancing at some level (the details are not confirmed); that could be the reason for your leader switch, but I am not sure.
Also, your second log line says the log is truncated. Can you please go through the logs in detail and check whether this happens only after a truncation?
As you already mentioned, you have checked your Kafka log directory files and their sizes. Please run describe when you hit this issue; the leader switch will show up there as well. Or, if you can set up a dashboard that displays the leader over time, it will be easier to find the root cause.
bin/kafka-topics.sh --describe --zookeeper Zookeeperhost:Port --topic TopicName
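The output looks roughly like this (values here are purely illustrative); compare the Leader column across runs:

Topic: TopicName    PartitionCount: 1    ReplicationFactor: 2    Configs:
    Topic: TopicName    Partition: 0    Leader: 1    Replicas: 1,2    Isr: 1,2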
Suggestion: I suggest you create a new topic with more partitions (read the Kafka documentation to get a good idea of the optimal number of partitions) and start writing to it. Alternatively, you can look into how to add partitions to the current topic.
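For example (topic name and counts are illustrative; this Kafka version still uses the --zookeeper flag for topic commands):

bin/kafka-topics.sh --create --zookeeper Zookeeperhost:Port --topic NewTopicName --partitions 6 --replication-factor 2
bin/kafka-topics.sh --alter --zookeeper Zookeeperhost:Port --topic TopicName --partitions 6

Keep in mind that adding partitions to an existing topic changes how keyed messages map to partitions.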
One last thing: is the leader switch causing issues for your clients, or are you only worried about the warnings?