I had my Kafka Connect connectors paused and, upon restarting them, got these errors in my logs:
[2020-02-19 19:36:00,219] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
************
************
[2020-02-19 19:36:00,216] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 2389 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
I got this error multiple times, with the number of outstanding messages changing each time. Then it stopped, and I haven't seen it again.
Do I need to take any action here, or has Connect retried and committed the offsets, and that is why the error has stopped?
Thanks
The error indicates that a large number of messages are buffered and cannot be flushed before the timeout is reached. To address this issue you can
either increase the offset.flush.timeout.ms configuration parameter in your Kafka Connect worker config,
or reduce the amount of data being buffered by decreasing producer.buffer.memory in your Kafka Connect worker config. This tends to be the best option when you have fairly large messages.
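As a rough sketch, assuming a distributed worker (the property names are the standard ones, but the values are only illustrative), the two options look like this in the worker properties file:

# connect-distributed.properties (illustrative values, not recommendations)
# Option 1: give the producer more time to flush outstanding records on offset commit
# (offset.flush.timeout.ms defaults to 5000 ms)
offset.flush.timeout.ms=10000
# Option 2: buffer less data between flushes
# (buffer.memory defaults to 33554432, i.e. 32 MB; the producer. prefix passes it to the worker's producer)
producer.buffer.memory=16777216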
Related
I have an Apache NiFi workflow that streams data into Kafka. My Kafka cluster is made up of 5 nodes and uses SSL for encryption.
When a lot of data is going through, my Kafka producer (PublishKafkaRecord) freezes and stops working. I have to restart the processor, and I am getting thread errors.
I am using Kafka Confluent 5.3.1.
I am seeing these errors in the Kafka logs:
ERROR Uncaught exception in scheduled task 'transactionalID-expiration' (kafka.utils.KafkaScheduler)
Retrying leaderEpoch request for partitions XXX-0 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
Could not find offset index file corresponding to log file XXX/*.log recovering segment and rebuilding index files (kafka.log.Log)
ERROR when handling request: .... __transaction_state
ERROR TransactionMetadata (... ) failed: this should not happen (kafka.coordinator.transaction.TransactionMetadata)
I cannot pinpoint the actual error.
How can I fix the threads being stuck in Kafka?
I observed the following warning message in my Kafka producer logs. I have a callback to notify me of failures, but I didn't see any failure logs. Can this WARN cause messages to fail to reach the topic?
2020-02-17 10:23:47|[kafka-producer-network-thread | producer-1]|WARN |[o.a.k.c.producer.internals.Sender.warn(251)]|[Producer clientId=producer-1] Got error produce response with correlation id 2031 on topic-partition queue-1, retrying (0 attempts left). Error: NETWORK_EXCEPTION
The phrase "retrying (0 attempts left)" is saying that the producer is going to retry one final time to send the message.
If that fails, I suspect you're going to see another log message, at which point you're losing events due to your network exceptions.
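A retriable error like NETWORK_EXCEPTION only reaches your callback once the retries are exhausted, so silence in the callback while these WARNs appear is expected. A minimal sketch with a plain Java producer (the broker address, topic name, and retry values below are placeholders, not your configuration):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.StringSerializer;

public class RetryAwareProducer {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");                  // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("acks", "all");
        props.put("retries", "5");                                      // how many times a retriable error is retried
        props.put("retry.backoff.ms", "500");                           // wait between retries

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            ProducerRecord<String, String> record = new ProducerRecord<>("queue", "payload"); // placeholder topic
            // The callback fires with a non-null exception only after all retries have failed;
            // until then the producer just logs WARNs like the one above and retries silently.
            producer.send(record, (metadata, exception) -> {
                if (exception != null) {
                    System.err.println("Send finally failed: " + exception);
                } else {
                    System.out.println("Stored at offset " + metadata.offset());
                }
            });
            producer.flush();
        }
    }
}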
I am using Kafka 2, and for some reason the only NiFi processors that will correctly publish my messages to Kafka are PublishKafka (0_9) and PublishKafka_0_10. The later versions don't push my messages through, which is odd because again, I'm running Kafka 2.1.1.
For more information, when I try to run my FlowFile through the later PublishKafka processors, I get a timeout exception that repeats voluminously.
2019-03-11 16:05:34,200 ERROR [Timer-Driven Process Thread-7] o.a.n.p.kafka.pubsub.PublishKafka_2_0 PublishKafka_2_0[id=6d7f1896-0169-1000-ca27-cf7f86f22694] PublishKafka_2_0[id=6d7f1896-0169-1000-ca27-cf7f86f22694] failed to process session due to org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 5000ms.; Processor Administratively Yielded for 1 sec: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 5000ms.
org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 5000ms.
2019-03-11 16:05:34,201 WARN [Timer-Driven Process Thread-7] o.a.n.controller.tasks.ConnectableTask Administratively Yielding PublishKafka_2_0[id=6d7f1896-0169-1000-ca27-cf7f86f22694] due to uncaught Exception: org.apache.kafka.common.errors.TimeoutException: Timeout expired while initializing transactional state in 5000ms.
My processor settings are the following:
All other configurations are defaults.
Any ideas on why this is happening?
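For context, the 5000 ms in that exception is how long a transactional producer will block while setting up its transactional state, and the newer PublishKafka processors can create such a producer when their transaction support is enabled. A minimal sketch of the same call with a plain Java producer (the broker address and transactional.id are placeholders, and 5000 ms is set here only to mirror the log):

import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.common.serialization.StringSerializer;

public class TransactionalInitSketch {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "broker:9092");   // placeholder address
        props.put("key.serializer", StringSerializer.class.getName());
        props.put("value.serializer", StringSerializer.class.getName());
        props.put("transactional.id", "demo-tx-id");      // hypothetical id; NiFi manages its own
        props.put("max.block.ms", "5000");                 // mirrors the 5000ms in the log above

        try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
            // initTransactions() must reach the transaction coordinator and wait for the
            // transactional state to be ready; if that takes longer than max.block.ms it
            // throws the same org.apache.kafka.common.errors.TimeoutException seen above.
            producer.initTransactions();
            System.out.println("Transactional state initialized");
        }
    }
}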
I was running a Kafka cluster with 2 brokers.
But I keep getting the WARN log below.
I checked all my systems, and there was no host using the IP 10.8.7.1.
By the way, there were more IPs in these logs; they look like they come from ZooKeeper or the brokers?
If I shut down one of the Kafka brokers, there are fewer of these WARN logs.
I am not familiar with Kafka and ZooKeeper; I am just getting started and studying.
Any ideas?
Kafka version: 1.0.1
The WARN log is similar to the one below (I get this kind of log about every 10 secs):
[2018-04-19 09:13:08,342] WARN [SocketServer brokerId=0] Unexpected error from /10.8.7.1; closing connection (org.apache.kafka.common.network.Selector)
org.apache.kafka.common.network.InvalidReceiveException: Invalid receive (size = 369295616 larger than 104857600)
at org.apache.kafka.common.network.NetworkReceive.readFromReadableChannel(NetworkReceive.java:132)
at org.apache.kafka.common.network.NetworkReceive.readFrom(NetworkReceive.java:93)
at org.apache.kafka.common.network.KafkaChannel.receive(KafkaChannel.java:235)
at org.apache.kafka.common.network.KafkaChannel.read(KafkaChannel.java:196)
at org.apache.kafka.common.network.Selector.attemptRead(Selector.java:545)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:483)
at org.apache.kafka.common.network.Selector.poll(Selector.java:412)
at kafka.network.Processor.poll(SocketServer.scala:551)
at kafka.network.Processor.run(SocketServer.scala:468)
at java.lang.Thread.run(Thread.java:748)
One possible cause is that a Kafka producer on 10.8.7.1 is attempting to send roughly 0.369 GB of data in a single batch instead of streaming it. You may have to track down that Kafka producer and see what is going on.
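For reference, the 104857600 in the message is the broker's socket.request.max.bytes limit (the default, 100 MB), while a normal producer caps individual requests with max.request.size (1 MB by default), so a legitimate produce request should never come close to 369295616 bytes. The settings involved, shown here with their default values purely as an illustration:

# Broker side (server.properties): the limit that rejected the oversized receive
socket.request.max.bytes=104857600
# Producer side: the usual caps on what a single request/batch can carry
max.request.size=1048576
batch.size=16384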
Hope this helps.
Today I ran into an issue: broker 2 does not exist in ZooKeeper. I thought broker 2 was down, but it is still running fine. I checked the ZooKeeper log, and
it only mentions "Established session 0x153514345be0321 with negotiated timeout 6000 for client broker 2".
From the removed broker's server and state-change logs:
[2016-03-08 06:00:00,257] ERROR Controller 2 epoch 19 initiated state change for partition [robotEvents,186] from OfflinePartition to OnlinePartition failed (state.change.logger)
kafka.common.StateChangeFailedException: encountered error while electing leader for partition [***,186] due to: aborted leader election for partition [****,186] since the LeaderAndIsr path was already written by another controller. This probably means that the current controller 2 went through a soft failure and another controller was elected with epoch 20..
A lot of these kinds of errors occur. From the source code, this error seems to be a normal part of leader election, but it doesn't explain why broker 2 was removed from ZooKeeper. Any idea?
Thanks in advance
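(As a sketch of how one might double-check the registration, assuming the ZooKeeper shell bundled with Kafka is available and using a placeholder ensemble address, the broker ids the cluster currently knows about live under the /brokers/ids znode:

bin/zookeeper-shell.sh zk-host:2181
ls /brokers/ids
get /brokers/ids/2

Broker registrations are ephemeral znodes, so they disappear if the broker's ZooKeeper session expires, even while the broker process itself keeps running.)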