We are using the Apache Flink Kafka consumer to consume the payload. We are intermittently seeing delays in processing. We have added logs to our business logic and everything looks fine, but we keep getting the error below.
[kafka-producer-network-thread | producer-44] WARN org.apache.kafka.clients.producer.internals.Sender - [Producer clientId=producer-44] Got error produce response with correlation id 82 on topic-partition topicname-ingress-0, retrying (2147483646 attempts left). Error: NETWORK_EXCEPTION. Error Message: Disconnected from node 0
[kafka-producer-network-thread | producer-44] WARN org.apache.kafka.clients.producer.internals.Sender - [Producer clientId=producer-44] Received invalid metadata error in produce request on partition topicname-ingress-0 due to org.apache.kafka.common.errors.NetworkException: Disconnected from node 0. Going to request metadata update now
I have an Apache NiFi workflow that streams data into Kafka. My Kafka cluster is made up of 5 nodes and uses SSL for encryption.
When a lot of data is going through, my Kafka producer (PublishKafkaRecord) freezes and stops working. I have to restart the processor, and I am getting thread errors.
I am using Confluent Kafka 5.3.1.
I am seeing these errors in the Kafka logs:
ERROR Uncaught exception in scheduled task 'transactionalID-expiration' (kafka.utils.KafkaScheduler)
Retrying leaderEpoch request for partitions XXX-0 as the leader reported an error: UNKNOWN_LEADER_EPOCH (kafka.server.ReplicaFetcherThread)
Could not find offset index file corresponding to log file XXX/*.log recovering segment and rebuilding index files (kafka.log.Log)
ERROR when handling request: .... __transaction_state
ERROR TransactionMetadata (... ) failed: this should not happen (kafka.coordinator.transaction.TransactionMetadata)
I cannot pinpoint the actual error.
How can I fix the stuck threads in Kafka?
We use a Spring Kafka Streams producer to produce data to a Kafka topic. When we ran a resiliency test, we got the error below.
2020-08-28 16:18:35.536 WARN [,,,] 26 --- [ad | producer-3] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-3] Received invalid metadata error in produce request on partition topic1-0 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now
2020-08-28 16:18:35.536 WARN [,,,] 26 --- [ad | producer-3] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-3] Got error produce response with correlation id 80187 on topic-partition topic1-0, retrying (4 attempts left). Error: NOT_LEADER_FOR_PARTITION
[Producer clientId=producer-3] Received invalid metadata error in produce request on partition topic1-0 due to org.apache.kafka.common.errors.NotLeaderForPartitionException: This server is not the leader for that topic-partition.. Going to request metadata update now.
The warnings should appear only while we are running the resiliency test (broker down/up testing), but they keep occurring even after the test period, and only for one particular partition (here topic1-0). All the other partitions are working fine.
This is the producer config we have:
spring.cloud.stream.kafka.binder.requiredAcks=all
spring.cloud.stream.kafka.binder.configuration.retries=5
spring.cloud.stream.kafka.binder.configuration.metadata.max.age.ms=3000
spring.cloud.stream.kafka.binder.configuration.max.in.flight.requests.per.connection=1
spring.cloud.stream.kafka.binder.configuration.retry.backoff.ms=10000
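For reference, the Spring Cloud Stream Kafka binder passes the `configuration.*` entries straight through to the underlying Kafka producer, so the settings above correspond roughly to this plain producer config (a sketch of the pass-through mapping, not an exact dump):

```properties
# Equivalent raw Kafka producer properties (illustrative)
acks=all                                   # from binder.requiredAcks
retries=5
metadata.max.age.ms=3000                   # forces a metadata refresh every 3 s
max.in.flight.requests.per.connection=1    # preserves ordering across retries
retry.backoff.ms=10000
```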
We have retry config too, and it is retrying to fetch fresh metadata, as you can see in the log above, but we keep getting the same warning for that particular partition. Our Kafka team is also analyzing this issue. I searched Google for a solution but found nothing useful.
Is there any config or anything else missing?
Please help. Thanks in advance.
This error occurs when Kafka is down. Restarting Kafka worked for me! :)
I am using Spring Boot 2.1.9 and Spring Kafka 2.2.9 with Kafka chained transactions.
I keep getting warnings from the Kafka producer, and because of them some functionality occasionally fails.
Why are these errors occurring? Is there a config problem?
2020-05-04 09:12:35.216 WARN [xxxxx-order-service,,,] 10 --- [ad | producer-8] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-8, transactionalId=xxxxx-Order-Service-JOg4T1vFzW4tuc-2] Got error produce response with correlation id 1946 on topic-partition process_event-0, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
2020-05-04 09:12:35.327 WARN [xxxxx-order-service,,,] 10 --- [ad | producer-8] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-8, transactionalId=xxxxx-Order-Service-JOg4T1vFzW4tuc-2] Got error produce response with correlation id 1950 on topic-partition audit-0, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
2020-05-04 09:12:53.512 WARN [xxxxx-order-service,,,] 10 --- [ad | producer-6] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-6, transactionalId=xxxxx-Order-Service-JOg4T1vFzW4tuc-0] Got error produce response with correlation id 5807 on topic-partition process_submitted_page_count-2, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
2020-05-04 09:12:53.632 WARN [xxxxx-order-service,,,] 10 --- [ad | producer-6] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-6, transactionalId=xxxxx-Order-Service-JOg4T1vFzW4tuc-0] Got error produce response with correlation id 5811 on topic-partition process_event-0, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
2020-05-04 09:12:53.752 WARN [xxxxx-order-service,,,] 10 --- [ad | producer-6] o.a.k.clients.producer.internals.Sender : [Producer clientId=producer-6, transactionalId=xxxxx-Order-Service-JOg4T1vFzW4tuc-0] Got error produce response with correlation id 5816 on topic-partition audit-0, retrying (2147483646 attempts left). Error: UNKNOWN_PRODUCER_ID
I assume that you might be hitting this issue.
When a streams application has little traffic, it is possible that consumer purging would delete even the last message sent by a producer (i.e., all the messages sent by this producer have been consumed and committed), and as a result the broker would delete that producer's ID. The next time this producer tries to send, it will get this UNKNOWN_PRODUCER_ID error code, but in this case the error is retriable: the producer would just get a new producer ID and retry, and this time it will succeed.
Proposed Solution: Upgrade Kafka
This issue has been fixed in versions 2.4.0+, so if you are still hitting it, you need to upgrade to a newer Kafka version.
Alternative Solution: Increase retention time & transactional.id.expiration.ms
Alternatively, if you cannot (or don't want to) upgrade, you can increase the retention period (log.retention.hours) as well as transactional.id.expiration.ms, which defines the amount of inactivity time that must pass for a producer to be considered expired (defaults to 7 days).
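If you go the configuration route, both settings live on the broker side (server.properties). A minimal sketch, with illustrative values rather than recommendations:

```properties
# server.properties (broker) -- illustrative values, tune for your workload
log.retention.hours=336                     # keep log segments for 14 days instead of 7
transactional.id.expiration.ms=1209600000   # expire idle transactional producers after 14 days (default 604800000 = 7 days)
```

The idea is simply to keep transactional.id.expiration.ms no shorter than the retention window, so a quiet producer's ID isn't forgotten while its data is still retained.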
I had my Kafka Connectors paused, and upon restarting them I got these errors in my logs:
[2020-02-19 19:36:00,219] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to commit offsets (org.apache.kafka.connect.runtime.SourceTaskOffsetCommitter)
************
************
[2020-02-19 19:36:00,216] ERROR WorkerSourceTask{id=wem-postgres-source-0} Failed to flush, timed out while waiting for producer to flush outstanding 2389 messages (org.apache.kafka.connect.runtime.WorkerSourceTask)
I got this error multiple times, with the number of outstanding messages changing each time. Then it stopped, and I haven't seen it since.
Do I need to take any action here, or has Connect retried and committed the offsets, and is that why the error has stopped?
Thanks
The error indicates that many messages are buffered and cannot be flushed before the timeout is reached. To address this issue you can
either increase the offset.flush.timeout.ms configuration parameter in your Kafka Connect worker configs,
or reduce the amount of data being buffered by decreasing producer.buffer.memory in your Kafka Connect worker configs. This turns out to be the best option when you have fairly large messages.
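As a sketch, both knobs go in the Connect worker properties file (e.g. connect-distributed.properties); the values below are illustrative, not recommendations:

```properties
# Kafka Connect worker config -- illustrative values
offset.flush.timeout.ms=10000      # allow up to 10 s to flush outstanding messages (default 5000)
producer.buffer.memory=16777216    # shrink the embedded producer's buffer from the 32 MB default to 16 MB
```

Note the `producer.` prefix: Connect worker configs forward prefixed entries to the producer used by source tasks, which is the producer doing the flushing in the error above.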