flink checkpoint and kafka producer exactly-once - apache-kafka

When I create a Kafka producer with exactly-once semantics and also enable checkpointing, it leads to the following problem:
java.lang.InterruptedException
at java.lang.Object.wait(Native Method)
at java.lang.Thread.join(Thread.java:1260)
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1155)
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1132)
at org.apache.kafka.clients.producer.KafkaProducer.close(KafkaProducer.java:1111)
at org.apache.flink.streaming.connectors.kafka.internal.FlinkKafkaInternalProducer.close(FlinkKafkaInternalProducer.java:150)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.abortTransactions(FlinkKafkaProducer.java:1093)
at org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer.initializeState(FlinkKafkaProducer.java:1031)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.tryRestoreFunction(StreamingFunctionUtils.java:178)
at org.apache.flink.streaming.util.functions.StreamingFunctionUtils.restoreFunctionState(StreamingFunctionUtils.java:160)
at org.apache.flink.streaming.api.operators.AbstractUdfStreamOperator.initializeState(AbstractUdfStreamOperator.java:96)
at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:281)
at org.apache.flink.streaming.runtime.tasks.StreamTask.initializeState(StreamTask.java:881)
at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:395)
at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:705)
at org.apache.flink.runtime.taskmanager.Task.run(Task.java:530)
at java.lang.Thread.run(Thread.java:748)
How can I solve it?
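The InterruptedException here is usually secondary: it tends to appear when the task thread is interrupted (for example because the job is already failing or being cancelled) while FlinkKafkaProducer is aborting leftover transactions during state restore, so it is worth checking the log for an earlier failure. For reference, below is a minimal sketch of the kind of setup being described, assuming the universal FlinkKafkaProducer connector; the topic name, broker address and checkpoint interval are placeholders. One well-documented requirement for EXACTLY_ONCE is that the producer's transaction.timeout.ms (which the connector defaults to 1 hour) must not exceed the broker's transaction.max.timeout.ms (15 minutes by default), so it normally has to be set explicitly:

import java.nio.charset.StandardCharsets;
import java.util.Properties;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;
import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ExactlyOnceSinkSketch {
    public static void main(String[] args) throws Exception {
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Exactly-once delivery to Kafka requires checkpointing to be enabled.
        env.enableCheckpointing(60_000);

        Properties props = new Properties();
        props.setProperty(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "broker:9092");
        // Must not exceed the broker's transaction.max.timeout.ms; the connector otherwise defaults this to 1 hour.
        props.setProperty(ProducerConfig.TRANSACTION_TIMEOUT_CONFIG, "900000");

        FlinkKafkaProducer<String> sink = new FlinkKafkaProducer<>(
                "my-topic",
                (KafkaSerializationSchema<String>) (element, timestamp) ->
                        new ProducerRecord<byte[], byte[]>("my-topic", element.getBytes(StandardCharsets.UTF_8)),
                props,
                FlinkKafkaProducer.Semantic.EXACTLY_ONCE);

        env.fromElements("a", "b", "c").addSink(sink);
        env.execute("exactly-once sketch");
    }
}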

Related

MongoDB Kafka Connect can't send large kafka messages

I am trying to send large JSON data (more than 1 MB) from MongoDB with the Kafka connector. It works well for small data, but I get the following error when working with big JSON documents:
[2022-09-27 11:13:48,290] ERROR [source_mongodb_connector|task-0] WorkerSourceTask{id=source_mongodb_connector-0} Task threw an uncaught and unrecoverable exception. Task is being killed and will not recover until manually restarted (org.apache.kafka.connect.runtime.WorkerTask:195)
org.apache.kafka.connect.errors.ConnectException: Unrecoverable exception from producer send callback
at org.apache.kafka.connect.runtime.WorkerSourceTask.maybeThrowProducerSendException(WorkerSourceTask.java:290)
at org.apache.kafka.connect.runtime.WorkerSourceTask.sendRecords(WorkerSourceTask.java:351)
at org.apache.kafka.connect.runtime.WorkerSourceTask.execute(WorkerSourceTask.java:257)
at org.apache.kafka.connect.runtime.WorkerTask.doRun(WorkerTask.java:188)
at org.apache.kafka.connect.runtime.WorkerTask.run(WorkerTask.java:243)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:750)
Caused by: org.apache.kafka.common.errors.RecordTooLargeException: The message is 2046979 bytes when serialized which is larger than 1048576, which is the value of the max.request.size configuration.
I tried to configure the topic; here is its description:
hadoop#vps-data1 ~/kafka $ bin/kafka-configs.sh --bootstrap-server 192.168.13.80:9092,192.168.13.81:9092,192.168.13.82:9092 --entity-type topics --entity-name prefix.large.topicData --describe
Dynamic configs for topic prefix.large.topicData are:
max.message.bytes=1280000 sensitive=false synonyms={DYNAMIC_TOPIC_CONFIG:max.message.bytes=1280000, STATIC_BROKER_CONFIG:message.max.bytes=419430400, DEFAULT_CONFIG:message.max.bytes=1048588}
Indeed, I configured the producer, consumer, and server properties files, but the same problem keeps occurring.
Any help would be appreciated.
The solution is to configure both the Kafka and the Kafka Connect properties: the RecordTooLargeException above comes from Connect's internal producer, whose max.request.size is still at its 1 MB default, and the topic's max.message.bytes of 1280000 is itself still smaller than the 2046979-byte record.
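A hedged sketch of the settings that usually have to agree; the file names and the 20 MB value are placeholders, and the producer. prefix is how the Connect worker passes overrides to the internal producer it uses for source records:

# Connect worker (connect-distributed.properties or connect-standalone.properties)
producer.max.request.size=20971520
# or per connector, if the worker permits client overrides:
#   connector.client.config.override.policy=All          (worker property)
#   "producer.override.max.request.size": "20971520"     (in the connector JSON)

# Broker (server.properties): must accept records at least this large
message.max.bytes=20971520
replica.fetch.max.bytes=20971520

# Topic (the value shown by kafka-configs.sh above, 1280000, is too small)
max.message.bytes=20971520

# Any consumer reading the topic
max.partition.fetch.bytes=20971520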

Kafka Structured streaming application throwing IllegalStateException when there is a gap in the offset

I have a Structured Streaming application running with Kafka on Spark 2.3; the "spark-sql-kafka-0-10_2.11" version is 2.3.0.
The application starts reading messages and processes them successfully, then after reaching a specific offset (shown in the exception message) it throws the following exception:
java.lang.IllegalStateException: Tried to fetch 666 but the returned record offset was 665
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.org$apache$spark$sql$kafka010$InternalKafkaConsumer$$fetchData(KafkaDataConsumer.scala:297)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:163)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer$$anonfun$get$1.apply(KafkaDataConsumer.scala:147)
at org.apache.spark.util.UninterruptibleThread.runUninterruptibly(UninterruptibleThread.scala:77)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.runUninterruptiblyIfPossible(KafkaDataConsumer.scala:109)
at org.apache.spark.sql.kafka010.InternalKafkaConsumer.get(KafkaDataConsumer.scala:147)
at org.apache.spark.sql.kafka010.KafkaDataConsumer$class.get(KafkaDataConsumer.scala:54)
at org.apache.spark.sql.kafka010.KafkaDataConsumer$CachedKafkaDataConsumer.get(KafkaDataConsumer.scala:362)
at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:151)
at org.apache.spark.sql.kafka010.KafkaSourceRDD$$anon$1.getNext(KafkaSourceRDD.scala:142)
at org.apache.spark.util.NextIterator.hasNext(NextIterator.scala:73)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at scala.collection.Iterator$$anon$11.hasNext(Iterator.scala:408)
at org.apache.spark.sql.catalyst.expressions.GeneratedClass$GeneratedIteratorForCodegenStage1.processNext(Unknown Source)
at org.apache.spark.sql.execution.BufferedRowIterator.hasNext(BufferedRowIterator.java:43)
at org.apache.spark.sql.execution.WholeStageCodegenExec$$anonfun$10$$anon$1.hasNext(WholeStageCodegenExec.scala:614)
at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:52)
at org.apache.spark.sql.execution.streaming.ForeachSink$$anonfun$addBatch$1.apply(ForeachSink.scala:49)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.rdd.RDD$$anonfun$foreachPartition$1$$anonfun$apply$29.apply(RDD.scala:935)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.SparkContext$$anonfun$runJob$5.apply(SparkContext.scala:2074)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:109)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:381)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
It always fails on the same offset. This looks like it is due to a gap in the offsets: I saw in the Kafka UI that offset 665 is followed by 667 (666 was skipped for some reason), and the Kafka client in my Structured Streaming application tries to fetch 666 and fails.
After digging inside Spark's code, I see that they did not expect this exception to happen (according to the comment):
https://github.com/apache/spark/blob/branch-2.3/external/kafka-0-10-sql/src/main/scala/org/apache/spark/sql/kafka010/KafkaDataConsumer.scala#L297
So I am wondering, am I doing something wrong?
Or is this a bug on the specific version I am using?
There was a longstanding issue in Spark that created a bit of an impedance mismatch between Kafka and Spark; it was fixed in Spark 2.4. Part of the fix was backported to Spark 2.3.1, but it is only enabled if the configuration option spark.streaming.kafka.allowNonConsecutiveOffsets is set to true. As you observe, it is quite possible that you are hitting something that was not backported, in which case upgrading to Spark 2.4 might be worth considering.
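If you want to try the partially backported fix on 2.3.1 before upgrading, the flag is an ordinary Spark configuration entry. A sketch (the application name is a placeholder, and, as noted above, the flag may simply not cover the code path you are hitting):

import org.apache.spark.sql.SparkSession;

public class AllowGapsInOffsets {
    public static void main(String[] args) {
        SparkSession spark = SparkSession.builder()
                .appName("structured-streaming-kafka")
                // gate for the part of the fix that was backported to Spark 2.3.1
                .config("spark.streaming.kafka.allowNonConsecutiveOffsets", "true")
                .getOrCreate();

        // ... define the Kafka readStream / writeStream query as before ...

        spark.stop();
    }
}

The same flag can also be passed on the command line via spark-submit --conf spark.streaming.kafka.allowNonConsecutiveOffsets=true.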

Kafka CommitFailedException occurs when a breakpoint is used on a method annotated with @KafkaListener

I do not understand why I get a CommitFailedException when I use a breakpoint on a method annotated with @KafkaListener.
I know for sure that I am not exceeding the consumer's metadata.max.age.ms = 300000, and furthermore I am using max.poll.records = 1.
It seems like the heartbeat thread is timing out, but my understanding is that the heartbeat thread is independent of the poll thread.
I can see that the following condition becomes true in the AbstractCoordinator class, so the markCoordinatorUnknown() method is executed:
else if (AbstractCoordinator.this.heartbeat.sessionTimeoutExpired(now)) {
    AbstractCoordinator.this.markCoordinatorUnknown();
}
I am using Spring Boot 2.3.5, which comes with kafka-clients 2.5.1, and IntelliJ IDEA 2019.
Apologies if I have not provided detailed information about the entire setup, but my goal is to see whether other developers experience the same issue.
In production this issue does not happen, since (of course) the application is not running in debug mode :-)
Following is the error in the logs:
2021-03-03 12:26:13.830 ERROR 664 --- [ntainer#0-0-C-1] essageListenerContainer$ListenerConsumer : Consumer exception
java.lang.IllegalStateException: This error handler cannot process 'org.apache.kafka.clients.consumer.CommitFailedException's; no record information is available
at org.springframework.kafka.listener.SeekUtils.seekOrRecover(SeekUtils.java:151) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.SeekToCurrentErrorHandler.handle(SeekToCurrentErrorHandler.java:113) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.handleConsumerException(KafkaMessageListenerContainer.java:1368) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1070) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at java.base/java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515) ~[na:na]
at java.base/java.util.concurrent.FutureTask.run(FutureTask.java:264) ~[na:na]
at java.base/java.lang.Thread.run(Thread.java:834) ~[na:na]
Caused by: org.apache.kafka.clients.consumer.CommitFailedException: Offset commit cannot be completed since the consumer is not part of an active group for auto partition assignment; it is likely that the consumer was kicked out of the group.
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.sendOffsetCommitRequest(ConsumerCoordinator.java:1116) ~[kafka-clients-2.5.1.jar:na]
at org.apache.kafka.clients.consumer.internals.ConsumerCoordinator.commitOffsetsSync(ConsumerCoordinator.java:983) ~[kafka-clients-2.5.1.jar:na]
at org.apache.kafka.clients.consumer.KafkaConsumer.commitSync(KafkaConsumer.java:1510) ~[kafka-clients-2.5.1.jar:na]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.doCommitSync(KafkaMessageListenerContainer.java:2324) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitSync(KafkaMessageListenerContainer.java:2319) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.commitIfNecessary(KafkaMessageListenerContainer.java:2305) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.processCommits(KafkaMessageListenerContainer.java:2119) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.pollAndInvoke(KafkaMessageListenerContainer.java:1104) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
at org.springframework.kafka.listener.KafkaMessageListenerContainer$ListenerConsumer.run(KafkaMessageListenerContainer.java:1038) ~[spring-kafka-2.5.7.RELEASE.jar:2.5.7.RELEASE]
... 3 common frames omitted
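One likely explanation: the heartbeat thread is indeed separate from the polling thread, but an IntelliJ breakpoint with its default "Suspend: All" setting pauses every thread in the JVM, heartbeat thread included, so once session.timeout.ms elapses the broker removes the consumer from the group and the next commit fails exactly as shown above. Switching the breakpoint to "Suspend: Thread" (right-click the breakpoint) usually avoids this. Alternatively, the timeouts can be relaxed for local debugging only; a sketch, assuming a profile-specific Spring Boot properties file (the profile name and the values are arbitrary, and the larger session timeout must stay within the broker's group.max.session.timeout.ms):

# application-debug.properties, activated only while debugging (e.g. --spring.profiles.active=debug)
spring.kafka.consumer.properties.session.timeout.ms=300000
spring.kafka.consumer.properties.max.poll.interval.ms=600000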

Strange error on Kafka broker

In our production Kafka broker I found this strange error in server.log. Because of it, message sending to one of the topics was impacted; the producer was getting the error "Partition count is 0: should refresh metadata". The Kafka version is 0.10.0.1, on OpenJDK Java 1.8.
Can anyone help me understand what this could mean?
[2018-01-10 17:23:51,411] ERROR Processor got uncaught exception. (kafka.network.Processor)
java.lang.NoClassDefFoundError: Could not initialize class java.net.IDN
at javax.net.ssl.SNIHostName.<init>(SNIHostName.java:175)
at sun.security.ssl.ServerNameExtension.<init>(ServerNameExtension.java:137)
at sun.security.ssl.HelloExtensions.<init>(HelloExtensions.java:78)
at sun.security.ssl.HandshakeMessage$ClientHello.<init>(HandshakeMessage.java:250)
at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:217)
at sun.security.ssl.Handshaker.processLoop(Handshaker.java:979)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:919)
at sun.security.ssl.Handshaker$1.run(Handshaker.java:916)
at java.security.AccessController.doPrivileged(Native Method)
at sun.security.ssl.Handshaker$DelegatedTask.run(Handshaker.java:1369)
at org.apache.kafka.common.network.SslTransportLayer.runDelegatedTasks(SslTransportLayer.java:336)
at org.apache.kafka.common.network.SslTransportLayer.handshakeUnwrap(SslTransportLayer.java:414)
at org.apache.kafka.common.network.SslTransportLayer.handshake(SslTransportLayer.java:270)
at org.apache.kafka.common.network.KafkaChannel.prepare(KafkaChannel.java:62)
at org.apache.kafka.common.network.Selector.pollSelectionKeys(Selector.java:338)
at org.apache.kafka.common.network.Selector.poll(Selector.java:291)
at kafka.network.Processor.poll(SocketServer.scala:476)
at kafka.network.Processor.run(SocketServer.scala:416)
at java.lang.Thread.run(Thread.java:745)

Consumer group can't rebalance

I am learning Kafka and I am having some problems with the example code I found here: https://cwiki.apache.org/confluence/display/KAFKA/Consumer+Group+Example
Each time I run the code it throws the following exception:
Exception in thread "main" kafka.common.ConsumerRebalanceFailedException: ttt_NB644-1475151991986-76dfa03f can't rebalance after 4 retries
at kafka.consumer.ZookeeperConsumerConnector$ZKRebalancerListener.syncedRebalance(ZookeeperConsumerConnector.scala:670)
at kafka.consumer.ZookeeperConsumerConnector.kafka$consumer$ZookeeperConsumerConnector$$reinitializeConsumer(ZookeeperConsumerConnector.scala:977)
at kafka.consumer.ZookeeperConsumerConnector.consume(ZookeeperConsumerConnector.scala:264)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:85)
at kafka.javaapi.consumer.ZookeeperConsumerConnector.createMessageStreams(ZookeeperConsumerConnector.scala:97)
at com.glowbyte.kafka.consumertest.ConsumerGroupExample.run(ConsumerGroupExample.java:44)
at com.glowbyte.kafka.consumertest.ConsumerGroupExample.main(ConsumerGroupExample.java:78)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:498)
at com.intellij.rt.execution.application.AppMain.main(AppMain.java:140)
The Kafka version I'm using is 0.10, the latest one.
There is only one topic with one broker and two partitions, and I'm trying to run the code with 2 threads.
In the meantime, another, simpler piece of code runs successfully in the same environment, also with 2 threads. So I'd like to understand what is causing the described exception. Thanks.
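Hard to say without seeing the consumer properties, but that wiki example uses the old ZooKeeper-based high-level consumer, and "can't rebalance after 4 retries" matches the default rebalance.max.retries=4 with a 2000 ms rebalance.backoff.ms. A common first step is to raise both in the properties that the example's createConsumerConfig(...) helper builds; a sketch, assuming your code follows the wiki example (the values are arbitrary):

import java.util.Properties;
import kafka.consumer.ConsumerConfig;

public class ConsumerGroupConfig {
    // Same shape as the wiki example's createConsumerConfig(a_zookeeper, a_groupId).
    public static ConsumerConfig createConsumerConfig(String zookeeper, String groupId) {
        Properties props = new Properties();
        props.put("zookeeper.connect", zookeeper);
        props.put("group.id", groupId);
        props.put("zookeeper.session.timeout.ms", "6000");
        props.put("zookeeper.sync.time.ms", "2000");
        props.put("auto.commit.interval.ms", "1000");
        // Defaults are 4 retries with a 2000 ms backoff; a backoff longer than the
        // ZooKeeper session timeout gives a stale registration from a previous run
        // time to expire before the rebalance gives up.
        props.put("rebalance.max.retries", "10");
        props.put("rebalance.backoff.ms", "8000");
        return new ConsumerConfig(props);
    }
}

If the backoff is shorter than the ZooKeeper session timeout, a consumer left over from a previous run can still hold its registration while the new one exhausts its retries, which is a frequent cause of this exception.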