How to remove the state of a GlobalKTable store? - apache-kafka

I want to remove the state of my GlobalKTable<Integer, Long> store.
I tried to remove the state by:
deleting the entire Kafka topic by calling kafka-topics.sh --zookeeper localhost:8080 --delete --topic my-topic
running KafkaStreams.cleanUp() on application startup
producing a 'Test:null' message to my stream, as null values should be treated as a DELETE statement in the store, as described here. However, my streams application fails as the null value can not be deserialized to LONG.
See the following exception:
org.apache.kafka.streams.errors.StreamsException: Deserialization exception handler is set to fail upon a deserialization error. If you would rather have the streaming pipeline continue after a deserialization error, please set the default.deserialization.exception.handler appropriately.
at org.apache.kafka.streams.processor.internals.RecordDeserializer.deserialize(RecordDeserializer.java:74)
at org.apache.kafka.streams.processor.internals.RecordQueue.addRawRecords(RecordQueue.java:91)
at org.apache.kafka.streams.processor.internals.PartitionGroup.addRawRecords(PartitionGroup.java:117)
at org.apache.kafka.streams.processor.internals.StreamTask.addRecords(StreamTask.java:546)
at org.apache.kafka.streams.processor.internals.StreamThread.addRecordsToTasks(StreamThread.java:920)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:821)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:774)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:744)
Caused by: org.apache.kafka.common.errors.SerializationException: Size of data received by LongDeserializer is not 8
How to remove the state of my GlobalKTable?

I have found a way to solve this. To remove the entire state of the GlobalKTable, the RocksDB cache files needs to be purged as well.
The RocksDB files are stored at location StreamsConfig.STATE_DIR_CONFIG. I've deleted the files and now my state is entirely cleaned.
There might be a better solution to do this, though?

KafkaStreams.cleanUp() should actually delete those RocksDB files. It's a bug that is fixed in upcoming 1.1 release: issues.apache.org/jira/browse/KAFKA-6259.

Related

IOError(Stalefile) exception being thrown by Kafka Streams RocksDB

When running my stateful Kafka streaming applications I'm coming across various different RocksDB Disk I/O Stalefile exceptions. The exception only occurs when I have at least one KTable implementation and it happens at various different times. I've tried countless times to reproduce it but haven't been able to.
App/Environment details:
Runtime: Java
Kafka library: org.apache.kafka:kafka-streams:2.5.1
Deployment: OpenShift
Volume type: NFS
RAM: 2000 - 8000 MiB
CPU: 200 Millicores to 2 Cores
Threads: 1
Partitions: 1 - many
Exceptions encountered:
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while getting value for key from at org.apache.kafka.streams.state.internals.RocksDBStore.get(RocksDbStore.java:301)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error restoring batch to store at org.apache.kafka.streams.state.internals.RocksDBStore$RocksDBBatchingRestoreCallback.restoreAll(RocksDbStore.java:636)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while range compacting during restoring at org.apache.kafka.streams.state.internals.RocksDBStore$SingleColumnFamilyAccessor.toggleDbForBulkLoading(RocksDbStore.java:616)
Caused by: org.apache.kafka.streams.errors.ProcessorStateException: Error while executing flush from store at org.apache.kafka.streams.state.internals.RocksDBStore.flush(RocksDbStore.java:616)
Apologies for not being able to post the entire stack trace, but all of the above exceptions seem to reference the org.rocksdb.RocksDBException: IOError(Stalefile) exception.
Additional info:
Using a persisted state directory
Kafka topic settings are created with defaults
Running a single instance on a single thread
Exception is raised during gets and writes
Exception is raised when consuming valid data
Exception also occurs on internal repartition topics
I'd really appreciate any help and please let me know if I can provide any further information.
If you are using Posix file system, this error means that the file system returns ESTALE. See description to the code in https://man7.org/linux/man-pages/man3/errno.3.html

How to ignore error result in Kafka Connect Elasticsearch

I am trying to run kafka connect for elastic search .
But because of some mistake i entered wrong record in kafka topic .
Now i fixed that issue and inserting correct value but elastic search is still throwing error on previous record in the topic
Here is the error
Caused by: org.apache.kafka.connect.errors.DataException: Converting byte[] to Kafka Connect data failed due to serialization error
Caused by: org.apache.kafka.common.errors.SerializationException: com.fasterxml.jackson.core.JsonParseException: Unrecognized token 'lambdaDemo0': was expecting ('true', 'false' or 'null')
at [Source: (byte[])"lambdaDemo0-9749-0e710000fd04"; line: 1, column: 13]
Is there any way i can ignore the older record in the topic and tell kafka connect to pick latest record ?
I am trying to delete the topic i get topic marked for deletion but still records are present in the topic .
I tried below two properties but does seems to be working
drop.invalid.message=true
behavior.on.malformed.documents=ignore
Please suggest how i can clean up the wrong record in the topic
You can tell Kafka Connect to just skip bad records
errors.tolerance = all
Optionally, you can route these messages to another topic (known as a dead letter queue) for inspection by adding
errors.tolerance = all
errors.deadletterqueue.topic.name = my-dlq-topic
These settings are valid for Kafka Connect with any connector that is failing in the serialisation/deserialisation stage of processing. For more information see this article.

Kafka Streams shutdown after IllegalStateException: No current assignment for partition

I have a Kafka Streams application that launches and runs successfully. We have 4 instances of the application running. Occasionally one of our instance of the application is legitimately killed which causes several rounds of rebalancing until the old node is replaced.
Sometimes during the rebalance, one ore more previously healthy nodes fail. The logs are indicating that the Streams application transitions into a PENDING_SHUTDOWN state directly after receiving the following exception:
java.lang.IllegalStateException: No current assignment for partition public.chat.message-28
at org.apache.kafka.clients.consumer.internals.SubscriptionState.assignedState(SubscriptionState.java:256)
at org.apache.kafka.clients.consumer.internals.SubscriptionState.resetFailed(SubscriptionState.java:418)
at org.apache.kafka.clients.consumer.internals.Fetcher$2.onFailure(Fetcher.java:621)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.RequestFutureAdapter.onFailure(RequestFutureAdapter.java:30)
at org.apache.kafka.clients.consumer.internals.RequestFuture$1.onFailure(RequestFuture.java:209)
at org.apache.kafka.clients.consumer.internals.RequestFuture.fireFailure(RequestFuture.java:177)
at org.apache.kafka.clients.consumer.internals.RequestFuture.raise(RequestFuture.java:147)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient$RequestFutureCompletionHandler.fireCompletion(ConsumerNetworkClient.java:571)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.firePendingCompletedRequests(ConsumerNetworkClient.java:389)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:297)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:236)
at org.apache.kafka.clients.consumer.internals.ConsumerNetworkClient.poll(ConsumerNetworkClient.java:215)
at org.apache.kafka.clients.consumer.internals.Fetcher.getTopicMetadata(Fetcher.java:292)
at org.apache.kafka.clients.consumer.internals.Fetcher.getAllTopicMetadata(Fetcher.java:275)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1849)
at org.apache.kafka.clients.consumer.KafkaConsumer.listTopics(KafkaConsumer.java:1827)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.refreshChangelogInfo(StoreChangelogReader.java:259)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.initialize(StoreChangelogReader.java:133)
at org.apache.kafka.streams.processor.internals.StoreChangelogReader.restore(StoreChangelogReader.java:79)
at org.apache.kafka.streams.processor.internals.TaskManager.updateNewAndRestoringTasks(TaskManager.java:328)
at org.apache.kafka.streams.processor.internals.StreamThread.runOnce(StreamThread.java:866)
at org.apache.kafka.streams.processor.internals.StreamThread.runLoop(StreamThread.java:804)
at org.apache.kafka.streams.processor.internals.StreamThread.run(StreamThread.java:773)
Prior to this error we often seem to also recieve some informational logs reporting a disconnect exception:
Error sending fetch request (sessionId=568252460, epoch=7) to node 4: org.apache.kafka.common.errors.DisconnectException
I have a feeling the two are related but I'm unable to reason why at present.
Is anyone able to give me some hints as to what may be causing this issue and any possible solutions?
Additional Info:
Kafka 2.2.1
32 partitions spread evenly across the 4 worker nodes
StreamsConfig settings:
kafkaStreamProps.put(StreamsConfig.REPLICATION_FACTOR_CONFIG, 2);
kafkaStreamProps.put(StreamsConfig.NUM_STANDBY_REPLICAS_CONFIG, 1);
kafkaStreamProps.put(StreamsConfig.NUM_STREAM_THREADS_CONFIG, 4);
kafkaStreamProps.put(StreamsConfig.COMMIT_INTERVAL_MS_CONFIG, 120000);
kafkaStreamProps.put(StreamsConfig.TOPOLOGY_OPTIMIZATION, StreamsConfig.OPTIMIZE);
This looks like it could be related to https://issues.apache.org/jira/browse/KAFKA-9073, which has been fixed in Kafka Streams 2.3.2.
If you can't wait for that release, you could try creating a private build using the changeset from this pull request: https://github.com/apache/kafka/pull/7630/files

Cannot Restart Kafka Consumer Application, Failing due to OffsetOutOfRangeException

Currently, my Kafka Consumer streaming application is manually committing the offsets into Kafka with enable.auto.commit set to false.
The application failed when I tried restarting it throwing below exception:
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:{partition-12=155555555}
Assuming the above error is due to the message not present/partition deleted due to retention period, I tried below method:
I disabled the manual commit and enabled auto commit(enable.auto.commit=true and auto.offset.reset=earliest)
Still it fails with the same error
org.apache.kafka.clients.consumer.OffsetOutOfRangeException: Offsets out of range with no configured reset policy for partitions:{partition-12=155555555}
Please suggest ways to restart the job so that it can successfully read the correct offset for which message/partition is present
You are trying to read offset 155555555 from partition 12 of topic partition, but -most probably- it might have already been deleted due to your retention policy.
You can either use Kafka Streams Application Reset Tool in order to reset your Kafka Streams application's internal state, such that it can reprocess its input data from scratch
$ bin/kafka-streams-application-reset.sh
Option (* = required) Description
--------------------- -----------
* --application-id <id> The Kafka Streams application ID (application.id)
--bootstrap-servers <urls> Comma-separated list of broker urls with format: HOST1:PORT1,HOST2:PORT2
(default: localhost:9092)
--intermediate-topics <list> Comma-separated list of intermediate user topics
--input-topics <list> Comma-separated list of user input topics
--zookeeper <url> Format: HOST:POST
(default: localhost:2181)
or start your consumer using a fresh consumer group ID.
I met the same problem and I use package org.apache.spark.streaming.kafka010 in my application.In the begining,I suscepted the auto.offset.reset strategy take no effect,but when I read the description of the method fixKafkaParams in the object KafkaUtils,i found the configuration has been overwrited.I guess the reason why it tweak the configuration ConsumerConfig.AUTO_OFFSET_RESET_CONFIG for executor is to keep consistent offset obtained by driver and executor.

UnknownProducerIdException in Kafka streams when enabling exactly once

After enabling exactly once processing on a Kafka streams application, the following error appears in the logs:
ERROR o.a.k.s.p.internals.StreamTask - task [0_0] Failed to close producer
due to the following error:
org.apache.kafka.streams.errors.StreamsException: task [0_0] Abort
sending since an error caught with a previous record (key 222222 value
some-value timestamp 1519200902670) to topic exactly-once-test-topic-
v2 due to This exception is raised by the broker if it could not
locate the producer metadata associated with the producerId in
question. This could happen if, for instance, the producer's records
were deleted because their retention time had elapsed. Once the last
records of the producerId are removed, the producer's metadata is
removed from the broker, and future appends by the producer will
return this exception.
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.recordSendError(RecordCollectorImpl.java:125)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl.access$500(RecordCollectorImpl.java:48)
at org.apache.kafka.streams.processor.internals.RecordCollectorImpl$1.onCompletion(RecordCollectorImpl.java:180)
at org.apache.kafka.clients.producer.KafkaProducer$InterceptorCallback.onCompletion(KafkaProducer.java:1199)
at org.apache.kafka.clients.producer.internals.ProducerBatch.completeFutureAndFireCallbacks(ProducerBatch.java:204)
at org.apache.kafka.clients.producer.internals.ProducerBatch.done(ProducerBatch.java:187)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:627)
at org.apache.kafka.clients.producer.internals.Sender.failBatch(Sender.java:596)
at org.apache.kafka.clients.producer.internals.Sender.completeBatch(Sender.java:557)
at org.apache.kafka.clients.producer.internals.Sender.handleProduceResponse(Sender.java:481)
at org.apache.kafka.clients.producer.internals.Sender.access$100(Sender.java:74)
at org.apache.kafka.clients.producer.internals.Sender$1.onComplete(Sender.java:692)
at org.apache.kafka.clients.ClientResponse.onComplete(ClientResponse.java:101)
at org.apache.kafka.clients.NetworkClient.completeResponses(NetworkClient.java:482)
at org.apache.kafka.clients.NetworkClient.poll(NetworkClient.java:474)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:239)
at org.apache.kafka.clients.producer.internals.Sender.run(Sender.java:163)
at java.lang.Thread.run(Thread.java:748)
Caused by: org.apache.kafka.common.errors.UnknownProducerIdException
We've reproduced the issue with a minimal test case where we move messages from a source stream to another stream without any transformation. The source stream contains millions of messages produced over several months. The KafkaStreams object is created with the following StreamsConfig:
StreamsConfig.PROCESSING_GUARANTEE_CONFIG = "exactly_once"
StreamsConfig.APPLICATION_ID_CONFIG = "Some app id"
StreamsConfig.NUM_STREAM_THREADS_CONFIG = 1
ProducerConfig.BATCH_SIZE_CONFIG = 102400
The app is able to process some messages before the exception occurs.
Context information:
we're running a 5 node Kafka 1.1.0 cluster with 5 zookeeper nodes.
there are multiple instances of the app running
Has anyone seen this problem before or can give us any hints about what might be causing this behaviour?
Update
We created a new 1.1.0 cluster from scratch and started to process new messages without problems. However, when we imported old messages from the old cluster, we hit the same UnknownProducerIdException after a while.
Next we tried to set the cleanup.policy on the sink topic to compact while keeping the retention.ms at 3 years. Now the error did not occur. However, messages seem to have been lost. The source offset is 106 million and the sink offset is 100 million.
As explained in the comments, there currently seems to be a bug that may cause problems when replaying messages older than the (maximum configurable?) retention time.
At time of writing this is unresolved, the latest status can always be seen here:
https://issues.apache.org/jira/browse/KAFKA-6817