The question is simple: what will happen if you delete the Kafka snapshot files in the Kafka log dir? Will Kafka be able to start? Will it have to do a slow rebuild of something?
Bonus question: what exactly do the snapshot files contain?
Background for this question
I have a cluster that has been down for a couple of days due to simultaneous downtime on all brokers and a resulting corrupted broker. Now when it starts, it is silent for hours (no new messages appear in the log file). By inspecting the JVM I have found that all of the (very limited) CPU usage is spent in the loadProducersFromLog method. The comments above it suggest that this is an attempt to recover producer state from the snapshots. I do not care about this; I just want my broker back, so I am wondering if I can simply delete the snapshots to get Kafka started again.
If the snapshot files are deleted, then during startup (in the log.loadSegmentFiles() method) all messages in the partition have to be read to recreate the snapshot, even if the log and index files are present. This increases the time it takes to load the partition.
For the contents of the snapshot file, please refer to writeSnapshot() in ProducerStateManager:
https://github.com/apache/kafka/blob/980b725bb09ee42469534bf50d01118ce650880a/core/src/main/scala/kafka/log/ProducerStateManager.scala
The log.dir parameter defines where topics (i.e., data) are stored (it supplements the log.dirs property).
A snapshot basically gives you a copy of your data at one point in time.
In a situation like yours, instead of waiting for a response, you could:
change the log.dirs path, restart everything and see how it goes;
back up the snapshots by saving them in a different location, delete them all from the original location and see how it goes.
After that you're meant to be able to start up Kafka.
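If you go with the second option, a minimal sketch of moving the producer-state .snapshot files aside before restarting the broker might look like this (the directory paths are assumptions; adjust them to your log.dirs setting, and run it only while the broker is stopped):

import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.StandardCopyOption;
import java.util.stream.Stream;

public class BackupSnapshots {
    public static void main(String[] args) throws IOException {
        // Assumed paths: adjust logDir to match your broker's log.dirs setting.
        Path logDir = Paths.get("/var/lib/kafka/data");
        Path backupDir = Paths.get("/var/lib/kafka/snapshot-backup");
        Files.createDirectories(backupDir);

        // Move every producer-state snapshot out of the log directory,
        // keeping the partition directory name so the files can be restored later.
        try (Stream<Path> files = Files.walk(logDir)) {
            files.filter(p -> p.toString().endsWith(".snapshot"))
                 .forEach(p -> {
                     try {
                         Path target = backupDir.resolve(
                                 p.getParent().getFileName() + "-" + p.getFileName());
                         Files.move(p, target, StandardCopyOption.REPLACE_EXISTING);
                     } catch (IOException e) {
                         throw new UncheckedIOException(e);
                     }
                 });
        }
    }
}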
Related
I am trying to understand how ActiveMQ Artemis manages the journal files (under data/journal) and when it creates new ones. I read the documentation, but it wasn't clear how the files are created. I have a broker.xml with simple settings (which I won't be able to share, unfortunately). Here are a few:
journal-min-files - 2
journal-pool-files - 50
journal file size - left at the default of 10MB
ActiveMQ Artemis starts, and I see 2 files already created under /data/journal. I am now running a request that posts a lot of messages in a very short time, and these messages are being actively consumed. I am publishing a lot of messages, but they are not accumulating quickly because the consumer keeps up with them. However, this doesn't cause the files to grow enough to reproduce the space issue.
As the message volume goes up, I don't see the number of files going up with it; it rises to 12 files and stays there for a long time.
I can understand that the message count alone is not sufficient to trigger additional files if only the latest journal file is being written to. However, I see all 11 of the other files have updated timestamps, making me think they are being rotated.
My paging directory is empty.
I am trying to understand why the journal is not growing despite the message volume.
From what I can tell, message consumption has reached a break-even point with message production, so messages are not accumulating on the broker and the journal is therefore not actually growing in size. The journal files are simply being re-used as messages are added and acknowledged.
I have a project where I need to provide statistical information via an API to external services. In this service I use only Kafka as "storage". When the application starts, it reads one week's worth of events from the cluster and counts some values, then actively listens for new events to keep the information up to date. For example, the information might be "how many times item x was sold", etc.
Startup of the application takes a lot of time and brings some other problems with it. It is a Kubernetes service, and the readiness probe fails from time to time because reading the last week of events takes so long.
Two alternatives came to my mind to replace the entire logic:
Kafka Streams or KSQL (I'm not sure whether I would need the same amount of memory and compute here)
Cache Database
I'm wondering which idea would be better here. Or is there any idea better than both of them?
First, I hope this is a compacted topic that you are reading; otherwise, your "x times" counts will be misleading as data is deleted from the topic.
Any option you choose will require reading from the beginning of the topic, so the solution will come down to starting a persistent consumer that:
Stores data on disk in RocksDB (as Kafka Streams or a KSQL KTable does; a minimal sketch follows below)
Stores data in some other database of your choice. Redis would be a good option, but so would Couchbase if you want to use Memcached.
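For the Kafka Streams option, a minimal sketch of the "how many times item x was sold" counter might look like the following. The topic names, serdes and application id are assumptions; the point is that the count lives in a local, fault-tolerant KTable backed by RocksDB and a changelog topic, so it does not have to be recomputed from a week of events on every startup.

import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.KTable;
import org.apache.kafka.streams.kstream.Materialized;
import org.apache.kafka.streams.kstream.Produced;

public class SalesCounter {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "sales-counter");       // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");   // assumed brokers
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.String().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        // Assumed input topic keyed by item id, one record per sale event.
        KStream<String, String> sales = builder.stream("sales-events");
        // Count sales per item; the result is materialized in a local RocksDB
        // store and backed by a changelog topic, so it survives restarts.
        KTable<String, Long> salesPerItem =
                sales.groupByKey().count(Materialized.as("sales-per-item"));
        // Optionally publish the running counts to an output topic.
        salesPerItem.toStream().to("sales-counts", Produced.with(Serdes.String(), Serdes.Long()));

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}

The store named sales-per-item could then be queried through Kafka Streams interactive queries to serve the API, instead of re-reading the topic at startup.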
I am trying to implement a Camel Spring Boot application that uses the FileComponent to poll a directory. I also want to support clustering, meaning multiple instances of this Camel Spring Boot application could be started and consume from the directory.
I am trying to implement the IdempotentRepository on the file consumer with KafkaIdempotentRepository. However, when I start two instances at the same time, both of them consume a file coming into the directory, and both instances broadcast action:add for the key my_file_name.
The configuration for the file component is the following:
file:incoming?readLock=idempotent&idempotentRepository=#myKafkaRepo&readLockLoggingLevel=WARN&shuffle=true
All the examples of a clustered IdempotentRepository were with Hazelcast, which, for operational reasons, is difficult for me to impose on my users.
My question: does KafkaIdempotentRepository support a clustered IdempotentRepository? If not, which implementation would you suggest?
Kafka:: Apache Camel - IdempotentRepository Documentation
On startup, the instance subscribes to the topic and rewinds the offset to the beginning, rebuilding the cache to the latest state. The cache will not be considered warmed up until one poll of pollDurationMs in length returns 0 records. Startup will not be completed until either the cache has warmed up, or 30 seconds go by; if the latter happens the idempotent repository may be in an inconsistent state until its consumer catches up to the end of the topic.
My opinion
It depends on how many recently processed records you need to remember and what the retention period of the topic will be.
If you can set the topic's retention time large enough to satisfy your "number of records to remember" requirement, but small enough that the cache warm-up can complete in much less than 30 seconds, go for it.
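For reference, a minimal sketch of wiring the repository that the endpoint URI references as #myKafkaRepo might look like this (the topic name and bootstrap servers are assumptions):

import org.apache.camel.processor.idempotent.kafka.KafkaIdempotentRepository;
import org.springframework.context.annotation.Bean;
import org.springframework.context.annotation.Configuration;

@Configuration
public class IdempotentRepositoryConfig {

    // The bean name must match the #myKafkaRepo reference in the file endpoint URI.
    @Bean("myKafkaRepo")
    public KafkaIdempotentRepository myKafkaRepo() {
        // Topic name and bootstrap servers are assumptions; keep the topic small
        // enough that the cache warm-up described above finishes quickly.
        return new KafkaIdempotentRepository("file-idempotent-repo", "localhost:9092");
    }
}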
I'm not sure if this has already been answered. As I didn't get a proper explanation, I'm posting my question here.
Why is the Kafka Streams state.dir stored under /tmp/kafka-streams?
I know I can change the path by providing the state dir config in the streams code, like below:
props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/abc-Streams");
But will there be any impact from changing the directory? Or can I configure the state DB in an application directory and not in /tmp?
As per the Confluent documentation on stateful operations, Kafka Streams "automatically creates and manages such state stores when you are calling stateful operators such as count() or aggregate(), or when you are windowing a stream", but it doesn't specify where exactly the state is being stored.
Any thoughts?
Why is the Kafka Streams state.dir stored under /tmp/kafka-streams?
There are several reasons.
Usually the /tmp directory has default write permission, so you don't have to struggle with write permissions as a beginner.
The /tmp directory is short-lived. It is cleaned on each system reboot, so you don't end up with an over-flooded disk in case you forget to delete the state.dir. The downside is that you lose the state from the previous run, so you need to rebuild it from scratch.
If you want to reuse the state stored in state.dir, you should store it somewhere other than /tmp.
All the state stores are kept in the location specified by state.dir. If it is not specified, they go under the /tmp/kafka-streams/<app-id> directory.
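A minimal sketch of pointing state.dir at a persistent application directory instead of /tmp might look like this (the application id, brokers and path are assumptions):

import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class StreamsProps {
    public static Properties streamsProperties() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "abc-streams-app");   // assumed app id
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // assumed brokers
        // Keep state under an application directory instead of /tmp so it
        // survives reboots and /tmp cleanup; the RocksDB stores and the
        // <app-id> subdirectory will be created under this path, which must
        // be writable by the user running the application.
        props.put(StreamsConfig.STATE_DIR_CONFIG, "/var/abc-Streams");
        return props;
    }
}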
In the case of a network partition or node crash, most distributed atomic broadcast protocols (like Extended Virtual Synchrony or Paxos) require running nodes to keep logging messages until the crashed or partitioned node rejoins the cluster. When a node rejoins the cluster, replaying the logged messages is enough for it to regain the current state.
My question is: if the partitioned/crashed node takes a really long time to rejoin the cluster, then eventually the logs will overflow. This seems to be a very practical issue, yet no paper seems to talk about it. Is there an obvious solution to this that I am missing, or is my understanding incorrect?
You don't really need to remember the whole log. Imagine, for example, that the state you were synchronizing between the nodes was something like an SQL table with rows of the form (id: int, name: string), and the commands written into the logs were of the form "insert row with id=x and name=y", "delete row where id=z", "set name=a where id=1000", ...
Once such commands are committed, all you really care about is the final table. Then, when a node that was offline for a long time comes back online, it only needs to download the table plus the few entries from the log that were committed while the table was being downloaded.
This is called "log compaction"; check out chapter 7 of the Raft paper for more info.
There are a few potential solutions to the infinite-log problem, but one of the more popular ones for replicated state machines is to periodically snapshot the full replicated state machine and delete all history prior to that point. A node that has been offline too long would then just discard all of its information, download the snapshot, and start replaying the replicated logs from that point.
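A minimal sketch of the snapshot-plus-replay idea described in both answers might look like this (the command and state types are simplified assumptions, standing in for the SQL-table example above):

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class SnapshottingStateMachine {
    // Simplified state: the "SQL table" from the example, keyed by row id.
    private final Map<Integer, String> table = new HashMap<>();
    // Log entries committed since the last snapshot.
    private final List<Command> log = new ArrayList<>();

    record Command(String op, int id, String name) {}   // "insert" / "set" / "delete"

    void apply(Command c) {
        switch (c.op()) {
            case "insert", "set" -> table.put(c.id(), c.name());
            case "delete" -> table.remove(c.id());
        }
        log.add(c);
    }

    // Compaction: once commands are applied, only the resulting table matters,
    // so persist a snapshot of it and discard the log prefix it covers.
    Map<Integer, String> snapshotAndTruncate() {
        Map<Integer, String> snapshot = new HashMap<>(table);
        log.clear();
        return snapshot;
    }

    // A node that was offline too long installs the snapshot and replays only
    // the commands committed after it, instead of the entire history.
    void restore(Map<Integer, String> snapshot, List<Command> tail) {
        table.clear();
        table.putAll(snapshot);
        tail.forEach(this::apply);
    }
}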