I am creating Kafka topics programmatically. When the application ends, I have to delete the created Kafka topics. I am using adminClient.deleteTopics(TOPIC_LIST), but the topics still do not get deleted; instead they are marked as "marked for deletion". Does anyone know how I can delete a topic permanently using Java?
The method you're using is correct.
There's no method available to wait for the data on disk to be permanently removed, unless you decide to use something like JSSH and manually delete the remote file paths on the brokers.
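For reference, a minimal sketch of the call with a blocking wait on the result (bootstrap server and topic names are placeholders for your setup). Note that the future completing only means the controller accepted the deletion request, not that the data is gone from disk. Also worth checking: if the brokers run with delete.topic.enable=false, topics stay "marked for deletion" indefinitely.

    // Sketch only: delete topics and wait for the controller to acknowledge.
    // "localhost:9092" and the topic names are placeholders.
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.admin.AdminClient;
    import org.apache.kafka.clients.admin.AdminClientConfig;
    import org.apache.kafka.clients.admin.DeleteTopicsResult;

    public class TopicCleanup {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(AdminClientConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");

            try (AdminClient adminClient = AdminClient.create(props)) {
                DeleteTopicsResult result = adminClient.deleteTopics(List.of("my-topic-1", "my-topic-2"));
                // Blocks until every deletion is accepted, or throws if one fails
                // (e.g. the topic does not exist). This does NOT wait for disk cleanup.
                result.all().get();
            }
        }
    }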
We are using Apache Kafka and we process more than 30 million messages per day. We have a retention policy of 30 days. However, our messages got archived before the 30 days were up.
Is there a way we could retrieve the deleted messages?
Is it possible to reset the "start index" to older index to retrieve the data through query?
What other options do we have?
If we have "disk backup", could we use that for retrieving the data?
Thank You
I'm assuming your messages got deleted by the Kafka cluster here.
In general, no - if the records got deleted due to duration / size related policies, then they have been removed.
Theoretically, if you have access to backups you might move the Kafka data-log files back into a broker's data directory, but the behaviour is undefined. Trying that with a fresh cluster with infinite size/time policies (so nothing gets purged immediately) might work and let you consume again.
In my experience, until the general availability of Tiered Storage, there is no free/easy way to recover data (via the Kafka Consumer protocol).
For example, you can use some Kafka Connect Sink connector to write to some external, more persistent storage. Then, would you want to write a job that scrapes that data? Sure, you could have a SQL database table of STRING topic, INT timestamp, BLOB key, BLOB value, and maybe track "consumer offsets" separately from that? If you use that design, then Kafka doesn't really seem useful, as you'd be reimplementing various parts of it when you could've just added more storage to the Kafka cluster.
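For illustration only, a rough sketch of that archival design with a plain consumer and JDBC; the JDBC URL, table name, topic, and group id are all assumptions:

    // Sketch: consume a topic and archive (topic, timestamp, key, value) rows
    // into an external SQL table. All connection details are placeholders.
    import java.sql.Connection;
    import java.sql.DriverManager;
    import java.sql.PreparedStatement;
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerRecord;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.ByteArrayDeserializer;

    public class TopicArchiver {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "localhost:9092");
            props.put("group.id", "topic-archiver");
            props.put("auto.offset.reset", "earliest");
            props.put("key.deserializer", ByteArrayDeserializer.class.getName());
            props.put("value.deserializer", ByteArrayDeserializer.class.getName());

            try (KafkaConsumer<byte[], byte[]> consumer = new KafkaConsumer<>(props);
                 Connection db = DriverManager.getConnection("jdbc:postgresql://localhost/archive");
                 PreparedStatement insert = db.prepareStatement(
                         "INSERT INTO kafka_archive (topic, ts, key, value) VALUES (?, ?, ?, ?)")) {
                consumer.subscribe(List.of("events"));
                while (true) {
                    ConsumerRecords<byte[], byte[]> records = consumer.poll(Duration.ofSeconds(1));
                    for (ConsumerRecord<byte[], byte[]> record : records) {
                        insert.setString(1, record.topic());
                        insert.setLong(2, record.timestamp());
                        insert.setBytes(3, record.key());   // key may be null; some drivers need setNull
                        insert.setBytes(4, record.value());
                        insert.executeUpdate();
                    }
                }
            }
        }
    }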
Is it possible to reset the "start index" to older index to retrieve the data through query?
That is what auto.offset.reset=earliest will do, or kafka-consumer-groups --reset-offsets --to-earliest.
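Keep in mind that auto.offset.reset=earliest only kicks in when the consumer group has no committed offsets (or the committed offset is out of range); with an existing group you'd reset the offsets or switch to a fresh group.id. A minimal Java sketch, with placeholder names:

    // Sketch: a consumer that starts from the earliest available offset
    // when its group has no committed offsets. Names are placeholders.
    import java.time.Duration;
    import java.util.List;
    import java.util.Properties;
    import org.apache.kafka.clients.consumer.ConsumerConfig;
    import org.apache.kafka.clients.consumer.ConsumerRecords;
    import org.apache.kafka.clients.consumer.KafkaConsumer;
    import org.apache.kafka.common.serialization.StringDeserializer;

    public class EarliestReader {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ConsumerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ConsumerConfig.GROUP_ID_CONFIG, "replay-group"); // fresh group => no committed offsets
            props.put(ConsumerConfig.AUTO_OFFSET_RESET_CONFIG, "earliest");
            props.put(ConsumerConfig.KEY_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());
            props.put(ConsumerConfig.VALUE_DESERIALIZER_CLASS_CONFIG, StringDeserializer.class.getName());

            try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
                consumer.subscribe(List.of("my-topic"));
                ConsumerRecords<String, String> records = consumer.poll(Duration.ofSeconds(5));
                records.forEach(r -> System.out.printf("offset=%d key=%s value=%s%n", r.offset(), r.key(), r.value()));
            }
        }
    }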
have "disk backup", could we use that
With caution, maybe. For example, you can copy old broker log segments back into a broker's data directory, but there aren't any tools I know of that will retroactively discover the new "low watermark" of each topic (maybe the broker finds this upon restart; I haven't tested). You'd need to copy this data for each broker manually, I believe, since the replicas wouldn't know about the old segments (again, maybe they would after a full cluster restart).
Plus, the consumer offsets would already be reading way past that data, unless you stop all consumers and reset them.
I'm also not sure what happens if you have gaps in the segment files. E.g. your current oldest segment is N and you copy N-2, but not N-1... You might then run into an error, or the consumer will simply apply the auto.offset.reset policy and seek to the next available offset or to the very end of the topic.
I have started developing with ksqlDB alongside Kafka Connect. Kafka Connect is awesome and works well; it will not re-read records it detects as already read in the past (extremely useful for production). But for developing and debugging ksqlDB queries it is necessary to replay the data, as ksqlDB creates table entries on the fly from emitted changes. If nothing is replayed, the table for the 'test' query stays empty. Any advice on how to replay a CSV file with Kafka Connect after the file has been ingested once? Maybe ksqlDB has the possibility to re-read the whole topic after the table is created. Does anyone have an answer for a beginner?
Create the source connector with a different name, or give the CSV file a new name. Both should cause it to be re-read.
I also had a problem in my configuration file, and ksqlDB did not recognize the
SET 'auto.offset.reset'='earliest';
option. Use the command above (note the single quotes) to force ksqlDB to re-read the whole topic from the beginning after a table/stream creation statement. You have to set it manually every time you connect via the ksql-cli or a client.
I am a beginner to Kafka and recently started using it in my projects at work. One important thing I want to know is whether it is possible to capture an event when messages expire in Kafka. The intent is to trap these expired messages and back them up in a backup store.
I believe the goal you want to achieve is similar to Apache Kafka Tiered Storage, which is still under development in open-source Apache Kafka.
Messages don't expire as such. There are two different scenarios you might be thinking of when you talk about messages that expire.
A topic is configured with cleanup.policy = delete. After retention.ms or retention.bytes is exceeded, it looks like messages expire. What actually happens is that a whole log segment is deleted, either because its newest message is older than retention.ms or because the partition's retention.bytes is exceeded. A segment is only considered for deletion if it is not the active segment that Kafka is currently writing to.
A topic is configured with cleanup.policy = compact. When two log segments are merged, Kafka will make sure that only the latest version of each distinct key is kept. To "delete" a message you send a record with the same key and a null value - also called a tombstone.
There's no hook or event you could subscribe to in order to figure out if either of these two cases is about to happen. You'd have to take care of the logic on the client side, which is hard because the Kafka API does not expose any details about the log segments within a partition.
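For completeness, producing a tombstone for the compacted case looks roughly like this (topic, key, and server are placeholders):

    // Sketch: "delete" a record on a compacted topic by producing a tombstone
    // (same key, null value). All names are placeholders.
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class TombstoneExample {
        public static void main(String[] args) {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props)) {
                // A null value marks the key for removal during log compaction.
                producer.send(new ProducerRecord<>("compacted-topic", "user-42", null));
                producer.flush();
            }
        }
    }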
Is there any way to delete a topic in kafka-node? I have searched some documents, but the steps they describe only remove a topic on the client side using kafka-node. I want to delete the topic, not just remove it.
No, there is no way to delete a topic using kafka-node. There's probably a way to do it via node-zookeeper-client, but that's non-standard. You'd have to look at the way the standard Kafka delete topic command does it, and then roll your own deletion code.
But regardless -- you can't do it via kafka-node.
I have a folder that contains two files. I want to display the names of the two files, and when I add another file, display its name too. My question is: can I do that with Kafka?
It's a strange use case for Kafka. I think this could be done by adding Apache Flume. Flume can poll a directory and, when it discovers new files, send them to Kafka; you can then process those messages to recover the file names.
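If you'd rather not add Flume, here is a rough sketch of the same idea using only the plain Java client and java.nio's WatchService; this is an alternative to the Flume setup described above, and the directory, topic, and server names are assumptions:

    // Sketch (not Flume): watch a directory and publish each newly created
    // file's name to a Kafka topic. Paths and names are placeholders.
    import java.nio.file.FileSystems;
    import java.nio.file.Path;
    import java.nio.file.Paths;
    import java.nio.file.StandardWatchEventKinds;
    import java.nio.file.WatchEvent;
    import java.nio.file.WatchKey;
    import java.nio.file.WatchService;
    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerConfig;
    import org.apache.kafka.clients.producer.ProducerRecord;
    import org.apache.kafka.common.serialization.StringSerializer;

    public class FolderWatcher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
            props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());
            props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, StringSerializer.class.getName());

            Path dir = Paths.get("/tmp/watched-folder");
            try (KafkaProducer<String, String> producer = new KafkaProducer<>(props);
                 WatchService watcher = FileSystems.getDefault().newWatchService()) {
                dir.register(watcher, StandardWatchEventKinds.ENTRY_CREATE);
                while (true) {
                    WatchKey key = watcher.take();            // blocks until an event arrives
                    for (WatchEvent<?> event : key.pollEvents()) {
                        String fileName = event.context().toString();
                        producer.send(new ProducerRecord<>("file-names", fileName));
                    }
                    key.reset();
                }
            }
        }
    }

A consumer on the "file-names" topic can then print each name as it arrives, which covers the "display their names" part of the question.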
Does that solve your problem?