I'm looking to change the log.dirs path for all Kafka log data held on the broker servers. However, I already have a handful of existing topics that are actively being used, and I'm not sure what will happen to the existing topic/log data. Can someone please clarify the impact on the existing topics/log data of changing the log.dirs path? Thanks
The existing log segments would become untracked: they would no longer be cleaned up by retention/compaction, and once the broker restarts, clients would be unable to read that data anymore.
It'd be better to stop the broker, copy/move all the log files to the new location, then change the config and start the broker again.
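A minimal sketch of that procedure, using stand-in temp directories instead of real broker data (all paths and the topic name here are examples, not your actual ones):

```shell
# Stand-ins for the old and new log.dirs locations, e.g. /tmp/kafka-logs
# and /var/kafka-logs in a real deployment.
OLD=$(mktemp -d)
NEW=$(mktemp -d)/kafka-logs

# 1. Stop the broker first (e.g. bin/kafka-server-stop.sh) -- not shown here.

# 2. Simulate an existing topic-partition directory with a segment file:
mkdir -p "$OLD/my-topic-0"
touch "$OLD/my-topic-0/00000000000000000000.log"

# 3. Copy everything across, preserving the directory layout:
mkdir -p "$NEW"
cp -a "$OLD/." "$NEW/"

# 4. Point log.dirs in server.properties at the new location, then restart.
ls "$NEW/my-topic-0"
```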
Related
Having installed Kafka and having looked at these posts:
kafka loses all topics on reboot
Kafka topic no longer exists after restart
and thus having moved kafka-logs to an /opt... location, I still note that when I reboot:
I can re-create the topic again.
the kafka-logs directory contains information on topics, offsets, etc., but it gets corrupted.
I am wondering how to rectify this.
Testing of new topics prior to reboot works fine.
There are two potential problems:
If Kafka is running in Docker, then restarting the container always cleans up the previous state and creates a new cluster, hence all topics are lost.
Check the log.dir or Zookeeper data path. If either is set to the /tmp directory, it will be cleaned on each reboot, so all log data and topics will be lost.
In this VM I noted that the Zookeeper data directory was defined under /tmp. I changed that to /opt (presumably it should be /var, though), and the clearing of Kafka data when the instance terminated stopped. I'm not sure how to explain this completely.
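As a sketch, the two settings in question look like this (the paths are examples of persistent locations, not prescriptions):

```properties
# config/server.properties -- where the Kafka broker keeps topic data
log.dirs=/var/lib/kafka-logs

# conf/zoo.cfg -- where Zookeeper keeps its snapshots and transaction logs
dataDir=/var/lib/zookeeper
```

The point is simply that neither path should live under /tmp, which many distributions clear on reboot.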
I am using Kafka confluent-4.1.1. I created a few topics and it worked well, but I don't see the previously created topics if I restart Confluent.
I tried the suggestions mentioned in the post:
Kafka topic no longer exists after restart
But no luck. Did anybody face the same issue? Do I need to change any configuration?
Thanks in advance.
What configuration changes do I need to do in order to persist?
confluent start will use the CONFLUENT_CURRENT environment variable for all its data storage. If you export this to a static location, data should, in theory, persist across reboots.
Otherwise, the standard way to run each component individually is what you would do in a production environment (e.g. zookeeper-start, kafka-server-start, schema-registry-start, etc.), which will persist data in whatever settings you've given in their respective configuration files.
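A hedged sketch of pinning CONFLUENT_CURRENT before starting (the directory name is an arbitrary example):

```shell
# Point CONFLUENT_CURRENT at a persistent directory instead of the
# default temporary one, so `confluent start` reuses it across reboots.
export CONFLUENT_CURRENT="$HOME/confluent-data"   # example path
mkdir -p "$CONFLUENT_CURRENT"
# confluent start   # would now keep its data under $HOME/confluent-data
echo "$CONFLUENT_CURRENT"
```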
I started a Zookeeper cluster on my computer; it includes three instances. By default the size of the log file should be 64M, but I found something strange.
Can anyone explain what happened with Zookeeper?
here is the content of the log file
The FileTxnLog is truncated, which is implemented by FileTxnSnapLog.truncateLog.
This scenario happens when there is a new election and a follower has some transactions that are not committed on the leader.
This can be verified by checking whether a log line like:
Truncating log to get in sync with ...
exists in zookeeper.out or in the log file you specified.
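A quick way to check, sketched here against a stand-in zookeeper.out (the real file lives wherever your Zookeeper logging is configured):

```shell
# Create a stand-in log file containing the truncation message, then
# search for it the same way you would against the real zookeeper.out.
printf 'Truncating log to get in sync with leader...\n' > zookeeper.out
grep -c 'Truncating log' zookeeper.out   # prints 1
```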
I'm configuring my first kafka network. I can't seem to find any support on saving a configured topic. I know I can configure a topic from the quickstart guide here, but how do I save it? I thought I could add the topic info to a .properties file inside the config dir, but I don't see any support for that.
If I shutdown my machine, my topic is deleted. How do I save the configuration?
Could the topic be deleted because you are using the default broker config? With the default config, Kafka logs are stored under the /tmp folder, which gets wiped out during a machine reboot. You could change the broker config and pick another location for the Kafka logs.
I can see a property in config/server.properties called log.dir. Does this mean Kafka uses the same directory for storing both application logs and data?
Kafka topics are "distributed and partitioned append-only logs". The parameter log.dir defines where topics (i.e., data) are stored.
It is not related to application/broker logging.
The default log.dir is /tmp/kafka-logs, which you may want to change in case your OS has a /tmp directory cleaner.
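One way to see which directory your broker is actually using is to grep the config; sketched here against a stand-in server.properties containing the default value described above:

```shell
# Stand-in config file with the documented default data location.
printf 'log.dirs=/tmp/kafka-logs\n' > server.properties
# The same grep works against your real config/server.properties.
grep '^log.dir' server.properties   # prints log.dirs=/tmp/kafka-logs
```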
log.dir or log.dirs in config/server.properties specify the directories in which the log data is kept.
The server log directory is kafka_base_dir/logs by default. You can modify it by specifying another directory for kafka.logs.dir in log4j.properties.
log.dir in server.properties is the place where the Kafka broker will store the commit logs containing your data. Typically this will be your high-speed disk mount for mission-critical use cases.
For application/broker logging you can use general log4j logging to get the event logging in your custom location. Below are the variables to do this:
-Dlog4j.configuration=file:<configuration file with log rolling, logging level etc.> and -Dkafka.logs.dir=<path to logs>
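A hedged sketch of assembling those two properties into one option string (both paths are placeholders; exactly how they reach the JVM varies by start script and Kafka version, so treat this as illustrative only):

```shell
# Build the JVM options named above; both paths are placeholders.
OPTS='-Dlog4j.configuration=file:/etc/kafka/log4j.properties -Dkafka.logs.dir=/var/log/kafka'
# They would typically be passed to the broker start script, e.g.:
# KAFKA_OPTS="$OPTS" bin/kafka-server-start.sh config/server.properties
echo "$OPTS"
```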
The directory locations of logs and data were perfectly described by Mathias. Yet that data is designed for internal processing by the Kafka engine; to store and manipulate data, you could use Kafka Connect. Kafka Connect is a tool for scalably and reliably streaming data between Apache Kafka and other systems.
It makes it simple to define connectors that move large amounts of data into and out of Kafka's internal data system. Kafka Connect can ingest an entire database, making the data available for stream processing, or sink the data of a single topic (or multiple topics) to another system or database for further analysis.