Zookeeper data directory cleanup - apache-zookeeper

I am running a 3-node ZooKeeper cluster for Storm and Kafka. The ZooKeeper data directory eats up all the space on my system, and I am not sure how to clean it up. I don't want to delete the data entirely because I would lose the state of the processes. I looked into autopurge.purgeInterval in zoo.cfg, but it doesn't work as I expected.
I am using ZooKeeper 3.4.6.
How can I delete the old data without affecting the new data?

Assuming you have ZooKeeper installed in /opt/zookeeper-3.4.6, the following will delete all but the last 10 snapshots.
Stop ZooKeeper: /opt/zookeeper-3.4.6/bin/zkServer.sh stop
cd /opt/zookeeper-3.4.6/bin
./zkCleanup.sh ../data/ -n 10 (adjust ../data/ to point at your dataDir)
It is probably wise to back up the whole /opt/zookeeper-3.4.6 directory before doing this.
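On the autopurge.purgeInterval setting the question mentions: in ZooKeeper 3.4.x it is given in hours and defaults to 0, which leaves the purge task disabled, while autopurge.snapRetainCount defaults to 3. A minimal Python sketch for checking what a zoo.cfg actually sets, assuming a plain key=value file; the function name autopurge_settings and the sample text are hypothetical, not part of ZooKeeper:

```python
def autopurge_settings(cfg_text):
    """Return (purge_interval_hours, snap_retain_count, enabled) from zoo.cfg text."""
    # Defaults for ZooKeeper 3.4.x: purging disabled, 3 snapshots retained.
    values = {"autopurge.purgeInterval": 0, "autopurge.snapRetainCount": 3}
    for raw in cfg_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not key=value.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        key = key.strip()
        if key in values:
            values[key] = int(value.strip())
    interval = values["autopurge.purgeInterval"]
    retain = values["autopurge.snapRetainCount"]
    return interval, retain, interval > 0


# Hypothetical config: the interval line is still commented out.
sample = """
dataDir=/opt/zookeeper-3.4.6/data
#autopurge.purgeInterval=1
autopurge.snapRetainCount=10
"""
print(autopurge_settings(sample))  # → (0, 10, False)
```

A result with enabled set to False means the server never purges on its own, whatever snapRetainCount says.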

Make sure ZooKeeper is running (zkCli.sh is a client and needs a server to connect to)
Go to the bin folder of your ZooKeeper installation
Run ./zkCli.sh
Use ls / to check ZooKeeper's content
Identify the exact path of what you want to delete
Run delete /znode, using the path of what you want to delete

Snapshot directory has log files. Check if dataLogDir and dataDir configuration is correct. + Zookeeper startup

Whenever I'm starting my zookeeper server, I'm getting the below error.
[2020-05-12 08:11:43,510] ERROR Unexpected exception, exiting abnormally (org.apache.zookeeper.server.quorum.QuorumPeerMain)
org.apache.zookeeper.server.persistence.FileTxnSnapLog$SnapDirContentCheckException: Snapshot directory has log files. Check if dataLogDir and dataDir configuration is correct.
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.checkSnapDir(FileTxnSnapLog.java:140)
at org.apache.zookeeper.server.persistence.FileTxnSnapLog.<init>(FileTxnSnapLog.java:109)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.runFromConfig(QuorumPeerMain.java:141)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.initializeAndRun(QuorumPeerMain.java:114)
at org.apache.zookeeper.server.quorum.QuorumPeerMain.main(QuorumPeerMain.java:81)
Here is what I have in my zookeeper.properties
dataDir=/var/appdata/zookeeper
dataLogDir=/var/applogs/zookeeper
By default, ZooKeeper stores snapshot and log files in the same directory (the dataDir). When you first started your cluster, it would have created both the log and the snapshot files there.
If you later decide to use separate folders for snapshot and log files (perhaps for efficiency), you would simply add the two properties (dataDir, dataLogDir) and restart the cluster. But this does not work: the server fails during startup with "Snapshot directory has log files". This is correct behavior, because your settings now demand separate folders for the two, while both kinds of files are already present in the same folder. So you have to migrate them.
Solution:
Move all the log files out of the dataDir and into the corresponding version folder (e.g. version-2) of the dataLogDir. This lets ZooKeeper resume from its previous state (snapshot).
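The move can be sketched in Python as follows. This is only an illustration, assuming the transaction logs are named log.* and live in a version-2 subfolder of the dataDir; the function name move_txn_logs is hypothetical. Run something like this only while the server is stopped.

```python
import shutil
from pathlib import Path


def move_txn_logs(data_dir, data_log_dir, version="version-2"):
    """Move log.* files from <data_dir>/<version> to <data_log_dir>/<version>."""
    src = Path(data_dir) / version
    dst = Path(data_log_dir) / version
    # Create the target version folder if it does not exist yet.
    dst.mkdir(parents=True, exist_ok=True)
    moved = []
    for path in sorted(src.glob("log.*")):
        shutil.move(str(path), str(dst / path.name))
        moved.append(path.name)
    # Snapshot files are left where they are.
    return moved
```

With the paths from the question, the call would be move_txn_logs("/var/appdata/zookeeper", "/var/applogs/zookeeper"), and the returned list shows which files were moved.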

what is in zookeeper datadir and how to cleanup?

I found that my ZooKeeper dataDir is huge. I would like to understand:
What is in the dataDir?
How do I clean it up? Does it get cleaned up automatically after a certain period?
Thanks
According to Zookeeper's administrator guide:
The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble. These are the snapshot and transactional log files. As changes are made to the znodes these changes are appended to a transaction log, occasionally, when a log grows large, a snapshot of the current state of all znodes will be written to the filesystem. This snapshot supercedes all previous logs.
So in short, for your first question, you can assume that dataDir is used to store Zookeeper's state.
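To see which of the two file types is taking the space, something like the following Python sketch can help. It assumes the files sit in a version-2 subfolder of the dataDir and are named snapshot.* and log.*; the function name summarize_data_dir is hypothetical.

```python
from pathlib import Path


def summarize_data_dir(data_dir, version="version-2"):
    """Return {kind: (file_count, total_bytes)} for snapshot.* and log.* files."""
    summary = {"snapshot": [0, 0], "log": [0, 0]}
    for path in (Path(data_dir) / version).iterdir():
        # The part before the first dot tells snapshots and logs apart.
        prefix = path.name.split(".", 1)[0]
        if path.is_file() and prefix in summary:
            summary[prefix][0] += 1
            summary[prefix][1] += path.stat().st_size
    return {kind: tuple(pair) for kind, pair in summary.items()}
```

Calling summarize_data_dir on your dataDir returns the count and total size for each kind, and other files in the folder are ignored.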
As for your second question, there is no automatic cleanup by default. From the doc:
A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
In the following example the last <count> snapshots and their corresponding logs are retained and the others are deleted. The value of <count> should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.
java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
If this is a dev instance, I guess you could almost completely purge the folder (except for some files like myid, if it's there). But for a production instance you should follow the cleanup procedure shown above.
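To make the retention rule concrete, here is a simplified Python model of "keep the last count snapshots and their corresponding logs". It is a sketch, not the PurgeTxnLog implementation. It assumes file names of the form snapshot.<hex zxid> and log.<hex zxid>, and it assumes that "corresponding logs" means every log starting after the oldest retained snapshot plus the newest log starting at or before it. The function name files_to_keep and the sample names are hypothetical.

```python
def files_to_keep(names, count):
    """Return the set of file names a 'keep last <count> snapshots' policy retains."""
    if count < 1:
        raise ValueError("count must be at least 1")

    def zxid(name):
        # The suffix after the first dot is a hexadecimal transaction id.
        return int(name.split(".", 1)[1], 16)

    snaps = sorted((n for n in names if n.startswith("snapshot.")), key=zxid)
    logs = sorted((n for n in names if n.startswith("log.")), key=zxid)
    kept_snaps = snaps[-count:]
    if not kept_snaps:
        return set(names)  # no snapshot to anchor on, so keep everything
    oldest = zxid(kept_snaps[0])
    # Logs starting after the oldest kept snapshot are needed to replay from it.
    later = [n for n in logs if zxid(n) > oldest]
    # The newest log starting at or before it may still hold later transactions.
    earlier = [n for n in logs if zxid(n) <= oldest]
    return set(kept_snaps) | set(later) | set(earlier[-1:])


names = ["snapshot.0", "log.1", "snapshot.5", "log.6",
         "snapshot.a", "log.b", "snapshot.f", "log.10"]
print(sorted(set(names) - files_to_keep(names, 3)))  # → ['snapshot.0']
```

Working on names only makes this a dry run: it shows what a given count would remove before anything is deleted.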

MongoDB NonDocker and Docker Nodes

I have a 5 node MongoDB cluster installed non-Dockerized. I want to start adding nodes to this cluster but I want to use the Dockerized MongoDB (i.e. end result is to migrate Dockerized into the replica set and decommission the non-Dockerized nodes.)
When I do this, the added nodes get stuck in STARTUP status, so from my understanding the config files are not able to sync up.
Is there something I need to do to prepare the cluster for the new nodes, or are there logs I can delve into to find out why they are not moving to STARTUP2?
The data directory was not large enough, so the config files were unable to sync. As soon as I grew the data directory, all was well.

How can I update a configuration file on zookeeper?

I uploaded a configuration folder for Solr core to Apache zookeeper using zkClient.
When I delete a file in my local configuration and update it to Zookeeper again, I can't see the change reflected in Solr admin page.
Could somebody please explain how to update/delete files from zookeeper?
Also, where can I find the physical files in the ZooKeeper folder?
To upload a modified file with the ZooKeeper client, you need to:
remove the old file from ZooKeeper,
upload the new one, and
restart the Solr nodes (depending on the change, you could reload the collection instead).
For instance, if you need to update solrconfig.xml, you can:
a) clear the old file on ZooKeeper (otherwise, depending on the client version, you'll get an error):
zkcli.sh --zkhost <ZK_HOST>:<ZK_PORT> -cmd clear /configs/<MY_COLLECTION>/solrconfig.xml
b) upload the updated file:
zkcli.sh --zkhost <ZK_HOST>:<ZK_PORT> -cmd putfile /configs/<MY_COLLECTION>/solrconfig.xml /<MY_UPDATED_FILE_LOCAL_FOLDER>/solrconfig.xml
c) Restart the Solr nodes.
I believe your Solr files should be in /configs/<MY_COLLECTION>.

Which zookeeper nodes should not be deleted?

I already googled this info but found nothing decisive. I want to clear my zookeeper, is it alright if I run rmr on all root nodes?
Thanks
Feel free to delete every node that your system is not using.
Also, you may want to use ephemeral nodes, which get deleted automatically when the connection to the ZooKeeper server is dropped.