what is in zookeeper datadir and how to cleanup? - apache-zookeeper

I found my zookeeper dataDir is huge. I would like to understand
What is in the dataDir?
How to cleanup? Does it automatically cleanup after certain period?
Thanks

According to Zookeeper's administrator guide:
The ZooKeeper Data Directory contains files which are a persistent copy of the znodes stored by a particular serving ensemble. These are the snapshot and transactional log files. As changes are made to the znodes these changes are appended to a transaction log, occasionally, when a log grows large, a snapshot of the current state of all znodes will be written to the filesystem. This snapshot supercedes all previous logs.
So in short, for your first question, you can assume that dataDir is used to store Zookeeper's state.
As for your second question, there is no automatic cleanup. From the doc:
A ZooKeeper server will not remove old snapshots and log files, this is the responsibility of the operator. Every serving environment is different and therefore the requirements of managing these files may differ from install to install (backup for example).
The PurgeTxnLog utility implements a simple retention policy that administrators can use. The API docs contains details on calling conventions (arguments, etc...).
In the following example the last count snapshots and their corresponding logs are retained and the others are deleted. The value of should typically be greater than 3 (although not required, this provides 3 backups in the unlikely event a recent log has become corrupted). This can be run as a cron job on the ZooKeeper server machines to clean up the logs daily.
java -cp zookeeper.jar:log4j.jar:conf org.apache.zookeeper.server.PurgeTxnLog <dataDir> <snapDir> -n <count>
If this is a dev instance, I guess you could just almost completely purge the folder (except some files like myid if its there). But for a production instance you should follow the cleanup procedure shown above.

Related

How set different state directory for multiple instances of the same Kafka Streams application on a single machine

From version 2.6.0, KafkaStreams with states locks the state.dir directory and as the documentation says
The state directory. Kafka Streams persists local states under the state directory. Each application has a subdirectory on its hosting machine that is located under the state directory. The name of the subdirectory is the application ID. The state stores associated with the application are created under this subdirectory. When running multiple instances of the same application on a single machine, this path must be unique for each such instance.
In the scenario of running multiple instances of the same application on a single machine,
The path cannot be a random path like /state/dir/{uuid} because this solution bypass the KAFKA-10716 issue.
My solution is to have a directory like /state/dir with ordinal subdirectories, e.g., 0,1,2... and each instance on startup checks this subdirectories from 0 and finds the first subdirectory that is not locked and use that directory for state.dir. As a result, the process id is read from metafile and the previous tasks will be assign to new process correctly.
Is this a correct solution?
What is the best practice to set a different path for each instance on a single machine?
I had the same issue and i also came with a solution that is similar to yours:
I've created a service registry. Each kafka streams instance will request an instance-id when its starting up. The service registry wil then give an integer back beginning from 0. If a second instance comes up, this will get id 1. The instance-id is used to set the group.instance.id and the state.dir configs.
To make it more reliable, each instance will periodically send a heartbeat request to the service-registry. This is needed to make an instance-id available again in case an instance goes down. It will also unregister itself in a shutdown hook to make its id available again. So if instance-0 restarts, it will then get id 0 again, because 0 is the next lowest available number.
With this solution you dont need to read directories and lock-files.
PS: why dont you just increase the num.stream.threads. As you descbribe yourself, you are running it on the same machine (scaling vertically ). With the solution i provided, you can scale horizontally and point the state.dir to the same directory.

Debezium Connector tries to open old log file

I have a debezium connector that works fine, for a limited time. These errors occur in log file:
Caused by: java.sql.SQLException: ORA-00308: cannot open archived log '+RECO/XXXXXXXX/ARCHIVELOG/2022_01_04/thread_1_seq_53874.3204.1093111215'
ORA-17503: ksfdopn:2 Failed to open file +RECO/XXXXXXXX/ARCHIVELOG/2022_01_04/thread_1_seq_53874.3204.1093111215
ORA-15012: ASM file '+RECO/XXXXXX/ARCHIVELOG/2022_01_04/thread_1_seq_53874.3204.1093111215' does not exist
I've learnt in this database log files are deleted daily. Is my connector trying to read an old log file, which does not exist anymore? How can I tell my connector to check only last 12 hours, for example. Or should I do something in database side?
I've learnt in this database log files are deleted daily. Is my connector trying to read an old log file, which does not exist anymore?
It is fine to delete archive logs that are no longer needed, but it's critical that you make sure that you are not deleting logs that the Oracle Connector still requires in order to perform mining. In your particular case, the connector still required thread_1_seq_53874.3204.1093111215 but the log is no longer on the file system and therefore the connector will stop with an error. This error happens with any other connector such as MySQL if you remove the binlogs before the connector is done reading them.
How can I tell my connector to check only last 12 hours, for example.
You cannot.
The way the Debezium connectors are designed is that they're meant to read all changes from the logs in chronological order to guarantee that there is no change data event loss. If a log were to be deleted that was needed and we did not throw an error, then you would have gaps where changes from the source database would not be represented as change events and so your consumers wouldn't be kept in sync.
Or should I do something in database side
Archive logs need to be retained for as long as they're needed by the connector. The latency of the Oracle connector is dependent both on the volatility of your database but also on a number of factors such as the performance of the database server hardware (disk and cpu), the size of your redo logs, etc.
Some environments may not be able to keep archive logs available in the default destination location for extended periods of time due to space constraints. This is why we introduced a way that you can set up Oracle to write archive logs to a secondary destination location that is capable of retaining the logs for a longer period of time, often via a network mount, and then you can explicitly tell the connector use that archive destination name rather than the first valid/default location of the system.

How to configure WAL archiving for a cluster that *only* hosts dev or test databases?

I've got a dev and test database for a project, i.e. databases that I use to either run my project or run tests, locally. They're both in the same cluster ('instance' – I come from Redmond).
Note that my local cluster is different than the cluster that hosts the production database.
How should I configure those databases with respect to archiving the WAL files?
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
But how should I configure the databases or the cluster for archiving WAL files? I understand that I need them if I want to recover the database. I think that's unlikely (as I didn't even know about 'WAL' or their files, or that, presumably they're shared by all of the databases in the same cluster, which seems weird and scary coming from Microsoft SQL Server.)
In the event that I rebuild one of the databases, I should delete the WAL files since the base backup – how can I do that?
But I also don't want to have to worry about the size of the WAL files growing indefinitely. I don't want to be forced to rebuild just to save space. What can I do to prevent this?
My local cluster only contains a single dev and test database for my project, i.e. losing data from one of these databases is (or should be) no big deal. Even having to recreate the cluster itself, and the two databases, is fine and not an issue if it's even just easier than otherwise to restore the two databases to a 'working' condition for local development and testing.
In other words, I don't care about the data in either database. I will ensure – separate from WAL archiving – that I can restore either database to a state sufficient for my needs.
Also, I'd like to document (e.g. in code) how to configure my local cluster and the two databases so that other developers for the same project can use the same setup for their local clusters. These clusters are all distinct from the cluster that hosts the production database.
Rather than trying to manage your WAL files manually, it's generally recommended that you let a third-party app take care of that for you. There are several options, but pg_backrest is the most popular of the open-source offerings out there.
Each database instance writes its WAL stream, chopped in segments of 16MB.
Every other relational database does the same thing, even Microsoft SQL Server (the differences are in the name and organization of these files).
The WAL contains the physical information required to replay transactions. Imagine it as information like: "in file x, block 2734, change 24 bytes at offset 543 as follows: ..."
With a base backup and this information you can restore any given point in time in the life of the database since the end of the base backup.
Each PostgreSQL cluster writes its own "WAL stream". The files are named with long weird hexadecimal numbers that never repeat, so there is no danger that a later WAL segment of a cluster can conflict with an earlier WAL segment of the same cluster.
You have to make sure that WAL is archived to a different machine, otherwise the exercise is pretty useless. If you have several clusters on the same machine, make sure that you archive them to different directories (or locations in general), because the names of the WAL segments of different clusters will collide.
About retention: You want to keep around your backups for some time. Once you get rid of a base backup, you can also get rid of all WAL segments from before that base backup. There is the pg_archivecleanup executable that can help you get rid of all archived WAL segments older than a given base backup.
I'd like to be able to 'build' or 'rebuild' either of those databases by restoring from a base backup and running seed data scripts.
Where is the basebackup coming from? If you are restoring the PROD base backup and running the seed scripts over it, then you don't need WAL archiving at all on test/dev. But then what you get will be a clone of PROD, which means it will not have different databases for test and for dev in the same instance, since (presumably) PROD doesn't have that.
If the base backup is coming from someplace else, you will have to describe what it is. That will dictate your WAL needs.
Trying to run one instance with both test and dev on it seems like a false economy to me. Just run two instances.
Setting archive_mode=off will entirely disable a wal archive. There will still be "live" WAL files in the pg_wal or pg_xlog directory, but these get removed/recycled automatically after each checkpoint--you should not need to manage these, other than by controlling how often checkpoints take place (and making sure you don't have any replication slots hanging around). The WAL archive and the live WAL files are different things. The live WAL files are mandatory and are needed to automatically recover from something like a power failure. The WAL archive may be needed to manually recover from a hard-drive crash or the total destruction of your server, and probably isn't needed at all on dev/test.

Zookeeper dataLogDir config invalid

I built a zookeeper cluster and it runs very well. But I found that the log directory I set in the zoo.cfg seems not working. Below is my config about log directory and snapshots directory.
dataDir=/var/lib/zookeeper
dataLogDir=/var/lib/zookeeper/logs
However, file zookeeper.out is generated in /var/lib/zookeeper rather than the subsidiary log folder /var/lib/zookeeper/logs.
I restarted zookeeper on every server many times, but made no sense.
This happens because zookeeper.out is related to other type of log (application log) instead of the one specified by dataLogDir which relates to transaction log.
dataLogDir
This option will direct the machine to write the transaction log to
the dataLogDir rather than the dataDir. This allows a dedicated log
device to be used, and helps avoid competition between logging and
snaphots.
By checking zkServer.sh you'll see that zookeeper.out is related to _ZOO_DAEMON_OUT which depends on ZOO_LOG_DIR which is set by default by zkEnv.sh. Depending on your environment and zookeeper (ZK) version the zookeeper.out file might land in different places (according to this answer even in the working directory from which ZK is started).
For application logging you'll better configure the log4j.properties file; that's because ZK uses log4j.

Backup Zookeeper data on active server

I am trying to configure production-ready Zookeeper data backup.
As I learned from different sources, Zookeeper snapshot file is not enough to guarantee a return to a previous state. In fact, the snapshot file may not even represent the state of the tree at any point in time (see corresponding stackoverflow ticket answer).
So to make the consistent zk data storage backup (to store it on a cloud or elsewhere), I need to copy snapshots with transaction logs.
The question is: how can I copy transaction log files while zookeeper is active and makes hundreds of transactions a second? Won't the files be corrupted?
What other practices can be used in this case?