Link a mongo-data folder to the /data/db volume in a MongoDB Docker container

I accidentally deleted the Docker volume mongo-data:/data/db. I have a copy of that folder, but now when I run docker-compose up the mongodb container doesn't start and fails with mongo_1 exited with code 14. More details of the error and the mongo-data folder are below. Can someone help me, please?
In docker-compose.yml:
volumes:
- ./mongo-data:/data/db

Restore from backup files
A step-by-step process to repair the corrupted files from a failed MongoDB instance in a Docker container:
! Before you start, make a copy of the files. !
Make sure you know which version of the image was running in the container.
Spawn a new container to run the repair process as follows (a concrete example with assumed values is given after these steps):
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
Once the files are repaired, you can start the containers from docker-compose.
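For example, if the data folder is ./mongo-data and the container was built from the official mongo image at version 4.4 (both are assumptions here, substitute your own path and tag), the repair run would look like:
# one-off repair container; host path and image tag are placeholders
docker run -it --rm -v "$(pwd)/mongo-data":/data/db mongo:4.4 mongod --repair
# when it exits cleanly, bring the stack back up
docker-compose up -d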
If the repair fails, it usually means that the files are corrupted beyond repair. There is still a chance to recover by exporting the data as described here.
How to secure proper backup files
The database is constantly working with the files, so the files on disk are constantly changing. In addition, the database keeps some of the changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure by using a 2-stage commit process (first update the transaction log, then the data file), a copy of the files taken at the wrong moment can be corrupted in a way that prevents the database from recovering.
The reason for such corruption is that the copy process is not aware of the database's write progress, which creates a race condition. In very simple words, while the database is in the middle of writing, the copy process will create a copy of the file(s) that is half-updated, and hence corrupted.
When the database writer is in the middle of writing to the files, we call them hot files. Hot files is a term from the OS perspective; MongoDB also uses the term hot backup, which is a term from the MongoDB perspective. A hot backup means the backup was taken while the database was running.
To take a proper snapshot (ensuring the files are cold) you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold, while the database itself remains hot, hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to resume writing to the filesystem by issuing db.fsyncUnlock().
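A rough sketch of that procedure (the backup path, the legacy mongo shell, and tar as the copy tool are my assumptions, not part of the documented procedure itself):
# flush buffers and block writes, so the files become cold
mongo --eval "db.fsyncLock()"
# copy the data files while writes are blocked
tar -czf /backup/mongo-data.tar.gz /data/db
# let the database write to the filesystem again
mongo --eval "db.fsyncUnlock()"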
Note that the process is more complex and can change between database versions. I give a simplification of it here in order to illustrate the problems with file snapshots. To secure a proper and consistent backup, always follow the documented procedure for the database version that you use.
Suggested backup method
The preferred backup should always be the data dump method, since this ensures you can restore even across upgraded/downgraded database engines. MongoDB provides a very useful tool called mongodump that creates database backups by dumping the data instead of copying the files.
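A minimal dump/restore cycle might look like this (host, port and output folder are placeholders):
# dump all databases to a folder
mongodump --host localhost --port 27017 --out /backup/dump
# restore from that folder
mongorestore --host localhost --port 27017 /backup/dump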
For more details on how to use the backup tools, as well as the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.

Related

What files should I source control with MongoDB

I've been using a MongoDB instance in my docker compose script. I want to set it up so I can move my database from PC to PC and keep all the same data.
There seem to be quite a few files in a MongoDB Docker installation: .lock, .turtle, .wt, .bson, diagnostic.data, journal, etc.
Is there a rule of thumb for what I should store and what I should ignore in my repo? It's been unclear to me; I don't want to store any files that could affect booting in another Docker container.
Best is to preserve everything under the mongod dbPath folder, but some files/folders can of course be removed, like diagnostic.data. That folder contains metrics collected during operation for performance analysis at a later stage, but the already collected stats are not necessary for the mongod process to run.
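For example, when archiving the dbPath (the /data/db path below is an assumption), diagnostic.data can simply be left out:
# archive the dbPath but skip the collected metrics
tar -czf /backup/dbpath.tar.gz --exclude='diagnostic.data' /data/db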

Can I copy the postgresql /base directory as a DB backup?

Don't shoot me, I'm only the OP!
When we need to back up our DB, we are always able to shut down PostgreSQL completely. After it is down, I found I could copy the "/base" directory with the binary data in it to another location, checksum it for accuracy, and later restore it if/when necessary. This has even worked when upgrading to a later version of PostgreSQL. Integrity of the various 'conf' files is not an issue, as that is handled elsewhere (i.e. by other processes/procedures!) in the system.
Is there any risk to this approach that I am missing?
The "File System Level Backup" link in Abelisto's comment is what JoeG is talking about doing. https://www.postgresql.org/docs/current/static/backup-file.html
To be safe I would go up one more level, to "main" on our Ubuntu systems, to take the snapshot, and thoroughly go through the caveats of doing file-level backups. I was tempted to post the caveats here, but I'd end up quoting the entire page.
The thing to be most aware of (in a 'simple' Postgres environment) is the relationship between the postgres database, a user database, and the pg_clog and pg_xlog files. If you only take "base", you lose the transaction and WAL information, and in more complex installations, other necessary information.
If those caveat conditions do not exist in your environment and you can do a full shutdown, this is a valid backup strategy, which can be much faster than a pg_dump.
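As a sketch of that approach on a Debian/Ubuntu-style install (the service name, version number and paths are assumptions), copying the whole data directory rather than just base:
# stop the server completely so the files are cold
sudo service postgresql stop
# copy the entire data directory, not just base/
tar -czf /backup/pgdata.tar.gz /var/lib/postgresql/12/main
# restart once the copy is verified
sudo service postgresql start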

mongorestore takes a lot of time, how about I just copy-paste the '/data/db' directory?

In my case, I want to back up and restore all the databases. This might sound stupid but -
Instead of doing
# backup
mongodump # takes time
# restore
mongorestore # takes a lot of time
Why can't I just
# backup
tar -cvzf /backup/mongo.tar.gz /data/db
# restore
tar -xzf /backup/mongo.tar.gz -C /data/db
Would this not work?
In principle, yes, that's possible, but there are several caveats. The strategies with their respective down- and upsides are discussed in detail in the backup documentation. Essentially, replica sets and sharding make the process more complex.
You'll have to shut down or lock the server so the files aren't being written to while you're copying them. Since copying still takes time, it makes sense to only do that on a secondary, otherwise your system will be effectively down.
Consider using file system / LVM snapshots (also discussed in the documentation); they are generally faster because the file system will do copy-on-write when necessary afterwards, so the actual snapshot takes only milliseconds. However, make sure you understand how that works on whatever LVM, file system or virtualization platform you're using; the performance characteristics can be peculiar, especially when keeping multiple snapshots.
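A rough LVM-based sketch (the volume group, logical volume and mount point names are made up):
# block writes, snapshot the volume holding /data/db, then unblock
mongo --eval "db.fsyncLock()"
sudo lvcreate --snapshot --size 1G --name mongo-snap /dev/vg0/mongo-data
mongo --eval "db.fsyncUnlock()"
# mount the snapshot read-only and copy the files at leisure
sudo mkdir -p /mnt/mongo-snap
sudo mount -o ro /dev/vg0/mongo-snap /mnt/mongo-snap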
Remember that any backup taken while the system is running is inconsistent - the only way to get a 'clean' backup is to gracefully shut down the application (so it finishes all pending writes but doesn't accept any further requests), then backup the database.

Mongodb EC2 EBS Backups

I'm confused about what I need to do here. I am new to Mongo. I have set up a small Mongo server on Amazon EC2, with EBS volumes, one for data, one for logs. I need to do a backup. It's okay to take the DB down in the middle of the night, at least currently.
Using the boto library, EBS snapshots and python to do the backup, I built a simple script that does the following:
sudo service mongodb stop
run backup of data
run backup of logs
sudo service mongodb start
The script ran through and restarted Mongo, but I noted in the AWS console that the snapshots were still being created even though boto had already returned and Mongo had restarted. Certainly not ideal.
I checked the Mongo docs, and found this explanation on what to do for backups:
http://docs.mongodb.org/ecosystem/tutorial/backup-and-restore-mongodb-on-amazon-ec2/#ec2-backup-database-files
This is good info, but a bit unclear. If you are using journaling, which we are, it says:
If the dbpath is mapped to a single EBS volume then proceed to Backup the Database Files.
We have a single volume for data. So, I'm assuming that means to bypass the steps on flushing and locking. But at the end of Backup the Database Files, it discusses removing the locks.
So, I'm a bit confused. As I read it originally, I don't actually need to do anything - I can just run the backup and not worry about flushing/locking, period. I probably don't need to take the DB down. But the paranoid part of me says no, that sounds suspicious.
Any thoughts from anyone on this, or experience, or good old fashioned knowledge?
Since you are using journaling, you can just run the snapshot without taking the DB down. This will be fine as long as the journal files are on the same EBS volume, which they would be unless you symlink them elsewhere.
We run a lot of mongodb servers on Amazon and this is how we do it too.
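If you want the script to block until the snapshot has actually finished (the issue described above), something along these lines works; the AWS CLI here stands in for the boto calls in the original script, and the volume ID is a placeholder:
# create a snapshot of the data volume and wait until it completes
SNAP_ID=$(aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "mongo data backup" --query SnapshotId --output text)
aws ec2 wait snapshot-completed --snapshot-ids "$SNAP_ID"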

Backing up the DB vs. backing up the VM

We're serving a Django/Postgres site running on a VM hypervisor. We're now trying to figure out our backup strategy and have two likely options:
Back up the DB directly using pg_dump
Back up the VM directly by copying the VM image
I lean towards the latter since, as I see it, I could simply back up everything that has to do with the site. I'm not sure whether I have to shut down the VM for this, though.
What is a better and more recommended way of backing up a DB? Are there any reasons for not using the VM backup?
Thanks
The question basically boils down to, can you consider a hot copy of PostgreSQL's data files a backup?
The answer is: not really. PostgreSQL tries very hard through the use of WAL to ensure that its files are in a consistent state all the time and that it can survive a power failure, but starting it up from a copy of these files puts PostgreSQL into recovery mode. If the backup happened at the wrong second and PostgreSQL can't recover from the state of these files, your backup is useless. You don't want your backup/restore mechanism to depend on the recovery mechanism (unless you're dealing with "crash only" software, which PostgreSQL is not).
The probability of PostgreSQL not being able to recover from these files is not high, but it's not zero either. The probability of PostgreSQL not being able to load an SQL dump that it made, on the other hand, is zero. I prefer backup choices with lower probabilities of failure. pg_dump was designed for doing backups.
PostgreSQL recommends using pg_dump for backups, as a file system (or VM) backup requires the database to be shut down (and has other drawbacks):
http://www.postgresql.org/docs/8.1/static/backup-file.html
Edit: Also, a pg_dump backup will be significantly smaller than a filesystem dump of the same database.
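A minimal pg_dump round trip, with the database name and file path as placeholders:
# dump a single database in the compressed custom format
pg_dump -Fc -f /backup/mydb.dump mydb
# restore it into a new or empty database
pg_restore -d mydb /backup/mydb.dump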
There is an additional option. With PostgreSQL you can make an online backup that allows you to snapshot the file system and maintain consistency. You can see details here:
http://www.postgresql.org/docs/9.0/static/continuous-archiving.html
We use this exact method for making backups when we run PostgreSQL in a VM.
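A rough sketch of that method using the low-level backup API from that chapter (the label, paths and version number are assumptions; the data directory copy could also be a filesystem snapshot):
# tell PostgreSQL a base backup is starting (forces a checkpoint)
psql -c "SELECT pg_start_backup('nightly', true);"
# copy or snapshot the data directory while the server keeps running
tar -czf /backup/pgdata.tar.gz /var/lib/postgresql/9.0/main
# mark the backup as finished so the required WAL segments get archived
psql -c "SELECT pg_stop_backup();"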