What are important mongo data files for backup - mongodb

If I want to back up a database by copying raw files, which files do I need to copy? Only db-name.ns, db-name.0, db-name.1, ... or the whole folder (local.ns.., journal)? I'm running a replica set. I understand the procedure for locking a hidden secondary node and then copying the files to a new location. But I'm wondering whether I need to copy the whole folder or just some files.
Thx

Simple answer: all of them, as obvious as that might sound. And here is why:
If you don't copy the namespace (.ns) files, your database will most likely not work.
If you don't copy all data files, some of your data will be missing and your indices will point to nonexistent locations. The database in question might still work (minus the data stored in the missing data file), but I would not bet on it – and since the data was important enough to create a backup in the first place, you don't want this to happen, do you?
The config, admin and local databases are vital for their respective features – and since you used those features, you probably want to use them after a restore, too.
How do I back up all files?
The best solution aside from MMS Backup that I have found so far is to create LVM snapshots of the filesystem the MongoDB data resides on. For this to work, the journal needs to be included in the snapshot. Usually, you don't need a dedicated backup node for this approach. It is a bit complicated to set up, though.
Preparing LVM backups
Let's assume you have your data in the default data directory /data/db and you have not changed any paths. Then you would mount a logical volume to /data/db and use this to hold the data. Assuming that you don't have anything like this, here is a step by step guide:
Create a logical volume big enough to hold your data. I will call that one /dev/VolGroup/LogVol1 from now on. Make sure that you only use about 80% of the available disk space in the volume group for creating the logical volume.
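For example, assuming your volume group is actually called VolGroup (adjust names and size to your setup), something like this would allocate 80% of it:
lvcreate -l 80%VG -n LogVol1 VolGroup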
Create a filesystem on the logical volume. I prefer XFS, so we create an xfs filesystem on /dev/VolGroup/LogVol1:
mkfs.xfs /dev/VolGroup/LogVol1
Mount the newly created filesystem on /mnt
mount /dev/VolGroup/LogVol1 /mnt
Shut down mongod:
killall mongod
(Note that the upstart scripts sometimes have problems shutting down mongod, and this command gracefully stops mongod anyway).
Copy the datafiles from /data/db to /mnt by issuing
cp -a /data/db/* /mnt
Adjust your /etc/fstab so that the logical volume gets mounted on reboot:
# The noatime parameter increases io speed of mongod significantly
/dev/VolGroup/LogVol1 /data/db xfs defaults,noatime 0 1
Unmount the logical volume from its current mountpoint and remount it at the correct one:
cd && umount /mnt/ && mount /data/db
Restart mongod
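How exactly depends on your init system and packaging; on a typical install this is something like
service mongod start
(the service may be called mongodb on some distributions).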
Creating a backup
Creating a backup now becomes as easy as
Create a snapshot:
lvcreate -l100%FREE -s -n mongo_backup /dev/VolGroup/LogVol1
Mount the snapshot:
mount /dev/VolGroup/mongo_backup /mnt
Copy it somewhere. The reason we need to do this is that the snapshot can only be kept alive as long as the changes to the data files do not exceed the space in the volume group that you did not allocate during preparation. For example, if you have a 100GB disk and you allocated 80GB for /dev/VolGroup/LogVol1, the snapshot has 20GB available. As long as the changes to the filesystem since you took the snapshot stay below 20GB, everything runs fine; after that, the snapshot is invalidated and your consistent copy is gone. So you are not in a great hurry, but you should definitely move the data to an offsite location, an FTP server or whatever you deem appropriate. Note that compressing the datafiles can take quite long and you might run out of "change space" before finishing. Personally, I like to have a slower HDD as a temporary place to store the backup and do all further operations (compression, transfer) from there. So my copy command looks like
cp -a /mnt/* /home/mongobackup/backups
when the HDD is mounted on /home/mongobackup.
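From there you can compress and ship the data offsite without time pressure; a rough sketch with placeholder host and paths:
tar -czf /home/mongobackup/mongo-backup.tar.gz -C /home/mongobackup backups
scp /home/mongobackup/mongo-backup.tar.gz backupuser@backuphost:/srv/mongo-backups/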
Destroy the snapshot:
umount /mnt && lvremove /dev/VolGroup/mongo_backup
The space allocated for the snapshot is released and the restrictions to the amount of changes to the filesystem are removed.

The whole db data folder, plus wherever you have your logs and journal.

The best solution to back up data on MongoDB would be to use MongoDB Monitoring Service (MMS). All other solutions, including copying files manually, mongodump and mongoexport, are way behind MMS.

Related

Link mongo-data to /data/db folder to a volume Mongodb Docker

I accidentally deleted the Docker volume mongo-data:/data/db. I have a copy of that folder, but now when I run docker-compose up the mongodb container doesn't start and exits with the error mongo_1 exited with code 14. Below are more details of the error and the mongo-data folder. Can someone help me please?
In docker-compose.yml:
volumes:
  - ./mongo-data:/data/db
Restore from backup files
A step-by-step process to repair the corrupted files from a failed mongodb in a docker container:
Before you start, make a copy of the files!
Make sure you know which version of the image was running in the container
Spawn a new container to run the repair process as follows:
docker run -it -v <data folder>:/data/db <image-name>:<image-version> mongod --repair
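For example, with the bind mount from the question and an assumed image version (the image name and version must match what the container was running):
docker run -it -v "$(pwd)/mongo-data":/data/db mongo:4.4 mongod --repair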
Once the files are repaired, you can start the containers from docker-compose again.
If the repair fails, it usually means that the files are corrupted beyond repair. There is still a chance to recover by exporting the data as described here.
How to secure proper backup files
The database is constantly working with its files, so they are constantly changing on disk. In addition, the database keeps some changes in internal memory buffers before they are flushed to the filesystem. Although database engines do a very good job of ensuring that the database can recover from an abrupt failure by using a two-stage commit process (first update the transaction log, then the data file), copying the files while this is happening can produce a corrupted copy that prevents the database from recovering.
The reason for such corruption is that the copy process is not aware of the progress of the database's writes, which creates a race condition. In simple terms: while the database is in the middle of writing, the copy process creates a copy of the file(s) that is half-updated, and hence corrupted.
When the database writer is in the middle of writing to the files, we call them hot files. "Hot files" is a term from the OS perspective; MongoDB also uses the term "hot backup", which is from MongoDB's perspective and means that the backup was taken while the database was running.
To take a proper snapshot (ensuring the files are cold) you need to follow the procedure explained here. In short, the db.fsyncLock() command issued during this process tells the database engine to flush all buffers and stop writing to the files. This makes the files cold, while the database itself remains hot – hence the difference between the terms hot files and hot backup. Once the copy is done, the database is told to resume writing to the filesystem by issuing db.fsyncUnlock().
Note that the process is more complex and can change between database versions. I give a simplification here in order to illustrate the point about the problems with file snapshots. To secure a proper and consistent backup, always follow the documented procedure for the database version that you use.
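A minimal sketch of that lock/copy/unlock cycle from the command line (the plain cp and the paths stand in for whatever snapshot mechanism and layout you actually use; newer versions use mongosh instead of mongo):
mongo --eval "db.fsyncLock()"
cp -a /data/db /backup/mongo-files
mongo --eval "db.fsyncUnlock()"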
Suggested backup method
The preferred backup method should always be a data dump, since this assures that you can restore even across upgraded or downgraded database engines. MongoDB provides a very useful tool called mongodump that creates database backups by dumping the data instead of copying the files.
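For example (host, port and the dump directory are placeholders):
mongodump --host localhost --port 27017 --out /backup/dump
mongorestore --host localhost --port 27017 /backup/dump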
For more details on how to use the backup tools, as well as the other backup methods, read the MongoDB Backup Methods chapter of the MongoDB documentation.

mongorestore takes a lot of time, how about I just copy-paste the '/data/db' directory?

In my case, I want to backup and restore all the databases. This might sound stupid but -
Instead of doing
# backup
mongodump # takes time
# restore
mongorestore # takes a lot of time
Why can't I just
# backup
tar -cvzf /backup/mongo.tar.gz /data/db
# restore
tar -xzf /backup/mongo.tar.gz -C /   # the archive stores paths relative to / (data/db/...), so extract at /
Would this not work?
In principle, yes, that's possible, but there are several caveats. The strategies with their respective down- and upsides are discussed in detail in the backup documentation. Essentially, replica sets and sharding make the process more complex.
You'll have to shut down or lock the server so the files aren't being written to while you're copying them. Since copying still takes time, it makes sense to only do that on a secondary, otherwise your system will be effectively down.
Consider using file system / LVM snapshots (also discussed in the documentation); they are generally faster because the file system does copy-on-write when necessary afterwards, so the actual snapshot takes only milliseconds. However, make sure you understand how that works on whatever LVM, file system or virtualization platform you're using; the performance characteristics can be peculiar, especially when keeping multiple snapshots.
Remember that any backup taken while the system is running is inconsistent - the only way to get a 'clean' backup is to gracefully shut down the application (so it finishes all pending writes but doesn't accept any further requests), then backup the database.
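On a single node that boils down to something like the following (service name and paths are assumptions):
# backup (mongod stopped)
sudo service mongod stop
tar -czf /backup/mongo.tar.gz -C / data/db
sudo service mongod start
# restore (also with mongod stopped)
tar -xzf /backup/mongo.tar.gz -C /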

MongoDB Backups - Is it safe to snapshot only the dbpath volume?

Assumption: Single MongoDB instance.
I have tested a backup and restore using an EBS snapshot of only the volume storing my data (dbpath) and NOT the /logs or /journal volumes. The restore seems to work fine and the data is available.
Are there any risks or downsides to doing this? In other words, do I lose anything if I don't have a backup snapshot of the /logs and /journal volumes?
Backing up if journal and dbpath are on separate EBS volumes
If your /journal directory is on a different EBS volume from your dbpath, the only way to get a consistent backup would be to use db.fsyncLock() to ensure there are no pending write operations. The fsyncLock() command has the side effect of blocking all writes to your database, so typically you would only want to use this approach if you are backing up from a secondary in a replica set (rather than a sole mongod, as per your assumption in the question description).
Backing up if journal and dbpath are on the same EBS volumes
If the journal and dbpath are on the same EBS volume you can get a consistent backup using EBS snapshots.
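For example, with the AWS CLI (the volume id is a placeholder):
aws ec2 create-snapshot --volume-id vol-0123456789abcdef0 --description "mongod dbpath and journal"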
Do you need to backup the log directory?
Strictly speaking, you do not need to backup the logs. For troubleshooting purposes it can be useful to rotate the logs and keep a few days of recent log files.
I have tested a backup and restore using an EBS snapshot of only the volume storing my data (dbpath) and NOT the /logs or /journal volumes. The restore seems to work fine and the data is available.
This approach will be fine, until it isn't -- that fateful day when you need to restore from backup and realise that your last n backups are unusable as you try them one at a time, or perhaps encounter unexpected errors days after you assumed a restored database was OK. If you don't back up the journal files, this is effectively the same as running without journaling, and the recommended recovery procedure involves running a repair before restarting. The risk isn't so much about changes that haven't been flushed from the journal, but rather the unlucky timing if the power goes out in the middle of a write to the data files, leaving things in an inconsistent state with no recovery information (i.e. the journal).
If you're going to take backups, definitely follow the correct procedure to remove unnecessary risk.
For more information see EC2 Backup and Restore in the MongoDB manual.

should I let mongodb make use of the new hard disk in this way?

I have mongodb v2.4.6 running on Ubuntu 13.04. It is known that mongodb stores all its data in /var/lib/mongodb. Now mongodb is running out of disk space. Fortunately, I got a new hard disk, which is installed, fdisked, formatted and has the name /dev/sda3. Unfortunately, I don't know how to let mongodb make use of the new hard disk because my knowledge of Ubuntu and mongodb is very limited. After some research on the internet, it seems that I should execute the following command:
sudo mount /dev/sda3 /var/lib/mongodb
Is this what I need to do to let mongodb use the new disk? If so, will mongodb automatically and intelligently grow its data onto this disk? Are there any other things I should do? Thank you.
Unfortunately this will not be that straightforward. Even if you succeed with the mounting, it will not move the files at all. What you can do is the following (the commands are sketched after the list):
mount the disk elsewhere (mkdir /var/lib/mongodb1, mount /dev/sda3 /var/lib/mongodb1)
stop mongo
copy the files from /var/lib/mongodb to /var/lib/mongodb1 (only helps if the new disk is bigger)
reconfigure mongo to use the new directory as its db directory, or swap the directory names with mv commands
start mongo
if everything went fine and mongo started properly (check it first!), you can delete the old data.
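Roughly, the commands look like this (the service and config file names are from the stock Ubuntu package and may differ on your install):
mkdir /var/lib/mongodb1
mount /dev/sda3 /var/lib/mongodb1
service mongodb stop
cp -a /var/lib/mongodb/* /var/lib/mongodb1/
# point dbpath in /etc/mongodb.conf to /var/lib/mongodb1, or swap the directories with mv
service mongodb start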
If you have a disk of the same size, moving the data will just run into the same problem again; if you need more space than a single disk provides, you should look into RAID and/or LVM with more disks.

postgresql initdb - directory not empty

I am installing postgres 8.4 on an ubuntu lucid server. (No, at the moment we are using the "lucid" LTS version on that server, so an upgrade is not possible yet, although we are going to start testing the system on precise quite soon.)
I have set up a separate partition with an ext4 filesystem for the /var/lib/postgresql/8.4/main directory. (Those of you who are really into postgres installs know what is happening now...) Since ext4 puts a lost+found directory in the root of every filesystem, postgres will not use that directory as its data directory, since it is initially not empty:
initdb: directory "/var/lib/postgresql/8.4/main" exists but is not empty
If you want to create a new database system, either remove or empty
the directory "/var/lib/postgresql/8.4/main" or run initdb
with an argument other than "/var/lib/postgresql/8.4/main".
The easiest way to proceed would be to remove lost+found and recreate it after initdb has done its job. Could that cause any problems? Does lost+found have any special attributes or anything that makes it impossible to recreate, and is it needed at any time other than when fsck finds something it needs to put there?
Another way would be to unmount the .../main/ filesystem, init the database, temporarily mount the .../main/ filesystem somewhere else, move things over there and mount it back in place. That seems to be a bit more work than the "easiest way".
Or is there some way to make initdb ignore that the directory is not empty? (I couldn't see any command line switch for that.)
Can a lost+found directory within the postgres main directory cause any problems?
At the moment I am running the system on a virtual machine for testing, so it really doesn't matter if I mess up things, but before making this an official way of installing a mission-critical system, it would be nice to have some thoughts on this.
lost+found has preallocated blocks that make it easier for fsck to move data into it when the partition is short of free blocks. To recreate it, it is better to use the mklost+found command rather than mkdir.
If you don't recreate it, fsck will do it anyway when it's needed.
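A sketch of the "remove and recreate" route, assuming the stock Debian/Ubuntu paths (where initdb lives under /usr/lib/postgresql/8.4/bin) and that you call initdb directly as in the error message:
sudo rmdir /var/lib/postgresql/8.4/main/lost+found
sudo chown postgres:postgres /var/lib/postgresql/8.4/main
sudo -u postgres /usr/lib/postgresql/8.4/bin/initdb -D /var/lib/postgresql/8.4/main
sudo sh -c 'cd /var/lib/postgresql/8.4/main && mklost+found'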
But if it comes to the point where fsck finds corruption within PGDATA, I'd think about restoring from a backup rather than counting on lost+found to retrieve anything.