MongoDB Backup & Restore - mongodb

I have a DB with 150GB of data. I am using the mongodump and mongorestore method to back up and restore.
My production server is running Mongo 2.2 and my test server is running 2.6.1.
When I take a backup from the production server (Mongo 2.2), it takes a long time to complete the backup of 150GB of data, and restoration takes 6-8 hours. It does not complete without errors; sometimes the restore drops out automatically and we need to run it again or restore the missed collections.
Is there a better way to take a backup and restore, where we can save time and run it without errors?
Regards,
Rishi

You have a couple of options for native backup and restore functionality, and these are listed very well in the documentation at http://docs.mongodb.org/manual/administration/backup/.
Just to summarize, as your data grows, mongodump / mongorestore becomes less ideal for backup / restore purposes, and you should start looking at other options like:
File system based snapshots or LVM snapshots (since you are on EC2, this should be fairly straightforward)
MMS Backup

The best method is to back up and restore using LVM on a Linux system.
Creating a Snapshot:
lvcreate --size 100M --snapshot --name mdb-snap01 /dev/vg0/mongodb
Archive a Snapshot:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | gzip > mdb-snap01.gz
Restore a Snapshot:
lvcreate --size 1G --name mdb-new vg0
gzip -d -c mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Restore Directly from a Snapshot:
umount /dev/vg0/mdb-snap01
lvcreate --size 1G --name mdb-new vg0
dd if=/dev/vg0/mdb-snap01 of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
Remote Backup Storage:
umount /dev/vg0/mdb-snap01
dd if=/dev/vg0/mdb-snap01 | ssh username@example.com gzip > /opt/backup/mdb-snap01.gz
lvcreate --size 1G --name mdb-new vg0
ssh username@example.com gzip -d -c /opt/backup/mdb-snap01.gz | dd of=/dev/vg0/mdb-new
mount /dev/vg0/mdb-new /srv/mongodb
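If you want to script these steps, here is a minimal sketch, assuming the same vg0 volume group and mongodb logical volume names as above; the snapshot size and backup directory are placeholders, and the snapshot must be large enough to hold all changes made while it exists:
#!/bin/bash
# Sketch: snapshot, archive, and clean up in one pass (names and sizes are examples)
set -e
SNAP_NAME="mdb-snap-$(date +%Y%m%d)"
BACKUP_DIR="/opt/backup"
# Create the snapshot of the MongoDB logical volume
lvcreate --size 10G --snapshot --name "$SNAP_NAME" /dev/vg0/mongodb
# Archive the snapshot to a compressed image
dd if="/dev/vg0/$SNAP_NAME" bs=4M | gzip > "$BACKUP_DIR/$SNAP_NAME.gz"
# Remove the snapshot once archived to free the copy-on-write space
lvremove -f "/dev/vg0/$SNAP_NAME"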

Related

mongodump for collection larger than RAM

I am using a command like this to dump data from a remote machine:
mongodump --verbose \
--uri="mongodb://mongousr:somepassword#host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH"
This fails like so:
Failed: error writing data for collection `somedb.someCollection` to disk: error reading collection: EOF
somedb.someCollection is about 40GB. I don't have the ability to increase RAM to this size.
I have seen two explanations. One is that the console output is too verbose and fills the RAM. This seems absurd; it's only a few kilobytes, and it's on the client machine anyway. Rejected (but I am trying it again now with --quiet just to be sure).
The more plausible explanation is that the host fills its RAM with somedb.someCollection data and then fails. The problem is that the 'solution' that I've seen proposed is to increase the RAM to be bigger than the size of the collection.
Really? That can't be right. What's the point of mongodump with that limitation?
The question: is it possible to mongodump a database with a collection that is larger than my RAM size? How?
mongodump Client:
macOS
mongodump --version
mongodump version: 4.0.3
git version: homebrew
Go version: go1.11.4
os: darwin
arch: amd64
compiler: gc
OpenSSL version: OpenSSL 1.0.2r 26 Feb 2019
Server:
built with docker FROM mongo:
Reports: MongoDB server version: 4.0.8
Simply dump your collection slice by slice:
mongodump --verbose \
--uri="mongodb://mongousr:somepassword#host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH" -q '{_id: {$gte: ObjectId("40ad7bce1a3e827d690385ec")}}'
mongodump --verbose \
--uri="mongodb://mongousr:somepassword#host.domain.com:27017/somedb?authSource=admin" \
--out="$BACKUP_PATH" -q '{_id: {$lt: ObjectId("40ad7bce1a3e827d690385ec")}}'
or partition your dump with a different query on _id or on some other field. The _id shown above is just an example.
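If you need more than two slices, the same idea can be scripted as a loop. A minimal sketch follows; the boundary ObjectIds and the collection name are placeholders, the URI is taken from the question, and note that newer mongodump releases may require strict extended JSON in -q instead of the ObjectId() shorthand:
#!/bin/bash
# Dump one large collection in _id ranges, one output directory per slice
URI="mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin"
BOUNDS=("000000000000000000000000" "40ad7bce1a3e827d690385ec" "ffffffffffffffffffffffff")
for ((i = 0; i < ${#BOUNDS[@]} - 1; i++)); do
  QUERY="{_id: {\$gte: ObjectId(\"${BOUNDS[$i]}\"), \$lt: ObjectId(\"${BOUNDS[$((i+1))]}\")}}"
  mongodump --uri="$URI" --collection=someCollection \
    --out="$BACKUP_PATH/slice_$i" -q "$QUERY"
done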
Stennie's answer really works.
The default value of storage.wiredTiger.engineConfig.cacheSizeGB is max((RAM-1GB)/2, 256MB). If your MongoDB server is running in a Docker container with default configs and there are other apps running on the host machine, memory can fill up while you are dumping a large collection. The same thing can happen if the container's RAM is limited by your configuration.
You can use docker run --name some-mongo -d mongo --wiredTigerCacheSizeGB 1.5 (choose the number based on your situation).
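The same limit can also be set in the server's configuration file instead of the docker run flag; a minimal mongod.conf fragment as a sketch (1.5 is just an example value):
# mongod.conf (YAML format) – only the relevant fragment
storage:
  wiredTiger:
    engineConfig:
      cacheSizeGB: 1.5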
Another possibility is to add the --gzip flag to mongodump's output. It helped me back up a DB that hung at 48% without compression. The syntax would be:
mongodump --uri="mongodb://mongousr:somepassword@host.domain.com:27017/somedb?authSource=admin" --gzip --out="$BACKUP_PATH"

WAL archiving in postgres is not working

I have enabled WAL archiving in the Postgres configuration file. When I restarted the Postgres service, WAL recovery did not work; there were no WAL recovery entries in the logs.
Steps I followed:
Created directory for wal:
mkdir -p /var/lib/pgsql/wals/
mkdir -p /var/lib/pgsql/backups/
chown postgres:postgres -R /var/lib/pgsql/backups/
chown postgres:postgres -R /var/lib/pgsql/wals/
Edited the postgresql.conf with the below changes:
wal_level=archive
archive_mode=on
archive_command = 'test ! -f /var/lib/pgsql/wals/%f && cp %p /var/lib/pgsql/wals/%f'
sudo service postgresql restart 10
sudo su - postgres
pg_basebackup -D /var/lib/pgsql/data #created base backup
tar -C /var/lib/pgsql/data/ -czvf /var/lib/pgsql/backups/pg_basebackup_backup.tar.gz .
Deleted two rows of data in my database and stopped the postgres service
sudo service postgresql stop 10
Extracted the Basebackup
tar xvf /var/lib/pgsql/backups/pg_basebackup_backup.tar.gz -C /var/lib/pgsql/data
Created recovery.conf with the below content and restarted postgres service
echo "restore_command = 'cp /var/lib/pgsql/wals/%f %p'">/var/lib/pgsql/recovery.conf
cp /var/lib/pgsql/recovery.conf /var/lib/pgsql/data/
sudo service postgresql stop 10
sudo service postgresql start 10
There were no WAL recovery entries in the logs, and the two rows which I deleted did not get restored.
pg_basebackup can capture all WAL segments during the backup. I use a base backup in tar format with the "-X stream" option and everything works well. See here: pg_basebackup – bash script for backup and archiving on Google storage.
It works excellently; I back up a database over 4.5 TB in size, which takes almost 2 days.
Restoration is described here: pg_basebackup / pg-barman – restore tar backup.
It all works; we have already had incidents where we had to restore from these backups.
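For reference, a hedged sketch of such a tar-format base backup with the required WAL streamed into it (the target directory and user are placeholders):
# Tar-format, compressed base backup that streams the needed WAL alongside the data
pg_basebackup -D /var/lib/pgsql/backups/base -F tar -z -X stream -P -U postgres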
The WAL containing the two deleted rows probably hasn't been archived yet.
archive_command is run whenever a 16 MB WAL segment is full or something forces a log switch.
To be able to recover the latest changes that have not been archived yet, you have to copy the contents of pg_wal to the pg_wal directory in the restored base backup.
You can also consider streaming replication or pg_receivewal if you cannot afford to lose transactions.
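If you just want to test recovery and cannot wait for a 16 MB segment to fill, you can force a log switch so the current WAL gets archived. A sketch; the function is pg_switch_wal() on PostgreSQL 10 and later, pg_switch_xlog() on older releases:
# Force the current WAL segment to be closed and handed to archive_command (PostgreSQL 10+)
sudo -u postgres psql -c "SELECT pg_switch_wal();"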

Initialize data on dockerized mongo

I'm running a dockerized mongo container.
I'd like to create a mongo image with some initialized data.
Any ideas?
A more self-contained approach:
create javascript files that initialize your database
create a derived MongoDB docker image that contains these files
There are many answers that use disposable containers or create volumes and link them, but this seems overly complicated. If you take a look at the mongo docker image's docker-entrypoint.sh, you see that line 206 executes /docker-entrypoint-initdb.d/*.js files on initialization using a syntax: mongo <db> <js-file>. If you create a derived MongoDB docker image that contains your seed data, you can:
have a single docker run command that stands up a mongo with seed data
have data persisted through container stops and starts
reset that data with docker stop, rm, and run commands
easily deploy with runtime schedulers like k8s, mesos, swarm, rancher
This approach is especially well suited to:
POCs that just need some realistic data for display
CI/CD pipelines that need consistent data for black box testing
example deployments for product demos (sales engineers, product owners)
How to:
Create and test your initialization scripts (grooming data as appropriate)
Create a Dockerfile for your derived image that copies your init scripts
FROM mongo:3.4
COPY seed-data.js /docker-entrypoint-initdb.d/
Build your docker image
docker build -t mongo-sample-data:3.4 .
Optionally, push your image to a docker registry for others to use
Run your docker image
docker run \
--name mongo-sample-data \
-p 27017:27017 \
--restart=always \
-e MONGO_INITDB_DATABASE=application \
-d mongo-sample-data:3.4
By default, docker-entrypoint.sh will apply your scripts to the test db; the above run command env var MONGO_INITDB_DATABASE=application will apply these scripts to the application db instead. Alternatively, you could create and switch to different dbs in the js file.
I have a github repo that does just this - here are the relevant files.
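For reference, a minimal seed-data.js sketch; the collection and documents are purely hypothetical, and the entrypoint runs the file against the database named in MONGO_INITDB_DATABASE:
// seed-data.js – hypothetical seed data executed by the entrypoint on first start
db.clients.insertMany([
  { name: "Acme", active: true },
  { name: "Globex", active: false }
]);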
With the latest release of the mongo Docker image, something like this works for me.
FROM mongo
COPY dump /home/dump
COPY mongo_restore.sh /docker-entrypoint-initdb.d/
The mongo restore script looks like this:
#!/bin/bash
# Restore from dump
mongorestore --drop --gzip --db "<RESTORE_DB_NAME>" /home/dump
And you can build the image normally:
docker build -t <TAG> .
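The dump directory that the Dockerfile copies can be produced beforehand with mongodump; a sketch, where the database name is a placeholder (note that mongodump writes into a per-database subdirectory, so the path passed to mongorestore inside the image may need to point at dump/<db> depending on what you COPY):
# Produce the gzipped dump that the Dockerfile copies into the image
mongodump --gzip --db "<SOURCE_DB_NAME>" --out dump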
First create a docker volume
docker volume create --name mongostore
then create your mongo container
docker run -d --name mongo -v mongostore:/data/db mongo:latest
The -v switch here is responsible for mounting the volume mongostore at the /data/db location, which is where mongo saves its data. The volume is persistent (on the host). Even with no containers running you will see your mongostore volume listed by
docker volume ls
You can kill the container and create a new one (same line as above) and the new mongo container will pick up the state of the previous container.
Initializing the volume
Mongo initializes a new database if none is present. This is responsible for creating the initial data in the mongostore. Let's say that you want to create a brand new environment using a pre-seeded database. The problem becomes how to transfer data from your local environment (for instance) to the volume before creating the mongo container. I'll list two cases.
Local environment
You're using either Docker for Mac/Windows or Docker Toolbox. In this case you can easily mount a local drive to a temporary container to initialize the volume. Eg:
docker run --rm -v /Users/myname/work/mongodb:/incoming \
-v mongostore:/data alpine:3.4 cp -rp /incoming/* /data
This doesn't work for cloud storage. In that case you need to copy the files.
Remote environment (AWS, GCP, Azure, ...)
It's a good idea to tar/compress things up to speed the upload.
tar czf mongodata.tar.gz /Users/myname/work/mongodb
Then create a temporary container to untar and copy the files to the mongostore. The tail -f /dev/null just makes sure that the container doesn't exit.
docker run -d --name temp -v mongostore:/data alpine:3.4 tail -f /dev/null
Copy files to it
docker cp mongodata.tar.gz temp:.
Untar and move to the volume
docker exec temp sh -c 'tar xzf mongodata.tar.gz && cp -rp mongodb/* /data'
Cleanup
docker rm temp
You could also copy the files to the remote host and mount from there, but I tend to avoid interacting with the remote host at all.
Disclaimer. I'm writing this from memory (no testing).
Here is how it's done with docker-compose. I use an older image of mongo, but the docker-entrypoint.sh accepts *.js and *.sh files for all versions of the image.
docker-compose.yaml
version: '3'
services:
  mongo:
    container_name: mongo
    image: mongo:3.2.12
    ports:
      - "27017:27017"
    volumes:
      - mongo-data:/data/db:cached
      - ./deploy/local/mongo_fixtures:/fixtures
      - ./deploy/local/mongo_import.sh:/docker-entrypoint-initdb.d/mongo_import.sh
volumes:
  mongo-data:
    driver: local
mongo_import.sh:
#!/bin/bash
# Import from fixtures
mongoimport --db wcm-local --collection clients --file /fixtures/properties.json && \
mongoimport --db wcm-local --collection configs --file /fixtures/configs.json
And my mongo_fixtures JSON files are the product of mongoexport, which has the following format:
{"_id":"some_id","field":"value"}
{"_id":"another_id","field":"value"}
This should help those using the image straight away without a custom Dockerfile, with the entrypoint set up right in the docker-compose file. Cheers!
I've found a way that is somewhat easier for me.
Say you have a database in a Docker container on your server and you want to back it up; here's what you could do.
What might differ between your setup and mine is the name of your mongo Docker container, [mongodb] (the default when using elastic_spence). So make sure you start your container with --name mongodb to match the following steps:
$ docker run \
--rm \
--link mongodb:mongo \
-v /root:/backup \
mongo \
bash -c 'mongodump --out /backup --host $MONGO_PORT_27017_TCP_ADDR'
And to restore the database from a dump.
$ docker run \
--rm \
--link mongodb:mongo \
-v /root:/backup \
mongo \
bash -c 'mongorestore /backup --host $MONGO_PORT_27017_TCP_ADDR'
If you need to download the dump from your server, you can use scp:
$ scp -r root@IP:/root/backup ./backup
Or upload it:
$ scp -r ./backup root@IP:/root/backup
P.S: Original source by Tim Brandin available at https://blog.studiointeract.com/mongodump-and-mongorestore-for-mongodb-in-a-docker-container-8ad0eb747c62
Thank you!

What is the easiest way to back up a MongoDB deployed with mup?

I deployed my app on an Ubuntu server using mup deploy (https://github.com/arunoda/meteor-up) with the option "setupMongo": true in the mup.json file.
Everything works fine, and I would like to save the MongoDB database daily to FTP or S3, or to set up a MongoDB replica on another server (to avoid copying the whole database every time, but that seems more complicated).
If deployed with mup, you are in luck.
You can find the steps here: https://github.com/xpressabhi/mup-data-backup
Here are the steps again:
MongoDB Data Backup deployed via mup
These commands work only if Meteor was deployed with the mup tool. Mup creates a Docker container for MongoDB, so taking a backup becomes easy with these commands.
Backup
Take a backup of the running app's data inside Docker, then copy it to a local folder outside Docker.
docker exec -it mongodb mongodump --archive=/root/mongodump.gz --gzip
docker cp mongodb:/root/mongodump.gz mongodump_$(date +%Y-%m-%d_%H-%M-%S).gz
Copy backup to server
Move data to another server/local machine or a backup location
scp /path/to/dumpfile root@serverip:/path/to/backup
Delete old data from meteor deployment
Get into the mongo console running in Docker, then drop the current database before loading new data.
docker exec -it mongodb mongo appName
db.runCommand( { dropDatabase: 1 } )
Restore data to meteor docker
docker cp /path/to/dumpfile mongodb:/root/mongodump.gz
docker exec -it mongodb mongorestore --archive=/root/mongodump.gz --gzip
The best way is to mongodump it.
Assuming it's running on the mup instance itself: since it only listens on 127.0.0.1, you would have to SSH in and use mongodump.
If you simply run it:
mongodump
It will create a directory dump containing your backup.
If you want to do this remotely, you would have to edit /etc/mongodb.conf so that it binds globally. You will have to create users, though, since it will be publicly accessible, and then set auth to true.
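In the legacy options format used by the Ubuntu packages for older MongoDB releases, that amounts to something like this sketch (only the relevant lines, not a complete file):
# /etc/mongodb.conf
# Listen on all interfaces instead of 127.0.0.1 only
bind_ip = 0.0.0.0
# Require authentication once users have been created
auth = true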
You could then mongodump from your own machine (you can download the mongodump binary from mongodb.org):
./mongodump --host <your server ip address> --username <username> --password <password>
This answer is inspired by:
sheharyar.me/blog/regular-mongo-backups-using-cron
It uses a script to: mongodump -> tar -> wput (ftp)
First, create a bash script:
#!/bin/bash
MONGO_DATABASE="your_db_name"
APP_NAME="your_app_name"
MONGO_HOST="127.0.0.1"
MONGO_PORT="27017"
TIMESTAMP=`date +%F-%H%M`
MONGODUMP_PATH="/usr/bin/mongodump"
BACKUPS_DIR="/home/username/backups/$APP_NAME"
BACKUP_NAME="$APP_NAME-$TIMESTAMP"
# mongo admin --eval "printjson(db.fsyncLock())"
# $MONGODUMP_PATH -h $MONGO_HOST:$MONGO_PORT -d $MONGO_DATABASE
$MONGODUMP_PATH -d $MONGO_DATABASE
# mongo admin --eval "printjson(db.fsyncUnlock())"
mkdir -p $BACKUPS_DIR
mv dump $BACKUP_NAME
tar -zcvf $BACKUPS_DIR/$BACKUP_NAME.tgz $BACKUP_NAME
rm -rf $BACKUP_NAME
wput $BACKUPS_DIR/$BACKUP_NAME.tgz ftp://login:password@ftp.domain.com/backups/
Save it as mongo_backup.sh and run:
chmod +x mongo_backup.sh
bash mongo_backup.sh
Then schedule it with cron:
sudo su
crontab -e
And enter this new line:
00 00 * * * /bin/bash /home/username/scripts/mongo_backup.sh
That's it.
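If you would rather push the archive to S3 than FTP (the question mentions both), one hedged option is the AWS CLI, assuming it is installed and configured with credentials; the bucket name is a placeholder:
# Upload the archive to S3 instead of (or in addition to) FTP
aws s3 cp "$BACKUPS_DIR/$BACKUP_NAME.tgz" s3://your-backup-bucket/backups/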

Postgresql raises 'data directory has wrong ownership' when trying to use volume

I'm trying to run PostgreSQL in a Docker container. Of course, I need my database data to be persistent, so I'm trying to use a data-only container which exposes a volume to store the database.
So, my data container has this Dockerfile:
FROM ubuntu
# Create data directory
RUN mkdir -p /data/postgresql
# Create /data volume
VOLUME /data/postgresql
Which I run:
docker run --name postgresql_data lyapun/postgresql_data true
In my postgresql.conf I set:
data_directory = '/data/postgresql'
Then I run my PostgreSQL container like this:
docker run -d --name postgre --volumes-from postgresql_data lyapun/postgresql
And I got:
2014-07-04 07:45:57 GMT FATAL: data directory "/data/postgresql" has wrong ownership
2014-07-04 07:45:57 GMT HINT: The server must be started by the user that owns the data directory.
How do I deal with this issue? I googled a lot to find information about using PostgreSQL with Docker volumes, but I didn't find anything.
Thanks!
OK, it seems like I found a workaround for this issue.
Instead of running Postgres like this:
CMD ["/usr/lib/postgresql/9.1/bin/postgres", "-D", "/var/lib/postgresql/9.1/main", "-c", "config_file=/etc/postgresql/9.1/main/postgresql.conf"]
I wrote a bash script:
#!/bin/bash
chown -Rf postgres:postgres /data/postgresql
chmod -R 700 /data/postgresql
sudo -u postgres /usr/lib/postgresql/9.1/bin/postgres -D /var/lib/postgresql/9.1/main -c config_file=/etc/postgresql/9.1/main/postgresql.conf
And replaced CMD in postgresql image to:
CMD ["bash", "/run.sh"]
It works!
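For completeness, the corresponding Dockerfile lines might look like this sketch, assuming the script above is saved as run.sh in the build context:
# Copy the ownership-fixing wrapper and use it as the container command
COPY run.sh /run.sh
CMD ["bash", "/run.sh"]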
You have to set the ownership of the /data/postgresql directory to the user that runs your PostgreSQL binary. On Ubuntu, for example, this is usually the postgres user.
Then you have to use this command:
chown postgres:postgres /data/postgresql
A better way to solve this issue, assuming your postgres image is named "postgres" and your backup is ./backup.tar:
First, add this to your postgres Dockerfile:
VOLUME ["/etc/postgresql", "/var/log/postgresql", "/var/lib/postgresql"]
Then run:
docker run -it --name postgres -v $(pwd):/db postgres sh -c "tar xvf /db/backup.tar --no-overwrite-dir" && \
docker run -it --name data --volumes-from postgres busybox true && \
docker rm postgres && \
docker run -it --name postgres --volumes-from=data postgres
You won't have permission issues, since the archive is extracted by the postgres user of your postgres image, so it owns the extracted files.
You can then back up your data using the data container, as sketched below. The advantage of this solution is that you don't have to chmod/chown every time you run the image.
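A hedged example of such a backup through the data container, assuming the container is still named data and archiving the same volume paths declared in the Dockerfile above:
# Archive the Postgres volumes from the data container into the current directory
docker run --rm --volumes-from data -v "$(pwd)":/backup busybox \
  tar cvf /backup/postgres-backup.tar /etc/postgresql /var/log/postgresql /var/lib/postgresql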
This type of error is quite common when you mount an NTFS directory into your Docker container. NTFS directories don't support Linux (ext3) file and directory ownership and permissions.
The only way to make it work is to mount a directory from an ext3 drive into your container.
I got a bit desperate when I was playing around with Apache/PHP containers, mounting the www folder. Once the mounted files resided on an ext3 filesystem, the problem disappeared.
I published a short Docker tutorial on YouTube that may help in understanding this problem: https://www.youtube.com/watch?v=eS9O05TTFjM