How to migrate a MongoDB database between Docker containers?

Migrating MongoDB databases is a pretty well understood problem domain, and there is a range of tools available to do so at the host level: everything from mongodump and mongoexport to rsync on the data files. If you're getting very fancy, you can use network mounts like SSHFS and NFS to mitigate disk-space and IOPS constraints.
Migrating a Database on a Host
# Using a temporary archive
mongodump --db my_db --gzip --archive=/tmp/my_db.dump --port 27017
mongorestore --db my_db --gzip --archive=/tmp/my_db.dump --port 27018
rm /tmp/my_db.dump
# Or you can stream it...
mongodump --db my_db --port 27017 --archive \
| mongorestore --db my_db --port 27018 --archive
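The raw data-file route mentioned above looks roughly like this; treat it as a sketch only, since the dbPath, port and target host are assumptions and you would either stop mongod or lock writes for the duration of the copy:
# Flush and lock writes, copy the raw data files, then unlock
mongo --port 27017 --eval 'db.fsyncLock()'
rsync -av /var/lib/mongodb/ backup-host:/var/lib/mongodb/
mongo --port 27017 --eval 'db.fsyncUnlock()'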
Performing the same migrations in a containerized environment, however, can be somewhat more complicated and the lightweight, purpose-specific nature of containers means that you often don't have the same set of tools available to you.
As an engineer managing containerized infrastructure, I'm interested in what approaches can be used to migrate a database from one container/cluster to another, whether for backup, cluster migration, or development (data sampling) purposes.
For the purpose of this question, let's assume that the database is NOT a multi-TB cluster spread across multiple hosts and seeing thousands(++) of writes per second (i.e. that you can make a backup and have "enough" data for it to be valuable without needing to worry about replicating oplogs etc).

I've used a couple of approaches to solve this before. The specific approach depends on what I'm doing and what requirements I need to work within.
1. Working with files inside the container
# Dump the old container's DB to an archive file within the container
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'
# Copy the archive from the old container to the new one
# (via a local file, since docker cp can't copy directly between two containers)
docker cp $OLD_CONTAINER:/tmp/my_db.dump /tmp/my_db.dump
docker cp /tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump
rm /tmp/my_db.dump
# Restore the archive in the new container
docker exec $NEW_CONTAINER \
  bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'
This approach works quite well and avoids many of the encoding issues you can hit when piping data over stdout. However, it doesn't work particularly well when migrating to containers on different hosts (the local copy of the dump then has to be scp'd or rsync'd to the other host before it can be docker cp'd into the new container), nor when migrating from, say, Docker to Kubernetes.
Migrating to a different Docker cluster
# Dump the old container's DB to an archive file within the container
docker -H old_cluster exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive=/tmp/my_db.dump'
# Copy the archive from the old container to the new one (via your machine)
docker -H old_cluster cp $OLD_CONTAINER:/tmp/my_db.dump /tmp/my_db.dump
docker -H old_cluster exec $OLD_CONTAINER rm /tmp/my_db.dump
docker -H new_cluster cp /tmp/my_db.dump $NEW_CONTAINER:/tmp/my_db.dump
rm /tmp/my_db.dump
# Restore the archive in the new container
docker -H new_cluster exec $NEW_CONTAINER \
  bash -c 'mongorestore --db my_db --gzip --archive=/tmp/my_db.dump'
docker -H new_cluster exec $NEW_CONTAINER rm /tmp/my_db.dump
Downsides
The biggest downside to this approach is the need to store temporary dump files everywhere. In the best case you have a dump file in your old container and another in your new container; in the worst case you also have a third on your local machine (or potentially on multiple machines if you need to scp/rsync it around). These temp files are easily forgotten about, wasting space and cluttering your containers' filesystems.
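One way to keep the intermediate copy off your local machine is docker cp's tar-streaming mode (using - as the source or destination path); a sketch, reusing the container variables from above:
# Stream the archive between the two clusters without writing a local temp file
docker -H old_cluster cp $OLD_CONTAINER:/tmp/my_db.dump - \
| docker -H new_cluster cp - $NEW_CONTAINER:/tmp
The dump files inside the two containers still need cleaning up afterwards, but nothing gets left behind on your workstation.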
2. Copying over stdout
# Copy the database over stdout (base64 encoded)
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| docker exec -i $NEW_CONTAINER \
  bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Copying the archive over stdout and passing it via stdin to the new container lets you drop the copy step and join the commands into a beautiful little one-liner (for some definition of beautiful). It also allows you to mix and match hosts and even container schedulers...
Migrating between different Docker clusters
# Copy the database over stdout (base64 encoded)
docker -H old_cluster exec $(docker -H old_cluster ps -q -f 'name=mongo') \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| docker -H new_cluster exec -i $(docker -H new_cluster ps -q -f 'name=mongo') \
  bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Migrating from Docker to Kubernetes
# Copy the database over stdout (base64 encoded)
docker exec $(docker ps -q -f 'name=mongo') \
  bash -c 'mongodump --db my_db --gzip --archive 2>/dev/null | base64' \
| kubectl exec -i mongodb-0 -- \
  bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
Downsides
This approach works well in the "success" case, but when the dump fails, having to suppress the stderr stream (with 2>/dev/null) makes the cause a serious headache to debug.
It is also roughly 33% less network efficient than the file-based approach, since the data has to be base64 encoded for transport (potentially a big issue for larger databases). And, as with any streaming approach, there's no way to inspect the data that was sent after the fact, which can be a problem if you later need to work out exactly what was transferred.
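A small variation that keeps debugging manageable is to send mongodump's stderr to a file inside the container instead of /dev/null, so it can be inspected if the restore looks wrong; a sketch, reusing the container variables from above (the log path is arbitrary):
docker exec $OLD_CONTAINER \
  bash -c 'mongodump --db my_db --gzip --archive 2>/tmp/mongodump.log | base64' \
| docker exec -i $NEW_CONTAINER \
  bash -c 'base64 --decode | mongorestore --db my_db --gzip --archive'
# If something looks off, inspect the dump's log afterwards
docker exec $OLD_CONTAINER cat /tmp/mongodump.log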

Related

Restore mongodb dump to different db [duplicate]

In MongoDB, is it possible to dump a database and restore the content to a different database? For example like this:
mongodump --db db1 --out dumpdir
mongorestore --db db2 --dir dumpdir
But it doesn't work. Here's the error message:
building a list of collections to restore from dumpdir dir
don't know what to do with subdirectory "dumpdir/db1", skipping...
done
You need to actually point at the "database name" container directory "within" the output directory from the previous dump:
mongorestore -d db2 dumpdir/db1
And usually just <path> as a positional argument is fine, rather than --dir, which is only needed when the path is "out of position", i.e. in the middle of the argument list.
P.S. For an archive backup file (tested with mongorestore v3.4.10):
mongorestore --gzip --archive=${BACKUP_FILE_GZ} --nsFrom "${DB_NAME}.*" --nsTo "${DB_NAME_RESTORE}.*"
mongodump --db=DB_NAME --out=/path-to-dump
mongorestore --nsFrom "DB_NAME.*" --nsTo "NEW_DB_NAME.*" /path-to-dump
In addition to Blakes Seven's answer: if your databases use authentication, I got this to work using the --uri option, which requires a recent MongoDB version (3.4.6 or newer):
mongodump --uri="mongodb://$sourceUser:$sourcePwd@$sourceHost/$sourceDb" --gzip --archive | mongorestore --uri="mongodb://$targetUser:$targetPwd@$targetHost/$targetDb" --nsFrom="$sourceDb.*" --nsTo="$targetDb.*" --gzip --archive
Thank you, @Blakes Seven!
Adding Docker notes:
container names are interchangeable with container IDs
(assumes authentication, and containers named my_db and new_db)
dump:
docker exec -it my_db bash -c "mongodump --uri mongodb://db:password@localhost:27017/my_db --archive --gzip | cat > /tmp/backup.gz"
copy to workstation:
docker cp my_db:/tmp/backup.gz c:\backups\backup.gz
copy into the new container (from the backups folder):
docker cp .\backup.gz new_db:/tmp
restore from the container's tmp folder:
docker exec -it new_db bash -c "mongorestore --uri mongodb://db:password@localhost:27017/new_db --nsFrom 'my_db.*' --nsTo 'new_db.*' --gzip --archive=/tmp/backup.gz"
You can restore a DB under another name. The syntax is:
mongorestore --port 27017 -u="username" -p="password" \
  --nsFrom "dbname.*" \
  --nsTo "new_dbname.*" \
  --authenticationDatabase admin /backup_path

How to restore postgres within a docker?

I create backups like this: docker exec DOCKER pg_dump -U USER -F t DB | gzip > ./FILE.tar.gz
What's the best way to restore the database given that the database runs within a container?
For your case:
gunzip < backup.tar.gz | docker exec -i <CONTAINER> pg_restore -U <USER> -F t -d <DB>
Remote restore is also available if your container is public-facing and remote connections are allowed in pg_hba.conf for PostgreSQL:
gunzip < backup.tar.gz | pg_restore -U <USER> -F t -d <DB> -h <HOST_IP> -p 5432
As a rule of thumb, it is a good idea to document the backup and restore commands specific to each project.
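For example, a tiny script checked into the repository is usually enough; the container, user, db and output path below are placeholders:
#!/usr/bin/env bash
# scripts/backup.sh -- the documented backup command for this project
docker exec <CONTAINER> pg_dump -U <USER> -F t <DB> | gzip > "./backups/<DB>_$(date +%F).tar.gz"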
How to take a backup of the data in a running PostgreSQL container
Create a folder on the host:
mkdir -p '/myfolder/bdbackup'
Pull the postgres image you are using and run it with that folder mounted as the data directory:
docker run --name demo1 -e POSTGRES_PASSWORD=password -v /myfolder/bdbackup:/var/lib/postgresql/data -d postgres
docker exec -it demo1 psql -U postgres
The data will be stored in the folder /myfolder/bdbackup on the host.
You can stop or kill the container at any time; the data remains on the host.
Then re-run the Postgres container with the same command (and the same volume):
docker run --name demo2 -e POSTGRES_PASSWORD=password -v /myfolder/bdbackup:/var/lib/postgresql/data -d postgres
docker exec -it demo2 psql -U postgres
Execute a query such as select * from emp;
and you can see the data has been restored.
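If you prefer a standalone dump file instead of (or in addition to) the persisted data directory, something along these lines should work against the running container; the db name, user and paths here follow the example above and are assumptions:
# Logical dump from the running container to a file on the host
docker exec demo1 pg_dump -U postgres postgres > /myfolder/demo1.sql
# Restore that dump into another running container
docker exec -i demo2 psql -U postgres -d postgres < /myfolder/demo1.sql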

Mongorestore through stdin to db with different name

I have been trying to find a way to do this but cannot, and I have a feeling there isn't an easy way to do what I want.
I have been testing using mongodump | mongorestore as a way to avoid creating an actual stored set of files, which is very useful on a cloud-based service. So far, I have been testing this by specifying the same db, although it technically isn't necessary, like so...
mongodump -h hostname -d dumpdb --excludeCollection bigcollection --archive | mongorestore --archive -d dumpdb --drop
mongodump -h hostname -d dumpdb -c bigcollection --archive --queryFile onlythese.json | mongorestore --archive -d dumpdb -c bigcollection --drop
I have found that these options work best for me; when I tried the -o - option with a single db and without --archive, I ran into some issues, but since this worked, I didn't dig further.
Since I was restoring to the same db, and because only the collections that were in it at the time were restored, I realize I could drop the -d and -c from both mongorestore commands. But it was easy to do, and it's the setup for the next step...
All I wanted to do was restore the specified db in two steps, to a db of a different name, like so...
mongodump -h hostname -d dumpdb --excludeCollection bigcollection --archive | mongorestore --archive -d restoredb --drop
mongodump -h hostname -d dumpdb -c bigcollection --archive --queryFile onlythese.json | mongorestore --archive -d restoredb -c bigcollection --drop
The dump works fine, but restore does not. Based on the limited documentation, my assumption is that the dump db and restore db need to be the same for this to work; if the db specified in mongorestore is not in the dump, it just won't restore.
I find this to be rather annoying; my other thought was to restore the db as is, and just copy it to the other db.
I have thought about using other tools such as mongo-dump-stream, but considering everything was working swimmingly so far, I was hoping that it would work with the default tools.
On another point, I did dump the archive to a file (like so: > dumpdb.tar) and attempt to restore from there (like so --archive=dumpdb.tar) which is how I confirmed that the db needed to be in the dump.
Any suggestions/comments/hacks would be welcome.
As of version 3.4 of mongorestore, you can accomplish this using the --nsFrom and --nsTo options, which provide a pattern-based way to manipulate the names of your collections and/or dbs between the source and destination.
For example, to dump from a database named dumpdb into a new database named restoredb:
mongodump -h hostname -d dumpdb --archive | mongorestore --archive --nsFrom "dumpdb.*" --nsTo "restoredb.*" --drop
More from the mongodb docs: https://docs.mongodb.com/manual/reference/program/mongorestore/#change-collections-namespaces-during-restore
I haven't really found an answer to this but, based on the MongoDB Jira, it looks like this is an open issue and a feature request on their timeline: being able to restore to a different DB than the one you dumped from.
In the meantime, I came up with a not-so-great but workable solution: since the archive file also contains the metadata (which, judging by the mongorestore code, is all that needs to change), you can replace the name of the DB in the binary stream with one of the same length, so my code now looks like this:
mongodump -h hostname -d dumpdb --excludeCollection bigcollection --archive | bbe -e "s/$(echo -n dumpdb | xxd -g 0 -u -ps -c 256 | sed -r 's/[A-F0-9]{2}/\\x&/g')\x00/$(echo -n rstrdb | xxd -g 0 -u -ps -c 256 | sed -r 's/[A-F0-9]{2}/\\x&/g')\x00/g" | mongorestore --archive --drop
You'll notice that I used bbe and dropped the -d flag, because this just changes the name of the db in the archived metadata, and it will restore into the new one.
The obvious issue here is that you cannot change the length gracefully; a longer name won't work, and shorter requires padding, which probably won't work. Otherwise, for those looking for a better solution than dumping to a file, or running a copyDatabase command (don't do it!), this is a workable solution.

Mongorestore, from meteor production server to local

I've found plenty of good instructions on how to use mongodump and mongorestore, to back up my meteor production server and restore the backup if need be:
meteor mongo --url myApp.meteor.com
mongodump -u client -h production-db-b2.meteor.io:27017 -d myApp_meteor_com --out dump/2014_10_21 -p [password from meteor mongo --url]
mongorestore -u client -h production-db-b2.meteor.io:27017 -d myApp_meteor_com dump/2014_10_21_v2/myApp_meteor_com -p [password from meteor mongo --url]
What I haven't found is an explanation of how to restore a backup dump to my local meteor app. I have a mongodump output in my app folder. I'm not sure if I can use mongorestore or if there's something else I'm supposed to be doing.
The easiest way that I found:
cd into your project and run the meteor command
in another terminal:
mongorestore -h 127.0.0.1 --port 3001 -d meteor dump/meteor
Change 127.0.0.1 if your localhost has a different IP address, and 3001 to the port MongoDB is running on (it is usually 3001 or 3002, so try both); dump/meteor is the path to the dump you created previously.
Also the easiest way to export local db:
cd into your project and run the meteor command
In another terminal:
mongodump -h 127.0.0.1 --port 3001 -d meteor
Again, change the localhost IP and port if needed. As a result, a dump/meteor folder with the db files will be created in the directory you cd'd into before running mongodump.
Good luck.
To accomplish the opposite, sending local app data to the production app, I wrote this little shell script. It has been useful while developing locally and just getting the demo synced for the client to view. Note that it has --drop at the end, which will overwrite your production database; use with care!
It takes care of extracting the client, password, and server from meteor mongo --url ..., which expires after 1 minute and is really annoying to try to copy-paste within that time.
#!/usr/bin/env bash
mongodump -h 127.0.0.1:3001 -d meteor -o ~/www/APPNAME/server/dump
IN=`meteor mongo --url APPNAME.meteor.com`
client=`echo $IN | awk -F'mongodb://' '{print $2}' | awk -F':' '{print $1}'`
echo $client
pw=`echo $IN | awk -F':' '{print $3}' | awk -F'@' '{print $1}'`
echo $pw
serv=`echo $IN | awk -F'@' '{print $2}' | awk -F'/' '{print $1}'`
echo $serv
mongorestore -u $client -h $serv -d APPNAME_meteor_com dump/meteor -p $pw --drop
This is what I do:
I. Create a mongo dump on the server
DATE=$(date +%m%d%y_%H.%M);
mongodump --host localhost -d APPNAME -o /tmp/APPNAME_$DATE
tar -cjvvf /tmp/APPNAME_$DATE.tar.bz2 /tmp/APPNAME_$DATE
II. Download the dump in the development machine and unpack in /tmp
scp root@$HOST:/tmp/APPNAME_$DATE.tar.bz2 /tmp/
cp /tmp/APPNAME_$DATE.tar.bz2 .
mkdir -p /tmp/APPNAME_$DATE
cd /tmp/APPNAME_$DATE
tar -xjvf /tmp/APPNAME_$DATE.tar.bz2
III. Update local meteor development database
mongorestore --db meteor -h localhost --port 8082 --drop /tmp/APPNAME_$DATE/tmp/APPNAME_$DATE/APPNAME
You can use mongorestore.
It's pretty much the same as what you already did.
In your first line, meteor mongo --url myApp.meteor.com, just remove the last part so the line reads: meteor mongo --url.
When executed on your local machine you will get the information for the local instance of your meteor app. From that point you can just use mongorestore to restore your local db the way you already did remotely.
I usually do a meteor reset prior to a mongorestore, just to be sure that my db is empty, but I don't know if it's actually necessary.
Note that the app should be running when doing this.
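Concretely, that looks something like this; the local port is usually 3001 while the app is running, and the dump path is whatever the earlier mongodump produced:
# Locally, this prints something like mongodb://127.0.0.1:3001/meteor
meteor mongo --url
# Restore the production dump into the local meteor db (--drop replaces any existing data)
mongorestore -h 127.0.0.1 --port 3001 -d meteor --drop dump/2014_10_21/myApp_meteor_com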
I ended up writing a script to download the meteor database. Check it out at https://github.com/AlexeyMK/meteor-download
Usage (in your app's root):
curl https://raw.github.com/AlexeyMK/meteor-download/master/download.sh > download.sh
./download.sh yourapp.meteor.com
I'm using Google Cloud for Meteor hosting, and wrote custom scripts.
I have this running on a cronjob to backup to google cloud storage:
https://github.com/markoshust/mongo-docker-backup-gcloud/blob/master/mongobackup.sh
#!/bin/bash
MONGO_DB=dbname
MONGO_HOST=127.0.0.1
HOST_DIR=/home/YOURNAME
BACKUP_DIR=/mongobackup
BUCKET=gs://BUCKET_NAME
DATE=`date +%Y-%m-%d:%H:%M:%S`
/usr/bin/docker run --rm \
-v $HOST_DIR/$BACKUP_DIR:$BACKUP_DIR \
markoshust/mongoclient \
mongodump --host $MONGO_HOST --db $MONGO_DB --out $BACKUP_DIR
sudo mkdir -p $HOST_DIR/$BACKUP_DIR/$MONGO_DB/$DATE
sudo mv $HOST_DIR/$BACKUP_DIR/$MONGO_DB/* $HOST_DIR/$BACKUP_DIR/$MONGO_DB/$DATE
$HOST_DIR/gsutil/gsutil rsync -r $HOST_DIR/$BACKUP_DIR $BUCKET
sudo /bin/rm -rf $HOST_DIR/$BACKUP_DIR
Then to restore locally, I created another script which downloads the backup from google cloud storage, stores locally, then does a local restore:
https://github.com/markoshust/mongorestore.sh/blob/master/.mongorestore.sh
#!/bin/bash
## This script syncs a mongodb backup from a Google Cloud Storage bucket and
## mongorestore's it to a local db.
##
## Author: Mark Shust <mark@shust.com>
## Version: 1.1.0
BUCKET=my-bucket-name
FOLDER=folder-name/$1
BACKUP_DIR=./.backups/
DB_NAME=localdb
DB_HOST=localhost
DB_PORT=27017
if [ -z "$1" ]; then
  echo 'Please specify a subdirectory to sync from...'
  exit 0
fi
mkdir -p $BACKUP_DIR
# Only download from the bucket if this subdirectory hasn't been synced yet
if [ ! -d "$BACKUP_DIR/$1" ]; then
  gsutil -m cp -r gs://$BUCKET/$FOLDER $BACKUP_DIR
fi
mongorestore --db $DB_NAME -h $DB_HOST --port $DB_PORT --drop $BACKUP_DIR/$1/
echo 'Database restore complete.'
I have this working with Meteor; it's stupid simple and works great :) Just switch the db name to meteor and the port to 3001 (or whatever config you have). It's Meteor-agnostic, so it works with any MongoDB host/platform.
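For reference, invoking it for a Meteor setup would look roughly like this; the subdirectory argument is whichever $DATE folder the backup script created, and DB_NAME/DB_PORT are edited at the top of the script:
# after setting DB_NAME=meteor and DB_PORT=3001 in the script
./.mongorestore.sh 2016-01-01:00:00:00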