Mongodb Filesystem backup and restore - mongodb

I am taking backup of MongoDB filesystem backup(including config files).
We are not using sharding in our cluster, having 3node replicaset in place.
Primary Cluster: X_host1, X_host2, X_host3
Secondary Cluster: Y_host1, Y_host2, Y_host3
Taking filesystem backup from X_host1 and restoring it to Y_Host1,2,3 (restoring to diff hostname)
So, how re-configure MongoDB to follow new hostnames? I see the replication nodes are configured into the DB (not any editable config files).
Is this the right approach to migrate data from replicated mongodb cluster?
Is this the right approach to migrate MongoDB cluster to new hostnames?
Is there any way to re-configure new hostnames.
AFAIK, after I restore filesystem to new nodes
Data is from old nodes, having info about old replica nodes.(X_hosts)
How to point it to Y_hosts

Follow Restore a Replica Set from MongoDB Backups
In principle do this:
Restore the backup on new host (just one)
Start the MongoDB as stand-alone, connect to it and drop the local database:
db.getSiblingDB('local').dropDatabase()
Initiate the ReplicaSet: rs.initiate()
Add all members to the ReplicaSet. An initial sync is triggered.
If your database is large, initial sync can take a long time to complete. For large databases, it might be preferable to copy the database files onto each host. For details have a look at linked tutorial.

Related

Are there any quick ways to move PostgreSQL database between clusters on the same server?

We have two big databases (200GB and 330GB) in our "9.6 main" PostgreSQL cluster.
What if we create another cluster (instance) on the same server, is there any way to quickly move database files to new cluster's folder?
Without using pg_dump and pg_restore, with minimum downtime.
We want to be able to replicate the 200GB database to another server without pumping all 530GB of data.
Databases aren't portable, so the only way to move them to another cluster is to use pg_dump (which I'm aware you want to avoid), or use logical replication to copy it to another cluster. You would just need to set wal_level to 'logical' in postgresql.conf, and create a publication that included all tables.
CREATE PUBLICATION my_pub FOR ALL TABLES;
Then, on your new cluster, you'd create a subscription:
CREATE SUBSCRIPTION my_sub
CONNECTION 'host=172.100.100.1 port=5432 dbname=postgres'
PUBLICATION my_pub;
More information on this is available in the PostgreSQL documentation: https://www.postgresql.org/docs/current/logical-replication.html
TL;DR: no.
PosgreSQL itself does not allow to move all data files from a single database from one source PG cluster to another target PG cluster, whether the cluster runs on the same machine or on another machine. To this respect it is less flexible than Oracle transportable tablespaces or SQL Server attach/detach database commands for example.
The usual way to clone a PG cluster is to use streaming physical replication to build a physical standby cluster of all databases but this requires to backup and restore all databases with pg_basebackup (physical backup): it can be slow depending on the databases size but once the standby cluster is synchronized it should be really fast to failover to standby cluster by promoting it; miminal downtime is possible. After promotion you can drop the database not needed.
However it may be possible to use storage snaphots to copy quickly all data files from one source cluster to another cluster (and then drop the database not needed in the target cluster). But I have not practiced it and it does not seem to be really used (except maybe in some managed services in the cloud).
(PG cluster means PG instance).
If You would like to avoid pg_dump/pg_restore, than use:
logical replication (enables to replicate only desired databases)
streaming replication via replication slot (moving the whole cluster
to another and then drop undesired databases)
While 1. option is described above, I will briefly describe the 2.:
a) create role with replication privileges on master (cluster I want to copy from)
master# psql> CREATE USER replikator WITH REPLICATION ENCRYPTED PASSWORD 'replikator123';
b) log to slave cluster and switch to postgres user. Stop postgresql instance and delete DB data files. Then You will initiate replication from slave (watch versions and dirs!):
pg_basebackup -h MASTER_IP -U replikator -D /var/lib/pgsql/11/data -r 50M -R –waldir /var/lib/pgwal/11/pg_wal -X stream -c fast -C -S master1_to_slave1 -v -P
What this command do? It connects to master with replikator credentials and start pg_basebackup via slot that will be created. There is bandwith throttling as well (50M) as other options... Right after the basebackup slave will start streaming replication and You've got failsafe replication.
c) Then when You want, promote slave to be standalone and delete undesired databases:
rm -f /varlib/pgsql/11/data/recovery.conf
systemctl restart postgresql11.service

Postgres and multiple locations of data storage

Postgres and the default location for its storage is at my C-drive. I would like to restore a backup to another database but to access it via the same Postgres server instance - the issue is that the size of the DB is too big to be restore on the same c-drive ...would it be possible to tell Postgres that the second database should be restore and placed on another location/drive (while still remaining the first one)? Like database1 at my C-drive and database2 at my D-drive?
Otherwise the second best solution would be to install 2 separate Postgres instances - but that also seems a bit overkill?
That should be entirely achievable, if you've used the postgres pg_dump command.
The pg_dump command does not create the database, so you create it yourself first. Use CREATE TABLESPACE to specify the location.
CREATE TABLESPACE secondspace LOCATION 'D:\postgresdata';
CREATE DATABASE seconddb TABLESPACE secondspace;
This creates an empty database on the D: drive.
Then the standard restore from a pg_dump should work:
psql seconddb < dumpfile
Replication
Sounds like you need database replication.
There are several ways to do this with Postgres, one built-in, and other approaches using add-on libraries.
Built-in replication feature
The built-in replication feature is likely to suit your needs. See the manual. In this approach, you have an instance of Postgres running on your primary server, doing reads and writes of your data. On a second server, an entirely separate computer, you run another instance of Postgres known as the replica. You first set up the replica by doing a full backup of your database on the first server, and restore to the second server.
Next you configure the replication feature. The replica needs to know it is playing the role of a replica rather than a regular database server. And the primary server needs to know the replica exists, so that every database change, every insert, modification, and deletion, can be communicated.
WAL
This communication happens via WAL files.
The Write-Ahead Log (WAL) feature in Postgres is where the database writes all changes first to the WAL, and only after that is complete, then writes to the actual database. In case of crash, power outage, or other failure, the database upon restarting can detect a transaction left incomplete. If incomplete, the transaction is rolled back, and the database server can try again by seeing the "To-Do" list of work listed in the WAL.
Every so often the current WAL is closed, with a new WAL file created to take over the work. With replication enabled, the closed WAL file is copied to the replica. The replica then incorporates that WAL file, to follow the same "To-Do" list of changes as written in that WAL file. So all changes are made to the replica database exactly as they were made to the primary database. Your replica is an exact match to the primary, except for a slight lag in time. The replica is always just one WAL file behind the progress of the primary.
In times of trouble, the replica serves as a warm stand-by. You can shutdown the primary, then tell the replica that it is now the primary. You can even configure the replica to be a hot stand-by, meaning it will automatically take-over when the primary seems to have failed. There are pros and cons to hot stand-by.
Offload read-only queries
As a bonus feature, the replica can be used for read-only queries. If your database is heavily used, you can offload some of the work burden from your primary to the replica. Any queries that do not require the absolute latest information can be shifted by connecting to the replica rather than the original. For example, a quarterly sales report likely does not need the latest data stored in the active WAL file that has not yet arrived on the replica.
Physical replication means all databases are copied
Caveat: This built-in replication feature is physical replication. This means all the changes to the entire Postgres installation (formally known as a cluster, not to be confused with a hardware cluster) is copied to the replica. If you use one Postgres server to server multiple databases, all those databases must be replicated – you cannot pick and choose which get copied over. There may be alternative replication features in the future related to logical replication.
More to learn
I am being brief here. The topics of replication, high-availability, and disaster-recovery are broad and complex, too much for an Answer on Stack Overflow.
Tip: This kind of Question might have been better asked on the sister site, DBA.StackExchange.com.

MongoDB 2.2: why didn't replication catch up a collection following a dump/restore?

We have a three-server replicaset running MongoDB 2.2 on Ubuntu 10.04, and recently had to upgrade the hard drive for each server where one particular database resides. This database contains log information for web service requests, where they write to collections in hourly buckets using the current timestamp to determine the name, e.g. log_yyyymmddhh.
I performed this process:
backup the database on the primary server with mongodump --db log_db
take a secondary server offline, replace the disk
bring the secondary server up in standalone mode (i.e. comment out the replSet entry
in /etc/mongodb.conf before starting the service)
restore the database on the secondary server with mongorestore --drop --db log_db
add the secondary server back into the replicaset and bring it online,
letting replication catch up the hourly buckets that were updated/created
while it had been offline
Everything seemed to go as expected, except that the collection which was the current bucket at the time of the backup was not brought up to date by replication. I had to manually copy that collection over by hand to get it up to date. Note that collections which were created after the backup were synched just fine.
What did I miss in this process that caused MongoDB not to get things back in synch for that one collection? I assume something got out of whack with regard to the oplog?
Edit 1:
The oplog on the primary showed that its earliest timestamp went back a couple of days, so there should have been plenty of space to maintain transactions for a few hours (which was the time the secondary was offline).
Edit 2:
Our MongoDB installation uses two disk partitions: /dev/sda1 and /dev/sdb1. The primary MongoDB directory /var/lib/mongodb/ is on /dev/sda1, and holds several databases, while the log database resides by itself on /dev/sdb1. There's a sym link /var/lib/mongodb/log_db which points to a directory on /dev/sdb1. Since the log db was getting full, we needed to upgrade the disk for /dev/sdb1.
You should be using mongodump with the --oplog option. Running a full database backup with mongodump on a replicaset that is updating collections at the same time may not leave you with a consistent backup. This becomes worse with larger databases, more collections and more frequent updates/inserts/deletes.
From the documentation for your version (2.2) of MongoDB (it's the same for 2.6 but just to be as accurate as possible):
--oplog
Use this option to ensure that mongodump creates a dump of the
database that includes an oplog, to create a point-in-time snapshot of
the state of a mongod instance. To restore to a specific point-in-time
backup, use the output created with this option in conjunction with
mongorestore --oplogReplay.
Without --oplog, if there are write operations during the dump
operation, the dump will not reflect a single moment in time. Changes
made to the database during the update process can affect the output
of the backup.
http://docs.mongodb.org/v2.2/reference/mongodump/
This is not covered well in most MongoDB tutorials around backups and restores. Generally you are better off if you can perform a live snapshot of the storage volume your database resides on (assuming your storage solution has a live snapshot ability compatible with MongoDB). Failing that, your next best bet is taking a secondary offline and then performing a snapshot or backup of the database files. Mongodump on a live database is increasingly a less optimal solution for larger databases due to performance issues.
I'd definitely take a look at the MongoDB overview of backup options: http://docs.mongodb.org/manual/core/backups/
I would guess this has to do with the oplog not being long enough, although it seems like you checked that and it looked reasonably big.
Still, when adding new members to a replica set you shouldn't be snapshotting and restoring them. It's better to simply add a new member and let replication happen by itself. This is described in the Mongo docs and is the process I've always followed.

Restoring from a single instance mongodb to an empty replica set

I have a mongodb installed on windows server. I take regular backups of the data/db folder using Rackspace backup.
I created a deployment of a mongodb replica set with 3 ubuntu servers using Rackspace deployments. Now I want to move the data on windows to the empty replica set. How can I do it?
I tried copying the contents of data/db on windows to var/lib/mongodb on the primary replica set. It didn't work.
For some reason the var/lib/mongodb on the ubuntu machines does not contain data/db directory. When I create a new db the db files are created on var/lib/mongodb directory.
The difference in data directories is fine .. on Windows the default dbpath will be c:\data\db; the Ubuntu package sets the dbpath to /var/lib/mongodb instead.
Since you are starting with an empty replica set (and using a backup from a standalone server), the most straightforward approach would be to:
Stop all the mongod servers for the replica set (you definitely don't want to copy data files directly into a running instance!).
Remove any files that are already in the /var/lib/mongodb data directory.
Copy the data files from your standalone MongoDB backup into /var/lib/mongodb on one of your replica set servers. This server will become your primary to set up the rest of the replica set.
Start up this primary making sure to include a replSet name in your configuration file. You may already have this set from your "empty" replica set that you already created.
Run rs.initiate() in the mongo shell to create the initial configuration on the primary.
Start up your additional servers as members of this replica set: they need the same replSet name configured.
Use rs.add(..) to add your additional servers from the mongo shell on your primary. Assuming the add is successful (i.e. the mongods can connect to each other), this will begin the process of initial sync (copying data from the primary) and the new hosts will become secondaries after they have finishing initial sync.
This is essentially the same steps as the deploy a replica set tutorial, except you are copying over your data first.
The problem could be related to the configuration file of mongodb
locate the file mongodb.conf and edit the dbpath parameter, check if the path really exist, and if it doesn't create the missing directories. Also check permissions in that path
anyway i don't know if its the right way to just copy the datafiles in a new location, i guess you should use mongo import/export

MongoDB replica set to stand alone backup and restore

For development reasons, I need to backup a production replica set mongodb and restore it on a stand alone, different machine test instance.
Some docs are talking about the opposite ( standalone 2 replica-set ), but I cannot find his downgrade/rollback way.
What's the way to go, in this case ?
No matter how many nodes you have in a replica set, each of them holds the same data.
So getting the data is easy - just use mongodump (preferably against the secondary, for performance reasons) and then mongorestore into a new mongod for your development stand-alone system.
mongodump does not pick up any replication related collections (they live in database called local). If you end up taking a file system snapshot of a replica node rather than using mongodump, be sure to drop the local database when you restore the snapshot into your production stand-alone server and then restart mongod so that it will properly detect that it is not part of a replica set.