mongoDB - manual offline replication by one command - mongodb

I would like to have 2 databases: production and offline. My system will work with the production one. But time to time I would like to copy changes from production db to offline db.
In CouchDB you can use something like:
POST /_replicate HTTP/1.1
{"source":"example-database","target":"http://example.org/example-database"}
Is there other way than:
mongodump/mongorestore
db.cloneDatabase( "db0.example.net" )
...in mongoDB? I understand those operations as copying full content of database. Is that correct?

It sounds like you have a few options here depending on the constraints your database system has. In addition to the options above, you could also:
Set your offline database up as a secondary as part of a replica set. This replica could then be used for your offline work and would keep in sync with the primary. The added benefit to this is you will always have an additional copy of your data in case you run into issues with the primary. You may want to mark the "offline" replica as hidden so that it could never take over as primary. See the following links for more information: Replication in MongoDB, Replication Internals
If you really just want point in time snap shots then another option would be to backup your database files and restore them to your offline cluster. The methods to do this vary according to your database setup and environment. The following is a good start for learning about backups: MongoDB Backups

Related

How to quickly mirror a Mongo database?

What's a quick and efficient way to transfer a large Mongo database?
I want to transfer a 10GB production Mongo 3.4 database to a staging environment for testing. I used the mongodump/mongorestore tools to test this transfer to my localhost, but it took over 8 hours and consumed a massive amount of CPU and memory, which is something I'd like to avoid in the future. The database doesn't have any indexes, so the mongodump option to exclude indexes doesn't increase performance.
My staging environment will mostly be read-only, but it will still need to write occasionally, so it can't be setup as a permanent read replica of production.
I've read about [replication sets][1], but they seem very complicated to setup and designed for permanent mirroring of a primary to two or more secondaries. I've read some posts about people hacking this to be temporary, so they can do a one-time mirroring, but I can't find any reliable documentation since this isn't the intended usage of the feature. All the guides I've read also say you need at least 3 servers, which seems unintuitive since I only have 2 (production and staging) and don't want to create a third.
Several options exist today (2020-05-06).
Copy Data Directory
If you can take the system offline you can copy the data directory from one host to another then set the configuration to point to this directory and start up the new mongod.
Mongomirror
Mongomirror (https://docs.atlas.mongodb.com/import/mongomirror/) is intended to be a tool to migrate from on-premises to Atlas, but this tool can be leveraged to copy data to another on-premises host. Beware, this connection requires SSL configurations on source and target to transfer.
Replicaset
MongoDB has built-in High Availability features using a replica set model (https://docs.mongodb.com/manual/tutorial/deploy-replica-set/). It is not overly complicated and works very well. This option allows the original system to stay online while replication does its magic. Once the replication completes reconfigure the replica set to be a single node replica set referring only to the new host and shut down the original host. This configuration is referred to as a single-node replica set. Having a single node replica set offers benefits over a stand-alone installation in that the replica set underpinnings (oplog) are the basis for other features such as change streams (https://docs.mongodb.com/manual/changeStreams/)
Backup and Restore
As you mentioned you can use mongodump/mongorestore. There is a point in time where the backup must be restored. During this time it is expected the original system is offline and not accepting any additional writes. This method is robust but has downtime associated with it. You could use mongoexport/mongoimport to use a JSON file as an intermediate step but this is not recommended as BSON data types could be lost in translation.
Per Mongo documentation, you should be able to cp/rsync files for creating a backup (if you are able to halt write ops temporarily on your production setup - or if you do this during a maintenance window)
https://docs.mongodb.com/manual/core/backups/#back-up-by-copying-underlying-data-files
Back Up with cp or rsync
If your storage system does not support snapshots, you can copy the files >directly using cp, rsync, or a similar tool. Since copying multiple files is not >an atomic operation, you must stop all writes to the mongod before copying the >files. Otherwise, you will copy the files in an invalid state.
Backups produced by copying the underlying data do not support point in time >recovery for replica sets and are difficult to manage for larger sharded >clusters. Additionally, these backups are larger because they include the >indexes and duplicate underlying storage padding and fragmentation. mongodump, >by contrast, creates smaller backups.
FYI - for replica sets, the third "server" is an arbiter which exists to break the tie when electing a new primary. It does not consume as many resources as the primary/secondaries. Since you are looking to creating a staging environment, i would not recommend creating a replica set that includes production and staging env. Your primary instance could switch over to the staging instance and clients who are meant to access production instance will end up reading/writing from staging instance.

Local MongoDB instance with index in remote server

One of our clients have a server running a MongoDB instance and we have to build an analytical application using the data stored in their MongoDB database which changes frequently.
Clients requirements are:
That we do not connect to their MongoDB instance directly or run another instance of MongoDB on their server but just somehow run our own MongoDB instance on our machine in our office using their MongoDB database directory with read only access remotely.
We've suggested deploying a REST application, getting a copy of their database dump but they did not want that. They just want us to run our own MongoDB intance which is hooked up with the MongoDB instance directory. Is this even possible ?
I've been searching for a solution for the past two days and we have to submit a solution by Monday. I really need some help.
I think this is normal request because analytical queries could cause too much load on the production server. It is pretty normal to separate production and analytical databases.
The easiest option is to use MongoDB replication. Set up MongoDB replica set with production database instance as primary and analytical database instance as secondary, also configure the analytical instance to never become primary.
If it is not possible to use replication - for example client doesn't want this, the servers could not connect directly to each other... - there is another option. You can read oplog from remote database and apply operations to your database instance. This is exactly the low level mechanism how replica set works, but you can do it manually too. For example MMS (Mongo Monitoring Sevice) Backup uses reading oplog for online backups of MongoDB.
Update: mongooplog could be the right tool for real-time application of replication oplog pulled from remote server on local server.
I don't think that running two databases that points to the same database files is possible or even recommended.
You could use mongorestore to restore from their data files directly, but this will only work if their mongod instance is not running (because mongorestore will need to lock the directory).
Another solution will be to do file system snapshots and then restore to your local database.
The downside to this backup/restore solutions is that your data will not be synced all the time.
Probably the best solution will be to use replica sets with hidden members.
You can create a replica set with just two members:
Primary - this will be the client server.
Secondary - hidden, with votes and priority set to 0. This will be your local instance.
Their server will always be primary (because hidden members cannot become primaries). Clients cannot see hidden members so for all intents and purposes your server will be read only.
Another upside to this is that the MongoDB replication will do all the "heavy" work of syncing the data between servers and your instance will always have the latest data.

Life-copies for devel-team in MongoDb

Q: Which is the best architecture for life-copies for testing and development?
Current setup:
We have two amazon/EC2 mongod servers like this:
Machine A:
A production database (on an amazon/EC2 server) (name it ‘PROD’)
Other databases (‘OTHER’)
Machine B:
a pre-production database (name it ‘PRE’)
a copy for developer 1 own tests (call it ‘DEVEL-1’)
a copy for developer 2 (DEVEL-2)
…DEVEL-n
The PRE database is for integration-tests before deploying into production.
The DEVEL-n is for each developer trashing its own data without annoying the other developers.
From time to time we want to “restore” fresh data from PROD into the PRE and DEVEL-n bases.
Currently we pass from PROD to PRE via the .copyDatabase() command.
Then we issue .copyDatabase() “n” times to make copies from PRE into DEVEL-n.
The trouble:
A copy takes soooo long (1hour per copy, DBsize over 10GB) and also normally it saturates the mongod so we have to restart the service.
We have found about:
Dump/restore system (saturates as .copyDatabase() does)
Replica sets
Master/Slave (seem deprecated)
Replica-sets seem the winners, but we have serious doubts:
Suppose we want a replica-set to sync live A/PROD into B/PRE (and have A likely as a primary and B likely as secondary):
a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?
b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?
c) Can I “stop to replicate” so we can deploy to PRE, test the soft with fresh-data, trash the data with the testing and after tests have been complete “re-link” the replica so changes in PRE are deleted and changes in PROD are transported into PRE adequately?
d) Is there any other better way than replica-sets suitable for this case?
Thanks.
Marina and Xavi.
Replica-sets seem the winners, but we have serious doubts:
Suppose we want a replica-set to sync live A/PROD into B/PRE
(and have A likely as a primary and B likely as secondary):
a) Can I select “a few” databases from A to replicate PROD but leave OTHER alone?
As at MongoDB 2.4, replication always includes all databases. The design intent is for all nodes to be eventually consistent replicas, so that you can failover to another non-hidden secondary in the same replica set.
b) Can I have “extra” databases in B (like DEVEL-n) which are not in the master?
No, there is only a single primary in a replica set.
c) Can I “stop to replicate” so we can deploy to PRE, test the soft
with fresh-data, trash the data with the testing and after tests have
been complete “re-link” the replica so changes in PRE are deleted and
changes in PROD are transported into PRE adequately?
Since there can only be a single primary, the use case of mixing production and test roles in the same replica set is not possible how you've envisioned.
Best practice would to isolate your production and dev/staging environments so there can be no unexpected interaction.
d) Is there any other better way than replica-sets suitable for this case?
There are some approaches you can take to limit the amount of data needed to be transferred so you are not copying the full database (10Gb) across from production each time. Replica sets are suitable as part of the solution, but you will need to have a separate standalone server or replica set for your PRE environment.
Some suggestions:
Use a replica set and add a hidden secondary in your development environment. You can take backups from this node without affecting your production application, and since the secondary replicates changes as they occur you should be doing a comparatively faster local network copy of the backup.
Implement your own scheme for partial replication based on a tailable cursor of MongoDB's oplog. The local oplog.rs capped collection is the same mechanism used to relay changes to members of a replica set and includes details for inserts, deletes, and updates. You could match on the relevant database namespaces and relay matching changes from your production replica set into your isolated PRE environment.
Either of these approaches would allow you control over when the backup is transferred from PROD to PRE, as well as restarting from a previous point after testing.
In our setup we use EBS snapshots to quickly replicate production database on staging environment. Snapshots are run every few hours as part of backup cycle. When starting new DB server in staging, it looks for most recent DB snapshot and use it for EBS drive.
Taking snapshot is almost instant, recovery is also very fast. This approach also scales up very well, we actually using it in huge sharded MongoDB installation. The only downside is that you need to rely on AWS services to implement it. That can be undesirable in some cases.

Mongo db partial back ups

We have a 5 node replication set up on our development server. We are looking for a way to allow developers to back up a subset of data in a mongo db and restore this to their local development enviroments.
We have looked into the clonedb and the mongodump utils, but both only allow for a backup/dump of the complete database. Due to the possible size of the database, we need an option that allows us to limit the data being backed up or restored.
Do any know of a util or way to achieve this?
I just stumbled upon this question again and decided to add a description of our backup strategy we opt in for:
Current back up strategy for our mongo db this server consist of 2 setups; backup via delayed passive secondarynode and daily backup using mongodump (takes journalling and oplog into play).
Besides our normal production nodes, we have setup another secondary node with a priority of 0 (this can either be on its own server or piggy backing off another mongo server but using a seperate port), hidden as true and a delay of 7200 seconds (2hours). This slave is there for "butter fingers", when some one accidentally drops a database or clears a collection, we have 2 hours before these changes replicate to this passive secondary. The passive secondary can NOT be used for READING or WRITING. It's role is simply a back up node. We also use this node for nightly backup to prevent unnecessary overhead on any of the other nodes.
The nightly backup is set to run every night at 23:00 via a cron tab. The command simply executes a script setup in /opt/auto-mongo-backup. This script can be found at https://github.com/jaconel/automongobackup (originally found it at https://github.com/micahwedemeyer/automongobackup). This script allows for a single nightly cron to cover weekly backups and monthly backups. Back ups are saved at /var/backups/mongodb.
Hope this helps some one out.

mongoDB replication with offline nodes

Is it possible to set up mongoDB replica set with following scenario (if it is,how):
2 servers always online running mongodb, one of them holds the main node, the other one a rescue copy;
n computers each of them running mongodb, occasionally connected to internet, holding nodes which need synchronizing with main node, when they go online.
Backup only. In order to do this, you'll have to specify the priority of this node to 0. If your node is never going to be used as master nor queried, you can also set buildIndexes to false.
More informations here.
Intermitent slave. Due to limitations (mainly on the oplog queue), you can't have a slave halted for a very long time if you have many writes on your MongoDB, see here. However, you can use the mongodump and mongorestore tools directly over network or by script + sync the backup file. More informations here. Note that a restore will bring a db or collection in a server and recreate the indexes completely (if you restore the system.indexes collection too) which can take some time.