MongoDB replication to hard drive - mongodb

MongoDB's dynamic schema design is driving me towards it as a replacement for MySQL in a production site. But this project runs on only one dedicated server (with two hard drives).
The docs about "MongoDB for production" recommend multiple servers, which makes me wonder whether MongoDB is only suited for large commercial projects.
Anyway, I am wondering whether the live database can be replicated to the second hard drive for backup and recovery (i.e. to recover from data corrupted by a hard stop).
Any thoughts against using MongoDB in a single-server environment are also appreciated. In this project, the biggest database will be less than 7 GB.
Thanks

Related

How to quickly mirror a Mongo database?

What's a quick and efficient way to transfer a large Mongo database?
I want to transfer a 10GB production Mongo 3.4 database to a staging environment for testing. I used the mongodump/mongorestore tools to test this transfer to my localhost, but it took over 8 hours and consumed a massive amount of CPU and memory, which is something I'd like to avoid in the future. The database doesn't have any indexes, so the mongodump option to exclude indexes doesn't increase performance.
My staging environment will mostly be read-only, but it will still need to write occasionally, so it can't be set up as a permanent read replica of production.
I've read about [replication sets][1], but they seem very complicated to set up and designed for permanent mirroring of a primary to two or more secondaries. I've read some posts about people hacking this to be temporary, so they can do a one-time mirror, but I can't find any reliable documentation since this isn't the intended usage of the feature. All the guides I've read also say you need at least three servers, which seems unintuitive since I only have two (production and staging) and don't want to create a third.
Several options exist today (2020-05-06).
Copy Data Directory
If you can take the system offline, you can copy the data directory from one host to another, point the new configuration at that directory, and start up the new mongod.
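A minimal sketch of that approach, assuming the default dbPath of /var/lib/mongodb and placeholder hostnames (adjust paths, hosts, and service names to your setup):

```sh
# On the source host: stop mongod so the data files are in a consistent state
sudo systemctl stop mongod

# Copy the whole data directory to the target host (paths/hosts are examples)
rsync -av /var/lib/mongodb/ staging-host:/var/lib/mongodb/

# Restart the source
sudo systemctl start mongod

# On the target host: make sure mongod.conf points storage.dbPath at the
# copied directory, then start the new mongod
sudo systemctl start mongod
```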
Mongomirror
Mongomirror (https://docs.atlas.mongodb.com/import/mongomirror/) is intended as a tool to migrate from on-premises deployments to Atlas, but it can also be leveraged to copy data to another on-premises host. Beware: the connection requires SSL to be configured on both source and target for the transfer to work.
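Roughly, an invocation looks like the sketch below; the flag names here are assumptions from memory of the Atlas docs and the hosts are placeholders, so confirm the auth/SSL options with `mongomirror --help` before running anything:

```sh
# Source must be a replica set ("replSetName/host:port"); destination is host:port.
# Flag names are assumptions -- verify against `mongomirror --help`.
mongomirror \
  --host "rs0/source-host:27017" \
  --destination "target-host:27017" \
  --ssl
```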
Replicaset
MongoDB has built-in high availability features using a replica set model (https://docs.mongodb.com/manual/tutorial/deploy-replica-set/). It is not overly complicated and works very well. This option allows the original system to stay online while replication does its magic. Once the replication completes, reconfigure the replica set to refer only to the new host and shut down the original host; this configuration is referred to as a single-node replica set. A single-node replica set also offers benefits over a stand-alone installation, in that the replica set underpinnings (the oplog) are the basis for other features such as change streams (https://docs.mongodb.com/manual/changeStreams/).
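A rough sketch of the sequence using the legacy mongo shell; the hostnames and the replica set name rs0 are placeholders, and both mongod processes must already be started with --replSet rs0:

```sh
# 1. Initiate a replica set on the existing (production) node
mongo --host prod-host:27017 --eval 'rs.initiate({_id: "rs0", members: [{_id: 0, host: "prod-host:27017"}]})'

# 2. Add the new host; initial sync copies the data while production stays online
mongo --host prod-host:27017 --eval 'rs.add("staging-host:27017")'

# 3. Watch rs.status() until the new member reports SECONDARY

# 4. When synced, reconfigure so only the new host remains (simplified: in
#    practice, remove or shut down the old member before forcing the reconfig)
mongo --host staging-host:27017 --eval 'var cfg = rs.conf(); cfg.members = [{_id: 1, host: "staging-host:27017"}]; rs.reconfig(cfg, {force: true})'
```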
Backup and Restore
As you mentioned, you can use mongodump/mongorestore. There is a point in time at which the backup must be restored; during this window the original system is expected to be offline and not accepting any additional writes. This method is robust but has downtime associated with it. You could use mongoexport/mongoimport with a JSON file as an intermediate step, but this is not recommended because BSON data types can be lost in translation.
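As a rough sketch (hosts and the database name are placeholders), the dump can be piped straight into the target with --archive and --gzip, which avoids an intermediate copy on disk:

```sh
# Dump from the source and restore into the target in one pipeline.
# --gzip reduces the amount of data moved; --drop replaces existing
# collections on the target. Adjust hosts/credentials to your setup.
mongodump --host source-host:27017 --db mydb --archive --gzip \
  | mongorestore --host target-host:27017 --archive --gzip --drop
```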
Per the Mongo documentation, you should be able to cp/rsync the files to create a backup (if you can halt write ops temporarily on your production setup, or if you do this during a maintenance window):
https://docs.mongodb.com/manual/core/backups/#back-up-by-copying-underlying-data-files
Back Up with cp or rsync
If your storage system does not support snapshots, you can copy the files directly using cp, rsync, or a similar tool. Since copying multiple files is not an atomic operation, you must stop all writes to the mongod before copying the files. Otherwise, you will copy the files in an invalid state.
Backups produced by copying the underlying data do not support point-in-time recovery for replica sets and are difficult to manage for larger sharded clusters. Additionally, these backups are larger because they include the indexes and duplicate underlying storage padding and fragmentation. mongodump, by contrast, creates smaller backups.
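A minimal sketch of that approach on a single host, assuming the default dbPath /var/lib/mongodb and a second drive mounted at /mnt/backup (both are placeholders); db.fsyncLock() flushes pending writes and blocks new ones while the copy runs:

```sh
# Flush pending writes and block further writes during the copy
mongo --eval 'db.fsyncLock()'

# Copy the data files to the second drive (paths are examples)
rsync -av --delete /var/lib/mongodb/ /mnt/backup/mongodb/

# Allow writes again
mongo --eval 'db.fsyncUnlock()'
```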
FYI, for replica sets the third "server" is an arbiter, which exists to break the tie when electing a new primary. It does not consume as many resources as the primary/secondaries. Since you are looking to create a staging environment, I would not recommend creating a replica set that includes both the production and staging environments: your primary could fail over to the staging instance, and clients that are meant to access the production instance would end up reading from and writing to staging.
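For reference, adding an arbiter is a one-liner run against the current primary (hostnames are placeholders); the arbiter holds no data and only votes in elections:

```sh
# arbiter-host must already be running mongod with the same --replSet name
mongo --host prod-host:27017 --eval 'rs.addArb("arbiter-host:27017")'
```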

Mongo backup - performance

I'm going to run a scheduled backup for my MongoDB server based on this open-source script.
My data is about 10GB (and counting), and backing it up takes time. My question is: does the backup operation actually block the database, or slow the server down so much that it can't serve the applications using it? Or is the Mongo backup smart enough to run in the background without harming the service?
Does anyone have experience with this?
Thank you.

Physically recover heroku postgresql database

Heroku makes clear in its documentation and blog that it stores Postgres physical backups (that is, binary copies of the database cluster files) in S3 using its software called WAL-E.
Does anybody know if there is a way for the end user to access them?
Note that I'm talking about the physical backup, not the logical one provided by the PGBackups plugin. This issue relates to a database on the free plan, without rollbacks, forks, or follows.
Thanks a lot.
I wrote, and still write, most of WAL-E on Heroku's behalf. There is no access to the archives because they are operating-system and architecture bit-depth dependent. On the "Free" and "Hobby" tiers the archives contain a mix of data from multiple tenants, which is relevant to why fork/follow are not supported.

Deploying large data on mongodb replicaset

Can I deploy a large database by copying its files (e.g. a testing database with the files testing.0, testing.1, testing.ns found in the MongoDB dbpath) from another server to the target servers (a replica set), to avoid using communication bandwidth for replication (in case it is only deployed to the primary)? Basically, I want to avoid the slow process of initial replication.
If journaling is enabled, what is the effect on the process?
Yes you can; this is a perfectly valid way of avoiding tedious and time-consuming replication between members on a distant or high-latency network.
If journaling is enabled, nothing really changes: copying via the file system bypasses MongoDB entirely.
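A rough sketch of seeding a member this way (hostnames, service names, and the dbpath are placeholders); the source mongod should be shut down cleanly first so the files, including the journal directory, are consistent on disk:

```sh
# On the source server: stop mongod cleanly so the data files are consistent
sudo systemctl stop mongod

# Copy the whole dbpath, journal included, to each member you want to seed
rsync -av /var/lib/mongodb/ member1-host:/var/lib/mongodb/

# Restart the source, start the seeded member with the same --replSet name,
# then add it to the set from the primary
sudo systemctl start mongod
mongo --host primary-host:27017 --eval 'rs.add("member1-host:27017")'
```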

Are there major downsides to running MongoDB locally on VPS vs on MongoLab?

I have an account with MongoLab for MongoDB, and the constant calls to this remote server from my app slow it down quite a lot. When I run the app locally on my computer against a local mongod instance, it's far, far faster, as would be expected.
When I deploy my app (running on Node/Express) it will run from a VPS on CentOS. I have plenty of storage space available on my VPS; are there any major downsides to running MongoDB locally rather than remotely on MongoLab?
Specs of the VPS:
1024MB RAM
1024MB VSwap
4 CPU cores @ 3.3GHz+
60GB SSD space
1Gbps Port
3000GB Bandwidth
Nothing apart from the obvious:
You will have to manage storage yourself. MongoDB tends to take a lot of disk space, and upgrading storage will probably be harder to manage than letting MongoLab take care of it.
You will have to make sure the Mongo server doesn't crash and keeps running fine.
You will run into scaling issues in the future once the load on your application and your database increases.
Using a "database-as-a-service" like MongoLab frees you from worrying about a lot of hardware/OS/system-level requirements and configuration. Memory optimization? Which file system? Connection limits? Virtualization and IO ops issues? (Thanks to Nikolay for pointing that one out.)
If your VPS provider doesn't account for local traffic in your bandwidth, then you can set up another VPS for MongoDB. That way, the server will be closer so the requests will be faster, and also, it will have the benefits of being an independent server. It still won't be fully managed like MongoLab though.
[ Edit: As Chris pointed out, MongoLab also helps you with your database schema design and bundles MongoDB support with their plans, so that's also nice. ]
Also, this is a good question, but probably not appropriate for StackOverflow. dba.stackexchange.com and serverfault.com are good places for this question.