I have set up a MongoDB replica set on two 100% identical servers (hardware and software).
I am running a web scraper that writes to MongoDB.
Now I can see that performance was better with a standalone instance.
As a standalone MongoDB instance, the scraping process ran for 2.5 hours with a maximum of 150 ops.
As a replica set, the scraping process runs for 6 hours with a maximum of 100 ops.
Is this the usual behavior of a replica set?
See the screenshots from my Grafana dashboard.
As standalone:
As replica set:
Yes, this is the expected behaviour. From the official docs:
The primary node receives all write operations. A replica set can have only one primary capable of confirming writes ...
Meaning the added nodes just create overhead when talking strictly about performance, as MongoDB has to sync them up with the primary.
This is expected, as the main use of replication is failover.
If you're bulk scraping, want to speed up performance, and can tolerate the risk quoted below, you can change the writeConcernMajorityJournalDefault replica set configuration option to false, as sketched below.
With writeConcernMajorityJournalDefault set to false, MongoDB does not wait for w: "majority" writes to be written to the on-disk journal before acknowledging the writes. As such, majority write operations could possibly roll back in the event of a transient loss (e.g. crash and restart) of a majority of nodes in a given replica set.
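If you decide to accept that trade-off, the option can be changed through a replica set reconfiguration from the mongo shell. This is only a rough sketch; verify the behaviour against your MongoDB version:
// Relax journaling for majority writes (accepting the rollback risk quoted above).
cfg = rs.conf()
cfg.writeConcernMajorityJournalDefault = false
rs.reconfig(cfg)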
As per the MongoDB documentation, transactions only work for replica sets and not a single node. Why such a requirement? Isn't it easier to do transaction stuff on a single node rather than in a distributed system?
The implementation of transactions uses sessions, which in turn require an oplog. The oplog is provided by replica sets for data synchronization between nodes.
Isn't it easier to do transaction stuff on a single node rather than in a distributed system?
This is true, but in practice MongoDB positions itself as a high-availability database, therefore there are rather few production deployments using a standalone server (as far as I know, this isn't even an option in Atlas, for example). Hence the lack of transaction support on standalone servers typically doesn't affect anything.
Conversely, implementing transactions only on standalone servers would not address the needs of the vast majority of MongoDB deployments/customers that use replica sets and sharded clusters.
For development purposes you can run a single-node replica set, which gives you the oplog required for sessions and transactions while still running only one mongod process.
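As a minimal sketch of that setup (assuming mongod was started locally with the --replSet rs0 option), all it takes in the mongo shell is:
// Initiate the set once; afterwards sessions and transactions work
// against this single mongod process.
rs.initiate()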
I needed to use MongoDB transactions, and I recently learned that transactions don't work in MongoDB standalone mode, only with replica sets
(Mongo DB with C# - document added regardless of transaction).
Also, I read that standalone mode is not recommended for production.
So I found out that simply defining a replica set name in mongod.cfg is enough to run MongoDB as a replica set instead of standalone.
After changing that, MongoDB transactions started working.
However, it feels a bit strange using it as a replica set although I'm not really using the replication functionality, and I want to make sure I'm using a valid configuration.
So my questions are:
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality? (as said I need it to allow transactions)
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
I've read that standalone mode is not recommended for production, although it sounds like it's the most basic configuration. I understand that this configuration is not used in most scenarios, but sometimes you may want to use it as a standard DB on a local machine. So why is standalone mode not recommended? Is it not stable enough, or are there other reasons?
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality?
You don't have the high availability afforded by a proper replica set, so it's not recommended for a production deployment. It is fine for development, though.
Note that a replica set's function is primarily about high availability instead of scaling.
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
A single-node replica set would have the oplog. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification).
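If you want to see how much space the oplog takes in such a setup, a quick check from the mongo shell (a sketch; the exact output varies by version) is:
// Print the configured oplog size and the time window it currently covers.
rs.printReplicationInfo()
// The oplog itself is a capped collection in the local database.
db.getSiblingDB("local").oplog.rs.stats().maxSize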
So why is standalone mode not recommended? Is it not stable enough, or other reasons?
MongoDB in production was designed with a replica set deployment in mind, for:
High availability in the face of node failures
Rolling maintenance/upgrades with no downtime
Possibility to scale-out reads
Possibility to have a replica of data in a special-purpose node that is not part of the high availability nodes
In short, MongoDB was designed to be a fault-tolerant distributed database (scales horizontally) instead of the typical SQL monolithic database (scales vertically). The idea is, if you lose one node of your replica set, the others will immediately take over. Most of the time your application won't even know there's a failure on the database side. In contrast, a failure in a monolithic database server would immediately disrupt your application.
I think kevinadi answered well, but I still want to add to it.
A standalone is an instance of mongod that runs on a single server but is not part of a replica set. Standalone instances are used for testing and development, but it is always recommended to use replica sets in production.
A single-node replica set would have the oplog, which records all changes to its data sets. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification). It also supports point-in-time recovery.
Please follow Convert a Standalone to a Replica Set if you would like to convert the standalone database to a replica set.
Transactions were introduced in MongoDB version 4.0. Starting in version 4.0, for situations that require atomicity of updates to multiple documents or consistency between reads of multiple documents, MongoDB provides multi-document transactions for replica sets. Transactions are not available on a standalone because they require the oplog to maintain strong consistency within a cluster.
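As a rough illustration of what this enables (a sketch only; the database name test and the collections orders and inventory are placeholders), a multi-document transaction in the mongo shell looks like this:
// Transactions always run inside a session.
session = db.getMongo().startSession()
session.startTransaction()
try {
    // Both writes commit or abort together.
    session.getDatabase("test").orders.insertOne({ item: "abc", qty: 1 })
    session.getDatabase("test").inventory.updateOne({ item: "abc" }, { $inc: { qty: -1 } })
    session.commitTransaction()
} catch (e) {
    session.abortTransaction()
    throw e
}
session.endSession()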
I have a few questions about MongoDB standalone instances and replica sets; I don't really get the difference.
When should I use either of them?
Why do all the replica set tutorials show 3 connections? Is there a reason?
Can I create a replica set with only 1 instance? And in that case, how is it different from a standalone MongoDB instance?
How do I migrate data from a standalone instance to a replica set?
I'm asking all these questions because I was recently trying to implement transactions, and sessions can only be started on replica sets. I don't really get the difference at all.
When should I use either of them?
Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
To keep your data safe
High (24*7) availability of data
Disaster recovery
No downtime for maintenance (like backups, index rebuilds, compaction)
Read scaling (extra copies to read from)
Replica set is transparent to the application
Why do all the replica set tutorials show 3 connections? Is there a reason?
The basic implementation that takes full advantage of replication specifies that you should have at least one primary node with two secondary nodes, which is why the examples always use 3 nodes. Not only that: if the primary out of those 3 goes down, you still have 2 nodes left, and MongoDB will elect a new primary, so you again have one primary and one secondary for high availability.
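As a sketch (the hostnames mongo1 to mongo3 are placeholders, and each mongod is assumed to have been started with --replSet rs0), such a three-member set can be initiated from the mongo shell like this:
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo1:27017" },
    { _id: 1, host: "mongo2:27017" },
    { _id: 2, host: "mongo3:27017" }
  ]
})
// rs.status() then shows which member was elected primary.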
Can I create a replica set with only 1 instance? And in that case, how is it different from a standalone MongoDB instance?
It does not make sense to have a single instance with MongoDB replication.
How do I migrate data from a standalone instance to a replica set?
Follow Convert a Standalone to a Replica Set. Your existing data will be replicated to all of the replica set's instances once they are up and running after the conversion from standalone to a replica set.
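As a rough outline of that conversion (hostnames are placeholders), you restart the standalone with a replica set name and then add the new members from the mongo shell; initial sync copies the existing data to them:
// Assumes the standalone was restarted with the --replSet rs0 option.
rs.initiate()
// Add the new members; each performs an initial sync of the existing data.
rs.add("newhost1:27017")
rs.add("newhost2:27017")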
We need to free some of our MongoDB space, and we identified 100Gb + worth of documents that can be safely removed from a collection.
So we removed them from our test environment which has this setting:
mongodb version 3.0.1
no sharding
1 node, no replica
wiredtiger engine
When done, we found out that the space on disk was still used and needed to be reclaimed. We found this post and it helped us: after running both
db.runCommand({repairDatabase: 1})
and
db.runCommand({ compact: "collection-name" })
We freed 100Gb +.
We then proceeded in production, forgetting that the setup was different since we had 1 replica node:
mongodb version 3.0.1
no sharding
1 primary node, 1 replica node
wiredtiger engine
After removing the documents, we ran
db.runCommand({repairDatabase: 1})
and got the OK message (after a while, 10 min+). We tried running
db.runCommand({ compact: "collection-name" })
and got this error:
will not run compact on an active replica set primary as this is a
slow blocking operation. use force:true to force
So we ran
db.runCommand({ compact: "collection-name", force: true })
and got the OK message (almost instantly), but the space on disk is still used; it wasn't freed.
We searched for solutions for running the repairDatabase and compact commands with a replica set, but the advice was focused on avoiding downtime, as if that were the only issue. However, we can schedule downtime; our problem is rather that the commands don't work as expected, since the space is not actually reclaimed.
What did we do wrong?
For replica set configurations, the best and safest method to recover space is to perform an initial sync. If you need to recover space from all nodes in the set, you can perform a rolling initial sync. That is, perform an initial sync on each of the secondaries before finally stepping down the primary and performing an initial sync on it.
Note that a rolling initial sync is only possible if your deployment is a replica set with at least three nodes (for reasons I will describe below).
The rolling initial sync method is the safest way to perform replica set maintenance, and as a bonus it involves no downtime.
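A rough outline of the procedure for one member (a sketch only; adapt it to your deployment):
// 1. Shut down the mongod on one secondary and delete the contents of its dbPath.
// 2. Restart that mongod; it performs an initial sync from the other members.
// 3. From the mongo shell, wait until the member is back in SECONDARY state:
rs.status().members.map(function (m) { return { host: m.name, state: m.stateStr } })
// 4. Repeat for each secondary; finally, step down the primary and re-sync it too:
rs.stepDown()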
Having said that, there are some things that are worth mentioning:
Regarding compact:
The compact command on WiredTiger in MongoDB 3.0.x was affected by this bug: SERVER-21833, which was fixed in MongoDB 3.2.3. Prior to that version, compact on WiredTiger could silently fail.
Regarding repairDatabase:
Please don't run repairDatabase on replica set nodes. This is strongly not recommended, as mentioned in the repairDatabase page. The name repairDatabase is a bit misleading, since the command doesn't attempt to repair anything. The command was intended to be used when there's disk corruption, which could lead to corrupt documents.
The repairDatabase command could be more accurately described as "salvage database". That is, it recreates the databases by discarding corrupt documents, in an attempt to get the database into a state where you can start it and salvage intact document from it.
In a replica set, MongoDB expects all nodes in the set to contain identical data. If you run repairDatabase on a replica set node, there is a chance that the node contains undetected corruption, and repairDatabase will dutifully remove the corrupt documents. Predictably, this leaves that node with a different dataset from the rest of the set. If an update then happens to hit one of those removed documents, the whole set could crash. To make matters worse, it is entirely possible that this situation could stay dormant for a long time, only to strike suddenly with no apparent reason.
Regarding your setup:
I noticed that in your production environment, you created a replica set with two nodes. This setup is not recommended, since the failure of a single node will leave the remaining node as a secondary, thus disallowing writes to the set.
Due to the way MongoDB high availability works (see Replica Set Election), it's strongly recommended to deploy three data-bearing nodes at a minimum, or at least add an arbiter node (see Replica Set Members) so that your replica set contains an odd number of members.
Having only two members in a replica set also makes rolling upgrades/initial sync/maintenance much harder or even impossible in some cases.
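For example (a sketch; the hostname is a placeholder, and an arbiter-only mongod is assumed to be running there already), an arbiter can be added from the mongo shell so the set gets an odd number of voting members:
rs.addArb("arbiter-host:27017")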
MongoDB 3.0.1 was released on March 17, 2015, which is more than 2 years ago as of this writing. If you're forced to use the MongoDB 3.0 series, please consider moving to 3.0.15. Or better yet, to 3.4.7 (the latest as of Aug 10, 2017), which contains massive improvements over 3.0.1.
We are using MongoDB in a production environment and now, due to some issues with the current servers, I'm going to change servers and start a new MongoDB instance.
We have a replica set and a single mongod instance (two different MongoDB networks for different purposes). Now, first I should migrate the single mongod instance and then the whole replica set to the new server.
What I want to know is, how can I migrate both instances with no downtime? I don't want to shut down the server or stop write operations.
Thanks in advance.
So first of all, you should never run MongoDB as a single instance in production. At a minimum you should have 1 primary, 1 secondary, and 1 arbiter.
Second, even with a replica set you will always have a bit of write downtime when you switch primaries, as writes are not possible during the election process. From the docs:
IMPORTANT: Elections are essential for independent operation of a replica set; however, elections take time to complete. While an election is in process, the replica set has no primary and cannot accept writes. MongoDB avoids elections unless necessary.
Elections are going to occur when for example you bring down the primary to move it to a new server or virtual instance, or upgrade the database version (like going from 2.4 to 2.6).
You can keep downtime to a minimum with an existing replica set by setting the appropriate options to allow queries to run against secondaries. Again from the docs:
Maintaining availability during a failover. Use primaryPreferred if you want an application to read from the primary under normal circumstances, but to allow stale reads from secondaries in an emergency. This provides a “read-only mode” for your application during a failover.
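For instance (a sketch only; drivers expose an equivalent option), the read preference can be set for the current connection from the mongo shell:
// Allow reads from secondaries while the primary is unavailable during failover.
db.getMongo().setReadPref("primaryPreferred")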
This takes care of reads at least. Writes are best dealt with by having your application retry failed writes, or queue them up.
Regarding your standalone, the documented procedure for converting it to a replica set is well tested and can be completed very quickly with minimal downtime:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
You cannot have zero downtime (the new mongod will run on a new IP, so you will at least need to reconnect to it), but you can minimize downtime by building a geographically distributed replica set.
Please Read
http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/
Use the given process, but please note:
Do not set priority 0 on the instances at New Location, so that they can become primary when the old ones at Old Location step down (see the priority sketch after these steps).
You still need to restart mongod in replica set mode at Old Location.
You need 3 instances, including an arbiter, at New Location if you want it to be a replica set.
When the complete data is in sync on the instances at New Location, step down the instances at Old Location (one by one). Now everything will go to New Location, but the problem is that traffic is directed through a distant mongod.
So stop the mongod at Old Location and start a new one at New Location. Connect your applications to the New Location mongod.
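As a sketch (the member indexes are placeholders for the members at each location), the priorities can be adjusted so that a New Location member wins the election:
cfg = rs.conf()
cfg.members[0].priority = 0.5   // member at Old Location
cfg.members[2].priority = 2     // member at New Location
rs.reconfig(cfg)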
Note: I have not done this myself so far. I planned it once, but then I ran into a problem that was not caused by the hosting provider. In practice you may hit some issues.
A replica set is the feature provided by MongoDB to achieve high availability and automatic failover.
It is somewhat like a traditional master-slave configuration, but with the capability of automatic failover.
It is basically a group/cluster of mongod instances which communicate and replicate with each other to provide high availability and to perform automatic failover.
Basically, a replica set can contain from a minimum of 2 up to a maximum of 12 mongod instances (the limit was raised to 50 members, at most 7 of them voting, in MongoDB 3.0).
In a replica set the following types of servers exist; out of them all, one server is always the primary.
http://blog.ajduke.in/2013/05/31/setup-mongodb-replica-set-in-4-steps/
John's answer is right. By the way, in your case you have no way to avoid downtime; you can just try to make it as short as possible.
You can prepare the new replica set and save its configuration.
Same for the single mongod instance: prepare a .js file with the specific configuration (i.e. stuff that goes into the admin database).
disable client connections on production servers.
copy the datafiles from the old servers to the new ones (http://docs.mongodb.org/manual/core/backups/#backup-with-file-copies)
apply your previously saved replica set config and configuration.
done
You can use different approaches, such as adding a hidden secondary member to the replica set if you have a lot of data, so you can wait until it is up to date before stopping the production server (see the sketch below). Basically, for the replica set you have many ways to handle a migration; with the single instance, instead, you don't have such features.
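As a sketch (the hostname and _id are placeholders), such a hidden member can be added from the mongo shell; it replicates data but never becomes primary and is invisible to clients:
// _id must be unique within the set; a hidden member must have priority 0.
rs.add({ _id: 3, host: "new-server:27017", priority: 0, hidden: true, votes: 0 })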