MongoDB standalone vs replica set and how to migrate data from a standalone to a replica set - mongodb

I have a few questions about MongoDB standalone and Replica sets, I don't really get it.
When should I use either of them
Why all the replica sets tutorials show 3 connections, is there a reason?
Can I create a replica set for 1 instance only? and in that case how is it different than the standalone mongodb instance?
How to Migrate data from a standalone instance to replica sets?
All these questions I'm asking because recently I was trying to implement transactions and sessions can only start on "replica sets" I don't really get the difference at all.

When should I use either of them?
Replication is the process of synchronizing data across multiple servers. Replication provides redundancy and increases data availability with multiple copies of data on different database servers. Replication protects a database from the loss of a single server. Replication also allows you to recover from hardware failure and service interruptions. With additional copies of the data, you can dedicate one to disaster recovery, reporting, or backup.
To keep your data safe
High (24*7) availability of data
Disaster recovery
No downtime for maintenance (like backups, index rebuilds, compaction) Read scaling (extra copies to read from)
Replica set is transparent to the application
Why all the replica sets tutorials show 3 connections, is there a reason?
The basic implementation to take full advantage of replication specifies you
should have at least one primary node with two secondary nodes. So the
examples are always with 3 nodes. Not only this if from 3 the
Primary node goes down you still have 2 nodes (mongoDB will assign
using arbiter rule) and one primary and one secondary for high availability
Can I create a replica set for 1 instance only? and in that case how is it different than the standalone mongodb instance?
It does not make sense to have single instance with mongo replication.
How to Migrate data from a standalone instance to replica sets?
Convert a Standalone to a Replica Set . Your existing data will be migrated to all replication instances once they are up and running when converted to replication sets from standalone.

Related

Mongodb - Setting the replication at the db or collection level

I come across the following phase and I had fair idea of distributed systems like hdfs and elasticsearch etc ..
A replica set in MongoDB is a group of mongod processes that maintain
the same data set. Replica sets provide redundancy and high
availability, and are the basis for all production deployments. This
section introduces replication in MongoDB as well as the components
and architecture of replica sets. The section also provides tutorials
for common tasks related to replica sets.
In all those distributed systems like hdfs , elasticsearch - we can set the replication factor at the file level or index level. It seems it is not possible to do with mongo db, the only way to do the replication with mongodb is at the instance / process level - which means the machines in the replicaset group will have similar data no matter what.
Isnt it possible to create the replication at the db level ?
In a MongoDB replica set, each document is stored on each node (hence the replication factor is the number of nodes, and it is not configurable).
The benefit of this design is each node can answer any query from its locally stored data, without needing to retrieve data from other nodes.

Mongo DB - difference between standalone & 1-node replica set

I needed to use Mongo DB transactions, and recently I understood that transactions don't work for Mongo standalone mode, but only for replica sets
(Mongo DB with C# - document added regardless of transaction).
Also, I read that standalone mode is not recommended for production.
So I found out that simply defining a replica set name in the mongod.cfg is enough to run Mongo DB as a replica set instead of standalone.
After changing that, Mongo transactions started working.
However, it feels a bit strange using it as replica-set although I'm not really using the replication functionality, and I want to make sure I'm using a valid configuration.
So my questions are:
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality? (as said I need it to allow transactions)
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
I've read that standalone mode is not recommended for production, although it sounds like it's the most basic configuration. I understand that this configuration is not used in most scenarios, but sometimes you may want to use it as a standard DB on a local machine. So why is standalone mode not recommended? Is it not stable enough, or other reasons?
Is there any problem/disadvantage with running Mongo as a 1-node replica set, assuming I don't really need the replication, load balancing or any other scalable functionality?
You don't have high availability afforded by a proper replica set. Thus it's not recommended for a production deployment. This is fine for development though.
Note that a replica set's function is primarily about high availability instead of scaling.
What are the functionality and performance differences, if any, between running as standalone vs. running as a 1-node replica set?
A single-node replica set would have the oplog. This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification).
So why is standalone mode not recommended? Is it not stable enough, or other reasons?
MongoDB in production was designed with a replica set deployment in mind, for:
High availability in the face of node failures
Rolling maintenance/upgrades with no downtime
Possibility to scale-out reads
Possibility to have a replica of data in a special-purpose node that is not part of the high availability nodes
In short, MongoDB was designed to be a fault-tolerant distributed database (scales horizontally) instead of the typical SQL monolithic database (scales vertically). The idea is, if you lose one node of your replica set, the others will immediately take over. Most of the time your application don't even know there's a failure in the database side. In contrast, a failure in a monolithic database server would immediately disrupt your application.
I think kevinadi answered well, but I still want to add it.
A standalone is an instance of mongod that runs on a single server but is not part of a replica set. Standalone instances used for testing and development, but always recomended to use replica sets in production.
A single-node replica set would have the oplog which records all changes to its data sets . This means that you'll use more disk space to store the oplog, and also any insert/update operation would be written to the oplog as well (write amplification). It also supports point in time recovery.
Please follow Convert a Standalone to a Replica Set if you would like to convert the standalone database to replicaset.
Transactions have been introduced in MongoDB version 4.0. Starting in version 4.0, for situations that require atomicity for updates to multiple documents or consistency between reads to multiple documents, MongoDB provides multi-document transactions for replica sets. The transaction is not available in standalone because it requires oplog to maintain strong consistency within a cluster.

Can a mongodb shard cluster have just one shard?

Technically is it supported to begin with just one shard for a shard cluster? So we can be ready for adding additional one(s) at anytime, at the same time save the cost of additional shard(s) before we really need it(them)?
To go further, is it possible to have a shard running on one single instance, instead of having to be based off of a 3 instance replica set?
From here, sharding is:
A database architecture that partitions data by key ranges and
distributes the data among two or more database instances.
A shard will be either a replica set or a standalone mongod instance. It is possible for you to use a single machine by using different ports to establish distinct communication endpoints for the config, mongod and mongos processes on the single machine. Also, yes, you may add a shard at a later time when you need to expand.
However, the point of providing sharding is to support horizontal scaling. Additionally, the point of sharded clustering is to provide failover and redundancy support. By using a single shard on a single server, you are losing the benefits of scaling and certainly failover.
The recommended production architecture includes:
Three config servers on separate machines for each sharded cluster.
Two or more replica sets as shards.
One or more query routers (mongos); typically, one mongos instance per application server.
Peruse the Sharded Cluster Requirements section in the documentation to get a feel for whether or not your environment needs sharding and sharded clusters since there is complexity in establishing such an architecture.

mongodb failover Demonstration! Help needed

Here is a newbie trying to play around Mongodb. I am trying to demonstrate scaling in my class, meaning, I need to show that I have 2 instances of mongoDB up and running and I need to replicate them, set one as master and the other as secondary.
Can any of you suggest me a simple way to demonstrate that if primary/master fails the slave/secondary comes up as the master?
Please keep it as simple as possible as I am teaching to a fairly beginners of MongoDB
MongoDB replica sets are not master/slave. In order to achieve automatic failover you need to have a majority of nodes in the replica set able to elect a new primary. The minimum number of nodes in your replica set should be 3, which can either be 3 data-bearing nodes or 2 data-bearing nodes and an arbiter, which is a node that votes in elections.
A demo using replication alone is more about failover and redundancy than scaling (better demo'd with sharding).
If you want a very simple (and non-production) way to stand up a replica set or sharded cluster in a development environment, I would suggest using the mlaunch script which is part of mtools.
For example, to create a 3-node replica set with an arbiter:
mlaunch --replicaset --nodes 2 --arbiter
To create a sharded cluster with 3 shards backed by a replica set (plus mongos and config server):
mlaunch --replicaset --sharded 3
As mentioned in the other comments here, the free MMS Monitoring service is a good way to visualise activity in your MongoDB deployment, and you can use db.shutdownServer() to shutdown specific nodes to see the outcome.
The easiest way would be to set up the MongoDB monitoring service. Stop the MongoD process on one and watch the other take over. But, use replica sets rather than master/secondary as they are the recommended approach.
Actually, it is pretty easy
Set up a replica set with 2 "normal" mongods and an arbiter
Connect to both of the normal mongods using mongo
Show the output of rs.status(). (Note the selffield)
Shut down the current primary
Show the output of rs.status() again and again, until the former secondary is elected primary
Another option would be to write a simple java app which utilizes the driver, put it in an infinite loop which writes one entry every second and puts out the number of objects in the database. Catch exceptions and write out that a problem occurred. Start the sharded cluster, then start your application. Shut down the primary while the program is running. during the elections, there may be exceptions be thrown. As soon as the former secondary is elected primary, the document count should start to rise again.

Migrating MongoDB instances with no down-time

We are using MongoDB in production environment and now, due to some issues of current servers, I'm going to change the server and start a new MongoDB instance.
We have a replica set and a single mongod instance (two different MongoDB networks for different purposes). Now, first I should migrate the single mongod instance and then the whole replica set to the new server.
What I want to know is, how can I migrate both instances with no down-time? I don't want to shutdown the server or stop write operations.
Thanks in advance.
So first of all you should never run mongodb as a single instance for production. At a minimum you should have 1 primary, 1 secondary and 1 arbiter.
Second, even with a replica set you will always have a bit of write downtime when you switch primaries, as writes are not possible during the election process. From the docs:
IMPORTANT Elections are essential for independent operation of a
replica set; however, elections take time to complete. While an
election is in process, the replica set has no primary and cannot
accept writes. MongoDB avoids elections unless necessary.
Elections are going to occur when for example you bring down the primary to move it to a new server or virtual instance, or upgrade the database version (like going from 2.4 to 2.6).
You can keep downtime to a minimum with an existing replica set by setting the appropriate options to allow queries to run against secondaries. Again from the docs:
Maintaining availability during a failover. Use primaryPreferred if
you want an application to read from the primary under normal
circumstances, but to allow stale reads from secondaries in an
emergency. This provides a “read-only mode” for your application
during a failover.
This takes care of reads at least. Writes are best dealt with by having your application retry failed writes, or queue them up.
Regarding your standalone the documented procedures for converting to a replica set are well tested and can be completed very quickly with minimal downtime:
http://docs.mongodb.org/manual/tutorial/convert-standalone-to-replica-set/
You cannot have no downtime (a new mongod will run on new IP so you need to at least connect to it). But you can minimize downtime by making geographically distributed replica set.
Please Read
http://docs.mongodb.org/manual/tutorial/deploy-geographically-distributed-replica-set/
Use the given process but please note:
Do not set priority 0 of instances at New Location so that they become primary when old ones at Old Location step down.
You still need to restart mongod in replica set mode at Old Location.
You need 3 instances including an arbiter at New Location, if you want it to be
replica set.
When complete data is in sync with instances at New Location, step down instances at Old Location (one by one). Now everything will go to New Location but the problem is that it is directed through a distant mongod.
So stop mongod at Old Location and start a new one at new Location. Connect your applications to New Location Mongod.
Note: I have not done the same so far. I had planned it once but then I got the problem and it was not of hosting provider. Practically you may get some issues.
Replica Set is the feature provided by the Mongodb database to achieve high availability and automatic failover.
It is kinda traditional master-slave configuration but have capability of automatic failover.
It is basically group/cluster of the mongod instances which communicates, replicates to each other to provide high availability and to do automatic failover
Basically, in replica sets there are minimum 2 and maximum of 12 mongod instances can exist
In replica set following types of server exist. out of all, one server is always primary.
http://blog.ajduke.in/2013/05/31/setup-mongodb-replica-set-in-4-steps/
John answer is right, btw in your case you have no way to avoid downtime, you can just try to make it shorter as possible.
You can prepare the new replica set and save its configuration.
Same for the single mongod instance, prepare a js file with specific configuration (ie: stuff going on the admin database).
disable client connections on production servers.
copy the datafiles from the old servers to the new ones (http://docs.mongodb.org/manual/core/backups/#backup-with-file-copies)
apply your previous saved replica set config and configuration.
done
you can use diffent ways as add an hidden secondary member on the replica set if you have a lot of data, so you can wait it's is up-to-date before stopping the production server. Basically for the replica set you have many ways to handle a migration, with the single instance instead you don't have such features.