I have a replica set with 3 nodes. Is there a way to convert this existing replica set to a sharded cluster? Please find the replica set's config file below.
All 3 nodes in the replica set have sharding enabled:
    sharding:
      clusterRole: shardsvr
Sharding is enabled at the database level and then implemented at the collection level. Any operating replica set can work in a sharded environment, but there is no compelling reason to shard until you have multiple replica sets working in parallel. It's a big decision that is hard to undo, and you need to be very clear on your shard key, so definitely invest time in learning the technology at MongoDB University.
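To make the "database level, then collection level" order concrete, here is a minimal Python sketch that only builds the two admin command documents. `enableSharding` and `shardCollection` are real MongoDB admin commands; the database name, collection name, and shard key are made up for illustration, and the helper function names are my own.

```python
# Sketch only: builds the command documents, it does not talk to a server.
# "shop", "shop.orders" and the "customerId" key are hypothetical names.

def enable_sharding_cmd(db_name):
    """Command document that enables sharding for one database."""
    if not db_name:
        raise ValueError("database name must not be empty")
    return {"enableSharding": db_name}


def shard_collection_cmd(namespace, shard_key):
    """Command document that shards one collection ("db.collection")."""
    if "." not in namespace:
        raise ValueError("namespace must look like 'db.collection'")
    if not shard_key:
        raise ValueError("a shard key is required")
    return {"shardCollection": namespace, "key": dict(shard_key)}


# Step 1: database level. Step 2: collection level.
steps = [
    enable_sharding_cmd("shop"),
    shard_collection_cmd("shop.orders", {"customerId": "hashed"}),
]
```

With PyMongo, each document would typically be sent through a mongos with something like `client.admin.command(step)`; that part is left out here because it needs a running cluster.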
I came across the following passage, and I have a fair idea of distributed systems like HDFS and Elasticsearch:
A replica set in MongoDB is a group of mongod processes that maintain
the same data set. Replica sets provide redundancy and high
availability, and are the basis for all production deployments. This
section introduces replication in MongoDB as well as the components
and architecture of replica sets. The section also provides tutorials
for common tasks related to replica sets.
In distributed systems like HDFS and Elasticsearch, we can set the replication factor at the file level or the index level. That does not seem to be possible with MongoDB: the only way to replicate is at the instance/process level, which means the machines in the replica set will all hold the same data no matter what.
Isn't it possible to configure replication at the database level?
In a MongoDB replica set, each document is stored on each node (hence the replication factor is the number of nodes, and it is not configurable).
The benefit of this design is that each node can answer any query from its locally stored data, without needing to retrieve data from other nodes.
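A toy Python model of that difference, in case it helps: in a replica set every node holds every document, while in a sharded setup each document lives on one shard. The modulo-of-a-hash placement below is my own simplification and is not how MongoDB assigns documents to shards (it uses chunk ranges over the shard key); the node and shard names are made up.

```python
import hashlib


def replica_set_placement(doc_id, nodes):
    """Replica set: every member stores a copy of every document."""
    return list(nodes)


def sharded_placement(doc_id, shards):
    """Toy sharding: each document is stored on exactly one shard."""
    digest = hashlib.sha256(str(doc_id).encode("utf-8")).hexdigest()
    return [shards[int(digest, 16) % len(shards)]]


nodes = ["node-a", "node-b", "node-c"]      # hypothetical host names
shards = ["shard0", "shard1", "shard2"]     # hypothetical shard names

copies_in_replica_set = len(replica_set_placement(42, nodes))  # one per node
copies_in_sharded_toy = len(sharded_placement(42, shards))     # a single shard
```

So the "replication factor" of a replica set is simply the number of members, which is why any single member can answer a query on its own.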
I'm taking the MongoDB University M103 course, where they gave a brief overview of what a cluster and a replica set are.
From my understanding, a cluster is a set of servers or nodes, while a replica set is a set of servers or nodes that all have a replication mechanism built in, so that data stays accessible during downtime and reads are faster.
From that, it seems that a replica set is a specific type of cluster, but my confusion arises from MongoDB Atlas. In MongoDB Atlas one has to create a cluster. Is that a replica set as well?
Are those terms interchangeable in all scenarios?
Replica Set
In MongoDB, a replica set is a group of MongoDB servers that work together to provide redundancy (all the servers in the replica set hold the same data) and high availability (if a server goes down, the remaining servers can still fulfil requests). For a production replica set, you want a minimum of 3 members. At most one member is the primary (reads and writes) at any time; the remaining members are called secondaries (reads only).
MongoDB Atlas Cluster
Atlas is a DBaaS, meaning a database-as-a-service offering. It removes the burden of maintaining, scaling and monitoring MongoDB servers on premise, so that you can focus on your applications.
An Atlas MongoDB cluster is a set of configuration options you give to Atlas so it can set up the MongoDB servers for you. Hence, a MongoDB replica set is one of the deployment types available in Atlas.
For example, while creating an Atlas cluster, they will ask you whether you want a replica set, a sharded cluster, etc., as well as which cloud provider you want to deploy to, your backup policy, the specs of your MongoDB hardware, and more.
The keyword here is configuration. At the end of the day, you will have your MongoDB servers (replicaset or not) up and ready.
Summary
MongoDB Cluster
A specific configuration of MongoDB servers set up to provide specific
features, e.g. replication or sharding.
MongoDB Replica Set
A MongoDB cluster set up to provide redundancy and high
availability with an odd number of servers, 3 or more (3, 5, 7, etc.)
MongoDB Atlas Cluster
High level MongoDB cluster configuration that allows you to set a
replicaset or other type of MongoDB cluster with its location and performance range.
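On the "odd number of servers" point in the summary: electing a primary needs a majority of voting members, so an even member count adds a server without adding fault tolerance. A small Python sketch of that arithmetic (the function names are mine):

```python
def majority(voting_members):
    """Votes needed to elect a primary: more than half of the voting members."""
    return voting_members // 2 + 1


def fault_tolerance(voting_members):
    """How many members can be down while a primary can still be elected."""
    return voting_members - majority(voting_members)


# Member count -> members that may fail.
tolerance_by_size = {n: fault_tolerance(n) for n in (3, 4, 5, 6, 7)}
```

Here 3 and 4 members both tolerate one failure, and 5 and 6 both tolerate two, which is why 3, 5, 7 are the usual choices.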
I suggest you play with their web console. You will definitely see the difference.
I am new to MongoDB. I installed a Hortonworks HDP cluster with MongoDB embedded on 3 nodes of the HDP cluster.
Now I am trying to set up sharding with MongoDB. I tried a few things and executed a few steps. When I run mongo, I see that these 3 servers show shard0:PRIMARY, shard1:SECONDARY> and shard1:SECONDARY>
Q1. Does this mean I have sharding working?
Q2. If this is not right, how do I remove all the settings and go back to the initial settings?
The answer is simple: you have managed to start a replica set, not a sharded cluster.
A cluster is a group of mongod nodes or replica sets across which sharded collections are distributed. Sharding is similar to Oracle partitioning.
If you are building a cluster, it's better to do it with replica sets, so that every shard in the cluster is one replica set of (at least) three nodes.
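If you want to check this programmatically rather than from the shell prompt, one rough way is to look at the reply of the `hello` command: a mongos reports `msg: "isdbgrid"`, while a replica set member reports a `setName`. The Python sketch below only classifies a reply that is already in hand; the sample replies are trimmed, hypothetical examples, and the function name is mine.

```python
def classify_hello(reply):
    """Guess what kind of process produced a `hello` reply."""
    if reply.get("msg") == "isdbgrid":
        return "mongos"                 # router of a sharded cluster
    if "setName" in reply:
        return "replica set member"     # primary or secondary of a set
    return "standalone mongod"


# Trimmed, made-up replies for illustration only.
from_shell_in_question = {"setName": "shard0", "isWritablePrimary": True}
from_a_mongos = {"msg": "isdbgrid", "isWritablePrimary": True}

kind_in_question = classify_hello(from_shell_in_question)
kind_of_router = classify_hello(from_a_mongos)
```

A prompt like shard0:PRIMARY corresponds to the first case: "shard0" is just the replica set's name, it does not mean sharding is active.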
I want to make a continuous backup of my sharded cluster on a single MongoDB server somewhere else.
So, is it possible to create a replica set from a sharded cluster (mongos instance) and a single MongoDB server?
Has anyone created a replica set from two sharded clusters, or from one sharded cluster and one single server?
How does it work?
By the way, the best (and for now, the only) way to continuously back up a sharded cluster is by using MongoDB Management Service (MMS).
I was also facing the same issue some time back. I wanted to replicate an entire sharded cluster into one MongoDB server, but I didn't find any solution for this scenario, and I think none exists, because:
If you configure multiple shard servers (say 2 shard servers) as
one replica set, it will not work, because a replica set (say
rs0) can have only 1 primary member. In this scenario, we would
have multiple primaries, one per shard server.
To take the backup of your whole sharded cluster, you must stop all writes to the cluster. You can refer to MongoDB documentation on this - http://docs.mongodb.org/manual/tutorial/backup-sharded-cluster-with-database-dumps/
3 instances for config servers
1 instance for webserver & mongos
1 instance for shard 1
Then when I need to start more shards, can I just add more instances?
Also, what is a replica set? If I had, say, 3 servers for shard 1, is that a replica set?
A replica set is a set of servers that are clones of each other (i.e., replicas). Within a given set there is an elected master (the primary). By default, reads and writes go to this elected master, and the replicas just "tail" the changes to stay up-to-date copies. If the master fails, a new one is elected and the system just keeps going. See the replication section of the MongoDB documentation for details.
So you ask about scaling with MongoDB. There are two types of scaling:
Read Scaling: use Replica Sets
Write Scaling: use Sharding
The minimum config for Replica Sets is
- 2 full replicas
- 1 arbiter (lightweight process, breaks ties when voting)
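As an illustration of that minimum layout, here is a Python sketch that builds a replica set configuration document with two data-bearing members and one arbiter. The `_id`, `members`, `host` and `arbiterOnly` fields are the ones the replica set configuration uses; the set name and host names are hypothetical, and the helper is my own.

```python
def replica_set_config(name, data_hosts, arbiter_host=None):
    """Build a replica set config: data-bearing members plus an optional arbiter."""
    members = [{"_id": i, "host": host} for i, host in enumerate(data_hosts)]
    if arbiter_host is not None:
        # The arbiter votes in elections but stores no data.
        members.append(
            {"_id": len(members), "host": arbiter_host, "arbiterOnly": True}
        )
    return {"_id": name, "members": members}


# Hypothetical hosts: 2 full replicas + 1 arbiter.
config = replica_set_config(
    "rs0",
    ["db1.example.net:27017", "db2.example.net:27017"],
    arbiter_host="arb1.example.net:27017",
)
```

A document of this shape is what you would pass to `rs.initiate()` in the shell (or to the `replSetInitiate` admin command) on one of the members.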
The minimum config for Sharding is
- 1 config server
- 1 mongod process (only one shard)
- 1 or more mongos (generally on the app server)
However, you probably don't want to run like this in production. Running only a single DB means that you have only one source for the data, which can result in long downtime or total data loss. This is generally solved by using replica sets.
Additionally, the config server is quite important. MongoDB historically supported 1 or 3 config servers, and most production deployments used 3; since MongoDB 3.4, config servers must themselves be deployed as a replica set. Note that config servers and arbiters are very lightweight and can live on other boxes or on Amazon micro instances.
Most production deployments with sharding also involve replica sets. In fact, they usually start as replica sets.
Then when I need to start more shards, can I just add more instances?
From a sharding perspective it should be as easy as:
- start new shard server
- run the addShard command from a mongos
Note that when you add a shard, you will need to allow for time and resources as data migrates between shards and everything re-balances.
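To sketch both halves of this in Python: the first helper builds the `addShard` command document for a shard that is itself a replica set (the "setName/host1,host2" string form), and the second is a toy model of why rebalancing takes time. The toy moves one chunk at a time from the fullest shard to the emptiest; it is my own simplification and not MongoDB's actual balancer. All names and chunk counts are made up.

```python
def add_shard_cmd(replica_set, hosts):
    """Command document that adds a replica set as a new shard."""
    return {"addShard": f"{replica_set}/{','.join(hosts)}"}


def rebalance(chunks_per_shard):
    """Toy balancer: returns (final chunk counts, number of migrations)."""
    counts = dict(chunks_per_shard)
    migrations = 0
    while True:
        donor = max(counts, key=counts.get)
        recipient = min(counts, key=counts.get)
        if counts[donor] - counts[recipient] <= 1:
            return counts, migrations
        counts[donor] -= 1          # one chunk leaves the fullest shard
        counts[recipient] += 1      # and lands on the emptiest one
        migrations += 1


cmd = add_shard_cmd("rs1", ["db4.example.net:27018", "db5.example.net:27018"])

# An existing shard holding 8 chunks, plus a freshly added, empty shard.
final_counts, moves = rebalance({"shard0": 8, "shard1": 0})
```

In this toy run, four chunk migrations are needed before the two shards are even, and each real migration copies data over the network, which is the "time and resources" mentioned above.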