What is the difference between a cluster and a replica set in mongodb atlas? - mongodb

I'm taking the Mongodb University M103 course and over there they gave a brief overview of what a cluster and a replica set is.
From my understanding a cluster is a set of servers or nodes. While a replica set is a set of servers or nodes all of which has replication mechanism built into each of them for downtime access and faster read operation.
From that it seems that replica set is a specific type of cluster, but my confusion arises from MongoDB Atlas. In mongoDB atlas one has to create a cluster, is that a replica set as well?
Are those terms interchangeable in all scenarios?

Replica Set
In MongoDB, a replicaset is a concept that depicts a set of MongoDB server working in providing redundancy (all the servers in the replica set have the same data) and high availability (if a server goes down, the remaining servers can still fulfil requests). When you create a replicaset, you need a minimum of 3 servers. There will always be a primary (read and write) and the remaining are called secondaries (for reading only).
MongoDB Atlas Cluster
Atlas is a DaaS, meaning a database a service company. They remove the burdain of maintaining, scaling and monitoring MongoDB servers on premise, so that you can focus on your applications.
An Atlas MongoDB cluster is a set of configuration you give to Atlas so it can setup the MongoDB servers for you. Hence, a MongoDB ReplicaSet is a feature subset in Atlas.
For example, while creating an Atlas Cluster, they will ask you whether you want a replicaset, sharded cluster, etc. Also, in which cloud provider you want to deploy. Your backup policy, the specs of your MongoDB hardware and more...
The keyword here is configuration. At the end of the day, you will have your MongoDB servers (replicaset or not) up and ready.
Summary
MongoDB Cluster
A specific configuration set of MongoDB servers to provide specific
features. i.e. replicaset and sharding.
MongoDB Replicaset
A MongoDB cluster setup to provide redundancy and high
availability with 3 or more odd number of servers (3, 5, 7, etc.)
MongoDB Atlas Cluster
High level MongoDB cluster configuration that allows you to set a
replicaset or other type of MongoDB cluster with its location and performance range.
I would suggest you to play with their web console. You will definitely see the difference.

Related

Will an application written for standalone MongoDB work for replica-set or sharded cluster without any changes?

Currently we are working with standalone mongodb without any replication or sharding, Now we are considering moving to replica-set for production purposes.
Will an application written for standalone mongodb will work for replica-set or sharded replica-set without any changes or are there some standalone/replica-set specific features in mongodb ?
Provided the MongoDB uses the default ports (27017 for standalone mongod and mongos) you don't need to touch your client application at all, it will work in either case.
Of course, when you connect to a MongoDB then a sharded cluster has more options, but the defaults are fine.
Will an application written for standalone mongodb will work for
replica-set or sharded replica-set without any changes or are there
some standalone/replica-set specific features in mongodb ?
Here are some things to think about when an application is to run on a replica-set or a sharded cluster. In addition, replica-sets and sharded clusters has some features not available in standalone deployment (see the Transactions and Change Streams topic at the bottom).
Replica Sets
A replica-set is cluster with multiple database servers - with replicated data on each server. The topology of the replica-set has one primary node (or member) and remaining members are secondaries (there can be other special purpose nodes like arbiters).
The data redundancy and failover features of replica-sets give your applications additional capabilties - for example, an application always runs even if a server is down.
The data is always written to the primary and read from it, by default. You can configure that the data can be read from the secondary nodes also from your application - this the Read Preference. This configuration can be used by the applications accessing a replica-set in some scenarios (see Read Preference Use Cases). This is for replica-sets and has no usage for standalone deployment.
Also, see Replica Set Read and Write Semantics:
From the perspective of a client application, whether a MongoDB
instance is running as a single server (i.e. “standalone”) or a
replica set is transparent. However, MongoDB provides additional read
and write configurations for replica sets.
Then, there are some things like, the Connection String URI, which uses different format for replica-set and sharded clusters - this is used by the applications to connect.
Sharded Cluster
The application should not be run in sharded cluster deployment as it is. It will require design level changes - and will affect the queries. Sharding is about distributing the data among shards. Note that in sharded cluster each shard is a replica-set. A sharded database can have sharded and un-sharded collections. Sharded collections are the distributed data.
To create a sharded collection, you must figure a shard key - this is the most important aspect of your application accessing a sharded collection. Shard key determines how the queries access particular shard to get the data. So, your application must take into consideration the shard key - the queries need to be created with shard key usage. Shard key affects the performance of your application queries, primarily.
Also, in the sharded cluster environment the application accesses the database via a mongos router - not the servers directly.
There are many other finer aspects when working with sharded databases and accessing for applications - the topic is too broad to discuss here. Changing from standalone to sharded cluster is an architectural change. Some aspects that can affect the application due to migrating from standalone to a replica-set also apply here (as each shard is a replica-set).
Also, see Operational Restrictions in Sharded Clusters - these are specific to sharded clusters and not applicable to standalone deployments.
Transactions and Change Streams
Features like transactions and change streams are available with replica-sets and sharded clusters only (and not on single standalone servers). This gives additional capabilites to your applications and can solve complex business logic and scenarios.

Setting up Sharding

I am new to mongodb, after install hortonworks HDP cluster and embedded mongodb with 3 nodes at HDP cluster.
now, try to setup shardings with mongodb. I tried few things and executed few steps. when I mongo, I saw these 3 servers have shard0:PRIMARY, shard1:SECONDARY> and shard1:SECONDARY>
Q1. did this mean I have sharding working?
Q2. if this is not right, how to remove all settings and back to a initial settings?
Answer is simple. Yes, you have managed to start Replica Set, not cluster.
Cluster is group of mongod nodes or replica sets where sharded collections exists. Sharding is like oracle partitioning.
If you are building cluster, it's better to do it with replica sets, so every shard in cluster is one replica set of (at least) three nodes.

MongoDB data replication in Kubernetes

I've been configuring pods in Kubernetes to hold a mongodb and golang image each with a service to load-balance. The major issue I am facing is data replication between databases. Replication controllers/replicasets do not seem to do what the name implies, but rather is a blank-slate copy instead of a replica of existing/currently running pods. I cannot seem to find any examples or clear answers on how Kubernetes addresses this, or does it even?
For example, data insertions being sent by the Go program are going to automatically load balance to one of X replicated instances of mongodb by the service. This poses problems since they will all be maintaining separate documents without any relation to one another once Kubernetes begins to balance the connections among other pods. Is there a way to address this in Kubernetes, or does it require a complete re-write of the Go code to expect data replication among numerous available databases?
Sorry, I'm relatively new to Kubernetes and couldn't seem to find much information regarding this.
You're right, a replica set is not a replica of another container, it's just a container with the same configuration spun up within the same logical unit.
A replica set (or deployment, which is the resource you should be using now) will have multiple pods, and it's up to you, the operator, to configure the mongodb part.
I would recommend reading this example of how to set up a replica set with multiple mongodb containers:
https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474#.e8y706grr

Can a mongodb shard cluster have just one shard?

Technically is it supported to begin with just one shard for a shard cluster? So we can be ready for adding additional one(s) at anytime, at the same time save the cost of additional shard(s) before we really need it(them)?
To go further, is it possible to have a shard running on one single instance, instead of having to be based off of a 3 instance replica set?
From here, sharding is:
A database architecture that partitions data by key ranges and
distributes the data among two or more database instances.
A shard will be either a replica set or a standalone mongod instance. It is possible for you to use a single machine by using different ports to establish distinct communication endpoints for the config, mongod and mongos processes on the single machine. Also, yes, you may add a shard at a later time when you need to expand.
However, the point of providing sharding is to support horizontal scaling. Additionally, the point of sharded clustering is to provide failover and redundancy support. By using a single shard on a single server, you are losing the benefits of scaling and certainly failover.
The recommended production architecture includes:
Three config servers on separate machines for each sharded cluster.
Two or more replica sets as shards.
One or more query routers (mongos); typically, one mongos instance per application server.
Peruse the Sharded Cluster Requirements section in the documentation to get a feel for whether or not your environment needs sharding and sharded clusters since there is complexity in establishing such an architecture.

Is it possible to create Replica Set with Sharded Cluster (mongos) and Single Server (mongod)?

I want to make continuous backup of my Sharded Cluster on a single MongoDB server somewhere else.
So, is it possible to create Replica Set with Sharded Cluster (mongos instance) and single MongoDB server?
Did anyone experience creating Replica Sets with two Sharded Clusters or with one Sharded Cluster and one Single Server?
How does it work?
By the way, the best (and for now, the only) way to continuously backup Sharded Cluster is by using MongoDB Management Service (MMS).
I were also facing the same issue sometime back. I wanted to take replica of all sharded cluster into one MongoDB. But, i didn't found any solution for this scenario, and I think this is true, because -
If you configure the multiple shard server (say 2 shard server) with
one replica set, then this will not work because in a replica set (say
rs0) only 1-primary member is possible. And In this scenario, we will
have multiple primary depend on number of shard server.
To take the backup of your whole sharded cluster, you must stop all writes to the cluster. You can refer to MongoDB documentation on this - http://docs.mongodb.org/manual/tutorial/backup-sharded-cluster-with-database-dumps/