Using mongos with non-sharded replica sets - mongodb

Is there any advantage or disadvantage to using mongos in front of just a replica set (as opposed to a sharded cluster)? Having to specify the same replica set members in multiple apps doesn't feel right to me -- I'd rather do it in one central place.
If anything, sharding in the future would be more straight-forward. Right?
Thanks in advance!
Harry.

The downside is that you've likely introduced a single point of failure into your application. If the mongos goes down then the mongods are no longer accessible - you've defeated the high availability of replica sets.
You could set up multiple mongos instances but then you're back in the same boat. You could theoretically set up a load balancer in front of your mongos instances so that you only have to list one address and can still have high availability.

There is nothing wrong and in fact this is a typically good idea, mainly because it allows you to scale onto sharding with no downtime.
Of course for redundancy sake you would have more than one mongos. You would define, in each app, a list of load balanced mongos instances (in a more serious setup) which then back onto your cluster of replica set shards.

Related

Standard availability for mongoDB replica set cluster of 3 nodes

I have set up a replica set of mongoDB with one primary, one secondary and one arbiter node, mongoDB installed on three independent AWS instances. I need to document overall availability of the replica set cluster formed as per aforementioned configuration but don't have any reliable/standard data to establish so.
Is there any standard data which can be referred to establish avaialability of overall cluster/individual node in above case?
Your configuration will guarantee continued availability, even after one node goes down. However, availability after that depends on how quickly you can replace the downed node, and that is up to your monitoring and maintenance abilities.
If you do not notice for while that a node is down, or if your procedure for replacing the node takes a long time (you may need to commission a new VM, install MongoDB, reconfigure the replica set, allow time for the new node to sync), then another node may go down and leave you with no availability.
So your actual availability depends on the answers to four questions:
Which replica set configuration do you use? Because that determines how many nodes need to go down before the replica set stops being available
How likely it is that any single node will go down or lose connection to the rest?
How good is your monitoring, so you notice there is a problem?
How fast are your processes for repairing the problem?
The answer to the first one is straightforward; you have decided on the minimum of two data-bearing nodes and one arbiter.
The answer to the second one is not quite straightforward; it depends on the reliability of each node, and the connections between them, and whether two or more are likely to go down together (perhaps if they are in the same availability zone).
The third and fourth, we can't help you with; you'll have to assess those for yourself.

will mongodb replica set arbiter on a different continent slow down the cluster?

I have 2 datacenters in the same region(dc1, dc2) and one datacenter in a different continent, 300-400 ms ping away from the first two - dc3. I've experimented with one member of the replica set in each datacenter, but the one far away (dc3) kept slowing things down (slow oplog, etc), so now i plan on keeping the arbiter overseas (dc3). This way, if one of the local datacenters (either dc1 or dc2) goes down, there will be enough members to vote.
But, having a bad experience with a replica set member overseas, i need to ask if somebody experimented with this kind of setup. Do arbiters in any way slow things down?
Thanks.
Yes. Arbiters will affect the performance of aspects of the replica set. The arbiter will not be involved in read or write operations and won't have data replicated to it, naturally, but the arbiter is still involved in health checks and elections, and these will be affected by the ping times.

What is the advantage to explicitly connecting to a Mongo Replica Set?

Obviously, I know why to use a replica set in general.
But, I'm confused about the difference between connecting directly to the PRIMARY mongo instance and connecting to the replica set. Specifically, if I am connecting to Mongo from my node.js app using Mongoose, is there a compelling reason to use connectSet() instead of connect()? I would assume that the failover benefits would still be present with connect(), but perhaps this is where I am wrong...
The reason I ask is that, in mongoose, the connectSet() method seems to be less documented and well-used. Yet, I cannot imagine a scenario where you would NOT want to connect to the set, since it is recommended to always run Mongo on a 3x+ replica set...
If you connect only to the primary then you get failover (that is, if the primary fails, there will be a brief pause until a new master is elected). Replication within the replica set also makes backups easier. A downside is that all writes and reads go to the single primary (a MongoDB replica set only has one primary at a time), so it can be a bottleneck.
Allowing connections to slaves, on the other hand, allows you to scale for reads (not for writes - those still have to go the primary). Your throughput is no longer limited by the spec of the machine running the primary node but can be spread around the slaves. However, you now have a new problem of stale reads; that is, there is a chance that you will read stale data from a slave.
Now think hard about how your application behaves. Is it read-heavy? How much does it need to scale? Can it cope with stale data in some circumstances?
Incidentally, the point of a minimum 3 members in the replica set is to offer resiliency and safe replication, not to provide multiple nodes to connect to. If you have 3 nodes and you lose one, you still have enough nodes to elect a new primary and have replication to a backup node.

One MongoDB instance in a two or more replica set

is it possible to have One mongodb instance belong to one or more replica sets ?
I am using Replica Set - mongodb replication scheme.
No.
With Master-Slave you could hack this to make it work, but not with Replica Sets. However, you can run two instances on a single machine. Simply run them on different ports.
Please note that this is generally not advised. If you are sharing replicas, this typically means that you do not have enough hardware.

In Mongo what is the difference between sharding and replication?

Replication seems to be a lot simpler than sharding, unless I am missing the benefits of what sharding is actually trying to achieve. Don't they both provide horizontal scaling?
In the context of scaling MongoDB:
replication creates additional copies of the data and allows for automatic failover to another node. Replication may help with horizontal scaling of reads if you are OK to read data that potentially isn't the latest.
sharding allows for horizontal scaling of data writes by partitioning data across multiple servers using a shard key. It's important to choose a good shard key. For example, a poor choice of shard key could lead to "hot spots" of data only being written on a single shard.
A sharded environment does add more complexity because MongoDB now has to manage distributing data and requests between shards -- additional configuration and routing processes are added to manage those aspects.
Replication and sharding are typically combined to created a sharded cluster where each shard is supported by a replica set.
From a client application point of view you also have some control in relation to the replication/sharding interaction, in particular:
Read preferences
Write concerns
Consider you have a great music collection on your hard disk, you store the music in logical order based on year of release in different folders.
You are concerned that your collection will be lost if drive fails.
So you get a new disk and occasionally copy the entire collection keeping the same folder structure.
Sharding >> Keeping your music files in different folders
Replication >> Syncing your collection to other drives
Replication is a mostly traditional master/slave setup, data is synced to backup members and if the primary fails one of them can take its place. It is a reasonably simple tool. It's primarily meant for redundancy, although you can scale reads by adding replica set members. That's a little complicated, but works very well for some apps.
Sharding sits on top of replication, usually. "Shards" in MongoDB are just replica sets with something called a "router" in front of them. Your application will connect to the router, issue queries, and it will decide which replica set (shard) to forward things on to. It's significantly more complex than a single replica set because you have the router and config servers to deal with (these keep track of what data is stored where).
If you want to scale Mongo horizontally, you'd shard. 10gen likes to call the router/config server setup auto-sharding. It's possible to do a more ghetto form of sharding where you have the app decide which DB to write to as well.
Sharding
Sharding is a technique of splitting up a large collection amongst multiple servers. When we shard, we deploy multiple mongod servers. And in the front, mongos which is a router. The application talks to this router. This router then talks to various servers, the mongods. The application and the mongos are usually co-located on the same server. We can have multiple mongos services running on the same machine. It's also recommended to keep set of multiple mongods (together called replica set), instead of one single mongod on each server. A replica set keeps the data in sync across several different instances so that if one of them goes down, we won't lose any data. Logically, each replica set can be seen as a shard. It's transparent to the application, the way MongoDB chooses to shard is we choose a shard key.
Assume, for student collection we have stdt_id as the shard key or it could be a compound key. And the mongos server, it's a range based system. So based on the stdt_id that we send as the shard key, it'll send the request to the right mongod instance.
So, what do we need to really know as a developer?
insert must include a shard key, so if it's a multi-parted shard key, we must include the entire shard key
we've to understand what the shard key is on collection itself
for an update, remove, find - if mongos is not given a shard key - then it's going to have to broadcast the request to all the different shards that cover the collection.
for an update - if we don't specify the entire shard key, we have to make it a multi update so that it knows that it needs to broadcast it
Whenever you're thinking about sharding or replication, you need to think in the context of writers/update operations. If you don't need to scale writes then replications, as it fairly simpler, is a good choice for you.
On the other hand, if you workload mostly updates/writes then at some point you'll hit a write bottleneck. If write request comes Mongo blocks other writes request. Those write request blocks until the first request will be done. If you want to scale this writes and want parallelize it then you need to implement sharding.
Just to put this somewhere...
The most basic way to run mongo is as standalone server.
You write a config (file or cli options)
initiate the server using mongod
For this picture, I didn't include the "client". Check the next one.
A replica set is a set of servers initialized exactly as above with a different config file.
To link them, we connect to one of them, and initialize the replica set mode.
They will mirror each other (in the most common configuration). This system guarantees high availability of data.
The initialization of the replica set is represented in the red border box.
Sharding is not about replicating data, but about fragmenting data.
Each fragment of data is called chunk and goes to a different shard. shard = each replica set.
"main" server, running mongos instead of mongod. This is a router for queries from the client.
Obvious: The trade-off is a more complex architecture.
Novelty: configuration server (again, a different config file).
There is much more to add, but apart from the words the pictures hold much the same.
Even mongoDB recommends to study your case carefully before going sharding. Vertical scaling (vs) is probably a good idea at least once before horizontal scaling (hs).
vs is done upgrading hardware (cpu, ram, etc). hs is needs more computers (but could be cheap computers).
Both replication and sharding can be used (individually or together) for horizontal scaling of a MongoDB installation.
Sharding is MongoDB's solution for meeting the demands of data growth. Sharding stores data records across multiple servers to provide faster throughput on read and write queries, particularly for very large data sets.
Any of the servers in the sharded cluster can respond to a read or write operation, which greatly speeds up query responses.
Replication is MongoDB's solution for providing stability, backup, and disaster recovery to a MongoDB installation. This process copies and synchronizes the replica data set across multiple servers. This prevents downtime if one server goes offline.
Any of the secondary servers can respond to read queries, but only the primary server will perform write operations. The results of the write operation will then be propagated out to the secondary servers.
Scenario 1: Fault-Tolerance
In this scenario, the user is storing billing data in a MongoDB installation. This data is mission-critical to the user's business, and needs to be available 24/7, even if a server crashes or is taken offline.
MongoDB replication is the best solution for this user. With replication, the entire data set is mirrored on multiple servers. If a server fails or is taken offline, the other servers in the cluster take over.
Scenario 2: High Performance
In this scenario, the user is running a social networking site which is run from a MongoDB database. As the social network grows, the MongoDB data set has grown along with it. The user is seeing query times and page loads increase beyond an acceptable point. It is critical that the user's MongoDB installation receives a major performance boost.
Setting up a sharded MongoDB cluster is the best solution for this user. The sharded cluster will break up the user's data set and store parts of it on separate secondary servers. Each secondary server can respond to read or write queries on its portion of the data, which greatly increases the installation's response time
MongoDB Atlas is a Database as a service in could. It support three major cloud providers such as Azure , AWS and GCP. In cloud environment , we usually talk about high availability and scalability. In Atlas “clusters”, can be either a replica set or a sharded cluster.
These two address high availability and scalability features of our cloud environment.
In general Cluster is a group of servers used to achieve a specific task. So sharded clusters are used to store data in across multiple machines to meet the demand of data growth. As the size of the data increases, a single machine may not be sufficient to store the data nor provide an acceptable read and write throughput. Sharded clusters supports the horizontal scalability of the underling cloud environment.
A replica set in MongoDB is a group of mongod processes that maintain the same data set. Replica sets provide redundancy and high availability, and are the basis for all production deployments.In a replica, one node is a primary node that receives all write operations. All other instances, such as secondaries, apply operations from the primary so that they have the same data set. Replica set mainly focus on the availability of data.
Please check the documentation
Thank You.