optimal number of mongodb entities (administration/devops) - mongodb

I intend to check mongodb performance by running an application on 8 servers.
-1. here http://docs.mongodb.org/manual/tutorial/deploy-shard-cluster/ I read,
In production deployments, you must deploy exactly three config server
instances, each running on different servers to assure good uptime and
data safety. In test environments, you can run all three instances on
a single server.
What if I want to use optimally the resources of 8 servers (+ 1 dedicated server for the application)? Do I start 1 config server instance per server?
-2. I see here http://docs.mongodb.org/manual/core/replication-introduction/
that using replica sets with 3 mongod instances (with each mongod instance on a different server) is the way to go? Is this the optimal scenario when it comes to having 8 servers?
-3. How many replica sets would I use when I have 8 servers? 1 per server (8 servers == 8 replica sets == 3 mongod instances per server from different replica sets)?
-4. Is there any best practices documentation regarding optimization of this type?
Kind Regards,
Despot

What if I want to use optimally the resources of 8 servers (+ 1 dedicated server for the application)?
That's not an optimal way to plan, there is no way you know that you NEED 7 shards for your data.
Do I start 1 config server instance per server?
No, you are hardcoded to three.
Is this the optimal scenario when it comes to having 8 servers?
No, it is the minimum, you would ideally want more members, especially one bridging partitions; ensuring all the while you have an odd number of node on one side of your parition to ensure CAP.
Normally your replica set would consist of at least one extra member designed for backups, normally using a slaveDelay of maybe a day.
How many replica sets would I use when I have 8 servers?
Assuming (guessing) you want to use 7 shards you would have 7 replica sets, one per shard.
3 mongod instances per server from different replica sets
That would be a bad idea. You do not want to place the replica members on the same server as each other, you might as well be using no replication.
I would seriously plan more and check if you really need 7 shards, I highly doubt it.

Related

Sharding with replication

Sharding with replication]1
I have a multi tenant database with 3 tables(store,products,purchases) in 5 server nodes .Suppose I've 3 stores in my store table and I am going to shard it with storeId .
I need all data for all shards(1,2,3) available in nodes 1 and 2. But node 3 would contain only shard for store #1 , node 4 would contain only shard for store #2 and node 5 for shard #3. It is like a sharding with 3 replicas.
Is this possible at all? What database engines can be used for this purpose(preferably sql dbs)? Did you have any experience?
Regards
I have a feeling you have not adequately explained why you are trying this strange topology.
Anyway, I will point out several things relating to MySQL/MariaDB.
A Galera cluster already embodies multiple nodes (minimum of 3), but does not directly support "sharding". You can have multiple Galera clusters, one per "shard".
As with my comment about Galera, other forms of MySQL/MariaDB can have replication between nodes of each shard.
If you are thinking of having a server with all data, but replicate only parts to readonly Replicas, there are settings for replicate_do/ignore_database. I emphasize "readonly" because changes to these pseudo-shards cannot easily be sent back to the Primary server. (However see "multi-source replication")
Sharding is used primarily when there is simply too much traffic to handle on a single server. Are you saying that the 3 tenants cannot coexist because of excessive writes? (Excessive reads can be handled by replication.)
A tentative solution:
Have all data on all servers. Use the same Galera cluster for all nodes.
Advantage: When "most" or all of the network is working all data is quickly replicated bidirectionally.
Potential disadvantage: If half or more of the nodes go down, you have to manually step in to get the cluster going again.
Likely solution for the 'disadvantage': "Weight" the nodes differently. Give a height weight to the 3 in HQ; give a much smaller (but non-zero) weight to each branch node. That way, most of the branches could go offline without losing the system as a whole.
But... I fear that an offline branch node will automatically become readonly.
Another plan:
Switch to NDB. The network is allowed to be fragile. Consistency is maintained by "eventual consistency" instead of the "[virtually] synchronous replication" of Galera+InnoDB.
NDB allows you to immediately write on any node. Then the write is sent to the other nodes. If there is a conflict one of the values is declared the "winner". You choose which algorithm for determining the winner. An easy-to-understand one is "whichever write was 'first'".

Choosing between MongoDB and ElasticSearch - Scaling/Sharding

I'm currently deciding between MongoDB and Elasticsearch as a backend to a logging and analytics platform. I plan to use a cluster of 5 Intel Xeon Quad Core servers with 64GB RAM and a 500GB NVMe drive in each. With 1 replica set, it should support 1TB+ of data I'm guessing.
From what I've read on Elasticsearch, the recommended set-up for the above servers would be 5-10 shards, but shards cannot be increased in the future without a huge migration. So maybe I can add 5 more servers/nodes to the cluster for the same index, but not 10 or 20, because I can't create more shards to spread across the new nodes/servers - correct?
MongoDB appears to automatically manage sharding based on a key value and redistribute those shards as more nodes get added. So does that mean that I can add 50 more servers to the cluster in the future and MongoDB will happily spread the data from this one index across all the servers?
I basically only need 1TB of storage right now, but don't want to paint myself into a corner, should this 1 dataset end up growing to 100TB.
Without starting Elasticsearch with 100 shards at the beginning, which seems inefficient and bad practice, how can it scale past 5/10 servers for this single dataset?
As Val said, you would normally have time based indices, so you can easily (in a performant way) remove data after a certain retention period. So as your requirements change over time, you change your shard number (normally through an index template).
Current versions of Elasticsearch now support a _split API, which does exactly what you are asking for: Use 5 shards initially, but have the option to go up to any factor of 20 (just as an example) — so 5 -> 10 -> 30 would be options.
If you have 5 primary shards and a replication factor of 1, you could still spread out the load over 10 nodes: Writes to the 5 primary and 5 replica shards; reads will go to either one of them. Elasticsearch's write / read model is generally different than MongoDB's.
PS disclaimer: I work for Elastic now, but I have used MongoDB in production for 5 years as well.

Mongodb two nodes deployment

I know Mongodb suggest the replication set minimum is 3
Can I use two servers to install Mongodb with 4 replication sets in order to prevent Write failure when one node down?
My idea is to install two more instances on each server to fake/get rid of it:
Mongodb 1
Replcation set 1 (Master)
Replcation set 2 (Secondary)
Mongodb 2
Replcation set 3 (Secondary)
Replcation set 4 (Arbiter)
If one server down, it still has two replication sets for it.
Can anyone can comment my idea?
I would like to know is there any issue/risk needed to consider?
Thanks!
This is a bad idea. Your suggested configuration would be counter-productive for two main reasons:
Running two instances of MongoDB on the same hosts will lead to them both running slowly and awkwardly, competing for memory, cpu, disk i/o, etc.
If one of the hosts goes down, the two nodes on the other will not be able to continue the replica set because with only two nodes running out of four, you don't have the majority necessary to become primary.
MongoDB is intended to be run effectively on multiple hosts, so it is designed for that; even if you could find a way to shoe-horn a functioning replica set onto only two hosts, you would be working against MongoDB's design and struggling to make it work effectively.
Really, your best option is to run a single mongod instance on each of your two hosts, and to run an arbiter on a third (separate) host; that kind of normal 3-host configuration is an effective and straightforward way of achieving both data replication and high availability.
Please read https://docs.mongodb.com/manual/replication/
Even number of members in a set is not recommended. The only reason to add an Arbiter is to resolve this issue and increase number of members by 1 to make it odd. It makes no sense to add 2 Arbiters at all.
Similarly there is no point to put an Arbiter to the same server with any other member. It is quite lightweight, so you can co-locate it with an application server for example.
Having 2 db servers the best replica set configuration would be:
Mongodb 1
Master
Mongodb 2
Secondary
Any other server
Arbiter
Simple answer is, no you cannot do it..
Reason is "majority". If you loose one server of those two, you don't have majority of votes online.
This is reason, why 3 is minimum. It can be 3 data bearing nodes or 2 data bearing nodes and one arbiter. All of those have one vote and if you loose one of those, replica set still have 2/3 votes what is majority. Like 3/5 or 4/7.
Values 1/2 (or 2/4) are not majority.

Ideal number of replicas for database

I am looking for some best practices to be able to read enough to gauge how to decide on the number of replicas needed for Mongo. I am aware of Mongo Docs that talk about things like having odd number of nodes, and when does the need of arbiter arise, etc.
In our case the requirement for reads won’t be so high that reads will become a bottleneck. Neither are we targeting sharding at this moment. However, we are going to run mongo in a docker swarm and there could be multiple instances of certain services trying to write. Our swarm cluster won’t be very huge either most likely.
So how do I find logical answers to these:
Why not create one local mongo instance per physical node and tie it to that?
For any number of physical nodes, as long as read/write is not a bottle neck, 3 or 5 replicas are always going to be ideal for fault recovery and high availability. But why is 3 or 5 a good number. Why not 7 if I have say 10 physical nodes.
I am trying to find some good reads to be able to decide on how to arrive at a number. Any pointers?
To give you an answer, all depend on many criterias
What is your budget
How big is your data
what do you want to use your replica sets for
etc...
As an Example, In my case
We have 3 Data Centers across the country
One of them is very Small
We found our sweet spot in terms of number od nodes being 5
1 Primary + 1 Secondary in DC1
1 Arbiter in DC2
2 secondaries in DC3

mongoDB replication+sharding on 2 servers reasonable?

Consider the following setup:
There a 2 physical servers which are set up as a regular mongodb replication set (including an arbiter process, so automatic failover will work correctly).
now, as far as i understand, most actual work will be done on the primary server, while the slave will mostly just do work to keep its dataset in sync.
Would it be reasonable, to introduce sharding into this setup in a way that one would set up another replication set on the same 2 servers, so that each of them has one mongod process running as primary and one process running as secondary.
The expected result would be that both servers will share the workload of actual querys/inserts while both are up. In the case of one server failing the whole setup should elegantly fail over to continue running, until the other server is restored.
Are there any downsides to this setup, except the overall overhead in setup and number of processes (mongos/configservers/arbiters)?
That would definitely work. I'd asked a question in the #mongodb IRC channel a bit ago as to whether or not it was a bad idea to run multiple mongod processes on a single machine. The answer was "as long as you have the RAM/CPU/bandwidth, go nuts".
It's worth noting that if you're looking for high-performance reads, and don't mind writes being a bit slower, you could:
Do your writes in "safe mode", where the write doesn't return until it's been propagated to N servers (in this case, where N is the number of servers in the replica set, so all of them)
Set the driver-appropriate flag in your connection code to allow reading from slaves.
This would get you a clustered setup similar to MySQL - write once on the master, but any of the slaves is eligible for a read. In a circumstance where you have many more reads than writes (say, an order of magnitude), this may be higher performance, but I don't know how it'd behave when a node goes down (since writes may stall trying to write to 3 nodes, but only 2 are up, etc - that would need testing).
One thing to note is that while both machines are up, your queries are being split between them. When one goes down, all queries will go to the remaining machine thus doubling the demands placed on it. You'd have to make sure your machines could withstand a sudden doubling of queries.
In that situation, I'd reconsider sharding in the first place, and just make it an un-sharded replica set of 2 machines (+1 arbiter).
You are missing one crucial detail: if you have a sharded setup with two physical nodes only, if one dies, all your data is gone. This is because you don't have any redundancy below the sharding layer (the recommended way is that each shard is composed of a replica set).
What you said about the replica set however is true: you can run it on two shared-nothing nodes and have an additional arbiter. However, the recommended setup would be 3 nodes: one primary and two secondaries.
http://www.markus-gattol.name/ws/mongodb.html#do_i_need_an_arbiter