I'm really new into learning MongoDB and I'm having a hard time trying to understand sharding.
So I have my PC who is the host and I have created two VMs using VirtualBox. In my PC (host) there is a DB with some data. So my issue is which of those 3 components should be the the Config Server, the Shard and the Query Router (mongo). Can somebody please help me explaining this ? (I have read the documentation and still haven't understand it completely).
A sharded cluster needs to run at least three processes, a "decent" sharded cluster runs at least 10 processes (1 mongos Router, 3 Config servers, 2 x 3 Shard servers). In production, they run on different machines, but it is no problem to run them all on the same machine.
I would suggest these steps for learning:
Deploy a Stand alone MongoDB
Deploy a Replica Set, see Deploy a Replica Set or Deploy Replica Set With Keyfile Authentication
Deploy a Sharded Cluster, see Deploy a Sharded Cluster or Deploy Sharded Cluster with Keyfile Authentication
In the beginning it is an overkill to use different machines. Use localhost for all.
You can run multiple mongod/mongos instances on one machine, just ensure they are using different ports (e.g. the default port 27017), different dbPath and preferable also different log files.
You may run your first trial without authentication. Once you got that working, use authentication with keyfile and as next step use authentication by x.509 certificates.
If you use Windows, then have a look at https://github.com/Wernfried/mongoDB-oneclick
Related
I am new to docker and just started playing around it. I have a following setup of my app in production as of now:
Server machine 1 : running spring-boot microservices
Server machine 2 : running redis
Server machine 3 : running postgres
If I use docker in server machine 1 and run all of the microservices as container and run the redis and postgres as a container as well in server machine 1, is this is correct thing to do ? Or I have to run the docker on all the server machines and run containers separately.
Which is the best practice to do ?
When first starting out I suggest doing it all on 1 machine. Your database containers can use volumes to save data to the machine itself. So when you need to switch to a different machine, because 1 machine is too slow, you can easily transfer your database data. When starting to use more than 1 machine to run Docker you probably want to use a deployment option like Kubernetes or Docker swarm. This will simplify the process of setting up your environments on different machines, because it will be done by Kubernetes.
Also when your application is getting a lot of traffic you might want to switch to Managed Databases, which are provided by services like GCP, AWS, Digitalocean, etc. A managed database will scale automatically, get updates frequently and back-up automatically. This will take a lot of burden of your shoulders. I personally use Managed Databases myself.
My suggestion for now: Use 1 machine, learn Kubernetes when your application gets more traffic. Look into managed databases (available for Redis and Postgres).
We want to create a MongoDB shard (v. 2.4). The official documentation recommends to have 3 config servers.
However, the policies of our company won't allow us to get 3 extra servers for this purpose. Since we have already 3 application servers (1 web node, 2 process nodes) we are considering to put the configuration servers in the same application servers, with the mongos. Availability is not critical for us.
What do you think about this configuration? Can we face some problem or is it discouraged for some reason?
Given that Availability is not critical for your use case, I would say it should be fine to place the config servers in the same application servers and mongos.
If one of the process nodes is down, you will lose: 1 x mongos, 1 application server and 1 config server. During this down time, the other two config servers will be read-only , which means there won't be balancing of shards, modification to cluster config etc. Although your other two mongos should still be operational (CRUD wise). If your web-node is down, then you have a bigger problem to deal with.
If two of the nodes are down (2 process nodes, or 1 web server and process node), again, you would have bigger problem to deal with. i.e. Your applications are probably not going to work anyway.
Having said that, please consider the capacity of these nodes to be able to handle a mongos, an application server and a config server. i.e. CPU, RAM, network connections, etc.
I would recommend to test the deployment architecture in a development/staging cluster first under your typical workload and use case.
Also see Sharded Cluster High Availability for more info.
Lastly, I would recommend to check out MongoDB v3.2 which is the current stable release. The config servers in v3.2 are modelled as a replica set, see Sharded Cluster config servers for more info.
I'm running 4 vms (centos) on a single machine (Windows 2008 R2). The 4 vms are setup as below:
1 mongos
1 mongo configure server
2 mongod as sharding servers
OK, everything was fine before a power off accident. When the power came back, I did manually reboot all the VMs, and found the sharding setting is gone. I mean, the mongos can talk to the configure server, but somehow the sharding data is lost and it show the database is not sharded.
I setup the sharding based on documents from mongodb websites (e.g. running some command in mongo shell to enable sharding for the db and each collection). Do I need to do all the mongo shell commands again after rebooting? Or is it supposed to recover automatically once the sharding is enabled?
Thanks.
Once you've established a sharded cluster, it is certainly supposed to stay configured, even if servers fail, even if they all fail at the same time. Restarting the servers should bring everything up just the way it was before the outage. Based on your description, it is difficult to reason what might have gone wrong. A dump of the config database, and the log files of all the affected servers, would be necessary to analyze what happened. This should perhaps be filed as a ticket with MongoDB support.
(It is, by the way, recommended to run three config servers rather than a single one, for availability reasons. But even so, even a single server should recover just fine after having failed. The three-server recommendation is only to make sure that there's always a live config server even if one of them should fail.)
I'm trying to figure out how different instances of mongos server work together.
If I have 1 configserver and some shards, for example four, each of them composed by only one node (a master of course), and have four mongos server... do the mongos server communicate between them? Is it possible that one mongos redirect its load to another mongos?
When you have multiple mongos instances, they do not automatically load-balance between each other. They don't even know about each others existence.
The MongoDB drivers for most programming languages allow to specify multiple mongos instances when creating a connection. In that case the driver will usually ping all of them and connect to the one with the lowest latency. This will usually be the one which is closest geographically. When all have the same network distance, the one which is least busy right now will usually respond first. The driver will then stay connected to that one mongos, unless the program explicitely reconnects or the mongos can no longer be reached (in that case the driver will usually automatically pick another one from the initial list).
That means using multiple mongos instances is normally only a valid method for scaling when you have a large number of low-load clients, not one high-load client. When you want your one high-load client to make use of many mongos instances, you need to implement this yourself by creating a separate connection to each mongos instance and implement your own mechanism to distribute queries among them.
Short answer
As of MongoDB 2.4, the mongos servers only provide a routing service to direct read/write queries to the appropriate shard(s). The mongos servers discover the configuration for your sharded cluster via the config servers. You can find out more details in the MongoDB documentation: Sharded Cluster Query Routing.
Longer scoop
I'm trying to figure out how different instances of mongos server work togheter.
The mongos servers do not currently talk directly to each other. They do coordinate activity some activity via your config servers:
reading the sharded cluster metadata
initiating a balancing round (any mongos can start a balancing round, but only one round can be active at a time)
If I have 1 configserver
You should always have 3 config servers in production. If you somehow lose or corrupt your config server, you will have to combine your data and re-shard your database(s). The sharded cluster metadata saved on the config servers is the definitive source for what sharded data ranges should live on each shard.
some shards, for example four, each of them composed by only one node (a master of course)
Ideally each shard should be backed by a replica set if you want optimal uptime. Replica sets provide for auto-failover and can be very useful for administrative purposes (for example, taking backups or adding indexes offline).
Is it possible that one mongos redirect its load to another mongos?
No, the mongos do not perform any load balancing. The typical recommendation is to deploy one mongos per app server.
From an application/driver point of view you can specify multiple mongos in your connect string for failover purposes. The application drivers will generally connect to the nearest available mongos (by network ping time), and attempt to reconnect to in the event the current mongos connection fails.
The rule of thumb is to have the 'mongos' process running on each of your application servers. This keeps your application talking to localhost which is fast and your mongos processes scale with your app.
Say we have 2 distinct mongo clusters (sharded), is it possible to configure one mongos process to talk to two different clusters? It would be awesome to abstract away the fact that the databases lived in different places.
Or would you have to launch two different mongos processes on different ports? If this IS possible I still worry that it might be dangerous having two different mongos processes fighting for resources.
Or something completely different? Ideas?
Each mongos belongs to one, only one, cluster (defined by the config db servers). The mongos processes don't use much resources; you can run multiple on a machine.
You can have more than one sharded db/collection per cluster.