MongoDB cluster structure and costs

I am new to MongoDB and I use Kubernetes in order to provision it.
I understand that a MongoDB deployment is divided into config servers, mongos routers, and data nodes,
and that the config servers and each group of data nodes require 3 replicas.
The data nodes will be divided into shards.
This gives me a very large number of running MongoDB instances, and a big portion of them will not be doing independent work, as they will just be replicas.
Is there a way to run multiple replicas on the same instance?
Is it recommended to run multiple replicas on the same instance?
How do you manage your MongoDB cluster to avoid reaching dozens of MongoDB instances where only a portion of them are actually usable?
Any help would be appreciated.

Related

MongoDB instance alone vs MongoDB on Kubernetes

What is the better solution for running a MongoDB instance? Let's say we have a running Kubernetes cluster. MongoDB itself has its own clustering/sharding solution. We want this MongoDB to grow in size and we expect it to get quite big, so we definitely need to use its sharding solution.
How does this fit into Kubernetes? It seems to me they don't really work well together. What I'm talking about is that Kubernetes "clones" pods over nodes, while the point of MongoDB sharding is that you separate data over a cluster (not clone it). Am I wrong about something here?
Thank you for your input.
We have been running a sharded cluster of Mongo and ES in production with TBs of data for 3 years and never faced replication or other scaling issues.
I would suggest checking out this official link from MongoDB: https://docs.mongodb.com/kubernetes-operator/master/tutorial/deploy-sharded-cluster/
Using this you can deploy a sharded cluster of MongoDB on Kubernetes.
Basically, MongoDB has its own operator, which will manage the pods for you, handling data replication, scaling, etc.
You must install the operator first, before you set up the cluster: https://docs.mongodb.com/kubernetes-operator/master/tutorial/install-k8s-operator/
If you just want to use a Helm chart, check out this one: https://hub.kubeapps.com/charts/bitnami/mongodb-sharded
If you want to set everything up on your own, you can refer to: https://medium.com/google-cloud/sharded-mongodb-in-kubernetes-statefulsets-on-gke-ba08c7c0c0b0
One of my favourite channels: https://www.youtube.com/watch?v=7Lp6R4CmuKE
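The operator tutorial linked above boils the whole sharded-cluster deployment down to a single custom resource. A minimal sketch, assuming the MongoDB Kubernetes Operator is already installed — the name, version, and the Ops Manager project/credentials references (`my-project`, `my-credentials`) are placeholders you would replace with your own:

```yaml
# Hypothetical MongoDB custom resource for a sharded cluster;
# apply with `kubectl apply -f` after installing the operator.
apiVersion: mongodb.com/v1
kind: MongoDB
metadata:
  name: my-sharded-cluster
spec:
  type: ShardedCluster
  version: "4.2.2"
  shardCount: 2            # number of shards
  mongodsPerShardCount: 3  # each shard is a 3-member replica set
  mongosCount: 2           # query routers
  configServerCount: 3     # config server replica set members
  opsManager:
    configMapRef:
      name: my-project     # placeholder project ConfigMap
  credentials: my-credentials  # placeholder API-key secret
  persistent: true
```

The operator then creates and manages the underlying StatefulSets and pods for you.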

Citus: Can I view sharded tables of each node on master node?

I am managing my PostgreSQL cluster in docker-compose and I have connected to the master (coordinator) node via an external client. For a given table (for example companies) and a given node (for example worker 2 or another identifier), can I view the sharded tables (for example companies_000005, companies_0000010 and their rows on worker 2) from the master node without directly connecting to the individual node?
To see some information about the shards (such as shard sizes or which node the shard is on), you can use the following query with Citus 10 and later:
SELECT * FROM citus_shards;
Also, accessing the shards directly is not a suggested pattern, and it prevents certain checks/enforcements that Citus does around distributed locking and correctness etc. With that in mind, why do you need to access the shards directly?
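Building on `citus_shards`, a sketch of narrowing the view to one table on one worker, run from the coordinator — the worker hostname `worker-2` is a placeholder, and the column names assume Citus 10 or later:

```sql
-- List the shards of "companies" placed on a specific worker,
-- together with their sizes, without connecting to that worker.
SELECT shard_name, nodename, nodeport, shard_size
FROM citus_shards
WHERE table_name = 'companies'::regclass
  AND nodename = 'worker-2';
```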

MongoDB data replication in Kubernetes

I've been configuring pods in Kubernetes to hold a MongoDB and a Golang image each, with a Service to load-balance. The major issue I am facing is data replication between the databases. Replication controllers/ReplicaSets do not seem to do what the name implies; they spin up blank-slate copies rather than replicas of existing/currently running pods. I cannot seem to find any examples or clear answers on how Kubernetes addresses this, or does it even?
For example, data insertions being sent by the Go program are going to automatically load balance to one of X replicated instances of mongodb by the service. This poses problems since they will all be maintaining separate documents without any relation to one another once Kubernetes begins to balance the connections among other pods. Is there a way to address this in Kubernetes, or does it require a complete re-write of the Go code to expect data replication among numerous available databases?
Sorry, I'm relatively new to Kubernetes and couldn't seem to find much information regarding this.
You're right, a replica set is not a replica of another container, it's just a container with the same configuration spun up within the same logical unit.
A replica set (or deployment, which is the resource you should be using now) will have multiple pods, and it's up to you, the operator, to configure the mongodb part.
I would recommend reading this example of how to set up a replica set with multiple mongodb containers:
https://medium.com/google-cloud/mongodb-replica-sets-with-kubernetes-d96606bd9474#.e8y706grr
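The "mongodb part" you configure yourself is the replica set membership. As a sketch, assuming a StatefulSet named `mongo` behind a headless service also named `mongo` (both names hypothetical), the mongo-shell step would look roughly like:

```javascript
// Run once against the first pod (mongo-0); the stable StatefulSet
// DNS names let the members find each other.
rs.initiate({
  _id: "rs0",
  members: [
    { _id: 0, host: "mongo-0.mongo:27017" },
    { _id: 1, host: "mongo-1.mongo:27017" },
    { _id: 2, host: "mongo-2.mongo:27017" }
  ]
});
```

Your Go application then connects with a replica-set connection string such as `mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo/?replicaSet=rs0` instead of going through a load-balancing Service, so the driver sends writes to the primary and MongoDB itself handles replication.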

Is it possible to create Replica Set with Sharded Cluster (mongos) and Single Server (mongod)?

I want to make continuous backup of my Sharded Cluster on a single MongoDB server somewhere else.
So, is it possible to create Replica Set with Sharded Cluster (mongos instance) and single MongoDB server?
Did anyone experience creating Replica Sets with two Sharded Clusters or with one Sharded Cluster and one Single Server?
How does it work?
By the way, the best (and for now, the only) way to continuously back up a sharded cluster is by using MongoDB Management Service (MMS).
I was also facing the same issue some time back. I wanted to replicate the whole sharded cluster into one MongoDB server, but I didn't find any solution for this scenario, and I think that is expected, because:
If you configure multiple shard servers (say 2 shard servers) under one replica set, this will not work, because in a replica set (say rs0) only one primary member is possible, and in this scenario we would have multiple primaries, one per shard server.
To take the backup of your whole sharded cluster, you must stop all writes to the cluster. You can refer to MongoDB documentation on this - http://docs.mongodb.org/manual/tutorial/backup-sharded-cluster-with-database-dumps/
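That tutorial amounts to roughly the following sequence. This is only a sketch — all hostnames, ports, and paths are placeholders, and application writes must be stopped before dumping:

```shell
# Disable the balancer so chunks don't migrate during the dump.
mongo --host mongos.example.com --eval 'sh.stopBalancer()'

# Dump the config database from one config server.
mongodump --host cfg1.example.com --port 27019 --db config --out /backup/config

# Dump each shard from a member of its replica set.
mongodump --host shard1-a.example.com --port 27018 --out /backup/shard1
mongodump --host shard2-a.example.com --port 27018 --out /backup/shard2

# Re-enable the balancer when done.
mongo --host mongos.example.com --eval 'sh.startBalancer()'
```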

setting development project with mongo database on EC2 cluster

I would like to create a development project on an EC2 cluster. The current design suggests accessing MongoDB database files stored on an EBS volume. Is it possible to run distributed computing and access the same files in /data/db/ simultaneously from different nodes?
No, that will not work. You cannot access the same mongodb database files from different processes on different nodes.
The way you use mongoDB in a distributed environment is with replica sets and sharding. In both cases you have mongodb instances running on each node. Replica sets duplicate the same data across all the nodes in the set, for data redundancy and fault tolerance. Sharding allows you to distribute different sets of data on different nodes to provide horizontal scaling. Large production environments use both replica sets and sharding.
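Wiring the sharded setup together happens through a mongos router. A sketch in the mongo shell — the replica-set names, hosts, database, collection, and shard key below are all hypothetical:

```javascript
// Each shard is itself a replica set; register them with the cluster.
sh.addShard("rs0/node1.example.com:27018");
sh.addShard("rs1/node3.example.com:27018");

// Enable sharding for a database and shard one of its collections.
sh.enableSharding("mydb");
sh.shardCollection("mydb.events", { userId: "hashed" });
```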
Best place to read up on all of this is:
http://docs.mongodb.org/manual/administration/replica-sets/
http://docs.mongodb.org/manual/sharding/
http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/