Mongos + AutoScaling - mongodb

We're currently running a cluster of application servers in an Auto Scaling group on AWS. Each of these application servers has its own instance of mongos running, so the application just connects to localhost to gain access to the MongoDB cluster.
I read in the documentation that the balancer is a process running under mongos. What happens if a server is scaled down while the balancer is running on that server? Is it possible to specify that only the mongos instance at a particular server IP will run the balancer?
Thanks

Yes, the documentation explicitly states that every mongos has a balancer process associated with it, which is responsible for distributing the data of a sharded collection evenly across the different shards. The balancer is enabled by default and can optionally be disabled.
Hence:
If a server is scaled down, the balancer will still be running on the remaining servers with mongos.
Only servers that run a mongos instance will have the balancer running.
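For reference, here is a minimal sketch of how you can inspect or pause the balancer from any of the application servers. It assumes MongoDB 3.4 or newer and the pymongo driver, neither of which is stated in the question:

    from pymongo import MongoClient

    # Connect to the local mongos that the application already uses.
    client = MongoClient("mongodb://localhost:27017")

    # Check whether the balancer is currently enabled and running.
    status = client.admin.command("balancerStatus")
    print(status["mode"], status["inBalancerRound"])

    # Optionally pause the balancer (e.g. around scaling events)
    # and re-enable it afterwards.
    client.admin.command("balancerStop")
    client.admin.command("balancerStart")

These commands act on the cluster-wide balancer setting stored on the config servers, so it does not matter which mongos you send them to.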

Related

mongodb cluster with ELB endpoint as dns

This is not a technical question but more of an architectural one.
I have followed this blog for setting up the MongoDB cluster. We have 2 private subnets in which I have configured a 3-member replica set of MongoDB. Now I want to use a single DNS name like mongod.some_subdomain.example.com for the whole cluster.
I do not have access to Route53, and setting/updating the DNS records takes at least 2 hours in my case since I am dependent on our cloud support for it. I am not sure which server primarily responds to application requests in a MongoDB cluster.
So is there a way to put the whole cluster behind an ELB and use the ELB as the DNS name to route traffic to the primary, so that on failover the next primary becomes the ELB target, excluding the arbiter node?
The driver will attempt to connect to all nodes in the replica set configuration. If you put the nodes behind proxies, the driver will bypass the proxies and try to talk to the nodes directly.
You can proxy standalone and sharded cluster deployments, since the driver doesn't need a direct connection to the data nodes in those, but mapping multiple mongos instances to a single address can create problems with retryable reads/writes, sessions, transactions, etc. This is not a supported configuration.
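To illustrate why no proxy is needed in front of a replica set, here is a minimal sketch of how the driver is normally pointed at the members directly; the hostnames and replica set name are hypothetical, and the pymongo driver is assumed:

    from pymongo import MongoClient

    # Hypothetical hostnames for the three replica set members.
    uri = (
        "mongodb://mongo-a.internal:27017,"
        "mongo-b.internal:27017,"
        "mongo-c.internal:27017/"
        "?replicaSet=rs0"
    )

    client = MongoClient(uri)

    # The driver monitors the replica set topology itself and re-routes
    # writes to the new primary after a failover, so no load balancer
    # is needed (or wanted) in front of the members.
    print(client.primary)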

Standalone MongoDB installation for Production

I want to deploy MongoDB to a Kubernetes cluster with 2 nodes; there is no chance to add another node in the future.
I want to deploy MongoDB as a standalone instance because both nodes will be able to access the same disk space via NFS, and I don't have requirements for replication or high availability. However, the MongoDB docs clearly state that a standalone deployment is not suitable for a production environment.
MongoDB Deploy Standalone
You can deploy a standalone MongoDB instance for Cloud Manager to manage. Use standalone instances for testing and development. Do not use these deployments for production systems as they lack replication and high availability.
What kind of drawbacks can I face? Should I deploy a replica set with an arbiter instance instead? If yes, why?
Of course you can deploy a standalone MongoDB for production. But if this node fails, your application is not available anymore. If you don't have any requirement for availability, then go for a standalone MongoDB.
However, running 2 MongoDB services that access the same physical disk (i.e. the same dbPath) will not work. Each MongoDB instance needs to have a dedicated data folder.
In your case, I would suggest a replica set. All data from one node will be replicated to the other one. If one node fails, the application goes into read-only mode.
You can deploy an arbiter instance on the primary node. If the secondary node goes down, the application is still fully available.
It is always recommended to deploy a replica set for production. However, if you deploy a standalone instance and you have 2 Kubernetes nodes, Kubernetes can ensure there is always 1 running instance attached to the NFS storage on whichever node is available. The risk is that if the data on the storage gets corrupted, you will have nothing to replicate from, unless you take frequent backups and don't mind losing some recently inserted data.
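If you do go with a replica set, here is a minimal sketch of initiating a primary-secondary-arbiter (PSA) configuration with pymongo; the hostnames, ports, and replica set name are hypothetical, and each data-bearing member is assumed to have its own dedicated dbPath:

    from pymongo import MongoClient

    # Connect directly to the member that should become the primary.
    client = MongoClient("mongodb://mongo-0.internal:27017/?directConnection=true")

    # Hypothetical PSA layout: one data-bearing member per Kubernetes node,
    # plus an arbiter co-located with the primary on a separate port.
    config = {
        "_id": "rs0",
        "members": [
            {"_id": 0, "host": "mongo-0.internal:27017"},
            {"_id": 1, "host": "mongo-1.internal:27017"},
            {"_id": 2, "host": "mongo-0.internal:27018", "arbiterOnly": True},
        ],
    }
    client.admin.command("replSetInitiate", config)

The arbiter holds no data but provides a third vote, so if the secondary node goes down, the primary and arbiter still form a majority and the deployment stays writable.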

Zookeeper for High availability

How does ZooKeeper work in the following situation?
Consider that I have 3 VMs (1, 2, 3) with different services running at their endpoints. My entire administration setup (TAC) is available only on the 1st VM, which means that whenever a client wants to connect, it connects by default to the first VM. My other 2 VMs are just running a bunch of services. This entire cluster setup is maintained by ZooKeeper.
My question is: what if the 1st VM fails? I know ZooKeeper maintains high availability by electing another VM as the master, but the client by default only points to the 1st VM and not to the other two. Is there any way to overcome this situation, given that my admin setup is present only on that node, for example by taking over the IP of the first node or by some other method?
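On the connection side, ZooKeeper clients are normally given the whole ensemble rather than a single VM, so the loss of one member does not break client connectivity to ZooKeeper itself. A minimal sketch using the kazoo Python client, with hypothetical hostnames and znode path:

    from kazoo.client import KazooClient

    # List every ensemble member, not just the first VM; the client
    # connects to any reachable member and fails over automatically.
    zk = KazooClient(hosts="vm1:2181,vm2:2181,vm3:2181")
    zk.start()

    # Example: read which VM is currently acting as the admin/master
    # from a hypothetical znode that the services maintain.
    data, stat = zk.get("/tac/master")
    print(data.decode())

    zk.stop()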

load balancing postgres instances via aws network balancer possible?

We have an application that has multiple Postgres databases (that are guaranteed to be in sync) installed on AWS EC2 instances in different availability zones. I would like to abstract them behind a single DNS name so that, if one of the EC2 instances crashes, the clients can still access the database. I am wondering if I can use an AWS Network Load Balancer to load balance my databases. Why or why not? If not, is there any other standard, easy-to-implement solution that I can use? (I am aware of http://www.pgpool.net/mediawiki/index.php/Main_Page, for example; however, I am leery of using something that I have to set up myself, especially since I would have to replicate the pgpool instances as well...)
Having just tried it myself, it does seem you can set up a network load balancer to load balance your databases. My production setup uses patroni to manage failover, and patroni provides an HTTP API for health checks. The master returns a 200 response, while the replicas return a 503. This works fine for my use case, where the replicas are there just for failover, not for replicated reads. I'm assuming you could come up with some code that returns a successful response for health checks based on your needs.
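As a minimal sketch of that idea, here is a hypothetical health-check endpoint that answers 200 only when the local node is the primary (i.e. not in recovery) and 503 otherwise; psycopg2 is assumed, and port 8008 simply mirrors Patroni's convention:

    from http.server import BaseHTTPRequestHandler, HTTPServer

    import psycopg2  # assumed driver; any Postgres client library would do


    class PrimaryCheck(BaseHTTPRequestHandler):
        # Return 200 for the primary, 503 for replicas or on any error.
        def do_GET(self):
            try:
                conn = psycopg2.connect(
                    "dbname=postgres host=localhost connect_timeout=2"
                )
                with conn, conn.cursor() as cur:
                    cur.execute("SELECT pg_is_in_recovery()")
                    in_recovery = cur.fetchone()[0]
                conn.close()
                status = 503 if in_recovery else 200
            except Exception:
                status = 503
            self.send_response(status)
            self.end_headers()


    if __name__ == "__main__":
        HTTPServer(("0.0.0.0", 8008), PrimaryCheck).serve_forever()

Pointing the NLB target group's HTTP health check at this port keeps only the current primary marked healthy, which is the behaviour described above.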
I configured the load balancer to listen on port 5432 and the health checks to connect on port 8008. I modified the security group for the Postgres instances to allow connections from my VPC's IP range, since the NLB doesn't have security groups. Connecting via psql to the NLB's DNS name worked as expected.
Though it works, I think I'll stick with my current setup, which has a PgBouncer running on each application instance (so there's no need to worry about managing a pool of bouncer instances), with consul-template updating pgbouncer.ini and reloading PgBouncer when the leader key changes in Consul.

MongoDB replica set in Azure, where do I point the firewall?

I have a MongoDB replica set in Azure.
I have:
server1 Primary
server2 secondary
server3 Arbiter
I have a dev environment on my local machine that I want to point to this MongoDB deployment.
What do I open on my Azure firewall to make sure this configuration is set up with best practices?
Do I create a load-balanced endpoint to the primary and secondary, or do I create a single endpoint to the arbiter, or perhaps something else?
thanks!
MongoDB will not play well with a load-balanced endpoint (you might end up sending traffic to a secondary, and you'd have no control over this unless you implemented a custom probe for each VM and then updated each probe's status based on that replica set node's health). The MongoDB client-side driver is designed to work with the replica set's topology and make the correct decision about which node to communicate with. Each replica set node should have a discrete, addressable ip:port. If you have all your instances in a single cloud service (e.g. myservice.cloudapp.net), you'll need one port per instance (since they'd all share a single IP address). If each instance is in a different cloud service, you can use the same port for each, with a different DNS name / IP address per instance.
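As a minimal sketch of what the client side looks like in the single-cloud-service case, with one port per member (the hostname, ports, and replica set name here are hypothetical), using pymongo:

    from pymongo import MongoClient

    # All members share the cloud service's DNS name, so each replica set
    # member is exposed on its own port.
    uri = (
        "mongodb://myservice.cloudapp.net:27017,"
        "myservice.cloudapp.net:27018,"
        "myservice.cloudapp.net:27019/"
        "?replicaSet=rs0"
    )

    client = MongoClient(uri)

    # The driver discovers the topology from these seeds and always sends
    # writes to whichever member is currently the primary.
    print(client.primary)

Note that the host:port values reachable through the firewall should match the members listed in the replica set configuration; the arbiter must be reachable by the other members but does not serve client traffic.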
If you are using iptables, the best solution is to open the three member ports with rules restricted to your source IP. That keeps the configuration reachable from your dev machine while remaining secure.