mongodb cluster with ELB endpoint as dns

This is not a technical question but more of an architectural one.
I have followed this blog for setting up the MongoDB cluster. We have 2 private subnets in which I have configured a 3-member MongoDB replica set. Now I want to use a single DNS name like mongod.some_subdomain.example.com for the whole cluster.
I do not have access to Route53, and setting or updating DNS records takes at least 2 hours in my case, since I depend on our cloud support team for it. I am also not sure which server primarily responds to application requests in a MongoDB cluster.
So, is there a way to put the whole cluster behind an ELB and use the ELB's DNS name to route traffic to the primary, such that on failover the next primary becomes the active ELB member, excluding the arbiter node?

The driver will attempt to connect to all nodes in the replica set configuration. If you put the nodes behind proxies, the driver will bypass the proxies and try to talk to the nodes directly.
You can proxy standalone and sharded cluster deployments, since the driver doesn't need a direct connection to the data-bearing nodes in those topologies, but mapping multiple mongos routers to a single address can create problems with retryable reads/writes, sessions, transactions, etc. This is not a supported configuration.
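To make the first point concrete, here is a minimal PyMongo sketch (the hostnames and replica set name are placeholder assumptions). Even when given only seed addresses, the driver discovers the real member addresses from the replica set configuration and connects to each of them directly, which is exactly why an ELB in front of the members gets bypassed.

```python
from pymongo import MongoClient

# Placeholder hosts and replica set name. With replicaSet specified, the
# listed hosts are only seeds: the driver fetches the member addresses
# stored in the replica set config and talks to each member directly,
# so an ELB sitting in front of the members would simply be bypassed.
client = MongoClient(
    "mongodb://mongo-a.internal:27017,mongo-b.internal:27017/?replicaSet=rs0"
)

# The driver tracks the topology and routes writes to the current
# primary automatically, including after a failover.
client.mydb.events.insert_one({"status": "ok"})
```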

Related

How to connect a web server to a Kubernetes statefulset and headless service

I have been learning Kubernetes for a few weeks, and now I am trying to figure out the right way to connect a web server to a statefulset.
Let's say I deployed a master-slave Postgres statefulset and now I want to connect my web server to it. Using a ClusterIP service, the requests will be load balanced across the master and the slaves for both reading (SELECT) and writing (UPDATE, INSERT, DELETE) records, right? But I can't do that, because write requests should be handled only by the master. However, if I point my web server at the master using the headless service that gives us a DNS entry for each pod, I won't get any load balancing to the other slave replicas, and all of the requests will be handled by one instance: the master. So how am I supposed to connect them correctly, getting load balancing across all replicas for reads while forwarding write requests to the master?
Should I use two endpoints in the web server, one configured for writing records and one for reading?
Or maybe I am using headless services and statefulsets the wrong way, since I am new to Kubernetes?
Well, your thinking is correct - the master should be read-write and the replicas should be read-only. How do you configure it properly? There are different possible approaches.
The first approach is what you are thinking about: set up two headless services - one for accessing the primary instances, the second one for accessing the replica instances. A good example is Kubegres:
In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in the YAML (e.g. "mypostgres"):
a Kubernetes service "mypostgres" allowing access to the Primary PostgreSql instances
a Kubernetes service "mypostgres-replica" allowing access to the Replica PostgreSql instances
Then you will have two endpoints:
Consequently, a client app running inside a Kubernetes cluster would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql instances for read requests.
Check this starting guide for more details.
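From the client's side, using the two endpoints is straightforward; a rough psycopg2 sketch, assuming the service names above and placeholder database credentials:

```python
import psycopg2

# Writes go to the primary via the "mypostgres" service.
with psycopg2.connect(host="mypostgres", dbname="app",
                      user="app", password="secret") as conn:
    with conn.cursor() as cur:
        cur.execute("INSERT INTO events (name) VALUES (%s)", ("signup",))

# Reads can go to any available replica via "mypostgres-replica".
with psycopg2.connect(host="mypostgres-replica", dbname="app",
                      user="app", password="secret") as conn:
    with conn.cursor() as cur:
        cur.execute("SELECT count(*) FROM events")
        print(cur.fetchone())
```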
It's worth noting that many database solutions use this approach - another example is MySQL. Here is a good article in the Kubernetes documentation about setting up MySQL using a StatefulSet.
Another approach is to use a middleware component which acts as a gatekeeper to the cluster, for example Pg-Pool:
Pg-Pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: load balancing & limiting the requests.
Load balancing: Pg-Pool takes connection requests and queries. It analyzes each query to decide where it should be sent.
Read-only queries can be handled by read replicas. Write operations can only be handled by the primary server. In this way, it load-balances the cluster.
Limiting the requests: Like any other system, Postgres has a limit on the number of concurrent connections it can handle gracefully.
Pg-Pool limits the number of connections it takes up and queues up the rest, thus gracefully handling the overload.
Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
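Client code then needs no read/write awareness at all; a sketch assuming a hypothetical Pg-Pool service named "pgpool" and placeholder credentials:

```python
import psycopg2

# One endpoint for everything: Pg-Pool inspects each statement and
# routes read-only queries to replicas and writes to the primary.
conn = psycopg2.connect(host="pgpool", dbname="app",
                        user="app", password="secret")

# A write: Pg-Pool sends this to the primary.
with conn:
    with conn.cursor() as cur:
        cur.execute("UPDATE accounts SET balance = balance - 10 WHERE id = %s", (1,))

# A read-only query in a separate transaction: eligible for a replica.
with conn:
    with conn.cursor() as cur:
        cur.execute("SELECT balance FROM accounts WHERE id = %s", (1,))
        print(cur.fetchone())
```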

load balancing postgres instances via aws network balancer possible?

We have an application with multiple Postgres databases (guaranteed to be in sync) installed on AWS EC2 instances in different availability zones. I would like to abstract them behind a single DNS name so that, if one of the EC2 instances crashes, the clients can still access the db. I am wondering if I can use an AWS network load balancer to load balance my databases? Why or why not? If not, is there any other standard, easy-to-implement solution that I can use? (I am aware of http://www.pgpool.net/mediawiki/index.php/Main_Page for example. However, I am leery of using something that I have to set up myself, especially since I would have to replicate the pgpool instances as well...)
Having just tried it myself, it does seem you can set up a network load balancer to load balance your databases. My production setup uses patroni to manage failover, and patroni provides an HTTP API for health checks. The master returns a 200 response, while the replicas return a 503. This works fine for my use case, where the replicas are there just for failover, not for replicated reads. I'm assuming you could come up with some code that returns a successful response for health checks based on your needs.
I configured the load balancer to listen to port 5432 and the health checks to connect on port 8008. I modified the security group for the postgres instances to allow connections from my VPC's IP range, since the NLB doesn't have security groups. Connecting via psql to the NLB's DNS name worked as expected.
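For anyone reproducing this, the probe below (a sketch with placeholder node IPs) mimics what the NLB health check sees: Patroni's REST API answers 200 on the leader and 503 on replicas, so only the current primary stays in service.

```python
import requests

# Placeholder node addresses; Patroni's REST API listens on 8008 by default.
NODES = ["10.0.1.10", "10.0.2.10", "10.0.3.10"]

for host in NODES:
    try:
        # GET /master returns 200 only on the leader and 503 on replicas
        # (newer Patroni versions also accept /primary).
        resp = requests.get(f"http://{host}:8008/master", timeout=2)
        role = "primary" if resp.status_code == 200 else "replica"
        print(f"{host}: HTTP {resp.status_code} -> {role}")
    except requests.RequestException as exc:
        print(f"{host}: unreachable ({exc})")
```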
Though it works, I think I'll stick with my current setup, which has a PgBouncer running on each application instance (so there's no need to worry about managing a pool of bouncer instances), with consul-template updating pgbouncer.ini and reloading PgBouncer when the leader key changes in Consul.

How do you create a mongodb replica set or shard that is externally available in kubernetes?

I have followed tutorials and set up working MongoDB replica sets, but when it comes to exposing them as a service I am stuck using a LoadBalancer, which directs traffic to any pod. In most cases that ends up being a secondary database, which is not terribly helpful. I have also managed to get separate MongoDB replicas set up and then tried to connect to them externally; however, connections fail because the internal replica set IPs all resolve through local Google Cloud DNS.
What I am hoping for is something like this.
Then (potentially) there would be a single connection URI that could connect you to your MongoDB replica set without needing individual connection details for each member.
I'm not sure if this is possible but any help is greatly appreciated!
The LoadBalancer-type service will route traffic to any one pod that matches its selector, which is not how a MongoDB replica set works. The connection string should contain all instances in the set. You probably need to expose each replica instance with type=LoadBalancer. Then you may connect via "mongodb://mongo-0_IP,mongo-1_IP,mongo-2_IP:27017/dbname_?"
If you configure a mongodb replica set with stateful sets, you should
also create a headless service. Then you can connect to the replica set with a url like:
"mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/dbname_?"
Here mongo-0, mongo-1, and mongo-2 are the pod names and "mongo" is the headless service name.
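With PyMongo, that connection might look like the sketch below (the database name and replica set name "rs0" are placeholder assumptions; use whatever your rs.initiate() configured):

```python
from pymongo import MongoClient

# Each host is <pod-name>.<headless-service-name>, a stable DNS entry
# that Kubernetes maintains for every pod in the statefulset.
client = MongoClient(
    "mongodb://mongo-0.mongo,mongo-1.mongo,mongo-2.mongo:27017/mydb"
    "?replicaSet=rs0"
)

# Confirm connectivity; writes will be routed to the current primary.
print(client.admin.command("ping"))
```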
If you still want to be able to connect to a specific mongo instance, you can create a separate service (of type=NodePort) for each deployment/replica, and then you should be able to connect to a specific mongo instance using <any-node-ip>:<nodeport>.
But you'll not be able to leverage the advantages of having a mongo replica set in this case.

MongoDB replica set in Azure, where do I point the firewall?

I have a MongoDB replica set in Azure.
I have:
server1 - primary
server2 - secondary
server3 - arbiter
I have a dev environment on my local machine that I want to point at this MongoDB instance.
What do I open on my Azure firewall to make sure this configuration is set up following best practices?
Do I create a load-balanced endpoint to the primary and secondary, or a single endpoint to the arbiter, or perhaps something else?
thanks!
MongoDB will not play well with a load-balanced endpoint (as you might end up sending traffic to a secondary, and you'd have no control over this unless you implemented a custom probe for each VM, and then you'd need to update the probe's status based on the replicaset node's health, for each node). The MongoDB client-side driver is designed to work with a replicaset's topology to make the correct decision on which node to communicate with. Each replicaset node should have a discrete addressable ip:port. If you have all your instances in a single cloud service (e.g. myservice.cloudapp.net) then you'll need one port per instance (since they'd all share a single ip address). If each instance is in a different cloud service, then you can have the same port for each, with different dns name / ip address for each.
If you are firewalling with iptables, the better approach is to open the MongoDB port to each of the three members with explicit per-IP rules; that keeps the configuration both reachable where needed and secure.

How do I configure mongodb replicaset using elastic IP's in EC2?

tldr: What will I need to do in order to use an elastic IP in my MongoDB replicaset configuration?
We have a three-node MongoDB replicaset running on EC2. One of the instances in the set was retired by AWS yesterday, and so we were forced to stop and restart the EC2 instance.
Unfortunately, when we first configured the replicaset, we were fairly new to AWS and not aware that the public DNS address of the instances was subject to change. We used the public DNS of each instance in the replicaset configuration, and in all of the application connection strings in our code. After reading up on the subject yesterday, I tried to get the node back online by assigning an elastic IP to the instance and changing the replicaset configuration to use that IP. After some pain, I was able to get the other two nodes back up and running with that configuration, but the instance with the elastic IP refused to re-join the replicaset, and the error in mongod.log says:
[rsStart] replSet info self not present in the repl set configuration
After yet more reading, I found that I should not have used the actual elastic IP in the config, but rather the public DNS name of the elastic IP. My question is, before I take everything offline again to try this change, what exactly will I need to do in order to use the elastic IP in the replicaset configuration? I found some information on this 10Gen page: http://docs.mongodb.org/ecosystem/platforms/amazon-ec2/#communication-across-regions that made me think I might need to mess with the hostname of the instance and/or the hosts file, but I haven't been able to find anybody describing my exact scenario.
Any thoughts?
It turned out to be a pretty simple fix; once I changed the replicaset configuration to use the public DNS of the elastic IP, the mongo node came back online. I didn't have to touch the hostname or the hosts file.
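For reference, the reconfiguration can be scripted as well; a hedged PyMongo sketch (the DNS names below are placeholders for the Elastic IPs' public DNS names):

```python
from pymongo import MongoClient

# Connect directly to the current primary (placeholder address).
client = MongoClient("ec2-1-2-3-4.compute-1.amazonaws.com", 27017,
                     directConnection=True)

# Fetch the current replica set config, point each member at the public
# DNS name of its Elastic IP, bump the version, and submit the reconfig.
config = client.admin.command("replSetGetConfig")["config"]
new_hosts = [  # placeholder public DNS names, one per member
    "ec2-1-2-3-4.compute-1.amazonaws.com:27017",
    "ec2-5-6-7-8.compute-1.amazonaws.com:27017",
    "ec2-9-10-11-12.compute-1.amazonaws.com:27017",
]
for member, host in zip(config["members"], new_hosts):
    member["host"] = host
config["version"] += 1
client.admin.command("replSetReconfig", config)
```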
You should never use an Elastic IP for internal traffic like replication. You will be charged $0.01/GB for this traffic, whereas using the internal IP would be free.
If you're using something like replica sets, you really should be running in a VPC. Unlike normal EC2 instances, instances in a VPC keep the same private IP addresses and Elastic IP addresses even when stopped.