Load balancing postgres instances via AWS network load balancer possible? - postgresql

We have an application that has multiple postgres databases (that are guaranteed to be in sync) installed on AWS EC2 instances in different availability zones. I would like to abstract them behind a single DNS name so that, if one of the EC2 instances crashes, the clients can still access the db. I am wondering: can I use an AWS network load balancer to load balance my databases? Why or why not? If not, is there another standard, easy-to-implement solution I can use? (I am aware of http://www.pgpool.net/mediawiki/index.php/Main_Page, for example. However, I am leery of using something that I have to set up myself, especially since I would have to replicate the pgpool instances as well...)

Having just tried it myself, it does seem you can set up a network load balancer to load balance your databases. My production setup uses patroni to manage failover, and patroni provides an HTTP API for health checks. The master returns a 200 response, while the replicas return a 503. This works fine for my use case, where the replicas are there just for failover, not for replicated reads. I'm assuming you could come up with some code that returns a successful response for health checks based on your needs.
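For example, here is a minimal sketch of such a health-check responder, assuming psycopg2 and a local monitoring role; if you run Patroni you don't need this, since its REST API already behaves this way on port 8008:

```python
# Minimal health-check endpoint sketch: 200 on the primary, 503 on a replica.
# Connection parameters and the "healthcheck" role are illustrative.
from http.server import BaseHTTPRequestHandler, HTTPServer

import psycopg2

class HealthCheckHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        try:
            conn = psycopg2.connect(host="localhost", dbname="postgres",
                                    user="healthcheck", connect_timeout=2)
            with conn.cursor() as cur:
                cur.execute("SELECT pg_is_in_recovery()")
                in_recovery = cur.fetchone()[0]
            conn.close()
            # Primary (not in recovery) -> 200; replica -> 503
            self.send_response(503 if in_recovery else 200)
        except Exception:
            # An unreachable database counts as unhealthy
            self.send_response(503)
        self.end_headers()

HTTPServer(("", 8008), HealthCheckHandler).serve_forever()
```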
I configured the load balancer to listen to port 5432 and the health checks to connect on port 8008. I modified the security group for the postgres instances to allow connections from my VPC's IP range, since the NLB doesn't have security groups. Connecting via psql to the NLB's DNS name worked as expected.
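For reference, a hedged boto3 sketch of that target-group setup; the VPC and instance IDs are placeholders, and the HTTP health check points at Patroni's REST API on port 8008:

```python
# Create a TCP target group for port 5432 whose health check hits the
# Patroni HTTP API, so only the current leader is marked healthy.
import boto3

elbv2 = boto3.client("elbv2")

tg = elbv2.create_target_group(
    Name="postgres-primary",
    Protocol="TCP",
    Port=5432,
    VpcId="vpc-0123456789abcdef0",           # placeholder
    TargetType="instance",
    HealthCheckProtocol="HTTP",
    HealthCheckPort="8008",                   # Patroni's REST API port
    HealthCheckPath="/",                      # returns 200 only on the leader
)
tg_arn = tg["TargetGroups"][0]["TargetGroupArn"]

elbv2.register_targets(
    TargetGroupArn=tg_arn,
    Targets=[{"Id": "i-0123456789abcdef0"}],  # placeholder instance IDs
)
```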
Though it works, I think I'll stick with my current setup, which has a PgBouncer running on each application instance (so there is no need to worry about managing a pool of bouncer instances), with consul-template updating pgbouncer.ini and reloading PgBouncer when the leader key changes in Consul.
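A rough sketch of what consul-template automates here; the Consul KV path, the assumption that the key's value is the leader's hostname, and the stripped-down pgbouncer.ini template are all illustrative:

```python
# Watch the leader key in Consul and repoint the local PgBouncer at the
# new primary. A real pgbouncer.ini has more sections than this template.
import subprocess
import time

import requests

KV_URL = "http://127.0.0.1:8500/v1/kv/service/pgcluster/leader?raw"  # assumed path
INI = "[databases]\nmydb = host={leader} port=5432 dbname=mydb\n"

last_leader = None
while True:
    try:
        leader = requests.get(KV_URL, timeout=2).text.strip()
    except requests.RequestException:
        leader = None
    if leader and leader != last_leader:
        with open("/etc/pgbouncer/pgbouncer.ini", "w") as f:
            f.write(INI.format(leader=leader))
        # Reload picks up the new [databases] section without dropping clients
        subprocess.run(["systemctl", "reload", "pgbouncer"], check=False)
        last_leader = leader
    time.sleep(5)
```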

Related

How to connect a web server to a Kubernetes statefulset and headless service

I have been learning Kubernetes for a few weeks and now I am trying to figure out the right way to connect a web server to a statefulset.
Let's say I deployed a master-slave Postgres statefulset and now I want to connect my web server to it. By using a ClusterIP service, requests will be load balanced across the master and the slaves for both reading (SELECT) and writing (UPDATE, INSERT, DELETE) records, right? But I can't do that, because write requests should be handled only by the master. However, if I point my web server at the master using the headless service (which gives us a DNS entry for each pod), I won't get any load balancing to the slave replicas, and all of the requests will be handled by one instance: the master. So how am I supposed to connect them the right way, so that reads are load balanced across all the replicas while writes are forwarded to the master?
Should I use two endpoints in the web server, one configured for writes and one for reads?
Or maybe I am using headless services and statefulsets the wrong way since I am new to Kubernetes?
Well, your thinking is correct - the master should be read-write and the replicas should be read-only. How do you configure it properly? There are a few possible approaches.
The first approach is what you are thinking about: set up two headless services, one for accessing the primary instance and a second one for accessing the replica instances. A good example is Kubegres:
In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):
a Kubernetes service "mypostgres" allowing access to the Primary PostgreSql instance
a Kubernetes service "mypostgres-replica" allowing access to the Replica PostgreSql instances
Then you will have two endpoints:
Consequently, a client app running inside the Kubernetes cluster would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql instances for read requests.
Check this starting guide for more details.
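As a concrete sketch of that two-endpoint pattern from inside the cluster (psycopg2; the database, table, and credentials are illustrative):

```python
# Writes go to the primary service; read-only traffic goes to the replica
# service. Service names follow the Kubegres example above.
import psycopg2

write_conn = psycopg2.connect(host="mypostgres", dbname="app", user="app")
read_conn = psycopg2.connect(host="mypostgres-replica", dbname="app", user="app")

with write_conn.cursor() as cur:
    cur.execute("INSERT INTO events (payload) VALUES (%s)", ("hello",))
write_conn.commit()

with read_conn.cursor() as cur:
    cur.execute("SELECT count(*) FROM events")
    print(cur.fetchone()[0])
```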
It's worth noting that many database solutions use this approach - another example is MySQL. Here is a good article in the Kubernetes documentation about setting up MySQL using a StatefulSet.
Another approach is to use a middleware component which acts as a gatekeeper to the cluster, for example Pg-Pool:
Pg pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: load balancing and limiting the requests.
Load balancing: Pg-Pool takes connection requests and queries. It analyzes each query to decide where it should be sent.
Read-only queries can be handled by the read replicas; write operations can only be handled by the primary server. In this way, it load balances the cluster.
Limiting the requests: Like any other system, Postgres has a limit on the number of concurrent connections it can handle gracefully.
Pg-Pool limits the number of connections it takes up and queues up the rest, thus handling overload gracefully.
Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
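For comparison, with Pg-Pool the application keeps a single DSN and lets the middleware route each query; a minimal sketch, assuming Pg-Pool's default port 9999 and an illustrative service name:

```python
# One endpoint for everything: Pg-Pool decides per query whether it goes
# to the primary or a replica. Table and credentials are illustrative.
import psycopg2

conn = psycopg2.connect(host="pgpool", port=9999, dbname="app", user="app")
with conn.cursor() as cur:
    cur.execute("SELECT now()")  # may be served by a replica
    print(cur.fetchone()[0])
    cur.execute("UPDATE users SET active = true WHERE id = %s", (1,))
conn.commit()
```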

mongodb cluster with ELB endpoint as dns

This is not so much a technical question as an architectural one.
I have followed this blog for setting up the mongodb cluster. We have 2 private subnets in which I have configured a 3-member replica set of mongodb. Now I want to use a single DNS name like mongod.some_subdomain.example.com for the whole cluster.
I do not have access to Route53, and setting/updating the DNS records takes at least 2 hours in my case, since I am dependent on our cloud support for it. I am also not sure which server primarily responds to application requests in a mongodb cluster.
So is there a way to put the whole cluster behind an ELB and use the ELB's DNS name to route traffic to the primary, such that after a failover the new primary (any member except the arbiter node) becomes the member served by the ELB?
The driver will attempt to connect to all nodes in the replica set configuration. If you put nodes behind proxies the driver will bypass the proxies and try to talk to the nodes directly.
You can proxy standalone and sharded cluster deployments, since the driver doesn't need a direct connection to the data nodes in those, but mapping multiple mongos instances to a single address can create problems with retryable reads/writes, sessions, transactions, etc. This is not a supported configuration.
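To make the first point concrete, here is a minimal pymongo sketch (host names and replica-set name are placeholders): the driver is handed the members themselves, discovers the topology, and follows elections on its own, which is exactly what a single load-balanced address would break.

```python
# The driver connects to the replica set members directly and tracks
# which one is primary, so failover needs no DNS or load-balancer change.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://mongo1.internal:27017,mongo2.internal:27017,mongo3.internal:27017"
    "/?replicaSet=rs0"
)

# Writes are automatically routed to whichever member is currently primary.
client.mydb.mycollection.insert_one({"status": "ok"})
```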

How does Cassandra driver update contactPoints if all pods are restarted in Kubernetes without restarting the client application?

We have created a statefulset and a headless service. There are two ways we can define the peer IPs in the application:
Use 'cassandra-headless-service-name' in contactPoints
Fetch the peer IPs from the headless service, externalize them, and read these IPs when initializing the connection.
So far so good.
The above will work if one or some pods are restarted, but not all. In that case, the driver will update to the new IPs automatically.
But how will this work in case of a complete outage? If all pods go down and come back with changed IPs (IPs can change in Kubernetes), how will the application connect to Cassandra?
In a complete outage, you're right, the application will not have any valid endpoints for the cluster. Those will need to be refreshed (and the app restarted) before the app will connect to Cassandra.
We actually wrote a RESTful API that we can use to query the current, valid endpoints by cluster. That way, the app teams can find the current IPs for their cluster at any time. I recommend doing something similar.
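A hedged sketch of that pattern with the Python cassandra-driver; the registry URL and its JSON shape are hypothetical stand-ins for whatever API you build:

```python
# Fetch the cluster's current node IPs from an internal endpoint registry
# and use them as contact points. Once connected, the driver maintains its
# own view of the ring; fresh seeds like these matter after a full outage.
import requests
from cassandra.cluster import Cluster

resp = requests.get("http://endpoints.internal/clusters/my-cassandra")  # hypothetical API
contact_points = resp.json()["endpoints"]  # e.g. ["10.0.1.12", "10.0.2.7"]

cluster = Cluster(contact_points=contact_points, port=9042)
session = cluster.connect("my_keyspace")
print(session.execute("SELECT release_version FROM system.local").one())
```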

Pgbouncer - Can I use pgbouncer for load balancing the request in postgres cluster?

Currently, I am using pgbouncer for connection pooling in the postgresql cluster. I just want to make sure whether it is possible to load balance requests between the nodes in the postgresql cluster using pgbouncer.
Now there's pgbouncer-rr-patch (a pgbouncer fork by AWS) that can do load balancing:
Routing: intelligently send queries to different database servers from one client connection; use it to partition or load balance across multiple servers/clusters.
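For illustration, a routing module for pgbouncer-rr might look roughly like this; it is loosely based on the examples in the project's README, so check the README for the exact configuration keys and return-value semantics before relying on it:

```python
# routing_rules.py -- send SELECTs to the read replicas, everything else
# to route 0 (the primary). Route numbers are assumed to map to numbered
# entries in pgbouncer.ini's [databases] section.
import re
from random import choice

def routing(username, query):
    if re.search(r"^\s*SELECT\b", query, re.IGNORECASE):
        return choice([1, 2])  # read replicas
    return 0                   # primary
```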
From the PgBouncer FAQ
How to load-balance queries between several servers?
PgBouncer does not have internal multi-host configuration. It is possible via some external tools:
DNS round-robin. Use several IPs behind one DNS name. PgBouncer does not look up DNS each time a new connection is launched; instead, it caches all the IPs and does round-robin internally. Note: if there are more than 8 IPs behind one name, the DNS backend must support the EDNS0 protocol. See the README for details.
Use a TCP connection load balancer. Either LVS or HAProxy seems to be a good choice. On the PgBouncer side it may be a good idea to make server_lifetime smaller and also to turn server_round_robin on; by default, idle connections are reused by a LIFO algorithm, which may not work so well when load balancing is needed.
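If you go the DNS round-robin route, here is a quick Python check that several IPs really do sit behind the one name (the hostname is a placeholder):

```python
# Print every A/AAAA record behind a DNS name -- the same set of addresses
# PgBouncer would cache and rotate through internally.
import socket

infos = socket.getaddrinfo("pg.example.internal", 5432, proto=socket.IPPROTO_TCP)
for family, _, _, _, sockaddr in infos:
    print(sockaddr[0])
```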

MongoDB replica set in Azure, where do I point the firewall?

I have a mongoDB replica set in Azure.
I have:
server1 Primary
server2 secondary
server3 Arbiter
I have a dev environment on my local machine that I want to point to this mongoDB instance.
What do I open on my Azure firewall to make sure this configuration is set up with best practices?
Do I create a load balanced endpoint to the Primary and Secondary or do I create a single endpoint to the arbiter, or perhaps even something else?
thanks!
MongoDB will not play well with a load-balanced endpoint (as you might end up sending traffic to a secondary, and you'd have no control over this unless you implemented a custom probe for each VM, and then you'd need to update the probe's status based on the replicaset node's health, for each node). The MongoDB client-side driver is designed to work with a replicaset's topology to make the correct decision on which node to communicate with. Each replicaset node should have a discrete addressable ip:port. If you have all your instances in a single cloud service (e.g. myservice.cloudapp.net) then you'll need one port per instance (since they'd all share a single ip address). If each instance is in a different cloud service, then you can have the same port for each, with different dns name / ip address for each.
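As a hedged sketch of the single-cloud-service case, the client would list the same DNS name with one port per member (the ports and replica-set name here are placeholders):

```python
# One DNS name, one discrete port per replica-set member; the driver does
# the rest of the topology discovery itself.
from pymongo import MongoClient

client = MongoClient(
    "mongodb://myservice.cloudapp.net:27017,"
    "myservice.cloudapp.net:27018,"
    "myservice.cloudapp.net:27019/?replicaSet=rs0"
)
print(client.admin.command("ping"))
```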
If you are managing access with iptables instead, the best approach is to open each of the three members' ports with explicit per-source-IP rules; that keeps the configuration both reachable and secure.