Is it possible to build a multi-cluster Postgres setup and distribute the database so that the app is not aware of the distribution or of the multiple clusters/nodes, and instead connects to some sort of load balancer or proxy?
Related
I have been learning Kubernetes for a few weeks and now I am trying to figure out the right way to connect a web server to a statefulset.
Let's say I deployed a master-slave Postgres statefulset and now I want to connect my web server to it. By using a cluster IP service, the requests will be load balanced across the master and the slaves for both reading (SELECT) and writing (UPDATE, INSERT, DELETE) records, right? But I can't do that, because write requests should be handled by the master. However, when I point my web server to the master using the headless service, which gives us a DNS entry for each pod, I won't get any load balancing to the slave replicas, and all of the requests will be handled by one instance: the master. So how am I supposed to connect them the right way, so that reads are load balanced across all replicas (master and slaves) while writes are forwarded to the master?
Should I use two endpoints in the web server, one for writing records and one for reading them?
Or maybe I am using headless services and statefulsets the wrong way since I am new to Kubernetes?
Well, your thinking is correct - the master should be read-write and replicas should be read only. How do you configure it properly? There are different possible approaches.
The first approach is what you are thinking about: set up two headless services, one for accessing the primary instance and a second one for accessing the replica instances. A good example is Kubegres:
In this example, Kubegres created 2 Kubernetes Headless services (of default type ClusterIP) using the name defined in YAML (e.g. "mypostgres"):
a Kubernetes service "mypostgres" allowing to access to the Primary PostgreSql instances
a Kubernetes service "mypostgres-replica" allowing to access to the Replica PostgreSql instances
Then you will have two endpoints:
Consequently, a client app running inside a Kubernetes cluster, would use the hostname "mypostgres" to connect to the Primary PostgreSql for read and write requests, and optionally it can also use the hostname "mypostgres-replica" to connect to any of the available Replica PostgreSql for read requests.
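On the application side, the two-endpoint approach means the app decides which hostname each statement goes to. Here is a minimal sketch of that decision, assuming the two service names from the Kubegres quote above. The routing rule and the names `pick_host` and `READ_ONLY_PREFIXES` are my own illustration, not part of Kubegres, and the rule is deliberately conservative: anything it is not sure about goes to the primary. It cannot detect a SELECT that calls a function which writes, so treat it as a starting point only.

```python
# Service hostnames taken from the Kubegres example quoted above.
PRIMARY_HOST = "mypostgres"
REPLICA_HOST = "mypostgres-replica"

# Hypothetical rule: only plain SELECT/SHOW statements are treated as reads.
READ_ONLY_PREFIXES = ("select", "show")


def pick_host(sql: str, in_transaction: bool = False) -> str:
    """Return the service hostname a statement should be sent to."""
    # Normalize case and whitespace so the prefix check is reliable.
    statement = " ".join(sql.lower().split())
    is_read = statement.startswith(READ_ONLY_PREFIXES)
    # Row-locking reads are not allowed on a read-only replica.
    takes_locks = "for update" in statement or "for share" in statement
    if is_read and not takes_locks and not in_transaction:
        return REPLICA_HOST
    # Writes, DDL, WITH queries, and anything inside a transaction
    # go to the primary.
    return PRIMARY_HOST


if __name__ == "__main__":
    for query in ("SELECT * FROM users", "UPDATE users SET name = 'x'"):
        print(query, "->", pick_host(query))
```

In a real app you would keep one connection pool per hostname and use the returned value to choose the pool.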
Check this starting guide for more details.
It's worth noting that many database solutions use this approach - another example is MySQL. There is a good article in the Kubernetes documentation about running replicated MySQL with a StatefulSet.
Another approach is to use some middleware component which will act as a gatekeeper to the cluster, for example Pg-Pool:
Pg pool is a middleware component that sits in front of the Postgres servers and acts as a gatekeeper to the cluster.
It mainly serves two purposes: Load balancing & Limiting the requests.
Load Balancing: Pg pool takes connection requests and queries. It analyzes the query to decide where the query should be sent.
Read-only queries can be handled by read-replicas. Write operations can only be handled by the primary server. In this way, it load balances the cluster.
Limits the requests: Like any other system, Postgres has a limit on no. of concurrent connections it can handle gracefully.
Pg-pool limits the no. of connections it takes up and queues up the remaining. Thus, gracefully handling the overload.
Then you will have one endpoint for all operations - the Pg-Pool service. Check this article for more details, including the whole setup process.
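To make the "limits the requests" idea concrete, here is a small sketch of the general technique: accept a fixed number of concurrent clients and make the rest wait. This is not how Pg-Pool is implemented. `ConnectionGate` and `demo` are hypothetical names, and the "work" is just a short sleep standing in for a query.

```python
import threading
import time


class ConnectionGate:
    """Lets at most max_active callers run at once; the rest wait in line."""

    def __init__(self, max_active: int):
        self._slots = threading.BoundedSemaphore(max_active)
        self._lock = threading.Lock()
        self.active = 0
        self.peak = 0  # highest number of callers seen running together

    def run(self, work):
        with self._slots:  # blocks here while all slots are taken
            with self._lock:
                self.active += 1
                self.peak = max(self.peak, self.active)
            try:
                return work()
            finally:
                with self._lock:
                    self.active -= 1


def demo(clients: int = 10, max_active: int = 3) -> int:
    """Start `clients` threads against one gate and return the observed peak."""
    gate = ConnectionGate(max_active)
    threads = [
        threading.Thread(target=gate.run, args=(lambda: time.sleep(0.01),))
        for _ in range(clients)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return gate.peak


if __name__ == "__main__":
    print("peak concurrent clients:", demo())
```

The peak never exceeds `max_active`, which is the property the quoted article describes: overload turns into waiting instead of failed connections.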
My setup, running locally in two minikube instances, consists of two k8s clusters:
frontend cluster is running a golang api-server,
backend cluster is running an ha bitnami postgres cluster (used bitnami postgresql-ha chart for this)
If I set the pgpool service to use NodePort and get the IP and port of the node that the pgpool pod is running on, I can hardwire this host and port into the database connector in the api-server (in the other cluster), and this works.
However, what I haven't been able to figure out is how to connect generically to the other cluster (e.g. to pgpool) without using the IP address.
I also tried using Skupper, which also has an example of connecting to a backend cluster with postgres running on it, but their example doesn't use bitnami ha postgres helm chart, just a simple postgres install, so it is not at all the same.
Any ideas?
For those times when you have to, or purposely want to, connect pods/deployments across multiple clusters, Nethopper (https://www.nethopper.io/) is a simple and secure solution. The postgresql-ha scenario above is covered under their free tier. There is a two cluster minikube 'how to' tutorial at https://www.nethopper.io/connect2clusters which is very similar to your frontend/backend use case. Nethopper is based on skupper.io, but the configuration is much easier and user friendly, and is centralized so it scales to many clusters if you need to.
To solve your specific use case, you would:
First install your api server in the frontend and your bitnami postgresql-ha chart in the backend, as you normally would.
Go to https://mynethopper.com/ and
Register
Clouds -> define both clusters (clouds), frontend and backend
Application Network -> create an application network
Application Network -> attach both clusters to the network
Application Network -> install nethopper-agent in each cluster with copy paste instructions.
Objects -> import and expose pgpool (call the service 'pgpool') in your backend.
Objects -> distribute the service 'pgpool' to frontend, using a distribution rule.
Now, you should see 'pgpool' service in the frontend cluster
kubectl get service
When the API server pods in the frontend request service from pgpool, they will connect to pgpool in the backend, magically. It's like the 'pgpool' pod is now running in the frontend.
The nethopper part should only take 5-10 minutes, and you do NOT need IP addresses, TLS certs, K8s ingresses or loadbalancers, a VPN, or an istio service mesh or sidecars.
After moving to a single-cluster architecture, it became easier to see how to connect to the bitnami postgres-ha cluster. After trying a few different things, this finally worked:
-postgresql-ha-postgresql-headless:5432
(that's the host and port I'm using to call from my golang server)
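When you are trying different host and port combinations like this, a quick TCP check from inside a pod can save time before involving the database driver. The following is a small sketch using only the Python standard library; `can_connect` is a hypothetical helper name. It only tells you that something is listening on that host and port, not that it is Postgres or that your credentials work.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection resolves the name (e.g. a Kubernetes service
        # DNS entry) and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers DNS failures, refused connections, and timeouts.
        return False
```

Call it with the service name and port exactly as they appear in your own kubectl get service output.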
Now I believe it should be fairly straightforward to also run the two-cluster case using skupper to bind to the headless service.
I have an application (AWS API Gateway) using an Aurora PostgreSQL cluster.
The cluster has 1 read/write (primary) and one reader endpoint.
At the moment, my application connects to the specific writer instance for all operations:
rds-instance-1.xxx.ap-southeast-2.rds.amazonaws.com
But I have the following endpoints available:
rds.cluster-xxx.ap-southeast-2.rds.amazonaws.com
rds.cluster-ro-xxx.ap-southeast-2.rds.amazonaws.com
rds-instance-1.xxx.ap-southeast-2.rds.amazonaws.com
rds-instance-1-ap-southeast-2c.xxx.ap-southeast-2.rds.amazonaws.com
If I am doing read and write operations, should I be connecting to the instance endpoint I'm using? Or should I use rds.cluster-xxx.ap-southeast-2.rds.amazonaws.com? What are the benefits of using the different endpoints? I understand that if I connect to a read-only endpoint I can only do reads, but for reads and writes, what's the difference between connecting to:
rds.cluster-xxx.ap-southeast-2.rds.amazonaws.com
Or
rds-instance-1.xxx.ap-southeast-2.rds.amazonaws.com
?
What is the right / best endpoint to use for general workloads, and why?
You should use the cluster writer and reader endpoints:
rds.cluster-xxx.ap-southeast-2.rds.amazonaws.com
rds.cluster-ro-xxx.ap-southeast-2.rds.amazonaws.com
The main benefit of using the cluster endpoint is that if a failover occurs for some reason, you do not have to worry about the endpoint, and you can expect a minimal interruption of service.
And if you have 3 read replicas, how would you manage the reader connections yourself? So it is better to use the cluster reader/writer endpoints.
Using the Reader Endpoint
You use the reader endpoint for read-only connections for your Aurora
cluster. This endpoint uses a load-balancing mechanism to help your
cluster handle a query-intensive workload. The reader endpoint is the
endpoint that you supply to applications that do reporting or other
read-only operations on the cluster.
Using the Cluster Endpoint
You use the cluster endpoint when you administer your cluster, perform
extract, transform, load (ETL) operations, or develop and test
applications. The cluster endpoint connects to the primary instance of
the cluster. The primary instance is the only DB instance where you
can create tables and indexes, run INSERT statements, and perform
other DDL and DML operations.
Instance endpoint
The instance endpoint provides direct control over connections to the
DB cluster, for scenarios where using the cluster endpoint or reader
endpoint might not be appropriate. For example, your client
application might require more fine-grained load balancing based on
workload type. In this case, you can configure multiple clients to
connect to different Aurora Replicas in a DB cluster to distribute
read workloads. For an example that uses instance endpoints to improve
connection speed after a failover for Aurora PostgreSQL
You can check further details in the AWS RDS Endpoints documentation.
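If you want to guard against accidentally configuring an instance endpoint where a cluster endpoint was intended, the hostnames from the question can be told apart by their second DNS label. A minimal sketch, assuming only the naming pattern visible in the endpoints listed above (`cluster-` for the cluster endpoint, `cluster-ro-` for the reader endpoint); `classify_endpoint` is a hypothetical helper and not part of any AWS SDK:

```python
def classify_endpoint(hostname: str) -> str:
    """Classify an Aurora hostname as 'reader', 'cluster', or 'instance'."""
    labels = hostname.lower().split(".")
    if len(labels) < 2:
        return "unknown"
    second = labels[1]
    # Check the more specific prefix first: "cluster-ro-" also
    # starts with "cluster-".
    if second.startswith("cluster-ro-"):
        return "reader"
    if second.startswith("cluster-"):
        return "cluster"
    return "instance"


if __name__ == "__main__":
    for host in (
        "rds.cluster-xxx.ap-southeast-2.rds.amazonaws.com",
        "rds.cluster-ro-xxx.ap-southeast-2.rds.amazonaws.com",
        "rds-instance-1.xxx.ap-southeast-2.rds.amazonaws.com",
    ):
        print(classify_endpoint(host), host)
```

A startup check like this could log a warning when the write connection string points at an instance endpoint, since that connection would not follow a failover.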
We have 2 Kubernetes clusters hosted in different data centers and we're deploying the applications to both of these clusters. We have an external load balancer which is outside the clusters, but the load balancer only accepts static IPs. We don't have control over the clusters and we can't provision a static IP. How can we go about this?
We've also tried Kong as an API gateway. We were able to create an upstream with the load-balanced application endpoints as targets and give them different weights, but this doesn't give us active/passive or active/failover. Is there a way we can configure a Kong/nginx upstream to achieve this?
Consider using HAProxy, where you can configure your passive cluster as a backup upstream, and you will get an active/passive setup working. As mentioned in this nice guide about HAProxy:
backup meaning it won’t participate in the load balance unless both
the nodes above have failed their health check (more on that later).
This configuration is referred to as active-passive since the backup
node is just sitting there passively doing nothing. This enables you
to economize by having the same backup system for different
application servers.
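The selection rule described in that quote can be sketched in a few lines. This is only an illustration of the active/passive logic, not HAProxy code. The `Upstream` type and the cluster names are hypothetical, and the choice to use only the first healthy backup is an assumption about the default behaviour that you should check against the HAProxy documentation for your version.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Upstream:
    name: str
    healthy: bool
    backup: bool = False


def eligible_upstreams(upstreams: List[Upstream]) -> List[Upstream]:
    """Return the upstreams that should receive traffic right now."""
    active = [u for u in upstreams if u.healthy and not u.backup]
    if active:
        # Backups stay idle while any regular upstream passes its check.
        return active
    # Assumption: only the first healthy backup takes over.
    backups = [u for u in upstreams if u.healthy and u.backup]
    return backups[:1]


if __name__ == "__main__":
    clusters = [
        Upstream("cluster-a", healthy=True),
        Upstream("cluster-b", healthy=True, backup=True),
    ]
    print([u.name for u in eligible_upstreams(clusters)])
    clusters[0].healthy = False  # simulate the active cluster failing
    print([u.name for u in eligible_upstreams(clusters)])
```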
Hope it helps!
I want to use the following deployment architecture.
One machine running my webserver(nginx)
Two or more machines running uwsgi
Postgresql as my db on another server.
All three are separate host machines on AWS. During development I used docker and was able to run all three on my local machine. But I am clueless now that I want to split them onto three separate hosts. Any guidance, clues, or references will be greatly appreciated. I would prefer to do this using docker.
If you're really adamant about keeping the services separate on individual hosts, then there's nothing stopping you from still using your containers on an EC2 host with Docker installed for nginx/uwsgi. You could even use a CoreOS AMI, which comes with a nice secure Docker instance pre-loaded (https://coreos.com/os/docs/latest/booting-on-ec2.html).
For the database use PostgreSQL on AWS RDS.
If you're running containers you can also look at AWS ECS, which is Amazon's container service. That would be my initial recommendation, but I saw that you wanted all these services to be on individual hosts.
You can use docker stack to deploy the application in swarm mode.
Join the other 2 hosts as workers and use the placement option below:
https://docs.docker.com/compose/compose-file/#placement
deploy:
  placement:
    constraints:
      - node.role == manager
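Change the constraint for each service so that it is restricted to its own host. Note that node.role only accepts manager or worker; to pin a service to one specific worker (worker1 ... workerN), match on node.hostname or on a node label instead.
You can also make this more secure by using a VPN if you wish.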