I am currently working on a Kubernetes deployment. Unfortunately, we have only 2 locations, and the requirement is that we tolerate the failure of one location. It does not have to be automatic or zero-downtime (so it is not expected to be HA; we just want disaster recovery).
I am completely aware that a 2-master cluster has lower availability than a 1-master cluster. But the idea is that with two masters, when one location fails (or is taken down), an admin will spin up a new master in the surviving location, restoring the majority.
Expected advantage over a single master: the state of the cluster should survive a location failure (we also have Postgres and other stateful components in Kubernetes, and we want Postgres to reconfigure a slave to master if the node running the master is taken down).
Is this approach sane? Or is there some other way to deploy Kubernetes across 2 locations?
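For context, a disaster-recovery plan like this usually relies on periodic etcd snapshots, so the cluster state can be restored on whichever location survives. A minimal sketch, assuming a kubeadm-style layout; the endpoints, certificate paths and file names are illustrative, not prescriptive:

```bash
# Periodic backup of cluster state (run against the local etcd member)
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backups/etcd-snapshot.db

# After a location failure, restore on the new/surviving master into a fresh
# data directory before starting etcd as a single-member cluster
ETCDCTL_API=3 etcdctl snapshot restore /backups/etcd-snapshot.db \
  --data-dir /var/lib/etcd-restored
```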
Related
Deploy a Mongo database with a single master and two read replicas in a Kubernetes cluster of at least 3 worker nodes spread across different availability zones.
Points to keep in mind while deploying the DB (a sketch manifest follows this list):
All DB replicas should be deployed on separate worker nodes in different availability zones (for high availability).
Autoscale the read replicas if needed.
Data should be persistent.
Try to run the containers in non-privileged mode if possible.
Use best practices as much as you can.
Push the task into a separate branch with a proper README file.
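A minimal sketch of what such a manifest could look like. The names (mongo, rs0), image tag, user ID and storage size are assumptions for illustration; the headless Service, replica-set initialisation (rs.initiate) and autoscaling of the read replicas are deliberately not shown here:

```yaml
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mongo
spec:
  serviceName: mongo              # a headless Service "mongo" must exist separately
  replicas: 3                     # 1 primary + 2 secondaries after rs.initiate
  selector:
    matchLabels:
      app: mongo
  template:
    metadata:
      labels:
        app: mongo
    spec:
      securityContext:            # non-privileged: run as the image's non-root user
        runAsNonRoot: true
        runAsUser: 999
        fsGroup: 999
      affinity:
        podAntiAffinity:          # force members onto nodes in different zones
          requiredDuringSchedulingIgnoredDuringExecution:
            - labelSelector:
                matchLabels:
                  app: mongo
              topologyKey: topology.kubernetes.io/zone
      containers:
        - name: mongo
          image: mongo:6.0
          command: ["mongod", "--replSet", "rs0", "--bind_ip_all"]
          ports:
            - containerPort: 27017
          resources:
            requests:
              cpu: 250m
              memory: 512Mi
          volumeMounts:
            - name: data
              mountPath: /data/db
  volumeClaimTemplates:           # persistent storage per member
    - metadata:
        name: data
      spec:
        accessModes: ["ReadWriteOnce"]
        resources:
          requests:
            storage: 10Gi
```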
I want to understand the possible impact of a master node failure in a k8s cluster with only one master node and an internal etcd store.
As per my understanding, all deployed workload containers (including stateless deployments and stateful sets with persistent volume claims) running on worker nodes would keep running until a container needs to be recreated, since they don't have a direct functional dependency on the master node and the etcd store for their core functions, and the unavailability of the master node only affects control plane operations for the cluster.
Is my understanding correct? If not, could you please explain the impact of a master node failure on the workloads running on that cluster?
I understand that the best way to achieve HA for a k8s cluster is to set up a multi-master cluster, possibly with an external etcd store to decouple it. This question is about understanding the exact impact of a master node failure, so I can take an informed call before configuring a multi-master cluster.
Etcd operates on a quorum system, so as long as the cluster sees a majority it will continue operating. If the failed node was the current leader, the others trigger an election after the heartbeat timeout.
For kube-apiserver, it's a horizontally scaled, stateless service, so losing a node is uneventful, just like for any other webapp. Some (most) controllers are singletons, but a copy runs on every control plane node and uses kube-apiserver for leader election, so as with etcd, if the leader dies then a few seconds later another copy grabs the leader lock and takes over.
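If you want to see which replica currently holds the lock, recent Kubernetes versions expose leader election as Lease objects in kube-system (older versions used annotated Endpoints/ConfigMaps); a quick check, assuming the standard object names:

```bash
# Leader-election locks for the singleton controllers
kubectl -n kube-system get lease kube-controller-manager kube-scheduler

# The holderIdentity field names the control plane node that is currently leader
kubectl -n kube-system get lease kube-controller-manager \
  -o jsonpath='{.spec.holderIdentity}'
```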
I see that most of the K8s master components have a leader election process, except the apiserver. If only one node is the leader at any point in time, why would we need more than a 3-master cluster for a bigger k8s cluster?
The requirement of a minimum of 3 hosts comes from the fact that a Kubernetes HA cluster uses etcd for storing and syncing configuration, and etcd requires a minimum of 3 nodes to ensure HA. More generally, etcd needs a majority (⌊n/2⌋ + 1 members) to commit writes, so to tolerate f failures you need 2f + 1 members when deploying a Kubernetes HA cluster.
In a single-master setup, the master node runs the etcd database, API server, controller manager and scheduler, and manages the worker nodes. If that single master node fails, the control plane becomes unavailable: existing pods keep running on the workers, but nothing can be scheduled, scaled or repaired, and if the etcd data is lost without a backup, the cluster state is lost with it.
In a multi-master setup, by contrast, the control plane itself is highly available: the masters share the control plane load, and API requests can be served by any of them behind a load balancer.
A multi-master setup protects against a range of control plane failure modes, from the loss of a single master node to the failure of one etcd member. By providing redundancy, a multi-master cluster offers a highly available control plane to your end users.
Do not use a cluster with two master replicas. Consensus on a two-replica cluster requires both replicas to be running when changing persistent state. As a result, both replicas are needed, and a failure of either replica puts the cluster into a majority-failure state. A two-replica cluster is thus inferior, in terms of HA, to a single-replica cluster.
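For reference, the quorum arithmetic behind this advice (quorum = ⌊n/2⌋ + 1):

```text
members   quorum   failures tolerated
   1         1            0
   2         2            0   (either member failing breaks quorum)
   3         2            1
   4         3            1
   5         3            2
```

A two-member cluster tolerates zero failures, just like a single member, while doubling the chance that some member fails, which is why it is strictly worse.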
Here is useful documentation: kubernetes-ha-cluster, creating-ha-cluster.
Articles: ha-cluster, ha.
A few days ago, I looked up why none of my pods were being scheduled to the master node, and found this question: Allow scheduling of pods on Kubernetes master?
It explains that this is because the master node is tainted with the "NoSchedule" effect, and gives the command to remove that taint.
But before I execute that command on my cluster, I want to understand why it was there in the first place.
Is there a reason why the master node should not run pods? Any best practices it relates to?
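For reference, the taint and the removal command the linked question refers to look roughly like this; the node name is a placeholder, and the taint key is node-role.kubernetes.io/master on older clusters and node-role.kubernetes.io/control-plane on newer ones:

```bash
# Inspect the taint on the control plane node
kubectl describe node <master-node> | grep Taints
#   Taints: node-role.kubernetes.io/control-plane:NoSchedule

# Remove it (the trailing "-" deletes the taint); use .../master on older clusters
kubectl taint nodes <master-node> node-role.kubernetes.io/control-plane:NoSchedule-
```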
The purpose of Kubernetes is to deploy applications easily and scale them based on demand. The pod is the basic entity that runs the application, and the number of pods can be increased or decreased based on high or low demand (Horizontal Pod Autoscaler).
These workload pods need to run on worker nodes, especially if you're looking at a big application where your cluster might scale up to hundreds of nodes based on demand (Cluster Autoscaler). The growing number of pods puts pressure on your nodes, and once it does you can always add worker nodes to the cluster using the Cluster Autoscaler.
Suppose you made your master schedulable: the resulting memory and CPU pressure puts the master at risk of crashing, and mind that you can't autoscale the master with an autoscaler. This way you're putting your whole cluster at risk. If you have a single master, you will not be able to schedule anything if that master crashes. If you have 3 masters and one of them crashes, the other two have to take over the extra load of scheduling and managing worker nodes, increasing the load on themselves and hence the risk of failure.
Also, in the case of a larger cluster, you already need master nodes with substantial resources just to manage your worker nodes; you can't put the additional load of running workloads on them as well. Please have a look at setting up large clusters in Kubernetes here.
If you have a manageable workload and you know it won't grow beyond a certain level, you can make the master schedulable. However, for a production cluster it is not recommended at all.
The primary role of the master is cluster management, and many k8s components already run on it. Suppose pods are scheduled on the master without resource limits and consume all the resources (CPU or memory): then the master, and in turn the whole cluster, is at risk.
So when designing a highly available production cluster, a minimum of 3 master, 3 etcd and 3 infra nodes are created, and application pods are not scheduled on these nodes; separate worker nodes are added to carry the workload.
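To illustrate the "no resource limits" risk mentioned above, any pod that does land on a shared node should at least declare requests and limits so it cannot starve the control plane components; the name, image and values below are arbitrary:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: bounded-app              # illustrative name
spec:
  containers:
    - name: app
      image: nginx:1.25          # illustrative image
      resources:
        requests:                # what the scheduler reserves for the pod
          cpu: 100m
          memory: 128Mi
        limits:                  # hard cap enforced at runtime
          cpu: 500m
          memory: 256Mi
```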
The master is intended for cluster management tasks and should not be used to run workloads. In development and test environments it is OK to schedule pods on master servers, but in production it is better to keep the master for cluster-level management activities only and use worker nodes to schedule workloads.
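In those dev/test cases it is usually cleaner to keep the taint in place and let only specific pods tolerate it, rather than untainting the whole node; a sketch, with a hypothetical pod name and image, and the taint key depending on the Kubernetes version:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: debug-on-master          # illustrative name
spec:
  tolerations:
    # allow only this pod onto the tainted control plane node;
    # older clusters use node-role.kubernetes.io/master as the key
    - key: node-role.kubernetes.io/control-plane
      operator: Exists
      effect: NoSchedule
  containers:
    - name: shell
      image: busybox:1.36        # illustrative image
      command: ["sleep", "3600"]
```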
For HA and quorum, I will install three master/etcd nodes in three different data centers.
But I want to configure one node to never become the leader, only to act as a follower for etcd quorum.
Is this possible?
I believe that today this is not a supported option, and it is not recommended.
What you want is a 3-node control plane (including etcd) where one of the nodes participates in leader election but never becomes leader and doesn't store data. You are looking for something like the ARBITER feature that exists in a MongoDB HA cluster.
An ARBITER-like feature is not supported in etcd; you might need to raise a PR to get that addressed.
The controller manager and scheduler always connect to the local apiserver. You might want to route those calls to the apiserver on the active master; you might need to open another PR with the Kubernetes community to get that addressed.