I have a K8s cluster with 3 nodes (VMs) acting as both master and worker ("untainted masters"), with etcd running on all 3 of them, installed via kubespray (a kubeadm-based tool).
Now I would like to replace one VM with another.
Is there a direct way to do so? The only workaround I can see is adding 2 nodes (e.g. node4 and node5) via kubespray's scale.yml so that the number of etcd members always stays odd, and then removing node3 and node5 while keeping node4.
I don't like that approach.
Any ideas are welcome.
Best Regards
If you have 3 main (please avoid using master, 💪) control plane nodes you should be fine replacing 1 at a time. The only thing is that while one node is down you have no failure margin left: if a second node were to fail, your cluster would not be able to make any decisions or schedule any new workloads, although the existing workloads would keep running fine.
The recommendation of 5 main nodes is based on the fact that you will still have a majority to reach quorum on the state decisions 📝 for etcd even if a node goes down while another one is being replaced. So with 5 nodes you can lose one and still afford a further failure while continuing to schedule/run workloads.
In other words:
3 main nodes
Quorum is 2, so the cluster can tolerate the failure of one node and still make decisions.
A second failure breaks quorum and no decisions can be made.
5 main nodes
Quorum is 3, so the cluster can tolerate the failure of two nodes.
It can still make decisions because at least 3 nodes remain available.
A third failure breaks quorum.
To summarize, Raft 🚣, the consensus protocol used by etcd, tolerates up to (N-1)/2 failures and requires a majority, or quorum, of (N/2)+1 members (both rounded down). The recommended procedure is to update the nodes one at a time: bring one node down, then bring another one up, and wait for it to join the cluster (all control plane components) before moving on to the next node.
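For what it's worth, here is a rough sketch of that procedure with kubespray; the inventory path is the usual sample layout, node3/node4 follow the naming from the question, and the exact playbook flags for etcd/control plane nodes vary between kubespray releases, so check the "adding/replacing a node" doc for your version:
# 1. Remove the node being replaced (one at a time, as described above)
ansible-playbook -i inventory/mycluster/hosts.yaml -b remove-node.yml -e node=node3
# 2. Add node4 to the inventory (kube_control_plane, etcd and kube_node groups) and deploy it;
#    for etcd/control plane nodes kubespray uses cluster.yml rather than scale.yml
ansible-playbook -i inventory/mycluster/hosts.yaml -b cluster.yml
# 3. Verify that etcd is back to 3 healthy members (pass your usual endpoint/TLS flags)
etcdctl member list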
I am new to Kubernetes and clustering.
I would like to bring up a High Availability, master-only Kubernetes cluster (it need not have separate worker nodes).
I have 2 instances/servers running the Kubernetes daemons, with different kinds of pods running on both nodes.
Now I would like to somehow create the cluster so that if one of the hosts (host 2) goes down, all the pods from that host move to the other host (host 1).
Once host 2 comes back up, the pods should float back.
Please let me know if there is any way I can achieve this.
Since your requirement is to have a 2-node, master-only cluster with HA capabilities, there is unfortunately no straightforward way to achieve it.
The reason is that a 2-node, master-only cluster deployed by kubeadm has only 2 etcd pods (one on each node), which gives you no fault tolerance: if one of the nodes goes down, the etcd cluster loses quorum and the remaining k8s master won't be able to operate.
Now, if you were OK with having an external etcd cluster where you can maintain an odd number of etcd members, then yes, you can have a 2-node k8s cluster and still have HA capabilities.
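For illustration, a minimal sketch of what pointing kubeadm at such an external etcd cluster looks like; the endpoints and certificate paths are placeholders and the API version may need adjusting for your kubeadm release:
cat <<EOF > kubeadm-config.yaml
apiVersion: kubeadm.k8s.io/v1beta3
kind: ClusterConfiguration
controlPlaneEndpoint: "LOAD_BALANCER_DNS:6443"
etcd:
  external:
    endpoints:                                  # your 3 (or 5) external etcd members
      - https://ETCD_0_IP:2379
      - https://ETCD_1_IP:2379
      - https://ETCD_2_IP:2379
    caFile: /etc/kubernetes/pki/etcd/ca.crt
    certFile: /etc/kubernetes/pki/apiserver-etcd-client.crt
    keyFile: /etc/kubernetes/pki/apiserver-etcd-client.key
EOF
kubeadm init --config kubeadm-config.yaml --upload-certs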
It is possible for a master node to also serve as a worker node; however, it is not advisable in production environments, mainly for performance reasons.
By default, kubeadm configures the master node so that no workload can run on it; only the regular nodes added later handle workloads. But you can easily override this default behaviour.
In order to allow workloads to be scheduled on the master node as well, you need to remove the following taint, which is added by default:
kubectl taint nodes --all node-role.kubernetes.io/master-
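Note that on newer Kubernetes/kubeadm versions control plane nodes are tainted with node-role.kubernetes.io/control-plane instead of node-role.kubernetes.io/master, so depending on your version the command becomes:
kubectl taint nodes --all node-role.kubernetes.io/control-plane-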
To install and configure a multi-master Kubernetes cluster you can follow this tutorial. It describes a scenario with 3 master nodes, but you can easily customize it to your needs.
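As a very rough outline of such a multi-master setup (a sketch only, assuming a stacked-etcd kubeadm cluster behind a load balancer; LOAD_BALANCER_DNS, the token, hash and certificate key are placeholders printed by kubeadm itself):
# on the first control plane node
kubeadm init --control-plane-endpoint "LOAD_BALANCER_DNS:6443" --upload-certs
# on each additional control plane node, using the join command printed by kubeadm init
kubeadm join LOAD_BALANCER_DNS:6443 --token <token> \
    --discovery-token-ca-cert-hash sha256:<hash> \
    --control-plane --certificate-key <key>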
For HA and quorum I will install three master/etcd nodes in three different data centers.
But I want to configure one node to never become the leader, only ever act as a follower for etcd quorum.
Is this possible?
I believe that today this is not a supported option and is not recommended.
What you want is a 3-node control plane (including etcd) where one of the nodes participates in leader election but never becomes the leader and doesn't store data. You are looking for something like the ARBITER feature that exists in MongoDB HA clusters.
Such an arbiter feature is not supported in etcd. You might need to raise a PR to get that addressed.
The controller manager and scheduler always connect to the local apiserver. You might want to route those calls to the apiserver on the active master instead; you might need to open another PR with the Kubernetes community to get that addressed.
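For what it's worth, you can at least inspect, and manually transfer, the current etcd leadership; a sketch assuming etcdctl v3 and your usual endpoint/TLS flags:
etcdctl endpoint status --cluster -w table   # the IS LEADER column shows the current leader
etcdctl move-leader <member-id>              # hand leadership over to another member
There is no supported way to permanently exclude a member from becoming leader, though.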
I have a Kubernetes HA environment with three masters. As a test I shut down two masters (killed the apiserver/kcm/scheduler processes), and the remaining master still worked well: I could use kubectl to create a deployment successfully, and some pods were scheduled to different nodes and started. So can anyone explain why an odd number of masters is advised? Thanks.
Because if you have an even number of servers, it's a lot easier to end up in a situation where the network breaks and you have exactly 50% on each side, in which case neither partition has a majority and the cluster stalls. With an odd number you can't have an exact 50/50 split, so (barring a multi-way partition) one side will always hold the majority and keep working.
Short answer: to have higher fault tolerance for etcd.
Etcd uses Raft for leader election. An etcd cluster needs a majority of nodes, a quorum, to agree on a leader. For a cluster with n members, the quorum is (n/2)+1 (integer division).
In terms of fault tolerance, adding an additional node to an odd-sized cluster buys you nothing and actually makes things worse. How? The same number of nodes may fail without losing quorum as before, but there are now more nodes that can fail, which means the probability of losing quorum is actually higher than before.
For fault tolerance, please check this official etcd doc for more information.
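For reference, the numbers work out like this (the same fault tolerance table appears in the etcd docs):
Cluster size 1 -> quorum 1, tolerates 0 failures
Cluster size 2 -> quorum 2, tolerates 0 failures
Cluster size 3 -> quorum 2, tolerates 1 failure
Cluster size 4 -> quorum 3, tolerates 1 failure
Cluster size 5 -> quorum 3, tolerates 2 failures
Cluster size 6 -> quorum 4, tolerates 2 failures
Cluster size 7 -> quorum 4, tolerates 3 failures
Notice that going from 3 to 4 (or from 5 to 6) raises the quorum without raising the failure tolerance.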
I am currently working with GPUs, and since they are expensive I want them to scale up and down depending on the load. However, scaling up the cluster and preparing a node takes around 8 minutes, since it installs the drivers and does some other preparation.
So to solve this problem, I want to let one node stay in an idle state and autoscale the rest of the nodes. Is there any way to do that?
This way, when a request comes in, the idle node will take it and a new idle node will be created.
Thanks!
There are three different approaches:
1 - The first approach is entirely manual. It helps you keep a node in an idle state without incurring downtime for your application during the autoscaling process.
You would have to prevent one specific node (let's call it "node A") from being autoscaled. Create a new node and replicate node A's pods onto that new node.
Node A will keep running while it is not part of the autoscaling process.
Once the autoscaling process is complete and the boot is finished, you may safely drain that node.
a. Create a new node.
b. Prevent the autoscaler from evicting node A's pods by adding the annotation "cluster-autoscaler.kubernetes.io/safe-to-evict": "false" to those pods.
c. Replicate node A's pods onto the new node.
d. Once the autoscaler has scaled all the nodes and the boot time has completed, you may safely drain node A and delete it.
2 - You could use a Pod Disruption Budget.
3 - If you would like to prevent node A from being deleted when the autoscaler scales down, you could set the annotation "cluster-autoscaler.kubernetes.io/scale-down-disabled": "true" on that particular node. This only applies during the scale-down process. The corresponding commands are sketched below.
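For illustration, the commands behind approaches 1 and 3 would look roughly like this (a sketch; "node-a" and "my-gpu-pod" are placeholder names):
# approach 3: keep node-a out of scale-down
kubectl annotate node node-a cluster-autoscaler.kubernetes.io/scale-down-disabled=true
# approach 1, step b: stop the autoscaler from evicting the pods currently on node-a
kubectl annotate pod my-gpu-pod cluster-autoscaler.kubernetes.io/safe-to-evict=false
# approach 1, step d: once the new capacity is up, move the workloads off and remove node-a
kubectl drain node-a --ignore-daemonsets
kubectl delete node node-a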
I am currently working on a Kube deployment. Unfortunately we have only 2 locations, and the requirement is that we need to tolerate the failure of one location. It does not have to be automatic or zero-downtime (so it is not expected to be HA; we just want disaster recovery).
I am completely aware that a 2-master cluster has lower availability than a 1-master cluster. But the idea is that with two masters, when one location fails (or is taken down), an admin will spin up a new master in the surviving location, restoring the majority.
Expected advantage over a single master: the state of the cluster should survive a location failure (we also have Postgres and other stateful components in Kube, and we want Postgres to promote the slave to master if the node with the master is taken down).
Is this approach sane? Or is there some other way to deploy Kube across 2 locations?