Is it possible to promote a Kubernetes worker node to master?

Is it possible to promote a Kubernetes worker node to master, to quickly recover from the loss of a master (1 of 3) and restore fault tolerance to the cluster? Preferably without disrupting the pods already running on it. Bare metal deployment. Thanks.

It doesn't look like a worker node can be promoted to master in general. However, it is straightforward to sort out for a specific case:
Control plane node disappears from the network
Node is manually drained and deleted: k drain node2.example.com --ignore-daemonsets --delete-local-data && k delete node node2.example.com
Some time later it reboots and rejoins the cluster
Check that it has rejoined the etcd cluster:
# k exec -it etcd-node1.example.com -n kube-system -- /bin/sh
# etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
member list
506801cdae97607b, started, node1.example.com, https://65.21.128.36:2380, https://xxx:2379, false
8051adea81dc4c6a, started, node2.example.com, https://95.217.56.177:2380, https://xxx:2379, false
ccd32aaf544c8ef9, started, node3.example.com, https://65.21.121.254:2380, https://xxx:2379, false
If it is part of the cluster then re-label it:
k label node node2.example.com node-role.kubernetes.io/control-plane=
k label node node2.example.com node-role.kubernetes.io/master=
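A quick way to double-check that the labels took effect and the node shows up with a control-plane role again (node name taken from the example above):
# ROLES column should now include control-plane,master for node2
k get nodes node2.example.com --show-labels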

Related

Create same master and worker node in Kubernetes

I am preparing a dev environment and want to create a single host that acts as both master and worker node for Kubernetes.
How can I achieve this?
The difference between a master node and a worker node is that "regular pods cannot be scheduled on a master node because of a taint".
You just need to remove node-role.kubernetes.io/master:NoSchedule taint so that pods can be scheduled on that (master) node.
Following is the command:
kubectl taint nodes <masternodename> node-role.kubernetes.io/master:NoSchedule-
The master node is responsible for running several Kubernetes processes that are absolutely necessary to run and manage the cluster properly. [1]
The worker nodes are the part of the Kubernetes clusters which actually execute the containers and applications on them. [1]
Worker nodes are generally more powerful than master nodes because they have to run hundreds of containers on them. However, master nodes hold more significance because they manage the distribution of workload and the state of the cluster. [1]
By removing taint you will be able to schedule pods on that node.
You should firstly check the present taint by running:
kubectl describe node <nodename> | grep Taints
If the node in question is a master node, you should remove that taint by running:
kubectl taint node <mastername> node-role.kubernetes.io/master:NoSchedule-
References:
[1] - What is Kubernetes cluster? What are worker and master nodes?
See also:
Creating a cluster with kubeadm,
These four similar questions:
Master tainted - no pods can be deployed
Remove node-role.kubernetes.io/master:NoSchedule taint,
Allow scheduling of pods on Kubernetes master?
Are the master and worker nodes the same node in case of a single node cluster?
Taints and Tolerations.
You have to remove the NoSchedule taint from the MASTER node.
I just spun up a kubeadm cluster and the taint on my node is control-plane, not master.
So I did the following (sydney is the node name):
$ kubectl describe node sydney | grep Taints
Taints: node-role.kubernetes.io/control-plane:NoSchedule
$ kubectl taint nodes sydney node-role.kubernetes.io/control-plane:NoSchedule-
node/sydney untainted
$ kubectl describe node sydney | grep Taints
Taints: <none>
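Depending on the Kubernetes version, the control plane node may carry the control-plane taint, the master taint, or both, so a hedged catch-all is to try removing each in turn (removing a taint that is not set just returns a "not found" error, which can be ignored):
kubectl taint nodes --all node-role.kubernetes.io/control-plane-
kubectl taint nodes --all node-role.kubernetes.io/master-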

Migrating Kubernetes cluster to other OpenStack region

I am trying to migrate a Kubernetes cluster (master and worker instances) to a different OpenStack region. I managed to start the cluster after some simple modifications (changed cloud-config, node labels). There is one problem left - storage. In this setup I use the OpenStack internal cloud provider, which manages Cinder volumes as PVs for pods. The new region uses different zone names and volume types. Also, the volume IDs have changed. It is not possible to change these values by modifying the SC and PV definitions via e.g. kubectl.
I wonder if it is possible to change this directly in etcd database?
So far, I have tried to modify the PV definition, but it appears that Kubernetes also stores additional characters, so modifying it is not so straightforward.
What I did:
Got the PV definition from etcd and saved it to a file:
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.4.3-0 etcdctl \
--cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://kube-dev02-master01:2379 \
get /registry/persistentvolumes/pvc-1625baa0-e36c-4e2b-ad3d-0dfecc910ae0 --print-value-only > pv1.txt
I changed region name, zone name and volume id (with vi).
Loaded modified value to etcd:
docker run --rm -it --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.4.3-0 etcdctl \
--cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://kube-dev02-master01:2379 \
put /registry/persistentvolumes/pvc-1625baa0-e36c-4e2b-ad3d-0dfecc910ae0 "$(cat pv1.txt)"
Checked PV from kubectl:
[kubeadmin@kube-dev02-master01 ~]$ kubectl get pv
Error from server: illegal base64 data at input byte 5
So it seems that something could be wrong with encoding, but I do not know where.
Output of PV value stored in etcd
Kubernetes v1.17.5, etcd v.3.4.3-0 installed with Kubeadm.
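For reference, Kubernetes stores objects in etcd in a binary protobuf envelope by default rather than plain JSON, so a value dumped with --print-value-only into a text file and edited with vi may not survive the round trip intact. A quick way to see what the stored value actually looks like, reusing the same etcdctl invocation as above (if xxd is not installed, od -c works too):
docker run --rm -i --net host -v /etc/kubernetes:/etc/kubernetes k8s.gcr.io/etcd:3.4.3-0 etcdctl \
--cert /etc/kubernetes/pki/etcd/healthcheck-client.crt --key /etc/kubernetes/pki/etcd/healthcheck-client.key \
--cacert /etc/kubernetes/pki/etcd/ca.crt --endpoints https://kube-dev02-master01:2379 \
get /registry/persistentvolumes/pvc-1625baa0-e36c-4e2b-ad3d-0dfecc910ae0 --print-value-only | head -c 32 | xxd
# A protobuf-encoded object starts with a "k8s" magic prefix followed by binary
# data; if that is what shows up here, the value cannot safely be edited as text.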

How to remove NotReady nodes from kubernetes cluster automatically

I'm running a Kubernetes cluster on bare metal servers, and my cluster nodes keep getting added and removed regularly. But when a node is removed, Kubernetes does not remove it automatically from the nodes list, and kubectl get nodes keeps showing NotReady nodes. Is there any automated way to achieve this? I want similar behavior for nodes as Kubernetes provides for pods.
To remove a node, follow the steps below.
Run on Master
# kubectl cordon <node-name>
# kubectl drain <node-name> --force --ignore-daemonsets --delete-emptydir-data
# kubectl delete node <node-name>
You can use this little bash command, or set it as a cron-job.
kubectl delete node $(kubectl get nodes | grep NotReady | awk '{print $1;}')
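The one-liner errors out when there is nothing to delete, so a slightly more defensive variant for a cron job could look like this (same idea, it just does nothing when every node is Ready):
# Delete only nodes whose STATUS contains NotReady; no-op otherwise.
for node in $(kubectl get nodes --no-headers | awk '$2 ~ /NotReady/ {print $1}'); do
  kubectl delete node "$node"
done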

Argo Workflow distribution on KOPS cluster

Using KOPS tool, I deployed a cluster with:
1 Master
2 slaves
1 Load Balancer
Now, I am trying to deploy an Argo Workflow, but I don't know the process. Will it be installed on a worker node or on the master of the k8s cluster I built? How does it work?
Basically, if anyone can describe the functional flow or the steps of deploying an Argo Workflow on Kubernetes, it would be nice. First, I need to understand where it is deployed: on the master or on a worker node?
Usually, kops creates a Kubernetes cluster with taints on the master node that prevent regular pods from being scheduled on it.
However, there was an issue with some cluster network implementations, and sometimes you get a cluster without taints on the master.
You can change taints on the master node by running the following commands:
add taints (no pods on master):
kubectl taint node kube-master node-role.kubernetes.io/master:NoSchedule
remove taints (allow to schedule pods on master):
kubectl taint nodes --all node-role.kubernetes.io/master-
If you want to know whether the taints are applied to the master node or not, run the following command (the --export flag is deprecated in recent kubectl versions and can simply be omitted):
kubectl get node node-master --export -o yaml
Find the spec: section. If the taints are present, you should see something like this:
...
spec:
  externalID: node-master
  podCIDR: 192.168.0.0/24
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
...
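Alternatively, a one-liner that prints just the taints (empty output means no taints are set; node name as in the example above):
kubectl get node node-master -o jsonpath='{.spec.taints}'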

How to gracefully remove a node from Kubernetes?

I want to scale up/down the number of machines to increase/decrease the number of nodes in my Kubernetes cluster. When I add one machine, I’m able to successfully register it with Kubernetes; therefore, a new node is created as expected. However, it is not clear to me how to smoothly shut down the machine later. A good workflow would be:
Mark the node related to the machine that I am going to shut down as unschedulable;
Start the pod(s) that are running on that node on other node(s);
Gracefully delete the pod(s) that are running on that node;
Delete the node.
If I understood correctly, even kubectl drain (discussion) doesn't do what I expect, since it doesn't start the pods before deleting them (it relies on a replication controller to start the pods afterwards, which may cause downtime). Am I missing something?
How should I properly shutdown a machine?
List the nodes and get the <node-name> you want to drain (i.e. remove from the cluster):
kubectl get nodes
1) First drain the node
kubectl drain <node-name>
You might have to ignore daemonsets and local-data in the machine
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
2) Edit instance group for nodes (Only if you are using kops)
kops edit ig nodes
Set the MIN and MAX size to one less than their current value (see the illustrative manifest fragment after these steps)
Just save the file (nothing extra to be done)
You might still see some pods on the drained node that belong to DaemonSets, such as the networking plugin, fluentd for logs, kube-dns/CoreDNS, etc.
3) Finally delete the node
kubectl delete node <node-name>
4) Commit the state for KOPS in s3: (Only if you are using kops)
kops update cluster --yes
OR (if you are using kubeadm)
If you are using kubeadm and would like to reset the machine to the state it was in before running kubeadm join, then run:
kubeadm reset
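For step 2, the fields to change in the InstanceGroup manifest opened by kops edit ig nodes look roughly like this (values are illustrative; only minSize and maxSize matter here):
# Illustrative InstanceGroup spec fragment: lower minSize and maxSize by 1,
# e.g. from 3 nodes down to 2.
spec:
  machineType: t3.medium
  maxSize: 2
  minSize: 2
  role: Node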
Find the node with kubectl get nodes. We'll assume the name of the node to be removed is "mynode"; replace that with the actual node name going forward.
Drain it with kubectl drain mynode
Delete it with kubectl delete node mynode
If using kubeadm, run on “mynode” itself kubeadm reset
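The same four steps as concrete commands, with "mynode" as the placeholder node name (on older kubectl versions --delete-emptydir-data is spelled --delete-local-data):
kubectl get nodes
kubectl drain mynode --ignore-daemonsets --delete-emptydir-data
kubectl delete node mynode
# then, on mynode itself (only if it was joined with kubeadm):
kubeadm reset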
Rafael, kubectl drain does work as you describe. There is some downtime, just as if the machine crashed.
Can you describe your setup? How many replicas do you have, and are you provisioned such that you can't handle any downtime of a single replica?
If the cluster is created by kops:
1. kubectl drain <node-name>
now all the pods will be evicted
ignore daemonsets:
2. kubectl drain <node-name> --ignore-daemonsets --delete-local-data
3. kops edit ig nodes-3 --state=s3://bucketname
set max and min value of the instance group to 0
4. kubectl delete node <node-name>
5. kops update cluster --state=s3://bucketname --yes
Rolling update if required:
6. kops rolling-update cluster --state=s3://bucketname --yes
validate cluster:
7. kops validate cluster --state=s3://bucketname
Now the instance will be terminated.
The below command only works if you have a lot of replicas, disruption budgets, etc. - but helps a lot with improving cluster utilization. In our cluster we have integration tests kicked off throughout the day (pods run for an hour and then spin down) as well as some dev-workload (runs for a few days until a dev spins it down manually). I am running this every night and get from ~100 nodes in the cluster down to ~20 - which adds up to a fair amount of savings:
for node in $(kubectl get nodes -o name | cut -d "/" -f2); do
  kubectl drain --ignore-daemonsets --delete-emptydir-data "$node";
  kubectl delete node "$node";
done
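A hedged variant of the same loop that leaves control-plane/master nodes alone (assuming they carry the usual node-role.kubernetes.io labels):
# Drain and delete only nodes that do NOT have a control-plane or master role label.
for node in $(kubectl get nodes -o name \
    -l '!node-role.kubernetes.io/control-plane,!node-role.kubernetes.io/master' \
    | cut -d "/" -f2); do
  kubectl drain --ignore-daemonsets --delete-emptydir-data "$node"
  kubectl delete node "$node"
done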
Remove worker node from Kubernetes
kubectl get nodes
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
When draining a node, there is a risk that the remaining nodes end up unbalanced and that some workloads suffer downtime. The purpose of this method is to keep the load balanced between nodes as much as possible while also avoiding downtime.
# Mark the node as unschedulable.
echo Mark the node as unschedulable $NODENAME
kubectl cordon $NODENAME
# Get the list of namespaces running on the node.
NAMESPACES=$(kubectl get pods --all-namespaces -o custom-columns=:metadata.namespace --field-selector spec.nodeName=$NODENAME | sort -u | sed -e "/^ *$/d")
# Force a rollout restart of the Deployments in each of those namespaces.
# Since the node is unschedulable, Kubernetes schedules the replacement
# pods on other nodes automatically.
for NAMESPACE in $NAMESPACES
do
  echo deployment restart for $NAMESPACE
  kubectl rollout restart deployment -n "$NAMESPACE"
done
# Wait for the Deployment rollouts to finish.
for NAMESPACE in $NAMESPACES
do
  echo deployment status for $NAMESPACE
  for DEPLOYMENT in $(kubectl get deployments -n "$NAMESPACE" -o name); do
    kubectl rollout status "$DEPLOYMENT" -n "$NAMESPACE"
  done
done
# Drain node to be removed
kubectl drain $NODENAME
There were some strange behaviors for me with kubectl drain. Here are my extra steps; otherwise, DATA WILL BE LOST in my case!
Short answer: CHECK THAT no PersistentVolume is mounted to this node. If there are some PVs, see the following description to remove them.
When executing kubectl drain, I noticed that some Pods were not evicted (they just did not appear in the logs like evicting pod xxx).
In my case, some were pods with soft anti-affinity (so they do not like to go to the remaining nodes), and some were pods of a StatefulSet of size 1 that wants to keep at least 1 pod.
If I directly delete that node (using the commands mentioned in other answers), data will be lost because those pods have some PersistentVolumes, and deleting a Node also deletes its PersistentVolumes (if using some cloud providers).
Thus, please manually delete those pods one by one. After they are deleted, Kubernetes will re-schedule the pods to other nodes (because this node is SchedulingDisabled).
After deleting all pods (excluding DaemonSets), please CHECK THAT no PersistentVolume is mounted to this node.
Then you can safely delete the node itself :)
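One hedged way to check for pods on the node that mount PersistentVolumeClaims before deleting it (NODENAME is a placeholder for the node name):
# List every pod scheduled on the node together with any PVCs it mounts;
# pods that show a claim name are the ones to delete manually first.
NODENAME=mynode
kubectl get pods --all-namespaces --field-selector spec.nodeName=$NODENAME \
  -o custom-columns=NAMESPACE:.metadata.namespace,NAME:.metadata.name,PVC:.spec.volumes[*].persistentVolumeClaim.claimName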