How to remove NotReady nodes from kubernetes cluster automatically - kubernetes

I'm running a Kubernetes cluster on bare-metal servers, and nodes are added to and removed from it regularly. But when a node is removed, Kubernetes does not remove it from the node list automatically, and kubectl get nodes keeps showing it as NotReady. Is there an automated way to achieve this? I want nodes to be cleaned up the way Kubernetes cleans up pods.

To remove a node, follow the steps below.
Run on the master:
kubectl cordon <node-name>
kubectl drain <node-name> --force --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>

You can use this little bash command, or set it as a cron-job.
kubectl delete node $(kubectl get nodes | grep NotReady | awk '{print $1;}')
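The one-liner above calls kubectl delete node with no arguments when nothing is NotReady, which makes kubectl exit with an error, and grep matches NotReady anywhere on the line. A slightly more defensive sketch follows; the function names are hypothetical, and it assumes the second column of kubectl get nodes --no-headers is the STATUS column.

```shell
# Print the names of nodes whose STATUS column contains NotReady.
# Reads `kubectl get nodes --no-headers` output on stdin.
not_ready_nodes() {
  awk '$2 ~ /(^|,)NotReady(,|$)/ { print $1 }'
}

# Delete every NotReady node; does nothing when the list is empty.
delete_not_ready_nodes() {
  kubectl get nodes --no-headers | not_ready_nodes | while read -r node; do
    kubectl delete node "$node"
  done
}
```

A cron job could source this file and call delete_not_ready_nodes. Note that it also deletes nodes that are only temporarily NotReady, so it suits clusters where a NotReady node is not expected to come back.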

Related

How to remove the pods of a removed nodes

I have removed and deleted a node from a k8s cluster using the following commands:
kubectl drain worker1 --ignore-daemonsets
kubectl delete node worker1
After that, I saw that the kube-proxy and weave DaemonSet pods (both for worker1) still existed, even though the node was drained and deleted. That is expected, since I ignored the DaemonSets.
How can I remove these pods now that the node (worker1) is drained and deleted?
Thank you
Find the name of the pod that is scheduled on the deleted node and delete it using kubectl delete pods <pod_name> --grace-period=0 --force -n <namespace>
Use the command below to display more details about the pods, including the node on which each pod is scheduled:
kubectl get pods -n <namespace> -o wide
You could also run kubeadm reset on that node. Please note that this reverts the changes that kubeadm init or kubeadm join made on that host; it does not uninstall the Kubernetes packages themselves.
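To find the leftover pods across all namespaces in one pass, the lookup and the forced delete can be combined. This is a minimal sketch with hypothetical function names; it assumes kubectl is configured and uses custom columns so that each output line is exactly "namespace pod node".

```shell
# Print "namespace pod" pairs for pods bound to node $1.
# Reads lines of "namespace pod node" on stdin.
pods_on_node() {
  awk -v node="$1" '$3 == node { print $1, $2 }'
}

# Force-delete every pod that is still bound to node $1.
force_delete_pods_on_node() {
  kubectl get pods --all-namespaces --no-headers \
    -o custom-columns=NS:.metadata.namespace,NAME:.metadata.name,NODE:.spec.nodeName |
    pods_on_node "$1" |
    while read -r ns pod; do
      kubectl delete pod "$pod" -n "$ns" --grace-period=0 --force
    done
}
```

For the question above, the call would be force_delete_pods_on_node worker1.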

How to simulate nodeNotReady for a node in Kubernetes

My ceph cluster is running on AWS with 3 masters 3 workers configuration. When I do kubectl get nodes it shows me all the nodes in the ready state.
Is there any way I can manually simulate a nodeNotReady error for a node?
Just stop the kubelet service on the node that you want to see as NodeNotReady.
If you just want NodeNotReady, you can delete the CNI you have installed.
Run kubectl get all -n kube-system, find the DaemonSet of your CNI and delete it, or just reverse the installation: kubectl delete -f link_to_your_CNI_yaml
You could also try to overwhelm the node with too many pods (resources). You can also share your main goal so we can adjust the answer.
About the answer from P Ekambaram: you could just ssh to a node and then stop the kubelet.
To do that in kops you can just:
ssh -A admin@Node_PublicDNS_name
systemctl stop kubelet
EDIT:
Another way is to overload the node, which causes System OOM encountered and puts the node into the NotReady state.
This is just one way to achieve it:
SSH into the node you want to get into NotReady
Install stress
Run stress: stress --cpu 8 --io 4 --hdd 10 --vm 4 --vm-bytes 1024M --timeout 5m (you can adjust the values of course)
Wait until the node crashes.
After you stop the stress, the node should return to a healthy state automatically.
Not sure what the purpose of simulating NotReady is.
If the purpose is to not schedule any new pods, then you can use kubectl cordon NODE_NAME. This will add the unschedulable taint to the node and prevent new pods from being scheduled there.
If the purpose is to evict existing pods, then you can use kubectl drain NODE_NAME
In general, you can play with taints and tolerations to achieve goals like the above, and you can do much more with those!
A NotReady node carries the taint node.kubernetes.io/not-ready, which is set automatically by the node controller:
In version 1.13, the TaintBasedEvictions feature is promoted to beta and enabled by default, hence the taints are automatically added by the NodeController
Therefore, if you manually set that taint with kubectl taint node NODE_NAME node.kubernetes.io/not-ready=:NoExecute, the NodeController will reset it automatically!
So to absolutely see the NotReady status this is the best way
Lastly, if you want to remove your networking in a particular node then you can taint it like this kubectl taint node NODE_NAME dedicated/not-ready=:NoExecute
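Whichever method is used, the effect can be confirmed by polling the node's Ready condition; kubectl get nodes shows NotReady whenever that condition is not True. The following is a minimal sketch, assuming kubectl is configured. The function names, the attempt count and the delay are hypothetical choices.

```shell
# Print the status of the node's Ready condition: True, False or Unknown.
node_ready_status() {
  kubectl get node "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Ready")].status}'
}

# Poll until the Ready condition is no longer True.
# $1 = node name, $2 = attempts (default 30), $3 = delay in seconds (default 10).
# An empty result (for example when kubectl fails) also counts as not ready.
wait_for_not_ready() {
  node=$1
  attempts=${2:-30}
  delay=${3:-10}
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if [ "$(node_ready_status "$node")" != "True" ]; then
      return 0
    fi
    sleep "$delay"
    i=$((i + 1))
  done
  return 1
}
```

After stopping the kubelet, wait_for_not_ready NODE_NAME returns 0 once the control plane has noticed that the node stopped reporting.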

Does kubectl drain remove pod first or create pod first

Kubernetes version 1.12.3. Does kubectl drain remove pod first or create pod first.
You can use kubectl drain to safely evict all of your pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.)
When kubectl drain returns successfully, it means all the pods have been removed from that node and it is safe to bring the node down (physically shut it off, or start maintenance).
Now, if you turn the machine back on and want to schedule pods on that node again, you need to run:
kubectl uncordon <node name>
So kubectl drain removes the pods from the node first and does not schedule any pods on it until you uncordon the node. Replacement pods are created on other nodes by the controllers that own the evicted pods.
kubectl drain will ignore certain system pods on the node that cannot be killed.
The given node will be marked unschedulable to prevent new pods from arriving.
When you are ready to put the node back into service, use kubectl uncordon, which will make the node schedulable again.
For more details, use the command:
kubectl drain --help
With this, I hope you get the information you are looking for.
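The drain, maintain, uncordon sequence described above can be wrapped so that the node is always made schedulable again, even when the maintenance step fails. This is only a sketch: with_node_drained, run and the DRY_RUN variable are hypothetical names, and --ignore-daemonsets is assumed to be wanted.

```shell
# Run a command, or only print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Drain node $1, run the remaining arguments as the maintenance command,
# then uncordon the node and return the maintenance command's exit code.
with_node_drained() {
  node=$1
  shift
  run kubectl drain "$node" --ignore-daemonsets || return 1
  run "$@"
  rc=$?
  run kubectl uncordon "$node"
  return "$rc"
}
```

Setting DRY_RUN=1 before calling with_node_drained worker1 echo upgrading prints the three commands in order without touching the cluster.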

Kubernetes: How to gracefully delete pods in daemonset?

When the Docker image is updated, the rolling update strategy updates all the pods in a DaemonSet one by one. Similarly, is it possible to restart the pods gracefully without any changes to the DaemonSet config, or can a restart be triggered explicitly?
Currently, I am doing it manually with
kubectl delete pod <pod-name>
one by one, waiting until each pod gets into the Running state.
You could try and use Node maintenance operations:
Use kubectl drain to gracefully terminate all pods on the node while marking the node as unschedulable (with --ignore-daemonsets, from Konstantin Vustin's comment):
kubectl drain $NODENAME --ignore-daemonsets
This keeps new pods from landing on the node while you are trying to get them off.
Then:
Make the node schedulable again:
kubectl uncordon $NODENAME
To trigger a restart of all pods managed by DaemonSets in namespace [namespace_name]:
kubectl rollout restart daemonset -n [namespace_name]

How to gracefully remove a node from Kubernetes?

I want to scale up/down the number of machines to increase/decrease the number of nodes in my Kubernetes cluster. When I add one machine, I’m able to successfully register it with Kubernetes; therefore, a new node is created as expected. However, it is not clear to me how to smoothly shut down the machine later. A good workflow would be:
Mark the node related to the machine that I am going to shut down as unschedulable;
Start the pod(s) that is running in the node in other node(s);
Gracefully delete the pod(s) that is running in the node;
Delete the node.
If I understood correctly, even kubectl drain (discussion) doesn't do what I expect since it doesn’t start the pods before deleting them (it relies on a replication controller to start the pods afterwards which may cause downtime). Am I missing something?
How should I properly shutdown a machine?
List the nodes and get the <node-name> of the node you want to drain (or remove from the cluster)
kubectl get nodes
1) First drain the node
kubectl drain <node-name>
You might have to ignore DaemonSets and local data on the machine (newer kubectl releases name the second flag --delete-emptydir-data)
kubectl drain <node-name> --ignore-daemonsets --delete-local-data
2) Edit the instance group for nodes (only if you are using kops)
kops edit ig nodes
Set the MIN and MAX size to the current value minus 1
Just save the file (nothing extra to be done)
You might still see some pods on the drained node that belong to DaemonSets, such as the networking plugin, fluentd for logs, kubedns/coredns, etc.
3) Finally delete the node
kubectl delete node <node-name>
4) Commit the state for kops in S3 (only if you are using kops)
kops update cluster --yes
OR (if you are using kubeadm)
If you are using kubeadm and would like to reset the machine to the state it was in before running kubeadm join, then run
kubeadm reset
Find the node with kubectl get nodes. We’ll assume the name of node to be removed is “mynode”, replace that going forward with the actual node name.
Drain it with kubectl drain mynode
Delete it with kubectl delete node mynode
If using kubeadm, run on “mynode” itself kubeadm reset
Rafael. kubectl drain does work as you describe. There is some downtime, just as if the machine crashed.
Can you describe your setup? How many replicas do you have, and are you provisioned such that you can't handle any downtime of a single replica?
If the cluster was created by kops:
1. kubectl drain <node-name>
Now all the pods will be evicted.
To ignore DaemonSets:
2. kubectl drain <node-name> --ignore-daemonsets --delete-local-data
3. kops edit ig nodes-3 --state=s3://bucketname
Set the max and min values of the instance group to 0.
4. kubectl delete node <node-name>
5. kops update cluster --state=s3://bucketname --yes
Rolling update, if required:
6. kops rolling-update cluster --state=s3://bucketname --yes
Validate the cluster:
7. kops validate cluster --state=s3://bucketname
Now the instance will be terminated.
The below command only works if you have a lot of replicas, disruption budgets, etc. - but helps a lot with improving cluster utilization. In our cluster we have integration tests kicked off throughout the day (pods run for an hour and then spin down) as well as some dev-workload (runs for a few days until a dev spins it down manually). I am running this every night and get from ~100 nodes in the cluster down to ~20 - which adds up to a fair amount of savings:
for node in $(kubectl get nodes -o name| cut -d "/" -f2); do
kubectl drain --ignore-daemonsets --delete-emptydir-data $node;
kubectl delete node $node;
done
Remove worker node from Kubernetes
kubectl get nodes
kubectl drain <node-name> --ignore-daemonsets
kubectl delete node <node-name>
When draining a node, we risk leaving the remaining nodes unbalanced and causing downtime for some processes. The purpose of this method is to keep the load balanced across nodes as much as possible, in addition to avoiding downtime.
# NODENAME must be set to the node being removed.
# Mark the node as unschedulable.
echo "Mark the node as unschedulable: $NODENAME"
kubectl cordon "$NODENAME"
# Get the list of namespaces with pods running on the node.
NAMESPACES=$(kubectl get pods --all-namespaces -o custom-columns=:metadata.namespace --field-selector spec.nodeName="$NODENAME" | sort -u | sed -e "/^ *$/d")
# Force a rollout of the deployments in each namespace.
# Since the node is unschedulable, Kubernetes allocates
# the pods on other nodes automatically.
# "name" is a placeholder; replace it with the deployment to restart.
for NAMESPACE in $NAMESPACES
do
  echo "deployment restart for $NAMESPACE"
  kubectl rollout restart deployment/name -n "$NAMESPACE"
done
# Wait for the deployment rollouts to finish.
for NAMESPACE in $NAMESPACES
do
  echo "deployment status for $NAMESPACE"
  kubectl rollout status deployment/name -n "$NAMESPACE"
done
# Drain the node to be removed.
kubectl drain "$NODENAME"
I saw some strange behavior when running kubectl drain. Here are my extra steps; without them, DATA WILL BE LOST in my case!
Short answer: CHECK THAT no PersistentVolume is mounted to this node. If there are PVs, see the description below to remove them.
When executing kubectl drain, I noticed that some pods were not evicted (they simply did not appear in log lines like evicting pod xxx).
In my case, some were pods with soft anti-affinity (so they did not like to go to the remaining nodes), and some were pods of a StatefulSet of size 1 that wants to keep at least 1 pod.
If I directly delete that node (using the commands mentioned in other answers), data will be lost because those pods have PersistentVolumes, and deleting a Node will also delete PersistentVolumes (with some cloud providers).
Thus, please delete those pods manually, one by one. After they are deleted, Kubernetes will reschedule the pods to other nodes (because this node is SchedulingDisabled).
After deleting all pods (excluding DaemonSets), please CHECK THAT no PersistentVolume is mounted to this node.
Then you can safely delete the node itself :)
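One way to script the PersistentVolume check is to look at the cluster's VolumeAttachment objects, which record the PV and the node of each attachment. This sketch only covers volumes attached through CSI drivers, so treat an empty result as a hint and not as proof; the function names are hypothetical.

```shell
# Print the names of PVs attached to node $1.
# Reads lines of "pv-name node-name" on stdin.
pvs_attached_to_node() {
  awk -v node="$1" '$2 == node { print $1 }'
}

# Succeeds only when no VolumeAttachment references node $1.
node_has_no_attached_pvs() {
  attached=$(kubectl get volumeattachments --no-headers \
    -o custom-columns=PV:.spec.source.persistentVolumeName,NODE:.spec.nodeName |
    pvs_attached_to_node "$1")
  [ -z "$attached" ]
}
```

It could guard the final step, for example: node_has_no_attached_pvs <node-name> && kubectl delete node <node-name>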