Kubernetes cordon node command - kubernetes

If I cordon a node, I make it unschedulable, so the command applies a taint to the node to mark it NoSchedule with a Kubernetes-specific key. But when I then create my own taint with the NoSchedule effect and a different key/value, for example env=production, and add a matching toleration (same key, effect NoSchedule) to a pod, the pod still won't be scheduled on this node. Why is that?
Maybe the cordon command internally marks the node as unschedulable in some other way, and does not only apply a taint?
P.S. After running kubectl uncordon <node> the toleration worked.

You are right: kubectl cordon does more than apply a taint. It also sets spec.unschedulable=true on the node, and the scheduler will not place new pods on an unschedulable node, so a toleration for your own env=production taint does not help. K8s won't schedule the pods on it unless and until you uncordon the node:
kubectl uncordon <node>
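A quick way to see the cordoned state on the node object itself (just replace <node> with your node name); it prints true while the node is cordoned and nothing once it has been uncordoned:
kubectl get node <node> -o jsonpath='{.spec.unschedulable}'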

Related

How to simulate nodeNotReady for a node in Kubernetes

My ceph cluster is running on AWS with a 3-masters, 3-workers configuration. When I do kubectl get nodes it shows all the nodes in the Ready state.
Is there any way I can manually simulate a NodeNotReady error for a node?
Just stop the kubelet service on the node that you want to see as NodeNotReady.
If you just want NodeNotReady, you can delete the CNI you have installed.
Run kubectl get all -n kube-system, find the DaemonSet of your CNI and delete it, or simply reverse the installation: kubectl delete -f link_to_your_CNI_yaml
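As a rough example, assuming a Calico-based cluster (the DaemonSet name depends on your CNI, e.g. weave-net or kube-flannel-ds):
kubectl get daemonset -n kube-system
kubectl delete daemonset calico-node -n kube-system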
You could also try to overwhelm the node with too many pods (resources). If you share your main goal, we can adjust the answer.
Regarding the answer from P Ekambaram: you could just SSH to a node and then stop the kubelet.
In kops you can do that with:
ssh -A admin@Node_PublicDNS_name
systemctl stop kubelet
EDIT:
Another way is to overload the Node, which will cause a System OOM encountered event and result in the Node going NotReady.
This is just one way to achieve it:
SSH into the Node you want to put into NotReady
Install stress
Run stress: stress --cpu 8 --io 4 --hdd 10 --vm 4 --vm-bytes 1024M --timeout 5m (you can adjust the values, of course)
Wait until the Node crashes.
After you stop the stress run, the Node should get back to a healthy state automatically.
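Whichever approach you pick, you can watch the node flip between Ready and NotReady from another terminal:
kubectl get nodes -w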
Not sure what the purpose of simulating NotReady is.
If the purpose is to stop scheduling any new pods, you can use kubectl cordon NODE_NAME. This will mark the node unschedulable (adding the unschedulable taint to it) and prevent new pods from being scheduled there (see the quick check below).
If the purpose is to evict existing pods, you can use kubectl drain NODE_NAME.
In general you can play with taints and tolerations to achieve goals related to the above, and you can do much more with them!
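As a quick check (NODE_NAME is a placeholder), you can cordon a node and look at the taint it gains; on recent clusters you should see node.kubernetes.io/unschedulable:NoSchedule:
kubectl cordon NODE_NAME
kubectl describe node NODE_NAME | grep -A2 Taints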
Now, the NotReady status is tied to the taint node.kubernetes.io/not-ready (ref),
which is set automatically:
In version 1.13, the TaintBasedEvictions feature is promoted to beta and enabled by default, hence the taints are automatically added by the NodeController.
Therefore, if you manually set that taint with kubectl taint node NODE_NAME node.kubernetes.io/not-ready=:NoExecute, the NodeController will reset it automatically!
So, to actually see the NotReady status, the approaches above (such as stopping the kubelet) are the best way.
Lastly, if you want to remove the networking on one particular node only, you can taint it like this: kubectl taint node NODE_NAME dedicated/not-ready=:NoExecute
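If you go the custom-taint route, you can remove the taint again later with the trailing minus syntax (NODE_NAME is a placeholder):
kubectl taint node NODE_NAME dedicated/not-ready:NoExecute-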

Kubernetes - Enable automatic pod rescheduling on taint/toleration

In the following scenario:
Pod X has a toleration for a taint
However, node A with such a taint does not exist
Pod X gets scheduled on a different node B in the meantime
Node A with the proper taint becomes Ready
Here, Kubernetes does not trigger an automatic rescheduling of pod X onto node A, since it is running properly on node B. Is there a way to enable that automatic rescheduling to node A?
Natively, probably not, unless you:
change the taint on node B to NoExecute (it may already be set):
NoExecute - the pod will be evicted from the node (if it is already running on the node), and will not be scheduled onto the node (if it is not yet running on the node).
update the toleration of the pod
That is:
You can put multiple taints on the same node and multiple tolerations on the same pod.
The way Kubernetes processes multiple taints and tolerations is like a filter: start with all of a node’s taints, then ignore the ones for which the pod has a matching toleration; the remaining un-ignored taints have the indicated effects on the pod. In particular,
if there is at least one un-ignored taint with effect NoSchedule then Kubernetes will not schedule the pod onto that node
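A rough sketch of that approach, with made-up names (the taint key dedicated and the values are just for illustration, assuming node A carries dedicated=node-a-workloads:NoSchedule): give node B a NoExecute taint that the pod does not tolerate, so the pod is evicted and has to be rescheduled, while the pod keeps a toleration matching only node A's taint.
kubectl taint nodes nodeB dedicated=other-workloads:NoExecute
tolerations:
- key: dedicated
  operator: Equal
  value: node-a-workloads
  effect: NoSchedule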
If that is not possible, then using Node Affinity could help (but that differs from taints)

Stop scheduling pods on kubernetes master

For testing purposes, I have enabled pod scheduling on the Kubernetes master node with the following command
kubectl taint nodes --all node-role.kubernetes.io/master-
Now I have added a worker node to the cluster and I would like to stop scheduling pods on the master. How do I do that?
You simply taint the node again.
kubectl taint nodes master node-role.kubernetes.io/master=:NoSchedule
Taints and tolerations work together to ensure that pods are not scheduled onto inappropriate nodes. One or more taints are applied to a node; this marks that the node should not accept any pods that do not tolerate the taints. Tolerations are applied to pods, and allow (but do not require) the pods to schedule onto nodes with matching taints.
Even with a taint placed on the master node, you can specify a toleration for the pod in the PodSpec, and the pod will still be able to schedule onto the master node:
tolerations:
- key: node-role.kubernetes.io/master
  effect: NoSchedule
To learn more, see Taints and Tolerations.
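For context, here is a minimal Pod manifest (the pod name and image are placeholders) showing where that toleration sits:
apiVersion: v1
kind: Pod
metadata:
  name: demo-on-master
spec:
  containers:
  - name: demo
    image: nginx
  tolerations:
  - key: node-role.kubernetes.io/master
    effect: NoSchedule
Note that, as the quote above says, the toleration only allows scheduling on the master; to pin the pod there you would also need a nodeSelector or node affinity.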

Does kubectl drain remove pod first or create pod first

Kubernetes version 1.12.3. Does kubectl drain remove the pod first or create the replacement pod first?
You can use kubectl drain to safely evict all of your pods from a node before you perform maintenance on the node (e.g. kernel upgrade, hardware maintenance, etc.)
When kubectl drain returns successfully, it means it has removed all the pods from that node and it is safe to bring the node down (physically shut it off, or start maintenance).
Now if you turn on the machine and want to schedule pods again on that node you need to run:
kubectl uncordon <node name>
So, kubectl drain removes pods from the node, and no new pods will be scheduled on it until you uncordon that node.
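A typical invocation looks like this (flag names vary slightly by version: 1.12 uses --delete-local-data, newer releases call it --delete-emptydir-data):
kubectl drain <node name> --ignore-daemonsets --delete-local-data
kubectl uncordon <node name>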
kubectl drain will ignore certain system pods on the node that cannot be killed.
The given node will be marked unschedulable to prevent new pods from arriving.
When you are ready to put the node back into service, use kubectl uncordon, which will make the node schedulable again.
For more details, use the command:
kubectl drain --help
I hope this gives you the information you are looking for.

Kubernetes: Dynamically identify node and taint

I have an application pod which will be deployed on a k8s cluster,
but the Kubernetes scheduler decides which node this pod runs on.
Now I want to dynamically add a NoSchedule taint to whichever node my application pod is running on, so that no new pods will be scheduled on that node.
I know that we can use kubectl taint node with NoSchedule if I know the node name, but I want to do this dynamically, based on which node this application pod is running on.
The reason is that this is a critical application pod which shouldn't have downtime, and for good reasons I have only 1 pod for this application across the cluster.
Please suggest.
In addition to @Rico's answer:
You can use a feature called node affinity; it is still in beta, but some functionality is already implemented.
You should add a label to your node, for example test-node-affinity: test. Once this is done, you can add nodeAffinity under the affinity field in the PodSpec.
spec:
  ...
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: test-node-affinity
            operator: In
            values:
            - test
This means the Pod will only be scheduled onto a node that carries the label test-node-affinity: test.
I recommend reading this blog Taints and tolerations, pod and node affinities demystified by Toader Sebastian.
Also familiarise yourself with Taints and Tolerations from Kubernetes docs.
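For completeness, the label from the example above can be added like this (the node name is a placeholder):
kubectl label nodes <node-name> test-node-affinity=test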
You can get the node where your pod is running with something like this:
$ kubectl get pod myapp-pod -o=jsonpath='{.spec.nodeName}'
Then you can taint it:
$ kubectl taint nodes <node-name-from-above> key=value:NoSchedule
or the whole thing in one command:
$ kubectl taint nodes $(kubectl get pod myapp-pod -o=jsonpath='{.spec.nodeName}') key=value:NoSchedule
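When the protection is no longer needed, the same taint can be removed with the trailing minus (using whatever key and effect you chose above):
$ kubectl taint nodes <node-name-from-above> key:NoSchedule-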