Is there any way to include custom node conditions in k8s and mark a Node as not ready until those conditions are met? kubectl should return the Status as NotReady if the custom condition is not met.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-24-11-46.eu-west-1.compute.internal Ready master 23d v1.18.3
ip-10-24-12-111.eu-west-1.compute.internal NotReady worker 23d v1.18.3
ip-10-24-12-197.eu-west-1.compute.internal Ready worker 22d v1.18.3
There is no such functionality. However, what you can do is create a script which will check your needed conditions.
If the conditions aren't met - ssh to that node and shoot yourself in the foot by running systemctl stop kubelet :)
Keep checking the condition.
As soon as the condition is met - ssh to the same node again and run systemctl start kubelet.
The idea of stopping the kubelet was taken from How to simulate nodeNotReady for a node in Kubernetes.
But really, it would be interesting to know the use case...
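A minimal sketch of such a loop, assuming a hypothetical check_condition command of your own and passwordless SSH to the node (the node name below is just taken from the output above), could look like this:

#!/bin/bash
# Placeholder values: replace NODE and check_condition with your own.
NODE=ip-10-24-12-111.eu-west-1.compute.internal

while true; do
  if check_condition; then
    # Condition met: make sure the kubelet is running so the node reports Ready.
    ssh "$NODE" sudo systemctl start kubelet
  else
    # Condition not met: stopping the kubelet makes the node go NotReady after the grace period.
    ssh "$NODE" sudo systemctl stop kubelet
  fi
  sleep 30
done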
When using the kubelet kubeconfig, only the worker nodes are displayed and the master node is not, like the following output on an AWS EKS worker node:
kubectl get node --kubeconfig /var/lib/kubelet/kubeconfig
NAME STATUS ROLES AGE VERSION
ip-172-31-12-2.ap-east-1.compute.internal Ready <none> 30m v1.18.9-eks-d1db3c
ip-172-31-42-138.ap-east-1.compute.internal Ready <none> 4m7s v1.18.9-eks-d1db3c
For some reason, I need to hide the information of the other worker and master nodes, and only display the information of the worker node where the kubectl command is currently being executed.
What should I do?
I really appreciate your help.
After I update the backend code (pushing an update to gcr.io), I delete the pod. Usually a new pod spins up.
But today the whole cluster just broke down. I really cannot comprehend what is happening here (I did not touch any of the other items).
I am really in the dark here. Where do I start looking?
I see that the logs show:
0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
When I look this up:
kubectl describe node | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
Taints: node.kubernetes.io/unreachable:NoSchedule
But I have no clue what these taints are or how they even got there.
EDIT:
It looks like I need to remove the taints, but I am not able to (taint not found?)
kubectl taint nodes --all node-role.kubernetes.io/unreachable-
taint "node-role.kubernetes.io/unreachable" not found
taint "node-role.kubernetes.io/unreachable" not found
Likely a problem with the nodes. Debug with some of these (sample):
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 1d v1.14.2
k8s-node1 NotReady <none> 1d v1.14.2
k8s-node2 NotReady <none> 1d v1.14.2 <-- Does it say NotReady?
$ kubectl describe node k8s-node1
...
# Do you see something like this? What's the event message?
MemoryPressure...
DiskPressure...
PIDPressure...
Check if the kubelet is running on every node (it might be crashing and restarting)
ssh k8s-node1
# ps -Af | grep kubelet
# systemctl status kubelet
# journalctl -xeu kubelet
Nuclear option:
If you are using a node pool, delete your nodes and let the autoscaler restart brand new nodes.
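For reference, draining and then deleting one of the sample nodes above would look roughly like this (flag names can vary slightly between kubectl versions, and the node pool or autoscaler is assumed to provision the replacement):

$ kubectl drain k8s-node1 --ignore-daemonsets --delete-local-data
$ kubectl delete node k8s-node1
# The managed node pool / autoscaler then brings up a fresh node to replace it.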
Related question/answer.
✌️
I have a Kubernetes cluster of 3 masters and 2 nodes in a VM cloud on CentOS 7:
[root@kbn-mst-02 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kbn-mst-01 Ready master 15d v1.18.3
kbn-mst-02 Ready master 14d v1.18.3
kbn-mst-03 Ready master 14d v1.18.3
kbn-wn-01 Ready <none> 25h v1.18.5
kbn-wn-02 Ready <none> 150m v1.18.5
If I turn off kbn-mst-03 (212.46.30.7), then kbn-wn-01 and kbn-wn-02 get status NotReady:
[root@kbn-mst-02 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
kbn-mst-01 Ready master 15d v1.18.3
kbn-mst-02 Ready master 14d v1.18.3
kbn-mst-03 NotReady master 14d v1.18.3
kbn-wn-01 NotReady <none> 25h v1.18.5
kbn-wn-02 NotReady <none> 154m v1.18.5
The log on kbn-wn-02 shows lost connection to 212.46.30.7:
Jul 3 09:28:10 kbn-wn-02 kubelet: E0703 09:28:10.295233 12339 kubelet_node_status.go:402] Error updating node status, will retry: error getting node "kbn-wn-02": Get https://212.46.30.7:6443/api/v1/nodes/kbn-wn-02?resourceVersion=0&timeout=10s: context deadline exceeded
Turning off other masters doesn't change the status of nodes.
Why does kbn-wn-02 have a hard bind to kbn-mst-03 (212.46.30.7) and how can I change it?
Currently your worker nodes only know about the kbn-mst-03 master, and when that master is turned off the kubelet on the worker nodes cannot send the health status and metrics of the worker node to the master node kbn-mst-03, and hence you see the worker nodes as NotReady. This is also the reason why turning off the other masters does not change the status of the nodes: they are not known and contacted at all by the kubelet of the worker nodes.
You should use a load balancer in front of the masters and use the load balancer endpoint while creating the worker nodes. Then if one master is turned off, the other master nodes will be able to handle requests, because the load balancer will stop sending traffic to the failed master and route it to the other masters.
How you can change the hard bind to one master and move to a load balancer endpoint will depend on what tool you used to set up the Kubernetes cluster. If you are using kubeadm then you can specify a load balancer endpoint with kubeadm init on the master nodes and use that endpoint with kubeadm join on the worker nodes.
From kubeadm docs here
--control-plane-endpoint can be used to set the shared endpoint for all control-plane nodes.
--control-plane-endpoint allows both IP addresses and DNS names that can map to IP addresses. Please contact your network administrator to evaluate possible solutions with respect to such mapping.
Here is an example mapping:
192.168.0.102 cluster-endpoint
Where 192.168.0.102 is the IP address of this node and cluster-endpoint is a custom DNS name that maps to this IP. This will allow you to pass --control-plane-endpoint=cluster-endpoint to kubeadm init and pass the same DNS name to kubeadm join. Later you can modify cluster-endpoint to point to the address of your load-balancer in an high availability scenario.
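As a rough sketch of what that looks like in practice (cluster-endpoint is the shared DNS name or load balancer address from the quote above, and the token and hash are placeholders that kubeadm init prints for you):

# On the first control-plane node, initialize against the shared endpoint instead of this node's own IP:
kubeadm init --control-plane-endpoint "cluster-endpoint:6443" --upload-certs

# On each worker node, join through the same shared endpoint:
kubeadm join cluster-endpoint:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>

This way the worker kubelets talk to whatever the endpoint resolves to, not to a single hard-coded master.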
I have a k8s cluster set up using kubespray.
Last week one of my k8s nodes had very low storage, so all the pods were evicted, including some important pods like calico-node and kube-proxy (I thought these pods were critical and would never be evicted no matter what).
After that, all the calico-node pods became not ready. When I checked the log, it said:
Warning: Readiness probe failed: calico/node is not ready: BIRD is not ready: BGP not established with 192.168.0.xxx, where 192.168.0.xxx is the IP of above problematic node.
My question is: how can I restore that node? Is it safe to just run kubespray's cluster.yml again?
My k8s version is v1.13.3
Thanks.
When a node has disk pressure, its status changes to NotReady and a taint is added to the node: Taints: node.kubernetes.io/disk-pressure:NoSchedule.
All pods running on this node get evicted, except the api-server, kube-controller and kube-scheduler - the eviction manager will save those pods from getting evicted with the error message: cannot evict a critical static pod [...]
Once the node is freed from disk pressure it will change its status back to Ready and the previously added taint will be removed. You can check it by running kubectl describe node <node_name>. In the Conditions field you should see that DiskPressure has changed its status to False, which means that the node has enough space available. Similar information can also be found in the Events field.
Normal NodeReady 1s kubelet, node1 Node node1 status is now: NodeReady
Normal NodeHasNoDiskPressure 1s (x2 over 1s) kubelet, node1 Node node1 status is now: NodeHasNoDiskPressure
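If you just want that one condition, a jsonpath query works as well (node1 below is the node name from the events above; substitute your own):

kubectl get node node1 -o jsonpath='{.status.conditions[?(@.type=="DiskPressure")].status}'

It should print False once the disk pressure is gone.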
After confirming that the node is ready and has sufficient disk space, you can restart the kubelet and run kubespray's cluster.yml - the pods will be redeployed on the node. You just have to make sure that the node is ready to handle deployments.
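In practice that would be something along these lines (the inventory path is an assumption and depends on how your kubespray checkout is laid out):

# On the recovered node:
sudo systemctl restart kubelet

# From the machine with the kubespray checkout:
ansible-playbook -i inventory/mycluster/hosts.yaml --become cluster.yml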
I am comparatively new to Kubernetes but I have successfully created many clusters before. Now I am facing an issue where I tried to add a node to an already existing cluster. At first kubeadm join seemed to be successful, but even after initializing the pod network only the master became Ready.
root@master# kubectl get nodes
NAME STATUS ROLES AGE VERSION
master-virtual-machine Ready master 18h v1.9.0
testnode-virtual-machine NotReady <none> 16h v1.9.0
OS: Ubuntu 16.04
Any help will be appreciated.
Thanks.
Try the following on the slave node and then get the status again on the master.
> sudo swapoff -a
> exit
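Note that swapoff -a only disables swap until the next reboot. If the node drops back to NotReady after a restart, a common follow-up (assuming the swap entry lives in /etc/fstab) is to comment it out so swap stays off permanently:

> sudo sed -i '/ swap / s/^/#/' /etc/fstab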