Kubernetes: ALL workloads fail when deploying a single update

After I update the backend code (pushing the update to gcr.io), I delete the pod. Usually a new pod spins up.
But today the whole cluster just broke down. I really cannot comprehend what is happening here (I did not touch any of the other components).
I am really looking in the dark here. Where do I start looking?
I see that the logs show:
0/2 nodes are available: 2 node(s) had taints that the pod didn't tolerate.
when I look this up:
kubectl describe node | grep -i taint
Taints: node.kubernetes.io/unreachable:NoSchedule
Taints: node.kubernetes.io/unreachable:NoSchedule
But I have no clue what this is or how they even get there.
EDIT:
It looks like I need to remove the taints, but I am not able to (taint not found?)
kubectl taint nodes --all node-role.kubernetes.io/unreachable-
taint "node-role.kubernetes.io/unreachable" not found
taint "node-role.kubernetes.io/unreachable" not found

Likely a problem with the nodes. Debug with some of these (sample):
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 1d v1.14.2
k8s-node1 NotReady <none> 1d v1.14.2
k8s-node2 NotReady <none> 1d v1.14.2 <-- Does it say NotReady?
$ kubectl describe node k8s-node1
...
# Do you see something like this? What's the event message?
MemoryPressure...
DiskPressure...
PIDPressure...
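To get a quick overview of these conditions on every node at once, filtering the describe output is usually enough (a rough sketch):
# Show each node's name together with its reported conditions
kubectl describe nodes | grep -E 'Name:|MemoryPressure|DiskPressure|PIDPressure|Ready'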
Check if the kubelet is running on every node (it might be crashing and restarting)
ssh k8s-node1
# ps -Af | grep kubelet
# systemctl status kubelet
# journalctl -xeu kubelet
Nuclear option:
If you are using a node pool, delete your nodes and let the autoscaler spin up brand new ones.
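If you go that route, a rough sketch (node names are the ones from the example output; drain is best-effort here since the node is unreachable, and on GKE the node pool's instance group should recreate any VM you delete):
kubectl drain k8s-node1 --ignore-daemonsets --force   # best effort; node is unreachable
kubectl delete node k8s-node1
# On GKE you can also delete the underlying VM so the node pool replaces it:
# gcloud compute instances delete <vm-name> --zone <zone>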
Related question/answer.
✌️

Related

Custom conditions for k8s node status

Is there any way we could include custom node conditions in k8s and mark a node as not ready until those conditions are met? kubectl should return the status as NotReady if a custom condition is not met.
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-24-11-46.eu-west-1.compute.internal Ready master 23d v1.18.3
ip-10-24-12-111.eu-west-1.compute.internal NotReady worker 23d v1.18.3
ip-10-24-12-197.eu-west-1.compute.internal Ready worker 22d v1.18.3
There is no such functionality. However, what you can do is create a script which will:
check your needed conditions
if the conditions aren't met, ssh to that node and shoot yourself in the leg by running systemctl stop kubelet :)
keep checking the condition
as soon as the condition is met, ssh to the same node and run systemctl start kubelet
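Roughly, such a loop could look like this (check_condition is a placeholder for your own check; the node name and passwordless ssh/sudo to the node are assumptions):
#!/usr/bin/env bash
NODE=worker-1                                     # placeholder node name
while true; do
  if ! check_condition "$NODE"; then              # your custom health check
    ssh "$NODE" 'sudo systemctl stop kubelet'     # node will go NotReady
    until check_condition "$NODE"; do sleep 30; done
    ssh "$NODE" 'sudo systemctl start kubelet'    # node becomes Ready again
  fi
  sleep 60
done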
The idea of stopping the kubelet has been taken from How to simulate nodeNotReady for a node in Kubernetes.
But really, it would be interesting to know the use case...

GKE autoscaler not scaling down the nodes

I have created a Google Kubernetes Engine cluster with autoscaling enabled and minimum and maximum node counts. A few days ago I deployed a couple of servers to production, which increased the node count as expected, but when I deleted those deployments I expected it to scale the nodes back down. I waited more than an hour but it still did not scale down.
All my other pods are controlled by a replication controller, since I deployed them with kind: Deployment.
All my statefulset pods are using a PVC as their volume.
I'm not sure what prevented the nodes from scaling down, so I manually scaled the nodes for now. Since I made the changes manually, I cannot get the autoscaler logs anymore.
Does anyone know what could be the issue here?
GKE version is 1.16.15-gke.4300
As mentioned in this link
https://github.com/kubernetes/autoscaler/blob/master/cluster-autoscaler/FAQ.md#what-types-of-pods-can-prevent-ca-from-removing-a-node
I'm not using any local storage.
My pods don't have a PodDisruptionBudget (I don't know what that is).
Pods are created by deployments (helm charts).
The only thing is that I don't have the "cluster-autoscaler.kubernetes.io/safe-to-evict": "true" annotation. Is this a must?
I have tested the Cluster Autoscaler on my GKE cluster. It works a bit differently than you expected.
Background
You can enable autoscaling using a command or enable it during cluster creation, as described in this documentation.
In Cluster Autoscaler documentation you can find various information like Operation criteria, Limitations, etc.
As I mentioned in the comment section and as described in the Cluster Autoscaler - Frequently Asked Questions, CA won't remove a node if it encounters one of the situations below:
Pods with restrictive PodDisruptionBudget.
Kube-system pods that:
are not run on the node by default, *
don't have a pod disruption budget set or their PDB is too restrictive (since CA 0.6).
Pods that are not backed by a controller object (so not created by deployment, replica set, job, statefulset etc). *
Pods with local storage. *
Pods that cannot be moved elsewhere due to various constraints (lack of resources, non-matching node selectors or affinity, matching anti-affinity, etc)
Pods that have the following annotation set:
"cluster-autoscaler.kubernetes.io/safe-to-evict": "false"
For my tests I've used 6 nodes, an autoscaling range of 1-6, and an nginx application with requests of cpu: 200m and memory: 128Mi.
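In case you want to reproduce the test, a minimal sketch of such a deployment (the name and replica count match the output below; the labels and image are assumptions):
kubectl apply -f - <<EOF
apiVersion: apps/v1
kind: Deployment
metadata:
  name: nginx-deployment
spec:
  replicas: 16
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: nginx
        resources:
          requests:
            cpu: 200m
            memory: 128Mi
EOF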
As the OP mentioned they are not able to provide autoscaler logs, I will paste my own logs from the Logs Explorer. A description of how to get them is in the Viewing cluster autoscaler events documentation.
In those logs you should search for noScaleDown events. You will find a few pieces of information there, but the most important is:
reason: {
parameters: [
0: "kube-dns-66d6b7c877-hddgs"
]
messageId: "no.scale.down.node.pod.kube.system.unmovable"
As it's described in NoScaleDown node-level reasons for "no.scale.down.node.pod.kube.system.unmovable":
Pod is blocking scale down because it's a non-daemonset, non-mirrored, non-pdb-assigned kube-system pod. See the Kubernetes Cluster Autoscaler FAQ for more details.
Solution
If you want to make the Cluster Autoscaler work on GKE, you have to create PodDisruptionBudgets with the proper information; how to create them can be found in How to set PDBs to enable CA to move kube-system pods?
kubectl create poddisruptionbudget <pdb name> --namespace=kube-system --selector app=<app name> --max-unavailable 1
where you have to specify the correct selector, and --max-unavailable or --min-available depending on your needs. For more details, please read the Specifying a PodDisruptionBudget documentation.
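For the kube-dns pod from the logs above, that could look roughly like this (k8s-app=kube-dns is the usual label on GKE, but it is an assumption here; verify it on your cluster first):
kubectl get pods -n kube-system --show-labels | grep kube-dns
kubectl create poddisruptionbudget kube-dns-pdb --namespace=kube-system --selector k8s-app=kube-dns --max-unavailable 1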
Tests
$ kubectl get deploy,nodes
NAME READY UP-TO-DATE AVAILABLE AGE
deployment.apps/nginx-deployment 16/16 16 16 66m
NAME STATUS ROLES AGE VERSION
node/gke-cluster-1-default-pool-6d42fa0a-1ckn Ready <none> 11m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-2j4j Ready <none> 11m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-388n Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-5x35 Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-pdfk Ready <none> 3h33m v1.16.15-gke.6000
node/gke-cluster-1-default-pool-6d42fa0a-wqtm Ready <none> 11m v1.16.15-gke.6000
$ kubectl get pdb -A
NAMESPACE NAME MIN AVAILABLE MAX UNAVAILABLE ALLOWED DISRUPTIONS AGE
kube-system kubedns 1 N/A 1 43m
Scale down the deployment
$ kubectl scale deploy nginx-deployment --replicas=2
deployment.apps/nginx-deployment scaled
After a while (~10-15 minutes) you will find the Decision event in the event viewer, and inside it you will find the information that the node was deleted.
...
scaleDown: {
nodesToBeRemoved: [
0: {
node: {
mig: {
zone: "europe-west2-c"
nodepool: "default-pool"
name: "gke-cluster-1-default-pool-6d42fa0a-grp"
}
name: "gke-cluster-1-default-pool-6d42fa0a-wqtm"
Number of nodes decreased:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-cluster-1-default-pool-6d42fa0a-2j4j Ready <none> 30m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-388n Ready <none> 3h51m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-5x35 Ready <none> 3h51m v1.16.15-gke.6000
gke-cluster-1-default-pool-6d42fa0a-pdfk Ready <none> 3h51m v1.16.15-gke.6000
Another place where you can confirm it's scaling down is kubectl get events --sort-by='.metadata.creationTimestamp'
Output:
5m16s Normal NodeNotReady node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm status is now: NodeNotReady
4m56s Normal NodeNotReady node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn status is now: NodeNotReady
4m Normal Deleting node gke-cluster-1-default-pool-6d42fa0a-wqtm because it does not exist in the cloud provider node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm event: DeletingNode
3m55s Normal RemovingNode node/gke-cluster-1-default-pool-6d42fa0a-wqtm Node gke-cluster-1-default-pool-6d42fa0a-wqtm event: Removing Node gke-cluster-1-default-pool-6d42fa0a-wqtm from Controller
3m50s Normal Deleting node gke-cluster-1-default-pool-6d42fa0a-1ckn because it does not exist in the cloud provider node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn event: DeletingNode
3m45s Normal RemovingNode node/gke-cluster-1-default-pool-6d42fa0a-1ckn Node gke-cluster-1-default-pool-6d42fa0a-1ckn event: Removing Node gke-cluster-1-default-pool-6d42fa0a-1ckn from Controller
Conclusion
By default, kube-system pods prevent CA from removing nodes on which they are running. Users can manually add PDBs for the kube-system pods that can be safely rescheduled elsewhere. It can be achieved using:
kubectl create poddisruptionbudget <pdb name> --namespace=kube-system --selector app=<app name> --max-unavailable 1
A list of possible reasons why CA won't scale down can be found in the Cluster Autoscaler - Frequently Asked Questions.
To verify which pods could still block CA downscale, you can use Autoscaler Events.
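If you prefer the CLI over the Logs Explorer, something along these lines should pull the same events (the log name is the one documented in Viewing cluster autoscaler events; treat the exact filter as an assumption and adjust as needed):
gcloud logging read 'resource.type="k8s_cluster" AND logName:"cluster-autoscaler-visibility"' --limit 20 --format json | grep -i -B2 -A5 noScaleDown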

How to constrain the kubectl kubeconfig so that only the worker node is displayed and the master node is not

When using the kubelet kubeconfig, only the worker nodes are displayed and the master node is not, like the following output on an AWS EKS worker node:
kubectl get node --kubeconfig /var/lib/kubelet/kubeconfig
NAME STATUS ROLES AGE VERSION
ip-172-31-12-2.ap-east-1.compute.internal Ready <none> 30m v1.18.9-eks-d1db3c
ip-172-31-42-138.ap-east-1.compute.internal Ready <none> 4m7s v1.18.9-eks-d1db3c
For some reasons, I need to hide the information of the other worker and master nodes, and only display the information of the worker node where the kubectl command is currently executed.
What should I do?
I really appreciate your help.

My worker node status is Ready,SchedulingDisabled

I have a kubernetes cluster and everything worked fine. After some time I drained my worker node, reset it and joined it again to the master, but:
#kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu Ready master 159m v1.14.0
ubuntu1 Ready,SchedulingDisabled <none> 125m v1.14.0
ubuntu2 Ready,SchedulingDisabled <none> 96m v1.14.0
What should I do?
To prevent a node from scheduling new pods use:
kubectl cordon <node-name>
Which will cause the node to be in the status: Ready,SchedulingDisabled.
To tell it to resume scheduling use:
kubectl uncordon <node-name>
More information about draining a node can be found here, and about manual node administration here.
I fixed it using:
kubectl uncordon <node-name>
SchedulingDisabled means the node went into maintenance mode: you drained the node of its pods, which has the prerequisite that the node be put into maintenance. Check the spec of that node:
kubectl get nodes <node-name> -o yaml
taints:
- effect: NoSchedule
key: node.kubernetes.io/unschedulable
timeAdded: "2022-06-29T13:05:11Z"
unschedulable: true
So when you want it to be back in a schedulable state, you have to take the node out of maintenance, i.e. uncordon it:
kubectl uncordon <node-name>
I know it's similar to the answers above, but one should know what's happening behind the scenes.
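To confirm the node really left maintenance mode after uncordoning it, a quick check (a sketch, using the same node-name placeholder as above):
kubectl get node <node-name> -o jsonpath='{.spec.unschedulable}'   # empty or false means schedulable again
kubectl describe node <node-name> | grep -i taint                  # the NoSchedule taint shown above should be gone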
For OpenShift (see oc administrator commands) you can use: oc adm uncordon <node-name>
Here's a quick way to re-enable scheduling on all nodes:
oc get nodes --no-headers | awk '{print $1}' | xargs oc adm uncordon

kubectl get nodes not showing workers

I am following this tutorial with 2 VMs running CentOS 7. Everything looks fine (no errors during installation/setup) but I can't see my nodes.
NOTE:
I am running this on VMWare VMs
kub1 is my master and kub2 my worker node
cluster-info:
[root@kub1 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@kub2 ~]# kubectl cluster-info
Kubernetes master is running at http://kub1:8080
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
nodes:
[root@kub1 ~]# kubectl get nodes
[root@kub1 ~]# kubectl get nodes -a
[root@kub1 ~]#
[root@kub2 ~]# kubectl get nodes -a
[root@kub2 ~]# kubectl get no
[root@kub2 ~]#
cluster events:
[root@kub1 ~]# kubectl get events -a
LASTSEEN FIRSTSEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kube-proxy kub2.local} Starting kube-proxy.
1h 1h 1 kub2.local Node Normal Starting {kubelet kub2.local} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
1h 1h 1 node-kub2 Node Normal Starting {kubelet node-kub2} Starting kubelet.
/var/log/messages:
kubelet.go:1194] Unable to construct api.Node object for kubelet: can't get ip address of node node-kub2: lookup node-kub2: no such host
QUESTION: any idea why my nodes are not shown using "kubectl get nodes"?
My issue was that the KUBELET_HOSTNAME value in /etc/kubernetes/kubelet didn't match the hostname.
I commented out that line, then restarted the services and I could see my worker after that.
Hope that helps.
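If you hit the same thing, a quick way to compare the names (the config path is the one from the old CentOS packages mentioned above and is an assumption; adjust it to wherever your kubelet config lives):
hostname                                          # what the machine calls itself
grep -i KUBELET_HOSTNAME /etc/kubernetes/kubelet  # what the kubelet is told to register as
kubectl get nodes -o wide                         # what actually got registered, if anything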
Not sure about your scenario, but I solved it after 3-4 hours of effort.
Solved
I was facing this issue because my Docker cgroup driver was different from the Kubernetes cgroup driver.
I just updated it to cgroupfs using the following commands mentioned in the doc:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=cgroupfs"]
}
EOF
Restart the docker service: service docker restart.
Reset kubernetes on the slave node: kubeadm reset
Join the master again: kubeadm join <><>
The node was then visible on the master using kubectl get nodes.
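A quick way to double-check that the two drivers now match (the kubelet config path is the kubeadm default and may differ on your setup):
docker info 2>/dev/null | grep -i 'cgroup driver'   # cgroup driver Docker is using
grep -i cgroupDriver /var/lib/kubelet/config.yaml   # cgroup driver the kubelet is configured with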
I had a similar problem after installing k8s using kubespray on Fedora 31. To debug the issue, I tried to run a random container directly using docker run, which failed with:
docker: Error response from daemon: cgroups: cgroup mountpoint does not exist: unknown.
This is a known problem caused by the cgroup version on Fedora 31, and the fix is to update grub to use the previous cgroup version (reboot the node afterwards for the kernel arguments to take effect):
sudo dnf install grubby
sudo grubby --update-kernel=ALL --args="systemd.unified_cgroup_hierarchy=0"