How to inspect space issues on nodes in a Kubernetes cluster - kubernetes

I have a micro cluster on GCloud running the smallest imaginable Node.js application.
Even so, the app keeps going down all the time. Here is the output of kubectl describe pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 1m (x2 over 1m) default-scheduler 0/3 nodes are available: 1 Insufficient cpu, 2 node(s) were not ready, 2 node(s) were out of disk space.
Warning FailedScheduling 55s (x9 over 2m) default-scheduler 0/3 nodes are available: 1 node(s) were not ready, 1 node(s) were out of disk space, 2 Insufficient cpu.
I suspect there may be some garbage left over from previous versions of the app. How can I check what is causing the problems and get rid of unnecessary processes and data?
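A few commands that can serve as a starting point for the inspection; the node-level ones assume SSH access to the GCE instances and the last one assumes Docker as the container runtime, so treat this as a sketch rather than a definitive recipe:
# Node conditions (DiskPressure etc.) and capacity as seen by Kubernetes
kubectl describe nodes
# Per-node resource usage (requires metrics-server to be installed)
kubectl top nodes
# On a node: which filesystems are full and what is consuming them
df -h
sudo du -sh /var/lib/docker /var/log
# With the Docker runtime: space used by images, containers and volumes
docker system df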

Related

Pod creation in EKS cluster fails with FailedScheduling error

I have created a new EKS cluster with 1 worker node in a public subnet. I am able to query the node, connect to the cluster, and run the pod creation command; however, when I try to create a pod it fails with the error below, taken from kubectl describe pod. Please guide.
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 81s default-scheduler 0/1 nodes are available: 1 Too many pods. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
Warning FailedScheduling 16m default-scheduler 0/2 nodes are available: 2 Too many pods, 2 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) were unschedulable. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Warning FailedScheduling 16m default-scheduler 0/3 nodes are available: 2 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) were unschedulable, 3 Too many pods. preemption: 0/3 nodes are available: 1 No preemption victims found for incoming pod, 2 Preemption is not helpful for scheduling.
Warning FailedScheduling 14m (x3 over 22m) default-scheduler 0/2 nodes are available: 1 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 1 node(s) were unschedulable, 2 Too many pods. preemption: 0/2 nodes are available: 1 No preemption victims found for incoming pod, 1 Preemption is not helpful for scheduling.
Warning FailedScheduling 12m default-scheduler 0/2 nodes are available: 1 Too many pods, 2 node(s) had untolerated taint {node.kubernetes.io/unschedulable: }, 2 node(s) were unschedulable. preemption: 0/2 nodes are available: 2 Preemption is not helpful for scheduling.
Warning FailedScheduling 7m14s default-scheduler no nodes available to schedule pods
Warning FailedScheduling 105s (x5 over 35m) default-scheduler 0/1 nodes are available: 1 Too many pods. preemption: 0/1 nodes are available: 1 No preemption victims found for incoming pod.
I am able to get the status of the node and it looks Ready:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
ip-10-0-12-61.ec2.internal Ready <none> 15m v1.24.7-eks-fb459a0
While troubleshooting I tried the options below:
recreated the complete demo cluster - still the same error
tried recreating pods with different images - still the same error
tried increasing the instance type to t3.micro - still the same error
reviewed security groups and other parameters in the cluster - couldn't come to an RCA
It's due to the node's pod limit, which on EKS is derived from the IP limit of the instance type.
If you check the official Amazon documentation, a t3.micro can use at most 2 network interfaces and 2 private IPs per interface, so you roughly get around 4 usable IPs; the first IP is used by the node itself, and the default system pods running as DaemonSets consume some of the rest.
Add a new instance or upgrade to a larger instance type that can handle more pods.
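To confirm that the pod limit is the problem, you can compare the node's allocatable pod count with what is already running on it; the node name is taken from the output above, the rest is a sketch:
# Maximum number of pods the node advertises (derived from the ENI/IP limits on EKS)
kubectl get node ip-10-0-12-61.ec2.internal -o jsonpath='{.status.allocatable.pods}'
# Pods already placed on that node, including the kube-system DaemonSets
kubectl get pods -A --field-selector spec.nodeName=ip-10-0-12-61.ec2.internal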

Pod stuck in pending status

Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 18m (x145 over 3h19m) default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity, 1 node(s) had taint {node-role.kubernetes.io/controlplane: true}, that the pod didn't tolerate.
Please provide me with a solution to deploy the pods on the worker node.
It seems that your Pod is not getting scheduled onto a node. Can you try running the command below?
kubectl taint nodes <name-node-master> node-role.kubernetes.io/control-plane:NoSchedule-
To find the name of your node please use
kubectl get nodes
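Before removing anything, it may also help to list the taints that are actually present so that the key in the removal command matches exactly; a small sketch using standard kubectl options:
# Show every node together with its taints
kubectl get nodes -o custom-columns=NAME:.metadata.name,TAINTS:.spec.taints
# Or inspect a single node
kubectl describe node <name-node-master> | grep -i taints
Note that the event above reports the key node-role.kubernetes.io/controlplane (without a dash in "controlplane"), so if that is what the listing shows, the key in the taint-removal command has to be adjusted to match.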

1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate

My Kubernetes (K3s) cluster gives this error:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
Warning FailedScheduling 17m default-scheduler 0/2 nodes are available: 1 node(s) didn't match Pod's node affinity/selector, 1 node(s) had taint {node.kubernetes.io/disk-pressure: }, that the pod didn't tolerate.
In order to list the taints in the cluster I executed:
kubectl get nodes -o json | jq '.items[].spec'
which outputs:
{
  "podCIDR": "10.42.0.0/24",
  "podCIDRs": [
    "10.42.0.0/24"
  ],
  "providerID": "k3s://antonis-dell",
  "taints": [
    {
      "effect": "NoSchedule",
      "key": "node.kubernetes.io/disk-pressure",
      "timeAdded": "2021-12-17T10:54:31Z"
    }
  ]
}
{
  "podCIDR": "10.42.1.0/24",
  "podCIDRs": [
    "10.42.1.0/24"
  ],
  "providerID": "k3s://knodea"
}
When I use kubectl describe node antonis-dell I get:
Name: antonis-dell
Roles: control-plane,master
Labels: beta.kubernetes.io/arch=amd64
beta.kubernetes.io/instance-type=k3s
beta.kubernetes.io/os=linux
kubernetes.io/arch=amd64
kubernetes.io/hostname=antonis-dell
kubernetes.io/os=linux
node-role.kubernetes.io/control-plane=true
node-role.kubernetes.io/master=true
node.kubernetes.io/instance-type=k3s
Annotations: csi.volume.kubernetes.io/nodeid: {"ch.ctrox.csi.s3-driver":"antonis-dell"}
flannel.alpha.coreos.com/backend-data: {"VNI":1,"VtepMAC":"f2:d5:6c:6a:85:0a"}
flannel.alpha.coreos.com/backend-type: vxlan
flannel.alpha.coreos.com/kube-subnet-manager: true
flannel.alpha.coreos.com/public-ip: 192.168.1.XX
k3s.io/hostname: antonis-dell
k3s.io/internal-ip: 192.168.1.XX
k3s.io/node-args: ["server"]
k3s.io/node-config-hash: YANNMDBIL7QEFSZANHGVW3PXY743NWWRVFKBKZ4FXLV5DM4C74WQ====
k3s.io/node-env:
{"K3S_DATA_DIR":"/var/lib/rancher/k3s/data/e61cd97f31a54dbcd9893f8325b7133cfdfd0229ff3bfae5a4f845780a93e84c","K3S_KUBECONFIG_MODE":"644"}
node.alpha.kubernetes.io/ttl: 0
volumes.kubernetes.io/controller-managed-attach-detach: true
CreationTimestamp: Fri, 17 Dec 2021 12:11:39 +0200
Taints: node.kubernetes.io/disk-pressure:NoSchedule
where it seems that the node has a disk-pressure taint.
This command doesn't work: kubectl taint node antonis-dell node.kubernetes.io/disk-pressure:NoSchedule- and it seems to me that even if it did, this would not be a good solution, because the taint is assigned by the control plane and would simply be re-added for as long as the disk-pressure condition persists.
Furthermore, at the end of the kubectl describe node antonis-dell output I observed this:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FreeDiskSpaceFailed 57m kubelet failed to garbage collect required amount of images. Wanted to free 32967806976 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 52m kubelet failed to garbage collect required amount of images. Wanted to free 32500092928 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 47m kubelet failed to garbage collect required amount of images. Wanted to free 32190205952 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 42m kubelet failed to garbage collect required amount of images. Wanted to free 32196628480 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 37m kubelet failed to garbage collect required amount of images. Wanted to free 32190926848 bytes, but freed 0 bytes
Warning FreeDiskSpaceFailed 2m21s (x7 over 32m) kubelet (combined from similar events): failed to garbage collect required amount of images. Wanted to free 30909374464 bytes, but freed 0 bytes
Maybe the disk-pressure is related to this? How can I delete the unwanted images?
Posting the answer as a community wiki, feel free to edit and expand.
The node.kubernetes.io/disk-pressure:NoSchedule taint indicates, as its name suggests, that the node is under disk pressure.
The kubelet detects disk pressure based on imagefs.available, imagefs.inodesFree, nodefs.available and nodefs.inodesFree (Linux only) observed on a Node. The observed values are then compared to the corresponding thresholds that can be set on the kubelet to determine whether the Node condition and taint should be added or removed.
More details on disk pressure are available in Efficient Node Out-of-Resource Management in Kubernetes, under the "How Does Kubelet Decide that Resources Are Low?" section:
memory.available - a signal that describes the state of cluster memory. The default eviction threshold for memory is 100Mi; in other words, the kubelet starts evicting Pods when available memory goes down to 100Mi.
nodefs.available - the nodefs is a filesystem used by the kubelet for volumes, daemon logs, etc. By default, the kubelet starts reclaiming node resources if nodefs.available < 10%.
nodefs.inodesFree - a signal that describes the state of the nodefs inodes. By default, the kubelet starts evicting workloads if nodefs.inodesFree < 5%.
imagefs.available - the imagefs is an optional filesystem used by a container runtime to store container images and container-writable layers. By default, the kubelet starts evicting workloads if imagefs.available < 15%.
imagefs.inodesFree - the state of the imagefs inodes. It has no default eviction threshold.
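These thresholds can be tuned through the kubelet's eviction flags. Since this is a K3s cluster, kubelet arguments are passed via --kubelet-arg; the values below are purely illustrative assumptions, not recommendations:
# Inspect the thresholds currently in effect on the node (needs node-proxy permissions)
kubectl get --raw /api/v1/nodes/antonis-dell/proxy/configz | jq .kubeletconfig.evictionHard
# Illustrative only: start the K3s server with explicit hard-eviction thresholds
k3s server --kubelet-arg='eviction-hard=imagefs.available<15%,nodefs.available<10%'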
What to check
There are different things that can help, such as:
prune unused objects like images (with the Docker CRI) - prune images; see the sketch after this list.
The docker image prune command allows you to clean up unused images. By default, docker image prune only cleans up dangling images. A dangling image is one that is not tagged and is not referenced by any container.
check files/logs on the node to see whether they take a lot of space.
any other reason why disk space was consumed.
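A hedged sketch of the clean-up steps on the affected node; note that K3s uses containerd rather than Docker by default, so the pruning command depends on the runtime:
# With the Docker runtime: remove dangling images, or all unused images with -a
docker image prune
docker image prune -a
# With K3s/containerd: remove unused images via the bundled crictl
# (the --prune flag requires a reasonably recent crictl version)
sudo k3s crictl rmi --prune
# Check which filesystems are actually full and what is taking the space
df -h
sudo du -sh /var/lib/rancher /var/log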

Can kubernetes provide a verbose description of its scheduling decisions?

When scheduling a kubernetes Job and Pod, if the Pod can't be placed the explanation available from kubectl describe pods PODNAME looks like:
Warning FailedScheduling <unknown> default-scheduler 0/172 nodes are available:
1 Insufficient pods, 1 node(s) were unschedulable, 11 Insufficient memory,
30 Insufficient cpu, 32 node(s) didn't match node selector, 97 Insufficient nvidia.com/gpu.
That's useful but a little too vague. I'd like more detail than that.
Specifically, can I list all nodes together with the reason the pod wasn't scheduled onto each particular node?
I was recently changing labels and the node selector and want to determine whether I made a mistake somewhere in that process or whether the nodes I need really are just busy.
You can find more details related to problems with scheduling a particular Pod in the kube-scheduler logs. If you set up your cluster with the kubeadm tool, kube-scheduler, like the other key components of the cluster, is deployed as a static Pod in the kube-system namespace. You can list such Pods with the following command:
kubectl get pods -n kube-system
which will show you among others your kube-scheduler Pod:
NAME READY STATUS RESTARTS AGE
kube-scheduler-master-ubuntu-18-04 1/1 Running 0 2m37s
Then you can check its logs. In my example the command will look as follows:
kubectl logs kube-scheduler-master-ubuntu-18-04 -n kube-system
You should find there the information you need.
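If you prefer to stay on the API side instead of reading the scheduler logs, the same information usually surfaces as Events, which can be filtered (a sketch; PODNAME stands for your own pod's name):
# All FailedScheduling events in the current namespace
kubectl get events --field-selector reason=FailedScheduling
# Only the events that belong to one specific pod
kubectl get events --field-selector involvedObject.name=PODNAME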
One more thing...
If you've already verified it, just ignore this tip
Let's start from the beginning...
I've just created a simple job from the example you can find here:
kubectl apply -f https://k8s.io/examples/controllers/job.yaml
job.batch/pi created
If I run:
kubectl get jobs
it shows me:
NAME COMPLETIONS DURATION AGE
pi 0/1 17m 17m
Hmm... completions 0/1 ? Something definitely went wrong. Let's check it.
kubectl describe job pi
tells me basically nothing. In its events I can see only:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 18m job-controller Created pod: pi-zxp4p
as if everything went well... but we already know it didn't. So let's investigate further. As you probably know, the job-controller creates Pods that run to completion to perform a certain task. From the perspective of the job-controller everything went well (we've just seen it in its events):
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 23m job-controller Created pod: pi-zxp4p
It did its part of the task and reported that everything went fine. But that's only part of the whole task: the job-controller creates the Pod object and leaves the actual placement to kube-scheduler, since it isn't responsible for (and doesn't even have the privileges for) scheduling the Pod on a particular node. If we run:
kubectl get pods
we can see one Pod in a Pending state:
NAME READY STATUS RESTARTS AGE
pi-zxp4p 0/1 Pending 0 30m
Let's describe it:
kubectl describe pod pi-zxp4p
In events we can see some very important and specific info:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 20s (x24 over 33m) default-scheduler 0/1 nodes are available: 1 node(s) had taints that the pod didn't tolerate.
so now we know the actual reason why our Pod couldn't be scheduled.
Pay attention to the different fields of the event:
From: default-scheduler - it means that the message originated from our kube-scheduler.
Type: Warning, which isn't as important as Critical or Error, so chances are it may not appear in the kube-scheduler logs if the scheduler was started with the default level of log verbosity.
You can read here that:
As per the comments, the practical default level is V(2). Developers and QE environments may wish to run at V(3) or V(4). If you wish to change the log level, you can pass in -v=X where X is the desired maximum level to log.
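On a kubeadm cluster the scheduler runs as a static Pod, so one way to raise its verbosity is to edit the static Pod manifest on the control-plane node; the path below is the kubeadm default, and the whole snippet is a sketch rather than a mandatory procedure:
# Add "- --v=4" to the container's command list in the manifest; the kubelet
# restarts the static Pod automatically once the file is saved.
sudo vi /etc/kubernetes/manifests/kube-scheduler.yaml
# Re-read the logs with the higher verbosity in effect
kubectl logs kube-scheduler-master-ubuntu-18-04 -n kube-system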

How to diagnose a stuck Kubernetes rollout / deployment?

It seems a deployment has gotten stuck. How can I diagnose this further?
kubectl rollout status deployment/wordpress
Waiting for rollout to finish: 2 out of 3 new replicas have been updated...
It's stuck on that for ages already. It is not terminating the two older pods:
kubectl get pods
NAME READY STATUS RESTARTS AGE
nfs-server-r6g6w 1/1 Running 0 2h
redis-679c597dd-67rgw 1/1 Running 0 2h
wordpress-64c944d9bd-dvnwh 4/4 Running 3 3h
wordpress-64c944d9bd-vmrdd 4/4 Running 3 3h
wordpress-f59c459fd-qkfrt 0/4 Pending 0 22m
wordpress-f59c459fd-w8c65 0/4 Pending 0 22m
And the events:
kubectl get events --all-namespaces
NAMESPACE LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
default 25m 2h 333 wordpress-686ccd47b4-4pbfk.153408cdba627f50 Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 25m 2h 337 wordpress-686ccd47b4-vv9dk.153408cc8661c49d Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 22m 22m 1 wordpress-686ccd47b4.15340e5036ef7d1c ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: wordpress-686ccd47b4-4pbfk
default 22m 22m 1 wordpress-686ccd47b4.15340e5036f2fec1 ReplicaSet Normal SuccessfulDelete replicaset-controller Deleted pod: wordpress-686ccd47b4-vv9dk
default 2m 22m 72 wordpress-f59c459fd-qkfrt.15340e503bd4988c Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 2m 22m 72 wordpress-f59c459fd-w8c65.15340e50399a8a5a Pod Warning FailedScheduling default-scheduler No nodes are available that match all of the predicates: Insufficient cpu (1), Insufficient memory (2), MatchInterPodAffinity (1).
default 22m 22m 1 wordpress-f59c459fd.15340e5039d6c622 ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: wordpress-f59c459fd-w8c65
default 22m 22m 1 wordpress-f59c459fd.15340e503bf844db ReplicaSet Normal SuccessfulCreate replicaset-controller Created pod: wordpress-f59c459fd-qkfrt
default 3m 23h 177 wordpress.1533c22c7bf657bd Ingress Normal Service loadbalancer-controller no user specified default backend, using system default
default 22m 22m 1 wordpress.15340e50356eaa6a Deployment Normal ScalingReplicaSet deployment-controller Scaled down replica set wordpress-686ccd47b4 to 0
default 22m 22m 1 wordpress.15340e5037c04da6 Deployment Normal ScalingReplicaSet deployment-controller Scaled up replica set wordpress-f59c459fd to 2
You can use kubectl describe po wordpress-f59c459fd-qkfrt, but from the message the pods cannot be scheduled on any of the nodes.
Provide more capacity, for example by adding a node, to allow the pods to be scheduled.
The new deployment had a replica count of 3 while the previous had 2. I assumed I could set a high value for the replica count and it would try to deploy as many replicas as it could before reaching its resource capacity. However, this does not seem to be the case...
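To see why the cluster cannot fit the extra replicas, it can help to compare each node's allocatable resources with the requests already placed on it; a sketch of the standard commands, using a pod name from the listing above:
# Allocatable capacity and the sum of requests/limits already scheduled per node
kubectl describe nodes | grep -A9 "Allocated resources"
# Resource requests of one of the pending wordpress pods
kubectl get pod wordpress-f59c459fd-qkfrt -o jsonpath='{.spec.containers[*].resources}'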