Pods getting scheduled irrespective of insufficient resources - kubernetes

Upon submitting few jobs (say, 50) targeted on a single node, I am getting pod status as "OutOfpods" for few jobs. I have reduced the maximum number of pods on this worker node to "10", but still observe above issue.
Kubelet configuration is default with no changes.
kubernetes version: v1.22.1
Worker Node
Os: CentOs 7.9
memory: 528 GB
CPU: 40 cores
kubectl describe pod :
Warning OutOfpods 72s kubelet Node didn't have enough resource: pods, requested: 1, used: 10, capacity: 10

I have realized this to be a known issue for kubelet v1.22 as confirmed here. The fix will be reflected in the next latest release.
Simple resolution here is to downgrade kubernetes to v1.21.

I'm seeing this problem as well w/ K8s v1.22. I'm scheduling around 100 containers w/ one node with an extended resource called "executors" and a capacity of 300 per node. Each container requests 10. The pods stay pending for a long time, but as soon as they are assigned by the scheduler, kubelet on the node says its out of resource. Its just a warning I suppose, but it actually leads to "failed" status on the pod atleast briefly. I have to check whether its re-created as pending or not.
Normal Scheduled 40m default-scheduler Successfully assigned ci-smoke/userbench-4a306d7-l1all-8zv7n-3803535768 to sb-bld-srv-39
Warning OutOfwdc.com/executors 40m kubelet Node didn't have enough resource:wdc.com/executors, requested: 10, used: 300, capacity: 300```

Related

Unable to launch new pods despite resources seemingly available

I'm unable to launch new pods despite resources seemingly being available.
Judging from the below screenshot there should be room for about 40 new pods.
And also judging from the following screenshot the nodes seems fairly underutilized
However I'm currently facing the below error message
0/3 nodes are available: 1 Insufficient cpu, 2 node(s) had volume node affinity conflict.
And last night it was the following
0/3 nodes are available: 1 Too many pods, 2 node(s) had volume node affinity conflict.
Most of my services require very little memory and cpu. And therefore their resources are configured as seen below
resources:
limits:
cpu: 100m
memory: 64Mi
requests:
cpu: 100m
memory: 32Mi
Why I can't deploy more pods? And how to fix this?
Your problem is "volume node affinity conflict".
From Kubernetes Pod Warning: 1 node(s) had volume node affinity conflict:
The error "volume node affinity conflict" happens when the persistent volume claims that the pod is using are scheduled on different zones, rather than on one zone, and so the actual pod was not able to be scheduled because it cannot connect to the volume from another zone.
First, try to investigate exactly where the problem is. You can find a detailed guide here. You will need commands like:
kubectl get pv
kubectl describe pv
kubectl get pvc
kubectl describe pvc
Then you can delete the PV and PVC and move pods to the same zone along with the PV and PVC.
volume node affinity conflict - the volume you tried to mount is not available on any of the node. You can resolve this or paste your volumes section to the question for further examination.

Is it possible to get the details of the node where the pod ran before restart?

I'm running a kubernetes cluster of 20+ nodes. And one pod in a namespace got restarted. The pod got killed due to OOM with exit code 137 and restarted again as expected. But would like to know the node in which the pod was running earlier. Any place we could check the logs for the info? Like tiller, kubelet, kubeproxy etc...
But would like to know the node in which the pod was running earlier.
If a pod is killed with ExitCode: 137, e.g. when it used more memory than its limit, it will be restarted on the same node - not re-scheduled. For this, check your metrics or container logs.
But Pods can also be killed due to over-committing a node, see e.g. How to troubleshoot Kubernetes OOM and CPU Throttle.

Kubernetes pod eviction schedules evicted pod to node already under DiskPressure

We are running a kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. We are running one statefulset pod with replication 3 among other pods in this cluster. Initially the statefulset pods are distributed to 3 nodes. However the pod-2 on node-2 got evicted due to the disk pressure on node-2. However, when the pod-2 is evicted it went to node-1 where pod-1 was already running and node-1 was already experiencing node pressure. As per our understanding, the kubernetes-scheduler should not have scheduled a pod (non critical) to a node where there is already disk pressure. Is this the default behavior to not schedule the pods to a node under disk pressure or is it allowed. The reason is, at the same time we do observe, node-0 without any disk issue. So we were hoping that evicted pod on node-2 should have ideally come on node-0 instead of node-1 which is under disk pressure.
Another observation we had was, when the pod-2 on node-2 was evicted, we see that same pod is successfully scheduled and spawned and moved to running state in node-1. However we still see "Failed to admit pod" error in node-2 for many times for the same pod-2 that was evicted. Is this any issue with the kube-scheduler.
Yes, Scheduler should not assign a new pod to a node with a DiskPressure Condition.
However, I think you can approach this problem from few different angles.
Look into configuration of your scheduler:
./kube-scheduler --write-config-to kube-config.yaml
and check it needs any adjustments. You can find info about additional options for kube-scheduler here:
You can also configure aditional scheduler(s) depending on your needs. Tutorial for that can be found here
Check the logs:
kubeclt logs: kube-scheduler events logs
journalctl -u kubelet: kubelet logs
/var/log/kube-scheduler.log (on the master)
Look more closely at Kubelet's Eviction Thresholds (soft and hard) and how much node memory capacity is set.
Bear in mind that:
Kubelet may not observe resources pressure fast enough
or
Kubelet may evict more Pods than needed due to stats collection timing gap
Please check out my suggestions and let me know if they helped.

How to find out the minimum and maximum usable CPU and memory space left on a kubernetes node

I'm trying to deploy Magento on a GCE n1-standard-1 machine, but I keep getting the following error message.
pod (magento-magento-1486272877-zd34d) failed to fit in any node fit failure summary on nodes : Insufficient cpu (1)
I'm using the official Magento helm chart, and I've configured the values.yml file to contain very low CPU requests: cpu: 25m
When I look at the node details on the kubernetes dashboard, I see that my CPU is already spinning at 0.728 (72.80%) while it's not even doing anything besides the system containers. Also see image below:
Does this mean I have 1 - 0.728 = 0.272m left for container requests? Then why is kubernetes still telling me that it has insufficient CPU when specifying 0.25m?
Thanks for your help.
I didn't see that the CPU limits were 0.248 according to the picture in my post, so I put cpu: 20m and it worked.
There is a nifty kubectl command to get information about your nodes resources...
kubectl top nodes
And pods...
kubectl top pods
Pods with containers
kubectl top pods --containers=true

Kubernetes Horizontal Pod Autoscaler on GKE - "failed to get CPU utilization"

I am fairly new to Kubernetes and GKE (Google Container Engine) as a whole, so I was playing with the horizontal pod autoscaling and cluster autoscaling features by hitting my load balancer hard enough to make it scale up enough pods that it needed more instances, so it scaled those up but then it got to the point that there were some pods in Pending state, but it had also reached the max number of instances for the autoscaling cluster, so they were left in Pending state.
I then stopped the load test hoping it would come down on its own, but it wouldn't. I looked at kubectl describe hpa and I would see errors like:
7m 18s 18 {horizontal-pod-autoscaler } Warning FailedGetMetrics failed to get CPU consumption and request: metrics obtained for 4/5 of pods
7m 18s 18 {horizontal-pod-autoscaler } Warning FailedComputeReplicas failed to get CPU utilization: failed to get CPU consumption and request: metrics obtained for 4/5 of pods
There are actually only 4 pods running (and none in pending state), and looking at the heapster logs (kubectl logs -f heapster-v1.1.0-<id> --namespace=kube-system heapster) I can see it is actually looking for metrics in a pod that doesn't exist anymore (this would be the mystery 5th pod it's complaining about).
The issue with this is that because it is missing the 5th pod, it can't finish getting the current CPU utilization for the 4 pods that are running, and thus horizontal pod autoscaling doesn't work.
Any ideas how to get out of a situation like this?
I've tried removing the hpa and creating it again, but it didn't help.