Today when I checked the Kubernetes cluster, some of the pods showed an Evicted status. But I only see the Evicted status and cannot find any detailed logs explaining why the pods were evicted. Disk pressure? CPU pressure? What should I do to find the reason a pod was evicted?
You can try looking at the logs of that particular pod.
Do a describe on that pod and see if you find anything.
kubectl get pods -o wide
Try the above command to see which node it was running on, then run a describe on that node; you should find at least some information related to the eviction.
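For example, a minimal sequence (the pod, namespace, and node names are placeholders):

kubectl describe pod <pod-name> -n <namespace>   # the Status/Reason fields and the Events section usually state why it was evicted
kubectl describe node <node-name>                # check the Conditions section (MemoryPressure, DiskPressure, PIDPressure) and the node events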
Eviction is a process where a Pod assigned to a Node is asked to terminate. One of the most common cases in Kubernetes is preemption, where, in order to schedule a new Pod on a Node with limited resources, another Pod needs to be terminated to free resources for the first one.
So, to answer your question, the pod was most likely evicted because the node it was running on ran short of CPU or memory resources.
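If the events have not expired yet, you can also list eviction events across the cluster; a sketch (reason is a supported field selector for events):

kubectl get events --all-namespaces --field-selector=reason=Evicted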
Related
I have a pod with a Pod Disruption Budget that says at least one replica has to be running. While this generally works very well, it leads to a peculiar problem.
Sometimes this pod is in a failed state (due to ongoing development), so I end up with two pods, scheduled on two different nodes, both in a CrashLoopBackOff state.
Now if I want to run a drain or a k8s version upgrade, the pod can never be evicted, since the PDB requires at least one to be running, which will never happen.
So k8s does not evict a pod because of the Pod Disruption Budget even if the pod is not running. Is there a way to handle this? I think ideally k8s should treat failed pods as candidates for eviction regardless of the budget (deleting a failing pod cannot "break" anything anyway).
...if I want to run a drain or k8s version upgrade, what happens is that pod cannot ever be evicted since it knows that there should be at least one running...
kubectl drain --disable-eviction <node> will delete pods that are protected by a PDB. Since you are fully aware of the downtime, you can also first delete the PDB in question and then drain the node.
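A rough sketch of that second approach (all names are placeholders; check which PDB covers the pod first):

kubectl get pdb -n <namespace>                   # find the PDB protecting the pod
kubectl delete pdb <pdb-name> -n <namespace>     # remove it, accepting the possible downtime
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data   # --delete-local-data on older kubectl versions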
I hit this issue too during a k8s upgrade. FYI, as mentioned in the other answer, kubectl drain --disable-eviction <node> may cause service downtime, and deleting pods might not always work, since deleted pods are immediately recreated by the deployment managing them. Also, even if the pods are deleted successfully, it may cause service downtime depending on the PodDisruptionBudget.
Instead, I increased the number of replicas in the deployment so that PodDisruptionBudget.minAvailable or PodDisruptionBudget.maxUnavailable could still be honored, and I was able to successfully upgrade k8s while honoring the PodDisruptionBudget.
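A minimal sketch of that approach (deployment name, namespace, and replica count are placeholders; pick a count that leaves headroom over minAvailable):

kubectl scale deployment <deployment-name> -n <namespace> --replicas=3
kubectl get pdb -n <namespace>                   # verify ALLOWED DISRUPTIONS is now greater than 0
kubectl drain <node-name> --ignore-daemonsets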
We are running a Kubernetes (1.9.4) cluster with 5 masters and 20 worker nodes. We are running one StatefulSet with 3 replicas, among other pods, in this cluster. Initially the StatefulSet pods were distributed across 3 nodes. However, pod-2 on node-2 got evicted due to disk pressure on node-2, and when pod-2 was evicted it went to node-1, where pod-1 was already running and which was already experiencing disk pressure.

As per our understanding, the kubernetes-scheduler should not have scheduled a (non-critical) pod to a node that is already under disk pressure. Is it the default behavior not to schedule pods to a node under disk pressure, or is it allowed? The reason we ask is that at the same time we observed node-0 without any disk issue, so we were hoping that the pod evicted from node-2 would ideally land on node-0 instead of node-1, which is under disk pressure.
Another observation: when pod-2 on node-2 was evicted, we saw that the same pod was successfully scheduled, spawned, and moved to the Running state on node-1. However, we still see the "Failed to admit pod" error on node-2 many times for the same pod-2 that was evicted. Is this an issue with the kube-scheduler?
Yes, the scheduler should not assign a new pod to a node with a DiskPressure condition.
However, I think you can approach this problem from a few different angles.
Look into the configuration of your scheduler:
./kube-scheduler --write-config-to kube-config.yaml
and check whether it needs any adjustments. Information about additional kube-scheduler options can be found in the kube-scheduler reference documentation.
You can also configure additional scheduler(s) depending on your needs; the Kubernetes documentation includes a tutorial on running multiple schedulers.
Check the logs:
kubectl logs: kube-scheduler event logs
journalctl -u kubelet: kubelet logs
/var/log/kube-scheduler.log (on the master)
Look more closely at the kubelet's eviction thresholds (soft and hard) and how much node memory capacity is set (a configuration sketch follows the notes below).
Bear in mind that:
the kubelet may not observe resource pressure fast enough, or
the kubelet may evict more Pods than needed due to the stats collection timing gap.
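As an illustration only (the values below are made up, not recommendations), the eviction-threshold part of a KubeletConfiguration file can look roughly like this; on older kubelets the same thresholds are passed via the --eviction-hard and --eviction-soft flags instead:

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  memory.available: "200Mi"     # hard threshold: evict immediately when crossed
  nodefs.available: "10%"
evictionSoft:
  memory.available: "500Mi"     # soft threshold: evict only after the grace period below
  nodefs.available: "15%"
evictionSoftGracePeriod:
  memory.available: "1m30s"
  nodefs.available: "2m"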
Please check out my suggestions and let me know if they helped.
If a Pod's status is Failed, Kubernetes will try to create new Pods until it reaches the terminated-pod-gc-threshold in kube-controller-manager. This leaves many Failed Pods in the cluster that need to be cleaned up.
Are there reasons other than Evicted that will cause a Pod to be Failed?
There can be many causes for the POD status to be FAILED. You just need to check for problems (if any exist) by running the command
kubectl -n <namespace> describe pod <pod-name>
Carefully check the EVENTS section, where all the events that occurred during POD creation are listed. Hopefully you can pinpoint the cause of failure from there.
However, there are several reasons for POD failure; some of them are the following:
Wrong image used for POD.
Wrong command/arguments are passed to the POD.
Kubelet failed to check POD liveness (i.e., the liveness probe failed).
POD failed health check.
Problem in network CNI plugin (misconfiguration of CNI plugin used for networking).
For example:
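(The original example isn't reproduced above, so here is a minimal sketch of a pod manifest that would fail in the way described below, using the "not-so-busybox" image mentioned in the text; the pod name is made up for illustration:)

apiVersion: v1
kind: Pod
metadata:
  name: image-pull-demo          # hypothetical name, for illustration only
spec:
  containers:
  - name: main
    image: not-so-busybox        # this image does not exist, so the image pull fails

Running kubectl describe pod image-pull-demo against such a pod would show the failed image pull (ErrImagePull / ImagePullBackOff) in the Events section.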
In the above example, the image "not-so-busybox" couldn't be pulled as it doesn't exist, so the pod FAILED to run. The pod status and events clearly describe the problem.
Simply do this:
kubectl get pods <pod_name> -o yaml
And in the output, towards the end, you can see something like this:
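(The actual output isn't shown above; below is a trimmed, illustrative sketch of the status section for a pod whose container exited with an error. The field names come from the Pod API, while the values are placeholders:)

status:
  phase: Failed
  containerStatuses:
  - name: main
    state:
      terminated:
        exitCode: 1
        reason: Error
        startedAt: "2021-01-01T00:00:00Z"
        finishedAt: "2021-01-01T00:00:05Z"
    restartCount: 0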
This will give you a good idea of where exactly did the pod fail and what happened.
Pods do not survive scheduling failures, node failures, or other evictions, such as those caused by a lack of resources or node maintenance.
Pods should not be created manually but almost always via controllers like Deployments (self-healing, replication etc).
The reason why a pod failed or was terminated can be obtained with:
kubectl describe pod <pod_name>
Other situations I have encountered when a pod Failed:
Issues with the image (not existing anymore)
The pod attempts to access e.g. a ConfigMap or Secret that is not found in the namespace.
Liveness Probe Failure
Persistent Volume fails to mount
Validation Error
In addition, eviction is based on resources - EvictionPolicy
It can also be caused by draining the node the pod runs on; you can read about kubectl drain in the Kubernetes documentation.
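Related to the cleanup mentioned in the question, one way to list and remove accumulated Failed pods, namespace by namespace (a sketch; review the list before deleting anything):

kubectl get pods -n <namespace> --field-selector=status.phase=Failed
kubectl delete pods -n <namespace> --field-selector=status.phase=Failed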
k8s version: 1.12.1
I created a pod via the API on a node and it was allocated an IP (through flanneld). When I used the kubectl describe pod command, I could not get the pod IP, and there was no such IP in etcd storage.
Only a few minutes later could the IP be obtained, and then the kubectl get pod STATUS was Running.
Has anyone ever encountered this problem?
As MatthiasSommer mentioned in a comment, the process of creating a pod can take a while.
If a POD stays in the ContainerCreating status for a longer time, you can check what is stopping it from changing to the Running status with the command:
kubectl describe pod <pod_name>
Why can creating a pod take a longer time?
It depends on what is included in the manifest: the pod can share namespaces, storage volumes, secrets, ConfigMaps, have resources assigned, etc.
kube-apiserver validates and configures data for API objects.
kube-scheduler needs to check and collect resource requirements, constraints, etc., and assign the pod to a node.
kubelet runs on each node and ensures that all containers fulfill the pod specification and are healthy.
kube-proxy also runs on each node and is responsible for the pod's networking.
As you can see, there are many requests, validations, and syncs, so it takes a while to create a pod that fulfills all the requirements.
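While the pod sits in ContainerCreating, the namespace events are usually the quickest signal of what it is waiting for; for example (namespace and pod name are placeholders):

kubectl get events -n <namespace> --sort-by=.metadata.creationTimestamp
kubectl get events -n <namespace> --field-selector=involvedObject.name=<pod-name>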
I'm looking at Prometheus metrics in a Grafana dashboard, and I'm confused by a few panels that display metrics based on an ID that is unfamiliar to me. I assume that /kubepods/burstable/pod99b2fe2a-104d-11e8-baa7-06145aa73a4c points to a single pod, and I assume that /kubepods/burstable/pod99b2fe2a-104d-11e8-baa7-06145aa73a4c/<another-long-string> resolves to a container in the pod, but how do I resolve this ID to the pod name and a container, i.e. how do I map this ID to the pod name I see when I run kubectl get pods?
I already tried running kubectl describe pods --all-namespaces | grep "99b2fe2a-104d-11e8-baa7-06145aa73a4c" but that didn't turn up anything.
Furthermore, there are several subpaths in /kubepods, such as /kubepods/burstable and /kubepods/besteffort. What do these mean and how does a given pod fall into one or another of these subpaths?
Lastly, where can I learn more about what manages /kubepods?
Prometheus Query:
sum (container_memory_working_set_bytes{id!="/",kubernetes_io_hostname=~"^$Node$"}) by (id)
/
Thanks for reading.
Eric
OK, now that I've done some digging around, I'll attempt to answer all 3 of my own questions. I hope this helps someone else.
How do I map this ID to the pod name I see when I run kubectl get pods?
Given the following, /kubepods/burstable/pod99b2fe2a-104d-11e8-baa7-06145aa73a4c, the last bit is the pod UID, and can be resolved to a pod by looking at the metadata.uid property on the pod manifest:
kubectl get pod --all-namespaces -o json | jq '.items[] | select(.metadata.uid == "99b2fe2a-104d-11e8-baa7-06145aa73a4c")'
Once you've resolved the UID to a pod, you can resolve the second long string (the container ID) to a container by matching it against .status.containerStatuses[].containerID in the pod manifest:
~$ kubectl get pod my-pod-6f47444666-4nmbr -o json | jq '.status.containerStatuses[] | select(.containerID == "docker://5339636e84de619d65e1f1bd278c5007904e4993bc3972df8628668be6a1f2d6")'
Furthermore, there are several subpaths in /kubepods, such as /kubepods/burstable and /kubepods/besteffort. What do these mean and how does a given pod fall into one or another of these subpaths?
Burstable, BestEffort, and Guaranteed are Quality of Service (QoS) classes that Kubernetes assigns to pods based on the memory and cpu allocations in the pod spec. More information on QoS classes can be found here https://kubernetes.io/docs/tasks/configure-pod-container/quality-service-pod/.
To quote:
For a Pod to be given a QoS class of Guaranteed:
Every Container in the Pod must have a memory limit and a memory request, and they must be the same.
Every Container in the Pod must have a cpu limit and a cpu request, and they must be the same.
A Pod is given a QoS class of Burstable if:
The Pod does not meet the criteria for QoS class Guaranteed.
At least one Container in the Pod has a memory or cpu request.
For a Pod to be given a QoS class of BestEffort, the Containers in the Pod must not have any memory or cpu limits or requests.
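To check which class a running pod was assigned, the pod status carries a qosClass field; for example (the pod name is a placeholder):

kubectl get pod <pod-name> -o jsonpath='{.status.qosClass}'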
Lastly, where can I learn more about what manages /kubepods?
/kubepods/burstable, /kubepods/besteffort, and /kubepods/guaranteed are all part of the cgroup hierarchy, which lives under the /sys/fs/cgroup directory. Cgroups are what manage resource usage for container processes, such as CPU, memory, disk I/O, and network. Each resource has its own place in the cgroup hierarchy filesystem, and in each resource sub-directory there are /kubepods subdirectories. More info on cgroups and Docker containers here: https://docs.docker.com/config/containers/runmetrics/#control-groups
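For example, on a node using cgroup v1 with the cgroupfs driver (the paths differ with the systemd cgroup driver or cgroup v2), the memory accounting for the pod UID from the question can be inspected directly:

ls /sys/fs/cgroup/memory/kubepods/burstable/      # one pod<UID> directory per Burstable pod
cat /sys/fs/cgroup/memory/kubepods/burstable/pod99b2fe2a-104d-11e8-baa7-06145aa73a4c/memory.usage_in_bytes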