Reason for repeated pod eviction - kubernetes

A node on my 5-node cluster had memory usage peak at ~90% last night. Looking around with kubectl, I found that a single pod (in a 1-replica deployment) was the culprit of the high memory usage, and it was evicted.
However, the pod list shows that the pod was evicted about 10 times (the AGE corresponds to around the time when memory usage peaked, and all evictions happened on the same node):
NAMESPACE           NAME                                  READY   STATUS    RESTARTS   AGE
example-namespace   example-deployment-84f8d7b6d9-2qtwr   0/1     Evicted   0          14h
example-namespace   example-deployment-84f8d7b6d9-6k2pn   0/1     Evicted   0          14h
example-namespace   example-deployment-84f8d7b6d9-7sbw5   0/1     Evicted   0          14h
example-namespace   example-deployment-84f8d7b6d9-8kcbg   0/1     Evicted   0          14h
example-namespace   example-deployment-84f8d7b6d9-9fw2f   0/1     Evicted   0          14h
example-namespace   example-deployment-84f8d7b6d9-bgrvv   0/1     Evicted   0          14h
...
[node memory usage graph omitted]

Status:   Failed
Reason:   Evicted
Message:  Pod The node had condition: [MemoryPressure].
My question is about how or why this situation would happen, and what steps I can take to debug and figure out why the pod was repeatedly evicted. The pod uses an in-memory database, so it makes sense that it eats up a lot of memory over time, but its memory usage on boot shouldn't be abnormal at all.
My intuition would have been: the high-memory pod gets evicted, the Deployment replaces it, the new pod isn't using much memory yet, and all is fine. But the eviction happened many times, which doesn't make sense to me.

As for why this happens: each time the kubelet evicts the pod, the Deployment creates a replacement, and if the replacement is scheduled back onto the same node while it is still under memory pressure, it can be evicted as well; evicted pods are left behind as Failed records rather than deleted, which is why you see so many of them. The simplest steps to confirm this are to run the following commands and read the logs from the specific Pod.
Look at the Pod's state and last restarts:
kubectl describe pods ${POD_NAME}
Find its node name there, then describe the node itself:
kubectl describe node ${NODE_NAME}
You will see the relevant information (such as MemoryPressure) in the Conditions section.
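To scan that condition across every node at once, here is a small jsonpath sketch (plain kubectl, nothing extra assumed):
# Print each node's name alongside its MemoryPressure condition status
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.status.conditions[?(@.type=="MemoryPressure")].status}{"\n"}{end}'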
Examine pod logs:
kubectl logs --previous ${POD_NAME} -c ${CONTAINER_NAME}
If you want to rerun your pod and watch its logs directly as they stream:
kubectl logs ${POD_NAME} -f
More info on the kubectl logs command and its flags can be found in the kubectl reference documentation.
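Beyond debugging, the usual guard against this eviction loop is to declare the database's memory needs up front, so the scheduler stops overcommitting the node. A minimal sketch of such a Deployment, with placeholder names, image, and sizes you would tune to your workload:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: example-deployment          # placeholder name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: example
  template:
    metadata:
      labels:
        app: example
    spec:
      containers:
      - name: in-memory-db                                # hypothetical container name
        image: registry.example.com/in-memory-db:latest   # placeholder image
        resources:
          requests:
            memory: "512Mi"   # the scheduler reserves this much on the node
          limits:
            memory: "1Gi"     # past this, only this container is OOM-killed

With a request set, the scheduler won't place the pod on a node that can't accommodate it; with a limit set, runaway growth shows up as an OOMKilled container in kubectl describe rather than as node-wide MemoryPressure evictions.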

Related

Kubernetes pods are pending not active

If I run this:
kubectl get pods -n kube-system
I get this output:
NAME                                    READY   STATUS    RESTARTS   AGE
coredns-6fdd4f6856-6bl64                0/1     Pending   0          1h
coredns-6fdd4f6856-xgrbm                0/1     Pending   0          1h
kubernetes-dashboard-65c76f6c97-c69jg   0/1     Pending   0          13m
Supposedly I need a Kubernetes scheduler in order to actually launch containers? Does anyone know how to start a kube-scheduler?
This looks less like a scheduler issue and more like your cluster not having enough resources on its nodes (or no nodes at all) to schedule any workloads. You can check your nodes with:
$ kubectl get nodes
Also, you likely won't see any control plane resources in the kube-system namespace if you are using a managed service such as EKS or GKE, because the control plane is hosted outside your cluster.
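To see exactly why the scheduler is holding a pod in Pending, describe it and read the Events section at the bottom; a sketch using one of the pod names above:

$ kubectl describe pod coredns-6fdd4f6856-6bl64 -n kube-system

# Then compare against what the nodes can actually offer
$ kubectl describe nodes | grep -A 5 "Allocated resources"

The FailedScheduling events name the exact shortfall (e.g. Insufficient cpu, or no nodes being registered at all).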

my kubernetes cluster does not scale down

I have a Kubernetes cluster with one master and one worker.
I installed metrics-server for autoscaling and then ran a stress test:
$ kubectl run autoscale-test --image=ubuntu:16.04 --requests=cpu=1000m --command -- sleep 1800
deployment "autoscale-test" created
$ kubectl autoscale deployment autoscale-test --cpu-percent=25 --min=1 --max=5
deployment "autoscale-test" autoscaled
$ kubectl get hpa
NAME             REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0%/25%    1         5         1          1m
$ kubectl get pod
NAME                              READY   STATUS    RESTARTS   AGE
autoscale-test-59d66dcbf7-9fqr8   1/1     Running   0          9m
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get update
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- apt-get install -y stress
$ kubectl exec autoscale-test-59d66dcbf7-9fqr8 -- stress --cpu 2 --timeout 600s &
stress: info: [227] dispatching hogs: 2 cpu, 0 io, 0 vm, 0 hdd
Everything worked fine and the deployment was autoscaled, but after the stress test finished, the pods created by the autoscaler kept running and never terminated. The HPA shows that 0% of CPU is in use, yet the 5 autoscaled pods are still running:
$ kubectl get hpa
NAME             REFERENCE                   TARGETS   MINPODS   MAXPODS   REPLICAS   AGE
autoscale-test   Deployment/autoscale-test   0%/25%    1         5         5          74m
$ kubectl get pods --all-namespaces
NAMESPACE   NAME                             READY   STATUS    RESTARTS   AGE
default     autoscale-test-8f4d84bbf-7ddjw   1/1     Running   0          61m
default     autoscale-test-8f4d84bbf-bmr59   1/1     Running   0          61m
default     autoscale-test-8f4d84bbf-cxt26   1/1     Running   0          61m
default     autoscale-test-8f4d84bbf-x9jws   1/1     Running   0          61m
default     autoscale-test-8f4d84bbf-zbhvk   1/1     Running   0          71m
I waited for an hour, but nothing happened.
From the documentation:
--horizontal-pod-autoscaler-downscale-delay: The value for this option is a duration that specifies how long the autoscaler has to wait before another downscale operation can be performed after the current one has completed. The default value is 5 minutes (5m0s).
Note: When tuning these parameter values, a cluster operator should be aware of the possible consequences. If the delay (cooldown) value is set too long, there could be complaints that the Horizontal Pod Autoscaler is not responsive to workload changes. However, if the delay value is set too short, the scale of the replicas set may keep thrashing as usual.
Finally, just before HPA scales the target, the scale recommendation is recorded. The controller considers all recommendations within a configurable window, choosing the highest recommendation from within that window. This value can be configured using the --horizontal-pod-autoscaler-downscale-stabilization flag, which defaults to 5 minutes. This means that scaledowns will occur gradually, smoothing out the impact of rapidly fluctuating metric values.
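Note that --horizontal-pod-autoscaler-downscale-stabilization is a kube-controller-manager flag, which you often cannot change on a managed cluster. On clusters with the autoscaling/v2 API, the same window can instead be set per HPA; a sketch equivalent to the kubectl autoscale command above, with an explicit scale-down window:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: autoscale-test
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: autoscale-test
  minReplicas: 1
  maxReplicas: 5
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 25
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes of low usage before scaling down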

Check Kubernetes Pod Status for Completed State

Is there a way to check whether a pod status is in the completed state? I have a Pod that I only wanted to use once (where init containers didn't quite serve my purpose) and want to write a check to wait for Completed status.
I am able to get it for Running, Pending, but not for Completed.
Running:
[user@sandbox gcp_kubernetes_installation]$ kubectl get pods --field-selector=status.phase=Running -n mynamespace
NAME                                READY   STATUS    RESTARTS   AGE
mssql-deployment-795dfcf9f7-l2b44   1/1     Running   0          6m
data-load-pod                       1/1     Running   0          5m
Pending:
[user@sandbox gcp_kubernetes_installation]$ kubectl get pods --field-selector=status.phase=Pending -n mynamespace
NAME               READY   STATUS    RESTARTS   AGE
app-deployment-0   0/1     Pending   0          5m
Completed:
[user@sandbox gcp_kubernetes_installation]$ kubectl get pod -n namespace
NAME                                READY   STATUS      RESTARTS   AGE
mssql-deployment-795dfcf9f7-l2b44   1/1     Running     0          11m
data-load-data-load-pod             0/1     Completed   0          10m
app-deployment-0                    0/1     Pending     0          10m
[user@sandbox gcp_kubernetes_installation]$ kubectl get pods --field-selector=status.phase=Completed -n namespace
No resources found.
I believe there may be a bug in the field selector, but I'm wondering whether there is a fix or a workaround.
The correct status.phase for completed pods is Succeeded.
So, to filter only completed pods, you should use this:
kubectl get pod --field-selector=status.phase=Succeeded
That said, the use of bare Pods is not recommended; consider using a Job controller instead:
A Job creates one or more Pods and ensures that a specified number of
them successfully terminate. As pods successfully complete, the Job
tracks the successful completions.
You can check job conditions and wait for them with this:
kubectl wait --for=condition=complete job/myjob
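If you must wait on a bare Pod anyway, newer kubectl releases (roughly 1.23+) can wait on an arbitrary field through a jsonpath condition; a sketch, assuming the data-load pod name from the Running listing above:

kubectl wait --for=jsonpath='{.status.phase}'=Succeeded pod/data-load-pod -n mynamespace --timeout=300s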

istio-pilot on minikube is always in pending state

The istio-pilot pod on my minikube Kubernetes cluster is always in Pending state. I increased CPU to 4 and memory to 8 GB, but the istio-pilot pod is still Pending.
Is any change required to run Istio on minikube beyond the ones mentioned in the documentation?
Resolved the issue. I'm running minikube with VirtualBox, and higher memory and CPU settings do not take effect until minikube is deleted and started again with the new parameters. Without this, scheduling kept failing with Insufficient memory.
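A sketch of that recreate cycle, with the sizes mentioned above (memory is in MB here):

# new CPU/memory settings only apply to a freshly created VM
minikube delete
minikube start --cpus 4 --memory 8192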
I saw istio-pilot in 1.1 rc3 consume a lot of CPU and stay in Pending state due to the following message in kubectl describe pod <istio-pilot-pod-name> -n istio-system:
Warning FailedScheduling 1m (x25 over 3m) default-scheduler 0/2 nodes are available:
1 Insufficient cpu, 1 node(s) had taints that the pod didn't tolerate.
I was able to reduce it by doing --set pilot.resources.requests.cpu=30m when installing istio using helm.
https://github.com/istio/istio/blob/1.1.0-rc.3/install/kubernetes/helm/istio/charts/pilot/values.yaml#L16
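For reference, a sketch of that override on the command line, assuming the Helm-2-era install flow from the Istio 1.1 release directory (the chart containing the values file linked above):

helm install install/kubernetes/helm/istio --name istio --namespace istio-system \
  --set pilot.resources.requests.cpu=30m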

What's the meaning of "READY=2/2" output by command "kubectl get pod $yourpod"

kubectl get pod run-sh-1816639685-xejyk
NAME                      READY   STATUS    RESTARTS   AGE
run-sh-1816639685-xejyk   2/2     Running   0          26m
What's the meaning of "READY=2/2"? The same with "1/1"?
It shows how many containers in the pod are considered ready. Some containers can start faster than others, or may not yet have fulfilled their readiness checks (or may still be within their initial delay). In such cases fewer containers will be ready than the pod's total number (e.g. 1/2), and hence the whole pod will not be considered ready.
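Readiness is per container: a container without a readiness probe counts as ready as soon as it starts, while one with a probe counts only once the probe passes. A minimal two-container sketch with hypothetical names and placeholder images, which reports READY 1/2 for roughly its first 30 seconds and 2/2 afterwards:

apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod          # hypothetical name
spec:
  containers:
  - name: fast-starter
    image: busybox:1.36            # placeholder image
    command: ["sleep", "3600"]     # no readiness probe: ready as soon as it starts
  - name: slow-starter
    image: busybox:1.36
    command: ["sh", "-c", "sleep 30 && touch /tmp/ready && sleep 3600"]
    readinessProbe:                # pod shows 2/2 only after this succeeds
      exec:
        command: ["test", "-f", "/tmp/ready"]
      initialDelaySeconds: 5
      periodSeconds: 5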