My pod status shows OOMKilled, but the container isn't restarted. Why? - kubernetes

I am running a pod with three containers: one app container and two sidecar containers. The app container's memory limit is exceeded, but the container doesn't restart. A Java app is running inside the app container. Here is the relevant part of the kubectl describe pod output:
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 29 Sep 2021 09:41:50 +0000
Finished: Wed, 29 Sep 2021 09:42:47 +0000
Ready: False
Restart Count: 14
Limits:
memory: 300Mi
Requests:
memory: 300Mi

With Restart Count: 14 it seems that the container did restart. However, restarts are subject to an exponential back-off. You can get insight into when the next restart will happen by looking at the Events; the CrashLoopBackOff "Back-off 20s restarting failed ..." part will tell you when it will be tried next.
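A quick way to see the back-off events and the restart history (the pod name below is a placeholder for your own pod):
kubectl describe pod <app-pod-name> | grep -A 10 Events:
kubectl get events --field-selector involvedObject.name=<app-pod-name> --sort-by=.lastTimestamp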

Related

Kubernetes OOMKilled with multiple containers

I run my service in a Kubernetes cluster (AWS EKS). Recently, I added a new container (a sidecar) to the pod. After that, I started observing OOMKilled events, but the metrics do not show any high memory usage. This is the config:
Containers:
side-car:
Container ID: ...
Image: ...
...
State: Running
Started: Mon, 21 Feb 2022 09:11:07 +0100
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 17 Feb 2022 18:36:28 +0100
Finished: Mon, 21 Feb 2022 09:11:06 +0100
Ready: True
Restart Count: 1
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 1
memory: 2Gi
...
...
my-service:
Container ID: ...
...
...
...
State: Running
Started: Thu, 17 Feb 2022 18:36:28 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 3
memory: 3Gi
Requests:
cpu: 2
memory: 3Gi
Both the side-car and my service have memory limits (and requests) set. At the time of the OOMKill, none of the containers uses more memory than requested/limited. E.g. in one case the side-car was using 20MiB and my-service 800MiB, way below the limits. Still, Kubernetes restarted the container (the side-car).
Just for the record: before adding the side-car, my-service was running without problems and no OOMKilled was observed.
Maybe your sidecar container has some performance issue that you can't catch, and at some point in time it does request more than its limit?
Check the "OOM kill due to container limit reached" scenario:
State: Running
Started: Thu, 10 Oct 2019 11:14:13 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The Exit code 137 is important because it means that the system
terminated the container as it tried to use more memory than its
limit.
In order to monitor this, you always have to look at the use of memory
compared to the limit. Percentage of the node memory used by a pod is
usually a bad indicator as it gives no indication on how close to the
limit the memory usage is. In Kubernetes, limits are applied to
containers, not pods, so monitor the memory usage of a container vs.
the limit of that container.
Most likely you don't get to see when the memory usage goes above the limit, as usually the metrics are pulled at defined intervals (cAdvisor, which is currently the de-facto source for metrics, only refreshes its data every 10-15 seconds by default).
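If you want to watch the usage more closely yourself, one option (a sketch, assuming metrics-server is installed and my-pod stands in for your pod's name) is to poll the per-container figures; keep in mind these are still sampled, so short spikes can slip through:
# refresh per-container CPU/memory every 5 seconds
watch -n 5 kubectl top pod my-pod --containers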
How to troubleshoot further? Connect to the respective node that's running the sidecar container and look at the kernel logs. You can use tail -f /var/log/syslog | grep -i kernel (a sample of how this looks is in this video). You should see 2 lines like the ones below, which indicate the aftermath of the container's cgroup limit being breached and the respective process being terminated:
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.895437] Memory cgroup out of memory: Killed process 14300 (dotnet) total-vm:172050596kB, anon-rss:148368kB, file-rss:25056kB, shmem-rss:0kB, UID:0 pgtables:568kB oom_score_adj:-997
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.906653] oom_reaper: reaped process 14300 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Pay special attention to the anon-rss and file-rss values, and compare their sum against the limit you've set for the sidecar container.
If you have control over the code that runs in the sidecar container, then you can add some instrumentation code to print out the amount of memory used at small enough intervals, and simply output that to the console. Once the container is OOMKilled, you'll still have access to the logs to see what happened (use the --previous flag with the kubectl logs command). Have a look at this answer for more info.
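As a minimal sketch of such instrumentation, assuming the sidecar image has a shell and the node uses cgroup v1 (on cgroup v2 the file is /sys/fs/cgroup/memory.current), you could periodically print the container's own memory usage to stdout:
while true; do
  # current memory usage of this container's cgroup, in bytes
  cat /sys/fs/cgroup/memory/memory.usage_in_bytes
  sleep 5
done
After the next OOMKill, the last readings are then still available via the previous logs, e.g. kubectl logs <pod-name> -c side-car --previous.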
Including this just for completeness: your system could potentially run so low on memory that somehow the OOM killer is invoked and your sidecar container is chosen to be terminated (such a scenario is described here). That's highly unlikely in your case though, as from my understanding you get the sidecar container to be terminated repeatedly, which most likely points to an issue with that container only.
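To rule out node-level memory pressure, a quick check (the node name is a placeholder):
kubectl describe node <node-name>   # look at the MemoryPressure condition and the "Allocated resources" section
kubectl top node <node-name>        # overall node usage, requires metrics-server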

Why does Redis in K8S restart all the time?

Redis pod restarts like crazy.
How can I find out the reason for this behavior?
I figured out that the resource quota should be increased, but I have no clue what the best CPU/RAM ratio would be. And why are there no crash events or logs?
Here are the pods:
> kubectl get pods
redis-master-5d9cfb54f8-8pbgq 1/1 Running 33 3d16h
Here are the logs:
> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94
Here are the previous logs of the same pod.
> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...
Here is the description of the pod. As you can see the limit is 100Mi, but I can't see the threshold, after which the pod restarts.
> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name: redis-master-5d9cfb54f8-8pbgq
Namespace: cryptoman
Priority: 0
Node: gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time: Fri, 04 Sep 2020 18:52:17 +0300
Labels: app=redis
pod-template-hash=5d9cfb54f8
role=master
tier=backend
Annotations: <none>
Status: Running
IP: 10.36.2.124
IPs: <none>
Controlled By: ReplicaSet/redis-master-5d9cfb54f8
Containers:
master:
Container ID: docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
Image: k8s.gcr.io/redis:e2e
Image ID: docker-pullable://k8s.gcr.io/redis#sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
Port: 6379/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 09 Sep 2020 10:27:56 +0300
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Wed, 09 Sep 2020 07:34:18 +0300
Finished: Wed, 09 Sep 2020 10:27:55 +0300
Ready: True
Restart Count: 42
Limits:
cpu: 100m
memory: 250Mi
Requests:
cpu: 100m
memory: 250Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-5tds9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5tds9
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Pod sandbox changed, it will be killed and re-created.
Normal Killing 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Stopping container master
Normal Created 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Created container master
Normal Started 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Started container master
Normal Pulled 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Container image "k8s.gcr.io/redis:e2e" already present on machine
This is the limit after which it restarts: exceeding the CPU limit only throttles the container, while exceeding the memory limit gets it OOM-killed.
Limits:
cpu: 100m
memory: 250Mi
Reason: OOMKilled
Remove the requests & limits.
Run the pod and make sure it doesn't restart.
If you already have Prometheus, run the VPA Recommender to check how many resources it needs. Or just use any monitoring stack (GKE Prometheus, prometheus-operator, DataDog, etc.) to check the actual resource consumption and adjust the limits accordingly.
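For reference, a minimal sketch of a recommendation-only VPA object, assuming the VPA components are installed and the workload is a Deployment named redis-master (as the ReplicaSet name above suggests); the VPA object name itself is arbitrary:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: redis-master-vpa
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: redis-master
  updatePolicy:
    updateMode: "Off"   # only produce recommendations, don't touch the pods
The recommendations then show up under kubectl describe vpa redis-master-vpa.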
Max's answer is very complete. But if you don't have Prometheus installed or don't want to install it, there is another simple way to check actual resource consumption: install the metrics-server project in your cluster. After installing it you can check CPU and memory usage with kubectl top node (consumption on the nodes) and kubectl top pod (consumption of the pods). I use it and it is very useful.
Or you can just increase the CPU and memory limits, but then you won't know how many resources the container actually needs. Basically, that is a waste of resources.
The main problem is that you didn't limit the redis application itself. So Redis just keeps increasing its memory usage, and when it reaches the pod's limits.memory of 250Mi it is OOM-killed and restarted.
Then, if you remove limits.memory, Redis will continue eating memory until the node no longer has enough memory to run other processes, at which point K8s kills it and marks it as "Evicted".
Therefore, configure the memory limit inside the Redis application itself: limit the memory used by Redis in the redis.conf file and, depending on your needs, set an LRU or LFU eviction policy to remove some keys (https://redis.io/topics/lru-cache):
maxmemory 256mb
maxmemory-policy allkeys-lfu
And set the pod's memory limit to roughly double Redis's maxmemory, to give the rest of the processes and the objects saved in Redis some margin:
resources:
  limits:
    cpu: 100m
    memory: 512Mi
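If you want to verify or apply the Redis-side settings without rebuilding the image, they can also be set at runtime (a sketch, assuming redis-cli is available in the container and your Redis version supports these options; LFU policies need Redis 4.0+):
kubectl exec -it redis-master-5d9cfb54f8-8pbgq -- redis-cli config set maxmemory 256mb
kubectl exec -it redis-master-5d9cfb54f8-8pbgq -- redis-cli config set maxmemory-policy allkeys-lfu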
Now the pods are getting evicted.
Can I find out the reason?
NAME READY STATUS RESTARTS AGE
redis-master-7d97765bbb-7kjwn 0/1 Evicted 0 38h
redis-master-7d97765bbb-kmc9g 1/1 Running 0 30m
redis-master-7d97765bbb-sf2ss 0/1 Evicted 0 30m

Is it normal for bokeh serve on Kubernetes to restart periodically?

I have a bokeh dashboard served in a docker container, which is running on kubernetes. I can access my dashboard remotely, no problems. But I noticed my pod containing the bokeh serve code restarts a lot, i.e. 14 times in the past 2 hours. Sometimes the status will come back as 'CrashLoopBackOff' and sometimes it will be 'Running' normally.
My question is, is there something about the way bokeh serve works that requires kubernetes to restart it so frequently? Is it something to do with memory (OOMKilled)?
Here is a section of my describe pod:
Name: bokeh-744d4bc9d-5pkzq
Namespace: default
Priority: 0
PriorityClassName: <none>
Node: 10.183.226.51/10.183.226.51
Start Time: Tue, 18 Feb 2020 11:55:44 +0000
Labels: name=bokeh
pod-template-hash=744d4bc9d
Annotations: kubernetes.io/psp: xyz-privileged-psp
Status: Running
IP: 172.30.255.130
Controlled By: ReplicaSet/bokeh-744d4bc9d
Containers:
dashboard-application:
Container ID: containerd://16d10dc5dd89235b0xyz2b5b31f8e313f3f0bb7efe82a12e00c1f01708e2f894
Image: us.icr.io/oss-data-science-np-dal/bokeh:118
Image ID: us.icr.io/oss-data-science-np-dal/bokeh#sha256:037a5b52a6e7c792fdxy80b01e29772dbfc33b10e819774462bee650cf0da
Port: 5006/TCP
Host Port: 0/TCP
State: Running
Started: Tue, 18 Feb 2020 14:25:36 +0000
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Tue, 18 Feb 2020 14:15:26 +0000
Finished: Tue, 18 Feb 2020 14:23:54 +0000
Ready: True
Restart Count: 17
Limits:
cpu: 800m
memory: 600Mi
Requests:
cpu: 600m
memory: 400Mi
Liveness: http-get http://:5006/ delay=10s timeout=1s period=10s #success=1 #failure=3
Readiness: http-get http://:5006/ delay=10s timeout=1s period=3s #success=1 #failure=3
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-cjhfk (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-cjhfk:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-cjhfk
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute for 600s
node.kubernetes.io/unreachable:NoExecute for 600s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 36m (x219 over 150m) kubelet, 10.183.226.51 Liveness probe failed: Get http://172.30.255.130:5006/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning BackOff 21m (x34 over 134m) kubelet, 10.183.226.51 Back-off restarting failed container
Warning Unhealthy 10m (x72 over 150m) kubelet, 10.183.226.51 Readiness probe failed: Get http://172.30.255.130:5006/RCA: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 6m4s (x957 over 150m) kubelet, 10.183.226.51 Readiness probe failed: Get http://172.30.255.130:5006/: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Warning Unhealthy 68s (x23 over 147m) kubelet, 10.183.226.51 Liveness probe failed: Get http://172.30.255.130:5006/RCA: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
I'm new to k8s, so any information you have to spare on this kind of issue will be much appreciated!
If a Container allocates more memory than its limit, the Container becomes a candidate for termination. If the Container continues to consume memory beyond its limit, the Container is terminated. If a terminated Container can be restarted, the kubelet restarts it, as with any other type of runtime failure. This is documented here.
You may have to increase limits and requests in your pod spec. Check the official doc here.
Another way to look at it is to try to optimize your code so that it does not exceed the memory specified in limits.
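A minimal sketch of what the container's resources stanza could look like with a higher memory limit (the 800Mi figure is only an illustration; pick a value based on the usage you actually observe):
resources:
  requests:
    cpu: 600m
    memory: 800Mi
  limits:
    cpu: 800m
    memory: 800Mi
Setting the memory request equal to the limit also means the scheduler only places the pod on a node that can actually provide that much memory.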
OOMKilled means your pod is consuming too much RAM and was killed in order to avoid disrupting the other workloads running on the node.
You can either edit your code to use less RAM, if feasible, or increase limits.memory.
You generally want to have requests = limits, except if your pod runs some heavy stuff at the beginning and then does nothing.
You may want to take a look at the official documentation.

Kubernetes eviction manager evicting control plane pods to reclaim ephemeral storage

I am using Kubernetes v1.13.0. My master is also functioning as a worker-node, so it has workload pods running on it, apart from control plane pods.
The kubelet logs on my master show the following lines:
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:358] eviction manager: pods ranked for eviction: kube-controller-manager-vm2_kube-system(1631c2c238e0c5117acac446b26d9f8c), kube-apiserver-vm2_kube-system(ce43eba098d219e13901c4a0b829f43b), etcd-vm2_kube-system(91ab2b0ddf4484a5ac6ee9661dbd0b1c)
Once the kube-apiserver pod is evicted, the cluster becomes unusable.
What can I do to fix this? Should I add more ephemeral storage? How would I go about doing that? Would that mean adding more space to the root partition on my host?
My understanding is that ephemeral storage consists of /var/log and /var/lib/kubelet folders, which both come under the root partition.
A df -h on my host shows:
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 39G 33G 6.2G 85% /
So it looks like the root partition has a lot of space left, and there is no disk pressure. So what is causing this issue? Some of my worker pods must be doing something crazy with storage, but 6G still seems like plenty of room.
Will adding more space to the root partition fix this issue temporarily?
kubectl describe node vm2 gives the following info:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 11 Jan 2019 21:25:43 +0000 Fri, 11 Jan 2019 20:58:07 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 11 Jan 2019 21:25:43 +0000 Thu, 06 Dec 2018 17:00:02 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Capacity:
cpu: 8
ephemeral-storage: 40593708Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32946816Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 37411161231
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32844416Ki
pods: 110
It seems to me that there was pressure on ephemeral-storage, and the eviction manager is trying to reclaim some storage by evicting the least recently used pods. But it should not evict the control plane pods, otherwise the cluster becomes unusable.
Currently, the Kubelet evicts the control plane pods. Then I try to manually start the apiserver and other control plane pods by adding and removing a space in the /etc/kubernetes/manifests files. This does start the apiserver, but then it again gets evicted. Ideally, the Kubelet should ensure that the static pods in /etc/kubernetes/manifests are always on and properly managed.
I am trying to understand what is going on here, and how to fix this issue, so that my kubernetes cluster becomes more robust, and I don't have to keep manually restarting the apiserver.
I had this same problem and solved it by changing the threshold for evictionHard.
Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
So I see that my kubelet config file is /var/lib/kubelet/config.yaml.
Opening that, I changed the evictionHard settings to the following (I think they were 10 or 15% before):
...
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%
...
There is also the --experimental-allocatable-ignore-eviction (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) setting which should completely disable eviction.
This is because the eviction thresholds for nodefs and imagefs in your kubelet config are set too high; set them lower and the issue will be resolved:
Modify the config in /var/lib/kubelet/config.yaml.
Find the evictionHard section and set the percentages lower, like this:
evictionHard:
  imagefs.available: 1%
  memory.available: 100Mi
  nodefs.available: 1%
  nodefs.inodesFree: 1%
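After changing /var/lib/kubelet/config.yaml, restart the kubelet so the new thresholds take effect (a sketch, assuming the kubelet runs as a systemd service, as in the unit file shown earlier):
sudo systemctl restart kubelet
# daemon-reload is only needed if you also changed the systemd unit itself:
# sudo systemctl daemon-reload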

Kubernetes terminating pods without a clear reason in the logs

Is there a way to know why Kubernetes is terminating pods?
If I go to Logging in the Google console, the only message I can find related to this event is:
shutting down, got signal: Terminated
Also, the pods in the Terminating status are never actually terminated; a few of them have been in this status for more than 24 hours now.
I'm not using livenessProbes or readinessProbes.
I am using terminationGracePeriodSeconds: 30
EDIT: added the result of kubectl describe pod <podname> for a pod that has been in the Terminating status for 9 hours as of now:
Name: storeassets-5383k
Namespace: default
Node: gke-recommendation-engin-default-pool-c9b136a8-0qms/10.132.0.85
Start Time: Sat, 11 Mar 2017 06:27:32 +0000
Labels: app=storeassets
deployment=ab08dc44070ffbbceb69ff6a5d99ae61
version=v1
Status: Terminating (expires Tue, 14 Mar 2017 01:30:48 +0000)
Termination Grace Period: 30s
Reason: NodeLost
Message: Node gke-recommendation-engin-default-pool-c9b136a8-0qms which was running pod storeassets-5383k is unresponsive
IP: 10.60.3.7
Controllers: ReplicationController/storeassets
Containers:
storeassets:
Container ID: docker://7b38f1de0321de4a5f2b484f5e2263164a32e9019b275d25d8823de93fb52c30
Image: eu.gcr.io/<project-name>/recommendation-content-realtime
Image ID: docker://sha256:9e8cf1b743f94f365745a011702a4ae1c2e636ceaaec4dd8d36fef6f787aefe7
Port:
Command:
python
-m
realtimecontent.storeassets
Requests:
cpu: 100m
State: Running
Started: Sat, 11 Mar 2017 06:27:33 +0000
Ready: True
Restart Count: 0
Volume Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-qwfs4 (ro)
Environment Variables:
RECOMMENDATION_PROJECT: <project-name>
RECOMMENDATION_BIGTABLE_ID: recommendation-engine
GOOGLE_APPLICATION_CREDENTIALS: recommendation-engine-credentials.json
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
default-token-qwfs4:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-qwfs4
QoS Class: Burstable
Tolerations: <none>
No events.
As for why the pods are getting terminated, it must be because your image/container is exiting with a successful status.
Try logging your pod until it exits. You might be able to see the reason why from there.
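For example, with the pod from the describe output above (note that if the node is genuinely unresponsive, as the NodeLost reason suggests, the kubelet may not be able to serve any logs):
kubectl logs -f storeassets-5383k
kubectl logs --previous storeassets-5383k   # logs of the previous container instance, if it restarted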