Kubernetes OOM killing pod

I have a simple container that consists of OpenLDAP installed on Alpine. It's installed to run as a non-root user. I am able to run the container without any issues using my local Docker engine. However, when I deploy it to our Kubernetes system it is killed almost immediately as OOMKilled. I've tried increasing the memory without any change. I've also looked at the memory usage for the pod and don't see anything unusual.
The server is started as slapd -d debug -h ldap://0.0.0.0:1389/ -u 1000 -g 1000, where 1000 is the uid and gid, respectively.
The node trace shows this output:
May 13 15:33:44 pu1axb-arcctl00 kernel: Task in /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4 killed as a result of limit of /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f
May 13 15:33:44 pu1axb-arcctl00 kernel: memory: usage 512000kB, limit 512000kB, failcnt 71
May 13 15:33:44 pu1axb-arcctl00 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: kmem: usage 7892kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/db65b4f82efd556a780db6eb2c3ddf4b594774e4e5f523a8ddb178fd3256bdda: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/59f908d8492f3783da587beda7205c3db5ee78f0744d8cb49b0491bcbb95c4c7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4: cache:4KB rss:504060KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_a
May 13 15:33:44 pu1axb-arcctl00 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
May 13 15:33:44 pu1axb-arcctl00 kernel: [69022] 0 69022 242 1 28672 0 -998 pause
May 13 15:33:44 pu1axb-arcctl00 kernel: [69436] 1000 69436 591 454 45056 0 969 docker-entrypoi
May 13 15:33:44 pu1axb-arcctl00 kernel: [69970] 1000 69970 401 2 45056 0 969 nc
May 13 15:33:44 pu1axb-arcctl00 kernel: [75537] 1000 75537 399 242 36864 0 969 sh
May 13 15:33:44 pu1axb-arcctl00 kernel: [75544] 1000 75544 648 577 45056 0 969 bash
May 13 15:33:44 pu1axb-arcctl00 kernel: [75966] 1000 75966 196457 126841 1069056 0 969 slapd
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup out of memory: Kill process 75966 (slapd) score 1961 or sacrifice child
May 13 15:33:44 pu1axb-arcctl00 kernel: Killed process 75966 (slapd) total-vm:785828kB, anon-rss:503016kB, file-rss:4348kB, shmem-rss:0kB
I find it hard to believe it's really running out of memory. It's a simple LDAP container with only 8-10 elements in the directory tree and the pod is not showing memory issues on the dashboard (Lens). We have other Alpine images which don't have this issue.
I'm relatively new to Kubernetes, so I'm hoping the users on SO can give me some guidance on how to debug this. I can provide more info once I know what is helpful. As I mentioned, increasing the memory has no effect. I plan to switch the pod from the "burstable" to the "guaranteed" QoS class and see if that makes a difference.
===== UPDATE - Is working now =====
I believe I was confusing the meaning of resource "limits" vs "requests". I had been trying several variations on these before making the original post. After reading through the responses I now have the pod deployed with the following settings:
resources:
  limits:
    cpu: 50m
    memory: 1Gi
  requests:
    cpu: 50m
    memory: 250Mi
Looking at the memory footprint in Lens, the usage is holding steady at around 715Mi. This is higher than our other pods by at least 25%. Perhaps the LDAP server just needs more. Regardless, I thank you all for your timely help.

Check your deployment or pod spec for resource limits.
If your application requires more memory than it is allowed, it will be OOMKilled by Kubernetes.
...
...
resources:
  limits:
    memory: 200Mi
  requests:
    memory: 100Mi
...
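If a container has already been killed for exceeding its limit, the pod status usually records it. As a rough sketch (not verbatim output, and the container name is a placeholder), the relevant part of kubectl get pod <name> -o yaml looks like:
status:
  containerStatuses:
  - name: my-container        # placeholder container name
    restartCount: 3
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137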
Roughly equivalent Java JVM flags, to better understand this concept:
requests = -Xms
limits = -Xmx
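As a loose sketch of that analogy (the pod name, image, and sizes below are illustrative, not recommendations): the JVM heap bounds -Xms/-Xmx play a role similar to the container's memory request/limit, with the difference that exceeding -Xmx raises an OutOfMemoryError inside the JVM, while exceeding the container limit gets the process OOMKilled by the kernel.
apiVersion: v1
kind: Pod
metadata:
  name: jvm-demo                    # illustrative name
spec:
  containers:
  - name: app
    image: eclipse-temurin:17-jre   # illustrative image
    env:
    - name: JAVA_TOOL_OPTIONS       # picked up by the JVM at startup
      value: "-Xms256m -Xmx768m"    # keep -Xmx comfortably below the container memory limit
    resources:
      requests:
        memory: 512Mi
      limits:
        memory: 1Gi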
Read more:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/

I'm hoping the users on SO can give me some guidance on how to debug this.
Before you start debugging, you can check (and improve) your YAML files.
You can set up a default memory request and a default memory limit for containers like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
A request is a bid for the minimum amount of a resource your container will need. It doesn't say how much of the resource you will actually use, just how much you will need: you are telling the Kubernetes scheduler how many resources your container needs to do its job, and the scheduler uses requests to decide which node the pod fits on. CPU requests are also used to configure how containers are scheduled by the Linux kernel.
A limit is the maximum amount of that resource your container is allowed to use. Limits must be greater than or equal to requests. If you set only limits, the request will be the same as the limit.
If you put a single container in a pod, you can set only a memory limit like this:
apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo-2
spec:
  containers:
  - name: default-mem-demo-2-ctr
    image: nginx
    resources:
      limits:
        memory: "1Gi"
If you specify a Container's limit but not its request, the Container is not assigned the default memory request value (256Mi in this situation); instead, the request is set to match the limit.
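Concretely, if you inspect the pod created from the manifest above, you should see roughly this (a sketch, assuming the LimitRange above exists in the namespace):
resources:
  limits:
    memory: "1Gi"
  requests:
    memory: "1Gi"    # copied from the limit, not the 256Mi namespace default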
You can also put one container in the pod and set only a memory request, like this:
apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo-3
spec:
  containers:
  - name: default-mem-demo-3-ctr
    image: nginx
    resources:
      requests:
        memory: "128Mi"
But in this situation the Container's memory limit is set to 512Mi, which is the default memory limit for the namespace.
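Inspecting this pod should show roughly the following (again a sketch, assuming the same LimitRange is in place):
resources:
  requests:
    memory: "128Mi"   # from the pod spec
  limits:
    memory: "512Mi"   # injected from the LimitRange default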
If you want to debug a problem, you should know why it happened. Generally, an OOM kill happens either because of limit overcommit on the node (the sum of container limits exceeds what the node actually has) or because a container reached its own limit (I know that you have 1 container, but you should know how to proceed in other situations). You can read a good article about it here.
You may find it a good idea to run cluster monitoring, for example with Prometheus. Here is a guide on how to set up Kubernetes monitoring with Prometheus. You should be interested in the metric container_memory_failcnt. You can read more about it here.
You can also read this page about setting up OOMKill alerting in a Kubernetes cluster.
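As a sketch of what such an alert could look like with the Prometheus Operator (rule name, threshold, and window are illustrative; container_memory_failcnt comes from cAdvisor, so whether it is populated depends on your setup):
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-failcnt-alerts            # illustrative name
spec:
  groups:
  - name: memory
    rules:
    - alert: ContainerHittingMemoryLimit
      expr: increase(container_memory_failcnt[5m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} is hitting its memory limit"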

Related

Jupyterhub difference between memory guarantee and mem_guarantee?

As the question states: what is the difference?
kubespawner_override:
  memory:
    limit: 64G
    guarantee: 64G
  cpu:
    limit: 16
    guarantee: 16
--- or
kubespawner_override:
  mem_limit: 64G
  mem_guarantee: 64G
  cpu_limit: 16
  cpu_guarantee: 16
Set user memory limits
Spawner mem_guarantee
I want to set this on the individual pods to ensure they get at least these values.

Kubernetes OOMKilled with multiple containers

I run my service in a Kubernetes cluster (AWS EKS). Recently, I added a new container (a sidecar) to the pod. After that, I started observing OOMKilled events, but the metrics do not show any high memory usage. This is the config:
Containers:
  side-car:
    Container ID:   ...
    Image:          ...
    ...
    State:          Running
      Started:      Mon, 21 Feb 2022 09:11:07 +0100
    Last State:     Terminated
      Reason:       OOMKilled
      Exit Code:    137
      Started:      Thu, 17 Feb 2022 18:36:28 +0100
      Finished:     Mon, 21 Feb 2022 09:11:06 +0100
    Ready:          True
    Restart Count:  1
    Limits:
      cpu:     1
      memory:  2Gi
    Requests:
      cpu:     1
      memory:  2Gi
    ...
    ...
  my-service:
    Container ID:   ...
    ...
    ...
    ...
    State:          Running
      Started:      Thu, 17 Feb 2022 18:36:28 +0100
    Ready:          True
    Restart Count:  0
    Limits:
      cpu:     3
      memory:  3Gi
    Requests:
      cpu:     2
      memory:  3Gi
Both the sidecar and my service have memory limits (and requests) set. At the time of the OOMKill, neither container was using more memory than requested/limited; e.g. in one case the side-car was using 20MiB and my-service 800MiB, well below the limits. Still, Kubernetes restarted the container (the side-car).
Just for the record: before adding the side-car, my-service was running without problems and no OOMKilled was observed.
Maybe your sidecar container has some performance issue that you can't catch, and at some point in time it uses more memory than its limit?
Check OOM kill due to container limit reached
State:          Running
  Started:      Thu, 10 Oct 2019 11:14:13 +0200
Last State:     Terminated
  Reason:       OOMKilled
  Exit Code:    137
The Exit code 137 is important because it means that the system terminated the container as it tried to use more memory than its limit.
In order to monitor this, you always have to look at the use of memory compared to the limit. Percentage of the node memory used by a pod is usually a bad indicator as it gives no indication on how close to the limit the memory usage is. In Kubernetes, limits are applied to containers, not pods, so monitor the memory usage of a container vs. the limit of that container.
Most likely you don't get to see when the memory usage goes above the limit, as usually the metrics are pulled at defined intervals (cAdvisor, which is currently the de-facto source for metrics, only refreshes its data every 10-15 seconds by default).
How to troubleshoot further? Connect to the node that's running the sidecar container and look at the kernel logs. You can use tail -f /var/log/syslog | grep -i kernel (a sample of what this looks like is in this video). You should see 2 lines like the ones below, which indicate the aftermath of the container's cgroup limit being breached and the respective process being terminated:
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.895437] Memory cgroup out of memory: Killed process 14300 (dotnet) total-vm:172050596kB, anon-rss:148368kB, file-rss:25056kB, shmem-rss:0kB, UID:0 pgtables:568kB oom_score_adj:-997
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.906653] oom_reaper: reaped process 14300 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Pay special attention to the anon-rss and file-rss values, and compare their sum against the limit you've set for the sidecar container.
If you have control over the code that runs in the sidecar container, then you can add some instrumentation code to print out the amount of memory used at small enough intervals, and simply output that to the console. Once the container is OOMKilled, you'll still have access to the logs to see what happened (use the --previous flag with the kubectl logs command). Have a look at this answer for more info.
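If the sidecar image is under your control, one possible approach, sketched below, is to have the container command log its own cgroup memory usage in the background so the numbers remain available via kubectl logs --previous after a kill. The image, entrypoint path, and interval are placeholders, and the cgroup path shown is for cgroup v1 (cgroup v2 exposes /sys/fs/cgroup/memory.current instead):
containers:
- name: side-car
  image: my-sidecar:latest            # placeholder image
  command: ["/bin/sh", "-c"]
  args:
  - |
    # log this container's memory usage every 5 seconds, then hand over to the real entrypoint
    ( while true; do
        echo "mem_usage_bytes=$(cat /sys/fs/cgroup/memory/memory.usage_in_bytes)"
        sleep 5
      done ) &
    exec /usr/local/bin/sidecar       # placeholder for the original entrypoint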
Including this just for completeness: your system could potentially run so low on memory that somehow the OOM killer is invoked and your sidecar container is chosen to be terminated (such a scenario is described here). That's highly unlikely in your case though, as from my understanding you get the sidecar container to be terminated repeatedly, which most likely points to an issue with that container only.

Why does DigitalOcean k8s node capacity show a subtracted value from the node pool config?

I'm running a 4vCPU 8GB Node Pool, but all of my nodes report this for Capacity:
Capacity:
  cpu:                4
  ephemeral-storage:  165103360Ki
  hugepages-2Mi:      0
  memory:             8172516Ki
  pods:               110
I'd expect it to show 8388608Ki (the equivalent of 8192Mi/8Gi).
How come?
Memory can be reserved for both system services (system-reserved) and the kubelet itself (kube-reserved). https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ has the details, but DigitalOcean is probably setting this up for you.
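For reference, on a self-managed node those reservations live in the kubelet configuration, roughly like this (values purely illustrative; on a managed offering like DigitalOcean the provider sets them for you):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  cpu: 100m
  memory: 100Mi
kubeReserved:
  cpu: 100m
  memory: 100Mi
evictionHard:
  memory.available: "100Mi"
Note that these reservations reduce the node's Allocatable rather than its Capacity; the Capacity figure itself comes from what the kernel reports as total memory, which is already slightly below the raw 8Gi.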

Kubernetes: specify CPUs for cpumanager

Is it possible to specify CPU ID list to the Kubernetes cpumanager? The goal is to make sure pods get CPUs from a single socket (0). I brought all the CPUs on the peer socket offline as mentioned here, for example:
$ echo 0 > /sys/devices/system/cpu/cpu5/online
After doing this, the Kubernetes master indeed sees only the remaining online CPUs:
kubectl describe node foo
Capacity:
  cpu:                56    <<< socket 0 CPU count
  ephemeral-storage:  958774760Ki
  hugepages-1Gi:      120Gi
  memory:             197524872Ki
  pods:               110
Allocatable:
  cpu:                54    <<< 2 system reserved CPUs
  ephemeral-storage:  958774760Ki
  hugepages-1Gi:      120Gi
  memory:             71490952Ki
  pods:               110
System Info:
  Machine ID:                 1155420082478559980231ba5bc0f6f2
  System UUID:                4C4C4544-0044-4210-8031-C8C04F584B32
  Boot ID:                    7fa18227-748f-496c-968c-9fc82e21ecd5
  Kernel Version:             4.4.13
  OS Image:                   Ubuntu 16.04.4 LTS
  Operating System:           linux
  Architecture:               amd64
  Container Runtime Version:  docker://17.3.3
  Kubelet Version:            v1.11.1
  Kube-Proxy Version:         v1.11.1
However, cpumanager still seems to think there are 112 CPUs (socket0 + socket1).
cat /var/lib/kubelet/cpu_manager_state
{"policyName":"static","defaultCpuSet":"0-111"}
As a result, the kubelet system pods are throwing the following error:
kube-system kube-proxy-nk7gc 0/1 rpc error: code = Unknown desc = failed to update container "eb455f81a61b877eccda0d35eea7834e30f59615346140180f08077f64896760": Error response from daemon: Requested CPUs are not available - requested 0-111, available: 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,58,60,62,64,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110 762 36d <IP address> foo <none>
I was able to get this working. Posting this as an answer so that someone in need might benefit.
It appears the CPU set is read from the /var/lib/kubelet/cpu_manager_state file and is not updated across kubelet restarts, so this file needs to be removed before restarting the kubelet.
The following worked for me:
# On a running worker node, bring desired CPUs offline. (run as root)
$ cpu_list=`lscpu | grep "NUMA node1 CPU(s)" | awk '{print $4}'`
$ chcpu -d $cpu_list
$ rm -f /var/lib/kubelet/cpu_manager_state
$ systemctl restart kubelet.service
# Check the CPU set seen by the CPU manager
$ cat /var/lib/kubelet/cpu_manager_state
# Try creating pods and check the syslog:
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122466 8070 state_mem.go:84] [cpumanager] updated default cpuset: "0,10,12,14,16,18,20,22,24,26,28,30,32,34,36,38,40,42,44,46,48,50,52,54,56,66,68,70,72,74,76,78,80,82,84,86,88,90,92,94,96,98,100,102,104,106,108,110"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122643 8070 policy_static.go:198] [cpumanager] allocateCPUs: returning "2,4,6,8,58,60,62,64"
Dec 3 14:36:05 k8-2-w1 kubelet[8070]: I1203 14:36:05.122660 8070 state_mem.go:76] [cpumanager] updated desired cpuset (container id: 356939cdf32d0f719e83b0029a018a2ca2c349fc0bdc1004da5d842e357c503a, cpuset: "2,4,6,8,58,60,62,64")
I have reported a bug here as I think the CPU set should be updated after kubelet restarts.

Kubernetes MySQL pod getting killed due to memory issue

In my Kubernetes 1.11 cluster, a MySQL pod is getting killed due to an out-of-memory issue:
kernel: Out of memory: Kill process 8514 (mysqld) score 1011 or sacrifice child
kernel: Killed process 8514 (mysqld) total-vm:2019624kB, anon-rss:392216kB, file-rss:0kB, shmem-rss:0kB
kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=828
kernel: java cpuset=dab20a22eebc2a23577c05d07fcb90116a4afa789050eb91f0b8c2747267d18e mems_allowed=0
kernel: CPU: 1 PID: 28667 Comm: java Kdump: loaded Not tainted 3.10.0-862.3.3.el7.x86_64 #1
My questions:
How can I prevent my pod from getting OOM-killed? Is there a deployment setting I need to enable?
What is the configuration to prevent a new pod from getting scheduled on a node when there is not enough memory available on that node?
We disabled the swap space. Do we need to disable the memory overcommit setting at the host level by setting /proc/sys/vm/overcommit_memory to 0?
Thanks
SR
When defining a Pod manifest, it's a best practice to define a resources section with limits and requests for CPU and memory:
resources:
  limits:
    cpu: "1"
    memory: 512Mi
  requests:
    cpu: 500m
    memory: 256Mi
This definition helps Kubernetes assign the pod to one of three Quality of Service (QoS) categories:
Guaranteed
Burstable
BestEffort
and pods in the last category are the most expendable.
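For example, a pod is classified as Guaranteed only when every container has requests equal to limits for both CPU and memory; the snippet above is Burstable because its requests are lower than its limits. A minimal sketch of a Guaranteed variant (values illustrative):
resources:
  limits:
    cpu: "1"
    memory: 512Mi
  requests:
    cpu: "1"
    memory: 512Mi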