Kubernetes OOMKilled with multiple containers - kubernetes

I run my service in Kubernetes cluster (AWS EKS). Recently, I have added a new container (side car) to the pod. After that, I've started observing OOMKilled, but metrics do not show any high memory usage. This is the config:
Containers:
side-car:
Container ID: ...
Image: ...
...
State: Running
Started: Mon, 21 Feb 2022 09:11:07 +0100
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Thu, 17 Feb 2022 18:36:28 +0100
Finished: Mon, 21 Feb 2022 09:11:06 +0100
Ready: True
Restart Count: 1
Limits:
cpu: 1
memory: 2Gi
Requests:
cpu: 1
memory: 2Gi
...
...
my-service:
Container ID: ...
...
...
...
State: Running
Started: Thu, 17 Feb 2022 18:36:28 +0100
Ready: True
Restart Count: 0
Limits:
cpu: 3
memory: 3Gi
Requests:
cpu: 2
memory: 3Gi
Both side car and my service do have memory limits (and request) set. During OOMKilled none of the containers use more memory than requested/limited. E.g. in one case side-car was using 20MiB, my-service: 800MiB, way low than limits are. Still Kubernetes restarted the container (side-car).
Just for the record, before adding the side-car, my-service was running without problem and no OOMKilled was observed.

Maybe yous sidecar container do has some performance issues that you cant catch and at some point of time it do request more than limit?
Check OOM kill due to container limit reached
State: Running
Started: Thu, 10 Oct 2019 11:14:13 +0200
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
The Exit code 137 is important because it means that the system
terminated the container as it tried to use more memory than its
limit.
In order to monitor this, you always have to look at the use of memory
compared to the limit. Percentage of the node memory used by a pod is
usually a bad indicator as it gives no indication on how close to the
limit the memory usage is. In Kubernetes, limits are applied to
containers, not pods, so monitor the memory usage of a container vs.
the limit of that container.

Most likely you don't get to see when the memory usage goes above the limit, as usually the metrics are pulled at defined intervals (cAdvisor, which is currently the de-facto source for metrics, only refreshes its data every 10-15 seconds by default).
How to troubleshoot further? Connect to the respective node that's running the sidecar container and look at the kernel logs. You can use tail /var/log/syslog -f | grep -i kernel (a sample of how this looks like is in this movie). You should see 2 lines like the ones below, which will indicate the aftermath of the container's cgroup limit being breached, and the respective process terminated:
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.895437] Memory cgroup out of memory: Killed process 14300 (dotnet) total-vm:172050596kB, anon-rss:148368kB, file-rss:25056kB, shmem-rss:0kB, UID:0 pgtables:568kB oom_score_adj:-997
Jan 16 21:33:51 aks-agentpool-20086390-vmss00003K kernel: [ 8334.906653] oom_reaper: reaped process 14300 (dotnet), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB
Pay special attention to the anon-rss and file-rss values, and compare their sum against the limit you've set for the sidecar container.
If you have control over the code that runs in the sidecar container, then you can add some instrumentation code to print out the amount of memory used at small enough intervals, and simply output that to the console. Once the container is OOMKilled, you'll still have access to the logs to see what happened (use the --previous flag with the kubectl logs command). Have a look at this answer for more info.
Including this just for completeness: your system could potentially run so low on memory that somehow the OOM killer is invoked and your sidecar container is chosen to be terminated (such a scenario is described here). That's highly unlikely in your case though, as from my understanding you get the sidecar container to be terminated repeatedly, which most likely points to an issue with that container only.

Related

My pod status showing OOM-killed, but doesn't restarted container. Why?

I am running POD with three containers, one app POD and two sidecar containers. Here app container memory limit is exceeded and doesn't restart. Inside the app pod, a Java app is running. Here is the describe POD command status.
Host Port: 0/TCP
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Started: Wed, 29 Sep 2021 09:41:50 +0000
Finished: Wed, 29 Sep 2021 09:42:47 +0000
Ready: False
Restart Count: 14
Limits:
memory: 300Mi
Requests:
memory: 300Mi
With Restart Count: 14 it seems that the pod did restart. However the Restart does an exponential backoff. You can get insight into when the next restart will happen by looking at the Events and the CrashLoopBackOff: "Back-off 20s restarting failed part will tell you when it will be tried next.

Kubernetes OOM killing pod

I have a simple container that consists of OpenLDAP installed on Alpine. It's installed to run as a non-root user. I am able to run the container without any issues using my local Docker engine. However, when I deploy it to our Kubernetes system it is killed almost immediately as OOMKilled. I've tried increasing the memory without any change. I've also looked at the memory usage for the pod and don't see anything unusual.
The server is started as slapd -d debug -h ldap://0.0.0.0:1389/ -u 1000 -g 1000, where 1000 is the uid and gid, respectively.
The node trace shows this output:
May 13 15:33:44 pu1axb-arcctl00 kernel: Task in /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4 killed as a result of limit of /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f
May 13 15:33:44 pu1axb-arcctl00 kernel: memory: usage 512000kB, limit 512000kB, failcnt 71
May 13 15:33:44 pu1axb-arcctl00 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: kmem: usage 7892kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/db65b4f82efd556a780db6eb2c3ddf4b594774e4e5f523a8ddb178fd3256bdda: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/59f908d8492f3783da587beda7205c3db5ee78f0744d8cb49b0491bcbb95c4c7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4: cache:4KB rss:504060KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_a
May 13 15:33:44 pu1axb-arcctl00 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
May 13 15:33:44 pu1axb-arcctl00 kernel: [69022] 0 69022 242 1 28672 0 -998 pause
May 13 15:33:44 pu1axb-arcctl00 kernel: [69436] 1000 69436 591 454 45056 0 969 docker-entrypoi
May 13 15:33:44 pu1axb-arcctl00 kernel: [69970] 1000 69970 401 2 45056 0 969 nc
May 13 15:33:44 pu1axb-arcctl00 kernel: [75537] 1000 75537 399 242 36864 0 969 sh
May 13 15:33:44 pu1axb-arcctl00 kernel: [75544] 1000 75544 648 577 45056 0 969 bash
May 13 15:33:44 pu1axb-arcctl00 kernel: [75966] 1000 75966 196457 126841 1069056 0 969 slapd
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup out of memory: Kill process 75966 (slapd) score 1961 or sacrifice child
May 13 15:33:44 pu1axb-arcctl00 kernel: Killed process 75966 (slapd) total-vm:785828kB, anon-rss:503016kB, file-rss:4348kB, shmem-rss:0kB
I find it hard to believe it's really running out of memory. It's a simple LDAP container with only 8-10 elements in the directory tree and the pod is not showing memory issues on the dashboard (Lens). We have other Alpine images which don't have this issue.
I'm relatively new to Kubernetes, so I'm hoping the users on SO can give me some guidance on how to debug this. I can provide more info once I know what is helpful. As I mentioned increasing the memory has no affect. I plan to switch from "burstable" to "guaranteed" deployment and see if that makes a difference.
===== UPDATE - Is working now =====
I believe I was confusing the meaning of resource "limits" vs "requests". I had been trying several variations on these before making the original post. After reading through the responses I now have the pod deployed with the following settings:
resources:
limits:
cpu: 50m
memory: 1Gi
requests:
cpu: 50m
memory: 250Mi
Looking at the memory footprint in Lens it's holding steady at around 715Mi for the usage. This is higher that our other pods by at least 25%. Perhaps the LDAP server just needs more. Regardless, I thank you all for your timely help.
Check your deployment or pod spec for resource limits.
If your application requires more memory than it is allowed, it will be OOMKilled by the kubernetes.
...
resources:
limits:
memory: 200Mi
requests:
memory: 100Mi
...
Equivalent JAVA JVM flags to better understand this concept
requests = Xms
limits = Xmx
Read more:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
I'm hoping the users on SO can give me some guidance on how to debug this.
Before starting debugging you can check (and improve) your yaml files.
You can set up default memory request and a default memory limit for containers like this:
apiVersion: v1
kind: LimitRange
metadata:
name: mem-limit-range
spec:
limits:
- default:
memory: 512Mi
defaultRequest:
memory: 256Mi
type: Container
A request is a bid for the minimum amount of that resource your container will need. It doesn’t say how much of a resource you will be using, just how much you will need. You are telling the scheduler just how many resources your container needs to do its job. Requests are used for scheduling by the Kubernetes scheduler. For CPU requests they are also used to configure how the containers are scheduled by the Linux kernel.
A limit is the maximum amount of that resource your container will ever use. Limits must be greater than or equal to requests. If you set only limits, the request will be the same as the limit.
If you want to put one container in the pod, you can set memory limits like this:
apiVersion: v1
kind: Pod
metadata:
name: default-mem-demo-2
spec:
containers:
- name: default-mem-demo-2-ctr
image: nginx
resources:
limits:
memory: "1Gi"
If you specify a Container's limit, but not its request - the Container will be not assigned the default memory request value (in this situation 256Mi).
You can also put one container in the pod and set memory requests like this:
apiVersion: v1
kind: Pod
metadata:
name: default-mem-demo-3
spec:
containers:
- name: default-mem-demo-3-ctr
image: nginx
resources:
requests:
memory: "128Mi"
But in this situation the Container's memory limit is set to 512Mi, which is the default memory limit for the namespace.
If you want to debug a problem, you should know why it happened. Generally OOM the problem may appear, e.g. due to limit overcommit or container limit reached (I know that you have 1 container, but you should know how to proceed in other situation). You can read good article about it here.
You may find it a good idea to run cluster monitoring for example with Prometheus. Here is the guide how to setup Kubernetes monitoring with Prometheus. You should be interested in metric container_memory_failcnt. You can read more about it here.
You can also read this page about setting up oomkill-alerting in Kubernetes cluster.

Why Redis in K8S restarts all the time?

Redis pod restarts like crazy.
How can I find out the reason for this behavior?
I figured out, that the resources quota should be upgraded, but I have no clue what would be the best cpu/ram ratio. And why there are no crash events or logs?
Here are the pods:
> kubectl get pods
redis-master-5d9cfb54f8-8pbgq 1/1 Running 33 3d16h
Here are the logs:
> kubectl logs --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 07:02:12.152 # Server started, Redis version 2.8.19
[1] 08 Sep 07:02:12.153 # WARNING you have Transparent Huge Pages (THP) support enabled in your kernel. This will create latency and memory usage issues with Redis. To fix this issue run the command 'echo never > /sys/kernel/mm/transparent_hugepage/enabled' as root, and add it to your /etc/rc.local in order to retain the setting after a reboot. Redis must be restarted after THP is disabled.
[1] 08 Sep 07:02:12.153 * The server is now ready to accept connections on port 6379
[1] 08 Sep 07:03:13.085 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:03:13.085 * Background saving started by pid 8
[8] 08 Sep 07:03:13.101 * DB saved on disk
[8] 08 Sep 07:03:13.101 * RDB: 0 MB of memory used by copy-on-write
[1] 08 Sep 07:03:13.185 * Background saving terminated with success
[1] 08 Sep 07:04:14.018 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 07:04:14.018 * Background saving started by pid 9
...
[93] 08 Sep 08:38:30.160 * DB saved on disk
[93] 08 Sep 08:38:30.164 * RDB: 2 MB of memory used by copy-on-write
[1] 08 Sep 08:38:30.259 * Background saving terminated with success
[1] 08 Sep 08:39:31.072 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 08:39:31.074 * Background saving started by pid 94
Here is previous logs of the same pod.
> kubectl logs --previous --follow redis-master-5d9cfb54f8-8pbgq
[1] 08 Sep 09:41:46.057 * Background saving terminated with success
[1] 08 Sep 09:42:47.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:42:47.076 * Background saving started by pid 140
[140] 08 Sep 09:43:14.398 * DB saved on disk
[140] 08 Sep 09:43:14.457 * RDB: 1 MB of memory used by copy-on-write
[1] 08 Sep 09:43:14.556 * Background saving terminated with success
[1] 08 Sep 09:44:15.073 * 10000 changes in 60 seconds. Saving...
[1] 08 Sep 09:44:15.077 * Background saving started by pid 141
[1 | signal handler] (1599558267) Received SIGTERM scheduling shutdown...
[1] 08 Sep 09:44:28.052 # User requested shutdown...
[1] 08 Sep 09:44:28.052 # There is a child saving an .rdb. Killing it!
[1] 08 Sep 09:44:28.052 * Saving the final RDB snapshot before exiting.
[1] 08 Sep 09:44:49.592 * DB saved on disk
[1] 08 Sep 09:44:49.592 # Redis is now ready to exit, bye bye...
Here is the description of the pod. As you can see the limit is 100Mi, but I can't see the threshold, after which the pod restarts.
> kubectl describe pod redis-master-5d9cfb54f8-8pbgq
Name: redis-master-5d9cfb54f8-8pbgq
Namespace: cryptoman
Priority: 0
Node: gke-my-cluster-default-pool-818613a8-smmc/10.172.0.28
Start Time: Fri, 04 Sep 2020 18:52:17 +0300
Labels: app=redis
pod-template-hash=5d9cfb54f8
role=master
tier=backend
Annotations: <none>
Status: Running
IP: 10.36.2.124
IPs: <none>
Controlled By: ReplicaSet/redis-master-5d9cfb54f8
Containers:
master:
Container ID: docker://3479276666a41df502f1f9eb9bb2ff9cfa592f08a33e656e44179042b6233c6f
Image: k8s.gcr.io/redis:e2e
Image ID: docker-pullable://k8s.gcr.io/redis#sha256:f066bcf26497fbc55b9bf0769cb13a35c0afa2aa42e737cc46b7fb04b23a2f25
Port: 6379/TCP
Host Port: 0/TCP
State: Running
Started: Wed, 09 Sep 2020 10:27:56 +0300
Last State: Terminated
Reason: OOMKilled
Exit Code: 0
Started: Wed, 09 Sep 2020 07:34:18 +0300
Finished: Wed, 09 Sep 2020 10:27:55 +0300
Ready: True
Restart Count: 42
Limits:
cpu: 100m
memory: 250Mi
Requests:
cpu: 100m
memory: 250Mi
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from default-token-5tds9 (ro)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Volumes:
default-token-5tds9:
Type: Secret (a volume populated by a Secret)
SecretName: default-token-5tds9
Optional: false
QoS Class: Guaranteed
Node-Selectors: <none>
Tolerations: node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Pod sandbox changed, it will be killed and re-created.
Normal Killing 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Stopping container master
Normal Created 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Created container master
Normal Started 52m (x43 over 4d16h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Started container master
Normal Pulled 52m (x42 over 4d13h) kubelet, gke-my-cluster-default-pool-818613a8-smmc Container image "k8s.gcr.io/redis:e2e" already present on machine
This is the limit after which it restarts. CPU is just throttled, memory is OOM'ed.
Limits:
cpu: 100m
memory: 250Mi
Reason: OOMKilled
Remove requests & limits
Run the pod, make sure it doesn't restart
If you already have prometheus, run VPA Recommender to check how much resources it needs. Or just use any monitoring stack: GKE Prometheus, prometheus-operator, DataDog etc to check actual resource consumption and adjust limits accordingly.
Max's answer is very complete. But if you don't have Prometheus installed or don't want to, there is another way simple to check actual resource consumption installing the metrics server project in your cluster. After installing it you can check the CPU and memory usage with kubectl top node to check consumption on the node, and kubectl top pod to check consumption on pods. I use it and is very useful.
Or you can just increase the CPU and memory limits, but you will not be able to ensure how much resources the container will need. Basically will be a waste of resources.
The main problem is that you didn't limit the redis application. So redis just increases the memory, and when it reaches Pod limits.memory 250Mb, it is killed by an OOM, and restart it.
Then, if you remove the limits.memory, the redis will continue eating memory until the Node has not enough memory to run other processes and K8s kills it and marks it as "Evicted".
Therefore, configure the memory in redis application to limit the memory used by redis inside redis.conf file and depending on your needs set a LRU or LFU policy to remove some keys (https://redis.io/topics/lru-cache):
maxmemory 256mb
maxmemory-policy allkeys-lfu
And limit the memory of the Pod at around the double of redis maxmemory to give the rest of the processes and objects saved in redis some margin:
resources:
limits:
cpu: 100m
memory: 512Mi
Now the pods are getting evicted.
Can I find out the reason?
NAME READY STATUS RESTARTS AGE
redis-master-7d97765bbb-7kjwn 0/1 Evicted 0 38h
redis-master-7d97765bbb-kmc9g 1/1 Running 0 30m
redis-master-7d97765bbb-sf2ss 0/1 Evicted 0 30m

Kubernetes eviction manager evicting control plane pods to reclaim ephemeral storage

I am using Kubernetes v1.13.0. My master is also functioning as a worker-node, so it has workload pods running on it, apart from control plane pods.
The kubelet logs on my master show the following lines:
eviction_manager.go:340] eviction manager: must evict pod(s) to reclaim ephemeral-storage
eviction_manager.go:358] eviction manager: pods ranked for eviction: kube-controller-manager-vm2_kube-system(1631c2c238e0c5117acac446b26d9f8c), kube-apiserver-vm2_kube-system(ce43eba098d219e13901c4a0b829f43b), etcd-vm2_kube-system(91ab2b0ddf4484a5ac6ee9661dbd0b1c)
Once the kube-apiserver pod is evicted, the cluster becomes unusable.
What can I do to fix this? Should I add more ephemeral storage? How would I go about doing that? That means adding more space to the root partition on my host?
My understanding is that ephemeral storage consists of /var/log and /var/lib/kubelet folders, which both come under the root partition.
A df -h on my host shows:
Filesystem Size Used Avail Use% Mounted on
/dev/vda1 39G 33G 6.2G 85% /
So it looks like the root partition has lot of memory left, and there is no disk pressure. So what is causing this issue? Some of my worker pods must be doing something crazy with storage, but it's still 6G seems like plenty of room.
Will adding more space to the root partition fix this issue temporarily?
kubectl describe vm2 gives the following info:
Conditions:
Type Status LastHeartbeatTime LastTransitionTime Reason Message
---- ------ ----------------- ------------------ ------ -------
MemoryPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientMemory kubelet has sufficient memory available
DiskPressure False Fri, 11 Jan 2019 21:25:43 +0000 Fri, 11 Jan 2019 20:58:07 +0000 KubeletHasNoDiskPressure kubelet has no disk pressure
PIDPressure False Fri, 11 Jan 2019 21:25:43 +0000 Wed, 05 Dec 2018 19:16:41 +0000 KubeletHasSufficientPID kubelet has sufficient PID available
Ready True Fri, 11 Jan 2019 21:25:43 +0000 Thu, 06 Dec 2018 17:00:02 +0000 KubeletReady kubelet is posting ready status. AppArmor enabled
Capacity:
cpu: 8
ephemeral-storage: 40593708Ki
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32946816Ki
pods: 110
Allocatable:
cpu: 8
ephemeral-storage: 37411161231
hugepages-1Gi: 0
hugepages-2Mi: 0
memory: 32844416Ki
pods: 110
It seems to me that there was pressure on ephemeral-storage, and the eviction manager is trying to reclaim some storage by evicting least recently used pods. But it should not evict the control plane pods, otherwise cluster is unusable.
Currently, the Kubelet evicts the control plane pods. Then I try to manually start the apiserver and other control plane pods by adding and removing a space in the /etc/kubernetes/manifests files. This does start the apiserver, but then it again gets evicted. Ideally, the Kubelet should ensure that the static pods in /etc/kubernetes/manifests are always on and properly managed.
I am trying to understand what is going on here, and how to fix this issue, so that my kubernetes cluster becomes more robust, and I don't have to keep manually restarting the apiserver.
I had this same problem and solved it by changing the threshold for evictionHard.
Looking at /etc/systemd/system/kubelet.service.d/10-kubeadm.conf I have:
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/default/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
so I see my config file for kubelet is /var/lib/kubelet/config.yaml
Opening that I changed evitionHard settings to be (I think they were 10 or 15% before):
...
evictionHard:
imagefs.available: 1%
memory.available: 100Mi
nodefs.available: 1%
nodefs.inodesFree: 1%
...
There is also the --experimental-allocatable-ignore-eviction (https://kubernetes.io/docs/reference/command-line-tools-reference/kubelet/) setting which should completely disable eviction.
This is because of your kubelet config setting for eviction nodefs and imagefs % too high, set it lower, then the issues will be resolved:
Modify config in /var/lib/kubelet/config.yaml
Find out the section evction and set the percentage lower like this:
evictionHard:
imagefs.available: 1%
memory.available: 100Mi
nodefs.available: 1%
nodefs.inodesFree: 1%

Kubernetes MySQL pod getting killed due to memory issue

In my Kubernetes 1.11 cluster a MySQL pod is getting killed due to Out of memory issue:
> kernel: Out of memory: Kill process 8514 (mysqld) score 1011 or
> sacrifice child kernel: Killed process 8514 (mysqld)
> total-vm:2019624kB, anon-rss:392216kB, file-rss:0kB, shmem-rss:0kB
> kernel: java invoked oom-killer: gfp_mask=0x201da, order=0,
> oom_score_adj=828 kernel: java
> cpuset=dab20a22eebc2a23577c05d07fcb90116a4afa789050eb91f0b8c2747267d18e
> mems_allowed=0 kernel: CPU: 1 PID: 28667 Comm: java Kdump: loaded Not
> tainted 3.10.0-862.3.3.el7.x86_64 #1 kernel
My questions:
How to prevent that my pod gets OOM-killed? Is there a deployment setting I need to enable?
What is the configuration to prevent new pod getting scheduled on a node, when there is not enough memory available on said node?
We disabled the swap space. Do we need to disable memory overcommitting setting on the host level, setting /proc/sys/vm/overcommit_memory to 0?
Thanks
SR
When defining a Pod manifest it's a best practice to define resources section with limits and requests for CPU and memory:
resources:
limits:
cpu: "1"
memory: 512Mi
requests:
cpu: 500m
memory: 256Mi
This definition helps the scheduler identifying three Quality of Service (QoS) categories:
Guaranteed
Burstable
BestEffort
and pods in the last category are the most expendable.