Kubernetes MySQL pod getting killed due to memory issue

In my Kubernetes 1.11 cluster a MySQL pod is getting killed due to an out-of-memory (OOM) issue:
> kernel: Out of memory: Kill process 8514 (mysqld) score 1011 or sacrifice child
> kernel: Killed process 8514 (mysqld) total-vm:2019624kB, anon-rss:392216kB, file-rss:0kB, shmem-rss:0kB
> kernel: java invoked oom-killer: gfp_mask=0x201da, order=0, oom_score_adj=828
> kernel: java cpuset=dab20a22eebc2a23577c05d07fcb90116a4afa789050eb91f0b8c2747267d18e mems_allowed=0
> kernel: CPU: 1 PID: 28667 Comm: java Kdump: loaded Not tainted 3.10.0-862.3.3.el7.x86_64 #1
My questions:
How can I prevent my pod from getting OOM-killed? Is there a deployment setting I need to enable?
What configuration prevents a new pod from being scheduled on a node that does not have enough memory available?
We have disabled swap. Do we also need to disable memory overcommit at the host level by setting /proc/sys/vm/overcommit_memory to 0?
Thanks
SR

When defining a Pod manifest, it is a best practice to include a resources section with limits and requests for CPU and memory:
resources:
  limits:
    cpu: "1"
    memory: 512Mi
  requests:
    cpu: 500m
    memory: 256Mi
This definition determines which of the three Quality of Service (QoS) classes Kubernetes assigns to the pod:
Guaranteed
Burstable
BestEffort
and pods in the last class (BestEffort) are the most expendable when the node runs out of memory.
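For example, a minimal sketch (the name and image are illustrative) where every container's requests equal its limits lands in the Guaranteed class:
apiVersion: v1
kind: Pod
metadata:
  name: mysql-guaranteed        # illustrative name
spec:
  containers:
  - name: mysql
    image: mysql:5.7
    resources:
      requests:
        cpu: "1"
        memory: 512Mi
      limits:
        cpu: "1"
        memory: 512Mi
You can check the assigned class with kubectl get pod mysql-guaranteed -o jsonpath='{.status.qosClass}'.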

Related

Why does DigitalOcean k8s node capacity show a subtracted value from the node pool config?

I'm running a 4vCPU 8GB Node Pool, but all of my nodes report this for Capacity:
Capacity:
  cpu:                4
  ephemeral-storage:  165103360Ki
  hugepages-2Mi:      0
  memory:             8172516Ki
  pods:               110
I'd expect it to show 8388608Ki (the equivalent of 8192Mi/8Gi).
How come?
Memory can be reserved for both system services (system-reserved) and the Kubelet itself (kube-reserved). https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/ has details but DO is probably setting it up for you.
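As a rough sketch (the values are illustrative, and on a managed offering like DigitalOcean the kubelet is configured for you), reservations like these in the kubelet configuration are what shrink a node's usable memory below the machine size:
# Sketch of a KubeletConfiguration with reservations (values are illustrative)
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
kubeReserved:
  cpu: 100m
  memory: 200Mi        # reserved for the kubelet and container runtime
systemReserved:
  cpu: 100m
  memory: 200Mi        # reserved for OS system daemons
evictionHard:
  memory.available: "100Mi"
These reservations reduce the node's Allocatable (roughly capacity minus kubeReserved, systemReserved and the hard eviction threshold), which is the figure the scheduler uses when placing pods.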

Kubernetes OOM killing pod

I have a simple container that consists of OpenLDAP installed on Alpine. It's installed to run as a non-root user. I am able to run the container without any issues using my local Docker engine. However, when I deploy it to our Kubernetes system it is killed almost immediately as OOMKilled. I've tried increasing the memory without any change. I've also looked at the memory usage for the pod and don't see anything unusual.
The server is started as slapd -d debug -h ldap://0.0.0.0:1389/ -u 1000 -g 1000, where 1000 is the uid and gid, respectively.
The node trace shows this output:
May 13 15:33:44 pu1axb-arcctl00 kernel: Task in /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4 killed as a result of limit of /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f
May 13 15:33:44 pu1axb-arcctl00 kernel: memory: usage 512000kB, limit 512000kB, failcnt 71
May 13 15:33:44 pu1axb-arcctl00 kernel: memory+swap: usage 0kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: kmem: usage 7892kB, limit 9007199254740988kB, failcnt 0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0KB active_anon:0KB inactive_file:0KB active_file:0KB unevictable:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/db65b4f82efd556a780db6eb2c3ddf4b594774e4e5f523a8ddb178fd3256bdda: cache:0KB rss:44KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/59f908d8492f3783da587beda7205c3db5ee78f0744d8cb49b0491bcbb95c4c7: cache:0KB rss:0KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_anon:0
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup stats for /kubepods/burstable/podbac2e0ae-9e9c-420e-be4e-c5941a2d562f/7d71b550e2d37e5d8d78c73ba8c7ab5f7895d9c2473adf4443675b9872fb84a4: cache:4KB rss:504060KB rss_huge:0KB shmem:0KB mapped_file:0KB dirty:0KB writeback:0KB inactive_a
May 13 15:33:44 pu1axb-arcctl00 kernel: [ pid ] uid tgid total_vm rss pgtables_bytes swapents oom_score_adj name
May 13 15:33:44 pu1axb-arcctl00 kernel: [69022] 0 69022 242 1 28672 0 -998 pause
May 13 15:33:44 pu1axb-arcctl00 kernel: [69436] 1000 69436 591 454 45056 0 969 docker-entrypoi
May 13 15:33:44 pu1axb-arcctl00 kernel: [69970] 1000 69970 401 2 45056 0 969 nc
May 13 15:33:44 pu1axb-arcctl00 kernel: [75537] 1000 75537 399 242 36864 0 969 sh
May 13 15:33:44 pu1axb-arcctl00 kernel: [75544] 1000 75544 648 577 45056 0 969 bash
May 13 15:33:44 pu1axb-arcctl00 kernel: [75966] 1000 75966 196457 126841 1069056 0 969 slapd
May 13 15:33:44 pu1axb-arcctl00 kernel: Memory cgroup out of memory: Kill process 75966 (slapd) score 1961 or sacrifice child
May 13 15:33:44 pu1axb-arcctl00 kernel: Killed process 75966 (slapd) total-vm:785828kB, anon-rss:503016kB, file-rss:4348kB, shmem-rss:0kB
I find it hard to believe it's really running out of memory. It's a simple LDAP container with only 8-10 elements in the directory tree and the pod is not showing memory issues on the dashboard (Lens). We have other Alpine images which don't have this issue.
I'm relatively new to Kubernetes, so I'm hoping the users on SO can give me some guidance on how to debug this. I can provide more info once I know what is helpful. As I mentioned, increasing the memory has no effect. I plan to switch the deployment from "burstable" to "guaranteed" and see if that makes a difference.
===== UPDATE: It is working now =====
I believe I was confusing the meaning of resource "limits" vs "requests". I had been trying several variations on these before making the original post. After reading through the responses I now have the pod deployed with the following settings:
resources:
  limits:
    cpu: 50m
    memory: 1Gi
  requests:
    cpu: 50m
    memory: 250Mi
Looking at the memory footprint in Lens, usage is holding steady at around 715Mi. This is higher than our other pods by at least 25%. Perhaps the LDAP server just needs more. Regardless, I thank you all for your timely help.
Check your deployment or pod spec for resource limits.
If your application requires more memory than its limit allows, it will be OOMKilled by Kubernetes.
...
resources:
limits:
memory: 200Mi
requests:
memory: 100Mi
...
A rough analogy with Java JVM flags may help to understand the concept:
requests ≈ -Xms (initial heap size)
limits ≈ -Xmx (maximum heap size)
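As a sketch of how the two line up in practice (the image, the JAVA_OPTS convention and the heap sizes are assumptions; your entrypoint must actually pass these flags to the JVM), the heap can be sized to stay inside the container limit:
# Illustrative only: keep -Xmx safely below the container memory limit
apiVersion: v1
kind: Pod
metadata:
  name: java-app                   # hypothetical name
spec:
  containers:
  - name: app
    image: my-java-app:latest      # hypothetical image
    env:
    - name: JAVA_OPTS
      value: "-Xms256m -Xmx384m"   # heap below the 512Mi limit, leaving room for non-heap memory
    resources:
      requests:
        memory: 256Mi
      limits:
        memory: 512Mi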
Read more:
https://kubernetes.io/docs/tasks/configure-pod-container/assign-memory-resource/
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/
> I'm hoping the users on SO can give me some guidance on how to debug this.
Before you start debugging, you can check (and improve) your YAML files.
You can set up a default memory request and a default memory limit for containers like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
spec:
  limits:
  - default:
      memory: 512Mi
    defaultRequest:
      memory: 256Mi
    type: Container
A request is a bid for the minimum amount of that resource your container will need. It doesn’t say how much of a resource you will be using, just how much you will need. You are telling the scheduler just how many resources your container needs to do its job. Requests are used for scheduling by the Kubernetes scheduler. For CPU requests they are also used to configure how the containers are scheduled by the Linux kernel.
A limit is the maximum amount of that resource your container will ever use. Limits must be greater than or equal to requests. If you set only limits, the request will be the same as the limit.
If you want to put one container in the pod, you can set memory limits like this:
apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo-2
spec:
  containers:
  - name: default-mem-demo-2-ctr
    image: nginx
    resources:
      limits:
        memory: "1Gi"
If you specify a Container's limit but not its request, the Container is not assigned the default memory request (256Mi in this situation); instead, its request is set to match its limit (1Gi).
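Under those assumptions, the effective resources on the created container would end up roughly like this:
resources:
  limits:
    memory: "1Gi"
  requests:
    memory: "1Gi"    # request defaulted to the limit, not to the 256Mi LimitRange default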
You can also put one container in the pod and set memory requests like this:
apiVersion: v1
kind: Pod
metadata:
  name: default-mem-demo-3
spec:
  containers:
  - name: default-mem-demo-3-ctr
    image: nginx
    resources:
      requests:
        memory: "128Mi"
But in this situation the Container's memory limit is set to 512Mi, which is the default memory limit for the namespace.
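Again as a sketch under the same assumptions, the resulting container resources would be roughly:
resources:
  requests:
    memory: "128Mi"
  limits:
    memory: "512Mi"  # limit taken from the LimitRange default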
If you want to debug a problem, you should know why it happened. Generally, an OOM problem may appear, for example, due to limit overcommit on the node or a container reaching its own limit (I know that you have one container, but you should know how to proceed in other situations). You can read a good article about it here.
You may also find it a good idea to run cluster monitoring, for example with Prometheus. Here is a guide on how to set up Kubernetes monitoring with Prometheus. You should be interested in the metric container_memory_failcnt. You can read more about it here.
You can also read this page about setting up OOM-kill alerting in a Kubernetes cluster.
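If you run the prometheus-operator (an assumption; a plain Prometheus rules file uses the same expression), an alert on that metric could be sketched like this, with an illustrative name and threshold:
# Sketch: alert when a container keeps hitting its memory limit (failcnt increasing)
apiVersion: monitoring.coreos.com/v1
kind: PrometheusRule
metadata:
  name: memory-failcnt-alert      # illustrative name
spec:
  groups:
  - name: memory
    rules:
    - alert: ContainerMemoryLimitHit
      expr: rate(container_memory_failcnt{container!=""}[5m]) > 0
      for: 5m
      labels:
        severity: warning
      annotations:
        summary: "Container {{ $labels.container }} in pod {{ $labels.pod }} is hitting its memory limit"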

Kubernetes Pod CPU Resource Based on Instance Type

Suppose for a pod I define the resources as below:
resources:
  requests:
    memory: 500Mi
    cpu: 500m
  limits:
    memory: 1000Mi
    cpu: 1000m
This means I would be requesting a minimum of 1/2 CPU core (or 1/2 vCPU). In the cloud (AWS) we have different EC2 families. If we create a cluster using C4 or R4 instance types, does the performance change? Do we need to baseline the CPU usage based on the instance family on which we are going to run the pod?

filebeat :7.3.2 POD OOMKilled with 1Gi memory

I am running Filebeat as a DaemonSet with a 1Gi memory setting. My pods are getting crashed with OOMKilled status.
Here is my limit setting:
resources:
  limits:
    memory: 1Gi
  requests:
    cpu: 100m
    memory: 1Gi
What is the recommended memory setting to run Filebeat?
Thanks
The RAM usage of Filebeat is relative to how much it is doing, in general. You can limit the number of harvesters to try and reduce things, but overall you just need to run it uncapped and measure what the normal usage is for your use case and scenario.
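As a sketch of the harvester cap mentioned above (the paths and the limit value are assumptions for your environment), in filebeat.yml:
# filebeat.yml sketch: cap concurrent harvesters per input (values illustrative)
filebeat.inputs:
- type: log
  paths:
    - /var/log/containers/*.log
  harvester_limit: 50      # at most 50 files harvested in parallel for this input
  close_inactive: 5m       # release handles for files with no new data sooner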

Kubernetes set only container resources limits implies same values for resources requests

I have a pod with only one container that has this resources configuration:
resources:
  limits:
    cpu: 1000m
    memory: 1000Mi
From the node where the pod is scheduled I read this:
CPU Requests  CPU Limits  Memory Requests  Memory Limits
1 (50%)       1 (50%)     1000Mi (12%)     1000Mi (12%)
Why the "resources requests" are setted when I dont' want that?
Container’s request is set to match its limit regardless if there is a default memory request for the namespace.(Kubernetes Doc)
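If you do not want the request to silently default to the limit, set it explicitly; a sketch with illustrative values:
resources:
  requests:
    cpu: 250m
    memory: 500Mi
  limits:
    cpu: 1000m
    memory: 1000Mi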