Pod restart in OpenShift after deployment - kubernetes

A few pods in my OpenShift cluster are restarted multiple times after deployment.
The describe output shows:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Also, memory usage is well below the memory limits.
Is there any other parameter I am missing that I should check?
There are no issues with the cluster in terms of resources.

"OOMKilled" means your container's memory limit was reached and the container was therefore restarted.
Java-based applications in particular can consume a large amount of memory when starting up. After startup, the memory usage often drops considerably.
So in your case, increase 'resources.limits.memory' to avoid these OOM kills. Note that the 'requests' value can still be lower and should roughly reflect what your container consumes after startup.
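For illustration, a minimal sketch of what that looks like in a pod spec (the pod name, image, and values here are placeholders, not taken from the question):

apiVersion: v1
kind: Pod
metadata:
  name: java-app                                  # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/java-app:latest   # placeholder image
    resources:
      requests:
        memory: "512Mi"      # roughly what the app consumes after startup
      limits:
        memory: "1536Mi"     # high enough to absorb the startup spike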

Basically, the OOMKilled status means the container's memory limit has been crossed (Out of Memory).
If the memory allocated by all of the processes in a container exceeds the memory limit, the node Out of Memory (OOM) killer will immediately select and kill a process in the container [1].
If the container does not exit immediately, an OOM kill is detectable as follows:
A container process exited with code 137, indicating it received a SIGKILL signal
The oom_kill counter in /sys/fs/cgroup/memory/memory.oom_control is incremented
If one or more processes in a pod are OOM killed, when the pod subsequently exits, whether immediately or not, it will have phase Failed and reason OOMKilled. An OOM killed pod may be restarted depending on the value of restartPolicy [2].
To check the status:
oc get pod <pod name> -o yaml
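In that output, the relevant part of the container status typically looks something like the following (the container name and restart count here are illustrative):

status:
  containerStatuses:
  - name: app               # container name from your pod spec
    restartCount: 3         # bumps up each time the container is OOM-killed and restarted
    lastState:
      terminated:
        exitCode: 137       # 128 + 9 (SIGKILL)
        reason: OOMKilled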
There are applications that consume huge amounts of memory only during startup.
In this article one can find two solutions for handling OOMKilled issues:
You'll need to size the container workload for different node configurations when using memory limits. Unfortunately, there is no formula that can be applied to calculate the rate of increase in container memory usage with an increasing number of CPUs on the node.
One of the kernel tunables that can help reduce the memory usage of containers is slub_max_order. A value of 0 (the default is 3) can help bring down the overall memory usage of the container, but it can have negative performance implications for certain workloads. It's advisable to benchmark the container workload with this tunable. [3]
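As a rough sketch only: slub_max_order is a boot-time kernel argument, and on OpenShift 4.x one way to apply it is through a MachineConfig. The object name and role label below are assumptions; check the referenced article and your cluster version before applying anything like this.

apiVersion: machineconfiguration.openshift.io/v1
kind: MachineConfig
metadata:
  name: 99-worker-slub-max-order               # hypothetical name
  labels:
    machineconfiguration.openshift.io/role: worker
spec:
  kernelArguments:
  - slub_max_order=0     # default is 3; benchmark your workload before relying on this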
References:
[1] Configuring cluster memory to meet container memory and risk requirements
[2] Application memory sizing
[3] OOMKilled Containers and Number of CPUs

Related

What happens when a pod resource limit is not exceeded but a single container resource limit is?

I am searching for some specific information regarding Kubernetes requests and limits and still haven't found an answer, or just haven't understood it quite well. Say I've defined two containers A and B for a single pod, each with its own resource limits and requests:
A:
RAM request: 1Gi
RAM limit: 2Gi
B:
RAM request: 1Gi
RAM limit: 2Gi
So, we have a pod limit of 4Gi (total). Suppose container A exceeds its limit (say by 1Gi), but B is consuming only 64Mi. So, my questions are:
What happens to the pod? Is it evicted?
Is container A restarted?
Is container A allowed to use B's available RAM?
Thanks!
What happens to the pod? Is it evicted?
If the memory limit of a container is exceeded, the kernel's OOM killer is invoked and terminates the container's process. The kubelet then starts a new container in the same Pod on the same Node; the Pod itself is not evicted.
(CPU limits use a different mechanism (CFS Bandwidth Control) that throttles the processes' CPU cycles instead of terminating the process.)
Is container A restarted?
Yes.
Is container A allowed to use B's available RAM?
The memory is tracked separately for each container. They are not pooled together into the same limit.
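To make that concrete, here is a minimal sketch of the pod described in the question (image names are placeholders): each container gets its own memory cgroup with its own 2Gi limit, so A is OOM-killed at 2Gi no matter how little B is using.

apiVersion: v1
kind: Pod
metadata:
  name: two-containers                           # hypothetical name
spec:
  containers:
  - name: a
    image: registry.example.com/app-a:latest     # placeholder
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"    # A is OOM-killed (and restarted) if it crosses this, regardless of B
  - name: b
    image: registry.example.com/app-b:latest     # placeholder
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"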
Just to add some details
Memory request: the memory reserved for the container, whether it is used completely or not.
Memory limit: the maximum amount of memory the container is allowed to use. When a container's usage grows beyond its request, whether more memory can be allocated depends on the free memory available on the machine running that container at that point in time (up to the limit).
To answer your queries, from my understanding:
If container A reaches its memory limit of 2Gi, it is OOM-killed and container A is restarted (not the whole pod).
If container A exceeds its memory request of 1Gi, it tries to get the required memory from what is available on the machine (up to the limit that is set).
Hope this answers your queries.

How can I correctly set Kubernetes pod eviction limits, to avoid system OOM killer

I've spent over a full day trying to make sense of Kubernetes' resource management. Specifically, I'm trying to set up eviction thresholds and resource reservations in such a way that there is always at least 1GiB of memory available.
Going on the documentation regarding resource reservations and out-of-resource handling, I figured setting the following eviction policy would suffice:
--eviction-hard=memory.available<1Gi
However, in practice, this does not work at all, as the computation the kubelet does seems to be different from the computation the kernel does when it needs to determine whether or not the OOM killer needs to be invoked. E.g. when I load up my system with a bunch of pods running an artificial memory hog, I get the following report from free -m:
              total        used        free      shared  buff/cache   available
Mem:          15866       14628         161          53        1077         859
According to the kernel, there's 859 MiB of memory available. Yet, the kubelet does not invoke its eviction policy. In fact, I've been able to invoke the system OOM killer before the kubelet eviction policy was invoked, even when ramping up memory usage incredibly slowly (to allow the kubelet housekeeping control loop to sleep 10 seconds, as per its default configuration).
I've found this script which used to be in Kubernetes documentation and is supposed to calculate the available memory in the same way the Kubelet does. I ran it in parallel to free -m above and got the following result:
memory.available_in_mb 1833
That's almost 1000M difference!
Now, I understand the calculation was by design, but that leaves me with the obvious question: how can I reliably manage system resource usage so that the system OOM killer does not get invoked? What eviction policy can I set so the kubelet will start evicting pods when there's less than a gigabyte of memory available?
According to the documentation (https://kubernetes.io/docs/tasks/administer-cluster/reserve-compute-resources/), you should add the kubelet flag --system-reserved=memory=1024Mi.
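For clusters that configure the kubelet through a config file rather than command-line flags, the equivalent would look roughly like this sketch (the exact reservation sizes are assumptions and depend on your node):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
systemReserved:
  memory: 1Gi                 # memory set aside for system daemons outside Kubernetes
evictionHard:
  memory.available: "1Gi"     # kubelet starts hard-evicting pods below this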

Which component in Kubernetes is responsible for resource limits?

When a pod is created but no resource limit is specified, which component is responsible for calculating or assigning a resource limit to that pod? Is it the kubelet or Docker?
If a pod doesn't specify any resource limits, it's ultimately up to the Linux kernel on the node: the scheduler decides whether or not to give CPU cycles to the process, and the OOM killer will kill processes in the pod, or other processes on the node, if memory use is excessive. Neither Kubernetes nor Docker will assign or guess at any sort of limits.
So if you have a process with a massive memory leak, and it gets scheduled on a very large but quiet instance with 256 GB of available memory, it will get to use almost all of that memory before it gets OOM-killed. If a second replica gets scheduled on a much smaller instance with only 4 GB, it's liable to fail much sooner. Usually you'll want to actually set limits for consistent behavior.
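If you do want pods without explicit limits to receive defaults automatically, a LimitRange in the namespace is the Kubernetes piece that injects them at admission time. A sketch, with an arbitrary name, namespace, and values:

apiVersion: v1
kind: LimitRange
metadata:
  name: default-memory-limits     # hypothetical name
  namespace: my-namespace         # hypothetical namespace
spec:
  limits:
  - type: Container
    defaultRequest:
      memory: 256Mi      # applied as the request when a container specifies none
    default:
      memory: 512Mi      # applied as the limit when a container specifies none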

How does Kubernetes enforce the memory limit for a container?

How does it enforce the limit? Is it via cgroups? Or is there an actual process watching container processes and terminating them?
I seem to have a container process that gets a SIGKILL, but the pod does not get restarted (though the process dies because of the SIGKILL). So I'm unsure what the cause is.
https://github.com/kubernetes/kubernetes/issues/50632
IIUC, the process which consumes the most memory will be OOM-killed in this case. The container won't terminate unless the killed process is the main process within the container.
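In other words, it is cgroups rather than a watcher process: the kubelet passes the limit to the container runtime, which writes it into the container's memory cgroup, and the kernel's OOM killer does the actual killing. A rough sketch of how a limit in the pod spec maps to the cgroup (names are placeholders; cgroup paths are the usual defaults and may differ on your distribution):

apiVersion: v1
kind: Pod
metadata:
  name: limited-pod                            # hypothetical name
spec:
  containers:
  - name: app
    image: registry.example.com/app:latest     # placeholder
    resources:
      limits:
        # Ends up as the container cgroup's memory limit:
        #   cgroup v1: .../memory/<pod cgroup>/<container>/memory.limit_in_bytes
        #   cgroup v2: .../<pod cgroup>/<container>/memory.max
        # When the cgroup exceeds it, the kernel OOM killer SIGKILLs a process in
        # that cgroup (exit code 137); the container is only restarted if that
        # process was the container's main process.
        memory: "512Mi"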

limit the amount of memory kube-controller-manager uses

Running v1.10, I notice that kube-controller-manager's memory usage spikes and it OOMs all the time. It wouldn't be so bad if the system didn't fall to a crawl before this happens, though.
I tried modifying /etc/kubernetes/manifests/kube-controller-manager.yaml to set resources.limits.memory=1Gi, but the kube-controller-manager pod never seems to want to come back up.
Any other options?
There is a bug in kube-controller-manager, and it's fixed in https://github.com/kubernetes/kubernetes/pull/65339
First of all, you didn't include information about the amount of memory you use per node.
Second, what do you mean by "system didn't fall to a crawl" - do you mean nodes are swapping?
All Kubernetes masters and nodes are expected to have swap disabled - it's recommended by the Kubernetes community, as mentioned in the Kubernetes documentation.
Support for swap is non-trivial and degrades performance.
Turn off swap on every node by:
sudo swapoff -a
Finally,
resources.limits.memory=1Gi
is a per-pod limit, and these limits are hard limits. A pod reaching this level of allocated memory can be OOM-killed, even if the node still has gigabytes of unallocated memory.
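For reference, a sketch of the relevant fragment of the static pod manifest in /etc/kubernetes/manifests/kube-controller-manager.yaml; everything else stays as it already is in your file, the values below are only examples, and a limit that is too small for your cluster will simply get the pod OOM-killed again:

spec:
  containers:
  - name: kube-controller-manager
    # image, command, volumeMounts etc. stay exactly as in your existing manifest
    resources:
      requests:
        memory: "512Mi"
      limits:
        memory: "1Gi"      # hard limit: crossing it gets the container OOM-killed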