In Kubernetes, is it possible to enforce virtual memory (physical page swapping to disk) on a pod/container with memory requests and limits set?
For instance, as per the Kubernetes documentation, “if you set a memory limit of 4GiB for a container, the kubelet (and container runtime) enforce the limit. The runtime prevents the container from using more than the configured resource limit. For example: when a process in the container tries to consume more than the allowed amount of memory, the system kernel terminates the process that attempted the allocation, with an out of memory (OOM) error.”
Hence, is it possible to configure the pod (and hence the Linux kernel) to enforce virtual memory (that is, paging and swapping to disk) once the specified physical memory limit of the pod (4GiB) is reached, instead of getting an OOM error? Am I missing something?
Reading the kernel documentation on this leads me to believe it is not possible, and I don't think it would be desirable behavior anyway. Consider the following scenario: you have a machine with 64GB of physical memory, 10GB of which is in use. You then start a process with a "physical" memory limit of 500MB. If that limit were enforced by swapping, the kernel would start paging the process out and the process would stall as soon as it hit 500MB, even though there is plenty of free memory available to service its requests.
The memory limit you specify on the container is not a cap on virtual address space; it is enforced by the kernel's cgroup memory controller against the container's physical memory usage. Your process can allocate (reserve) as much virtual memory as it wants, subject only to the kernel's overcommit settings, but it gets killed as soon as it actually uses more physical memory than the limit allows.
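For concreteness, a minimal sketch of the 4GiB example from the quoted documentation might look like this (pod name and image are placeholders); the kubelet turns the limit into a cgroup memory limit that the kernel enforces with the OOM killer rather than by swapping:

apiVersion: v1
kind: Pod
metadata:
  name: memory-limited-app        # hypothetical name
spec:
  containers:
  - name: app
    image: example.com/app:latest # placeholder image
    resources:
      requests:
        memory: "2Gi"             # used by the scheduler for placement
      limits:
        memory: "4Gi"             # enforced via the cgroup memory controller; exceeding it triggers an OOM kill, not swapping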
Related
A few pods in my OpenShift cluster are still restarted multiple times after deployment,
with this describe output:
Last State: Terminated
Reason: OOMKilled
Exit Code: 137
Also, memory usage is well below the memory limits.
Is there any other parameter I am missing that I should check?
There are no issues with the cluster in terms of resources.
"OOMKilled" means your container's memory limit was reached and the container was therefore restarted.
Java-based applications in particular can consume a large amount of memory when starting up; after startup, the memory usage often drops considerably.
So in your case, increase resources.limits.memory to avoid these OOM kills. Note that resources.requests.memory can still be lower and should roughly reflect what your container consumes after startup.
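For example, a container with a memory-hungry startup phase might be configured roughly like this in its container spec (the values are purely illustrative and need to be tuned to your workload):

resources:
  requests:
    memory: "512Mi"   # roughly what the container uses after startup
  limits:
    memory: "2Gi"     # headroom for the startup spike so the container is not OOM-killed while booting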
Basically, an OOMKilled status means the container's memory limit has been exceeded (Out of Memory).
If the memory allocated by all of the processes in a container exceeds the memory limit, the node Out of Memory (OOM) killer will immediately select and kill a process in the container [1].
If the container does not exit immediately, an OOM kill is detectable as follows:
A container process exited with code 137, indicating it received a SIGKILL signal
The oom_kill counter in /sys/fs/cgroup/memory/memory.oom_control is incremented
If one or more processes in a pod are OOM killed, when the pod subsequently exits, whether immediately or not, it will have phase Failed and reason OOMKilled. An OOM killed pod may be restarted depending on the value of restartPolicy [2].
To check the status:
oc get pod <pod name> -o yaml
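In that output, an OOM-killed container shows up under status.containerStatuses, roughly like the following excerpt (container name and restart count are illustrative):

status:
  containerStatuses:
  - name: my-container        # hypothetical container name
    restartCount: 3
    lastState:
      terminated:
        reason: OOMKilled
        exitCode: 137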
There are applications that consume huge amounts of memory only during startup.
In this article, one can find two solutions for handling OOMKilled issues:
You'll need to size the container workload for different node configurations when using memory limits. Unfortunately, there is no formula that can be applied to calculate the rate of increase in container memory usage with an increasing number of CPUs on the node.
One of the kernel tuneables that can help reduce the memory usage of containers is slub_max_order. A value of 0 (the default is 3) can help bring down a container's overall memory usage, but it can have negative performance implications for certain workloads. It is advisable to benchmark the container workload with this tuneable. [3]
References:
[1] Configuring cluster memory to meet container memory and risk requirements
[2] Application memory sizing
[3] OOMKilled containers and number of CPUs
I am looking for some specific information regarding Kubernetes requests and limits and still haven't found an answer, or maybe just haven't understood it well. Say I've defined two containers A and B for a single pod, both with resource limits and requests:
A:
RAM request: 1Gi
RAM limit: 2Gi
B:
RAM request: 1Gi
RAM limit: 2Gi
So we have a total pod limit of 4Gi. Suppose container A exceeds its limit (say by 1Gi) while B is consuming only 64Mi. My questions are:
What happens to the pod? Is it evicted?
Is container A restarted?
Is container A allowed to use B's available RAM?
Thanks!
What happens to the pod? Is it evicted?
If the memory limit of a container is exceeded, the kernel's OOM killer is invoked and terminates the container's process. The kubelet then starts a new container in the same Pod on the same Node (subject to the Pod's restartPolicy); the Pod itself is not evicted.
(CPU limits use a different mechanism, CFS bandwidth control, which throttles the process's CPU cycles instead of terminating the process.)
Is container A restarted?
Yes.
Is container A allowed to use B's available RAM?
Memory is tracked separately for each container; the limits are not pooled together, so A cannot use B's unused headroom.
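For reference, the two-container pod from the question could be declared roughly like this (names and images are placeholders); each container gets its own cgroup with its own 2Gi limit:

apiVersion: v1
kind: Pod
metadata:
  name: two-container-pod         # hypothetical name
spec:
  containers:
  - name: a
    image: example.com/a:latest   # placeholder image
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"             # A is OOM-killed if it exceeds this, regardless of how little B is using
  - name: b
    image: example.com/b:latest   # placeholder image
    resources:
      requests:
        memory: "1Gi"
      limits:
        memory: "2Gi"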
Just to add some details:
Memory request: the amount of memory set aside for the container at scheduling time, whether it is fully used or not.
Memory limit: the maximum amount of memory the container is allowed to use. When a container's usage grows beyond its request, whether the additional memory can actually be allocated depends on the free memory available on the machine running that container at that point in time (up to the configured limit).
To answer your queries, from my understanding:
If container A reaches its memory limit of 2Gi, it is OOM-killed and that container is restarted.
If container A exceeds its memory request of 1Gi, it tries to get the required memory from what is available on the machine, up to its configured limit.
Hope this answers your queries.
I was wondering if it is possible to force Kubernetes to allocate cores from the same CPU when spinning up a pod. What I would like Kubernetes to do is this: as new pods are created, the cores allocated to them should come from, let's say, CPU1 as long as there are cores still available on it. Cores from CPU2, CPU3, etc. should not be used in the newly created pod. I would like my pods to have cores allocated from a single CPU as long as that is possible.
Is there a way to achieve this?
Also, is there a way to see which physical CPUs the cores of a pod come from?
Thanks a lot.
Edit: Let me explain why I want to do this.
We are running a Spark cluster on Kubernetes. The lead of our systems/Linux administration team warned us about NUMA. He told us that we could improve the performance of our executor pods if we allocated their cores from the same physical CPU. That is why I started digging into this.
I found this Kubernetes CPU Manager. The documentation says:
CPU Manager allocates CPUs in a topological order on a best-effort basis. If a whole socket is free, the CPU Manager will exclusively allocate the CPUs from the free socket to the workload. This boosts the performance of the workload by avoiding any cross-socket traffic.
Also on the same page:
Allocate all the logical CPUs (hyperthreads) from the same physical CPU core if available and the container requests an entire core worth of CPUs.
So now I am starting to think that maybe what I need is to enable the static policy for the CPU Manager to get what I want.
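If I understand the docs correctly, that would mean something roughly like the following in the kubelet configuration (the reservation values below are just placeholders and have to match your node setup):

apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
# Enable the static CPU Manager policy so Guaranteed pods can get exclusive cores.
cpuManagerPolicy: static
# The static policy requires some CPU to be explicitly reserved for system and Kubernetes daemons.
systemReserved:
  cpu: "1"
kubeReserved:
  cpu: "1"

As far as I can tell, a container only receives exclusive (and, on a best-effort basis, topology-aligned) cores if the pod is in the Guaranteed QoS class and the container requests an integer number of CPUs, for example requests and limits both set to cpu: "4".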
When a pod is created but no resource limit is specified, which component is responsible for calculating or assigning a resource limit to that pod? Is it the kubelet or Docker?
If a pod doesn't specify any resource limits, it's ultimately up to the Linux kernel on the node: the scheduler decides whether and how many CPU cycles the process gets, and the OOM killer may kill the pod's processes or other processes on the node if memory use becomes excessive. Neither Kubernetes nor Docker will assign or guess at any sort of limits.
So if you have a process with a massive memory leak, and it gets scheduled on a very large but quiet instance with 256 GB of available memory, it will get to use almost all of that memory before it gets OOM-killed. If a second replica gets scheduled on a much smaller instance with only 4 GB, it's liable to fail much sooner. Usually you'll want to actually set limits for consistent behavior.
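For instance (the values are placeholders), capping such a workload in its container spec makes the failure point predictable regardless of node size:

resources:
  requests:
    memory: "1Gi"
  limits:
    memory: "2Gi"   # the leaky process is killed at 2Gi on any node, instead of growing until it exhausts the node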
I am running one virtual machine on a host system, with 50% memory and 50% CPU allocated to it.
Will this reduce the system performance by half?
Give me your comments and suggestions.
Alex is correct. The reason it could take less than half of your system's performance is that most virtualization systems will not dedicate precisely that amount of CPU and memory to your virtual machine if the software running inside the VM is not demanding that much. If the VM is running a demanding workload, though, this will not be the case.
The reason it could take more than half of your system's performance is that any virtualization system has overhead of its own, just to provide the virtualization to the VM. Some memory is consumed in tracking the memory and resources used by the virtual machine, and some CPU is consumed in handling the needs of the VM (interrupts from network traffic, etc.).