Pin Kubernetes pods/deployments/replica sets/daemon sets to run on specific CPUs only

I need to restrict an app/deployment to run on specific CPUs only (say 0-3, or just 1 or 2, etc.). I found out about the CPU Manager and tried to implement it with the static policy, but I am not able to achieve what I intend to.
I tried the following so far:
Enabled cpu manager static policy on kubelet and verified that it is enabled
Reserved the cpu with --reserved-cpus=0-3 option in the kubelet
Ran a sample nginx deployment with limits equal to requests and an integer CPU value (so the pod is given the Guaranteed QoS class), and validated the CPU affinity with taskset -c -p $(pidof nginx)
So this restricts my nginx app to run on all CPUs other than the reserved ones (0-3); i.e. if my machine has 32 CPUs, the app can run on any of CPUs 4-31, and so can any other apps/deployments that run. As I understand it, the reserved CPUs 0-3 are kept for system daemons, OS daemons, etc.
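For reference, a minimal sketch of the kind of deployment step 3 describes, assuming a plain nginx image and placeholder names; the integer CPU request equals the limit, so the pod gets the Guaranteed QoS class and the static CPU Manager grants its container exclusive CPUs (which specific CPUs is decided by the CPU Manager, not by the spec):

    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: cpu-pinned-nginx        # placeholder name
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: cpu-pinned-nginx
      template:
        metadata:
          labels:
            app: cpu-pinned-nginx
        spec:
          containers:
          - name: nginx
            image: nginx
            resources:
              requests:
                cpu: "2"            # integer CPU, equal to the limit
                memory: 256Mi       # requests must equal limits for Guaranteed QoS
              limits:
                cpu: "2"
                memory: 256Mi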
My questions:
Using the Kubernetes CPU Manager feature, is it possible to pin an app/pod (in this case, my nginx app) to run on specific CPUs only (say 2, or 3, or 4-5)? If yes, how?
If point 1 is possible, can we perform the pinning at the container level too? Say Pod A has two containers, Container B and Container D; is it possible to pin CPUs 0-3 to Container B and CPU 4 to Container D?
If none of this is possible using the Kubernetes CPU Manager, what alternatives are available at this point in time, if any?

As I understand your question, you want to dedicate specific CPUs to each app/pod. From what I've searched, I was only able to find some documentation that might help; the other result is a GitHub topic which I think describes a workaround for your problem.
As a disclaimer: based on what I've read, searched, and understood, there is no direct solution for this issue, only workarounds. I am still searching further.

Related

Rightsizing Kubernetes Nodes | How much cost do we save when we switch from VMs to containers?

We are running 4 different micro-services on 4 different EC2 auto scaling groups:
service-1 - vcpu:4, RAM:32 GB, VM count:8
service-2 - vcpu:4, RAM:32 GB, VM count:8
service-3 - vcpu:4, RAM:32 GB, VM count:8
service-4 - vcpu:4, RAM:32 GB, VM count:16
We are planning to migrate this workload to EKS (in containers).
We need help in deciding the right node configuration (in EKS) to start with.
We can start with a small machine (vCPU:4, RAM:32 GB), but we will not get any cost saving, as each container will effectively need a separate VM.
We can use a large machine (vCPU:16, RAM:128 GB), but when these machines scale out, each newly added machine will also be large and thus can be underutilized.
Or we can go with a medium machine (vCPU:8, RAM:64 GB).
Other than this recommendation, we were also evaluating the cost saving of moving to containers.
As per our understanding, every VM comes with the following overhead:
Overhead of running hypervisor/virtualisation
Overhead of running separate Operating system
Note: One large VM vs many small VMs cost the same on public cloud as cost is based on number of vCPUs + RAM.
Hypervisor/virtualization cost is only valid if we are running on-prem, so no need to consider this.
On the 2nd point, how many resources does a typical Linux machine need just to run the OS? If we provision a small machine (vCPU:2, RAM:4 GB), the approximate CPU usage is 0.2% and the memory consumption (outside user space) is about 500 MB.
So running large instances (5 instances instead of 40 small ones) saves 35 copies of this per-OS CPU and RAM overhead (roughly 35 × 500 MB ≈ 17.5 GB of RAM), which does not seem significant.
You are unlikely to see any cost savings in resources when you move to containers in EKS from applications running directly on VMs.
A Linux container is just an isolated Linux process with specified resource limits; it is no different from a normal process when it comes to resource consumption. EKS still uses virtual machines to provide compute to the cluster, so you will still be running processes on a VM regardless of containerization, and from a resource point of view it will be equal. (See this answer for a more detailed comparison of VMs and containers.)
When you add Kubernetes to the mix you are actually adding more overhead compared to running directly on VMs. The Kubernetes control plane runs on a set of dedicated VMs. In EKS those are fully managed as a PaaS, but Amazon charges a small hourly fee for each cluster.
In addition to the dedicated control plane nodes, each worker node in the cluster needs a set of programs (system pods) to function properly (kube-proxy, kubelet, etc.), and you may also define containers that must run on each node (daemon sets), like log collectors and security agents.
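As a rough illustration of that per-node overhead (the name, image, and numbers below are made-up examples, not anything from the original post), a log-collector daemon set reserves its requests on every node, so its relative cost shrinks as nodes get larger:

    apiVersion: apps/v1
    kind: DaemonSet
    metadata:
      name: log-collector           # hypothetical per-node agent
    spec:
      selector:
        matchLabels:
          app: log-collector
      template:
        metadata:
          labels:
            app: log-collector
        spec:
          containers:
          - name: agent
            image: fluent/fluent-bit   # example log-collector image
            resources:
              requests:
                cpu: 100m              # paid on every node: a bigger fraction of a
                memory: 128Mi          # 4-vCPU node than of a 16-vCPU node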
When it comes to sizing the nodes you need to find a balance between scaling and cost optimization.
The larger the worker node is, the smaller the relative overhead of system pods and daemon sets becomes. In theory, a worker node large enough to accommodate all your containers would maximize the share of resources consumed by your applications rather than by the supporting software on the node.
The smaller the worker nodes are the smaller the horizontal scaling steps can be, which is likely to reduce waste when scaling. It also provides better resilience as a node failure will impact fewer containers.
I tend to prefer small nodes so that scaling can be handled efficiently. They should be slightly larger than what is required by the largest containers, so that system pods and daemon sets can also fit.

Is there a way to force the use of the same physical CPU while allocating cores to a pod in Kubernetes?

I was wondering if it is possible to force Kubernetes to allocate cores from the same CPU while spinning up a pod. What I would like Kubernetes to do is, as new pods are created, allocate their cores from, let's say, CPU1 as long as there are cores still available on it. Cores from CPU2, CPU3, etc. should not be used in the newly initiated pod. I would like my pods to have cores allocated from a single CPU as long as this is possible.
Is there a way to achieve this?
Also, is there a way to see which physical CPU the cores (cpus) of a pod come from?
Thanks a lot.
Edit: Let me explain why I want to do this.
We are running a Spark cluster on Kubernetes. The lead of the system/Linux administration team warned us about the concept of NUMA. He told us that we could improve the performance of our executor pods if we allocated their cores from the same physical CPU. That is why I started digging into this.
I found this Kubernetes CPU Manager. The documentation says:
CPU Manager allocates CPUs in a topological order on a best-effort basis. If a whole socket is free, the CPU Manager will exclusively allocate the CPUs from the free socket to the workload. This boosts the performance of the workload by avoiding any cross-socket traffic.
Also on the same page:
Allocate all the logical CPUs (hyperthreads) from the same physical CPU core if available and the container requests an entire core worth of CPUs.
So now I am starting to think that maybe what I need is to enable the static policy for the CPU Manager to get what I want.
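For reference, a minimal sketch of what enabling the static policy looks like in the kubelet configuration; the reserved CPU set below is only an example, and exclusive cores are then only granted to containers in Guaranteed QoS pods that request an integer number of CPUs:

    apiVersion: kubelet.config.k8s.io/v1beta1
    kind: KubeletConfiguration
    # The static policy hands exclusive CPUs to Guaranteed pods with integer CPU requests.
    cpuManagerPolicy: static        # default is "none"
    # The static policy needs some CPUs reserved for system/kube daemons, e.g. via:
    reservedSystemCPUs: "0-1"       # example value only
    # Note: changing the policy on an existing node typically means draining it and
    # removing the old cpu_manager_state file before restarting the kubelet.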

How do we choose --nthreads and --nprocs per worker in Dask distributed running via Helm on Kubernetes?

I'm running some I/O intensive Python code on Dask and want to increase the number of threads per worker. I've deployed a Kubernetes cluster that runs Dask distributed via helm. I see from the worker deployment template that the number of threads for a worker is set to the number of CPUs, but I'd like to set the number of threads higher unless that's an anti-pattern. How do I do that?
From this similar question, it looks like I can SSH to the Dask scheduler and spin up workers with dask-worker. But ideally I'd be able to configure the worker resources via Helm, so that I don't have to interact with the scheduler other than submitting jobs to it via the Client.
Kubernetes resource limits and requests should match the --memory-limit and --nthreads parameters given to the dask-worker command. For more information, please follow link 1 (best practices described in Dask's official documentation) and link 2.
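As a sketch of that alignment (this is a hand-written fragment of a worker pod spec, not the Helm chart's actual template; the image, scheduler address, and numbers are placeholders), the dask-worker arguments mirror the container's resources:

    containers:
    - name: dask-worker
      image: daskdev/dask            # example worker image
      args:
        - dask-worker
        - tcp://scheduler:8786       # placeholder scheduler address
        - --nprocs
        - "1"
        - --nthreads
        - "4"                        # matches the 4-CPU limit below
        - --memory-limit
        - 6GB                        # matches (approximately) the 6Gi memory limit below
      resources:
        requests:
          cpu: "4"
          memory: 6Gi
        limits:
          cpu: "4"
          memory: 6Gi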
Threading in Python is a careful art and really depends on your code. To take the easy one first, --nprocs should almost certainly be 1; if you want more processes, launch more replicas instead. For the thread count, first remember that the GIL means only one thread can be running Python code at a time. So you only get concurrency gains in two main situations: 1) some threads are blocked on I/O, like waiting to hear back from a database or web API, or 2) some threads are running non-GIL-bound C code inside NumPy or friends. In the second situation you still can't get more concurrency than the number of CPUs, since that's just how many slots there are to run at once, but the first can benefit from more threads than CPUs in some situations.
There's a limitation in Dask's Helm chart: it doesn't allow setting --nthreads. I confirmed this with the Dask team and filed an issue: https://github.com/helm/charts/issues/18708.
In the meantime, use Dask Kubernetes for a higher degree of customization.

Which component in Kubernetes is responsible for resource limits?

When a pod is created but no resource limit is specified, which component is responsible for calculating or assigning a resource limit to that pod? Is it the kubelet or Docker?
If a pod doesn't specify any resource limits, it's ultimately up to the Linux kernel scheduler on the node to assign CPU cycles or not to the process, and to OOM-kill either the pod or other processes on the node if memory use is excessive. Neither Kubernetes nor Docker will assign or guess at any sort of limits.
So if you have a process with a massive memory leak, and it gets scheduled on a very large but quiet instance with 256 GB of available memory, it will get to use almost all of that memory before it gets OOM-killed. If a second replica gets scheduled on a much smaller instance with only 4 GB, it's liable to fail much sooner. Usually you'll want to actually set limits for consistent behavior.
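For completeness, a minimal sketch of what "actually set limits" looks like in a pod spec (name, image, and numbers are placeholders); with these set, the kubelet and container runtime translate them into cgroup settings instead of leaving everything to kernel defaults:

    apiVersion: v1
    kind: Pod
    metadata:
      name: limited-pod              # placeholder name
    spec:
      containers:
      - name: app
        image: nginx                 # example image
        resources:
          requests:                  # used by the scheduler to place the pod
            cpu: 250m
            memory: 256Mi
          limits:                    # enforced at runtime
            cpu: 500m                # CPU usage above this is throttled
            memory: 512Mi            # exceeding this gets the container OOM-killed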

Kubernetes resource limits and requests would be better closer together

I was told by a more experienced DevOps person that the resource (CPU and memory) limits and requests would be better kept closer together for scheduling pods.
Intuitively, I can imagine that less scaling up and down would require less calculation power from K8s? Or can someone explain it in more detail?
The resource requests and limits do two fundamentally different things. The Kubernetes scheduler places a pod on a node based only on the sum of the resource requests: if the node has 8 GB of RAM, and the pods currently scheduled on that node requested 7 GB of RAM, then a new pod that requests 512 MB will fit there. The limits control how much resource the pod is actually allowed to use, with it getting CPU-throttled or OOM-killed if it uses too much.
In practice many workloads can be "bursty". Something might require 2 GB of RAM under peak load, but far less than that when just sitting idle. It doesn't necessarily make sense to provision enough hardware to run everything at peak load, but then to have it sit idle most of the time.
If the resource requests and limits are far apart then you can "fit" more pods on the same node. But, if the system as a whole starts being busy, you can wind up with many pods that are all using above their resource request, and actually use more memory than the node has, without any individual pod being above its limit.
Consider a node with 8 GB of RAM, and pods with 512 MB RAM resource requests and 2 GB limits. 16 of these pods "fit". But if each pod wants to use 1 GB RAM (allowed by the resource limits) that's more total memory than the node has, and you'll start getting arbitrary OOM-kills. If the pods request 1 GB RAM instead, only 8 will "fit" and you'll need twice the hardware to run them at all, but in this scenario the cluster will run happily.
One strategy for dealing with this in a cloud environment is what your ops team is asking: make the resource requests and limits very close to each other. If a node fills up, an autoscaler will automatically request another node from the cloud. Scaling down is a little trickier. But this approach avoids problems where things die randomly because the Kubernetes nodes are overcommitted, at the cost of needing more hardware for the idle state.
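As a minimal sketch of that approach (values are illustrative, not from the original answer), the per-container resources block simply keeps requests and limits equal, so the scheduler reserves roughly what the pod can actually use:

    resources:
      requests:
        cpu: "1"
        memory: 2Gi
      limits:
        cpu: "1"                     # equal to the request
        memory: 2Gi                  # equal to the request, so memory is not overcommitted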