Kubernetes limit NVIDIA GPU

I have a node with 2 GPUs on it, and I deploy two containers, each with a limit of 1 GPU:
resources:
  limits:
    nvidia.com/gpu: 1
It works well with the official NVIDIA container. However, with my own container I can see both GPUs inside each container.
So the resource limit only seems to take effect during scheduling: I can deploy only two pods with a GPU limit of 1.
I expected that each container could use only 1 GPU, matching the resource limit.
Does it have something to do with the container? I thought this should be controlled at the Kubernetes level.
Any suggestions?
Regards
David

Related

kubernetes pod resource cpu on nodes with different cpu cores count

This is a bit crazy, but we run a kubernetes cluster with 4 nodes (w/ Docker as container engine):
node01/node02: 8 cores
node03/node04: 4 cores
I am confused about exactly how much real CPU a pod's CPU request gives a containerized application.
In my understanding, pods from a deployment that request 1 CPU will all have the same CPU shares. Does this mean a container will run faster on node01/node02 than on node03/node04?
Not necessarily:
If the application is single-threaded, it will run at the same speed no matter how many cores the system it's on has.
If the application is disk- or database-bound, adding more cores won't make it go faster.
If other pods (or non-Kubernetes processes) are running on either of the nodes, those share the CPU resource, and a busy 8-core system could in practice be slower than an idle 4-core system.
If the pod spec has resource requests, it could be prevented from running on the smaller system:
resources:
  requests:
    cpu: 6 # can't run on the 4-core system
If the pod spec has resource limits, that can prevent it from using all of the cores, even if it's scheduled on the larger system:
resources:
  limits:
    cpu: 3 # even if it's scheduled on the 8-core system
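The request case can be sketched in a few lines of Python. This is only an illustration of the fit check, not the scheduler's actual code: `parse_cpu` and `fits` are hypothetical helpers, and the sketch assumes every core is allocatable, whereas real nodes reserve some CPU for system daemons.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("6", "0.5", "250m") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def fits(request: str, node_cores: int, already_requested: str = "0") -> bool:
    """Return True if the CPU request fits in what is left on the node.

    Simplification (assumption): all cores count as allocatable.
    """
    free = node_cores * 1000 - parse_cpu(already_requested)
    return parse_cpu(request) <= free


# Core counts taken from the question.
nodes = {"node01": 8, "node02": 8, "node03": 4, "node04": 4}

# A pod requesting 6 CPUs can only land on the 8-core nodes.
candidates = [name for name, cores in nodes.items() if fits("6", cores)]
print(candidates)  # ['node01', 'node02']
```

Passing `already_requested` shows the other bullet as well: `fits("6", 8, already_requested="3")` is false, because requests from other pods count against the same node.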

How many cores do Kubernetes pods use when their CPU usage is limited by policy?

Kubernetes allows you to limit pod resource usage.
requests:
  cpu: 100m
  memory: 128Mi
limits:
  cpu: 200m # which is 20% of 1 core
  memory: 256Mi
Let's say my Kubernetes node has 2 cores, and I run this pod with a CPU limit of 200m on this node. In this case, will my pod use 200m of one core, or 100m + 100m across both cores?
I need this for the formula that sets my number of gunicorn workers, nginx workers, etc.
The gunicorn documentation says:
Generally we recommend (2 x $num_cores) + 1 as the number of workers
to start off with.
So should I use 5 workers (my node has 2 cores)? Or does it not even matter, since my pod is allocated only 200m of CPU and I should treat it as having 1 core?
TL;DR: How many cores do pods use when their CPU usage is limited by Kubernetes? If I run top inside the pod, I see 2 cores available, but I'm not sure whether my application is using 10% + 10% of the two cores or 20% of one core.
It will be limited to 20% of one core, i.e. 200m. A limit is also only a ceiling: the pod can use at most that much CPU and no more, so its CPU utilization will not always reach the limit.
The total CPU capacity of a cluster is the total number of cores across all nodes in the cluster.
If you have a 2-node cluster where the first node has 2 cores and the second node has 1 core, the K8s CPU capacity will be 3 cores (2 cores + 1 core). If you have a pod which requests 1.5 cores, it will not be scheduled to the second node, as that node has a capacity of only 1 core. It will instead be scheduled to the first node, since it has 2 cores.
CPU is measured in units called millicores. Each node in the cluster introspects the operating system to determine the number of CPU cores on the node and then multiplies that value by 1000 to express its total capacity. For example, if a node has 2 cores, the node's CPU capacity is represented as 2000m. If you wanted to use 1/10 of a single core, you would represent that as 100m.
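So, if you give a pod a limit of 200m, it can use at most 20 percent of one core's worth of CPU time. The limit caps CPU time rather than pinning the pod to a particular core, which is why top inside the pod still shows both cores. Only a pod with a limit above 1000m, say 1500m, can use more than one core's worth of CPU time.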
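To make the arithmetic concrete, here is a rough sketch in Python. It assumes the limit is enforced as a CFS quota over the default 100 ms period, and it rounds the limit up to whole cores before applying gunicorn's (2 x cores) + 1 rule. That rounding is my own choice for illustration, not something the gunicorn documentation prescribes, and `parse_cpu`, `cfs_quota_us` and `worker_count` are hypothetical helper names.

```python
import math

CFS_PERIOD_US = 100_000  # assumed default CFS period: 100 ms


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("2", "0.5", "200m") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cfs_quota_us(limit: str, period_us: int = CFS_PERIOD_US) -> int:
    """CPU time (microseconds) the container may use per period, summed over all cores."""
    return parse_cpu(limit) * period_us // 1000


def worker_count(limit: str) -> int:
    """Apply (2 x cores) + 1 to the CPU limit instead of the node's core count."""
    cores = max(1, math.ceil(parse_cpu(limit) / 1000))
    return 2 * cores + 1


print(cfs_quota_us("200m"))  # 20000 -> 20 ms of CPU time per 100 ms
print(worker_count("200m"))  # 3
print(worker_count("2"))     # 5
```

With a 200m limit, sizing for the 2 cores that top reports would give 5 workers competing for 20 ms of CPU time per 100 ms, so sizing from the limit is probably the safer starting point.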

"Limits" property ignored when deploying a container in a Kubernetes cluster

I am deploying a container in Google Kubernetes Engine with this YAML fragment:
spec:
  containers:
  - name: service
    image: registry/service-go:latest
    resources:
      requests:
        memory: "20Mi"
        cpu: "20m"
      limits:
        memory: "100Mi"
        cpu: "50m"
But it keeps taking 120m. Why is the "limits" property being ignored? Everything else is working correctly. If I request 200m, 200m is reserved, but the limit keeps being ignored.
My Kubernetes version is 1.10.7-gke.1
I only have the default namespace, and this is what I get when executing
kubectl describe namespace default
Name:         default
Labels:       <none>
Annotations:  <none>
Status:       Active
No resource quota.
Resource Limits
 Type       Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---  ---  ---------------  -------------  -----------------------
 Container  cpu       -    -    100m             -              -
Considering Resources Request Only
The Google Cloud console is working correctly; I think you have multiple containers in your pod, and that is why. The value shown above is the sum of the resource requests of the containers in the pod; the YAML file you posted is truncated. You can verify this easily with kubectl.
First, verify the number of containers in your pod.
kubectl describe pod service-85cc4df46d-t6wc9
Then look at the description of the node with kubectl; you should see the same information as the console shows.
kubectl describe node gke-default-pool-abcdefgh...
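As a sketch of how the 120m could come about: the pod's CPU request is the sum over its containers, and a container without a request gets the namespace default (100m in the output above). The second container below is hypothetical; I am only assuming one exists because the posted YAML is truncated.

```python
def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("1", "0.5", "20m") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


# Default request from the LimitRange shown by `kubectl describe namespace default`.
DEFAULT_CPU_REQUEST = "100m"

containers = [
    {"name": "service", "cpu_request": "20m"},  # from the YAML in the question
    {"name": "sidecar", "cpu_request": None},   # hypothetical: no request set
]


def pod_cpu_request(containers, default):
    """Sum the CPU requests of all containers, applying the default where none is set."""
    return sum(parse_cpu(c["cpu_request"] or default) for c in containers)


print(pod_cpu_request(containers, DEFAULT_CPU_REQUEST))  # 120
```

If `kubectl describe pod` shows only one container, this explanation does not apply and the number must come from somewhere else.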
What is the difference between resource requests and limits?
You can imagine your cluster as a big square box. This is the total of your allocatable resources. When you drop a Pod into the big box, Kubernetes checks whether there is empty space for the resources the pod requests (does the small box fit in the big box?). If there is enough space available, it schedules your workload on the selected node.
Resource limits are not taken into account by the scheduler. They are enforced at the kernel level with cgroups. The goal is to prevent workloads from taking all the CPU or memory on the node they are scheduled on.
If your resource requests == resource limits, then workloads cannot escape their "box" and are not able to use spare CPU/memory next to them. In other words, the resources are guaranteed for the pod.
But if the limits are greater than the requests, this is called overcommitting resources. You are betting that the workloads on the same node will not all be fully loaded at the same time (which is generally the case).
I recommend not overcommitting memory: do not let the pod escape the "box" in terms of memory, because it can lead to OOM kills.
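The box picture can be put into numbers with a small Python sketch. The pods and the node size are made up for illustration, and `Pod` and `memory_report` are hypothetical names: requests decide whether the pods fit, while the sum of limits shows how far the node is overcommitted.

```python
from dataclasses import dataclass


@dataclass
class Pod:
    name: str
    mem_request_mi: int  # what the scheduler reserves
    mem_limit_mi: int    # what the kernel enforces


def memory_report(pods, allocatable_mi):
    """Compare summed requests and limits against the node's allocatable memory."""
    requested = sum(p.mem_request_mi for p in pods)
    limited = sum(p.mem_limit_mi for p in pods)
    return {
        "requested_mi": requested,
        "limit_mi": limited,
        "fits": requested <= allocatable_mi,          # the scheduler's view
        "overcommitted": limited > allocatable_mi,    # the bet described above
    }


# Three made-up pods on a made-up node with 512Mi allocatable.
pods = [Pod("a", 128, 256), Pod("b", 128, 256), Pod("c", 128, 256)]
report = memory_report(pods, allocatable_mi=512)
print(report)
# {'requested_mi': 384, 'limit_mi': 768, 'fits': True, 'overcommitted': True}
```

All three pods are scheduled, but if they all grow to their limits at once the node runs out of memory, which is exactly the OOM situation to avoid. Setting each limit equal to its request makes `overcommitted` false whenever `fits` is true.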
You can try logging into the node running your pod and run:
ps -Af | grep docker
You'll see the full command line that kubelet sends to docker. For the memory limit it should contain something like --memory. Note that the request value for memory is only used by the Kubernetes scheduler, to determine whether the node still has room for the pod alongside all the other pods/containers running on it.
For the CPU requests you'll see the --cpu-shares flag. In this case the value is not a hard limit; again, it's a way for the Kubernetes scheduler to avoid allocating containers/pods past that amount when running multiple containers/pods on a specific node. You can learn more about cpu-shares in the Docker and Kubernetes documentation. So in essence, if you don't have enough workloads on the node, the container will always go over its CPU share if it needs to, and that's what you are probably seeing.
Docker has other ways of restricting CPU, such as cpu-period/cpu-quota and cpuset-cpus, but they are not used by Kubernetes as of this writing. In this respect, I believe Mesos does somewhat better when dealing with CPU/memory reservations and quotas, IMO.
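For a feel of the numbers behind --cpu-shares, here is a small Python sketch of the conversion as I understand it: 1 CPU maps to 1024 shares, with a floor of 2. Treat both constants as assumptions and check the flag you actually see in the ps output; `parse_cpu` and `cpu_shares` are hypothetical helper names.

```python
SHARES_PER_CPU = 1024  # assumed: shares corresponding to one full CPU
MIN_SHARES = 2         # assumed: smallest value that gets passed on


def parse_cpu(quantity: str) -> int:
    """Convert a Kubernetes CPU quantity ("1", "0.5", "20m") to millicores."""
    quantity = quantity.strip()
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


def cpu_shares(request: str) -> int:
    """Translate a CPU request into a relative cpu-shares weight."""
    return max(MIN_SHARES, parse_cpu(request) * SHARES_PER_CPU // 1000)


print(cpu_shares("20m"))   # 20
print(cpu_shares("200m"))  # 204
print(cpu_shares("1"))     # 1024
```

Shares are only relative weights that matter when containers compete for CPU, so a container with 20 shares on an otherwise idle node can still use far more than 20m.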
Hope it helps.

How to find out the minimum and maximum usable CPU and memory space left on a kubernetes node

I'm trying to deploy Magento on a GCE n1-standard-1 machine, but I keep getting the following error message.
pod (magento-magento-1486272877-zd34d) failed to fit in any node fit failure summary on nodes : Insufficient cpu (1)
I'm using the official Magento helm chart, and I've configured the values.yml file to contain very low CPU requests: cpu: 25m
When I look at the node details on the kubernetes dashboard, I see that my CPU is already spinning at 0.728 (72.80%) while it's not even doing anything besides the system containers. Also see image below:
Does this mean I have 1 - 0.728 = 0.272m left for container requests? Then why is kubernetes still telling me that it has insufficient CPU when specifying 0.25m?
Thanks for your help.
I didn't see that the CPU limits were 0.248 according to the picture in my post, so I put cpu: 20m and it worked.
There is a nifty kubectl command to get information about your nodes' resource usage...
kubectl top nodes
And pods...
kubectl top pods
Pods with containers
kubectl top pods --containers=true

Node not ready, pods pending

I am running a cluster on GKE and sometimes I get into a hanging state. Right now I was working with just two nodes and allowed the cluster to autoscale. One of the nodes has a NotReady status and simply stays in it. Because of that, half of my pods are Pending, because of insufficient CPU.
How I got there
I deployed a pod which has quite high CPU usage from the moment it starts. When I scaled it to 2, I noticed CPU usage was at 1.0; the moment I scaled the Deployment to 3 replicas, I expected to have the third one in Pending state until the cluster adds another node, then schedule it there.
What happened instead is the node switched to a NotReady status and all pods that were on it are now Pending.
However, the node does not restart or anything - it is just not used by Kubernetes. GKE then thinks that there are enough resources, as the VM has 0 CPU usage, and won't scale up to 3.
I cannot manually SSH into the instance from console - it is stuck in the loading loop.
I can manually delete the instance and then it starts working - but I don't think that's the idea of fully managed.
One thing I noticed - not sure if related: in GCE console, when I look at VM instances, the Ready node is being used by the instance group and the load balancer (which is the service around an nginx entry point), but the NotReady node is only in use by the instance group - not the load balancer.
Furthermore, in kubectl get events, there was a line:
Warning CreatingLoadBalancerFailed {service-controller } Error creating load balancer (will retry): Failed to create load balancer for service default/proxy-service: failed to ensure static IP 104.199.xx.xx: error creating gce static IP address: googleapi: Error 400: Invalid value for field 'resource.address': '104.199.xx.xx'. Specified IP address is already reserved., invalid
I specified loadBalancerIP: 104.199.xx.xx in the definition of the proxy-service to make sure that on each restart the service gets the same (reserved) static IP.
Any ideas on how to prevent this from happening? So that if a node gets stuck in NotReady state it at least restarts - but ideally doesn't get into such state to begin with?
Thanks.
The first thing I would do is define resource requests and limits for those pods.
Requests tell the cluster how much memory and CPU you think the pod is going to use. You set them to help the scheduler find the best location to run those pods.
Limits are crucial here: they are set to prevent your pods from damaging the stability of the nodes. It's better to have a pod killed by the OOM killer than a pod bringing a node down because of resource starvation.
For example, in this case you're saying that you want 200m CPU (20%) for your pod, but if by any chance it tries to go above 300m (30%), it is throttled at that limit; if it exceeds its memory limit, it is killed and a new one is started.
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 100Mi
You can read more here: http://kubernetes.io/docs/admin/limitrange/
I can speak for AWS: you can create dynamic scaling policies based on CPU and memory utilization.
A node goes into the NotReady state because it is out of memory or maybe has insufficient CPU. You can create a custom memory metric that collects memory usage from all the worker nodes in the cluster and pushes it to CloudWatch.
You can follow this documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
The CPU metric is already there, so there is no need to create it; this only adds a memory metric for your cluster.
You can now create an alarm that fires when the metric goes above a certain threshold. Then go to the Auto Scaling group in the AWS console and add a scaling policy that uses the alarm you created and adds the number of instances you need.