I have the following questions regarding request/limit quota for ns:
Considering the following namespace resource setup:
- request: 1 core/1GiB
- limit: 2 core/2GiB
Does this mean a namespace is guaranteed to have 1 core/1GiB? How is that achieved physically on the cluster nodes? Does it mean k8s somehow strictly reserves these values for a namespace (at the time it's created)? At which point in time does the reservation take place?
Limit 2 core/2GiB - does it mean it's not guaranteed for a namespace and depends on the cluster's current state? For example, if the cluster currently has only 100MiB of free RAM available, but at runtime a pod needs 200MiB more above its resource request - will the pod be restarted? Where does k8s take this resource from if a pod needs to go above its request?
Regarding namespace granularity and k8s horizontal autoscaling: consider we have 2 applications and 2 namespaces - one namespace per app. We set both namespace quotas so that there's a free buffer for 2 extra pods, and configure horizontal autoscaling up to 2 pods with a certain CPU threshold. Is there really a point in such a setup? My concern is that if a namespace reserves its resources and no other namespace can utilize them, we could just create 2 extra pods in each namespace's replica set with no autoscaling and use these pods constantly. I can see a point in autoscaling if we have more than one application in one namespace, so that these apps could share the same resource buffer for scaling. Is this assumption correct?
Do you think it is good practice to have one namespace per app? Why?
P.S. I know what resource requests/limits are and the difference between them; most sources only give a very high-level explanation of the concept.
Thanks in advance.
The docs clearly state the following:
In the case where the total capacity of the cluster is less than the sum of the quotas of the namespaces, there may be contention for resources. This is handled on a first-come-first-served basis.
and
ResourceQuotas are independent of the cluster capacity. They are expressed in absolute units. So, if you add nodes to your cluster, this does not automatically give each namespace the ability to consume more resources.
and
resource quota divides up aggregate cluster resources, but it creates no restrictions around nodes: pods from several namespaces may run on the same node
A ResourceQuota is a constraint set on the namespace and does not reserve capacity; it just sets a limit on the resources that can be consumed by each namespace.
To effectively "reserve" the capacity, you have to set the restrictions on all namespaces, so that other namespaces do not use more resources than your cluster can provide. This way you have stronger guarantees that a namespace will have capacity available to run its load.
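For example, a minimal sketch of a ResourceQuota matching the 1 core/1GiB request and 2 core/2GiB limit from the question could look like this (the name and namespace are placeholders):

```yaml
apiVersion: v1
kind: ResourceQuota
metadata:
  name: app-a-quota
  namespace: app-a          # placeholder namespace
spec:
  hard:
    requests.cpu: "1"       # sum of all pod CPU requests in the namespace
    requests.memory: 1Gi    # sum of all pod memory requests
    limits.cpu: "2"         # sum of all pod CPU limits
    limits.memory: 2Gi      # sum of all pod memory limits
```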
The docs suggest:
Proportionally divide total cluster resources among several teams(namespaces).
Allow each team to grow resource usage as needed, but have a generous limit to prevent accidental resource exhaustion.
Detect demand from one namespace, add nodes, and increase quota.
Given that, the answers to your questions are:
It is not reserved capacity; the reservation happens on resource (pod) creation.
Running resources are not affected after reservation. New resources are rejected if creating them would overcommit the quota (limits).
As stated in the docs, if the limits are higher than the capacity, reservation happens on a first-come-first-served basis.
This could be its own question on SO; in simple terms, it's for resource isolation and management.
I have a bunch of pods in a cluster that together request almost all (7.35/8) of the available CPU resources on a node:
even though their actual total usage is almost nothing (0.34/8).
The pod that currently requests the most asks for only 210m, which I guess is not an outrageous amount - and I would like to enforce some sensible minimum request size for all pods in the cluster anyway. Of course that accumulates when there are lots of pods.
It seems I could easily scale down the request by a factor of 10 and leave the limits where they are to begin with.
But is there something else that I should look into instead before doing that - reducing replica count etc.?
Also it looks a bit strange that the pods are not more evenly distributed between the nodes.
Your request values seem overestimated.
You need time and metrics to find the right request/limit for your workload.
Keep in mind that if you change those values, your pods will restart.
Also, it's normal to find some unbalanced nodes in your cluster. Kubernetes will never remove a running pod to rebalance if you don't ask it to.
For example, if you create a cluster with 3 nodes, fill those 3 nodes with pods and then add another 3 nodes, the new nodes will stay empty.
You can set up a HorizontalPodAutoscaler on your cluster to adapt the number of pods to your workload.
That way, your workload will spread among nodes with a correct balance (if you use the default scheduling policy).
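As a minimal sketch (the Deployment name, replica counts and threshold are placeholders, and a metrics server is assumed to be running), a HorizontalPodAutoscaler could look like:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  minReplicas: 2
  maxReplicas: 6
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70   # percentage of the pods' CPU *request*, so the request value matters here too
```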
I suggest the following:
Resource allocation: Based on historical values, set your request to a meaningful value with a buffer. Also, to get guaranteed pod resource allocation (the Guaranteed QoS class), it may be a good idea to set request and limit to the same value, but that means your pod cannot burst for extra resources. One more thing to note: scheduling only happens based on the requested value, so if the node has no resources left and your pod tries to burst from its request towards its limit, it may be throttled (CPU) or killed and rescheduled (memory).
Resource quotas: Check Kubernetes ResourceQuotas to set sensible namespace-level quotas and keep over-provisioned resources from developers under control.
Affinity/anti-affinity: Check the concept of anti-affinity to have your replicas or different pods scheduled across your cluster. You can ensure, for example, that one host or availability zone has only one replica of your pod (helps with HA), or spread different pods to different nodes (layered scheduling, etc.) - check this video. A sketch combining the first and last points follows this list.
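As a sketch of those two points (all names, images and sizes are made up): a Deployment whose containers have request equal to limit (Guaranteed QoS) and a required anti-affinity rule that keeps replicas on separate nodes:

```yaml
apiVersion: apps/v1
kind: Deployment
metadata:
  name: web
spec:
  replicas: 3
  selector:
    matchLabels:
      app: web
  template:
    metadata:
      labels:
        app: web
    spec:
      affinity:
        podAntiAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
          - labelSelector:
              matchLabels:
                app: web
            topologyKey: kubernetes.io/hostname   # at most one replica per node; use a zone label for per-AZ spreading
      containers:
      - name: web
        image: nginx:1.25
        resources:
          requests:            # request == limit -> Guaranteed QoS class
            cpu: 500m
            memory: 512Mi
          limits:
            cpu: 500m
            memory: 512Mi
```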
There are good answers already but I would like to add some more info.
It is very important to have a good strategy when calculating how many resources you will need for each container.
Optimally, your pods should be using exactly the amount of resources you requested but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin up and down the request value as a good starting point. Regarding limits, achieving a good setting would depend on trying and adjusting. There is no optimal value that would fit everyone as it depends on many factors related to the application itself, the demand model, the tolerance to errors etc.
Kubernetes best practices: Resource requests and limits is a very good guide explaining the idea behind these mechanisms with a detailed explanation and examples.
Also, Managing Resources for Containers will provide you with the official docs regarding:
Requests and limits
Resource types
Resource requests and limits of Pod and Container
Resource units in Kubernetes
How Pods with resource requests are scheduled
How Pods with resource limits are run, etc
Just in case you'll need a reference.
I did a performance test on my application.
The CPU usage was 30% while testing - the test server has 4 cores,
so I thought one core had 30/4 = 7.5% usage.
I want a container in k8s to keep the CPU usage under 30%,
so I decided that the CPU limit should be 250m + 50m (extra).
I am wondering if this approach is right? Otherwise, is there a better way to decide the CPU limit?
To answer your question, "I am wondering if this way is right?" - yes, that's the right way. You can also use a tool called goldilocks, a Kubernetes controller that collects data about running pods and provides recommendations on how to set resource requests and limits. It can help you identify a starting point for resource requests and limits.
I'm not sure if your calculations are correct, as CPU resources are defined in millicores. If your container needs two full cores to run, you would put the value “2000m”. If your container only needs ¼ of a core, you would put a value of “250m”.
So if you want 30% of 1 of your cores then 300m is the correct amount. But if you want 30% of your 4 cores, then I would say 1200m is the correct amount.
There is kubernetes documentation about that.
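To make the arithmetic concrete, the resources block of the container spec could look like this (which interpretation you want - one core or the whole node - is up to you):

```yaml
resources:
  requests:
    cpu: 250m     # the value from the question
  limits:
    cpu: 300m     # 30% of a single core; use 1200m to allow 30% of all 4 cores
```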
You could also consider using Vertical Pod Autoscaler.
Vertical Pod Autoscaler (VPA) frees the users from necessity of setting up-to-date resource limits and requests for the containers in their pods. When configured, it will set the requests automatically based on usage and thus allow proper scheduling onto nodes so that appropriate resource amount is available for each pod. It will also maintain ratios between limits and requests that were specified in initial containers configuration.
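A minimal sketch, assuming the VPA components are installed in the cluster (the target name is a placeholder):

```yaml
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app
spec:
  targetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Off"   # only produce recommendations; "Auto" lets VPA apply them
```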
I would also recommend reading the tutorials below about limits and requests:
Setting the right requests and limits in Kubernetes
Kubernetes best practices: Resource requests and limits
I would like to define a policy that dynamically assigns resource limits to pods and containers. For example, if there are 4 pods scheduled on a specific node and the memory capacity is 100Mi, each pod would be assigned a 25Mi memory limit. In other words, a fair share of the node's capacity.
So, is it necessary to change the code in scheduler.go, or do I need to change other objects as well?
I agree with Arslanbekov's answer: it's contrary to the scalability ideology used by Kubernetes.
The principle is that you define what resources your application needs, and the cluster does all it can to give those resources to the pod, scaling the resources (pods, nodes) depending on the global consumption of all apps.
What you are asking is the reverse: give resources to the pod depending on the node's resources. That would make automatic scalability of the nodes very difficult, because full node utilization would become the target to aim for (I may be confusing in my explanation, but that shows how difficult it could be).
One way to do what you want would be to size all your pods the same so that they use 80% of the nodes, but this would prove wrong if an app needs more resources.
I think this is contrary to the ideology of Kubernetes. With this approach, a new application would not be able to get onto a node.
At each point in time, from the scheduler's perspective, every node would already be at 100% utilization.
LimitRange allows us to configure these properties per resource (memory, CPU):
Limit: default maximum amount of the resource that will be provisioned.
Request: default initial amount of the resource that will be provisioned.
However, I just realized there are two other options, min and max. Since min/max seem to overlap with request/limit, what is the difference between all these properties?
I found the answer by digging through the docs. The Limit and Request params are defaults that can be overridden by the pod configuration, whereas Min and Max enforce the values configured in the LimitRange:
Motivation for minimum and maximum memory constraints
As a cluster administrator, you might want to impose restrictions on the amount of memory that Pods can use. For example:
- Each Node in a cluster has 2 GB of memory. You do not want to accept any Pod that requests more than 2 GB of memory, because no Node in the cluster can support the request.
- A cluster is shared by your production and development departments. You want to allow production workloads to consume up to 8 GB of memory, but you want development workloads to be limited to 512 MB. You create separate namespaces for production and development, and you apply memory constraints to each namespace.
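As an illustrative sketch (names and values are made up), a LimitRange combining defaults with min/max enforcement could be:

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: mem-limit-range
  namespace: development    # placeholder namespace
spec:
  limits:
  - type: Container
    min:
      memory: 128Mi         # containers requesting less than this are rejected
    max:
      memory: 512Mi         # containers requesting/limiting more than this are rejected
    defaultRequest:
      memory: 256Mi         # applied when a container specifies no request
    default:
      memory: 384Mi         # applied when a container specifies no limit
```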
When I resize a replication controller using kubectl, if the cluster does not have enough resources, one or more pods will stay in Pending forever.
Is there any tool that will automatically resize the GKE cluster when resources are running out?
I had a similar requirement (for the Go build system): I wanted to know when the ratio of scheduled to available CPU or memory was > 1, and scale out nodes when that was true (or, more accurately, when it was ~0.8). There's no built-in metric, but as you suggest you can do it with a custom metric.
This was all done in Go, but it will give you the basic idea:
Create the metrics (memory and CPU, in my case)
Put values to the metrics
The key takeaway IMO is that you have to iterate over each pod in the cluster to determine how much capacity is consumed, then iterate over each node in the cluster to determine how much capacity is available. It's then just a matter of pointing your autoscaler to the custom metric(s).
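The original setup fed the metric to the instance-group autoscaler, but purely as an illustration of wiring an autoscaler to a custom metric, a pod-level analogue using an external metric could look like the sketch below. The metric name and target are invented, and a custom-metrics adapter exposing the metric would have to be installed:

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: capacity-based
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: worker                             # placeholder workload
  minReplicas: 1
  maxReplicas: 20
  metrics:
  - type: External
    external:
      metric:
        name: scheduled_vs_allocatable_cpu   # invented metric name
      target:
        type: Value
        value: 800m                          # i.e. scale out as the ratio approaches ~0.8
```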
One big thing worth noting: I ultimately determined that scaling on the built-in CPU utilization metric was just as good as (if not better than, but more on that in a bit) the custom metric. Each pod we scheduled pegged the CPU, so when pods were maxed out, so was CPU. The built-in CPU utilization metric is probably better because you don't have the latency that comes with periodically putting custom metrics.
You can turn on autoscaling for the Instance Group that your GKE nodes belong to.