Using Horizontal Pod Autoscaling along with resource requests and limits - kubernetes

Say we have the following deployment:
apiVersion: apps/v1
kind: Deployment
metadata:
...
spec:
  replicas: 2
  template:
    spec:
      containers:
      - image: ...
        ...
        resources:
          requests:
            cpu: 100m
            memory: 50Mi
          limits:
            cpu: 500m
            memory: 300Mi
And we also create a HorizontalPodAutoscaler object which automatically scales the number of pods up/down based on average CPU utilization. I know that the HPA will compute the number of pods based on the resource requests, but what if I want the containers to be able to use more resources before scaling horizontally?
I have two questions:
1) Are resource limits even used by K8s when a HPA is defined?
2) Can I tell the HPA to scale based on resource limits rather than requests? Or as a means of implementing such a control, can I set the targetUtilization value to be more than 100%?

No, the HPA does not look at limits at all. You can set the target utilization to any value, even higher than 100%.
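For example, here is a minimal sketch (the object names are illustrative, not taken from the question) of an autoscaling/v2 HorizontalPodAutoscaler whose target utilization is deliberately set above 100% of the requested CPU:

apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app-hpa            # illustrative name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app              # illustrative name of the Deployment to scale
  minReplicas: 2
  maxReplicas: 10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 150   # 150% of the request, i.e. ~150m average usage per pod for a 100m request

Because utilization is always computed against the request (100m in the Deployment above), a target of 150% corresponds to roughly 150m of actual CPU per pod before scaling out, which is still below the 500m limit.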

Hi, in a Deployment we have resource requests and limits. As per the documentation here, those parameters act before the HPA takes on its main role as autoscaler:
When you create a Pod, the Kubernetes scheduler selects a node for the Pod to run on. Each node has a maximum capacity for each of the resource types: the amount of CPU and memory it can provide for Pods.
When the kubelet starts a Container of a Pod, it passes the CPU and memory limits to the container runtime.
If a Container exceeds its memory limit, it might be terminated. If it is restartable, the kubelet will restart it, as with any other type of runtime failure.
If a Container exceeds its memory request, it is likely that its Pod will be evicted whenever the node runs out of memory.
On the other hand:
The Horizontal Pod Autoscaler is implemented as a control loop, with a period controlled by the controller manager's --horizontal-pod-autoscaler-sync-period flag (with a default value of 15 seconds).
The controller manager queries the resource utilization against the metrics specified in each HorizontalPodAutoscaler definition.
Note:
Please note that if some of the pod’s containers do not have the relevant resource request set, CPU utilization for the pod will not be defined and the autoscaler will not take any action for that metric.
Hope this helps.

Related

Trying to understand the meaning of averageUtilization in Kubernetes autoscaling

The docs says:
For per-pod resource metrics (like CPU), the controller fetches the metrics from the resource metrics API for each Pod targeted by the HorizontalPodAutoscaler. Then, if a target utilization value is set, the controller calculates the utilization value as a percentage of the equivalent resource request on the containers in each Pod. If a target raw value is set, the raw metric values are used directly. The controller then takes the mean of the utilization or the raw value (depending on the type of target specified) across all targeted Pods, and produces a ratio used to scale the number of desired replicas.
Assume I have a Pod with:
resources:
  limits:
    cpu: "0.3"
    memory: 500M
  requests:
    cpu: "0.01"
    memory: 40M
and now I have an autoscaling definition as:
type: Resource
resource:
  name: cpu
  target:
    type: Utilization
    averageUtilization: 60
Which according to the docs:
With this metric the HPA controller will keep the average utilization of the pods in the scaling target at 60%. Utilization is the ratio between the current usage of resource to the requested resources of the pod
So, I'm not understanding something here. If request is the minimum resources required to run the app, how would scaling be based on this value? 60% of 0.01 is nothing, and the service would be constantly scaling.
Your misunderstanding might be that the value of request is not necessarily the minimum your app needs to run.
It is what you (the developer, admin, DevOps) request from the Kubernetes cluster for a pod of your application to run, and it helps the scheduler pick the right node for your workload (say, one that has sufficient resources available). So don't pick this value too small or too high.
Apart from that, autoscaling works as you described it. In this case, the cluster calculates how much of your requested CPU is used and will scale out when more than 60% is in use. Keep in mind that Kubernetes does not look at every single pod but at the average across all pods in that group.
For example, given two pods running, one pod could run on 100% of requests and the other one at (almost) 0%. The average would be around 50% so no autoscaling happens in the case of the Horizontal Pod Autoscaler.
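As a rough worked example with the numbers from above, the HPA documentation gives the replica calculation as desiredReplicas = ceil[currentReplicas * (currentMetricValue / desiredMetricValue)]:

currentAverageUtilization = (100% + 0%) / 2 = 50%
desiredReplicas = ceil[2 * (50 / 60)] = ceil[1.67] = 2    (no scaling)
if both pods ran at 100%: ceil[2 * (100 / 60)] = ceil[3.33] = 4    (scale out to 4 replicas)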
In production, I personally try to make an educated guess at the right values, then look at the metrics and adjust them to my real-world workload. Prometheus is your friend, or at least the metrics server:
https://github.com/prometheus-operator/kube-prometheus
https://github.com/kubernetes-sigs/metrics-server

Kubernetes: cpu request and total resources doubts

To better explain my doubts, I will give an example.
Example:
We have one worker node with 3 allocatable cpus and kubernetes has scheduled three pods on it:
pod_1 with 500m cpu request
pod_2 with 700m cpu request
pod_3 with 300m cpu request
On this worker node I can't schedule any other pods.
But if I check the real usage:
pod_1: cpu usage is 300m
pod_2: cpu usage is 600m
My question is:
Can pod_3 have a real usage of 500m, or will the requests of the other pods limit its cpu usage?
Thanks
Pietro
It doesn't matter what the real usage is - the "request" means how much of the resources is guaranteed to be available for the pod. Your workload might be using only a fraction of the requested resources - but what really counts is the "request" itself.
Example - Let's say you have a node with 1 CPU core.
Pod A - 100m Request
Pod B - 200m Request
Pod C - 700m Request
Now, no more pods can be scheduled on the node - because the whole 1 CPU is already requested by the 3 pods. It doesn't really matter what fraction of the requested resources each pod is actually using at any given time.
Another point worth noting is the "Limit". The requested resource usage can be surpassed by a workload - but it cannot surpass the "Limit". This is a very important mechanism to understand.
Kubernetes will schedule the pods based on the requests that you configure for the container(s) of the pod (via the spec of the respective Deployment or other kinds).
Here's an example:
For simplicity, let's assume only one container for the pod.
containers:
- name: "foobar"
  resources:
    requests:
      cpu: "300m"
      memory: "256Mi"
    limits:
      cpu: "500m"
      memory: "512Mi"
If you ask for 300 millicpus as your request, Kubernetes will place the pod on a node that has at least 300 millicpus allocatable to that pod. If a node has less allocatable CPU available, the pod will not be placed on that node. Similarly, you can set a value for the memory request.
The limit caps the resources the container may use. In the example above, the container will be terminated (OOMKilled) and restarted by the kubelet if it ends up using more than 512MiB of memory. Scheduling, on the other hand, is driven only by the requests: if no node has at least 300 millicpus allocatable, the pod will remain in Pending state with FailedScheduling as the reason until a node with sufficient capacity is available.
Do note that the resource request is considered only at the time of pod scheduling, not at runtime (meaning the actual consumption of resources will not trigger a re-scheduling of the pod, even if the container uses more resources than it requested, as long as it stays below the limit, if one is specified).
https://kubernetes.io/docs/concepts/configuration/manage-resources-containers/#how-pods-with-resource-requests-are-scheduled
So, in summary:
The total of all your requests is what counts as allocated, regardless of the actual runtime utilization of your pods (as long as the limits are not crossed).
You can request 300 millicpus but only use 100 millicpus, or even 400 millicpus; Kubernetes will still show the "allocated" value as 300.
If your container crosses its memory limit, it will be OOMKilled; if it crosses its CPU limit, it will be throttled.

AutoScaling work loads without running out of memory

I have a number of pods running and a Horizontal Pod Autoscaler assigned to target them; the cluster I am using can also add and remove nodes automatically based on the current load.
BUT we recently had the cluster go offline with OOM errors and this caused a disruption in service.
Is there a way to monitor the load on each node, so that if usage reaches, say, 80% of the memory on a node, Kubernetes will not schedule more pods on that node but wait for another node to come online?
Pending pods are what one should monitor, and defining resource requests is what affects scheduling.
The Scheduler uses resource request information when scheduling a pod to a node. Each node has a certain amount of CPU and memory it can allocate to pods. When scheduling a pod, the Scheduler will only consider nodes with enough unallocated resources to meet the pod's resource requirements. If the amount of unallocated CPU or memory is less than what the pod requests, Kubernetes will not schedule the pod to that node, because the node can't provide the minimum amount required by the pod. The new pods will remain in Pending state until new nodes come into the cluster.
Example:
apiVersion: v1
kind: Pod
metadata:
  name: requests-pod
spec:
  containers:
  - image: busybox
    command: ["dd", "if=/dev/zero", "of=/dev/null"]
    name: main
    resources:
      requests:
        cpu: 200m
        memory: 10Mi
When you don't specify a request for CPU, you're saying you don't care how much CPU time the process running in your container is allotted. In the worst case, it may not get any CPU time at all (this happens when a heavy demand by other processes exists on the CPU). Although this may be fine for low-priority batch jobs, which aren't time-critical, it obviously isn't appropriate for containers handling user requests.
Short answer: add resource requests but don't add CPU limits; otherwise you will run into CPU throttling.
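A minimal sketch of that advice applied to the example pod above (the memory limit and its value are my own assumption, added only to protect the node from a leaking container; there is no CPU limit, so the container is never CFS-throttled):

resources:
  requests:
    cpu: 200m
    memory: 10Mi
  limits:
    memory: 50Mi       # assumed value; caps a memory leak without affecting CPU
  # no cpu limit: the container can burst above 200m when the node has spare CPU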

"Limits" property ignored when deploying a container in a Kubernetes cluster

I am deploying a container in Google Kubernetes Engine with this YAML fragment:
spec:
  containers:
  - name: service
    image: registry/service-go:latest
    resources:
      requests:
        memory: "20Mi"
        cpu: "20m"
      limits:
        memory: "100Mi"
        cpu: "50m"
But it keeps taking 120m. Why is the "limits" property being ignored? Everything else is working correctly. If I request 200m, 200m are reserved, but the limit keeps being ignored.
My Kubernetes version is 1.10.7-gke.1
I only have the default namespace, and this is what I get when executing:
kubectl describe namespace default
Name:         default
Labels:       <none>
Annotations:  <none>
Status:       Active

No resource quota.

Resource Limits
 Type       Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---  ---  ---------------  -------------  -----------------------
 Container  cpu       -    -    100m             -              -
Considering Resources Request Only
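For reference, the "Default Request 100m" row in the namespace description above typically comes from a LimitRange object in the namespace (GKE creates one by default); a minimal sketch of a LimitRange that would produce that row (the object name is illustrative):

apiVersion: v1
kind: LimitRange
metadata:
  name: limits              # illustrative name
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m             # applied to any container that does not set its own CPU request

If the pod has a second container without an explicit CPU request, that container is assigned 100m by this default, which together with the 20m declared above could explain the 120m shown in the console.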
The Google Cloud console works well; I think you have multiple containers in your pod, and this is why. The value shown above is the sum of the resource requests declared in your (truncated) YAML file. You can verify this easily with kubectl.
First, verify the number of containers in your pod:
kubectl describe pod service-85cc4df46d-t6wc9
Then look at the description of the node via kubectl; you should see the same information as the console shows:
kubectl describe node gke-default-pool-abcdefgh...
What is the difference between a resource request and a limit?
You can imagine your cluster as a big square box. This is the total of your allocatable resources. When you drop a Pod into the big box, Kubernetes checks whether there is empty space for the requested resources of the pod (does the small box fit into the big box?). If there is enough space available, it will schedule your workload on the selected node.
Resource limits are not taken into account by the scheduler. They are enforced at the kernel level with cgroups. Their goal is to prevent workloads from taking all the CPU or memory of the node they are scheduled on.
If your resource requests == resource limits, then workloads cannot escape their "box" and are not able to use available CPU/memory next to them. In other terms, the resources are guaranteed for the pod.
But if the limits are greater than the requests, this is called overcommitting resources. You bet that all the workloads on the same node are not fully loaded at the same time (which is generally the case).
I recommend not overcommitting the memory resource: do not let the pod escape the "box" in terms of memory, as it can lead to OOMKilling.
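A minimal sketch of that recommendation (the values are illustrative): the memory request equals the memory limit, so memory is never overcommitted, while CPU is allowed to burst above its request up to its limit:

resources:
  requests:
    cpu: "20m"
    memory: "100Mi"
  limits:
    cpu: "50m"          # CPU overcommit is tolerable: excess usage is throttled, not killed
    memory: "100Mi"     # equal to the request: no memory overcommit, lower OOMKill risk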
You can try logging into the node running your pod and run:
ps -Af | grep docker
You'll see the full command line that the kubelet sends to Docker. The memory limit is represented by something like --memory. Note that the request value for memory is only used by the Kubernetes scheduler, to determine whether the node's capacity would be exceeded by all the pods/containers running on it.
The CPU request is represented by the --cpu-shares flag. In this case it is not a hard limit; again, it's a way for the Kubernetes scheduler not to allocate containers/pods past that value when placing multiple containers/pods on a specific node. You can learn more about cpu-shares here and from the Kubernetes side here. So in essence, if there aren't enough other workloads on the node, a container can always go over its CPU share if it needs to, and that's probably what you are seeing.
Docker has other ways of restricting CPU, such as cpu-period/cpu-quota and cpuset-cpus, but these were not used by Kubernetes as of this writing. In this respect, I believe Mesos handles CPU/memory reservations and quotas somewhat better, imo.
Hope it helps.

How can I find out that a pod was recreated because it exceeded its memory limit?

If a pod exceeds its memory limit, which is defined as follows:
resources:
  limits:
    memory: 80Gi
    cpu: 10
Kubernetes will recreate the pod, but how can I find out that the pod was recreated because it exceeded its memory limit?
Are there any logs that record this situation?
The simplest way is to use Heapster for monitoring cluster resource usage.
Using a Grafana setup with InfluxDB as the storage backend for Heapster gives you the CPU and memory usage of the entire cluster, of individual pods and of containers.
When a Pod gets restarted due to reaching its memory limit, you should see a sawtooth wave on the memory graph for this pod.
More useful information about monitoring tools and how to set it up can be found here.
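Besides graphs, the reason is also recorded directly in the pod's status: after the restart, kubectl describe pod <pod-name> (or kubectl get pod <pod-name> -o yaml) shows the last terminated state of the container, roughly like this sketch (names are illustrative):

status:
  containerStatuses:
  - name: main                    # illustrative container name
    restartCount: 1
    lastState:
      terminated:
        reason: OOMKilled         # recorded when the container was killed for exceeding its memory limit
        exitCode: 137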