AutoScaling work loads without running out of memory - kubernetes

I have a number of pods running, with a Horizontal Pod Autoscaler targeting them. The cluster I am using can also add and remove nodes automatically based on current load.
BUT we recently had the cluster go offline with OOM errors, and this caused a disruption in service.
Is there a way to monitor the load on each node so that, if usage reaches say 80% of the memory on a node, Kubernetes does not schedule more pods on that node but waits for another node to come online?

Monitor the pending pods, and define resource requests, since requests are what affect scheduling.
The Scheduler uses resource request information when scheduling the pod
to a node. Each node has a certain amount of CPU and memory it can allocate to
pods. When scheduling a pod, the Scheduler will only consider nodes with enough
unallocated resources to meet the pod’s resource requirements. If the amount of
unallocated CPU or memory is less than what the pod requests, Kubernetes will not
schedule the pod to that node, because the node can’t provide the minimum amount
required by the pod. The new Pods will remain in Pending state until new nodes come into the cluster.
Example:
apiVersion: v1
kind: Pod
metadata:
  name: requests-pod
spec:
  containers:
  - image: busybox
    command: ["dd", "if=/dev/zero", "of=/dev/null"]
    name: main
    resources:
      requests:
        cpu: 200m
        memory: 10Mi
When you don’t specify a request for CPU, you’re saying you don’t care how much
CPU time the process running in your container is allotted. In the worst case, it may
not get any CPU time at all (this happens when a heavy demand by other processes
exists on the CPU). Although this may be fine for low-priority batch jobs, which aren’t
time-critical, it obviously isn’t appropriate for containers handling user requests.

Short answer: add resource requests but don't add limits. Otherwise, you will run into throttling issues.

Related

K8s memory request handling for 2 and more pods

I am trying to understand memory requests in k8s. I have observed that when I set the memory request for a pod, e.g. nginx, to 1Gi, it actually consumes only 1Mi (I checked with kubectl top pods). My question: I have 2Gi of RAM on a node and set the memory requests for pod1 and pod2 to 1.5Gi each, but they actually consume only 1Mi of memory. I start pod1 and it should start, because the node has 2Gi of memory and pod1 requests only 1.5Gi. But what happens If I try to start pod2 after that? Would it start? I am not sure, because pod1 consumes only 1Mi of memory but has a request for 1.5Gi. Does the memory request of pod1 influence the scheduling of pod2? How k8s will rule this situation?
A memory request is the amount of memory that Kubernetes reserves for a pod. If a pod requests some amount of memory, there is a strong guarantee that it will get it. This is why you can't run pod1 and pod2, each with a 1.5Gi request, on a 2Gi node: if Kubernetes allowed it and both pods started using that memory, it would not be able to satisfy the requests, and this is unacceptable.
This is why the sum of the requests of all pods running on a specific node cannot exceed that node's allocatable memory.
"But what happens If I try to start pod2 after that? [...] How k8s
will rule this situation?"
If you have only one node with 2Gi of memory, then pod2 won't start. You would see that this pod is in the Pending state, waiting for resources. If you have spare resources on a different node, Kubernetes will schedule pod2 to that node.
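As a minimal sketch of the situation in the question, the two manifests below each request 1.5Gi of memory. The pod names, the container name and the nginx image are placeholders, and the outcome described assumes a single node with roughly 2Gi of allocatable memory.

```yaml
# pod1: can be scheduled, because 1.5Gi of the node's memory is still unrequested.
apiVersion: v1
kind: Pod
metadata:
  name: pod1              # hypothetical name
spec:
  containers:
  - name: main
    image: nginx          # placeholder image
    resources:
      requests:
        memory: 1.5Gi     # reserved for scheduling purposes, even if only ~1Mi is used
---
# pod2: same request; only about 0.5Gi is left unrequested on the node,
# so the scheduler has nowhere to place it.
apiVersion: v1
kind: Pod
metadata:
  name: pod2              # hypothetical name
spec:
  containers:
  - name: main
    image: nginx          # placeholder image
    resources:
      requests:
        memory: 1.5Gi
```

With both applied, kubectl get pods would be expected to show pod2 as Pending, and kubectl describe pod pod2 should list a FailedScheduling event that mentions insufficient memory.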
Let me know if something is not clear and needs more explanation.
A request is the amount of a resource reserved for a container; a limit is the maximum the container is allowed to use. If you try to start two pods with 1.5Gi requests on a machine with 2Gi, the second one will not start because the memory it needs to reserve is not available. You need to set the requests lower, to the average expected consumption of the pod, and set some reasonable limit (maximum allowed memory). It's better to get familiar with these concepts.
In Kubernetes you decide on Pod/Container memory using two parameters:
spec.containers[].resources.requests.memory: the Kubernetes scheduler will not schedule your Pod on a node that does not have this much unreserved memory; this memory is also reserved for your container
spec.containers[].resources.limits.memory: the container cannot exceed this amount of memory
If you want to be precise about the memory for your container, then you'd better set the same value for both parameters.
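A small sketch of what setting both parameters to the same value could look like. The pod name, the image and the 256Mi figure are illustrative assumptions, not values from the question.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: fixed-memory-pod   # hypothetical name
spec:
  containers:
  - name: main
    image: nginx           # placeholder image
    resources:
      requests:
        memory: 256Mi      # what the scheduler reserves on the node
      limits:
        memory: 256Mi      # hard cap, deliberately equal to the request
```

With this shape the container can never use more memory than was reserved for it at scheduling time, at the cost of not being able to borrow spare memory from the node.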
This is a very good article explaining by example. And here's the official doc.

kubernetes node shows limit, but no limit set

I have not configured any LimitRange or pod limits,
but my nodes show requests and limits. Is that a limit, or the max-seen value?
I have around 20 active nodes, all of the same hardware size, but each node shows a different limit with kubectl describe node nodeXX.
Does that mean I cannot use more than the limit?
If you check the result of kubectl describe node nodeXX again more carefully you can see that each pod has the columns: CPU Requests, CPU Limits, Memory Requests and Memory Limits. The total Requests and Limits as shown in your screenshot should be the sum of your pods requests and limits.
If you haven't configured limits for your pods then they will have 0%. However I can see in your screenshot that you have a node-exporter pod on your node. You probably also have pods in the kube-system namespace that you haven't scheduled yourself but are essential for kubernetes to work.
About your question:
does that mean i cannot use more than the limit
This article is great at explaining about requests and limits:
Requests are what the container is guaranteed to get. If a container
requests a resource, Kubernetes will only schedule it on a node that
can give it that resource.
Limits, on the other hand, make sure a container never goes above a
certain value. The container is only allowed to go up to the limit,
and then it is restricted.
For example: if your pod requests 1000Mi of memory and your node only has 500Mi of unrequested memory left, the pod will not be scheduled on that node. If your pod requests 300Mi and has a limit of 1000Mi, it will be scheduled, and Kubernetes will not let it use more than 1000Mi of memory; a container that exceeds its memory limit is killed.
The total of the limits on a node may exceed 100%; this can be acceptable, especially in development environments, where we trade performance for capacity.

"Limits" property ignored when deploying a container in a Kubernetes cluster

I am deploying a container in Google Kubernetes Engine with this YAML fragment:
spec:
  containers:
  - name: service
    image: registry/service-go:latest
    resources:
      requests:
        memory: "20Mi"
        cpu: "20m"
      limits:
        memory: "100Mi"
        cpu: "50m"
But it keeps taking 120m. Why is "limits" property being ignored? Everything else is working correctly. If I request 200m, 200m are being reserved, but limit keeps being ignored.
My Kubernetes version is 1.10.7-gke.1
I only have the default namespace and when executing
kubectl describe namespace default
Name:         default
Labels:       <none>
Annotations:  <none>
Status:       Active

No resource quota.

Resource Limits
 Type       Resource  Min  Max  Default Request  Default Limit  Max Limit/Request Ratio
 ----       --------  ---  ---  ---------------  -------------  -----------------------
 Container  cpu       -    -    100m             -              -
Considering Resources Request Only
The Google Cloud console is working correctly; I think you have multiple containers in your pod, and that is why. The value shown is the sum of the resource requests of the containers in the pod, and your truncated YAML file shows only one of them. You can verify this easily with kubectl.
First verify the number of containers in your pod.
kubectl describe pod service-85cc4df46d-t6wc9
Then look at the description of the node via kubectl; you should see the same information as the console shows.
kubectl describe node gke-default-pool-abcdefgh...
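The kubectl describe namespace output in the question lists a default CPU request of 100m for containers. A LimitRange along the following lines would produce that row; the object name is an assumption, since the output does not show it. Under such a LimitRange, a container that declares neither a CPU request nor a CPU limit is given a 100m request, so a pod holding the 20m container from the question plus one such container would add up to 120m, which is consistent with the multiple-container explanation.

```yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: limits             # hypothetical name
  namespace: default
spec:
  limits:
  - type: Container
    defaultRequest:
      cpu: 100m            # applied to containers that set no CPU request or limit
```

kubectl get limitrange -n default -o yaml should show whatever object is actually present in the cluster.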
What is the difference between resources request and limit ?
You can imagine your cluster as a big square box. This is the total of your allocatable resources. When you drop a Pod into the big box, Kubernetes checks whether there is empty space for the requested resources of the pod (does the small box fit in the big box?). If there is enough space available, it schedules your workload on the selected node.
Resource limits are not taken into account by the scheduler. They are enforced at the kernel level with cgroups. The goal is to prevent workloads from taking all the CPU or memory on the node they are scheduled on.
If your resource requests == resource limits, workloads cannot escape their "box" and are not able to use spare CPU/memory next to them. In other words, the resources are guaranteed for the pod.
But if the limits are greater than the requests, this is called overcommitting resources. You are betting that the workloads on the same node will not all be fully loaded at the same time (which is generally the case).
I recommend not overcommitting memory: do not let the pod escape the "box" in terms of memory, as it can lead to OOM kills.
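A minimal sketch of that recommendation, reusing the container from the question: the memory limit equals the memory request, while CPU is still allowed to burst above its request. The pod name is hypothetical, and the 100Mi memory request is only an illustration that would need to match what the service really uses.

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: service            # hypothetical name
spec:
  containers:
  - name: service
    image: registry/service-go:latest
    resources:
      requests:
        memory: "100Mi"    # raised to match the limit (illustrative value)
        cpu: "20m"
      limits:
        memory: "100Mi"    # equal to the request: memory is not overcommitted
        cpu: "50m"         # above the request: CPU may burst up to 50m
```

The asymmetry is deliberate: running out of CPU only slows a container down, whereas running out of memory gets a process killed.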
You can try logging into the node running your pod and run:
ps -Af | grep docker
You'll see the full command line that kubelet sends to docker. For the memory limit, it should contain something like --memory. Note that the request value for memory is only used by the Kubernetes scheduler, to determine whether the requests of all pods/containers running on a node would exceed what the node can provide.
For the CPU request, you'll see the --cpu-shares flag. CPU shares are not a hard limit; the request is again a way for the Kubernetes scheduler to avoid placing more containers/pods on a node than it can hold. You can learn more about cpu-shares here and from the Kubernetes side here. So in essence, if there are not enough other workloads competing on the node, the container will go over its CPU share whenever it needs to, and that's what you are probably seeing.
Docker has other ways of restricting CPU, such as cpu-period/cpu-quota and cpuset-cpus. Kubernetes does translate CPU limits into cpu-period/cpu-quota when CFS quota enforcement is enabled in the kubelet (the default), so a CPU limit normally acts as a hard cap, while cpuset-cpus is not used by default. In this respect, I believe Mesos does somewhat better when dealing with CPU/memory reservations and quotas, IMO.
Hope it helps.

Ensuring availability in Kubernetes with high-variance memory / CPU load?

Problem: the code we're running on Kubernetes Pods has very high variance across its runtime; specifically, it has occasional CPU & memory spikes when certain conditions are triggered. These triggers involve user queries with hard realtime requirements (the system has to respond within 5 seconds).
Under conditions where the node serving the spiking pod doesn't have enough CPU/RAM, Kubernetes responds to these excessive requests by killing the pod altogether, which results in no output whatsoever.
In what way can we ensure that these spikes are taken into account when pods are allocated and, more critically, that no pod shutdown happens for these reasons?
Thanks!
High availability of pods under load can be achieved in two ways:
Configuring More CPU/Memory
Since the application requires more CPU/memory during peak times, configure the Pod so that its allocated resources cover the extra load. Configure the Pod something like this:
resources:
  requests:
    memory: "64Mi"
    cpu: "250m"
  limits:
    memory: "128Mi"
    cpu: "500m"
You can increase the limits based on usage. But this approach can cause two issues:
1) Underutilized resources
Because the resources are allocated generously, they go to waste unless there is a spike in traffic.
2) Deployment failure
Pod deployment may fail if no Kubernetes node has enough free resources to satisfy the request.
For more info : https://kubernetes.io/docs/concepts/configuration/manage-compute-resources-container/
Autoscaling
The ideal way of doing it is to autoscale the Pods based on traffic.
kubectl autoscale deployment <DEPLOY-APP-NAME> --cpu-percent=50 --min=1 --max=10
Configure cpu-percent based on your requirements; otherwise it is 80% by default. Min and max are the bounds on the number of Pods and can be configured accordingly.
So each time the average CPU utilization of the Pods goes above 50%, new Pods are launched, up to a maximum of 10; when utilization drops again, Pods are removed, down to the minimum of 1.
For more info: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
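The kubectl autoscale command above corresponds roughly to the following manifest, shown as a sketch for keeping the autoscaler in version control. It assumes a cluster that serves the autoscaling/v2 API, and my-app is a placeholder for <DEPLOY-APP-NAME>.

```yaml
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: my-app               # hypothetical name
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: my-app             # stands in for <DEPLOY-APP-NAME>
  minReplicas: 1             # same as --min=1
  maxReplicas: 10            # same as --max=10
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 50   # same as --cpu-percent=50
```

Utilization is measured relative to each container's CPU request, so the target Deployment needs CPU requests set for this metric to work.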
A limit is a limit; it's expected to do that, period.
What you can do is run without a limit. The pod will then behave like any other process on the node: an OOM kill will happen when the node, not the pod, runs out of memory. But this sounds like asking for trouble. And mind that even if you set a high limit, it's the request that actually guarantees resources to the pod, so even with a limit of 2Gi a pod can be OOM-killed at 512Mi if its request was 128Mi.
You should design your app so that it does not generate such spikes or so that it tolerates OOM kills of pods. It's hard to tell what your software does exactly, but some things that come to mind that could help with this are request throttling, the horizontal pod autoscaler, or running asynchronously with some kind of message queue.

Node not ready, pods pending

I am running a cluster on GKE and sometimes I get into a hanging state. Right now I was working with just two nodes and allowed the cluster to autoscale. One of the nodes has a NotReady status and simply stays in it. Because of that, half of my pods are Pending, because of insufficient CPU.
How I got there
I deployed a pod which has quite high CPU usage from the moment it starts. When I scaled it to 2, I noticed CPU usage was at 1.0; the moment I scaled the Deployment to 3 replicas, I expected to have the third one in Pending state until the cluster adds another node, then schedule it there.
What happened instead is the node switched to a NotReady status and all pods that were on it are now Pending.
However, the node does not restart or anything - it is just not used by Kubernetes. The GKE then thinks that there are enough resources as the VM has 0 CPU usage and won't scale up to 3.
I cannot manually SSH into the instance from console - it is stuck in the loading loop.
I can manually delete the instance and then it starts working - but I don't think that's the idea of fully managed.
One thing I noticed - not sure if related: in GCE console, when I look at VM instances, the Ready node is being used by the instance group and the load balancer (which is the service around an nginx entry point), but the NotReady node is only in use by the instance group - not the load balancer.
Furthermore, in kubectl get events, there was a line:
Warning CreatingLoadBalancerFailed {service-controller } Error creating load balancer (will retry): Failed to create load balancer for service default/proxy-service: failed to ensure static IP 104.199.xx.xx: error creating gce static IP address: googleapi: Error 400: Invalid value for field 'resource.address': '104.199.xx.xx'. Specified IP address is already reserved., invalid
I specified loadBalancerIP: 104.199.xx.xx in the definition of the proxy-service to make sure that on each restart the service gets the same (reserved) static IP.
Any ideas on how to prevent this from happening? So that if a node gets stuck in NotReady state it at least restarts - but ideally doesn't get into such state to begin with?
Thanks.
The first thing I would do is define requests and limits for those pods.
Requests tell the cluster how much memory and CPU you think the pod is going to use. You do this to help the scheduler find the best place to run those pods.
Limits are crucial here: they are set to prevent your pods from damaging the stability of the nodes. It's better to have a pod killed by an OOM than a pod bringing a node down because of resource starvation.
For example, in this case you're saying that you want 200m CPU (20% of a core) for your pod and that it may never use more than 300m (30%): above that, its CPU is throttled. Likewise, it requests 100Mi of memory, and if it goes above 200Mi the container is killed and restarted.
spec:
  containers:
  - image: nginx
    imagePullPolicy: Always
    name: nginx
    resources:
      limits:
        cpu: 300m
        memory: 200Mi
      requests:
        cpu: 200m
        memory: 100Mi
You can read more here: http://kubernetes.io/docs/admin/limitrange/
I can speak for AWS: you can create dynamic scaling policies based on CPU and memory utilization.
A node goes into the NotReady state because it is out of memory, or maybe because of insufficient CPU. You can create a custom memory metric that collects the memory usage of all the worker nodes in the cluster and pushes it to CloudWatch.
You can follow this documentation: https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/mon-scripts.html
The CPU metric is already there, so there is no need to create it; only the memory metric has to be created for your cluster.
You can then create an alarm that fires when the metric goes above a certain threshold. Next, go to the Auto Scaling group in the AWS console and add a scaling policy to it, selecting the alarm you created and the number of instances to add.