Setting a max lifetime condition on a pod in Kubernetes - kubernetes

We have some weird memory leaking issues with our containers where the longer they live, the more resources they take. We do not have the resources at the moment to look into these issues (as they don't become problems for over a month) but would like to avoid manual work to "clean up" the bloated containers.
What I'd like to do is configure our deployments in such a way that "time alive" is a parameter for the state of a pod, and if it exceeds a value (say a couple of days) the pod is killed off and a new one is created. I'd prefer to do this entirely within Kubernetes; while we will eventually be adding a "health check" endpoint to our services, that won't be possible for a while.
What is the best way to implement this sort of "max age" parameter on the healthiness of a pod? Alternatively, I guess we could trigger based on resource usage, but it's not an issue if the usage is temporary, only if the resources aren't released after a short while.

The easiest way is to put a hard resource limit on memory that is above what you would see in a temporary spike, at the level you'd expect usage to reach over, say, a couple of weeks.
It's probably a good idea to do this anyhow, as k8s will schedule workloads based on requested resources, not their limit, so you could end up with memory pressure in a node as the memory usage increases.
One caveat: if you have significant memory spikes, the restart triggered when k8s kills your pod will probably happen in the middle of some workload, so you'd need to be able to absorb that effect.
So, from the documentation it would look something like this (clearly a Deployment would be preferable to the raw Pod shown below, and the example carries over directly into a PodTemplateSpec, as sketched after the YAML):
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: ccccc
    image: theimage
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "128Mi"

Related

Programmatic calculation of Kubernetes Limit Range

I am looking for a way to calculate appropriate Limit Range and Resource Quota settings for Kubernetes based on the sizing of our Load Test (LT) environment. We want to keep the LT environment flexible in order to play with things, and I feel that's a great way to figure out how to set up the limits, etc.
I might also have a fundamental misunderstanding of how this works, so feel free to correct that.
Does anyone have a set of equations or anything that takes into account (I know it won't be an exact science, but I am looking mostly for a jumping-off point):
Container CPU
Container memory
Right now I am pulling the CPU limits using this (the memory variant is sketched below), along with some nifty shell script things I found to do the full calculations for me:
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.cpu}{"\n"}{end}' -n my-namespace
We made sure all of our containers are explicitly making requests for CPU/memory, so that works nicely.
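For completeness, the memory variant follows the same pattern (a sketch, swapping only the field being extracted):
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.memory}{"\n"}{end}' -n my-namespace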
The machine type is based on our testing and target number of pods per node. We have nodeSelector declarations in use as we need to separate things out for some very specific needs by the services being deployed and to be able to leverage multiple machine types.
For the Limit Range I was thinking (adding 10% just for padding):
Maximum [CPU/memory] + 10% (ensuring that the machine type holds 2x that calculation) as:
apiVersion: v1
kind: LimitRange
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  limits:
  - max:
      cpu: [calculation_from_above]
      memory: [calculation_from_above]
    type: Container
For the Resource Quota I was thinking (50% to handle estimated overflow in "emergency" HPA):
Total of all CPU/memory in the Load Test environment + 50% as:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  hard:
    limits.cpu: [calculation_from_above]
    limits.memory: [calculation_from_above]
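As a worked example (hypothetical numbers, just to make the arithmetic concrete): if the largest container observed in LT peaks at 500m CPU and 400Mi memory, the LimitRange max would be 550m / 440Mi (+10%); if the whole LT namespace totals 4 CPU and 8Gi, the ResourceQuota would be 6 CPU and 12Gi (+50%).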

Alicloud node memory Requested/Limited/Used

We have a test cluster of 3 nodes on Alicloud
We haven't set any memory limits, per namespace or per pod
However, when looking at the nodes in the Alicloud console, we see Requested/Limited/Used values for memory. The pods are running out of memory when Used goes over the Limited threshold.
Does anyone know where this limit comes from? It seems to be different for each of our nodes, so does it create an arbitrary limit per pod?
To be honest, I can't find any relevant information on where your current default limits come from, but you should still solve your issue somehow.
I suggest you manually set the required limits to avoid OOMs in the future. What you will need to do is calculate approximate resource usage and apply the limits accordingly.
This can help:
Kubernetes: Assign Memory Resources and Limits to Containers, especially this part
LimitRange for Memory
Kubernetes administrators can define RAM limits for their nodes.
These limits are enforced at higher priority over how much RAM your
Pod declares and wants to use.
Let's define our first LimitRange: 25Mi RAM as min, 200Mi as max.
nano myRAM-LimitRange.yaml
apiVersion: v1
kind: LimitRange
metadata:
  name: my-ram-limit
spec:
  limits:
  - max:
      memory: 200Mi
    min:
      memory: 25Mi
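To apply and verify it (standard kubectl, using the file name from the tutorial above):
kubectl apply -f myRAM-LimitRange.yaml
kubectl describe limitrange my-ram-limit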

Kubernetes dynamic configuration of CPU resource limit

Kubernetes CPU manager provides a way to configure CPU resource limits statically. However, in some cases this can waste cluster resources: for example, an application could require significant CPU during startup, while the allocated resources are no longer required later on, so it would make sense to optimize CPU in such a case and lower the CPU limit. I think Kubernetes doesn't support this scenario as of today, so I am wondering if there is any workaround. Since the CPU manager relies on CFS, wouldn't it technically be possible to modify the system configuration (cpu.cfs_quota_us, for instance) dynamically after Kubernetes creates the pod with its initial CPU limits?
You can use VerticalPodAutoscaler to achieve this. You'll need to define a custom resource that details which pods to target and the update policy to use, for example:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
More details on installing and using VPA: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler
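Once the VPA components are installed, you can inspect the recommendations it computes before (or instead of) letting it apply them; a minimal check, assuming the CRD's vpa short name is registered:
kubectl describe vpa my-app-vpa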
This post seems to give a solution for resetting the CPU limits without restarting the Pod.
There is also a Multidimensional Pod Autoscaler, which seems to be very versatile and can handle a lot of the cases you may need.

Kubernetes Deployment makes great use of cpu and memory without stressing it

I have deployed an app on Kubernetes and would like to test the HPA.
With the kubectl top nodes command, I noticed that CPU and memory increase without stressing the app.
Does that make sense?
Also, while stressing the deployment with Apache Bench, CPU and memory don't increase enough to pass the target and create a replica.
My Deployment YAML file is too big to provide in full; this is one of its containers:
- name: web
  image: php_apache:1.0
  imagePullPolicy: Always
  resources:
    requests:
      memory: 50Mi
      cpu: 80m
    limits:
      memory: 100Mi
      cpu: 120m
  volumeMounts:
  - name: shared-data
    mountPath: /var/www/html
  ports:
  - containerPort: 80
The Deployment consists of 15 containers in total.
I have a VM that contains a cluster with 2 nodes (master, worker).
I would like to stress the deployment so that I can watch it scale up.
But here I think there is a problem! Without stressing the app, the CPU/memory of the pod has already passed the target and 2 replicas have been created.
I know that the more requests I give the containers, the lower that percentage is.
But does it make sense for memory/CPU usage to be elevated from the beginning, without stressing it?
I would like the left part of the target (the usage of memory in the pods) to start at 0% and increase as I stress the app, creating replicas.
But as I stress it with Apache Bench, the value increases by a maximum of 10%.
We can see the CPU and memory usage here:
kubectl top pods
NAME                     CPU(cores)   MEMORY(bytes)
x-app-55b54b6fc8-7dqjf   76m          765Mi
59% is the pod's memory utilization, computed as memory usage / sum of memory requests; in my case 59% = 765Mi / 1310Mi.
HPA yaml file:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: hpa
spec:
  maxReplicas: 10
  minReplicas: 1
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: myapp
  metrics:
  - type: Resource
    resource:
      name: memory
      target:
        type: Utilization
        averageUtilization: 35
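To watch the reported utilization against the 35% target while driving load with Apache Bench, something like the following should work (hpa being the name from the manifest above):
kubectl get hpa hpa --watch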
With the kubectl top nodes command, I noticed that CPU and memory increase without stressing it. Does it make sense?
Yes, it makes sense. See Google Cloud's documentation on Requests and Limits:
Requests and limits are the mechanisms Kubernetes uses to control resources such as CPU and memory. Requests are what the container is guaranteed to get. If a container requests a resource, Kubernetes will only schedule it on a node that can give it that resource. Limits, on the other hand, make sure a container never goes above a certain value. The container is only allowed to go up to the limit, and then it is restricted.
But does it make sense for the usage of memory/CPU to be elevated from the beginning, without stressing it?
Yes: for example, your container web can start at memory: 50Mi and cpu: 80m, but it is allowed to grow to memory: 100Mi and cpu: 120m. Also, as you mentioned, you have 15 containers in total, so depending on their requests and limits they can together exceed 35% of your memory.
In HPA documentation - algorithm-details you can find information:
When a targetAverageValue or targetAverageUtilization is specified, the currentMetricValue is computed by taking the average of the given metric across all Pods in the HorizontalPodAutoscaler's scale target. Before checking the tolerance and deciding on the final values, we take pod readiness and missing metrics into consideration, however.
All Pods with a deletion timestamp set (i.e. Pods in the process of being shut down) and all failed Pods are discarded.
If a particular Pod is missing metrics, it is set aside for later; Pods with missing metrics will be used to adjust the final scaling amount.
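For reference, the same algorithm-details page gives the core scaling formula:
desiredReplicas = ceil[currentReplicas * ( currentMetricValue / desiredMetricValue )]
With a 35% target and 59% observed utilization, that works out to ceil(1 * 59 / 35) = 2 replicas, which matches the behaviour described in the question.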
Not sure about the last question:
59% is the pod's memory utilization, computed as memory usage / sum of memory requests; in my case 59% = 765Mi / 1310Mi.
In your HPA you configured it to create another pod when averageUtilization reaches 35% of memory. It reached 59%, so it created another pod. As the HPA target is memory, the HPA is not counting CPU at all. Also keep in mind that since this is an average, it needs about a minute to update its values.
For a better understanding of how the HPA works, please try this walkthrough.
If this was not helpful, please clarify what exactly you are asking.

Openshift ResourceQuota usage rather than reservation?

Running OpenShift 3.11 with project ResourceQuotas and LimitRanges enforced, I am trying to understand how I can utilise my entire project CPU quota based on the "actual current usage" rather than what I have "reserved".
As a simple example, if my question is not clear:
If I have a project with a ResourceQuota of 2 Core CPU
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    limits.cpu: "2"
I have a number of long-running containers which are often idle, waiting for requests, so they are not actually using much CPU. When requests start appearing, I want the affected container to be able to "burst", allowing CPU usage up to the remaining CPU quota available in the project based on what is actually being used (I have no issue with the 100ms CFS resolution).
I need to enforce the maximum the project can have in total, hence the limits.cpu ResourceQuota. But I must therefore also provide limits.cpu for each container I create (explicitly or via LimitRange defaults), e.g.:
...
spec:
  containers:
  - ...
    resources:
      limits:
        cpu: "2"
      requests:
        cpu: 200m
This, however, will only work for the first container I create: the second container with the same settings will exceed the project quota's limits.cpu, even though the container is just idle, doing almost nothing after its initial startup sequence.
Is it not possible in my scenario above to have it deduct 200m from the quota for each container, based on requests.cpu, and burst up to 1800m? (1600m of the 2000m quota unused + the initial 200m requested.)
I have read through the following; the overcommit link seemed promising, but I am still stuck.
https://docs.openshift.com/container-platform/3.11/admin_guide/quota.html
https://docs.openshift.com/container-platform/3.11/admin_guide/limits.html
https://docs.openshift.com/container-platform/3.11/admin_guide/overcommit.html
Is what I am trying to do possible?
I am trying to understand how I can utilise the entire of my project CPU quota based on the "actual current usage" rather than what I have "reserved"
You can't. If your quota is on limits.cpu then the cluster admin doesn't want you to burst higher than that value.
If you can get your cluster admin to set your quota differently, to have a low requests.cpu quota and a higher limits.cpu quota, you might be able to size your containers as you'd like.
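A sketch of what such a quota might look like (illustrative numbers, not from the original; requests.cpu and limits.cpu are the standard ResourceQuota keys):
apiVersion: v1
kind: ResourceQuota
metadata:
  name: compute-resources
spec:
  hard:
    requests.cpu: "2"
    limits.cpu: "8"
With a split like this, each container's small requests.cpu counts against the 2-core requests budget, while limits.cpu leaves headroom for bursting.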
The other option is to use low limits, and a Horizontal Pod Autoscaler to scale up the number of pods for a specific service that is getting a burst in traffic.