Kubernetes dynamic configuration of CPU resource limit - kubernetes

The Kubernetes CPU manager provides a way to configure a CPU resource limit statically. However, in some cases this leads to wasted cluster resources: for example, an application may require significant CPU during startup, after which the allocated resources are no longer needed, so it would make sense to lower the CPU limit at that point. As far as I know, Kubernetes does not support this scenario today, so I am wondering whether there is any workaround. Since the CPU manager relies on CFS, wouldn't it technically be possible to modify the system configuration (cpu.cfs_quota_us, for instance) dynamically, after Kubernetes has created the pod with the initial CPU limits?

You can use the VerticalPodAutoscaler to achieve this. You'll need to define a VerticalPodAutoscaler custom resource that specifies which pods to target and the update policy to use, for example:
apiVersion: autoscaling.k8s.io/v1
kind: VerticalPodAutoscaler
metadata:
  name: my-app-vpa
spec:
  targetRef:
    apiVersion: "apps/v1"
    kind: Deployment
    name: my-app
  updatePolicy:
    updateMode: "Auto"
More details on installing and using VPA: https://github.com/kubernetes/autoscaler/tree/master/vertical-pod-autoscaler

This post seems to give a solution for resetting the CPU limits without restarting the Pod.
There is also a Multidimensional Pod Autoscaler, which seems to be very versatile and can handle a lot of the cases you may need to cover.

Related

Programmatic calculation of Kubernetes Limit Range

I am looking for a way to calculate appropriate Limit Range and Resource Quota settings for Kubernetes based on the sizing of our Load Test (LT) environment. We want to keep the LT environment flexible so we can experiment, and I feel that is a good way to figure out how to set the limits and related settings.
I might also have a fundamental misunderstanding of how this works, so feel free to correct that.
Does anyone have a set of equations or anything that takes into account (I know it won't be an exact science, but I am looking mostly for a jumping-off point):
Container CPU
Container memory
Right now I am pulling the CPU limits requested using this (and the memory similarly, and using some nifty shell script things I found to do full calculations for me):
kubectl get pods -o=jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.containers[*].resources.limits.cpu}{"\n"}{end}' -n my-namespace
We made sure all of our containers are explicitly making requests for CPU/memory, so that works nicely.
The machine type is based on our testing and the target number of pods per node. We use nodeSelector declarations because we need to separate workloads for some very specific needs of the services being deployed, and to be able to leverage multiple machine types.
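For context, a hedged sketch of how one of our Deployments combines a nodeSelector with explicit requests/limits (the names, labels, and values here are illustrative, not our real config):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service                    # hypothetical name
spec:
  replicas: 2
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      nodeSelector:
        workload-type: load-test      # hypothetical node-pool label
      containers:
      - name: my-service
        image: my-service:latest      # hypothetical image
        resources:
          requests:                   # every container requests CPU/memory explicitly
            cpu: 250m
            memory: 256Mi
          limits:
            cpu: 500m
            memory: 512Mi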
For the Limit Range I was thinking (adding 10% just for padding):
Maximum [CPU/memory] + 10% (ensuring that the machine type holds 2x that calculation) as:
apiVersion: v1
kind: LimitRange
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  limits:
  - max:
      cpu: [calculation_from_above]
      memory: [calculation_from_above]
    type: Container
For the Resource Quota I was thinking (50% to handle estimated overflow in "emergency" HPA):
Total of all CPU/memory in the Load Test environment + 50% as:
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ns-my-namespace
  namespace: my-namespace
spec:
  hard:
    limits.cpu: [calculation_from_above]
    limits.memory: [calculation_from_above]

K8S Ingress: How to limit requests in flight per pod

I am porting an application to run within k8s. I have run into an issue with ingress. I am trying to find a way to limit the number of REST API requests in flight at any given time to each backend pod managed by a deployment.
See the image below that shows the architecture.
Ingress is being managed by nginx-ingress. For a given set of URL paths, the ingress forwards the request to a service that targets a deployment of REST API backend processes. The deployment is also managed by an HPA based upon CPU load.
What I want to do is find a way to queue up ingress requests such that there are never more than X requests in flight to any pod running our API backend process. (ex. only allow 50 requests in flight at once per pod)
Does anyone know how to put a request limit in place like this?
As a bonus question, the next thing I would need to do is have the HPA monitor the request queuing and automatically scale up/down the deployment to match the number of pods to the number of requests currently being processed / queued. For example if each pod can handle 100 requests in flight at once and we currently have load levels of 1000 requests to handle, then autoscale to 10 pods.
If it is useful, I am also planning to have linkerd in place for this cluster. Perhaps it has a capability that could help.
Autoscaling on network requests requires custom metrics. Given that you are using the NGINX ingress controller, you can first install Prometheus and the Prometheus adapter to export the metrics from the NGINX ingress controller. By default, the NGINX ingress controller already exposes a Prometheus endpoint.
The relation graph looks like this:
NGINX ingress <- Prometheus <- Prometheus Adapter <- custom metrics API service <- HPA controller
Each arrow represents an API call. So, in total, you will have three extra components in your cluster.
Once you have set up the custom metrics server, you can scale your app based on the metrics from the NGINX ingress. The HPA will look like this:
apiVersion: autoscaling/v2beta1
kind: HorizontalPodAutoscaler
metadata:
  name: srv-deployment-custom-hpa
spec:
  scaleTargetRef:
    apiVersion: extensions/v1beta1
    kind: Deployment
    name: srv-deployment
  minReplicas: 1
  maxReplicas: 100
  metrics:
  - type: Pods
    pods:
      metricName: nginx_srv_server_requests_per_second
      targetAverageValue: 100
I won't go through the actual implementation here because it involves a lot of environment-specific configuration.
Once you have set that up, you will see that the HPA object shows the metrics it is pulling from the adapter.
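As a rough illustration only (the metric and label names are placeholders and depend on what your ingress controller actually exports), a Prometheus adapter rule that turns the controller's request counter into a per-second custom metric could look like this:
rules:
- seriesQuery: 'nginx_ingress_controller_requests{namespace!="",service!=""}'
  resources:
    overrides:
      namespace: {resource: "namespace"}   # map the metric's namespace label to the Kubernetes namespace
      service: {resource: "service"}       # map the metric's service label to the Kubernetes Service
  name:
    matches: "^(.*)_requests$"
    as: "${1}_requests_per_second"         # expose it under a *_requests_per_second name
  metricsQuery: 'sum(rate(<<.Series>>{<<.LabelMatchers>>}[2m])) by (<<.GroupBy>>)'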
For rate limiting at the Service level, you would need a powerful service mesh. Linkerd2 is designed to be lightweight, so it does not ship with rate limiting; you can refer to this issue under linkerd2, where the maintainers declined to implement rate limiting at the service level and suggest doing it at the Ingress level instead.
AFAIK, Istio and some other advanced service meshes provide rate limiting. In case you haven't deployed Linkerd as your service mesh yet, you may try Istio instead.
For Istio, you can refer to this document to see how to do rate limiting. Be aware, though, that Istio together with the NGINX ingress may cause you trouble: Istio ships with its own ingress controller, and making the combination work requires extra effort.
To conclude, if you can use the HPA with a custom metric for the number of requests, that is the quickest way to solve your traffic-control issue. If you still have a hard time with traffic control after that, you will then need to consider Service-level rate limiting.
The NGINX ingress allows rate limiting via annotations. You may want to have a look at the limit-rps one:
nginx.ingress.kubernetes.io/limit-rps: number of requests accepted from a given IP each second. The burst limit is set to this limit multiplied by the burst multiplier (the default multiplier is 5). When clients exceed this limit, a limit-req-status-code (default: 503) is returned.
On top of that, NGINX queues your requests using the leaky bucket algorithm: incoming requests are buffered in a FIFO (first-in-first-out) queue and then consumed at the limited rate. The burst value in this case defines the size of the queue, which allows requests to exceed the base limit. When this queue becomes full, further requests are rejected.
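As a sketch, assuming the ingress-nginx controller, applying those annotations to an Ingress could look like this (the host, service name, and port are placeholders):
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: api-ingress
  annotations:
    nginx.ingress.kubernetes.io/limit-rps: "50"               # 50 requests per second per client IP
    nginx.ingress.kubernetes.io/limit-burst-multiplier: "5"   # queue up to 5x the limit before rejecting
spec:
  ingressClassName: nginx
  rules:
  - host: api.example.com
    http:
      paths:
      - path: /api
        pathType: Prefix
        backend:
          service:
            name: api-backend          # hypothetical Service in front of the deployment
            port:
              number: 8080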
For more detailed reading about limit traffic and shaping:
Nginx rate limiting in nutshell
Rate limiting nginx
Perhaps you should consider implementing Kubernetes Service APIs
Based on the latest Kubernetes docs, we can do HPA based on custom metrics.
Doc reference: https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/
Adding the code below:
apiVersion: autoscaling/v2beta2
kind: HorizontalPodAutoscaler
metadata:
  name: php-apache
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: php-apache
  minReplicas: 1
  maxReplicas: 10
  metrics:
  - type: Object
    object:
      metric:
        name: requests-per-second
      describedObject:
        apiVersion: networking.k8s.io/v1beta1
        kind: Ingress
        name: main-route
      target:
        type: Value
        value: 10k
What I would suggest is having an Ingress resource specifically for this service (load balancing round robin), and then doing autoscaling based on the number of requests you expect multiplied by the minimum number of replicas. This should give an optimal HPA.
PS I will test it out myself and comment.

Removing default CPU request and limits on GCP Kubernetes

Kubernetes on Google Cloud Platform configures a default CPU request and limit.
I make use of DaemonSets, and DaemonSet pods should use as much CPU as possible.
Manually increasing the upper limit is possible, but the upper bound must be reconfigured whenever new nodes are added, and it must be set well below what is available on the node so that rolling updates can still schedule pods.
This requires a lot of manual work, and some resources simply go unused most of the time. Is there a way to completely remove the default CPU limit so that pods can use all available CPU?
GKE, by default, creates a LimitRange object named limits in the default namespace looking like this:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - defaultRequest:
      cpu: 100m
    type: Container
So, if you want to change this, you can either edit it:
kubectl edit limitrange limits
Or you can delete it altogether:
kubectl delete limitrange limits
Note: the policies in the LimitRange objects are enforced by the LimitRanger admission controller which is enabled by default in GKE.
Limit Range is a policy to constrain resources per Pod or Container in a namespace.
A limit range, defined by a LimitRange object, provides constraints that can:
Enforce minimum and maximum compute resource usage per Pod or Container in a namespace.
Enforce minimum and maximum storage requests per PersistentVolumeClaim in a namespace.
Enforce a ratio between request and limit for a resource in a namespace.
Set a default request/limit for compute resources in a namespace and automatically inject them into Containers at runtime.
You need to find the LimitRange resource of your namespace and remove the spec.limits.default.cpu and spec.limits.defaultRequest.cpu that are defined (or simply delete the LimitRange to remove all constraints).
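For reference, a sketch of a LimitRange that sets both of those fields (the values are illustrative); removing the default and defaultRequest entries, or deleting the object, removes the injected defaults:
apiVersion: v1
kind: LimitRange
metadata:
  name: limits
spec:
  limits:
  - default:             # injected as the container's CPU limit when none is set
      cpu: 500m
    defaultRequest:      # injected as the container's CPU request when none is set
      cpu: 100m
    type: Container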
The resource limitation can be configured in two ways.
At the object level:
kubectl edit limitrange limits
This object is created by default and sets a default CPU request of 100m (1/10 of a CPU) for containers that do not specify one. Note that when a container hits a CPU limit it is throttled rather than killed; it is exceeding a memory limit that gets a container killed.
At the manifest level:
using a StatefulSet, DaemonSet, etc., through a YAML file, configured under
spec.containers[].resources
it looks like this:
spec:
  containers:
  - resources:
      limits:
        memory: 200Mi
      requests:
        cpu: 100m
        memory: 200Mi
As mentioned, you can modify the configuration or simply delete these objects to remove the limitations.
However, there are reasons why these limits were put in place.
I found a video from a Googler talking about it, take a look! [1]
On top of the Limit Range mentioned by Eduardo Baitello, you should also look out for admission controllers, which can intercept requests to the Kubernetes API and modify them (e.g. add limits, and other defaults).

Setting a max lifetime condition on a pod in Kubernetes

We have some weird memory leaking issues with our containers where the longer they live, the more resources they take. We do not have the resources at the moment to look into these issues (as they don't become problems for over a month) but would like to avoid manual work to "clean up" the bloated containers.
What I'd like to do is configure our deployments in such a way that "time alive" is a parameter for the state of a pod, and if it exceeds a value (say a couple of days) the pod is killed off and a new one is created. I'd prefer to do this entirely within Kubernetes, as while we will eventually be adding a "health check" endpoint to our services, that cannot be done for a while.
What is the best way to implement this sort of a "max age" parameter on the healthiness of a pod? Alternatively, I guess we could trigger based off of resource usage, but it's not an issue if the use is temporary, only if the resources aren't released after a short while.
The easiest way is to put a hard resource limit on memory that is above what you would see in a temporary spike: at a level that you'd expect to see over say a couple of weeks.
It's probably a good idea to do this anyhow, as k8s will schedule workloads based on requested resources, not their limit, so you could end up with memory pressure in a node as the memory usage increases.
One problem is that if you have significant memory spikes, the restart (where k8s kills your pod) would probably happen in the middle of some workload, so you need to be able to absorb that effect.
So, from the documentation it would look something like this (and clearly a Deployment would be preferable to a raw Pod as shown below; this example can be carried over into a PodTemplateSpec):
apiVersion: v1
kind: Pod
metadata:
  name: frontend
spec:
  containers:
  - name: ccccc
    image: theimage
    resources:
      requests:
        memory: "64Mi"
      limits:
        memory: "128Mi"

Difference between API versions v2beta1 and v2beta2 in Horizontal Pod Autoscaler?

The Kubernetes Horizontal Pod Autoscaler walkthrough in https://kubernetes.io/docs/tasks/run-application/horizontal-pod-autoscale-walkthrough/ explains that we can perform autoscaling on custom metrics. What I didn't understand is when to use the two API versions: v2beta1 and v2beta2. If anybody can explain, I would really appreciate it.
Thanks in advance.
The first version, autoscaling/v2beta1, doesn't allow you to scale your pods based on custom metrics; it only lets you scale your application based on its CPU and memory utilization.
The second version, autoscaling/v2beta2, allows users to autoscale based on custom metrics. It also allows autoscaling based on metrics coming from outside of Kubernetes: a new External metric source is added in this API.
metrics:
- type: Resource
  resource:
    name: cpu
    target:
      type: Utilization
      averageUtilization: 50
An External metric identifies a specific metric to autoscale on based on a metric name and a label selector. Those metrics can come from anywhere, such as a Stackdriver or Prometheus monitoring application, and you can scale your application based on, for example, a Prometheus query.
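For illustration, an External metric in autoscaling/v2beta2 combines a metric name with a label selector like this (the metric name and labels below are hypothetical):
metrics:
- type: External
  external:
    metric:
      name: queue_messages_ready        # hypothetical metric exported from outside the cluster
      selector:
        matchLabels:
          queue: worker_tasks           # hypothetical label to pick one time series
    target:
      type: AverageValue
      averageValue: "30"                # target value averaged across pods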
It is generally better to use the v2beta2 API because it can scale on CPU and memory as well as on custom metrics, while the v2beta1 API can scale only on internal metrics.
The snippet above shows how you can specify the target CPU utilisation with the v2beta2 API.
UPDATE: v2beta1 is deprecated in 1.19 and you should use v2beta2 going forward.
Also, v2beta2 added the new api field spec.behavior in 1.18 which allows you to define how fast or slow pods are scaled up and down.
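For example, a sketch of the behavior field (the numbers are arbitrary):
spec:
  behavior:
    scaleDown:
      stabilizationWindowSeconds: 300   # wait 5 minutes before acting on a lower recommendation
      policies:
      - type: Pods
        value: 1                        # remove at most 1 pod
        periodSeconds: 60               # per minute
    scaleUp:
      policies:
      - type: Percent
        value: 100                      # at most double the replica count
        periodSeconds: 60               # per minute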
Originally, both versions were functionally identical but had different APIs.
autoscaling/v2beta2 was released in Kubernetes version 1.12 and the release notes state:
We released autoscaling/v2beta2, which cleans up and unifies the API
The "cleans up and unifies the API" is referring to that fact that v2beta2 consistently uses the MetricIdentifier and MetricTarget objects:
spec:
  metrics:
  - external:
      metric: MetricIdentifier
      target: MetricTarget
    object:
      describedObject: CrossVersionObjectReference
      metric: MetricIdentifier
      target: MetricTarget
    pods:
      metric: MetricIdentifier
      target: MetricTarget
    resource:
      name: string
      target: MetricTarget
    type: string
In v2beta1, those fields have pretty different specs, making it (in my opinion) more difficult to figure out how to use.
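For comparison, a rough sketch of the corresponding v2beta1 fields, where each metric type carries its own target field names instead of a shared MetricTarget:
spec:
  metrics:
  - external:
      metricName: string
      metricSelector: LabelSelector
      targetValue: Quantity
      targetAverageValue: Quantity
    object:
      target: CrossVersionObjectReference
      metricName: string
      targetValue: Quantity
    pods:
      metricName: string
      targetAverageValue: Quantity
    resource:
      name: string
      targetAverageUtilization: integer
      targetAverageValue: Quantity
    type: string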
How to check differences between HPA versions in general?
I will provide an additional answer, which I think will also be suitable for other version differences in the future.
Run kubectl api-versions and check which versions your cluster supports.
Go to the K8S API site and compare the autoscaling versions:
MetricSpec v2beta2 autoscaling vs MetricSpec v2beta1 autoscaling.
(*) Just make sure you are on the correct K8S version in the URL:
https://kubernetes.io/docs/reference/generated/kubernetes-api/v1.23/#metricspec-v2beta1-autoscaling
In case you need to drive the horizontal pod autoscaler with a custom external metric, and only v2beta1 is available to you (I think this is true of GKE still), we do this routinely in GKE. You need:
A stackdriver monitoring metric, possibly one you create yourself,
If the metric isn't derived from sampling Stackdriver logs, a way to publish data to the stackdriver monitoring metric, such as a cronjob that runs no more than once per minute (we use a little python script and Google's python library for monitoring_v3), and
A custom metrics adapter to expose Stackdriver monitoring to the HPA (e.g., in Google, gcr.io/google-containers/custom-metrics-stackdriver-adapter:v0.10.0). There's a tutorial on how to deploy this adapter here. You'll need to ensure that you grant the required RBAC stuff to the service account running the adapter, as shown here. You may or may not want to grant the principal that deploys the configuration cluster-admin role as described in the tutorial; we use Helm 2 w/ Tiller and are careful to grant least privilege to Tiller to deploy.
Configure your HPA this way:
kind: HorizontalPodAutoscaler
apiVersion: autoscaling/v2beta1
metadata:
  ...
spec:
  scaleTargetRef:
    kind: e.g., StatefulSet
    name: name-of-pod-to-scale
    apiVersion: e.g., apps/v1
  minReplicas: 1
  maxReplicas: ...
  metrics:
  - type: External
    external:
      metricName: "custom.googleapis.com|your_metric_name"
      metricSelector:
        matchLabels:
          resource.type: "generic_task"
          resource.labels.job: ...
          resource.labels.namespace: ...
          resource.labels.project_id: ...
          resource.labels.task_id: ...
      targetValue: e.g., 0.7 (i.e., if you publish a metric that measures the ratio between demand and current capacity)
If you ask kubectl for your HPA object, you won't see autoscaling/v2beta1 settings, but this works well:
kubectl get --raw /apis/autoscaling/v2beta1/namespaces/your-namespace/horizontalpodautoscalers/your-autoscaler | jq
So far, we've only exercised this on GKE. It's clearly Stackdriver-specific. To the extent that Stackdriver can be deployed on other public managed k8s platforms, it might actually be portable. Or you might end up with a different way to publish a custom metric for each platform, using a different metrics publishing library in your cronjob, and a different custom metrics adapter. We know that one exists for Azure, for example.