Is there a way to limit CPU usage according to my cluster's current resources? - kubernetes

I want to limit a specific namespace so that it never leaves less than one full CPU for my other namespaces (using a resource quota), and I have two problems in the way:
a. Today I find the total number of CPUs by taking kubectl top nodes and dividing the nominal CPU usage by the usage percentage; this is not accurate enough. Is there a way to get my cluster's total available CPU?
b. I would like to adjust dynamically to changes in the cluster (specifically nodes being added or removed). A cron job works fine, but I'm looking for a way to hook into major node changes. Is there a known way to do that?

Not directly. You can set namespace-level quotas easily enough, but to make them that kind of dynamic you would need to write an operator or something similar. A simple version would be a script running as a cron job which updates the ResourceQuota object.

The number of CPUs of each node is under .status.capacity.cpu in the Node object. You can print them, e.g., with kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu.
Dynamically adding/removing nodes can be achieved with the cluster autoscaler.
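A minimal sketch of such a cron script, assuming a pre-existing ResourceQuota named cpu-quota in the namespace team-a (both names are placeholders) and nodes whose capacity is reported in whole CPUs:

#!/usr/bin/env bash
# Hypothetical quota-refresh script: cap the "team-a" namespace at (total cluster CPU - 1).
set -euo pipefail

# Sum .status.capacity.cpu over all nodes.
total_cpu=$(kubectl get nodes \
  -o jsonpath='{range .items[*]}{.status.capacity.cpu}{"\n"}{end}' \
  | awk '{sum += $1} END {print sum}')

# Leave one full CPU for the other namespaces.
quota_cpu=$((total_cpu - 1))

# Patch the existing ResourceQuota; name and namespace are placeholders.
kubectl patch resourcequota cpu-quota -n team-a --type merge \
  -p "{\"spec\":{\"hard\":{\"limits.cpu\":\"${quota_cpu}\"}}}"

Run it from cron or as a Kubernetes CronJob under a service account allowed to list nodes and patch resourcequotas; an operator watching Node events would react to node changes faster than a fixed schedule.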

Related

Kubernetes scale up pod with dynamic environment variable

I am wondering if there is a way to set environment variables dynamically at scale, depending on load.
Let's imagine that we have Kubernetes with a service called Generic Consumer which at the beginning has 4 pods. First of all, I would like to specify that 75% of the pods should have the env variable Gold and 25% Platinum. Is that possible? (The percentages can be changed to static numbers, for example 3 pods Gold, 1 Platinum.)
Second question:
If a Platinum pod is under high load, is there a way to configure Kubernetes/charts to scale only the Platinum pods and then scale them back down after the higher load subsides?
So far I have come up with creating 2 separate YAML files with different env variables and replica counts.
Obviously, the whole purpose of this is to prioritize some topics.
I have used this as a reference: https://www.confluent.io/blog/prioritize-messages-in-kafka.
So in the above example, Generic Consumer would be the Kafka consumer which would use the env variable to get the bucket config:
// BucketPriorityAssignor and BucketPriorityConfig come from the bucket-priority
// library described in the Confluent blog post linked above.
Map<String, Object> configs = new HashMap<>();
configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
        BucketPriorityAssignor.class.getName());
configs.put(BucketPriorityConfig.TOPIC_CONFIG, "orders-per-bucket");
configs.put(BucketPriorityConfig.BUCKETS_CONFIG, "Platinum, Gold");
configs.put(BucketPriorityConfig.ALLOCATION_CONFIG, "70%, 30%");
configs.put(BucketPriorityConfig.BUCKET_CONFIG, "Platinum");
consumer = new KafkaConsumer<>(configs);
If you have any alternatives, please let me know!
As was mentioned in the comments section, the most versatile option (and probably the best for your prioritization scenario) is to keep two separate Deployments with gold and platinum labels.
Regarding the first question, as David Maze pointed out, Pods from a single Deployment are identical, and you cannot have a few Pods with one label and a few others with a different one. Even if you created them manually (3 Pods with Gold and 1 with Platinum), you would not be able to use the HPA.
This option lets you make adjustments depending on the situation. For example, you would be able to scale one Deployment using the HPA and another with the VPA, or both with the HPA. It would also help you maintain a budget, i.e., for Gold users you might allow a maximum of 5 Pods to run simultaneously, while for Platinum you could set this maximum to 10.
You could consider Istio Traffic Management for routing requests, but in my opinion the method with two separate Deployments is more suitable.
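As an illustration only, a sketch of the Platinum half of that setup, with hypothetical names (generic-consumer-platinum, a BUCKET env variable that the consumer reads, a placeholder image); the Gold Deployment would differ only in its name, labels, env value, replica count and HPA bounds:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: generic-consumer-platinum     # hypothetical name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: generic-consumer
      tier: platinum
  template:
    metadata:
      labels:
        app: generic-consumer
        tier: platinum
    spec:
      containers:
      - name: consumer
        image: registry.example.com/generic-consumer:latest   # placeholder image
        env:
        - name: BUCKET               # hypothetical variable the app maps to BucketPriorityConfig.BUCKET_CONFIG
          value: "Platinum"
---
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: generic-consumer-platinum
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: generic-consumer-platinum
  minReplicas: 1
  maxReplicas: 10                    # the per-tier budget mentioned above
  metrics:
  - type: Resource
    resource:
      name: cpu
      target:
        type: Utilization
        averageUtilization: 70

Because this HPA targets only the Platinum Deployment, high load there scales only the Platinum consumers, and they scale back down when the load subsides, which addresses the second question.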

Kubernetes Scheduling: How would one achieve depth-first scheduling behavior?

I'm almost certain this must be a dup or at least a solved problem, but I could not find what I was after through searching the many k8s communities.
We have jobs that run for between a minute and many hours. Given that we assign them resource values that afford them Guaranteed QoS status, how could we minimize resource waste across the nodes?
The problem is that downscaling rarely happens, because each node eventually gets assigned one of the long-running jobs. They are not common, but they keep all of the nodes running, even when we have no need for them.
The dumb strategy that seems to avoid this would be a depth-first scheduling algorithm, wherein among the nodes that have capacity, the one that is already most filled is assigned first. In other words, if we have two nodes, one running at 90% CPU/memory usage and one at 10%, the 90% node would always be assigned first, provided it has sufficient capacity.
Open to any input here and/or ideas. Thanks kindly.
As of now there seems to be this kube-scheduler profile plugin:
NodeResourcesMostAllocated: Favors nodes that have a high allocation of resources.
But it has been in alpha stage since k8s v1.18, so it is probably not safe to use in production.
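A hedged sketch of what enabling it through a scheduler profile could look like (the apiVersion depends on your cluster version, and on newer clusters this plugin was folded into NodeResourcesFit with a MostAllocated scoring strategy):

apiVersion: kubescheduler.config.k8s.io/v1beta1   # version-dependent
kind: KubeSchedulerConfiguration
profiles:
- schedulerName: bin-packing-scheduler            # hypothetical profile name
  plugins:
    score:
      enabled:
      - name: NodeResourcesMostAllocated
        weight: 5
      disabled:
      - name: NodeResourcesLeastAllocated         # the default spreading score

The file is passed to kube-scheduler via its --config flag, and pods opt in by setting schedulerName: bin-packing-scheduler.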
There is also this parameter you can set for kube-scheduler that I have found here:
MostRequestedPriority: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.
and here is an example of how to configure it.
One last thing that comes to my mind is using node affinity.
Using nodeAffinity on the long-running pods (more specifically, with preferredDuringSchedulingIgnoredDuringExecution) will make the scheduler prefer to place these pods on the nodes that run all the time, and avoid placing them on nodes that are being autoscaled. This approach requires excluding some nodes from autoscaling and labeling them appropriately so that the scheduler can make use of node affinity.
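For illustration, assuming the always-on nodes carry a hypothetical label node-pool=static, the stanza in the long-running pods' spec could look like this:

affinity:
  nodeAffinity:
    preferredDuringSchedulingIgnoredDuringExecution:
    - weight: 100
      preference:
        matchExpressions:
        - key: node-pool          # hypothetical label on the non-autoscaled nodes
          operator: In
          values:
          - static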

How to assign resource limits dynamically?

I would like to define a policy to dynamically assign resource limits to pods and containers. For example, if there are 4 pods scheduled on a specific node and the memory capacity is 100Mi, each pod would be assigned a 25Mi memory limit. In other words, a fair share of the node's capacity.
So, is it necessary to change the code in scheduler.go, or do I need to change other objects as well?
I agree with Arslanbekov's answer: it's contrary to the scalability ideology used by Kubernetes.
The principle is that you define what resources your application needs, and the cluster does all it can to give those resources to the pod, scaling the resources (pods, nodes) depending on the global consumption of all apps.
What you are asking for is the reverse: give resources to the pod depending on the node's resources. This could make automatic scalability of the nodes very difficult, since the node capacity would itself become the target to attain (I may be confusing in my explanation, but that shows how difficult it could be).
One way to do what you want would be to size all your pods identically so that they use 80% of the nodes, but this would prove wrong if an app needs more resources.
I think this is contrary to the ideology of Kubernetes. With this approach, a new application would not be able to get onto a node:
at every point in time, from the scheduler's point of view, every node would be at 100% utilization.

Kubernetes "nice" a pod's CPU usage

I have a cluster w/ 3 nodes. Hurray! The cluster doesn't autoscale nodes.
These nodes run an amazing web app, yet most of the time do almost nothing.
I also have a background process that could use an infinite amount of CPU (the usefulness drops rapidly but remains useful).
I want these background pods to run on each Node but be slowed down to leave 20% CPU headroom on the Node. Or similar.
That's the shape of a DaemonSet.
Can I tell Kubernetes to deprioritize the DaemonSet Pods w/ a 20% headroom?
Can the DaemonSet Pods detect the Node's CPU usage and deprioritize themselves (risky if buggy)?
QoS looks like it's for scheduling and evicting pods to make room for other pods, but they don't get 'niced'.
Priority also looks like it's for eviction.
You may achieve what you're looking for in many ways.
I imagine that you've already read this and that, based on the theory of this other.
Red Hat also has nice documentation about setting hardware limits via software.
Here you can find how to restrict CPU usage, which may be set inside a container to achieve what you're looking for.
So, to recap: with K8s you can set requests and limits, and inside the container you can set even further restrictive limits.
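For the DaemonSet in question, a hedged sketch of the Kubernetes half, assuming 4-CPU nodes (the 3200m figure is only an example of leaving roughly 20% of such a node untouched):

resources:
  requests:
    cpu: 100m          # tiny request, so the web app always wins scheduling and CPU shares
  limits:
    cpu: 3200m         # hard cap; on a 4-CPU node this leaves about 800m (20%) the background pod can never use

Note that CPU requests map to cgroup CPU shares, so a pod with a small request is already de-prioritized under contention, which is about as close to "nice" as Kubernetes gets without touching the container internals.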
Hope this gives you the solution or at least the path to follow in order to achieve what you want.

Is there any tool for GKE node autoscaling based on total pod requests in Kubernetes?

When I resize a replication controller using kubectl, if the cluster does not have enough resources, one or more pods will stay in Pending forever.
Is there any tool that will automatically resize the GKE cluster when resources are running out?
I had a similar requirement (for the Go build system): I wanted to know when the ratio of scheduled to available CPU or memory was > 1, and scale out nodes when that was true (or, more accurately, when it was ~0.8). There's no built-in metric, but as you suggest you can do it with a custom metric.
This was all done in Go, but it will give you the basic idea:
Create the metrics (memory and CPU, in my case)
Put values to the metrics
The key takeaway IMO is that you have to iterate over each pod in the cluster to determine how much capacity is consumed, then iterate over each node in the cluster to determine how much capacity is available. It's then just a matter of pointing your autoscaler to the custom metric(s).
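The original was done in Go, but the same bookkeeping can be sketched with kubectl (a rough illustration that ignores init containers and pods without requests):

# Sum the CPU requests of all containers in the cluster.
requested=$(kubectl get pods --all-namespaces \
  -o jsonpath='{range .items[*].spec.containers[*]}{.resources.requests.cpu}{"\n"}{end}' \
  | awk '/m$/ {sum += $1/1000; next} /./ {sum += $1} END {print sum}')

# Sum the allocatable CPU of all nodes.
allocatable=$(kubectl get nodes \
  -o jsonpath='{range .items[*]}{.status.allocatable.cpu}{"\n"}{end}' \
  | awk '/m$/ {sum += $1/1000; next} /./ {sum += $1} END {print sum}')

# A ratio above ~0.8 would be the signal to scale out.
awk -v r="$requested" -v a="$allocatable" 'BEGIN {printf "scheduled/allocatable CPU: %.2f\n", r/a}'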
Big, big, big thing worth noting: I ultimately determined that scaling on the built-in CPU utilization metric was just as good as (if not better than, more on that in a bit) the custom metric. Each pod we scheduled pegged the CPU, so when pods were maxed out, so was CPU. The built-in CPU utilization metric is probably better because you avoid the latency that comes with periodically putting custom metrics.
You can turn on autoscaling for the Instance Group that your GKE nodes belong to.
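For reference, on current GKE this is usually done by enabling the cluster autoscaler on a node pool rather than on the instance group directly; a hedged sketch with placeholder cluster, pool, zone, and node counts:

gcloud container clusters update my-cluster \
  --enable-autoscaling \
  --node-pool default-pool \
  --min-nodes 1 --max-nodes 5 \
  --zone us-central1-a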