Kubernetes scale up pod with dynamic environment variable - kubernetes

I am wondering if there is a way to set up dynamically environment variables on a scale depending on the high load.
Let's imagine that we have
Kubernetes with service called Generic Consumer which at the beginning have 4 pods. First of all
I would like to set that 75% of pods should have env variable Gold and 25% Platinium. Is that possible? (% can be changed to static number for example 3 nodes Gold, 1 Platinium)
Second question:
If Platinium pod is having a high load is there a way to configure Kubernetes/charts to scale only the Platinium and then decrease it after higher load subsided
So far I came up with creating 2 separate YAML files with diff env variables and replicas numbers.
Obviously, the whole purpose of this is to prioritize some topics
I have used this as a reference https://www.confluent.io/blog/prioritize-messages-in-kafka.
So in the above example, Generic Consumer would be the Kafka consumer which would use env variable to get bucket config
configs.put(ConsumerConfig.PARTITION_ASSIGNMENT_STRATEGY_CONFIG,
BucketPriorityAssignor.class.getName());
configs.put(BucketPriorityConfig.TOPIC_CONFIG, "orders-per-bucket");
configs.put(BucketPriorityConfig.BUCKETS_CONFIG, "Platinum, Gold");
configs.put(BucketPriorityConfig.ALLOCATION_CONFIG, "70%, 30%");
configs.put(BucketPriorityConfig.BUCKET_CONFIG, "Platinum");
consumer = new KafkaConsumer<>(configs);
If you have any alternatives, please let me know!

As was mention in comment section, the most versitale option (and probably the best for your scenario with prioritization) is to keep two separate deployments with gold and platinium labels.
Regarding first question, like #David Maze pointed, pods from Deployment are identical and you cannot have few pods with one label and few others with different. Even if you would create manually (3 pods with gold and 1 with platiunuim) you won't be able to use HPA.
This option allows you to adjustments depends on the situation. For example you would be able to scale one deployment using HPA and another with VPA or both with HPA. Would help you maintain budget, i.e for gold users you might limit to have maximum 5 pods to run simultaneously and for platinium you could set this maximum to 10.
You could consider Istio Traffic Management to routing request, but in my opinion, method with two separated deployments is more suitable.

Related

High total CPU request but low total usage (kubernetes resources)

I have a bunch of pods in a cluster that is almost requesting all (7.35/8) available CPU resources on a node:
even though their actual total usage is almost nothing (0.34/8).
The pod that is currently requesting the most only requests 210m which I guess is not an outrageous amount - also I would like to enforce some sensible minimum request size for all pods in the cluster. Of course that will accumulate when there are lots of pods.
It seems I could easily scale down the request by a factor of 10 and leave the limits where they are to begin with.
But is there something else that I should look into instead before doing that - reducing replica count etc.?
Also it looks a bit strange that the pods are not more evenly distributed between the nodes.
Your request values seems overestimated.
You need time and metrics to find the right request/limit for your workload.
Keep in mind that if you change those values, your pods will restart.
Also, It's normal that you can find some unbalance nodes on your cluster. Kubernetes will never remove a pod if you don't ask.
For example, if your create a cluster with 3 nodes, fill those 3 nodes with pods and then add another 3 nodes. The new nodes will stay empty.
You can setup some HorizontalPodAutoScaler on your cluster to adapt your number of pod to your workload.
Doing that, your workload will spread among nodes and with a correct balance. (if you use the default Scheduling Policy
I suggest following:
Resource Allocation: Based on history value set your request to meaningful value with buffer. Also to have guaranteed pod resource allocation it may be a good idea to set request and limit as same value. But that means you pod cannot burst for new resource. One more thing to note is scheduling only happens based on requested value, so if node has no more resource left, then pod will be killed and rescheduled if you request is trying to burst to limit.
Resource quotas: Check Kubernetes Resource Quotas to have sensible namespace level quotas to control overly provisioned resources by developers
Affinity/AntiAffinity: Check concept of Anti-affinity to have your replicas or different pods scheduled across your cluster. You can ensure for eg., that one host or Avalability zone etc can have only one replica of your pod (helps in HA), spread different pods to different nodes (layer scheduling etc) - Check this video
There are good answers already but I would like to add some more info.
It is very important to have a good strategy when calculating how much resources you would need for each container.
Optimally, your pods should be using exactly the amount of resources you requested but that's almost impossible to achieve. If the usage is lower than your request, you are wasting resources. If it's higher, you are risking performance issues. Consider a 25% margin up and down the request value as a good starting point. Regarding limits, achieving a good setting would depend on trying and adjusting. There is no optimal value that would fit everyone as it depends on many factors related to the application itself, the demand model, the tolerance to errors etc.
Kubernetes best practices: Resource requests and limits is a very good guide explaining the idea behind these mechanisms with a detailed explanation and examples.
Also, Managing Resources for Containers will provide you with the official docs regarding:
Requests and limits
Resource types
Resource requests and limits of Pod and Container
Resource units in Kubernetes
How Pods with resource requests are scheduled
How Pods with resource limits are run, etc
Just in case you'll need a reference.

Kuberenetes Scheduling: How would one achieve depth first scheduling behavior?

// I'm almost certain this must be a dup or at least a solved problem, but I could not find what I was after through searching the many k8 communities.
We have jobs that run for between a minute and many hours. Given that we assign them resource values that afford them QOS Guaranteed status, how could we minimize resource waste across the nodes?
The problem is that downscaling rarely happens, because each node eventually gets assigned one of the long running jobs. They are not common, but the keep all of the nodes running, even when we do not have need for them.
The dumb strategy that seems to avoid this would be a depth first scheduling algorithm, wherein among nodes that have capacity, the one most filled already will be assigned. In other words, if we have two total nodes running at 90% cpu/memory usage and 10% cpu/memory assigned, the 90% would always be assigned first provided it has sufficient capacity.
Open to any input here and/or ideas. Thanks kindly.
As of now there seems to be this kube-sheduler profile plugin:
NodeResourcesMostAllocated: Favors nodes that have a high allocation of resources.
But it is in alpha stage since k8s v1.18+, so probably not safe to use it for produciton.
There is also this parameter you can set for kube-scheduler that I have found here:
MostRequestedPriority: Favors nodes with most requested resources. This policy will fit the scheduled Pods onto the smallest number of Nodes needed to run your overall set of workloads.
and here is an example on how to configure it.
One last thing that comes to my mind is using node affinity.
Using nodeAffinity on long running pods, (more specificaly with preferredDuringSchedulingIgnoredDuringExecution), will prefer to schedule these pods on the nodes that run all the time, and prefer to not schedule them on nodes that are being autoscaled. This approach requires excluding some nodes from autoscaling and labeling approprietly so that scheduler can make use of node-affinity.

Is there a way to limit cpu usage according to my clusters current resources?

I want to limit a specific namespace to never leave less then a full cpu for my other namespaces (using a resources quota) and i have to problems in the way.:
a. Today i find the total amount of cpus using kubectl top nodes and divide the nominal cpu usage in the usage percentages, this is not accurate enough, is there a way to get my cluster total available cpu?
b. I would like to dynamically adjust to changes in the cluster (specifically added or removed nodes), a cronjob works fine but I'm looking for a way to hook major nodes changes, is there a known way to do that?
Not directly. You can set namespace level quotas easily enough but to make it that kind of dynamic thing you would need to make an operator or similar. A simple version would be a script running as a cron job which updates the Quota object.
Number of nodes CPUs is under .status.capacity.cpu in node object. You can print them e.g. with kubectl get nodes -o custom-columns=NAME:.metadata.name,CPU:.status.capacity.cpu.
Dynamicaly adding/removing nodes can be achieved using autoscaler.

How to assign resource limits dynamically?

I would like to define a policy to dynamically assigns resource limits to pods and containers. For example, if there are 4 number of pods scheduled in a specific node, and the memory capacity is 100mi, each pod to be assigned with 25mi memory limit. In other words, the fair share of the node capacity.
So, is it necessary to change the codes in scheduler.go or I need to change other objects as well?
I do agree with Arslanbekov answer, it's contrary to the ideology of scalability used by kubernetes.
The principle is that you define what resources is needed by your application and the cluster do all it can to give this resource to the pod, scalling the resources (pod, nodes) depending on the global consumption of all apps.
What you are asking is the reverse, give resources to the pod depending on the node resources, this way could prove very difficult to allow automatic scallability of the nodes as it would be the resource aim to attain (I may be confusing in my explanation but that shows how difficult it could be).
One way to do what you want would be to size all your pod to the same size to use 80% of the nodes but this would prove wrong if an app need more resources.
I think this is contrary to the ideology of the kubernetes. In this approach, the new application will not be able to get to the node.
At each point in time for the scheduler will be the utilization of 100% each node.

What to set .spec.replicas to for autoscaled deployments in kubernetes?

When creating a kubernetes deployment I set .spec.replicas to my minimum desired number of replicas. Then I create a horizontal pod autoscaler with minimum and maximum replicas.
The easiest way to do the next deployment is to use this same lower bound. When combining it with autoscaling should I set replicas to the minimum desired as used before or should I get the current number of replicas and start from there? This would involve an extra roundtrip to the api, so if it's not needed would be preferable.
There are two interpretations of your question:
1. You have an existing Deployment object and you want to update it - "deploy new version of your app".
In this case you don't need to change either replicas in Deployment object (it's managed by horizontal pod autoscaler) or horizontal pod autoscaler configuration. It will work out of the box. It's enough to change the important bits of deployment spec.
See rolling update documentation for more details.
2. You have an existing Deployment object and you want to create second one with the same application
If you create a separate application it may have a different load characteristics, so it's probably that the desired number of replicas will be different. In any case HPA will adjust it relatively quickly, so IMO setting initial number of replicas to the same number is not needed.