Does TORQUE work on heterogeneous clusters?
I would like to use a set of old servers I have at home as a cluster, but they do not have the same characteristics (number of CPUs, memory, etc.).
Torque schedules jobs to a cluster where each node has an individual set of attributes collected by the PBS agents on the nodes. Those attributes include
ncpus - number of CPU cores
physmem - amount of physical memory
totmem - amount of available memory including swap
The following two attributes are used for scheduling decisions
np - number of processing units (by default it is set to ncpus)
gpus - number of graphics accelerators
Additional Boolean properties can be set by the cluster administrator to help users and administrators influence node selection. Thus, if your cluster contains nodes with different CPU counts, Torque can ensure that nodes are not overloaded by matching job requirements against the CPUs available on each node.
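As a minimal sketch of how that is expressed (the hostnames, core counts and the bigmem property below are made up), the per-node settings live in the server's nodes file and jobs request what they need at submission time:

# $TORQUE_HOME/server_priv/nodes -- one line per node
node01 np=8 bigmem
node02 np=2

# ask for one node with 4 cores that also has the bigmem property
qsub -l nodes=1:ppn=4:bigmem job.sh

The job will only be placed on a node whose np value and properties satisfy the request.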
I have two machines to do the load test. One machine has worse CPU performance, and that machine reaches high CPU usage as the number of users keeps increasing, while the other machine still has low CPU usage. Locust complains:
[2022-07-28 11:22:15,529] PF1YW96X-MUO/WARNING/root: CPU usage above 90%! This may constrain your throughput and may even give inconsistent response time measurements! See https://docs.locust.io/en/stable/running-locust-distributed.html for how to distribute the load over multiple CPU cores or machines
[2022-07-28 11:25:06,766] PF1YW96X-MUO/WARNING/locust.runners: CPU usage was too high at some point during the test! See https://docs.locust.io/en/stable/running-distributed.html for how to distribute the load over multiple CPU cores or machines
I want to set a lower weight for the machine that has worse CPU performance. Is there a way to do that?
You can run fewer worker processes on the weak machine. If necessary you could run more than one process per core on the strong machine, just to make it take more Users.
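For example, with a locustfile.py and the master running on the strong machine (the IP address and the worker counts are just illustrative):

# strong machine: master plus 4 worker processes
locust -f locustfile.py --master
locust -f locustfile.py --worker --master-host=192.168.1.10 &   # start this 4 times

# weak machine: only 2 worker processes
locust -f locustfile.py --worker --master-host=192.168.1.10 &   # start this 2 times

Locust spreads the simulated Users evenly across workers, so the weak machine ends up with roughly a third of the load instead of half.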
Resource limiting of containers in pods is typically achieved using something like below -
resources:
  limits:
    cpu: "600m"
  requests:
    cpu: "400m"
As you see, absolute values are used above.
Now,
If the server/host has, say, 1 core, then the total CPU computing power of the server is 1,000m, and the container is limited to 600m of computing power, which makes sense.
But if the server/host has, say, 16 cores, then the total CPU computing power of the server is 16,000m, yet the container is still restricted to 600m of computing power, which might not make complete sense in every case.
What I instead want is to define limits/requests as a percentage of host resources. Something like below.
resources:
  limits:
    cpu: "60%"
  requests:
    cpu: "40%"
Is this possible in k8s, either out of the box or using any CRDs?
I have a Databricks cluster set up with autoscaling up to 12 nodes.
I have often observed Databricks scaling the cluster from 6 to 8, then 8 to 11, and then 11 to 14 nodes.
So my queries:
1. Why is it adding 2-3 nodes at one go?
2. Why is autoscaling triggered when not many jobs are active and there is no heavy processing on the cluster? CPU usage is pretty low.
3. While autoscaling, why does it leave the notebook in a waiting state?
4. Why does it take up to 8-10 minutes to autoscale?
Thanks
I am trying to investigate why Databricks is autoscaling the cluster when it's not needed.
When you create a cluster, you can either provide a fixed number of workers for the cluster or provide a minimum and maximum number of workers for the cluster.
When you provide a fixed size cluster, Databricks ensures that your cluster has the specified number of workers. When you provide a range for the number of workers, Databricks chooses the appropriate number of workers required to run your job. This is referred to as autoscaling.
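As a sketch of how that shows up in the cluster definition (the field names follow the Databricks Clusters API; the numbers are illustrative):

Fixed-size cluster, always 8 workers:
  "num_workers": 8

Autoscaling cluster, Databricks chooses a size between 2 and 12 workers:
  "autoscale": { "min_workers": 2, "max_workers": 12 }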
With autoscaling, Databricks dynamically reallocates workers to account for the characteristics of your job. Certain parts of your pipeline may be more computationally demanding than others, and Databricks automatically adds additional workers during these phases of your job (and removes them when they’re no longer needed).
Autoscaling makes it easier to achieve high cluster utilization, because you don’t need to provision the cluster to match a workload. This applies especially to workloads whose requirements change over time (like exploring a dataset during the course of a day), but it can also apply to a one-time shorter workload whose provisioning requirements are unknown. Autoscaling thus offers two advantages:
Workloads can run faster compared to a constant-sized under-provisioned cluster.
Autoscaling clusters can reduce overall costs compared to a statically-sized cluster.
Databricks offers two types of cluster node autoscaling: standard and optimized.
How autoscaling behaves
Autoscaling behaves differently depending on whether it is optimized or standard and whether applied to an interactive or a job cluster.
Optimized
Scales up from min to max in 2 steps.
Can scale down even if the cluster is not idle by looking at shuffle file state.
Scales down based on a percentage of current nodes.
On job clusters, scales down if the cluster is underutilized over the last 40 seconds.
On interactive clusters, scales down if the cluster is underutilized over the last 150 seconds.
Standard
Starts with adding 4 nodes. Thereafter, scales up exponentially, but can take many steps to reach the max.
Scales down only when the cluster is completely idle and it has been underutilized for the last 10 minutes.
Scales down exponentially, starting with 1 node.
In Kubernetes, the role of the scheduler is to find a suitable node for each pod. After a pod is assigned to a node, there are other pods on that node, so those pods compete for resources. Given this competition, how does Kubernetes allocate resources? Is there any source code in Kubernetes for computing resource allocation?
I suppose you can take a look at the articles below to see if they answer your query:
https://github.com/kubernetes/community/blob/master/contributors/devel/sig-scheduling/scheduler_algorithm.md#ranking-the-nodes
https://jvns.ca/blog/2017/07/27/how-does-the-kubernetes-scheduler-work/
The filtered nodes are considered suitable to host the Pod, and often more than one node remains. Kubernetes prioritizes the remaining nodes to find the "best" one for the Pod. The prioritization is performed by a set of priority functions. For each remaining node, a priority function gives a score on a scale from 0-10, with 10 representing "most preferred" and 0 "least preferred". Each priority function is weighted by a positive number, and the final score of each node is calculated by adding up all the weighted scores. For example, suppose there are two priority functions, priorityFunc1 and priorityFunc2, with weighting factors weight1 and weight2 respectively; then the final score of some NodeA is:
finalScoreNodeA = (weight1 * priorityFunc1) + (weight2 * priorityFunc2)
After the scores of all nodes are calculated, the node with the highest score is chosen as the host of the Pod. If more than one node shares the highest score, a random one among them is chosen.
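A minimal Python sketch of this scoring step (the priority functions and weights here are stand-ins for illustration, not the scheduler's actual code):

import random

def pick_node(nodes, priority_funcs):
    # nodes: list of node names
    # priority_funcs: list of (func, weight) pairs; each func maps a node to a 0-10 score
    scores = {node: sum(weight * func(node) for func, weight in priority_funcs)
              for node in nodes}
    best = max(scores.values())
    # ties are broken by picking a random node among the highest scorers
    return random.choice([node for node, score in scores.items() if score == best])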
Currently, the Kubernetes scheduler provides some practical priority functions, including:
LeastRequestedPriority: The node is prioritized based on the fraction of the node that would be free if the new Pod were scheduled onto the node. (In other words, (capacity - sum of requests of all Pods already on the node - request of Pod that is being scheduled) / capacity.) CPU and memory are equally weighted. The node with the highest free fraction is the most preferred. Note that this priority function has the effect of spreading Pods across the nodes with respect to resource consumption. A worked example follows this list.
CalculateNodeLabelPriority: Prefer nodes that have the specified label.
BalancedResourceAllocation: This priority function tries to put the Pod on a node such that the CPU and Memory utilization rate is balanced after the Pod is deployed.
CalculateSpreadPriority: Spread Pods by minimizing the number of Pods belonging to the same service on the same node. If zone information is present on the nodes, the priority will be adjusted so that pods are spread across zones and nodes.
CalculateAntiAffinityPriority: Spread Pods by minimizing the number of Pods belonging to the same service on nodes with the same value for a particular label.
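To make the LeastRequestedPriority formula above concrete, here is a worked sketch (capacities and requests are made-up numbers; in the real scheduler the free fraction is mapped onto the 0-10 range):

def least_requested_score(capacity, requested_on_node, pod_request):
    # free fraction of the node if the pod were scheduled there, scaled to 0-10
    free = capacity - requested_on_node - pod_request
    return 10 * free / capacity

# CPU in millicores, memory in MiB; CPU and memory are equally weighted
cpu_score = least_requested_score(4000, 2500, 500)    # 2.5
mem_score = least_requested_score(8192, 4096, 1024)   # 3.75
node_score = (cpu_score + mem_score) / 2              # 3.125

The candidate node with the highest node_score wins this particular priority function.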
Horizontal scaling means that we scale by adding more machines into the pool of resources. Still, there is a choice of how much power (CPU, RAM) each node in the cluster will have.
When a cluster is managed with Kubernetes, it is extremely easy to set any CPU and memory limit for Pods. How do you choose the optimal CPU and memory size for the cluster nodes (or Pods in Kubernetes)?
For example, there are 3 nodes in a cluster with 1 vCPU and 1GB RAM each. To handle more load there are 2 options:
Add the 4th node with 1 vCPU and 1GB RAM
Add to each of the 3 nodes more power (e.g. 2 vCPU and 2GB RAM)
A straightforward solution is to calculate the throughput and cost of each option and choose the cheaper one. Are there any more advanced approaches for choosing the compute resources of the nodes in a cluster with horizontal scalability?
For this particular example I would go for 2x vCPU instead of another 1 vCPU node, but that is mainly because I believe running an OS for anything serious on a single vCPU is just wrong. For the system to behave decently it needs 2+ cores available; otherwise it's too easy to overwhelm that one vCPU and grind the node to a halt. There is no ideal algorithm for this, though. It will depend on your budget, the characteristics of your workloads, etc.
As a rule of thumb, don't stick to instances that are too small, as there is a bunch of stuff that always has to run on them regardless of their size, and the more nodes, the more overhead. 3x 4 vCPU + 16/32 GB RAM sounds like a nice plan for starters, but again... it depends on what you want, need and can afford.
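A rough way to see the "more nodes, more overhead" effect (the per-node reservation of 0.2 vCPU / 0.5 GB for the OS and kubelet is an assumed figure, not a Kubernetes constant):

def usable(nodes, vcpu_per_node, gb_per_node, reserve_vcpu=0.2, reserve_gb=0.5):
    # usable capacity = raw capacity minus a fixed per-node reservation
    return nodes * (vcpu_per_node - reserve_vcpu), nodes * (gb_per_node - reserve_gb)

print(usable(4, 1, 1))   # four small nodes   -> (3.2 vCPU, 2.0 GB) usable
print(usable(3, 2, 2))   # three bigger nodes -> (5.4 vCPU, 4.5 GB) usable

Even with one node fewer, the larger instances leave noticeably more capacity for actual workloads.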
The answer is related to such performance metrics as latency and throughput:
Latency is a time interval between sending request and receiving response.
Throughput is a request processing rate (requests per second).
Latency influences throughput: higher latency = lower throughput.
If a business transaction consists of multiple sequential calls to services that cannot be parallelized, then the compute resources (CPU and memory) have to be chosen based on the desired latency value. Adding more instances of the services (horizontal scaling) will not have any positive influence on latency in this case.
Adding more instances of the service increases throughput, allowing more requests to be processed in parallel (if there are no bottlenecks).
In other words, allocate CPU and memory resources so that the service has the desired response time, and add more service instances (scale horizontally) to handle more requests in parallel.
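A small sketch of that arithmetic (all numbers are assumptions for illustration, and each instance is assumed to handle one request at a time):

latency_s = 0.2                            # per-request latency of one instance
instances = 4                              # number of horizontally scaled instances
per_instance_rps = 1 / latency_s           # 5 requests/second
total_rps = instances * per_instance_rps   # 20 requests/second
# latency stays at 0.2 s per request no matter how many instances are added

Scaling out multiplies the 5 requests/second throughput but leaves the 0.2 s response time untouched; only more CPU/memory (or a faster code path) per instance can shrink it.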