Pods and nodes Cpu usages in kubernates - kubernetes

Is Total cpu usages of a node sum up of all pods running in specific nodes.what is the relation between millicpu in cpu cores and % of cpu usages of node.Is request and limit control the cpu usages of pods if so then if a pods cpu usages reach its limit then it will be killed and move other node or continues execution in similar node with maximum limit.

Millicores is an absolute number (one core divided by 1000). A given node typically has multiple cores, so the relation between the number of millicores and the total percentage varies. For example 1000 millicores (one core) would be 25% on a four core node, but 50% on a node with two cores.
Request determines how much cpu a pod is guaranteed. It will not be scheduled on a node unless the node can deliver that much.
Limit determines how much a pod can get. It will not be killed or moved if it exceeds the limit; it is simply not allowed to exceed the limit.

Related

Kubernetes - Sum of Pods CPU Utilization Limit

We have pods running with same label in multiple namespaces. Cpu utilization of all pods should not cross licensed vCPU,say X-vCPUR
We do not want to limit through SUM(POD cpu Limit) < X-vCPU as we want to give flexibility to apps to burst, as license constraints is on sum of all pod utilization. Limiting through CPU Limit ,reduces number of pods that can be deployed and all pods will not burst at same time.
How can we achieve this? any insights help full, thanks in advance.
Note: There are other applications running with different labels, which does not consider for core usage.
Limiting through CPU limit
Nodes start up ,these apps burst at same time and takes more vCpu,we would like to limit combined pods utilization burst to below a specified limit.

How to setup Kubernetes HPA to scale based on maximum available memory in a given pod?

I’d like to autoscale the pods not based on the average memory, but rather based on largest amount of available memory in a given pod.
Example:
Let’s say the target maximum available memory is 50%.
If we have 7 pods already and 6 of them have 90% occupied memory, but a single pod with 40% occupied memory, that’d satisfy my criteria and we won’t need to upscale. But the moment that last pod goes below 50% available memory we’ll upscale.
I know it’s not a wise criteria for scaling in case majority of case, but in my particular circumstance, it fits.

How is CPU usage calculated in Grafana?

Here's an image taken from Grafana which shows the CPU usage, requests and limits, as well as throttling, for a pod running in a K8s cluster.
As you can see, the CPU usage of the pod is very low compared to the requests and limits. However, there is around 25% of CPU throttling.
Here are my questions:
How is the CPU usage (yellow) calculated?
How is the ~25% CPU throttling (green) calculated?
How is it possible that the CPU throttles when the allocated resources for the pod are so much higher than the usage?
Extra info:
The yellow is showing container_cpu_usage_seconds_total.
The green is container_cpu_cfs_throttled_periods_total / container_cpu_cfs_periods_total

Query on kubernetes metrics-server metrics values

I am using metrics-server(https://github.com/kubernetes-incubator/metrics-server/) to collect the core metrics from containers in a kubernetes cluster.
I could fetch 2 resource usage metrics per container.
cpu usage
memory usage
However its not clear to me whether
these metrics are accumulated over time or they are already sampled for a particular time window(1 minute/ 30 seconds..)
What are the units for the above metric values. For CPU usage, is it the number of cores or milliseconds? And for memory usage i assume its the bytes usage.
While computing CPU usage metric value, does metrics-server already take care of dividing the container usage by the host system usage?
Also, if i have to compare these metrics with the docker-api metrics, how to compute CPU usage % for a given container?
Thanks!
Metrics are scraped periodically from kubelets. The default resolution duration is 60s, which can be overriden with the --metric-resolution=<duration> flag.
The value and unit (cpu - cores in decimal SI, memory - bytes in binary SI) are arrived at by using the Quantity serializer in the k8s apimachinery package. You can read about it from the comments in the source code
No, the CPU metric is not relative to the host system usage as you can see that it's not a percentage value. It represents the rate of change of the total amount of CPU seconds consumed by the container by core. If this value increases by 1 within one second, the pod consumes 1 CPU core (or 1000 milli-cores) in that second.
To arrive at a relative value, depending on your use case, you can divide the CPU metric for a pod by that for the node, since metrics-server exposes both /pods and /nodes endpoints.

Choosing the compute resources of the nodes in the cluster with horizontal scaling

Horizontal scaling means that we scale by adding more machines into the pool of resources. Still, there is a choice of how much power (CPU, RAM) each node in the cluster will have.
When cluster managed with Kubernetes it is extremely easy to set any CPU and memory limit for Pods. How to choose the optimal CPU and memory size for cluster nodes (or Pods in Kubernetes)?
For example, there are 3 nodes in a cluster with 1 vCPU and 1GB RAM each. To handle more load there are 2 options:
Add the 4th node with 1 vCPU and 1GB RAM
Add to each of the 3 nodes more power (e.g. 2 vCPU and 2GB RAM)
A straightforward solution is to calculate the throughput and cost of each option and choose the cheaper one. Are there any more advanced approaches for choosing the compute resources of the nodes in a cluster with horizontal scalability?
For this particular example I would go for 2x vCPU instead of another 1vCPU node, but that is mainly cause I believe running OS for anything serious on a single vCPU is just wrong. System to behave decently needs 2+ cores available, otherwise it's too easy to overwhelm that one vCPU and send the node into dust. There is no ideal algorithm for this though. It will depend on your budget, on characteristics of your workloads etc.
As a rule of thumb, don't stick to too small instances as you have a bunch of stuff that has to run on them always, regardless of their size and the more node, the more overhead. 3x 4vCpu+16/32GB RAM sounds like nice plan for starters, but again... it depends on what you want, need and can afford.
The answer is related to such performance metrics as latency and throughput:
Latency is a time interval between sending request and receiving response.
Throughput is a request processing rate (requests per second).
Latency has influence on throughput: bigger latency = less throughput.
If a business transaction consists of multiple sequential calls of the services that can't be parallelized, then compute resources (CPU and memory) has to be chosen based on the desired latency value. Adding more instances of the services (horizontal scaling) will not have any positive influence on the latency in this case.
Adding more instances of the service increases throughput allowing to process more requests in parallel (if there are no bottlenecks).
In other words, allocate CPU and memory resources so that service has desired response time and add more service instances (scale horizontally) to handle more requests in parallel.