ECS shows metrics at the cluster level and at the service level. At the service level, the Memory utilization is showing above 100%. How is that possible?
Check the CPU and Memory settings of the tasks in the service. At the service level, ECS calculates memory utilization as the memory used by the service's tasks divided by the memory reserved for them. If your task definitions set only a soft limit (memoryReservation), containers are allowed to use more memory than they reserved, up to the hard limit or the instance's available memory, so service-level utilization can go above 100%.
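If you want to check this from the CLI (the task family name below is a placeholder), compare each container's hard limit (memory) with its soft limit (memoryReservation):

    # List each container's hard memory limit and soft reservation
    # in a task definition. "my-task" is a placeholder family name.
    aws ecs describe-task-definition \
      --task-definition my-task \
      --query 'taskDefinition.containerDefinitions[].{name:name,memory:memory,memoryReservation:memoryReservation}'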
My AKS cluster always has low utilisation: I can see it using 20-30% of CPU/memory, yet it keeps extra nodes around and never scales the node count down. It's running an Airflow instance that can spike utilisation to 60% in short bursts when running DAGs.
How do I make sure it doesn't keep extra nodes, so we can manage costs effectively?
The following link should be helpful: https://learn.microsoft.com/en-us/azure/aks/cluster-autoscaler
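As a sketch (the resource group and cluster names are placeholders), you can make the cluster autoscaler remove underutilised nodes more aggressively by tuning its profile:

    # Consider nodes for removal sooner and at a higher utilisation
    # threshold. "myResourceGroup" and "myAKSCluster" are placeholders.
    az aks update \
      --resource-group myResourceGroup \
      --name myAKSCluster \
      --cluster-autoscaler-profile scale-down-unneeded-time=5m scale-down-utilization-threshold=0.6

Whether those exact values fit depends on how long your Airflow DAG bursts last.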
We are running a Spring Boot application on Rancher Kubernetes. The Kubernetes pod ran out of threads but never breached its CPU or memory limits (staying below 80%). Since the CPU and memory limits were never breached, the HPA never kicked in, as it is tied to the CPU resource limits. The Rancher pod became unresponsive and never recovered. Are there any resource settings for thread pools to avoid this failure in the future?
If your application suffers from thread starvation before the HPA kicks in, increase the number of web server threads by setting server.tomcat.threads.max (the default is 200).
Alternatively, decrease the resource allocation of your application in the K8s manifest, so that CPU utilization crosses the HPA threshold sooner.
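As a minimal sketch (the deployment name is a placeholder), you can raise the thread cap without rebuilding the image, since Spring Boot's relaxed binding maps the environment variable SERVER_TOMCAT_THREADS_MAX to server.tomcat.threads.max:

    # Raise the Tomcat worker thread cap via an environment variable;
    # "my-app" is a placeholder deployment name.
    kubectl set env deployment/my-app SERVER_TOMCAT_THREADS_MAX=400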
I'm trying to figure out why the CPU usage of a GKE "Workload" is not equivalent to the sum of the CPU usage of its pods.
The following screenshot shows the Workload CPU usage:
[Screenshot: Service Workload CPU Usage]
The following screenshots show the pods' CPU usage for the above Workload:
[Screenshot: Pod #1 CPU Usage]
[Screenshot: Pod #2 CPU Usage]
For example, at 9:45 the Workload CPU usage was around 3.7 cores, but at the same time Pod #1's CPU usage was around 0.9 cores and Pod #2's was around 0.9 cores too. That means the Workload CPU usage should have been around 1.8 cores, but it wasn't.
Does anyone have an idea why this happens?
Thanks.
On your VM, the node managed by Kubernetes, you have the pods you deployed (and manage), but also several services that run on it for supervision, management, log ingestion, and so on. There is a basic description here.
You can see all these basic services by running kubectl get all --namespace kube-system.
If you have installed additional components, like Istio or Knative, you have additional services and namespaces. All of these take a share of the node's resources.
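Assuming metrics-server is installed in the cluster, you can also see how much these system pods actually consume:

    # Per-pod CPU and memory usage in the kube-system namespace
    # (requires metrics-server).
    kubectl top pods --namespace kube-system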
Danny,
The CPU chart on the Workloads page is an aggregate of CPU usage for the managed pods. The values are taken from the Stackdriver Monitoring metric container/cpu/usage_time; see this link. That metric represents "Cumulative CPU usage on all cores in seconds. This number divided by the elapsed time represents usage as a number of cores, regardless of any core limit that might be set."
Please let me know if you have further questions in regard to this.
I suspect this is a bug in the UI. There is no actual metric for deployment CPU usage; Stackdriver Monitoring only collects container-, pod-, and node-level metrics, so the only truly reliable metrics in this case are the ones for pod CPU usage.
The graph of total deployment CPU usage is likely meant to be the sum of all the pod metrics, calculated and then presented to you. It is not as reliable as the pod or container metrics, since it is not a direct metric.
If you are seeing this discrepancy consistently, I recommend opening a UI bug report through the Google Public Issue Tracker to report this to the GCP Engineers.
I've set up Prometheus and Grafana for monitoring my Kubernetes cluster.
The cluster has 3 nodes.
I have 26 pods running (mostly in the monitoring namespace).
I have one major Node app (a deployment) running, and right now there isn't any load on it.
I'm trying to understand these graph metrics. However, I can't understand why the CPU core usage is so high despite there being no load on the app.
Here's a Grafana screenshot:
The 24% memory usage I can understand, as there are Kubernetes processes running as well, such as the kube-system components.
It's also telling me my cluster can support 330 pods (currently 26). I'm only worried about the high CPU core usage. Can anybody explain it?
82% is not the CPU usage of the processes but the ratio of requested to allocatable resources (2.31 / 2.82 = 0.819, i.e. ~82%).
This means that out of your 2.82 allocatable CPUs, about 82% have been requested (allocated) for pods, mostly in the monitoring namespace, but that does not mean they actually use that much CPU.
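You can confirm this on the node itself; kubectl describe prints an "Allocated resources" section with total requests as a share of allocatable capacity:

    # The "Allocated resources" section lists CPU/memory requests,
    # both as absolute values and as a percentage of allocatable.
    kubectl describe nodes | grep -A 8 "Allocated resources"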
To see actual CPU usage, look at metrics like container_cpu_usage_seconds_total (per-container CPU usage) or even process_cpu_seconds_total (per-process CPU usage).
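As a quick sketch (the Prometheus address is a placeholder), you can query actual usage in cores through the Prometheus HTTP API:

    # Average CPU cores consumed over the last 5 minutes, per namespace.
    # "prometheus:9090" is a placeholder for your Prometheus endpoint.
    curl 'http://prometheus:9090/api/v1/query' \
      --data-urlencode 'query=sum by (namespace) (rate(container_cpu_usage_seconds_total[5m]))'

Comparing this with the requested amounts on the dashboard shows how much headroom you actually have.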
I have to set resource limits for my Kubernetes apps, and those use the "millicore" unit, "m".
When analyzing my apps in Datadog, I see a unit called M% for CPU usage.
How do I convert 1.5M% to m?
Kubernetes resources: http://kubernetes.io/docs/user-guide/compute-resources/
This is not the correct graph for finding the right resource limit. Your graph shows the CPU usage of your app across the whole cluster, but a resource limit applies per pod (container). We (and you as well) can't tell from the graph how many containers were up and running. You can determine the right CPU limit from the per-container CPU usage graph(s). You will need the Datadog-Docker integration:
Please be aware that Kubernetes relies on Heapster to report metrics, rather than the cgroup file directly. The collection interval for Heapster is unknown, which can lead to inaccurate time-related data, such as CPU usage. If you require more precise metrics, we recommend using the Datadog-Docker Integration.
Then it depends on how Datadog measures CPU utilization per container. If container CPU utilization tops out at 100%, then 100% container CPU utilization corresponds to 1000m, i.e. 1 full core.
I recommend reading up on how and when cgroups limit CPU: https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Resource_Management_Guide/sec-cpu.html
You will need deep knowledge to set proper CPU limits. If you don't need to prioritize any container, then IMHO the best practice is to set resources.requests.cpu to 1 for all your containers; they will then always get equal CPU time.
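A minimal sketch of that suggestion (the deployment name and image are placeholders), giving the container a request of one full core:

    # Deployment whose container requests 1 CPU (equivalent to 1000m).
    # "my-app" and its image are placeholders.
    kubectl apply -f - <<EOF
    apiVersion: apps/v1
    kind: Deployment
    metadata:
      name: my-app
    spec:
      replicas: 1
      selector:
        matchLabels:
          app: my-app
      template:
        metadata:
          labels:
            app: my-app
        spec:
          containers:
          - name: my-app
            image: my-app:latest
            resources:
              requests:
                cpu: "1"
    EOF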