I'm looking for a way to find:
The current usage of CPU and RAM of each pod running.
The configured CPU and RAM of each pod.
One side is to identify the resource usage, and the other is to identify if it was patched manually or via the deploy YAML.
You can deploy metrics-server on your cluster to get resource usage:
Metrics Server is a scalable, efficient source of container resource
metrics for Kubernetes built-in autoscaling pipelines [...] Metrics API can also be accessed by kubectl top [...]
Then you can use kubectl top to view current resource usage, e.g.:
$ kubectl top pods --all-namespaces
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system coredns-74ff55c5b-vgfzw 5m 13Mi
kube-system etcd-minikube 32m 46Mi
kube-system ingress-nginx-controller-65cf89dc4f-crrr9 6m 204Mi
kube-system kube-apiserver-minikube 99m 295Mi
kube-system kube-controller-manager-minikube 32m 53Mi
kube-system kube-proxy-9mfb9 0m 23Mi
kube-system kube-scheduler-minikube 4m 17Mi
kube-system metrics-server-56c4f8c9d6-48rdd 1m 12Mi
kube-system storage-provisioner 2m 9Mi
You can run kubectl describe nodes to get an overview of the requests/limits configured for the pods running on each node, e.g.:
Non-terminated Pods: (13 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default my-nginx-5b56ccd65f-txkfg 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m48s
default my-nginx-5b56ccd65f-wkhms 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m48s
kube-system coredns-74ff55c5b-vgfzw 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 4d
kube-system etcd-minikube 100m (0%) 0 (0%) 100Mi (0%) 0 (0%) 17h
kube-system ingress-nginx-controller-65cf89dc4f-crrr9 100m (0%) 0 (0%) 90Mi (0%) 0 (0%) 3d23h
kube-system kube-apiserver-minikube 250m (2%) 0 (0%) 0 (0%) 0 (0%) 17h
kube-system kube-controller-manager-minikube 200m (1%) 0 (0%) 0 (0%) 0 (0%) 4d
kube-system kube-proxy-9mfb9 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d
kube-system kube-scheduler-minikube 100m (0%) 0 (0%) 0 (0%) 0 (0%) 4d
kube-system metrics-server-56c4f8c9d6-48rdd 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4m18s
kube-system my-nginx-5b56ccd65f-96n7v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d23h
kube-system my-nginx-5b56ccd65f-sm7w5 0 (0%) 0 (0%) 0 (0%) 0 (0%) 3d23h
kube-system storage-provisioner 0 (0%) 0 (0%) 0 (0%) 0 (0%) 4d
0 means no request/limit is defined.
The first part of your question is answered by the kubectl top command.
For the second part, you set the requested (guaranteed minimum) CPU and memory and the maximum (limit) CPU and memory under resources in the pod spec:
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
        memory: "400Mi"
      requests:
        cpu: "0.5"
        memory: "200Mi"
There is a guide on this in the Kubernetes documentation.
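To read the configured values back for every pod, one option is to parse the JSON that kubectl get pods --all-namespaces -o json prints. The following is a minimal sketch, not the answerer's method: the function name configured_resources and the inline SAMPLE document are made up for illustration, and only the standard Pod fields (metadata, spec.containers[].resources) are assumed.

```python
import json


def configured_resources(pod_list: dict) -> list:
    """Return (namespace, pod, container, requests, limits) for every container."""
    rows = []
    for pod in pod_list.get("items", []):
        meta = pod["metadata"]
        for ctr in pod["spec"]["containers"]:
            # "resources" may be missing or empty when nothing is configured.
            res = ctr.get("resources") or {}
            rows.append((
                meta.get("namespace", "default"),
                meta["name"],
                ctr["name"],
                res.get("requests") or {},
                res.get("limits") or {},
            ))
    return rows


# Hypothetical sample mirroring the pod spec above; real input would come from
# `kubectl get pods --all-namespaces -o json`.
SAMPLE = json.loads("""
{
  "items": [
    {
      "metadata": {"name": "cpu-demo", "namespace": "cpu-example"},
      "spec": {
        "containers": [
          {
            "name": "cpu-demo-ctr",
            "image": "vish/stress",
            "resources": {
              "limits": {"cpu": "1", "memory": "400Mi"},
              "requests": {"cpu": "0.5", "memory": "200Mi"}
            }
          }
        ]
      }
    }
  ]
}
""")

for row in configured_resources(SAMPLE):
    print(row)
```

Containers that print empty dictionaries for both requests and limits are the ones that show up as 0 (0%) in kubectl describe nodes.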
I'm using an AKS cluster running with K8s v1.16.15.
I'm following this simple example to assign some cpu to a pod and it does not work.
https://kubernetes.io/docs/tasks/configure-pod-container/assign-cpu-resource/
After applying this yaml file for the request,
apiVersion: v1
kind: Pod
metadata:
  name: cpu-demo
  namespace: cpu-example
spec:
  containers:
  - name: cpu-demo-ctr
    image: vish/stress
    resources:
      limits:
        cpu: "1"
      requests:
        cpu: "0.5"
    args:
    - -cpus
    - "2"
If I run kubectl describe pod... I get the following:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler 0/1 nodes are available: 1 Insufficient cpu.
But CPU seems to be available; if I run kubectl top nodes, I get:
CPU(cores) CPU% MEMORY(bytes) MEMORY%
702m 36% 4587Mi 100%
Maybe it is related to some AKS configuration, but I can't figure it out.
Do you have an idea of what is happening?
Thanks a lot in advance!!
Kubernetes decides where a pod can be scheduled based on each node's allocatable resources, not on actual resource usage. You can see a node's allocatable resources with kubectl describe node <your node name>. Refer to Capacity and Allocatable for more details. The event log, 0/1 nodes are available: 1 Insufficient cpu., shows that you have just one worker node and that it does not have enough unrequested CPU left to run your pod with requests.cpu: "0.5". Pod scheduling is based on the size of the requests, not the limits.
The previous answer explains well why this can happen. What can be added is that when scheduling pods that have requests, you have to be aware of the resources your other cluster objects consume. System objects also use your resources, and even on a small cluster you may have enabled an addon that consumes node resources.
So your node has a certain amount of CPU and memory it can allocate to pods. When scheduling, the scheduler only considers nodes with enough unallocated resources to meet your requests.
If the amount of unallocated CPU or memory is less than what the pod requests, Kubernetes will not schedule the pod to that node, because the node can’t provide the minimum amount required by the pod.
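The rule described above can be sketched in a few lines. This is only an illustration of the arithmetic, not the scheduler's actual code; the function name fits and the node figures are hypothetical, and only the 500m request comes from the question (requests.cpu: "0.5").

```python
def fits(allocatable_m: int, already_requested_m: int, pod_request_m: int) -> bool:
    """True if the node's unrequested CPU (in millicores) covers the pod's request.

    Actual usage, as reported by `kubectl top nodes`, plays no part in this check.
    """
    unallocated_m = allocatable_m - already_requested_m
    return unallocated_m >= pod_request_m


POD_REQUEST_M = 500  # requests.cpu: "0.5" from the question

# Hypothetical node: 1900m allocatable, 1500m already requested by other pods.
print(fits(1900, 1500, POD_REQUEST_M))  # False: only 400m is unrequested

# Same hypothetical node with 1000m requested by other pods.
print(fits(1900, 1000, POD_REQUEST_M))  # True: 900m is unrequested
```

A node can therefore report low CPU% in kubectl top nodes and still reject a pod, because the comparison is made against requests that are already reserved.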
If you describe your node you will see the pods that are already running and consuming your resources and all allocated resources:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
default elasticsearch-master-0 1 (25%) 1 (25%) 2Gi (13%) 4Gi (27%) 8d
default test-5487d9b57b-4pz8v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 27d
kube-system coredns-66bff467f8-rhbnj 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 35d
kube-system etcd-minikube 0 (0%) 0 (0%) 0 (0%) 0 (0%) 16d
kube-system httpecho 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34d
kube-system ingress-nginx-controller-69ccf5d9d8-rbdf8 100m (2%) 0 (0%) 90Mi (0%) 0 (0%) 34d
kube-system kube-apiserver-minikube 250m (6%) 0 (0%) 0 (0%) 0 (0%) 16d
kube-system kube-controller-manager-minikube 200m (5%) 0 (0%) 0 (0%) 0 (0%) 35d
kube-system kube-scheduler-minikube 100m (2%) 0 (0%) 0 (0%) 0 (0%) 35d
kube-system traefik-ingress-controller-78b4959fdf-8kp5k 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34d
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1750m (43%) 1 (25%)
memory 2208Mi (14%) 4266Mi (28%)
ephemeral-storage 0 (0%) 0 (0%)
hugepages-1Gi 0 (0%) 0 (0%)
hugepages-2Mi 0 (0%) 0 (0%)
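As a cross-check, the cpu total under Allocated resources is just the sum of the CPU Requests column. The sketch below re-adds the rows printed above; the helper names are made up for illustration, and the parsing assumes the whitespace-separated layout shown in this output.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '100m', '1' or '0.5' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


# Rows copied from the `kubectl describe node` output above.
ROWS = """\
default elasticsearch-master-0 1 (25%) 1 (25%) 2Gi (13%) 4Gi (27%) 8d
default test-5487d9b57b-4pz8v 0 (0%) 0 (0%) 0 (0%) 0 (0%) 27d
kube-system coredns-66bff467f8-rhbnj 100m (2%) 0 (0%) 70Mi (0%) 170Mi (1%) 35d
kube-system etcd-minikube 0 (0%) 0 (0%) 0 (0%) 0 (0%) 16d
kube-system httpecho 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34d
kube-system ingress-nginx-controller-69ccf5d9d8-rbdf8 100m (2%) 0 (0%) 90Mi (0%) 0 (0%) 34d
kube-system kube-apiserver-minikube 250m (6%) 0 (0%) 0 (0%) 0 (0%) 16d
kube-system kube-controller-manager-minikube 200m (5%) 0 (0%) 0 (0%) 0 (0%) 35d
kube-system kube-scheduler-minikube 100m (2%) 0 (0%) 0 (0%) 0 (0%) 35d
kube-system traefik-ingress-controller-78b4959fdf-8kp5k 0 (0%) 0 (0%) 0 (0%) 0 (0%) 34d
"""


def total_cpu_requests(rows: str) -> int:
    """Sum the CPU Requests column (third whitespace-separated field) in millicores."""
    total = 0
    for line in rows.strip().splitlines():
        fields = line.split()
        total += cpu_to_millicores(fields[2])
    return total


print(total_cpu_requests(ROWS))  # 1750, matching "cpu 1750m (43%)"
```

Since 1 CPU is shown as 25%, the node has roughly 4000m allocatable, which leaves about 2250m for new requests.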
Now the most important part is what you can do about it:
You can enable autoscaling so that the system automatically provisions nodes and the extra resources needed. This of course assumes that you have run out of resources and need more.
You can provision an appropriate node yourself (depending on how you bootstrapped your cluster).
You can turn off any addon services you don't need that might be taking the resources you want.
Is there any way to list the pods that are using the most CPU on a node with a kubectl command? I could not find this in the official documentation.
You can get this with:
kubectl top pods # This will give you which pod is using how much CPU and Memory
kubectl top nodes # This will give you which node is using how much CPU and Memory
Make sure metrics-server is deployed on the cluster.
To find out which pod scheduled on a specific node has the highest CPU requests, describe that node and check the Non-terminated Pods section.
kubectl describe node masternode
Non-terminated Pods: (8 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-kube-controllers-76d4774d89-vmsnf 0 (0%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system calico-node-t4qzr 250m (12%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system coredns-66bff467f8-v9mn5 100m (5%) 0 (0%) 70Mi (1%) 170Mi (4%) 30d
kube-system etcd-ip-10-0-0-38 0 (0%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system kube-apiserver-ip-10-0-0-38 250m (12%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system kube-controller-manager-ip-10-0-0-38 200m (10%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system kube-proxy-nf7jp 0 (0%) 0 (0%) 0 (0%) 0 (0%) 30d
kube-system kube-scheduler-ip-10-0-0-38 100m (5%) 0 (0%) 0 (0%) 0 (0%) 30d
If the cluster has metrics-server deployed, the commands below show pod and node CPU utilization:
kubectl top pod <podname>
kubectl top node <nodename>
For nodes that have many pods across multiple namespaces, I use a shell function in .bash_profile. It outputs the CPU and memory for all pods on a given node.
kntp () {
  # List pods on node $1 (skipping the header and Completed pods), then show usage for each.
  for p in $(kubectl get pods --all-namespaces --field-selector spec.nodeName="$1" | grep -v "Completed" | tail -n +2 | awk '{print $2}'); do
    kubectl top pod --all-namespaces --field-selector metadata.name="$p" | tail -n +2
  done
}
Run it like
source ~/.bash_profile
kntp my-node-name-here
You can use:
kubectl top pods --all-namespaces --sort-by=cpu
To find the CPU and memory usage of all the pods among all available namespaces.
The CPU (cores) is the CPU usage:
338m means 338 millicpu. 1000m is equal to 1 CPU, hence 338m means 33.8% of 1 CPU.
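If the output has already been captured as text, the same ranking and the millicpu conversion can be reproduced offline. This is a rough sketch: the function names are made up, and the sample text is the kubectl top pods --all-namespaces output shown earlier in this thread collection, assumed to be whitespace-separated with one header line.

```python
def cpu_to_millicores(quantity: str) -> int:
    """Convert a CPU quantity such as '338m' or '1' to millicores."""
    if quantity.endswith("m"):
        return int(quantity[:-1])
    return round(float(quantity) * 1000)


TOP_OUTPUT = """\
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system coredns-74ff55c5b-vgfzw 5m 13Mi
kube-system etcd-minikube 32m 46Mi
kube-system ingress-nginx-controller-65cf89dc4f-crrr9 6m 204Mi
kube-system kube-apiserver-minikube 99m 295Mi
kube-system kube-controller-manager-minikube 32m 53Mi
kube-system kube-proxy-9mfb9 0m 23Mi
kube-system kube-scheduler-minikube 4m 17Mi
kube-system metrics-server-56c4f8c9d6-48rdd 1m 12Mi
kube-system storage-provisioner 2m 9Mi
"""


def top_cpu(output: str, n: int = 3) -> list:
    """Return the n pods with the highest CPU as (name, millicores, percent of one CPU)."""
    rows = []
    for line in output.strip().splitlines()[1:]:  # skip the header line
        namespace, name, cpu, memory = line.split()
        millicores = cpu_to_millicores(cpu)
        rows.append((name, millicores, millicores / 10))  # 1000m = 100% of one CPU
    # Stable sort: pods with equal CPU keep their original order.
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:n]


for name, millicores, percent in top_cpu(TOP_OUTPUT):
    print(f"{name}: {millicores}m = {percent}% of one CPU")
```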
I'm trying to schedule GPUs in Kubernetes v1.13.1 and I followed the guide at https://kubernetes.io/docs/tasks/manage-gpus/scheduling-gpus/#deploying-nvidia-gpu-device-plugin
But the GPU resources don't show up when I run
kubectl get nodes -o yaml. Following this post, I checked the NVIDIA GPU device plugin.
I ran:
kubectl create -f https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml
several times and the result is
Error from server (AlreadyExists): error when creating "https://raw.githubusercontent.com/NVIDIA/k8s-device-plugin/v1.11/nvidia-device-plugin.yml": daemonsets.extensions "nvidia-device-plugin-daemonset" already exists
It seems that I have installed the NVIDIA Device Plugin? But the result of kubectl get pods --all-namespaces is
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system calico-node-qdhvd 2/2 Running 0 65m
kube-system coredns-78d4cf999f-fk4wl 1/1 Running 0 68m
kube-system coredns-78d4cf999f-zgfvl 1/1 Running 0 68m
kube-system etcd-liuqin01 1/1 Running 0 67m
kube-system kube-apiserver-liuqin01 1/1 Running 0 67m
kube-system kube-controller-manager-liuqin01 1/1 Running 0 67m
kube-system kube-proxy-l8p9p 1/1 Running 0 68m
kube-system kube-scheduler-liuqin01 1/1 Running 0 67m
When I run kubectl describe node, the GPU is not listed among the allocatable resources:
Non-terminated Pods: (9 in total)
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits AGE
--------- ---- ------------ ---------- --------------- ------------- ---
kube-system calico-node-qdhvd 250m (2%) 0 (0%) 0 (0%) 0 (0%) 18h
kube-system coredns-78d4cf999f-fk4wl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system coredns-78d4cf999f-zgfvl 100m (0%) 0 (0%) 70Mi (0%) 170Mi (1%) 19h
kube-system etcd-liuqin01 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-apiserver-liuqin01 250m (2%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-controller-manager-liuqin01 200m (1%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-proxy-l8p9p 0 (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system kube-scheduler-liuqin01 100m (0%) 0 (0%) 0 (0%) 0 (0%) 19h
kube-system nvidia-device-plugin-daemonset-p78wz 0 (0%) 0 (0%) 0 (0%) 0 (0%) 26m
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
Resource Requests Limits
-------- -------- ------
cpu 1 (8%) 0 (0%)
memory 140Mi (0%) 340Mi (2%)
ephemeral-storage 0 (0%) 0 (0%)
As lianyouCat mentioned in the comments:
After installing nvidia-docker2, the default runtime of Docker should be changed to nvidia, as described at github.com/NVIDIA/k8s-device-plugin#preparing-your-gpu-nodes.
After modifying /etc/docker/daemon.json, you need to restart Docker for the configuration to take effect.
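A quick way to confirm the change took hold is to check the daemon configuration for the default runtime. The sketch below is an assumption-laden illustration: the function name is made up, the sample JSON follows the shape I recall from the NVIDIA device plugin README (including the runtime path, which may differ on your system), and only the default-runtime and runtimes keys are inspected.

```python
import json


def nvidia_is_default_runtime(daemon_config: dict) -> bool:
    """True if Docker's default runtime is 'nvidia' and that runtime is registered."""
    runtimes = daemon_config.get("runtimes") or {}
    return daemon_config.get("default-runtime") == "nvidia" and "nvidia" in runtimes


# Assumed example of /etc/docker/daemon.json after the change; the path value
# is illustrative and may differ per installation.
SAMPLE = json.loads("""
{
  "default-runtime": "nvidia",
  "runtimes": {
    "nvidia": {
      "path": "/usr/bin/nvidia-container-runtime",
      "runtimeArgs": []
    }
  }
}
""")

print(nvidia_is_default_runtime(SAMPLE))  # True
print(nvidia_is_default_runtime({}))      # False: nothing configured
```

On a real node the dictionary would come from json.load on /etc/docker/daemon.json rather than from the inline sample.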
I have multiple Node.js apps/services running on Google Kubernetes Engine (GKE); currently 8 pods are running. I did not set resource limits when I created the pods, so now I'm getting a CPU Unscheduled error.
I understand I have to set up resource limits. From what I know, 1 CPU / Node = 1000Mi? My question is:
1) What's the ideal resource limit I should set? Like the minimum? For a pod that's rarely used, can I set 20Mi? Or 50Mi?
2) How many pods are ideal to run on a single Kubernetes node? Right now I have 2 nodes set up, which I want to reduce to 1.
3) What do people use in production? And for a development cluster?
Here are my Nodes
Node 1:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default express-gateway-58dff8647-f2kft 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default openidconnect-57c48dc448-9jmbn 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default web-78d87bdb6b-4ldsv 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system event-exporter-v0.1.9-5c8fb98cdb-tcd68 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-v2.0.17-mhpgb 100m (10%) 0 (0%) 200Mi (7%) 300Mi (11%)
kube-system kube-dns-5df78f75cd-6hdfv 260m (27%) 0 (0%) 110Mi (4%) 170Mi (6%)
kube-system kube-dns-autoscaler-69c5cbdcdd-2v2dj 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-qp-cluster-default-pool-7b00cb40-6z79 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-7b89cff8-9xnsm 50m (5%) 100m (10%) 100Mi (3%) 300Mi (11%)
kube-system l7-default-backend-57856c5f55-k9wgh 10m (1%) 10m (1%) 20Mi (0%) 20Mi (0%)
kube-system metrics-server-v0.2.1-7f8dd98c8f-5z5zd 53m (5%) 148m (15%) 154Mi (5%) 404Mi (15%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
893m (95%) 258m (27%) 594Mi (22%) 1194Mi (45%)
Node 2:
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default kube-healthcheck-55bf58578d-p2tn6 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default pubsub-function-675585cfbf-2qgmh 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default servicing-84787cfc75-kdbzf 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-v2.0.17-ptnlg 100m (10%) 0 (0%) 200Mi (7%) 300Mi (11%)
kube-system heapster-v1.5.2-7dbb64c4f9-bpc48 138m (14%) 138m (14%) 301656Ki (11%) 301656Ki (11%)
kube-system kube-dns-5df78f75cd-89c5b 260m (27%) 0 (0%) 110Mi (4%) 170Mi (6%)
kube-system kube-proxy-gke-qp-cluster-default-pool-7b00cb40-9n92 100m (10%) 0 (0%) 0 (0%) 0 (0%)
Allocated resources:
(Total limits may be over 100 percent, i.e., overcommitted.)
CPU Requests CPU Limits Memory Requests Memory Limits
------------ ---------- --------------- -------------
898m (95%) 138m (14%) 619096Ki (22%) 782936Ki (28%)
My plan is to move all this into 1 Node.
According to the official Kubernetes documentation:
1) You can go low on memory and CPU, but you need to give pods enough of both to function properly. I have gone as low as CPU 100 and memory 200 (it depends heavily on the application you're running and on the number of replicas).
2) There should be no more than 100 pods per node (and that is the extreme case).
3) A production cluster should never be a single node. This is a very good read about Kubernetes in production.
But keep in mind that if you increase the number of pods on a single node, you might need to increase the size (in terms of resources) of the node.
Memory and CPU usage tend to grow proportionally with the size of and load on the cluster.
Here is the official documentation stating the requirements
https://kubernetes.io/docs/setup/cluster-large/
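Whether everything fits on one node can be estimated from the Allocated resources lines in the question. The following is a back-of-the-envelope sketch, not an exact capacity plan: it assumes the single node would be the same size as the current ones, and that fluentd-gcp and kube-proxy run once per node and so would not be duplicated.

```python
# CPU requests from the question's "Allocated resources" lines, in millicores.
NODE1_REQUESTS_M = 893  # shown as 95%
NODE2_REQUESTS_M = 898  # shown as 95%

# 893m is reported as 95%, so allocatable CPU per node is roughly 940m.
allocatable_m = round(NODE1_REQUESTS_M / 0.95)

# Assumption: fluentd-gcp (100m) and kube-proxy (100m) run once per node,
# so merging two nodes drops one copy of each.
per_node_daemons_m = 100 + 100

combined_m = NODE1_REQUESTS_M + NODE2_REQUESTS_M
needed_on_one_node_m = combined_m - per_node_daemons_m

print(allocatable_m)                          # 940
print(needed_on_one_node_m)                   # 1591
print(needed_on_one_node_m <= allocatable_m)  # False
```

Under these assumptions the existing requests alone exceed what one node of the current size can reserve, so consolidating would need a larger node or smaller requests.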
How do I interpret the memory usage returned by "kubectl top node"? E.g. if it returns:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
ip-XXX.ec2.internal 222m 11% 3237Mi 41%
ip-YYY.ec2.internal 91m 9% 2217Mi 60%
By comparison, if I look in the Kubernetes dashboard for the same node, I get:
Memory Requests: 410M / 7.799 Gi
How do I reconcile the difference?
kubectl top node reflects the actual usage of the VMs (nodes), while the k8s dashboard shows the percentage of the requests/limits you configured.
E.g. your EC2 instance has 8G of memory and you actually use 3237MB, so it's 41%. In k8s, you only request 410MB (5.13%) and have a limit of 470MB of memory. This doesn't mean you consume only 5.13% of memory; it is the amount configured.
Namespace Name CPU Requests CPU Limits Memory Requests Memory Limits
--------- ---- ------------ ---------- --------------- -------------
default kube-lego 20m (2%) 0 (0%) 0 (0%) 0 (0%)
default mongo-0 100m (10%) 0 (0%) 0 (0%) 0 (0%)
default web 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system event-exporter- 0 (0%) 0 (0%) 0 (0%) 0 (0%)
kube-system fluentd-gcp-v2.0-z6xh9 100m (10%) 0 (0%) 200Mi (11%) 300Mi (17%)
kube-system heapster-v1.4.0-3405140848-k6cm9 138m (13%) 138m (13%) 301456Ki (17%) 301456Ki (17%)
kube-system kube-dns-3809445927-hn5xk 260m (26%) 0 (0%) 110Mi (6%) 170Mi (9%)
kube-system kube-dns-autoscaler-38801 20m (2%) 0 (0%) 10Mi (0%) 0 (0%)
kube-system kube-proxy-gke-staging-default- 100m (10%) 0 (0%) 0 (0%) 0 (0%)
kube-system kubernetes-dashboard-1962351 100m (10%) 100m (10%) 100Mi (5%) 300Mi (17%)
kube-system l7-default-backend-295440977 10m (1%) 10m (1%) 20Mi (1%) 20Mi (1%)
Here you see many pods with a request/limit of 0, which means unlimited. They are not counted in the k8s dashboard but definitely consume memory.
Sum up the memory requests/limits and you'll find they match the k8s dashboard.
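The two percentages quoted in this thread can be reproduced with simple arithmetic. This sketch assumes the dashboard's 410M is treated as MiB against the 7.799 Gi total, which is what the 5.13% figure implies; the variable names are illustrative.

```python
MIB_PER_GIB = 1024

# Dashboard view: configured memory requests against node memory.
requested_mib = 410        # "Memory Requests: 410M" (assumed to be MiB here)
node_memory_gib = 7.799    # "/ 7.799 Gi"
requested_pct = requested_mib / (node_memory_gib * MIB_PER_GIB) * 100

# kubectl top node view: actual usage and the percentage it reports.
used_mib = 3237
used_pct = 41
implied_total_mib = used_mib / (used_pct / 100)

print(round(requested_pct, 2))   # 5.13
print(round(implied_total_mib))  # 7895
```

Both views are measured against roughly the same node memory; one counts what pods have reserved, the other what is actually in use.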