I'm currently trying to benchmark different service deployments in a k8s cluster.
It's all here, but I'll save you the digging.
The service itself is ultra simple, as is the deployment (basically an HTTP GET), and I added an HPA, which works well.
The bench is running inside the same cluster, on a specific node.
The bench runs ok, everything seems to be working as expected.
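For reference, an HPA of this kind can be created along these lines (the CPU target and pod bounds below are placeholders, not necessarily the exact values used):
kubectl autoscale deployment go-induzo-deployment --cpu-percent=50 --min=1 --max=10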
Here is an extract from one of the results:
NAME CPU(cores) MEMORY(bytes)
bencher-deployment-cf89ddc67-bwwgb 197m 4Mi
go-induzo-deployment-d6cbc56c6-97j62 1m 7Mi
go-induzo-deployment-d6cbc56c6-c2w24 0m 7Mi
go-induzo-deployment-d6cbc56c6-jh768 0m 7Mi
go-induzo-deployment-d6cbc56c6-mfdhb 0m 6Mi
go-induzo-deployment-d6cbc56c6-mh6mt 820m 11Mi
go-induzo-deployment-d6cbc56c6-pktn4 939m 11Mi
go-induzo-deployment-d6cbc56c6-vdjjj 1m 5Mi
go-induzo-deployment-d6cbc56c6-x64jw 893m 11Mi
go-induzo-deployment-d6cbc56c6-zhsp7 0m 5Mi
go-induzo-deployment-d6cbc56c6-zvf9m 0m 5Mi
As you can see, the HPA was triggered and scaled the deployment to 10 pods.
But as you can also see, the load is only balanced across 3 pods; the other ones are not used. It seems it can only use one pod per node, and not the others.
Is there anything I have forgotten? Is this expected? Do I need to add a load balancer service to actually leverage all the pods?
I forgot to make sure there was no keep-alive in the benchmark tool.
In the case of wrk, which I'm using, that means adding the header: -H "Connection: Close"
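For example, a wrk run without keep-alive could look like this (the thread/connection counts, duration and URL are placeholders):
wrk -t4 -c64 -d30s -H "Connection: Close" http://<service-url>/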
Version
k8s version: v1.19.0
metrics server: v0.3.6
I set up a k8s cluster and the metrics server. It can report metrics for the nodes and for pods on the master node, but for pods on the worker node it cannot and returns unknown.
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
u-29 1160m 14% 37307Mi 58%
u-31 2755m 22% 51647Mi 80%
u-32 4661m 38% 32208Mi 50%
u-34 1514m 12% 41083Mi 63%
u-36 1570m 13% 40400Mi 62%
When the pod is running on the worker node, it returns: unable to fetch pod metrics for pod default/nginx-7764dc5cf4-c2sbq: no metrics known for pod
When the pod is running on the master node, it can return CPU and memory:
NAME CPU(cores) MEMORY(bytes)
nginx-7cdd6c99b8-6pfg2 0m 2Mi
This is a community wiki answer based on OP's comment posted for better visibility. Feel free to expand it.
The issue was caused by using different versions of Docker on different nodes. After upgrading Docker to v19.3 on both nodes and executing kubeadm reset, the issue was resolved.
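If you want to verify this on your own nodes, the Docker server version can be checked on each node, for example:
docker version --format '{{.Server.Version}}'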
Generally the metrics server receives metrics via the kubelet.
Maybe there is a problem retrieving the information from it.
You will need to look at the following configuration flags mentioned in the readme.
Configuration
Depending on your cluster setup, you may also need to change flags passed to the Metrics Server container. Most useful flags:
--kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
--kubelet-insecure-tls - Do not verify the CA of serving certificates presented by Kubelets. For testing purposes only.
--requestheader-client-ca-file - Specify a root certificate bundle for verifying client certificates on incoming requests.
Maybe you can try the configuration changes below:
--kubelet-preferred-address-types=InternalIP
--kubelet-insecure-tls
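As a rough sketch, these flags go into the args of the metrics-server container in its Deployment in kube-system (the container name and existing args may differ in your manifest):
kubectl -n kube-system edit deployment metrics-server
# then, under the metrics-server container spec:
      args:
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls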
You might also refer to this ticket for more information.
I apologize for my poor English.
I created 1 master node and 1 worker node in a cluster, and deployed a container (replicas: 4).
Then kubectl get all shows the following (output abbreviated):
NAME NODE
pod/container1 k8s-worker-1.local
pod/container2 k8s-worker-1.local
pod/container3 k8s-worker-1.local
pod/container4 k8s-worker-1.local
Next, I added 1 worker node to this cluster, but all containers remain deployed on worker1.
Ideally, I would like 2 containers to stop and start up on worker2, like this:
NAME NODE
pod/container1 k8s-worker-1.local
pod/container2 k8s-worker-1.local
pod/container3 k8s-worker-2.local
pod/container4 k8s-worker-2.local
Do I need to run some commands after adding the additional node?
Scheduling only happens when a pod is started. After that, it won't be moved. There are tools out there for deleting (evicting) pods when nodes get too imbalanced, but if you're just starting out I wouldn't go that far for now. If you delete your 4 pods and recreate them (or let the Deployment system recreate them as is more common in a real situation) they should end up more balanced (though possibly not 2 and 2 since the system isn't exact and spreading out is only one of the factors used in scheduling).
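If the pods are managed by a Deployment, a simple way to trigger that recreation is the following (the deployment name is a placeholder):
kubectl rollout restart deployment <your-deployment>
# or, for pods not managed by a Deployment, delete and recreate them manually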
I want to calculate the CPU and memory percentage utilization of an individual pod in Kubernetes. For that, I am using the metrics server API.
From the metrics server, I get the utilization with this command:
kubectl top pods --all-namespaces
NAMESPACE NAME CPU(cores) MEMORY(bytes)
kube-system coredns-5644d7b6d9-9whxx 2m 6Mi
kube-system coredns-5644d7b6d9-hzgjc 2m 7Mi
kube-system etcd-manhattan-master 10m 53Mi
kube-system kube-apiserver-manhattan-master 23m 257Mi
But I want the percentage utilization of an individual pod, both CPU% and MEM%.
From this output of the top command, it is not clear out of how much total CPU and memory these amounts are consumed.
I don't want to use the Prometheus operator. I saw one formula for it:
sum (rate (container_cpu_usage_seconds_total{image!=""}[1m])) by (pod_name)
Can I calculate it with the metrics server API?
I thought of calculating it like this:
CPU% = ((2+2+10+23) / Total CPU millicores) * 100
MEM% = ((6+7+53+257) / AllocatableMemory) * 100
Please tell me if I am right or wrong, because I didn't see any standard formula for calculating pod utilization in the Kubernetes documentation.
Unfortunately, kubectl top pods provides only quantity values and not percentages.
Here is a good explanation of how to interpret those values.
It is currently not possible to list pod resource usage in percentages with a kubectl top command.
You could still choose Grafana with Prometheus, but you already stated that you don't want to use it (however, maybe another member of the community with a similar problem would, so I am mentioning it here).
EDIT:
Your formulas are correct. They will calculate how much CPU/memory is being consumed by all those Pods relative to the total CPU/memory you have.
I hope it helps.
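As a rough illustration of those formulas using the kube-system extract above, and assuming a node with 4000m of allocatable CPU and 16384Mi of allocatable memory (example values only):
CPU% = ((2 + 2 + 10 + 23) / 4000) * 100 ≈ 0.93%
MEM% = ((6 + 7 + 53 + 257) / 16384) * 100 ≈ 1.97%
The allocatable CPU/memory of a node can be read with (the node name is a placeholder):
kubectl get node <node-name> -o jsonpath='{.status.allocatable.cpu}{" "}{.status.allocatable.memory}{"\n"}'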
I am trying to get the CPU/memory usage of the k8s cluster nodes via the metrics-server API, but I found that the values returned by metrics-server are lower than the actual CPU/memory used.
The output of the kubectl top command: kubectl top nodes
The following is the output of the free command, from which you can see the memory usage is greater than 90%.
Why is the difference so high?
kubectl top nodes reflects the actual usage of your Kubernetes nodes.
For example:
your node has 60GB of memory and you actually use 30GB, so it will show 50% usage.
But you can, for example, request 100MB and set a memory limit of 200MB.
This doesn't mean you only consume 0.16% (100 / 60000) of the memory; those values are only your configuration, not the actual usage.
I know this is an old topic, but I think the problem still remains.
To answer simply, the kubectl top command shows ONLY the actual resource usage, and it is not related to the request/limit configurations in your manifests.
For example:
you could observe a usage of 400m / 1Gi (CPU/memory) for a specific node while the total requests/limits are 1.5 / 4Gi (CPU/memory).
You might then see enough available resources to schedule more pods, but in practice scheduling will not work, because it is based on requests rather than on actual usage.
Requests/limits directly impact node resources (resource reservation), but that does not mean they are completely used (which is what kubectl top nodes shows).
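To see both views side by side for a node (the node name is a placeholder):
# actual usage reported via the metrics-server
kubectl top node <node-name>
# requests/limits reserved by the scheduler on that node
kubectl describe node <node-name> | grep -A 8 "Allocated resources"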
I am trying to add swap space on a Kubernetes node to prevent out-of-memory issues. Is it possible to add swap space on a node (previously known as a minion)? If so, what procedure should I follow, and how does it affect the pod acceptance test?
Kubernetes doesn't support container memory swap. Even if you add swap space, kubelet will create the container with --memory-swappiness=0 (when using Docker). There have been discussions about adding support, but the proposal was not approved. https://github.com/kubernetes/kubernetes/issues/7294
Technically you can do it.
There is a broad discussion about whether to give K8S users the privilege of deciding to enable swap or not.
I'll first address your question directly and then continue with the discussion.
If you run K8S with kubeadm and you've added swap to your nodes - follow the steps below:
1 ) Reset the current cluster setup and then add the fail-swap-on=false flag to the kubelet configuration:
kubeadm reset
echo 'Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"' >> /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
(*) If you're running on Ubuntu, replace the path for the kubelet config from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf to /etc/default/kubelet.
2 ) Reload the service:
systemctl daemon-reload
systemctl restart kubelet
3 ) Initialize the cluster settings again and ignore the swap error:
kubeadm init --ignore-preflight-errors Swap
OR:
If you prefer working with kubeadm-config.yaml:
1 ) Add the failSwapOn flag:
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false # <---- Here
2 ) And run:
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=Swap
Returning to the discussion of whether to allow swapping or not.
On the one hand, K8S is very clear about this - the kubelet is not designed to support swap - you can see it mentioned in the kubeadm link I shared above:
Swap disabled. You MUST disable swap in order for the kubelet to work
properly
On the other hand, you can see users reporting that there are cases where their deployments require swap to be enabled.
I would suggest that you first try without enabling swap
(not because swap is something the kernel can't manage, but merely because it is not recommended by Kube - probably related to the design of the kubelet).
Make sure that you are familiar with the features that K8S provides to prioritize the memory of pods:
1 ) The 3 QoS classes - make sure that your high-priority workloads are running with the Guaranteed (or at least Burstable) class (see the sketch after this list).
2 ) Pod Priority and Preemption.
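A minimal sketch of a pod spec that lands in the Guaranteed QoS class (the name and image are placeholders; the key point is that limits equal requests for every container):
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example
spec:
  containers:
  - name: app
    image: nginx                # placeholder image
    resources:
      requests:
        cpu: "500m"
        memory: "256Mi"
      limits:                   # equal to the requests -> Guaranteed QoS
        cpu: "500m"
        memory: "256Mi"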
I would recommend also reading Evicting end-user Pods:
If the kubelet is unable to reclaim sufficient resource on the node,
kubelet begins evicting Pods.
The kubelet ranks Pods for eviction first by whether or not their
usage of the starved resource exceeds requests, then by Priority, and
then by the consumption of the starved compute resource relative to
the Pods' scheduling requests.
As a result, kubelet ranks and evicts Pods in the following order:
BestEffort or Burstable Pods whose usage of a starved resource exceeds its request. Such pods are ranked by Priority, and then usage
above request.
Guaranteed pods and Burstable pods whose usage is beneath requests are evicted last. Guaranteed Pods are guaranteed only when requests
and limits are specified for all the containers and they are equal.
Such pods are guaranteed to never be evicted because of another Pod's
resource consumption. If a system daemon (such as kubelet, docker, and
journald) is consuming more resources than were reserved via
system-reserved or kube-reserved allocations, and the node only has
Guaranteed or Burstable Pods using less than requests remaining, then
the node must choose to evict such a Pod in order to preserve node
stability and to limit the impact of the unexpected consumption to
other Pods. In this case, it will choose to evict pods of Lowest
Priority first.
Good luck (:
A few relevant discussions:
Kubelet/Kubernetes should work with Swap Enabled
[ERROR Swap]: running with swap on is not supported. Please disable swap
Kubelet needs to allow configuration of container memory-swap
Kubernetes 1.22 introduced swap as an alpha feature.
More at:
https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/
https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory
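Based on that blog post, the alpha setup is roughly a kubelet configuration along these lines (alpha fields may change between versions, so verify against your kubelet's documentation):
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false
featureGates:
  NodeSwap: true
memorySwap:
  swapBehavior: UnlimitedSwap   # or LimitedSwap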