About k8s metrics server: only some resources can be monitored - kubernetes

Version
k8s version: v1.19.0
metrics server: v0.3.6
I set up a k8s cluster and the metrics server. It can report metrics for the nodes and for pods on the master node,
but pods on worker nodes cannot be seen; it returns unknown.
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
u-29 1160m 14% 37307Mi 58%
u-31 2755m 22% 51647Mi 80%
u-32 4661m 38% 32208Mi 50%
u-34 1514m 12% 41083Mi 63%
u-36 1570m 13% 40400Mi 62%
When the pod is running on a worker node, it returns: unable to fetch pod metrics for pod default/nginx-7764dc5cf4-c2sbq: no metrics known for pod
When the pod is running on the master node, it returns CPU and memory:
NAME CPU(cores) MEMORY(bytes)
nginx-7cdd6c99b8-6pfg2 0m 2Mi

This is a community wiki answer based on OP's comment posted for better visibility. Feel free to expand it.
The issue was caused by using different versions of docker on different nodes. After upgrading docker to v19.3 on both nodes and executing kubeadm reset, the issue was resolved.

Generally, the metrics server receives the metrics via the kubelet, so there may be a problem retrieving the information from it.
You will need to look at the following configurations mentioned in the readme.
Configuration
Depending on your cluster setup, you may also need to change flags passed to the Metrics Server container. Most useful flags:
--kubelet-preferred-address-types - The priority of node address types used when determining an address for connecting to a particular node (default [Hostname,InternalDNS,InternalIP,ExternalDNS,ExternalIP])
--kubelet-insecure-tls - Do not verify the CA of serving certificates presented by Kubelets. For testing purposes only.
--requestheader-client-ca-file - Specify a root certificate bundle for verifying client certificates on incoming requests.
You can try the following configuration changes:
--kubelet-preferred-address-types=InternalIP
--kubelet-insecure-tls
You can refer to this ticket to get more information.
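As a sketch, the flags above go into the container args of the metrics-server Deployment; the fragment below assumes the default single-container layout and is illustrative only:

```yaml
# Illustrative fragment of the metrics-server Deployment spec:
spec:
  template:
    spec:
      containers:
      - name: metrics-server
        args:
        - --kubelet-preferred-address-types=InternalIP
        - --kubelet-insecure-tls   # testing only; skips kubelet cert verification
```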

Related

k8s service internal bench

I'm currently trying to benchmark different service deployments in a k8s cluster.
It's all here, but I'll save you the digging.
The service itself is ultra simple, as is the deployment (basically an HTTP GET), and I added an HPA, which works well.
The bench is running inside the same cluster, on a specific node.
The bench runs ok, everything seems to be working as expected.
If I take one of the results and give an extract here:
NAME CPU(cores) MEMORY(bytes)
bencher-deployment-cf89ddc67-bwwgb 197m 4Mi
go-induzo-deployment-d6cbc56c6-97j62 1m 7Mi
go-induzo-deployment-d6cbc56c6-c2w24 0m 7Mi
go-induzo-deployment-d6cbc56c6-jh768 0m 7Mi
go-induzo-deployment-d6cbc56c6-mfdhb 0m 6Mi
go-induzo-deployment-d6cbc56c6-mh6mt 820m 11Mi
go-induzo-deployment-d6cbc56c6-pktn4 939m 11Mi
go-induzo-deployment-d6cbc56c6-vdjjj 1m 5Mi
go-induzo-deployment-d6cbc56c6-x64jw 893m 11Mi
go-induzo-deployment-d6cbc56c6-zhsp7 0m 5Mi
go-induzo-deployment-d6cbc56c6-zvf9m 0m 5Mi
As you can notice, the HPA is triggered and scaled to 10 pods.
But as you can also notice, the load is only balanced between 3 pods; the others are not used. It seems it can only use one pod per node, and not the other ones.
Anything I would have forgotten? Is it expected? Do I need to add a load balancer service to actually leverage all pods?
I forgot to make sure there was no keep-alive in the benchmark tool.
In the case of wrk, which I'm using, that means adding: -H "Connection: Close"
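For illustration, a hypothetical wrk invocation (thread/connection counts and the URL are placeholders) would look like:

```shell
# Close each connection after the request so kube-proxy re-balances
# across all pods instead of pinning to the first established connections.
wrk -t4 -c100 -d30s -H "Connection: Close" http://<service-ip>/
```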

Kubectl status nodes provides different responses for equivalent clusters

I recently started using kubectl krew (v0.3.4), which I then used to install the "status" plugin (v0.4.1).
I am currently managing different clusters and checking the nodes' status. Most of the clusters answer with something exactly like:
Node/[NodeName], created 25d ago linux Oracle Linux Server 7.8
(amd64), kernel 4.1.12-124.36.4.el7uek.x86_64, kubelet v1.18.2, kube-proxy v1.18.2
cpu: 0.153/7 (2%)
mem: 4.4GB/7.1GB (63%)
ephemeral-storage: 2.2GB
There is one cluster that answers, for some reason:
Node/[nodeName], created 11d ago
linux Oracle Linux Server 7.8 (amd64), kernel 4.1.12-124.26.5.el7uek.x86_64, kubelet v1.18.2, kube-proxy v1.18.2
cpu: 5, mem: 7.1GB, ephemeral-storage: 2.2GB
(Let me clarify that I'm trying to automate some resources checking and the way resources are differently displayed is quite annoying, plus the used vs total resources is exactly what I need!)
I am absolutely unable to locate the status plugin repo, and I have no idea where to go with this issue. kubectl version says that both clusters have the same server version, I'm executing the kubectl status command from my local machine in both cases and... I am completely out of ideas.
Does anyone know why this might be happening, or where I can go to look for answers?
To display used and total resources you can use kubectl top:
Display Resource (CPU/Memory/Storage) usage.
The top command allows you to see the resource consumption for nodes or pods.
This command requires Metrics Server to be correctly configured and working on the server.
Available Commands:
node Display Resource (CPU/Memory/Storage) usage of nodes
pod Display Resource (CPU/Memory/Storage) usage of pods
Usage:
kubectl top [flags] [options]
You can also have a look at Tools for Monitoring Resources inside Kubernetes docs.
As for doing the same using the Kubernetes Python Client, you can use:
from kubernetes.config import load_kube_config
from kubernetes.client import CustomObjectsApi
load_kube_config()
cust = CustomObjectsApi()
cust.list_cluster_custom_object('metrics.k8s.io', 'v1beta1', 'nodes') # All node metrics
cust.list_cluster_custom_object('metrics.k8s.io', 'v1beta1', 'pods') # All Pod Metrics
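The metrics API returns usage as Kubernetes quantity strings (e.g. 153m for CPU, 4300Mi for memory). For automated used-vs-total comparisons you will likely want to turn those into plain numbers; a minimal sketch of such a converter (this helper is my own, not part of the official client) could look like:

```python
def parse_quantity(q: str) -> float:
    """Convert a Kubernetes quantity string ('153m', '4300Mi', ...) to a float.

    CPU quantities come out in cores, memory quantities in bytes.
    """
    suffixes = {
        'n': 1e-9, 'u': 1e-6, 'm': 1e-3,                      # decimal fractions
        'k': 1e3, 'M': 1e6, 'G': 1e9, 'T': 1e12,              # decimal multiples
        'Ki': 2**10, 'Mi': 2**20, 'Gi': 2**30, 'Ti': 2**40,   # binary multiples
    }
    # Try two-character suffixes ('Mi') before one-character ones ('m', 'M').
    for suffix in sorted(suffixes, key=len, reverse=True):
        if q.endswith(suffix):
            return float(q[:-len(suffix)]) * suffixes[suffix]
    return float(q)  # plain number, no suffix

print(parse_quantity('153m'))    # CPU in cores
print(parse_quantity('4300Mi'))  # memory in bytes
```

You can then apply this to the 'usage' field of each item returned by list_cluster_custom_object and compare against the node's capacity.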

Jenkins X builds fail with "The node was low on resource: [DiskPressure]."

My Jenkins X installation, mid-project, is now becoming very unstable. (Mainly) Jenkins pods are failing to start due to disk pressure.
Commonly, many pods are failing with
The node was low on resource: [DiskPressure].
or
0/4 nodes are available: 1 Insufficient cpu, 1 node(s) had disk pressure, 2 node(s) had no available volume zone.
Unable to mount volumes for pod "jenkins-x-chartmuseum-blah": timeout expired waiting for volumes to attach or mount for pod "jx"/"jenkins-x-chartmuseum-blah". list of unmounted volumes=[storage-volume]. list of unattached volumes=[storage-volume default-token-blah]
Multi-Attach error for volume "pvc-blah" Volume is already exclusively attached to one node and can't be attached to another
This may have become more pronounced with more preview builds for projects with npm and the massive node-modules directories it generates. I'm also not sure if Jenkins is cleaning up after itself.
Rebooting the nodes helps, but not for very long.
Let's approach this from the Kubernetes side.
There are few things you could do to fix this:
As mentioned by @Vasily, check what is causing disk pressure on the nodes. You may also need to check logs from:
kubectl logs: kube-scheduler event logs
journalctl -u kubelet: kubelet logs
/var/log/kube-scheduler.log
More about why those logs below.
Check your Eviction Thresholds. Adjust Kubelet and Kube-Scheduler configuration if needed. See what is happening with both of them (logs mentioned earlier might be useful now). More info can be found here
Check if you got a correctly running Horizontal Pod Autoscaler: kubectl get hpa
You can use standard kubectl commands to setup and manage your HPA.
Finally, the volume related errors that you receive indicate that there might be a problem with the PVC and/or PV. Make sure you have your volume in the same zone as the node. If you want to mount the volume to a specific container, make sure it is not exclusively attached to another one. More info can be found here and here
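To illustrate the eviction-threshold check above, these are the kinds of kubelet settings involved; the values shown are examples, not recommendations:

```yaml
# Illustrative KubeletConfiguration fragment: the hard eviction thresholds
# that, when crossed, put the node into DiskPressure and evict pods.
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
evictionHard:
  nodefs.available: "10%"
  imagefs.available: "15%"
  memory.available: "100Mi"
```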
I did not test it myself because more info is needed in order to reproduce the whole scenario, but I hope the above suggestions will be useful.
Please let me know if that helped.

Kubernetes Node NotReady: ContainerGCFailed / ImageGCFailed context deadline exceeded

Worker node is getting into "NotReady" state with an error in the output of kubectl describe node:
ContainerGCFailed rpc error: code = DeadlineExceeded desc = context deadline exceeded
Environment:
Ubuntu, 16.04 LTS
Kubernetes version: v1.13.3
Docker version: 18.06.1-ce
There is a closed issue about this on the Kubernetes GitHub (k8 git), which was closed on the grounds of being related to a Docker issue.
Steps done to troubleshoot the issue:
kubectl describe node - the error in question was found (root cause isn't clear).
journalctl -u kubelet - shows this related message:
skipping pod synchronization - [container runtime status check may not have completed yet PLEG is not healthy: pleg has yet to be successful]
it is related to this open k8 issue Ready/NotReady with PLEG issues
Check node health on AWS with cloudwatch - everything seems to be fine.
journalctl -fu docker.service: checked docker for errors/issues -
the output doesn't show any errors related to that.
systemctl restart docker - after restarting docker, the node gets into "Ready" state but in 3-5 minutes becomes "NotReady" again.
It all seems to start when I deploy more pods to the node (close to its resource capacity, but I don't think it is a direct dependency) or stop/start instances (after a restart it is OK, but after some time the node is NotReady again).
Questions:
What is the root cause of the error?
How to monitor that kind of issue and make sure it doesn't happen?
Are there any workarounds to this problem?
What is the root cause of the error?
From what I was able to find, it seems the error happens when there is an issue contacting Docker, either because it is overloaded or because it is unresponsive. This is based on my experience and on what has been mentioned in the GitHub issue you provided.
How to monitor that kind of issue and make sure it doesn't happen?
There seems to be no established mitigation or monitoring for this. The best approach would be to make sure your node is not overloaded with pods. I have seen that it does not always show up as disk or memory pressure on the node; it is probably a matter of not enough resources being allocated to Docker, which then fails to respond in time. The proposed solution is to set limits for your pods to prevent overloading the node.
In the case of managed Kubernetes on GKE (other vendors probably have a similar feature) there is a feature called node auto-repair. It will not prevent node pressure or Docker-related issues, but when it detects an unhealthy node it can drain and redeploy the node(s).
If you already have resources and limits it seems like the best way to make sure this does not happen is to increase memory resource requests for pods. This will mean fewer pods per node and the actual used memory on each node should be lower.
Another way of monitoring/recognizing this would be to SSH into the node and check the memory, check the processes with ps, monitor the syslog, and run docker stats --all.
I had the same issue. I cordoned the node and evicted the pods, then rebooted the server; the node automatically came back into Ready state.

Is it possible to add swap space on kubernetes nodes?

I am trying to add swap space on a kubernetes node to prevent it from running out of memory. Is it possible to add swap space on a node (previously known as a minion)? If so, what procedure should I follow, and how does it affect the pods' acceptance test?
Kubernetes doesn't support container memory swap. Even if you add swap space, kubelet will create the container with --memory-swappiness=0 (when using Docker). There have been discussions about adding support, but the proposal was not approved. https://github.com/kubernetes/kubernetes/issues/7294
Technically you can do it.
There is a broad discussion about whether to give K8S users the privilege of deciding to enable swap or not.
I'll first refer directly to your question and then continue with the discussion.
If you run K8S on Kubeadm and you've added swap to your nodes - follow the steps below:
1 ) Reset the current cluster setup and then add the fail-swap-on=false flag to the kubelet configuration:
kubeadm reset
echo 'Environment="KUBELET_EXTRA_ARGS=--fail-swap-on=false"' >> /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
(*) If you're running on Ubuntu, replace the path for the kubelet config from /etc/systemd/system/kubelet.service.d/10-kubeadm.conf to /etc/default/kubelet.
2 ) Reload the service:
systemctl daemon-reload
systemctl restart kubelet
3 ) Initialize the cluster settings again and ignore the swap error:
kubeadm init --ignore-preflight-errors Swap
OR:
If you prefer working with kubeadm-config.yaml:
1 ) Add the failSwapOn flag:
---
apiVersion: kubelet.config.k8s.io/v1beta1
kind: KubeletConfiguration
failSwapOn: false # <---- Here
2 ) And run:
kubeadm init --config /etc/kubernetes/kubeadm-config.yaml --ignore-preflight-errors=Swap
Returning to the discussion of whether to allow swapping or not:
On the one hand, K8S is very clear about this: the kubelet is not designed to support swap. You can see it mentioned in the Kubeadm link I shared above:
Swap disabled. You MUST disable swap in order for the kubelet to work
properly
On the other hand, you can see users reporting that there are cases where their deployments require swap to be enabled.
I would suggest that you first try without enabling swap.
(Not because swap is a function the kernel can't manage, but merely because it is not recommended by Kube; this is probably related to the design of the kubelet.)
Make sure that you are familiar with the features that K8S provides to prioritize memory of pods:
1 ) The 3 qos classes - Make sure that your high priority workloads are running with the Guaranteed (or at least Burstable) class.
2 ) Pod Priority and Preemption.
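For illustration (the name and values below are placeholders), a Pod gets the Guaranteed QoS class when every one of its containers has requests equal to limits:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: guaranteed-example   # placeholder name
spec:
  containers:
  - name: app
    image: nginx
    resources:
      requests:              # requests == limits for every container
        cpu: "500m"          # => the Pod is classified as Guaranteed
        memory: "256Mi"
      limits:
        cpu: "500m"
        memory: "256Mi"
```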
I would recommend also reading Evicting end-user Pods:
If the kubelet is unable to reclaim sufficient resource on the node, kubelet begins evicting Pods.
The kubelet ranks Pods for eviction first by whether or not their usage of the starved resource exceeds requests, then by Priority, and then by the consumption of the starved compute resource relative to the Pods' scheduling requests.
As a result, kubelet ranks and evicts Pods in the following order:
BestEffort or Burstable Pods whose usage of a starved resource exceeds its request. Such pods are ranked by Priority, and then by usage above request.
Guaranteed pods and Burstable pods whose usage is beneath requests are evicted last. Guaranteed Pods are guaranteed only when requests and limits are specified for all the containers and they are equal. Such pods are guaranteed to never be evicted because of another Pod's resource consumption. If a system daemon (such as kubelet, docker, or journald) is consuming more resources than were reserved via system-reserved or kube-reserved allocations, and the node only has Guaranteed or Burstable Pods using less than requests remaining, then the node must choose to evict such a Pod in order to preserve node stability and to limit the impact of the unexpected consumption to other Pods. In this case, it will choose to evict pods of lowest Priority first.
Good luck (:
A few relevant discussions:
Kubelet/Kubernetes should work with Swap Enabled
[ERROR Swap]: running with swap on is not supported. Please disable swap
Kubelet needs to allow configuration of container memory-swap
Kubernetes 1.22 introduced swap as an alpha feature.
More at:
https://kubernetes.io/blog/2021/08/09/run-nodes-with-swap-alpha/
https://kubernetes.io/docs/concepts/architecture/nodes/#swap-memory