k8s dashboard: Metric client health check failed - kubernetes

I installed the k8s dashboard using the following command:
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml
then I watched the logs of the dashboard pod:
$ kubectl -n kubernetes-dashboard logs -f kubernetes-dashboard-665f4c5ff-wcrj9
2020/09/12 04:19:10 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2020/09/12 04:19:43 Metric client health check failed: an error on the server ("unknown") has prevented the request from succeeding (get services dashboard-metrics-scraper). Retrying in 30 seconds.
(the same message repeats every ~30 seconds)
kubeadm version: 1.19
kubectl version: 1.19
Can anyone help me?

To give a bit of background information: once you install the Kubernetes Dashboard you install a Pod that provides the Dashboard as well as a Pod that is in charge of scraping Metrics from the Kubernetes Metrics API, the Dashboard Metrics Scraper. The dashboard delegates to the scraper, expecting to address it via its K8s Service: "dashboard-metrics-scraper".
In your case, this Service can't be found. Run kubectl get service -n kubernetes-dashboard to see whether the scraper Service was deleted or renamed. If it was deleted, reapply the Dashboard installation YAMLs to recreate it.
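A quick way to check both things, assuming the Service and namespace names from the standard recommended.yaml manifest:

```shell
# List Services in the dashboard namespace and look for the scraper
# (names below assume the standard v2.0.4 recommended.yaml manifest)
kubectl get service -n kubernetes-dashboard | grep dashboard-metrics-scraper

# If it is missing, reapplying the manifest recreates it
kubectl apply -f https://raw.githubusercontent.com/kubernetes/dashboard/v2.0.4/aio/deploy/recommended.yaml
```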

I was unable to replicate your issue but here are some steps you can try to debug the problem:
The Metric client health check failed: ... Retrying in 30 seconds error appears in exactly one place in the dashboard's source code, when the health check fails.
The health check itself is a proxy request to the api-server.
Use the following command to test whether the proxy is working correctly:
$ kubectl get --raw "/api/v1/namespaces/kubernetes-dashboard/services/dashboard-metrics-scraper/proxy/healthz"
It should return URL: /healthz. If it didn't, there is most probably something wrong with the dashboard-metrics-scraper Service or Pod. Make sure the Service exists and the Pod is running and ready.
If it works for you from the CLI but still doesn't work for kubernetes-dashboard, you should check kubernetes-dashboard's RBAC permissions. Make sure that kubernetes-dashboard has permission to proxy.
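One way to verify that, as a sketch (run as a cluster admin; the ServiceAccount name below assumes the standard dashboard manifest, and the proxy subresource check is what the health check path needs):

```shell
# Ask the API server what the dashboard's ServiceAccount is allowed to do;
# each command prints "yes" or "no"
SA=system:serviceaccount:kubernetes-dashboard:kubernetes-dashboard

kubectl auth can-i get services --as="$SA" -n kubernetes-dashboard
kubectl auth can-i get services --subresource=proxy --as="$SA" -n kubernetes-dashboard
```

If either answer is "no", inspect the Role/RoleBinding created by the dashboard manifest.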
The second error you are seeing:
{"level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-09-13T02:52:38Z"}
indicates that you don't have a metrics server deployed in your cluster. Check the metrics-server GitHub repo for more information.
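If it is indeed missing, deploying metrics-server from its release manifest is usually enough. A sketch (the URL is the upstream release artifact; on test clusters with self-signed kubelet certs you may additionally need the --kubelet-insecure-tls arg described in the metrics-server README):

```shell
# Install metrics-server from the latest upstream release manifest
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml

# After a minute or so, the metrics API should register and answer
kubectl get apiservices | grep metrics.k8s.io
kubectl top nodes
```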

I'm on kubernetes 1.20.1-00 ubuntu 20.04. I got the
{"level":"error","msg":"Error scraping node metrics: the server could not find the requested resource (get nodes.metrics.k8s.io)","time":"2020-09-13T02:52:38Z"}
error because I deployed the kubernetes dashboard with the metric scraper prior to deploying the metrics server. After a day of running in that configuration I was still getting "Error scraping node..." in my metric scraper pod logs.
I resolved it by scaling the metric scraper deployment to 0 (zero) and then scaling it back to the desired number of pods (in my case 3).
The error message in the logs went away immediately once the metric scraper pods had spun up.
I'm not implying that this is the correct fix, just an observation from seeing an identical error. It could be caused by simply deploying the metrics server and the Kubernetes dashboard in the wrong order, as I did.
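For reference, the restart described above can be done like this (deployment name and replica count assume a standard install; kubectl rollout restart achieves the same thing on recent kubectl versions):

```shell
# Scale the scraper to zero and back to force fresh pods
kubectl -n kubernetes-dashboard scale deployment dashboard-metrics-scraper --replicas=0
kubectl -n kubernetes-dashboard scale deployment dashboard-metrics-scraper --replicas=3

# Equivalent one-liner on kubectl >= 1.15
kubectl -n kubernetes-dashboard rollout restart deployment dashboard-metrics-scraper
```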

Related

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster onto 2 Raspberry Pi 4s, one as a master and the second as a worker, using the script k3s offers, with the following options:
For the master node:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -   # 192.168.1.113 is the master node IP
To the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
I also tried to deploy the k8s dashboard according to the docs, but it fails to work because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
logs from the metrics-server pod:
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster, I wasn't able to reproduce this behaviour; metrics-server worked fine for both nodes, and kubectl top nodes showed information and metrics for both available nodes (though it took some time to start collecting the metrics).
Which leads to troubleshooting why it doesn't work. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs show. For instance, in the comments above:
first it was no route to host, which is a network problem and an inability to resolve the hostname;
then i/o timeout, which means the route exists but the service did not respond back. This may happen due to a firewall blocking certain ports/sources, the kubelet not running (it listens on port 10250), or, as it turned out for the OP, an issue with ntp which affected certificates and connections.
Errors may be different in other cases; what matters is to find the error and troubleshoot further based on it.
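For the no route to host case specifically, connectivity to the kubelet port can be checked by hand from the master node, as a sketch (the IP and node name are taken from the logs above; ufw is the usual firewall frontend on Ubuntu/Raspberry Pi OS, so adjust if you use something else):

```shell
# Can the master reach the worker's kubelet port at all?
nc -vz -w 3 192.168.1.106 10250

# Does the node name resolve to the right address?
getent hosts k3s-node-01

# If a firewall is the culprit, open the kubelet port on the worker
sudo ufw allow 10250/tcp
```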

CRD probe failing

I am installing the service catalog, which uses CRDs, and have created them. Now I am running my controller deployment, and the image running in it runs a CRD list command to verify the CRDs are in place. This used to work fine previously, but now the CRD probe is failing with this error:
I1226 07:45:01.539118 1 round_trippers.go:438] GET https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue in 30000 milliseconds
I1226 07:45:01.539158 1 round_trippers.go:444] Response Headers:
Error: while waiting for ready Service Catalog CRDs: failed to list CustomResourceDefinition: Get https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue: dial tcp 169.72.128.1:443: i/o timeout
I have followed same steps as previously but could not debug now.
Inside the controller code it is trying to make following call:
list, err := r.client.ApiextensionsV1beta1().CustomResourceDefinitions().List(v1.ListOptions{LabelSelector: labels.SelectorFromSet(labels.Set{"svcat": "true"}).String()})
Which is failing.
Update 1: Installation works fine in the default namespace but fails in a specific namespace.
Environment info: on-prem k8s cluster, latest k8s, 2-node cluster.
It's not a port issue. Service accounts use port 443 to connect to the Kubernetes API server. Check whether there is a network policy blocking communication between your namespace and the kube-system namespace.

'kubectl top pods' Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

When I try to run kubectl top nodes I'm getting the output:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The metrics server is able to scrape the metrics; its logs show the metrics being collected:
ScrapeMetrics: time: 49.754499ms, nodes: 4, pods: 82
...Storing metrics...
...Cycle complete...
But the endpoints for the metrics service are missing. How can I resolve this issue?
kubectl get apiservices | egrep metrics
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)
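MissingEndpoints means the metrics-server Service currently selects no ready Pods. A way to narrow it down, assuming the standard k8s-app=metrics-server label from the upstream manifest (adjust if your deployment uses different labels):

```shell
# An empty ENDPOINTS column confirms the Service matches no ready Pods
kubectl -n kube-system get endpoints metrics-server

# Compare the Service selector with the actual Pod labels
kubectl -n kube-system describe service metrics-server | grep -i selector
kubectl -n kube-system get pods -l k8s-app=metrics-server --show-labels

# If the Pods exist but are not Ready, their events usually say why
kubectl -n kube-system describe pods -l k8s-app=metrics-server
```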
Any help will be appreciated!

Disabling heapster health checks in Kubernetes 1.10

While doing kubectl cluster-info dump, I see a lot of:
2018/10/18 14:47:47 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/10/18 14:48:17 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
(the same message repeats every 30 seconds)
Maybe this is a bug that will be fixed in a new version (heapster is deprecated in newer versions anyway), but is there any way to disable these checks to avoid these noisy messages?
You can find Heapster deprecation timeline here.
I found that in Kubernetes cluster 1.10 version kubernetes-dashboard Pod produces such kind of error messages:
kubectl --namespace=kube-system logs <kubernetes-dashboard-Pod>
2018/10/22 13:04:36 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
It seems that kubernetes-dashboard still requires Heapster service for metrics and graph purposes.

Configure kubernetes-dashboard to use the metrics-server service instead of heapster.

I have installed kube v1.11; since heapster is deprecated I am using metrics-server. The kubectl top node command works.
The Kubernetes dashboard is looking for the heapster service. What are the steps to configure the dashboard to use the metrics-server service?
2018/08/09 21:13:43 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
Thanks
SR
This must be the week for asking that question; it seems that whatever is deploying heapster is omitting the Service, which one can fix as described here -- or the tl;dr is: just create a Service named heapster and point it at your heapster pods.
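A minimal sketch of such a Service, assuming the heapster Pods carry the conventional k8s-app: heapster label and listen on port 8082 (verify both against your actual pods before applying):

```shell
# Create the missing heapster Service - a sketch; check the label and
# target port against your heapster pods first
cat <<'EOF' | kubectl apply -f -
apiVersion: v1
kind: Service
metadata:
  name: heapster
  namespace: kube-system
spec:
  selector:
    k8s-app: heapster
  ports:
    - port: 80
      targetPort: 8082
EOF
```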
As of today the Kubernetes dashboard doesn't support metrics-server; support is expected to land very soon with a new release of the Kubernetes dashboard.
You can follow https://github.com/kubernetes/dashboard/issues/2986