Kubernetes metrics-server unable to fully scrape metrics

When I try to use kubectl top nodes I get this error:
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
Heapster is deprecated, though, and I'm using Kubernetes 1.11. I installed metrics-server and still get the same error. When I check the metrics-server logs I see this:
E1019 12:33:55.621691 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-ei3: unable to fetch metrics from Kubelet elegant-ardinghelli-ei3 (elegant-ardinghelli-ei3):
Get https://elegant-ardinghelli-ei3:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-ei3 on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aab: unable to fetch metrics from Kubelet elegant-ardinghelli-aab (elegant-ardinghelli-aab):
Get https://elegant-ardinghelli-aab:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aab on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e4z: unable to fetch metrics from Kubelet elegant-ardinghelli-e4z (elegant-ardinghelli-e4z):
Get https://elegant-ardinghelli-e4z:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e4z on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e41: unable to fetch metrics from Kubelet elegant-ardinghelli-e41 (elegant-ardinghelli-e41):
Get https://elegant-ardinghelli-e41:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e41 on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-ein: unable to fetch metrics from Kubelet elegant-ardinghelli-ein (elegant-ardinghelli-ein):
Get https://elegant-ardinghelli-ein:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-ein on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aar: unable to fetch metrics from Kubelet elegant-ardinghelli-aar (elegant-ardinghelli-aar):
Get https://elegant-ardinghelli-aar:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aar on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aaj: unable to fetch metrics from Kubelet elegant-ardinghelli-aaj (elegant-ardinghelli-aaj):
Get https://elegant-ardinghelli-aaj:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aaj on 10.245.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e49: unable to fetch metrics from Kubelet elegant-ardinghelli-e49 (elegant-ardinghelli-e49):
Get https://elegant-ardinghelli-e49:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e49 on 10.245.0.10:53: no such host]

This has been reported on GitHub. From the description of the relevant pull request:
This PR implements support for the kubectl top commands to use the
metrics-server as an aggregated API, instead of requesting the metrics
from heapster directly. If the metrics.k8s.io API is not served by the
apiserver, then this still falls back to the previous behavior.
Merged in https://github.com/kubernetes/kubernetes/pull/56206
It may be fixed in 1.12 or scheduled for a later version.
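One way to tell which path kubectl top is likely to take is to check whether the apiserver advertises the metrics.k8s.io group. The following is a minimal sketch for a POSIX shell. The function names has_metrics_api and report_metrics_path are made up for illustration, and the functions only inspect text in the format that kubectl api-versions prints (one group/version per line):

```shell
#!/bin/sh
# has_metrics_api: read `kubectl api-versions` output on stdin and succeed
# (exit 0) only if a metrics.k8s.io group/version is listed.
# The function name is hypothetical; it is not part of kubectl.
has_metrics_api() {
  grep -q '^metrics\.k8s\.io/'
}

# report_metrics_path: print which backend `kubectl top` is expected to use,
# following the fallback behaviour described in the PR text above.
report_metrics_path() {
  if has_metrics_api; then
    echo "metrics-server (aggregated API)"
  else
    echo "heapster (fallback)"
  fi
}
```

Possible usage: kubectl api-versions | report_metrics_path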

Related

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster onto two Raspberry Pi 4 boards, one as the master and the other as a worker, using the install script k3s provides with the following options:
For the master node (192.168.1.113 is the master node's IP):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME CPU(cores) CPU% MEMORY(bytes) MEMORY%
k3s-master 137m 3% 1285Mi 33%
k3s-node-01 <unknown> <unknown> <unknown> <unknown>
I also tried to deploy the Kubernetes dashboard as described in the docs, but it fails because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
Logs from the metrics-server pod:
elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster, I wasn't able to reproduce this behaviour: metrics-server worked fine for both nodes, and kubectl top nodes showed metrics for both of them (though it took some time to start collecting the metrics).
That leaves troubleshooting why it doesn't work in your cluster. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs show. For instance, in the comments above:
First the error was no route to host, which points to a network-level problem: the node's address (192.168.1.106 in the logs above) could not be reached from the metrics-server pod.
Then it was i/o timeout, which means a route exists but the service did not respond. This can happen when a firewall blocks certain ports or sources, when the kubelet (which listens on port 10250) is not running, or, as turned out to be the case for the OP, when an NTP problem affects certificates and connections.
The errors may differ in other cases; the important thing is to find the error and troubleshoot from there.
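The error-by-error triage above can be sketched as a small POSIX shell helper. The function name classify_scrape_error and the category labels are assumptions made for this example; metrics-server does not print them. The helper only matches the error strings quoted in this thread:

```shell
#!/bin/sh
# classify_scrape_error: print a short category for one metrics-server log line.
# The categories are illustrative labels for the error strings quoted in this
# thread; they are not produced by metrics-server itself.
classify_scrape_error() {
  case "$1" in
    *"no such host"*)
      # The node name did not resolve from inside the pod.
      echo "dns" ;;
    *"no route to host"*)
      # Network-level failure: the node address could not be reached.
      echo "route" ;;
    *"i/o timeout"*|*"connection timed out"*)
      # Nothing answered in time (firewall, kubelet not running, ...).
      echo "timeout" ;;
    *"401 Unauthorized"*)
      # The kubelet rejected the request.
      echo "auth" ;;
    *)
      # Read the full message and troubleshoot from there.
      echo "unknown" ;;
  esac
}
```

To get a rough count per category, the log can be piped through it line by line: kubectl logs <metrics-server-pod> -n kube-system | while IFS= read -r line; do classify_scrape_error "$line"; done | sort | uniq -c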

metrics not available yet metrics-server kubernetes on aws

I am trying to enable Metrics Server on AWS, and I followed these steps:
Clone or download the Metrics Server project.
Open the deploy/1.8+/metrics-server-deployment.yaml file in an editor.
Add the following command values into the containers property (it should be at the same level as the image property).
command:
- /metrics-server
- --kubelet-insecure-tls
Run kubectl create -f deploy/1.8+ as shown on the Metrics Server repo to create the deployment, services, etc.
Up to this point everything works and my metrics-server pod is running, but when I run kubectl top nodes I get the following error:
error: metrics not available yet
When I run kubectl logs [metrics-server-pod-name] -n kube-system I get this:
E0611 14:36:57.527048 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527069 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527075 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-114-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527079 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527084 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-91-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527088 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:37:26.006830 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-xx-36-103.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-36-103.ec2.internal (172.20.36.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-65-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-65-xxx.ec2.internal (172.xx.65.xx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-114-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-114-223.ec2.internal (172.xx.114.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-63-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-63-xxx.ec2.internal (172.xx.63.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-91-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-91-xxx.ec2.internal (172.xx.91.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-96-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-96-xxx.ec2.internal (172.xx.96.xxx): request failed - "401 Unauthorized", response: "Unauthorized"]
Option I.
Add the flag --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
to the command list in your metrics-server-deployment.yaml, e.g.
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
If that still doesn't work, try Option II.
Option II.
metrics-server resolves the node hostnames through CoreDNS, so add the node IPs to the coredns ConfigMap:
kubectl edit configmap coredns -n kube-system
and add a hosts block for your nodes to the Corefile:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           192.168.199.100 master.qls.com
           192.168.199.220 worker.qls.com
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-17T12:32:08Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "180"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c93e5274-789f-11e9-a0ea-42010a9c0003
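Typing the hosts entries by hand gets error-prone with more than a couple of nodes, so here is a minimal sketch that builds the stanza from "IP hostname" pairs. The function name coredns_hosts_block and the input format are assumptions for this example. The output is only the hosts { ... } part and still has to be pasted into the Corefile at the right indentation:

```shell
#!/bin/sh
# coredns_hosts_block: read "IP hostname" pairs on stdin (one per line) and
# print a CoreDNS hosts { ... } stanza like the one in the ConfigMap above.
# The function name and the input format are assumptions for this sketch.
coredns_hosts_block() {
  echo "hosts {"
  # Skip blank lines and lines that do not have both fields.
  awk 'NF >= 2 { printf "   %s %s\n", $1, $2 }'
  # Let names that are not listed fall through to the next plugin.
  echo "   fallthrough"
  echo "}"
}
```

One possible source for the input, assuming the node names are the hostnames metrics-server tries to resolve: kubectl get nodes -o jsonpath='{range .items[*]}{.status.addresses[?(@.type=="InternalIP")].address} {.metadata.name}{"\n"}{end}' | coredns_hosts_block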

Kubernetes Metrics unable to fetch pod/node metrics

I've installed metrics-server on Kubernetes v1.11.2.
I'm running a bare-metal cluster with 3 nodes and 1 master.
In the metrics-server log I see the following errors:
E0907 14:29:51.774592 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794 1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886 1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"
I also can't get any metrics using
kubectl top node vps01
Autoscaling does not work either:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
I found the following solution:
Change the metrics-server-deployment.yaml file and add:
command:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
It looks like your metrics-server pod has a DNS issue. You can open a shell in the pod and test:
kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system -- sh
/ # ping vps01
If ping cannot resolve the name, the pod cannot resolve your node.
CoreDNS and kube-dns also use the /etc/resolv.conf on each of your nodes, so I would check whether the nodes can resolve each other. For example, can you ping vps01 from vps02 or vps03?
I had the same issue and resolved it by adding the hostnames to /etc/hosts on every node.
To collect metric data (CPU and memory usage), metrics-server tries to access the nodes. However, it cannot resolve the hostnames (vps01, vps02, vps03, and vps04) because they are not registered in DNS. As you mentioned, you cannot register the hostnames in DNS.
So you must add the hostnames to /etc/hosts on the node where the metrics-server pod is running.
The autoscaler does not work because metrics-server is not working, so there is no metric data.
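Before editing /etc/hosts or DNS, it can help to list exactly which node names fail to resolve. A minimal sketch, assuming log lines in the "lookup <name> on <dns-server>: no such host" format shown in the question; the function name unresolved_nodes is made up for this example:

```shell
#!/bin/sh
# unresolved_nodes: read metrics-server log text on stdin and print each
# hostname that appears in a "lookup <name> on <dns-server>" failure, once.
# The function name is hypothetical; the pattern matches the log format
# quoted in the question.
unresolved_nodes() {
  # Split on commas so that each scrape failure sits on its own line,
  # keep only the looked-up name, then drop duplicates.
  tr ',' '\n' |
    sed -n 's/.*lookup \([^ ]*\) on .*/\1/p' |
    sort -u
}
```

Possible usage: kubectl logs <metrics-server-pod> -n kube-system | unresolved_nodes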

Can't connect to a kubernetes service, error: time out

I am trying to connect to a Kubernetes service from a pod.
I can list the service using kubectl get svc and I can see that the ClusterIP and port are there, but when the pod tries to connect to it, I get this error:
dial tcp 10.0.0.153:xxxx: i/o timeout
Any idea how to debug that, or what the reason could be?

In prometheus all target servers show as down with this error: dial tcp 10.7.17.11:9100: getsockopt: connection timed out

I have configured the Node Exporter in Kubernetes and started monitoring with Prometheus, but in Prometheus all servers show as down with the error below:
Get http://10.7.17.11:9100/metrics: dial tcp 10.7.17.11:9100: getsockopt: connection timed out
Can anyone help me understand why they show as down?
Make sure a firewall is not blocking port 9100.
Try to curl this URL from other nodes and from the Prometheus pod.
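When running that curl check, the exit status narrows down the cause. A minimal sketch: exit codes 6, 7 and 28 are curl's documented codes for "could not resolve host", "failed to connect" and "operation timed out", while the function name explain_curl_exit and the labels it prints are assumptions for this example:

```shell
#!/bin/sh
# explain_curl_exit: map a curl exit status to a short label when probing a
# scrape target. Codes 6, 7 and 28 are curl's documented exit codes; the
# labels and the function name are illustrative.
explain_curl_exit() {
  case "$1" in
    0)
      echo "reachable" ;;
    6)
      # curl could not resolve the host name.
      echo "dns" ;;
    7)
      # curl failed to connect (for example refused or unreachable).
      echo "connect" ;;
    28)
      # The operation timed out, e.g. a firewall silently dropping packets.
      echo "timeout" ;;
    *)
      echo "other" ;;
  esac
}
```

Possible usage, with a 5 second limit so a blocked port does not hang the shell: curl -s -o /dev/null -m 5 http://10.7.17.11:9100/metrics; explain_curl_exit $?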