I've installed metrics-server on Kubernetes v1.11.2.
I'm running a bare-metal cluster with 3 worker nodes and 1 master.
In the metrics-server log I have the following errors:
E0907 14:29:51.774592 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794 1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886 1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"
I also can't get any metrics using
kubectl top node vps01
The autoscaler isn't working either:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
I found the following solution:
Change the metrics-server-deployment.yaml file and add:
command:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
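For context, here is a minimal sketch of where that command block lands in the Deployment spec; the image tag and surrounding fields are assumptions, only the command block comes from the fix above:

spec:
  template:
    spec:
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.1   # tag is an assumption
        command:
        - /metrics-server
        - --kubelet-preferred-address-types=InternalIP   # scrape kubelets by node IP instead of hostname
        - --kubelet-insecure-tls                         # skip verification of the kubelet's serving certificate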
It looks like you have a DNS issue inside your metrics-server pod. You can connect to the pod:
kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system -- sh
/ # ping vps01
If you can't ping the node, the pod can't resolve its hostname.
CoreDNS or kube-dns also uses the /etc/resolv.conf on each of your nodes, so I would check whether the nodes can resolve each other. For instance, can you ping vps01 from vps02 or vps03?
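You can also test resolution directly from inside the pod, assuming the image ships nslookup (older busybox-based metrics-server images did; newer distroless ones have no shell at all):

kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system -- nslookup vps01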
I hit the same issue and resolved it by adding the hostnames to /etc/hosts on every node.
To collect metric data (CPU/memory usage), metrics-server tries to reach each node. However, it cannot resolve the hostnames (vps01, vps02, vps03, and vps04) because they are not registered in DNS. As you mentioned, you cannot register the hostnames in DNS.
So you must add the hostnames to /etc/hosts on the node where the metrics-server pod is running.
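For example (the IPs below are placeholders; use your nodes' real addresses):

# /etc/hosts on the node running the metrics-server pod
192.168.1.101 vps01
192.168.1.102 vps02
192.168.1.103 vps03
192.168.1.104 vps04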
The autoscaler does not work because metrics-server is not working, so there is no metric data.
Related
I'm Anddiy, and I'm working with a Kubernetes cluster deployed by Rancher.
It's important to say that none of my machines has direct internet access; I'm using a proxy for downloads and the like, so I set up RKE2 with this proxy during the installation steps.
I have a machine with RKE2 that hosts my Rancher, and from the Rancher UI I created my Kubernetes cluster. Here it is:
[15:19] root#vmrmmstnodehom01 [~]:# kubectl get nodes
NAME               STATUS   ROLES                      AGE     VERSION
vmrmmstnodehom01   Ready    controlplane               5d16h   v1.24.8
vmrmwrknodehom01   Ready    controlplane,etcd,worker   5d20h   v1.24.8
vmrmwrknodehom02   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom03   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom04   Ready    worker                     5d19h   v1.24.8
My cluster is clean; no applications are installed on it at the moment.
I tried to install the Longhorn application using this command (taken from the Longhorn documentation):
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
But when I run it, this error message is displayed:
Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on 10.129.251.125:53: server misbehaving
I checked whether it was my proxy failing to connect to this URL, but my machine connects to it successfully; I verified that with curl -v against the Longhorn URL.
I don't know whether the Kubernetes API picked up the proxy settings from my RKE2/Rancher installation, so I'm not sure if I need to set the proxy manually somewhere inside the cluster. I really don't know what is happening here.
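One workaround is to fetch the manifest through the proxy yourself and apply the local file, so kubectl never needs to resolve raw.githubusercontent.com (the proxy address below is a placeholder):

curl -x http://your-proxy:3128 -fLO https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
kubectl apply -f longhorn.yaml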
I deployed a k3s cluster onto 2 Raspberry Pi 4 boards, one as master and one as worker, using the install script k3s offers, with the following options:
For the master node (192.168.1.113 is the master node's IP):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
I also tried to deploy the Kubernetes dashboard, following the docs, but it fails to work because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
logs from the metrics-server pod:
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster, I wasn't able to reproduce this behaviour: metrics-server worked fine and kubectl top nodes showed metrics for both available nodes (though it took some time to start collecting them).
That leads to troubleshooting why it doesn't work in your case. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs say; for instance, in the comments above:
first it was no route to host, which points to a network problem reaching the node;
then i/o timeout, which means a route exists but the service did not respond. This can happen because a firewall blocks certain ports/sources, because the kubelet is not running (it listens on port 10250), or, as it turned out for the OP, because an NTP issue broke certificates and therefore connections.
Errors will differ in other cases; what matters is to find the error and troubleshoot further based on it. A port-reachability check is sketched below.
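For example, to test whether the kubelet port on the worker is reachable from another node (assuming netcat is installed; the IP is the one from the logs above):

nc -vz 192.168.1.106 10250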
I am running kubectl top nodes and getting this error:
node#kubemaster:~/Desktop/metric$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The metrics-server pod is running with the following params:
command:
- /metrics-server
- --metric-resolution=30s
- --requestheader-allowed-names=aggregator
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Most of the answers I found suggest the params above, but I am still getting this error:
E0601 18:33:22.012798 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kubemaster: unable to fetch metrics from Kubelet kubemaster (192.168.56.30): Get https://192.168.56.30:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded, unable to fully scrape metrics from source kubelet_summary:kubenode1: unable to fetch metrics from Kubelet kubenode1 (192.168.56.31): Get https://192.168.56.31:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.56.31:10250: i/o timeout]
I deployed metrics-server using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
What am I missing?
I am using Calico for pod networking.
On the metrics-server GitHub page, under FAQ, it says:
[Calico] Check whether the value of CALICO_IPV4POOL_CIDR in the calico.yaml conflicts with the local physical network segment. The default: 192.168.0.0/16.
Could this be the reason? Can someone explain this to me?
I set up Calico using:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
My node IPs are 192.168.56.30 / 192.168.56.31 / 192.168.56.32.
I initialized the cluster with --pod-network-cidr=20.96.0.0/12, so my pod IPs are 20.96.205.192 and so on.
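To see which pool Calico actually created, the IPPool custom resource can be inspected (the resource and object names below are the Calico v3 defaults, so treat them as assumptions for this setup):

kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml | grep cidr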
I am also getting this in the apiserver logs:
E0601 19:29:59.362627 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: Get https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
where 10.100.152.145 is the ClusterIP of service/metrics-server.
Surprisingly, it works on another cluster whose node IPs are in the 172.16.0.0 range. Everything else is the same: set up with kubeadm, Calico, and the same pod CIDR.
It started working after I edited the metrics-server deployment YAML to run the pod on the host network:
hostNetwork: true
Refer to the link below:
https://www.linuxsysadmins.com/service-unavailable-kubernetes-metrics/
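A sketch of where that flag sits in the pod spec; pairing it with dnsPolicy: ClusterFirstWithHostNet is a common companion setting so the pod keeps using cluster DNS while on the host network (the pairing is my assumption, not necessarily what the linked article does):

spec:
  template:
    spec:
      hostNetwork: true
      dnsPolicy: ClusterFirstWithHostNet   # keep cluster DNS working on the host network
      containers:
      - name: metrics-server
        ...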
The default Calico pool is 192.168.0.0/16.
There is a comment in the YAML file:
The default IPv4 pool to create on startup if none exists. Pod IPs
will be chosen from this range. Changing this value after installation
will have no effect. This should fall within --cluster-cidr.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
So it's better to use a different range if your home network falls inside 192.168.0.0/16. A sketch of how to set it is below.
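Since the value only takes effect at first install, edit it in calico.yaml before the initial kubectl apply; for example (the replacement range is just an illustration, pick anything that doesn't overlap your LAN or node network):

- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"   # should fall within the cluster's --pod-network-cidr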
Also, if you used kubeadm, you can check your CIDR:
kubeadm config view | grep Subnet
Or you can use kubectl:
kubectl --namespace kube-system get configmap kubeadm-config -o yaml
The default service subnet in a kubeadm self-hosted Kubernetes cluster is 10.96.0.0/12.
I had the same problem trying to run metrics on Docker Desktop; I followed @suren's answer and it worked.
The default configuration is:
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
And I changed it to:
- --kubelet-preferred-address-types=InternalIP
I had the same issue on my on-prem k8s v1.26 cluster (CNI = Calico).
I think this issue is caused by the metrics-server version (v0.6).
I solved my issue by applying metrics-server v0.5.2:
1- Download the YAML file from the official source.
2- Add - --kubelet-insecure-tls=true below the args section.
3- Apply the YAML (a sketch of these steps follows).
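A sketch of those steps (the release URL follows the metrics-server repo's usual pattern; verify it against the official releases page):

curl -fLO https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.5.2/components.yaml
# edit components.yaml: add "- --kubelet-insecure-tls=true" under the container's args
kubectl apply -f components.yaml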
enjoy ;)
I am trying to enable Metrics Server on AWS, and I followed these steps:
Clone or download the Metrics Server project.
Open the deploy/1.8+/metrics-server-deployment.yaml file in an editor.
Add the following command values into the containers property (it should be at the same level as the image property).
command:
- /metrics-server
- --kubelet-insecure-tls
Run kubectl create -f deploy/1.8+ as shown on the Metrics Server repo to create the deployment, services, etc.
Up to this point everything works fine and my metrics-server pod is running, but when I run kubectl top nodes I get the following error:
error: metrics not available yet
When I run kubectl logs [metrics-server-pod-name] -n kube-system I get this:
E0611 14:36:57.527048 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527069 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527075 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-114-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527079 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527084 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-91-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527088 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:37:26.006830 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-xx-36-103.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-36-103.ec2.internal (172.20.36.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-65-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-65-xxx.ec2.internal (172.xx.65.xx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-114-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-114-223.ec2.internal (172.xx.114.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-63-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-63-xxx.ec2.internal (172.xx.63.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-91-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-91-xxx.ec2.internal (172.xx.91.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-96-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-96-xxx.ec2.internal (172.xx.96.xxx): request failed - "401 Unauthorized", response: "Unauthorized"]
Option I.
Add - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
to your metrics-server-deployment.yaml, e.g.
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
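After changing the flags, you can confirm the aggregated metrics API is registered and reports Available (a generic check, not specific to this setup):

kubectl get apiservice v1beta1.metrics.k8s.io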
If that still doesn't work, try
Option II.
Because metrics-server resolves node hostnames through CoreDNS, add the node IPs to the coredns ConfigMap:
kubectl edit configmap coredns -n kube-system
and add
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           192.168.199.100 master.qls.com
           192.168.199.220 worker.qls.com
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-17T12:32:08Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "180"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c93e5274-789f-11e9-a0ea-42010a9c0003
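The Corefile above already includes the reload plugin, so CoreDNS should pick up the edit within about 30 seconds; if it doesn't, deleting the coredns pods forces a reload (the label is the kubeadm default):

kubectl -n kube-system delete pod -l k8s-app=kube-dns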
When I try to use kubectl top nodes I get this error:
Error from server (NotFound): the server could not find the requested resource (get services http:heapster:)
But Heapster is deprecated and I'm using Kubernetes 1.11. I installed metrics-server and I still get the same error. When I check metrics-server's logs I see this:
E1019 12:33:55.621691 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-ei3: unable to fetch metrics from Kubelet elegant-ardinghelli-ei3 (elegant-ardinghelli-ei3): Get https://elegant-ardinghelli-ei3:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-ei3 on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aab: unable to fetch metrics from Kubelet elegant-ardinghelli-aab (elegant-ardinghelli-aab): Get https://elegant-ardinghelli-aab:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aab on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e4z: unable to fetch metrics from Kubelet elegant-ardinghelli-e4z (elegant-ardinghelli-e4z): Get https://elegant-ardinghelli-e4z:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e4z on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e41: unable to fetch metrics from Kubelet elegant-ardinghelli-e41 (elegant-ardinghelli-e41): Get https://elegant-ardinghelli-e41:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e41 on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-ein: unable to fetch metrics from Kubelet elegant-ardinghelli-ein (elegant-ardinghelli-ein): Get https://elegant-ardinghelli-ein:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-ein on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aar: unable to fetch metrics from Kubelet elegant-ardinghelli-aar (elegant-ardinghelli-aar): Get https://elegant-ardinghelli-aar:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aar on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-aaj: unable to fetch metrics from Kubelet elegant-ardinghelli-aaj (elegant-ardinghelli-aaj): Get https://elegant-ardinghelli-aaj:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-aaj on 10.245.0.10:53: no such host,
unable to fully scrape metrics from source kubelet_summary:elegant-ardinghelli-e49: unable to fetch metrics from Kubelet elegant-ardinghelli-e49 (elegant-ardinghelli-e49): Get https://elegant-ardinghelli-e49:10250/stats/summary/: dial tcp: lookup elegant-ardinghelli-e49 on 10.245.0.10:53: no such host]
It is reported here.
GitHub Issues:
This PR implements support for the kubectl top commands to use the
metrics-server as an aggregated API, instead of requesting the metrics
from heapster directly. If the metrics.k8s.io API is not served by the
apiserver, then this still falls back to the previous behavior.
Merged in https://github.com/kubernetes/kubernetes/pull/56206
It may be fixed in 1.12 or scheduled for the next release.