metrics not available yet: metrics-server on Kubernetes on AWS

I am trying to enable Metrics Server on AWS, and I followed these steps:
Clone or download the Metrics Server project.
Open the deploy/1.8+/metrics-server-deployment.yaml file in an editor.
Add the following command values to the containers property (at the same level as the image property):
command:
- /metrics-server
- --kubelet-insecure-tls
Run kubectl create -f deploy/1.8+ as shown on the Metrics Server repo to create the deployment, services, etc.
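Put together, the steps above amount to roughly the following sketch (the repository URL is an assumption; use whichever copy of the Metrics Server project you cloned or downloaded, and note that the deploy/1.8+ directory only exists in older releases):
git clone https://github.com/kubernetes-sigs/metrics-server.git
cd metrics-server
# edit deploy/1.8+/metrics-server-deployment.yaml and add the command block shown above
kubectl create -f deploy/1.8+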
Up to this point everything is working fine and my metrics-server pod is running, but when I run kubectl top nodes I get the following error:
error: metrics not available yet
When I run kubectl logs [metrics-server-pod-name] -n kube-system I get this:
E0611 14:36:57.527048 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527069 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527075 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-114-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527079 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-xx-xxx.ec2.internal": no metrics known for node
E0611 14:36:57.527084 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-xx-91-xx.ec2.internal": no metrics known for node
E0611 14:36:57.527088 1 reststorage.go:129] unable to fetch node metrics for node "ip-172-20-xx-xxx.ec2.internal": no metrics known for node
E0611 14:37:26.006830 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:ip-172-xx-36-103.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-36-103.ec2.internal (172.20.36.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-65-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-65-xxx.ec2.internal (172.xx.65.xx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-114-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-20-114-223.ec2.internal (172.xx.114.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xx-63-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-63-xxx.ec2.internal (172.xx.63.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-91-xx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-91-xxx.ec2.internal (172.xx.91.xxx): request failed - "401 Unauthorized", response: "Unauthorized", unable to fully scrape metrics from source kubelet_summary:ip-172-xxx-96-xxx.ec2.internal: unable to fetch metrics from Kubelet ip-172-xx-96-xxx.ec2.internal (172.xx.96.xxx): request failed - "401 Unauthorized", response: "Unauthorized"]

Option I.
Add - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
to your metrics-server-deployment.yaml, e.g.
command:
- /metrics-server
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname
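In context, the containers section of metrics-server-deployment.yaml would look roughly like this (a sketch; the image tag is illustrative, the flags are the ones from this answer):
      containers:
      - name: metrics-server
        image: k8s.gcr.io/metrics-server-amd64:v0.3.6   # illustrative tag; keep whatever your manifest uses
        command:
        - /metrics-server
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalDNS,InternalIP,ExternalDNS,ExternalIP,Hostname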
If that still doesn't work, try
Option II.
Because metrics-server resolves node hostnames through CoreDNS, add the node IPs to the coredns ConfigMap:
kubectl edit configmap coredns -n kube-system
and add a hosts block with your node entries, e.g.:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        hosts {
           192.168.199.100 master.qls.com
           192.168.199.220 worker.qls.com
           fallthrough
        }
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           upstream
           fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2019-05-17T12:32:08Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "180"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c93e5274-789f-11e9-a0ea-42010a9c0003
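The hostnames and IPs to put in the hosts block can be read from the cluster itself (the example above uses master.qls.com / worker.qls.com; substitute your own node names and addresses):
kubectl get nodes -o wide   # the NAME and INTERNAL-IP columns give the entries for the hosts block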

Related

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster onto 2 Raspberry Pi 4 boards, one as the master and the second as a worker, using the install script k3s offers, with the following options:
For the master node:
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
(192.168.1.113 is the master node IP)
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
I also tried to deploy the k8s dashboard according to the docs, but it fails to work because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
logs from the metrics-server pod:
elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster I wasn't able to reproduce this behaviour; metrics-server worked fine for both nodes, and kubectl top nodes showed information and metrics for both available nodes (though it took some time to start collecting the metrics).
Which leads to the troubleshooting steps for why it doesn't work. Checking the metrics-server logs is the most efficient way to figure this out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
Based on the logs, the next steps will differ; for instance, in the comments above:
first it was no route to host, which is related to the network and the inability to reach the resolved host
then it was i/o timeout, which means the route exists but the service did not respond. This may happen because a firewall blocks certain ports/sources, because the kubelet is not running (it listens on port 10250), or, as it turned out for the OP, because of an issue with NTP which affected certificates and connections.
Errors may be different in other cases; the important thing is to find the error and troubleshoot further based on it.
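As a rough sketch of those checks (the IP and port come from the logs above; nc and timedatectl are assumed to be available on the node):
kubectl logs -n kube-system deploy/metrics-server   # what exactly is failing right now
nc -vz 192.168.1.106 10250                           # is the worker's kubelet reachable? "no route to host" or a timeout points at network/firewall
timedatectl status                                   # clock skew / NTP problems can break TLS, as it did for the OP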

Metric server not working : unable to handle the request (get nodes.metrics.k8s.io)

I am running the command kubectl top nodes and getting this error:
node#kubemaster:~/Desktop/metric$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The Metrics Server pod is running with the following params:
command:
- /metrics-server
- --metric-resolution=30s
- --requestheader-allowed-names=aggregator
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Most of the answers suggest the params above, but I am still getting this error:
E0601 18:33:22.012798 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kubemaster: unable to fetch metrics from Kubelet kubemaster (192.168.56.30): Get https://192.168.56.30:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded, unable to fully scrape metrics from source kubelet_summary:kubenode1: unable to fetch metrics from Kubelet kubenode1 (192.168.56.31): Get https://192.168.56.31:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.56.31:10250: i/o timeout]
I have deployed metrics-server using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
What am I missing? I am using Calico for pod networking.
On the GitHub page of metrics-server, under FAQ:
[Calico] Check whether the value of CALICO_IPV4POOL_CIDR in the calico.yaml conflicts with the local physical network segment. The default: 192.168.0.0/16.
Could this be the reason? Can someone explain this to me?
I have set up Calico using:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
My node IPs are 192.168.56.30 / 192.168.56.31 / 192.168.56.32.
I initiated the cluster with --pod-network-cidr=20.96.0.0/12, so my pod IPs are 20.96.205.192 and so on.
I am also getting this in the apiserver logs:
E0601 19:29:59.362627 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: Get https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
where 10.100.152.145 is the ClusterIP of service/metrics-server.
Surprisingly, it works on another cluster with node IPs in the 172.16.0.0 range. Everything else is the same: set up using kubeadm, Calico, same pod CIDR.
It started working after I edited the metrics-server Deployment YAML config to include the following in the pod spec:
hostNetwork: true
Refer to the link below:
https://www.linuxsysadmins.com/service-unavailable-kubernetes-metrics/
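For reference, a minimal sketch of where that setting sits in the metrics-server Deployment (field names follow the standard Kubernetes pod spec; the args shown are the ones already quoted in this thread and may differ in your manifest):
spec:
  template:
    spec:
      hostNetwork: true        # run the metrics-server pod on the node's network, as described above
      containers:
      - name: metrics-server
        args:
        - --kubelet-insecure-tls
        - --kubelet-preferred-address-types=InternalIP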
The default value of the Calico pod network CIDR is 192.168.0.0/16.
There is a comment in the yaml file:
The default IPv4 pool to create on startup if none exists. Pod IPs will be chosen from this range. Changing this value after installation will have no effect. This should fall within --cluster-cidr.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
So it's better to use a different range if your home network is contained in 192.168.0.0/16.
Also, if you used kubeadm you can check your cidr in k8s:
kubeadm config view | grep Subnet
Or you can use kubectl:
kubectl --namespace kube-system get configmap kubeadm-config -o yaml
The default one in "self-hosted" Kubernetes is 10.96.0.0/12.
I had the same problem trying to run metrics on Docker Desktop; I followed suren's answer and it worked.
The default configuration is:
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
And I changed it to:
- --kubelet-preferred-address-types=InternalIP
I had the same issue on my on-prem k8s v1.26 (CNI = Calico).
I think this issue is because of the Metrics Server version (v0.6).
I solved my issue by applying Metrics Server v5.0.2:
1- Download the YAML file from the official source.
2- Add - --kubelet-insecure-tls=true below the args section (see the sketch after these steps).
3- Apply the YAML.
Enjoy ;)
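For illustration, after step 2 the container args would look something like this (a sketch; the surrounding flags are the defaults in a recent components.yaml and may differ in the copy you downloaded):
        args:
        - --cert-dir=/tmp
        - --secure-port=4443
        - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
        - --kubelet-use-node-status-port
        - --metric-resolution=15s
        - --kubelet-insecure-tls=true   # the flag added in step 2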

'kubectl top pods' Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)

When I try to run kubectl top nodes I'm getting the output:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The metrics server is able to scrape the metrics; in the logs I can see:
ScrapeMetrics: time: 49.754499ms, nodes: 4, pods: 82
...Storing metrics...
...Cycle complete...
But the endpoints for the metrics service are missing. How can I resolve this issue?
kubectl get apiservices | egrep metrics
v1beta1.metrics.k8s.io   kube-system/metrics-server   False (MissingEndpoints)
Any help will be appreciated!
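A few checks that narrow down a MissingEndpoints status (a sketch; the k8s-app=metrics-server label is an assumption based on the upstream manifest and may differ in your deployment):
kubectl -n kube-system get endpoints metrics-server                 # empty ENDPOINTS means no ready pods match the Service
kubectl -n kube-system get pods -l k8s-app=metrics-server -o wide   # are the pods Running and Ready?
kubectl -n kube-system get svc metrics-server -o yaml               # does the Service selector match the pod labels?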

kubernetes autoscale having issues with heapster

I have heapster installed on Kubernetes and I am trying to autoscale my pods, but I keep seeing the following:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
Logs from the heapster service itself:
I1009 14:22:21.014890 1 heapster.go:73] Heapster version v1.4.2
I1009 14:22:21.015226 1 configs.go:61] Using Kubernetes client with master "https://kubernetes.default" and version v1
I1009 14:22:21.015244 1 configs.go:62] Using kubelet port 10250
I1009 14:22:21.030070 1 heapster.go:196] Starting with Metric Sink
I1009 14:22:21.042806 1 heapster.go:106] Starting heapster on port 8082
E1009 14:30:05.000311 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000342 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000351 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000357 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000363 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
E1009 14:30:05.000370 1 kubelet.go:280] Node ip-xxxxxx.eu-west-1.compute.internal is not ready
I want to get autoscaling working with Kubernetes; does anyone have any ideas?
I know heapster is deprecated, but as of now I cannot change or upgrade to metrics-server, so I could do with some help.
I have seen a similar error once. The problem was an RBAC issue, but the error log was misleading. Make sure you have granted get permission for the pods.metrics.k8s.io resource.
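A minimal sketch of the kind of RBAC rule meant here (the role name is illustrative, not from the original setup; it would still need a ClusterRoleBinding to the service account doing the reads):
apiVersion: rbac.authorization.k8s.io/v1
kind: ClusterRole
metadata:
  name: hpa-metrics-reader        # illustrative name
rules:
- apiGroups: ["metrics.k8s.io"]
  resources: ["pods", "nodes"]
  verbs: ["get", "list"]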
Check this similar question: Kubernetes Custom CRD: “Failed to list …: the server could not find the requested resource”
Note: Fetching metrics from Heapster is deprecated as of Kubernetes 1.11. Ref: Horizontal Pod Autoscaler

Kubernetes Metrics unable to fetch pod/node metrics

I've installed metrics-server on kubernetes v1.11.2.
I'm running a bare-metal cluster with 3 nodes and 1 master.
In the metrics-server log I have the following errors:
E0907 14:29:51.774592 1 manager.go:102] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:vps01: unable to fetch metrics from Kubelet vps01 (vps01): Get https://vps01:10250/stats/summary/: dial tcp: lookup vps01 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps04: unable to fetch metrics from Kubelet vps04 (vps04): Get https://vps04:10250/stats/summary/: dial tcp: lookup vps04 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps03: unable to fetch metrics from Kubelet vps03 (vps03): Get https://vps03:10250/stats/summary/: dial tcp: lookup vps03 on 10.96.0.10:53: no such host, unable to fully scrape metrics from source kubelet_summary:vps02: unable to fetch metrics from Kubelet vps02 (vps02): Get https://vps02:10250/stats/summary/: dial tcp: lookup vps02 on 10.96.0.10:53: no such host]
E0907 14:30:01.694794 1 reststorage.go:98] unable to fetch pod metrics for pod boxweb/boxweb-deployment-7756c49688-fz625: no metrics known for pod "boxweb/boxweb-deployment-7756c49688-fz625"
E0907 14:30:10.517886 1 reststorage.go:112] unable to fetch node metrics for node "vps01": no metrics known for node "vps01"
I also can't get any metrics using:
kubectl top node vps01
The same goes for autoscaling; it is not working:
unable to get metrics for resource cpu: unable to fetch metrics from resource metrics API: the server could not find the requested resource (get pods.metrics.k8s.io)
I found the following solution:
Change the metrics-server-deployment.yaml file and add:
command:
- /metrics-server
- --kubelet-preferred-address-types=InternalIP
- --kubelet-insecure-tls
It looks like you have a DNS issue in your metrics-server pod. You can connect to the pod:
kubectl exec -it metrics-server-xxxxxxxxxx-xxxxx -n kube-system sh
/ # ping vps01
If you can't ping, you can't resolve your node.
CoreDNS or kube-dns also uses the /etc/resolv.conf on each of your nodes, so I would check whether the nodes can resolve each other. Say, can you ping vps01 from vps02 or vps03, etc.?
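To check that node-to-node resolution, a quick sketch (run on each node; the hostnames come from the question):
ping -c1 vps01
ping -c1 vps02
cat /etc/resolv.conf   # which DNS server the node itself uses
cat /etc/hosts         # whether the other nodes are already listed here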
I got the same issue and I resolved it by adding the hostnames to /etc/hosts on every node.
To collect metric data (CPU/memory usage), metrics-server tries to access the nodes. However, metrics-server cannot resolve the hostnames (vps01, vps02, vps03, and vps04) because they are not registered in DNS. As you mentioned, you cannot register the hostnames in DNS.
So you must add the hostnames to /etc/hosts on the node where the metrics-server pod is running.
The autoscaler does not work because metrics-server is not working, so there is no metric data.
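For illustration, the /etc/hosts entries would look like this (the IPs are placeholders; use the nodes' real addresses, the hostnames come from the question):
# /etc/hosts on the node running the metrics-server pod (IPs are illustrative)
10.0.0.1  vps01
10.0.0.2  vps02
10.0.0.3  vps03
10.0.0.4  vps04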