CRD probe failing - kubernetes

I am installing Service Catalog, which uses CRDs, and I have created them. Now I am running my controller deployment file; the image in it runs a CRD list command to verify that the CRDs are in place. This used to work fine, but now the CRD probe fails with this error:
1226 07:45:01.539118 1 round_trippers.go:438] GET https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue in 30000 milliseconds
I1226 07:45:01.539158 1 round_trippers.go:444] Response Headers:
Error: while waiting for ready Service Catalog CRDs: failed to list CustomResourceDefinition: Get https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue: dial tcp 169.72.128.1:443: i/o timeout
I followed the same steps as before but cannot figure out what is going wrong now.
Inside the controller code, it makes the following call:
list, err := r.client.ApiextensionsV1beta1().CustomResourceDefinitions().List(v1.ListOptions{LabelSelector: labels.SelectorFromSet(labels.Set{"svcat": "true"}).String()})
This is the call that fails.
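Because the error is "dial tcp ... i/o timeout", the request failed while opening the TCP connection to the API server's ClusterIP, before any HTTP was exchanged, so the query itself is not at fault. For reference, a small standard-library sketch that rebuilds the URL seen in the log; crdListURL is a hypothetical helper, not a client-go function, and it only shows how the label selector svcat=true ends up URL-encoded as svcat%3Dtrue:

```go
package main

import (
	"fmt"
	"net/url"
)

// crdListURL builds the URL for listing CRDs filtered by a label selector,
// matching the request that appears in the controller log.
// Hypothetical helper for illustration; client-go builds this internally.
func crdListURL(apiServer, selector string) string {
	u := url.URL{
		Scheme: "https",
		Host:   apiServer,
		Path:   "/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions",
		// url.Values.Encode escapes "=" inside a value as %3D.
		RawQuery: url.Values{"labelSelector": {selector}}.Encode(),
	}
	return u.String()
}

func main() {
	fmt.Println(crdListURL("169.72.128.1:443", "svcat=true"))
}
```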
Update 1: Installation works fine in the default namespace but fails in a specific namespace.
Environment info: on-prem k8s cluster, latest k8s, 2-node cluster.

It's not a port issue. Service accounts connect to the Kubernetes API server on port 443. Check whether any network policy is blocking communication between your namespace and the kube-system namespace.

Related

Kubernetes - Failed to Apply a yaml from a raw url, Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on: server misbehaving

I'm Anddiy, and I'm working with a Kubernetes cluster deployed by Rancher.
It's important to say that none of my machines has direct internet access; I use a proxy for downloads and other internet access, so I configured RKE2 with this proxy during the installation steps.
I have a machine running RKE2 that hosts Rancher, and from the Rancher UI I created my Kubernetes cluster. Here it is:
[15:19] root#vmrmmstnodehom01 [~]:# kubectl get nodes
NAME               STATUS   ROLES                      AGE     VERSION
vmrmmstnodehom01   Ready    controlplane               5d16h   v1.24.8
vmrmwrknodehom01   Ready    controlplane,etcd,worker   5d20h   v1.24.8
vmrmwrknodehom02   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom03   Ready    worker                     5d19h   v1.24.8
vmrmwrknodehom04   Ready    worker                     5d19h   v1.24.8
My cluster is clean; no applications are installed on it at the moment.
I tried to install Longhorn with this command (taken from the Longhorn documentation):
kubectl apply -f https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml
But when I run it, this error message is displayed:
Unable to connect to the server: dial tcp: lookup raw.githubusercontent.com on 10.129.251.125:53: server misbehaving
I checked whether my proxy could reach this URL, and my machine connected to it successfully; I tested that with curl -v and the Longhorn URL.
I don't know whether the Kubernetes API picked up the proxy configuration of my RKE2/Rancher install, or whether I need to set the proxy manually somewhere. I really don't know what is happening here.
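The error shows raw.githubusercontent.com being looked up directly through the DNS server at 10.129.251.125 instead of going through the proxy. kubectl downloads a -f URL itself, on the machine where it runs, so the proxy configured for the RKE2 service during installation likely does not apply; what matters is the environment of the kubectl process. Assuming kubectl fetches the URL with Go's default HTTP transport (which honours HTTPS_PROXY, HTTP_PROXY and NO_PROXY), this hypothetical sketch prints the proxy a Go client would pick in the current shell:

```go
package main

import (
	"fmt"
	"net/http"
	"os"
)

// proxyFor returns the proxy URL that Go's http.ProxyFromEnvironment would
// choose for rawURL, based on HTTPS_PROXY/HTTP_PROXY/NO_PROXY (or their
// lowercase forms) in this process's environment. An empty string means a
// direct connection, in which case the hostname is resolved via local DNS.
func proxyFor(rawURL string) (string, error) {
	req, err := http.NewRequest(http.MethodGet, rawURL, nil)
	if err != nil {
		return "", err
	}
	proxyURL, err := http.ProxyFromEnvironment(req)
	if err != nil {
		return "", err
	}
	if proxyURL == nil {
		return "", nil
	}
	return proxyURL.String(), nil
}

func main() {
	target := "https://raw.githubusercontent.com/longhorn/longhorn/v1.4.0/deploy/longhorn.yaml"
	p, err := proxyFor(target)
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
	if p == "" {
		fmt.Println("direct connection (no proxy set in this environment)")
	} else {
		fmt.Println("via proxy:", p)
	}
}
```

If this prints "direct connection" in the shell where you run kubectl, exporting HTTPS_PROXY there (or downloading longhorn.yaml with curl and applying the local file) is worth trying.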

GMP (Google Managed Prometheus) example not working on a brand-new vanilla stable GKE Autopilot cluster

Google Managed Prometheus seems like a great service; however, at the moment it does not work even with the example: https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
Setup:
1- Create a new Autopilot cluster (1.21.12-gke.2200).
2- Enable managed Prometheus via the gcloud CLI:
gcloud beta container clusters update <mycluster> --enable-managed-prometheus --region us-central1
3- Add the firewall rule for port 8443 (webhook).
4- Install ingress-nginx.
5- Try to use the PodMonitoring manifest to get metrics from ingress-nginx:
Error from server (InternalError): error when creating "ingress-nginx/metrics.yaml": Internal error occurred: failed calling webhook "default.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com": Post "https://gmp-operator.gke-gmp-system.svc:443/default/monitoring.googleapis.com/v1/podmonitorings?timeout=10s": x509: certificate is valid for gmp-operator, gmp-operator.gmp-system, gmp-operator.gmp-system.svc, not gmp-operator.gke-gmp-system.svc
There is a thread suggesting this will all be fixed this week (8/11/2022), https://github.com/GoogleCloudPlatform/prometheus-engine/issues/300, but it seems like this should work regardless.
If I try to port-forward:
kubectl -n gke-gmp-system port-forward svc/gmp-operator 8443
error: Pod 'gmp-operator-67d5fff8b9-p4n7t' does not have a named port 'webhook'
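The webhook failure is a TLS hostname mismatch: the API server dialled gmp-operator.gke-gmp-system.svc, but the serving certificate only lists names in the gmp-system namespace. A minimal standard-library sketch of that check, using crypto/x509's VerifyHostname on a certificate value that carries only the SANs from the error message (covers is a hypothetical helper, and no real certificate is loaded):

```go
package main

import (
	"crypto/x509"
	"fmt"
)

// covers reports whether a certificate whose DNS SANs are dnsNames would
// pass Go's hostname check for host. This is only the hostname part of TLS
// verification; chain and expiry checks are not modelled here.
func covers(dnsNames []string, host string) bool {
	cert := &x509.Certificate{DNSNames: dnsNames}
	return cert.VerifyHostname(host) == nil
}

func main() {
	// SANs copied from the error message above.
	sans := []string{"gmp-operator", "gmp-operator.gmp-system", "gmp-operator.gmp-system.svc"}
	fmt.Println(covers(sans, "gmp-operator.gmp-system.svc"))
	fmt.Println(covers(sans, "gmp-operator.gke-gmp-system.svc"))
}
```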

k3s - Metrics server doesn't work for worker nodes

I deployed a k3s cluster on two Raspberry Pi 4 boards, one as the master and the other as a worker, using the k3s install script with the following options:
For the master node (192.168.1.113 is the master node's IP):
curl -sfL https://get.k3s.io | INSTALL_K3S_EXEC='server --bind-address 192.168.1.113' sh -
For the agent node:
curl -sfL https://get.k3s.io | \
K3S_URL=https://192.168.1.113:6443 \
K3S_TOKEN=<master-token> \
INSTALL_K3S_EXEC='agent' sh -
Everything seems to work, but kubectl top nodes returns the following:
NAME          CPU(cores)   CPU%        MEMORY(bytes)   MEMORY%
k3s-master    137m         3%          1285Mi          33%
k3s-node-01   <unknown>    <unknown>   <unknown>       <unknown>
I also tried to deploy the k8s dashboard as described in the docs, but it doesn't work because it can't reach the metrics server and gets a timeout error:
"error trying to reach service: dial tcp 10.42.1.11:8443: i/o timeout"
and I see a lot of errors in the pod logs:
2021/09/17 09:24:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:25:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:26:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
2021/09/17 09:27:06 Metric client health check failed: the server is currently unable to handle the request (get services dashboard-metrics-scraper). Retrying in 30 seconds.
Logs from the metrics-server pod:
elet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:03:24.767949 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
E0917 14:04:24.767960 1 manager.go:111] unable to fully collect metrics: unable to fully scrape metrics from source kubelet_summary:k3s-node-01: unable to fetch metrics from Kubelet k3s-node-01 (k3s-node-01): Get https://k3s-node-01:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.1.106:10250: connect: no route to host
Moving this out of comments for better visibility.
After creating a small cluster, I wasn't able to reproduce this behaviour: metrics-server worked fine for both nodes, and kubectl top nodes showed metrics for both of them (though it took some time to start collecting the metrics).
So the question is why it doesn't work in your setup. Checking the metrics-server logs is the most efficient way to find out:
$ kubectl logs metrics-server-58b44df574-2n9dn -n kube-system
The next steps depend on what the logs show. For instance, in the comments above:
first it was no route to host, which points to a network-level problem: the node name resolved to 192.168.1.106, but that address could not be reached;
then it was i/o timeout, which means a route exists but the service did not respond. This can happen when a firewall blocks certain ports or sources, when the kubelet (which listens on port 10250) is not running, or, as it turned out for the OP, when an NTP issue affected certificates and connections.
The errors may be different in other cases; the important thing is to find the error and troubleshoot based on it.

Metric server not working : unable to handle the request (get nodes.metrics.k8s.io)

I am running kubectl top nodes and getting this error:
node#kubemaster:~/Desktop/metric$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The metrics-server pod is running with the following params:
command:
- /metrics-server
- --metric-resolution=30s
- --requestheader-allowed-names=aggregator
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Most of the answers I found suggest the params above, but I am still getting this error:
E0601 18:33:22.012798 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kubemaster: unable to fetch metrics from Kubelet kubemaster (192.168.56.30): Get https://192.168.56.30:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded, unable to fully scrape metrics from source kubelet_summary:kubenode1: unable to fetch metrics from Kubelet kubenode1 (192.168.56.31): Get https://192.168.56.31:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.56.31:10250: i/o timeout]
I deployed metrics-server using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
What am I missing?
I'm using Calico for pod networking.
The metrics-server GitHub page says under FAQ:
[Calico] Check whether the value of CALICO_IPV4POOL_CIDR in the calico.yaml conflicts with the local physical network segment. The default: 192.168.0.0/16.
Could this be the reason? Can someone explain this to me?
I set up Calico using:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
My Node Ips are : 192.168.56.30 / 192.168.56.31 / 192.168.56.32
I initialized the cluster with --pod-network-cidr=20.96.0.0/12, so my pod IPs are 20.96.205.192 and so on.
I am also getting this in the apiserver logs:
E0601 19:29:59.362627 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: Get https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
where 10.100.152.145 is the IP of service/metrics-server (ClusterIP).
Surprisingly, it works on another cluster with node IPs in the 172.16.0.0 range.
Everything else is the same: set up with kubeadm, Calico, same pod CIDR.
It started working after I edited the metrics-server Deployment YAML to run the pod on the host network:
hostNetwork: true
Refer to the link below:
https://www.linuxsysadmins.com/service-unavailable-kubernetes-metrics/
Calico's default pool is 192.168.0.0/16.
There is a comment in the YAML file:
# The default IPv4 pool to create on startup if none exists. Pod IPs
# will be chosen from this range. Changing this value after installation
# will have no effect. This should fall within --cluster-cidr.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
So it's better to use a different range if your home network falls within 192.168.0.0/16.
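To check a candidate pool against the node network before installing Calico, a small sketch using Go's net/netip package (Go 1.18 or later); overlaps is a hypothetical helper, and the node subnet 192.168.56.0/24 is an assumption based on the node IPs 192.168.56.30-32 in the question:

```go
package main

import (
	"fmt"
	"net/netip"
)

// overlaps reports whether two CIDR ranges share at least one address.
func overlaps(a, b string) (bool, error) {
	pa, err := netip.ParsePrefix(a)
	if err != nil {
		return false, err
	}
	pb, err := netip.ParsePrefix(b)
	if err != nil {
		return false, err
	}
	return pa.Overlaps(pb), nil
}

func main() {
	nodeNet := "192.168.56.0/24" // assumed subnet of the nodes in the question
	// Calico's default pool versus the --pod-network-cidr used in the question.
	for _, pool := range []string{"192.168.0.0/16", "20.96.0.0/12"} {
		ok, err := overlaps(pool, nodeNet)
		if err != nil {
			panic(err)
		}
		fmt.Printf("%s overlaps %s: %v\n", pool, nodeNet, ok)
	}
}
```

With the default manifest, the Calico pool (192.168.0.0/16) contains the node addresses, which is exactly the conflict the FAQ warns about, even though the kubeadm pod CIDR (20.96.0.0/12) does not.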
Also, if you used kubeadm, you can check your CIDRs in k8s:
kubeadm config view | grep Subnet
Or you can use kubectl:
kubectl --namespace kube-system get configmap kubeadm-config -o yaml
The default service subnet in a kubeadm ("self-hosted") cluster is 10.96.0.0/12.
I had the same problem running metrics-server on Docker Desktop; I followed @suren's answer and it worked.
The default configuration is:
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
And I changed to:
- --kubelet-preferred-address-types=InternalIP
I had the same issue on my on-prem k8s v1.26 cluster (CNI: Calico).
I think this issue is caused by the metrics-server version (v0.6).
I solved it by applying metrics-server v5.0.2:
1- Download this YAML file from the official source.
2- Add - --kubelet-insecure-tls=true under the args section.
3- Apply the YAML.
Enjoy ;)

Kubernetes-dashboard pod is crashing again and again

I installed and configured Kubernetes on my Ubuntu machine, following this document.
After deploying the Kubernetes dashboard, the container keeps crashing:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
Started the Proxy using:
kubectl proxy --address='0.0.0.0' --accept-hosts='.*' --port=8001
Pod status:
kubectl get pods -o wide --all-namespaces
....
....
kube-system kubernetes-dashboard-64576d84bd-z6pff 0/1 CrashLoopBackOff 26 2h 192.168.162.87 kb-node <none>
Kubernetes system log:
root#KB-master:~# kubectl -n kube-system logs kubernetes-dashboard-64576d84bd-z6pff --follow
2018/09/11 09:27:03 Starting overwatch
2018/09/11 09:27:03 Using apiserver-host location: http://192.168.33.30:8001
2018/09/11 09:27:03 Skipping in-cluster config
2018/09/11 09:27:03 Using random key for csrf signing
2018/09/11 09:27:03 No request provided. Skipping authorization
2018/09/11 09:27:33 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service account's configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get http://192.168.33.30:8001/version: dial tcp 192.168.33.30:8001: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
I get the following message when I try to open this link in the browser:
URL:http://192.168.33.30:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/#!/login
Error: 'dial tcp 192.168.162.87:8443: connect: connection refused'
Trying to reach: 'https://192.168.162.87:8443/'
Can anyone help me with this?
http://192.168.33.30:8001 is not a legitimate API server URL. All communication with the API server uses TLS internally (https:// URL scheme). These connections are verified using the API server CA certificate and are authenticated by means of tokens signed by the same CA.
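For comparison, a client running inside a pod reaches the API server over https and trusts it through the CA bundle and token mounted at /var/run/secrets/kubernetes.io/serviceaccount/. A minimal standard-library sketch of that setup; newClusterClient and exitOn are hypothetical helpers (client-go's in-cluster config does the equivalent for you), and kubernetes.default.svc assumes cluster DNS is working:

```go
package main

import (
	"crypto/tls"
	"crypto/x509"
	"errors"
	"fmt"
	"net/http"
	"os"
	"strings"
	"time"
)

// Standard service-account mount inside every pod.
const saDir = "/var/run/secrets/kubernetes.io/serviceaccount"

// newClusterClient returns an HTTP client that only trusts certificates
// signed by the CA(s) in caPEM, the way an in-cluster API client does.
func newClusterClient(caPEM []byte) (*http.Client, error) {
	pool := x509.NewCertPool()
	if !pool.AppendCertsFromPEM(caPEM) {
		return nil, errors.New("no valid CA certificates found")
	}
	return &http.Client{
		Timeout: 10 * time.Second,
		Transport: &http.Transport{
			TLSClientConfig: &tls.Config{RootCAs: pool, MinVersion: tls.VersionTLS12},
		},
	}, nil
}

// exitOn prints err and exits if it is non-nil.
func exitOn(err error) {
	if err != nil {
		fmt.Fprintln(os.Stderr, err)
		os.Exit(1)
	}
}

func main() {
	caPEM, err := os.ReadFile(saDir + "/ca.crt")
	exitOn(err)
	token, err := os.ReadFile(saDir + "/token")
	exitOn(err)
	client, err := newClusterClient(caPEM)
	exitOn(err)

	// Same /version endpoint the dashboard tried to reach over plain http.
	req, err := http.NewRequest(http.MethodGet, "https://kubernetes.default.svc/version", nil)
	exitOn(err)
	req.Header.Set("Authorization", "Bearer "+strings.TrimSpace(string(token)))

	resp, err := client.Do(req)
	exitOn(err)
	defer resp.Body.Close()
	fmt.Println("API server answered:", resp.Status)
}
```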
What you see is the result of a misconfiguration. At first sight it seems like you mixed pod, service and host networks.
Make sure you understand the difference between the host network, the pod network and the service network. These three networks must not overlap. For example, --pod-network-cidr=192.168.0.0/16 must not include the IP address of your host; change it to 10.0.0.0/16 or something smaller if necessary.
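This rule can be checked before running kubeadm init. A hypothetical checkLayout helper (Go 1.18 or later, net/netip) that flags a host IP inside the pod or service range and any pod/service overlap; the example values assume the host IP 192.168.33.30 from the question, the pod ranges mentioned in this answer, and 10.96.0.0/12 as the service range (kubeadm's default serviceSubnet):

```go
package main

import (
	"fmt"
	"net/netip"
)

// checkLayout lists problems with a cluster address plan: the host IP must
// lie outside both the pod and service ranges, and those two ranges must
// not overlap each other.
func checkLayout(hostIP, podCIDR, serviceCIDR string) ([]string, error) {
	host, err := netip.ParseAddr(hostIP)
	if err != nil {
		return nil, err
	}
	pods, err := netip.ParsePrefix(podCIDR)
	if err != nil {
		return nil, err
	}
	svcs, err := netip.ParsePrefix(serviceCIDR)
	if err != nil {
		return nil, err
	}
	var problems []string
	if pods.Contains(host) {
		problems = append(problems, "host IP is inside the pod network")
	}
	if svcs.Contains(host) {
		problems = append(problems, "host IP is inside the service network")
	}
	if pods.Overlaps(svcs) {
		problems = append(problems, "pod and service networks overlap")
	}
	return problems, nil
}

func main() {
	// Pod range that contains the host, then the suggested 10.0.0.0/16.
	fmt.Println(checkLayout("192.168.33.30", "192.168.0.0/16", "10.96.0.0/12"))
	fmt.Println(checkLayout("192.168.33.30", "10.0.0.0/16", "10.96.0.0/12"))
}
```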
After you have a clear overview of the network topology, run the setup again and everything will be configured correctly, including the Kubernetes CA.