Timeouts in metrics-server right after installing ingress in AKS - kubernetes

Prerequisites:
New kubernetes cluster (Azure, v. 1.14.8) is set up
Metrics-server is set up automatically by AKS (v. 0.3.5)
Steps:
Install ingress into the cluster via helm install ingress stable/nginx-ingress --namespace ingress --create-namespace --set controller.replicaCount=1
Wait a few minutes.
After a few minutes (3-8) there are errors in metrics-server and it falls into a loop with a FailedDiscoveryCheck error: Failed to make webhook authorized request: Post https://...azmk8s.io:443/apis/authorization.k8s.io/v1beta1/subjectaccessreviews: read tcp %IP%: read: connection timed out.
Error in NGINX Ingress controller pod:
E0625 12:18:49.622522 6 leaderelection.go:320] error retrieving resource lock ingress/ingress-controller-leader-nginx: Get "https://10.0.0.1:443/api/v1/namespaces/ingress/configmaps/ingress-controller-leader-nginx": context deadline exceeded
I0625 12:18:49.622561 6 leaderelection.go:277] failed to renew lease ingress/ingress-controller-leader-nginx: timed out waiting for the condition
I0625 12:18:49.626143 6 leaderelection.go:242] attempting to acquire leader lease ingress/ingress-controller-leader-nginx...
E0625 12:34:13.890642 6 leaderelection.go:320] error retrieving resource lock ingress/ingress-controller-leader-nginx: Get "https://10.0.0.1:443/api/v1/namespaces/ingress/configmaps/ingress-controller-leader-nginx": read tcp 10.244.0.53:55144->10.0.0.1:443: read: connection timed out
The metrics-server does not work until it is restarted; after the restart, no issues are observed. Adding liveness/readiness probes to the metrics-server deployment works around the delayed restart, but does not fix the root cause.
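For reference, the probes added were along these lines (a minimal sketch; the /healthz path, port and HTTPS scheme are assumptions based on the standard metrics-server 0.3.x manifest, so match them to your container's secure port):

    livenessProbe:
      httpGet:
        path: /healthz        # assumed health endpoint
        port: 4443            # assumed; must match the container's secure port
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 10
    readinessProbe:
      httpGet:
        path: /healthz        # assumed
        port: 4443
        scheme: HTTPS
      initialDelaySeconds: 20
      periodSeconds: 10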
Why does metrics-server stop working only a few minutes after installing ingress? How does installing ingress affect the cluster? The issue reproduces reliably: you can delete the ingress, install it again, and the problem recurs.
Sometimes metrics-server fails with the error:
Message: endpoints for service/metrics-server in "kube-system" have no addresses
Reason: MissingEndpoints
The same behavior is also observed for another pod: if you install kubernetes-dashboard, it stops working after the ingress is installed, with error 500 context deadline exceeded.
I would like to understand and fix the root cause.
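For anyone reproducing this, the state inspected while it is failing is roughly the following (a sketch of standard kubectl checks, nothing AKS-specific; the pod label is assumed from the stock metrics-server manifest):

    # Inspect the aggregated API registration that reports FailedDiscoveryCheck
    kubectl get apiservice v1beta1.metrics.k8s.io -o yaml
    # Check whether the metrics-server Service has endpoints (MissingEndpoints)
    kubectl -n kube-system get endpoints metrics-server
    # Restarting metrics-server is the workaround mentioned above
    kubectl -n kube-system delete pod -l k8s-app=metrics-server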

Related

gmp managed prometheus example not working on a brand new vanilla stable gke autopilot cluster

Google Managed Prometheus seems like a great service; however, at the moment it does not work even with the example at https://cloud.google.com/stackdriver/docs/managed-prometheus/setup-managed
Setup:
create a new autopilot cluster 1.21.12-gke.2200
enable managed prometheus via the gcloud CLI command
gcloud beta container clusters update <mycluster> --enable-managed-prometheus --region us-central1
add the port 8443 firewall rule for the webhook
install ingress-nginx
try to use the PodMonitoring manifest to get metrics from ingress-nginx (a sketch of its shape is shown below)
Error from server (InternalError): error when creating "ingress-nginx/metrics.yaml": Internal error occurred: failed calling webhook "default.podmonitorings.gmp-operator.gke-gmp-system.monitoring.googleapis.com": Post "https://gmp-operator.gke-gmp-system.svc:443/default/monitoring.googleapis.com/v1/podmonitorings?timeout=10s": x509: certificate is valid for gmp-operator, gmp-operator.gmp-system, gmp-operator.gmp-system.svc, not gmp-operator.gke-gmp-system.svc
There is a thread suggesting this will all be fixed this week (8/11/2022), https://github.com/GoogleCloudPlatform/prometheus-engine/issues/300, but it seems like this should work regardless.
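For reference, the PodMonitoring manifest being applied is roughly of this shape (a sketch only; the selector label and port name are assumptions about the ingress-nginx metrics endpoint, not the exact ingress-nginx/metrics.yaml file):

    apiVersion: monitoring.googleapis.com/v1
    kind: PodMonitoring
    metadata:
      name: ingress-nginx
      namespace: ingress-nginx
    spec:
      selector:
        matchLabels:
          app.kubernetes.io/name: ingress-nginx   # assumed controller label
      endpoints:
      - port: metrics                             # assumed metrics port name
        interval: 30s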
If I try to port forward ...
kubectl -n gke-gmp-system port-forward svc/gmp-operator 8443
error: Pod 'gmp-operator-67d5fff8b9-p4n7t' does not have a named port 'webhook'

Failed to install metrics-server on minikube

I am trying to install metrics-server on my Kubernetes cluster, but it does not reach the READY state.
I installed metrics-server with:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/latest/download/components.yaml
After installing, I tried commands such as kubectl top pods and kubectl top nodes, but I got an error:
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get pods.metrics.k8s.io)
The metrics server fails to start.
Enable the metrics-server addon in the minikube cluster.
Try the following command:
minikube addons enable metrics-server
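Once the addon is enabled, it can take a minute for the deployment to become ready; a quick way to verify before retrying kubectl top (standard commands, nothing version-specific):

    minikube addons list | grep metrics-server
    kubectl -n kube-system get deployment metrics-server
    kubectl top nodes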

Cert-manager and Ingress pods in Crash loop back off (AKS)

I was trying to upgrade the Kubernetes version of our cluster from 1.19.7 to 1.22 and some of the worker nodes failed to update, so I restarted the cluster. After restarting, the upgrade was successful, but the cert-manager-webhook and cert-manager-cainjector pods went down along with the ingress pods, i.e. they are either in CrashLoopBackOff state or Error state.
After checking the logs:
The cert-manager-webhook is throwing this error - "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
"msg"="Generating new ECDSA private key"
The cert-manager-cainjector is throwing this error- cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="an error on the server (\"\") has prevented the request from succeeding"
The nginx-ingress pod is throwing this error - SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
Can anyone please help?

Istio Installation successful but not able to deploy POD

I have successfully installed Istio in a Kubernetes cluster.
The Istio version is 1.9.1.
Kubernetes CNI plugin used: Calico version 3.18 (the Calico pod is up and running).
kubectl get pod -A
istio-system istio-egressgateway-bd477794-8rnr6 1/1 Running 0 124m
istio-system istio-ingressgateway-79df7c789f-fjwf8 1/1 Running 0 124m
istio-system istiod-6dc55bbdd-89mlv 1/1 Running 0 124
When I try to deploy a sample nginx app, I get the error below:
failed calling webhook sidecar-injector.istio.io context deadline exceeded
Post "https://istiod.istio-system.svc:443/inject?timeout=30s":
context deadline exceeded
When I disable automatic sidecar injection, the pod is deployed without any errors:
kubectl label namespace default istio-injection-
I am not sure how to fix this issue; could someone please help me with it?
In this case, adding hostNetwork: true under spec.template.spec of the istiod Deployment may help.
This seems to be a workaround when using the Calico CNI for pod networking (see: failed calling webhook "sidecar-injector.istio.io").
As we can find in the Kubernetes Host namespaces documentation:
HostNetwork - Controls whether the pod may use the node network namespace. Doing so gives the pod access to the loopback device, services listening on localhost, and could be used to snoop on network activity of other pods on the same node.
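A minimal way to apply that workaround (a sketch; it assumes the stock Deployment name istiod in the istio-system namespace, and the host-network caveat quoted above applies):

    # Run istiod in the node's network namespace (the workaround described above)
    kubectl -n istio-system patch deployment istiod \
      --type=merge \
      -p '{"spec":{"template":{"spec":{"hostNetwork":true}}}}'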

Metric server not working: unable to handle the request (get nodes.metrics.k8s.io)

I am running the command kubectl top nodes and getting this error:
node#kubemaster:~/Desktop/metric$ kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
The metrics-server pod is running with the following params:
command:
- /metrics-server
- --metric-resolution=30s
- --requestheader-allowed-names=aggregator
- --kubelet-insecure-tls
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
Most of the answers I am getting suggest the above params, but I am still getting this error:
E0601 18:33:22.012798 1 manager.go:111] unable to fully collect metrics: [unable to fully scrape metrics from source kubelet_summary:kubemaster: unable to fetch metrics from Kubelet kubemaster (192.168.56.30): Get https://192.168.56.30:10250/stats/summary?only_cpu_and_memory=true: context deadline exceeded, unable to fully scrape metrics from source kubelet_summary:kubenode1: unable to fetch metrics from Kubelet kubenode1 (192.168.56.31): Get https://192.168.56.31:10250/stats/summary?only_cpu_and_memory=true: dial tcp 192.168.56.31:10250: i/o timeout]
I have deployed metrics-server using:
kubectl apply -f https://github.com/kubernetes-sigs/metrics-server/releases/download/v0.3.6/components.yaml
What am I missing?
I am using Calico for pod networking.
On the GitHub page of metrics-server, under the FAQ:
[Calico] Check whether the value of CALICO_IPV4POOL_CIDR in the calico.yaml conflicts with the local physical network segment. The default: 192.168.0.0/16.
Could this be the reason? Can someone explain this to me?
I have set up Calico using:
kubectl apply -f https://docs.projectcalico.org/v3.14/manifests/calico.yaml
My node IPs are 192.168.56.30 / 192.168.56.31 / 192.168.56.32.
I initialized the cluster with --pod-network-cidr=20.96.0.0/12, so my pod IPs are 20.96.205.192 and so on.
I am also getting this in the apiserver logs:
E0601 19:29:59.362627 1 available_controller.go:420] v1beta1.metrics.k8s.io failed with: failing or missing response from https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: Get https://10.100.152.145:443/apis/metrics.k8s.io/v1beta1: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
where 10.100.152.145 is the ClusterIP of service/metrics-server.
Surprisingly, it works on another cluster with node IPs in the 172.16.0.0 range. Everything else is the same: set up using kubeadm, Calico, and the same pod CIDR.
It started working after I edited the metrics-server deployment YAML config to include a DNS policy and run it on the host network:
hostNetwork: true
Refer to the link below:
https://www.linuxsysadmins.com/service-unavailable-kubernetes-metrics/
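In the metrics-server Deployment that change sits roughly here (a sketch; dnsPolicy: ClusterFirstWithHostNet is the usual companion to hostNetwork and is an assumption here, not quoted from the linked article):

    spec:
      template:
        spec:
          hostNetwork: true                    # run metrics-server on the node network
          dnsPolicy: ClusterFirstWithHostNet   # assumed, so in-cluster DNS still resolves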
The default value of the Calico pod network CIDR is 192.168.0.0/16.
There is a comment in the YAML file:
The default IPv4 pool to create on startup if none exists. Pod IPs
will be chosen from this range. Changing this value after installation
will have no effect. This should fall within --cluster-cidr.
- name: CALICO_IPV4POOL_CIDR
  value: "192.168.0.0/16"
So it's better to use a different range if your home network is contained in 192.168.0.0/16.
Also, if you used kubeadm, you can check your CIDR:
kubeadm config view | grep Subnet
Or you can use kubectl:
kubectl --namespace kube-system get configmap kubeadm-config -o yaml
The default one in self-hosted Kubernetes is 10.96.0.0/12.
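To see which pool Calico actually created, you can also inspect the IPPool resource directly (a sketch; the pool name default-ipv4-ippool is an assumption based on the stock calico.yaml):

    # Via the Calico CRD installed by calico.yaml
    kubectl get ippools.crd.projectcalico.org default-ipv4-ippool -o yaml
    # Or, if calicoctl is installed
    calicoctl get ippool -o wide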
I had the same problem trying to run metrics on Docker Desktop, and I followed #suren's answer and it worked.
The default configuration is:
- --kubelet-preferred-address-types=InternalIP,Hostname,InternalDNS,ExternalDNS,ExternalIP
And I changed it to:
- --kubelet-preferred-address-types=InternalIP
I had the same issue in my on-prem Kubernetes v1.26 cluster (CNI = Calico).
I think this issue is because of the metrics-server version (v0.6).
I solved it by applying metrics-server v0.5.2:
1. Download the YAML file from the official source.
2. Add - --kubelet-insecure-tls=true below the args section (see the sketch below).
3. Apply the YAML.
Enjoy ;)
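A sketch of what step 2 looks like inside the metrics-server container spec (the surrounding flags come from a typical v0.5.x components.yaml and may differ in your copy):

    containers:
    - name: metrics-server
      args:
      - --cert-dir=/tmp
      - --secure-port=443
      - --kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
      - --kubelet-use-node-status-port
      - --metric-resolution=15s
      - --kubelet-insecure-tls=true   # the flag added in step 2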