Kubeflow / Istio - problem with admission-webhook-bootstrap and istio-sidecar-injector - kubernetes

I have an installation of KF 1.0.2 on GCP that used to work fine. Recently, two pods went into a CrashLoopBackOff state:
admission-webhook-bootstrap-stateful-set, with the error message:
Error from server (NotFound):
mutatingwebhookconfigurations.admissionregistration.k8s.io
"admission-webhook-mutating-webhook-configuration" not found
istio-sidecar-injector, with the error message:
failed to start patch cert loop
mutatingwebhookconfigurations.admissionregistration.k8s.io
"istio-sidecar-injector" not found
I reset the webhooks by deleting their configurations as shown below, but with no success:
kubectl delete mutatingwebhookconfigurations.admissionregistration.k8s.io --all
kubectl delete validatingwebhookconfigurations.admissionregistration.k8s.io --all
Any ideas on how to fix it?
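One minimal way to confirm what the errors report and nudge the owning workloads into recreating their configurations; this is a sketch only, and the kubeflow and istio-system namespaces are assumptions based on a stock KF 1.0.x / Istio layout:
# Confirm the webhook configurations really are missing
kubectl get mutatingwebhookconfigurations
# Restart the workloads that are supposed to (re)create them
kubectl -n kubeflow rollout restart statefulset admission-webhook-bootstrap-stateful-set
kubectl -n istio-system rollout restart deployment istio-sidecar-injector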

Related

kubectl get --raw /metrics returns Error from server (NotFound)

I am trying to use Prometheus to monitor my EKS Fargate cluster (Kubernetes 1.23). So far I have followed the procedures from https://aws.amazon.com/blogs/containers/monitoring-amazon-eks-on-aws-fargate-using-prometheus-and-grafana/ and https://devopscube.com/setup-prometheus-monitoring-on-kubernetes/. Right now I only receive metrics from kube-state-metrics, but I fail to get any resource usage or performance data.
In addition, I get "Error from server (NotFound): the server could not find the requested resource" when I run kubectl get --raw /metrics or kubectl get --raw /api/v1/nodes/${NODE_NAME}/proxy/metrics/cadvisor.
Does anyone know why I get Error from server (NotFound)? How can I fix it?
Thank you!
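One diagnostic that narrows this down is checking whether the resource-metrics API is registered at all; a minimal sketch, assuming metrics-server (or an equivalent) is what is expected to serve resource usage data:
# Is the metrics API registered, and does it have a healthy backend?
kubectl get apiservice v1beta1.metrics.k8s.io
# Query the aggregated metrics API directly instead of the kubelet proxy path
kubectl get --raw /apis/metrics.k8s.io/v1beta1/nodes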

Multiple K8s Issues relating to Webhooks, Logs & I/O Timeouts

I have a really weird issue with one of my Linode K8s clusters running 1.23; there are multiple issues occurring and I can't quite pinpoint the root cause.
Linode have let me know it is not an issue with the master and nothing is wrong on their end, so let me highlight all the identified problems to start.
Logs not Working
When trying to pull logs from any pod, I get this error (which makes it very hard to troubleshoot):
root@aidan:~# kubectl logs <pod-name> -n revwhois-subdomain-enum
Error from server: Get "https://192.168.150.102:10250/containerLogs/revwhois-subdomain-enum/tldbrr-revwhois-worker12-twppv/tldbrr-revwhois-worker12": dial tcp 192.168.150.102:10250: i/o timeout
Metrics not Working
root@m0chan:~# kubectl top nodes
Error from server (ServiceUnavailable): the server is currently unable to handle the request (get nodes.metrics.k8s.io)
Pod Deletion not Working
When deleting a pod with kubectl delete pod <pod-name> -n <namespace>, the pod is deleted but gets stuck in a Terminating state; the old pod is never removed and a new pod is not launched.
Errors Editing Ingress
Error from server (InternalError): error when creating "yaml/xxx/xxx-ingress.yaml": Internal error occurred: failed calling webhook "validate.nginx.ingress.kubernetes.io": Post https://ingress-nginx-controller-admission.ingress-nginx.svc:443/extensions/v1beta1/ingresses?timeout=30s: Temporary Redirect
I also have errors in the metrics and Cert-Manager logs relating to "failed calling webhook".
This is all for now and I would really appreciate some help resolving this.
Aidan
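Given that the log error is an i/o timeout to the kubelet and kubectl top fails on the metrics API, a reasonable first check is raw connectivity to port 10250 and the state of the aggregated API; a sketch, assuming you can run commands from another node or a debug pod with cluster network access:
# Can the kubelet port on the affected node be reached at all?
nc -vz 192.168.150.102 10250
# Does the metrics API have a registered, available backend?
kubectl get apiservice v1beta1.metrics.k8s.io -o wide
# Stuck Terminating pods can be force-removed while you investigate
kubectl delete pod <pod-name> -n <namespace> --grace-period=0 --force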

kubectl exec fails intermittently with error: unable to upgrade connection: Authorization error

I am just trying to run kubectl exec into one of my pods. The pod status looks fine, and my worker nodes are also in a good state. But when I try kubectl exec, it fails intermittently with the error below. I am not able to understand why this is happening. We guessed it might be the worker nodes, so we freshly deployed all new worker nodes, but we still see the issue.
error: unable to upgrade connection: Authorization error (user=cluster_admin, verb=create, resource=nodes, subresource=proxy)
Any help is much appreciated. Thanks
It looks like this is not an issue with kubectl exec itself; it is failing due to an authorization issue.
First, verify access:
kubectl auth can-i create pods/exec
yes
kubectl auth can-i get pods/exec
yes
If you have admin access with kubectl and these commands return output, your kubectl-to-API-server connection is good.
It could be failing at the kubelet level, as the kubelet might be configured to authenticate all requests while the API server is not providing the required client credentials.
You can read more about it at https://kubernetes.io/docs/reference/access-authn-authz/kubelet-authn-authz/#kubelet-authentication
The API server uses the --kubelet-client-certificate and --kubelet-client-key flags for this authentication, as sketched below.
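For reference, in a kubeadm-style setup those flags typically live in the kube-apiserver static pod manifest; a hypothetical excerpt (the manifest location and certificate paths are kubeadm defaults and will vary per cluster):
# /etc/kubernetes/manifests/kube-apiserver.yaml (verify the path on your cluster)
- --kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
- --kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key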

unable to create Istio bookinfo-gateway.yaml Gateway

While running this command:
kubectl apply -f samples/bookinfo/networking/bookinfo-gateway.yaml
I am getting this error
Error from server (NotFound): error when deleting
"samples/bookinfo/networking/bookinfo-gateway.yaml": the server could
not find the requested resource (delete gatewaies.networking.istio.io
bookinfo-gateway)
Can someone please tell me how I can accept the gatewaies plural, or how to fix this error?
Upgrading to the latest kubectl solved the issue.
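If you want to confirm the mismatch before upgrading, you can compare what the server actually serves against what the client guesses; a small sketch (the gatewaies in the error suggests an old client-side pluralizer at work, rather than the server's real resource name):
# Ask the server for the real resource names instead of letting kubectl guess the plural
kubectl api-resources | grep -i gateway
# Check client and server versions for skew
kubectl version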

Cert-manager fails on kubernetes with webhooks

I'm following the Kubernetes install instructions for Helm: https://docs.cert-manager.io/en/latest/getting-started/install/kubernetes.html
With cert-manager v0.8.1 on Kubernetes v1.15, Ubuntu 18.04, on-premise.
When I get to testing the installation, I get these errors:
error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "issuers.admission.certmanager.k8s.io": the server is currently unable to handle the request
Error from server (InternalError): error when creating "test-resources.yaml": Internal error occurred: failed calling webhook "certificates.admission.certmanager.k8s.io": the server is currently unable to handle the request
If I apply the test-resources.yaml before installing with Helm, I do not get the errors, but it still does not work.
These errors are new to me, as Cert-manager used to work for me on my previous install about a month ago, following the same installation instructions.
I've also tried cert-manager v0.7.2 (CRD 0.7), as I think that was the last version I managed to get installed, but it is not working either.
What do these errors mean?
Update: it turned out to be an internal CoreDNS issue on my cluster; it was somehow not configured correctly, possibly related to a wrong POD_CIDR configuration.
If you experience this problem, check the logs of CoreDNS (or KubeDNS) and you may see lots of errors related to contacting services. Unfortunately, I no longer have the exact errors.
But this is how I figured out that my network setup was invalid.
I'm using Calico (this will apply to other network plugins as well), and its pool was not set to the same network as the POD_CIDR that I initialized my Kubernetes cluster with.
Example:
1. Set up Kubernetes:
kubeadm init --pod-network-cidr=10.244.0.0/16
2. Configure calico.yaml:
- name: CALICO_IPV4POOL_CIDR
  value: "10.244.0.0/16"
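If you suspect the same class of problem, the CoreDNS logs are usually where it shows up first; a minimal sketch (the k8s-app=kube-dns label is the conventional one even for CoreDNS, but verify it on your cluster):
# Tail CoreDNS logs for upstream/connection errors
kubectl -n kube-system logs -l k8s-app=kube-dns
# Cross-check the cluster's configured pod CIDR against what you passed to kubeadm
kubectl cluster-info dump | grep -m 1 cluster-cidr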
I also tried cert-manager v0.8.0 in a very similar setup on Ubuntu 18.04 and Kubernetes v1.14.1, and I began to get the same error when I tore down cert-manager using kubectl delete and reinstalled it, after experiencing some network issues on the cluster.
I stumbled on a solution that worked. On the master node, simply restart the apiserver container:
$ sudo docker ps -a | grep apiserver
af99f816c7ec gcr.io/google_containers/kube-apiserver@sha256:53b987e5a2932bdaff88497081b488e3b56af5b6a14891895b08703129477d85 "/bin/sh -c '/usr/loc" 15 months ago Up 19 hours k8s_kube-apiserver_kube-apiserver-ip-xxxxxc_0
40f3a18050c3 gcr.io/google_containers/pause-amd64:3.0 "/pause" 15 months ago Up 15 months k8s_POD_kube-apiserver-ip-xxxc_0
$ sudo docker restart af99f816c7ec
af99f816c7ec
$
Then try applying the test-resources.yaml again:
$ kubectl apply -f test-resources.yaml
namespace/cert-manager-test unchanged
issuer.certmanager.k8s.io/test-selfsigned created
certificate.certmanager.k8s.io/selfsigned-cert created
If that does not work, this GitHub issue mentions that the master node might need firewall rules to be able to reach the cert-manager-webhook pod. The exact steps will depend on which cloud platform you are on; a hypothetical example for GCP is sketched below.
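A hypothetical GCP sketch only; the rule name, CIDR, node tag, and port are placeholders, and you should check the webhook Service's targetPort before picking the port:
# Allow the control plane to reach the webhook pods on the nodes
gcloud compute firewall-rules create allow-master-to-webhook \
  --source-ranges=<master-cidr> \
  --target-tags=<node-tag> \
  --allow=tcp:6443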