Single-Node Kubernetes Cluster Has Cluster-Wide 401 Unauthorized Errors in Microservices After CA Cert Rotation - kubernetes

What was done?
kubeadm init phase certs all
kubeadm init phase kubeconfig all
Daemon reloaded
Kubelet restarted
Calico CNI restarted
Now:
All worker nodes show Ready status
All deployments and pods show Running status
The application logs show errors:
akka.management.cluster.bootstrap.internal.BootstrapCoordinator -
Resolve attempt failed! Cause:
akka.discovery.kubernetes.KubernetesApiServiceDiscovery$KubernetesApiException:
Non-200 from Kubernetes API server: 401 Unauthorized
The kube-apiserver logs show:
Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: error in cryptographic primitive]
Could it be that the old certs and tokens are being cached by the services somewhere?
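One thing worth checking, as a sketch (it assumes a default kubeadm layout under /etc/kubernetes and control-plane components running as static pods): the apiserver and controller-manager only read their certificates and the service-account signing key at startup, and tokens signed with the old key stay invalid until they are reissued, so regenerating the files alone would not clear this error.
# Confirm the on-disk certs are really the freshly generated ones:
openssl x509 -noout -dates -in /etc/kubernetes/pki/apiserver.crt
# Bounce the control-plane static pods by moving their manifests out and back (the kubelet recreates them):
mkdir -p /tmp/manifests && mv /etc/kubernetes/manifests/*.yaml /tmp/manifests/
sleep 20 && mv /tmp/manifests/*.yaml /etc/kubernetes/manifests/
# Then restart the affected application pods so they pick up fresh service-account tokens
# (<namespace> and <deployment> are placeholders); on older clusters the old token secrets may also
# need to be deleted so the controller reissues them, as the etcd-restore answer further down describes:
kubectl -n <namespace> rollout restart deployment <deployment>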

Related

Cert-manager and Ingress pods in CrashLoopBackOff (AKS)

I was trying to upgrade the Kubernetes version of our cluster from 1.19.7 to 1.22, and some of the worker nodes failed to update, so I restarted the cluster. After the restart the upgrade was successful, but the cert-manager-webhook and cert-manager-cainjector pods went down along with the ingress pods, i.e. they are either in CrashLoopBackOff or Error state.
After checking the logs:
The cert-manager-webhook is throwing this error - "msg"="Failed to generate initial serving certificate, retrying..." "error"="failed verifying CA keypair: tls: failed to find any PEM data in certificate input" "interval"=1000000000
"msg"="Generating new ECDSA private key"
The cert-manager-cainjector is throwing this error- cert-manager/controller-runtime/manager "msg"="Failed to get API Group-Resources" "error"="an error on the server (\"\") has prevented the request from succeeding"
The nginx-ingress pod is throwing this error - SSL certificate chain completion is disabled (--enable-ssl-chain-completion=false)
Can anyone please help?
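A hedged diagnostic sketch (the cert-manager namespace, the cert-manager-webhook deployment name and the cert-manager-webhook-ca secret name are assumptions based on a default install): the "failed to find any PEM data in certificate input" error suggests the webhook's self-signed CA secret came back empty or corrupted after the restart, so inspecting it, and deleting it so the webhook can regenerate it, is a reasonable first step.
# Inspect the webhook CA secret; empty certificate/key fields would match the error above:
kubectl -n cert-manager get secret cert-manager-webhook-ca -o yaml
# If it is empty or garbled, delete it and restart the webhook so it recreates the keypair:
kubectl -n cert-manager delete secret cert-manager-webhook-ca
kubectl -n cert-manager rollout restart deployment cert-manager-webhook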

restore destroyed kubeadm master

I created a 1-master, 2-worker Kubernetes cluster using kubeadm 1.20 and backed up etcd. I destroyed the master on purpose to test how to get the cluster back to a running state.
Kubernetes version: 1.20
Installation method: kubeadm
Host OS: Windows 10 Pro
Guest OS: Ubuntu 18 on VirtualBox 6
CNI and version: Weave Net
CRI and version: Docker 19
I'm partially successful in that the secret I created before destroying the master is visible after the etcd restore, so that part seems to work.
HOWEVER, the CoreDNS pods are unauthorized to make requests to the API server, based on the logs of the coredns pods:
[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:25.892580 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
E1229 21:42:29.680620 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:39.492521 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
I'm guessing it has something to do with service account tokens, so there must be a step I'm missing to let pods authenticate to the API server after the etcd DB replacement.
What am I missing?
If you only backed up the contents of etcd, then kubeadm would have generated new certificates used for signing the ServiceAccount JWTs, and old tokens would no longer verify. As this is not generally done during routine maintenance, I don't think the SA controller knows to reissue the tokens. If you delete all the underlying secrets, it should do the reissue though.
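A minimal sketch of that secret-deletion step (it assumes kubectl access to the restored cluster; review the list before deleting anything cluster-wide):
# List every service-account token secret restored from the old backup:
kubectl get secrets -A --field-selector type=kubernetes.io/service-account-token
# Delete them namespace by namespace; the token controller recreates them signed with the current key:
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n "$ns" delete secret --field-selector type=kubernetes.io/service-account-token
done
# Restart pods that still mount the old tokens, e.g. the CoreDNS pods from the logs above:
kubectl -n kube-system rollout restart deployment coredns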

Worker node unable to join master node in Kubernetes

I have two network interfaces on my master node:
192.168.56.118
10.0.3.15
While doing kubeadm init on the master node, I got the following command to add workers:
kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
As you can see, it shows the IP 192.168.56.118 to connect from the worker.
But while executing the same command on the worker node, I'm getting the following error.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "192.168.56.118:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.118:6443"
[discovery] Requesting info from "https://192.168.56.118:6443" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://192.168.56.118:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate is valid for 10.96.0.1, 10.0.3.15, not 192.168.56.118]
I tried with the other IP, 10.0.3.15, but it returns a connection refused error, despite the fact that the firewall is disabled on the master.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 10.0.3.15:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[discovery] Trying to connect to API Server "10.0.3.15:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.3.15:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.3.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.3.15:6443: connect: connection refused]
How can I force the certificate to treat 192.168.56.118 as valid? Or is there any other way to resolve this issue?
You need to provide an extra API server certificate SAN (--apiserver-cert-extra-sans <ip_address>) and the API server advertise address (--apiserver-advertise-address) while initialising the cluster using kubeadm init. Your kubeadm init command will look like:
kubeadm init --apiserver-cert-extra-sans 192.168.56.118 --apiserver-advertise-address 192.168.56.118
Once you initialise the cluster with the above command, you will not face the certificate issue while joining the cluster.
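If the cluster is already up, a quick way to confirm which IPs and hostnames the serving certificate actually covers before re-initialising (the path assumes the default kubeadm layout):
openssl x509 -noout -text -in /etc/kubernetes/pki/apiserver.crt | grep -A1 'Subject Alternative Name'
# Per the join error above, 192.168.56.118 will be missing from this list until the certificate is
# regenerated with the extra SAN.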

Unable to see joined nodes in Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking on the master node using the command kubectl get nodes, I am only able to see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the questions:
docker, kubelet, kubeadm and kubectl installed fine;
kubectl get nodes cannot see the newly added node; of course kubectl get pods --all-namespaces has no results for this node either;
docker on the new node shows no trace of the kubeadm command (meaning no k8s images were pulled and no containers are running for it);
most importantly, kubelet is not running on the worker node.
Running kubelet outputs:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
The same as this issue said.
Tearing down and resetting the cluster (kubeadm reset) and redoing the join caused no problem in my case.
I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart the kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node
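A short verification once the worker rejoins, using the same commands the question already relies on:
kubectl get nodes -o wide                    # on the master; the worker should appear and reach Ready
kubectl get pods --all-namespaces -o wide    # the worker's kube-proxy and CNI pods should be Running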

Kubernetes HA joining worker node error - cluster CA found in cluster-info configmap is invalid: public key

I am trying to set up a 3-node multi-master Kubernetes v1.10.0 cluster using CentOS 7,
following the kubeadm documentation steps. I was able to set up the 3 masters, all of them in Ready status.
https://kubernetes.io/docs/setup/independent/high-availability/#kubeadm-init-master0
While joining a worker node, I am getting a certificate issue; the kubeadm join command gives this error message.
kubeadm join <masterip>:6443 --token v5ylg8.jgqn122kewvobaoo --discovery-token-ca-cert-hash sha256:b01832713190461cc96ca02e5c2b1e578473c6712b0965d2383aa9de9b41d4b6 --ignore-preflight-errors=cri
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.09.1-ce. Max validated version: 17.03
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
How can I find out the root cause of this certificate error message?
Thanks
SR
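One way to narrow this down, as a sketch (/etc/kubernetes/pki/ca.crt is the default kubeadm CA path, and the openssl pipeline below is the same one the kubeadm documentation uses to derive the discovery hash): recompute the CA public-key hash on each master and compare it with the two hashes in the error above.
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt \
  | openssl rsa -pubin -outform der 2>/dev/null \
  | openssl dgst -sha256 -hex | sed 's/^.* //'
# The join command pins b01832...9b41d4b6, but the cluster-info configmap serves a CA whose key hashes to
# 87d28c...ad9b6693. If the masters report different values here, they were initialised with separate CAs
# (i.e. the pki directory was not shared between them), which would explain the
# "public key ... not pinned" message.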