restore destroyed kubeadm master - kubernetes

I created a 1-master, 2-worker Kubernetes cluster using kubeadm 1.20 and backed up etcd. I then destroyed the master on purpose to test how to get the cluster back to a running state.
Kubernetes version: 1.20
Installation method: kubeadm
Host OS: Windows 10 Pro
Guest OS: Ubuntu 18 on VirtualBox 6
CNI and version: weave-net
CRI and version: Docker 19
I'm partially successful, in that a secret I created before destroying the master is visible after the etcd restore, so that part seems to work.
However, the coredns pods are unauthorized to make requests to the API server, based on their logs:
[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:25.892580 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Namespace: Unauthorized
E1229 21:42:29.680620 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Endpoints: Unauthorized
[INFO] plugin/ready: Still waiting on: "kubernetes"
E1229 21:42:39.492521 1 reflector.go:178] pkg/mod/k8s.io/client-go@v0.18.3/tools/cache/reflector.go:125: Failed to list *v1.Service: Unauthorized
I'm guessing it has something to do with service account tokens, and that there's a step I'm missing to authorize pods to authenticate to the API server after the etcd database replacement.
What am I missing?
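For context, the backup and restore were done roughly along these lines; this is only a sketch of the standard etcdctl snapshot flow, with the kubeadm default endpoints and certificate paths, not necessarily the exact commands used:
# Backup, taken on the original master:
ETCDCTL_API=3 etcdctl --endpoints=https://127.0.0.1:2379 \
  --cacert=/etc/kubernetes/pki/etcd/ca.crt \
  --cert=/etc/kubernetes/pki/etcd/server.crt \
  --key=/etc/kubernetes/pki/etcd/server.key \
  snapshot save /backup/etcd-snapshot.db
# Restore, run on the rebuilt master before pointing the etcd static pod at the new data dir:
ETCDCTL_API=3 etcdctl snapshot restore /backup/etcd-snapshot.db \
  --data-dir=/var/lib/etcd-restored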

If you only backed up the contents of etcd, then kubeadm will have generated new certificates used for signing the ServiceAccount JWTs, and the old tokens will no longer verify. As this is not something that is generally done during routine maintenance, I don't think the SA controller knows to reissue the tokens. If you delete all the underlying secrets, though, it should reissue them.
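A minimal sketch of that cleanup, assuming kubectl access via the restored admin kubeconfig (the coredns restart at the end is just to force the pods to remount fresh tokens):
# Delete every ServiceAccount token secret so the token controller reissues
# them, signed with the new key (try this on a test cluster first):
for ns in $(kubectl get ns -o jsonpath='{.items[*].metadata.name}'); do
  kubectl -n "$ns" delete secret --field-selector type=kubernetes.io/service-account-token
done
# Restart coredns (and any other affected workloads) so they pick up new tokens:
kubectl -n kube-system rollout restart deployment coredns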

Related

Single-Node Kubernetes Cluster Has Cluster-Wide 401 Unauthorized Error in Microservices After CA Cert Rotation

What was done?
kubeadm init phase certs all
kubeadm init phase kubeconfig all
Daemon reloaded
Kubelet restarted
Calico CNI restarted
Now:
All Worker Nodes show Ready State
All Deployments and pods show Running state
Application has errors in logs:
akka.management.cluster.bootstrap.internal.BootstrapCoordinator -
Resolve attempt failed! Cause:
akka.discovery.kubernetes.KubernetesApiServiceDiscovery$KubernetesApiException:
Non-200 from Kubernetes API server: 401 Unauthorized
Kube Apiserver has logs:
Unable to authenticate the request due to an error: [invalid bearer token, square/go-jose: error in cryptographic primitive]
Could it be the old certs and tokens being cached by the services somewhere?
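The "square/go-jose: error in cryptographic primitive" message means the API server could not verify the token's signature. One check worth doing, sketched here assuming the standard kubeadm layout, is that the signing and verifying flags both point at the regenerated sa.key/sa.pub pair:
# The API server verifies ServiceAccount JWTs with --service-account-key-file,
# and the controller-manager signs new ones with --service-account-private-key-file:
grep service-account-key-file /etc/kubernetes/manifests/kube-apiserver.yaml
grep service-account-private-key-file /etc/kubernetes/manifests/kube-controller-manager.yaml
Even when those match, tokens already mounted in running pods were signed with the old key, so deleting their ServiceAccount token secrets and restarting the pods is usually still required.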

CRD probe failing

I am installing the service catalog, which uses CRDs, and I have created them. Now I am running my controller deployment, and the image running in it executes a CRD list call to verify the CRDs are in place. This used to work fine previously, but now the CRD probe is failing with this error:
I1226 07:45:01.539118 1 round_trippers.go:438] GET https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue in 30000 milliseconds
I1226 07:45:01.539158 1 round_trippers.go:444] Response Headers:
Error: while waiting for ready Service Catalog CRDs: failed to list CustomResourceDefinition: Get https://169.72.128.1:443/apis/apiextensions.k8s.io/v1beta1/customresourcedefinitions?labelSelector=svcat%3Dtrue: dial tcp 169.72.128.1:443: i/o timeout
I have followed the same steps as before but cannot figure out what is wrong now.
Inside the controller code it is trying to make the following call:
list, err := r.client.ApiextensionsV1beta1().CustomResourceDefinitions().List(v1.ListOptions{LabelSelector: labels.SelectorFromSet(labels.Set{"svcat": "true"}).String()})
Which is failing.
Update 1: Installation works fine in the default namespace but fails in a specific namespace.
Environment info: on-prem k8s cluster, latest k8s, 2-node cluster.
It's not a port issue. Service accounts use port 443 to connect to the Kubernetes API server. Check whether there is a network policy blocking communication between your namespace and the kube-system namespace.
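As a quick sketch of that check (the namespace is a placeholder, and the curl image is just a convenient choice for a throwaway test pod):
# List any NetworkPolicies in the affected namespace:
kubectl get networkpolicy -n <your-namespace>
# Probe the API server service IP (169.72.128.1 from the error above) from inside that namespace:
kubectl -n <your-namespace> run curl-test --rm -it --restart=Never \
  --image=curlimages/curl -- curl -k -m 5 https://169.72.128.1:443/version
If the curl call also times out, the problem is network-level (a policy or the CNI), not RBAC.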

Unable to see joined nodes on Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
When checking on the master node using the command kubectl get nodes, I can only see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the questions in the comments:
docker, kubelet, kubeadm and kubectl are installed fine;
kubectl get nodes cannot see the newly added node; of course kubectl get pods --all-namespaces has no results for this node either;
docker on the new node shows nothing related to the kubeadm join (no k8s images pulled, no containers running for it);
most importantly, kubelet is not running on the worker node;
running kubelet gives this output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
This is the same as what this issue describes.
Tearing down and resetting the cluster (kubeadm reset) and redoing the join fixed it in my case.
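Concretely, the sequence on the worker node was along these lines (a sketch; the join parameters are the ones from the question above):
sudo kubeadm reset
sudo kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n \
  --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 \
  --ignore-preflight-errors=swap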
I had this problem, and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
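To confirm both sides agree after the change, the following two checks should report the same driver (a small sketch, assuming the paths used above):
docker info --format '{{.CgroupDriver}}'
grep -o 'cgroup-driver=[a-z]*' /etc/systemd/system/kubelet.service.d/10-kubeadm.conf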
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node

kubernetes HA joining worker node error - cluster CA found in cluster-info configmap is invalid: public key

I am trying to set up a 3-node multi-master Kubernetes v1.10.0 cluster using CentOS 7,
following the kubeadm documentation steps. I was able to set up the 3 masters, all of them in Ready status.
https://kubernetes.io/docs/setup/independent/high-availability/#kubeadm-init-master0
While joining a worker node, I am getting a certificate issue; the kubeadm join command gives this error message:
kubeadm join <masterip>:6443 --token v5ylg8.jgqn122kewvobaoo --discovery-token-ca-cert-hash sha256:b01832713190461cc96ca02e5c2b1e578473c6712b0965d2383aa9de9b41d4b6 --ignore-preflight-errors=cri
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.09.1-ce. Max validated version: 17.03
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
How do I find out the root cause of this certificate error message?
Thanks
SR
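One way to dig into this, sketched from the standard kubeadm discovery mechanism: recompute the CA public-key hash on the master you are joining through and compare it with the sha256 value passed to kubeadm join. In an HA setup the hashes only match if every master shares the same /etc/kubernetes/pki/ca.crt.
# Run on the master behind <masterip>:
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt |
  openssl rsa -pubin -outform der 2>/dev/null |
  openssl dgst -sha256 -hex | sed 's/^.* //'
If the printed hash differs from the one in the join command, the worker is talking to a master whose CA is not the one the discovery hash was generated from.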

"kubectl get nodes" shows NotReady always even after giving the appropriate IP

I am trying to set up a Kubernetes cluster for testing purposes with a master and one minion. When I run kubectl get nodes, it always says NotReady. Following is the configuration on the minion in /etc/kubernetes/kubelet:
KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_PORT="--port=10250"
KUBELET_HOSTNAME="--hostname-override=centos-minion"
KUBELET_API_SERVER="--api-servers=http://centos-master:8080"
KUBELET_ARGS=""
When the kubelet service is started, the following logs can be seen:
Mar 16 13:29:49 centos-minion kubelet: E0316 13:29:49.126595 53912 event.go:202] Unable to write event: 'Post http://centos-master:8080/api/v1/namespaces/default/events: dial tcp 10.143.219.12:8080: i/o timeout' (may retry after sleeping)
Mar 16 13:16:01 centos-minion kube-proxy: E0316 13:16:01.195731 53595 event.go:202] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/default/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
Following is the config on the master in /etc/kubernetes/apiserver:
KUBE_API_ADDRESS="--bind-address=0.0.0.0"
KUBE_API_PORT="--port=8080"
KUBELET_PORT="--kubelet-port=10250"
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
/etc/kubernetes/config
KUBE_ETCD_SERVERS="--etcd-servers=http://centos-master:2379"
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=0"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://centos-master:8080"
On the master, the following processes are running properly:
kube 5657 1 0 Mar15 ? 00:12:05 /usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://centos-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=false --service-cluster-ip-range=10.254.0.0/16
kube 5690 1 1 Mar15 ? 00:16:01 /usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://centos-master:8080
kube 5723 1 0 Mar15 ? 00:02:23 /usr/bin/kube-scheduler --logtostderr=true --v=0 --master=http://centos-master:8080
So I still do not know what is missing.
I was having the same issue when setting up Kubernetes on Fedora following the steps on kubernetes.io.
The tutorial comments out KUBELET_ARGS="--cgroup-driver=systemd" in the node's /etc/kubernetes/kubelet; if you uncomment it, you will see the node status become Ready.
Hope this helps.
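For illustration, the node's /etc/kubernetes/kubelet would then look roughly like this (values carried over from the question above; a sketch, not a verified config):
# /etc/kubernetes/kubelet on the minion, with the cgroup-driver argument uncommented:
KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_PORT="--port=10250"
KUBELET_HOSTNAME="--hostname-override=centos-minion"
KUBELET_API_SERVER="--api-servers=http://centos-master:8080"
KUBELET_ARGS="--cgroup-driver=systemd"
Restart kubelet afterwards (systemctl restart kubelet) and re-check kubectl get nodes.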
Rejoin the worker nodes to the master.
My install is on three physical machines: one master and two workers. All needed reboots.
You will need your join token, which you probably don't have. On the master:
sudo kubeadm token list
Copy the TOKEN field data; the output looks like this (no, that's not my real one):
TOKEN                     TTL   EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
ow3v08ddddgmgzfkdkdkd7    18h   2018-07-30T12:39:53-05:00   authentication,signing   The default bootstrap token generated by 'kubeadm init'.  system:bootstrappers:kubeadm:default-node-token
THEN join the cluster. The master node IP is the real IP address of your master machine:
sudo kubeadm join --token <YOUR TOKEN HASH> <MASTER_NODE_IP>:6443 --discovery-token-unsafe-skip-ca-verification
You have to restart the kubelet service on the node (systemctl enable kubelet && systemctl restart kubelet). Then you can see your node in "Ready" status.