Replacing dead master in Kubernetes 1.15 cluster with stacked control plane - kubernetes

I have a Kubernetes cluster with a 3-master stacked control plane, so each master also has its own etcd instance running locally. The problem I am trying to solve is this:
"If one master dies such that it cannot be restarted, how do I replace it?"
Currently, when I try to add the replacement master into the cluster, I get the following error while running kubeadm join:
[check-etcd] Checking that the etcd cluster is healthy
I0302 22:43:41.968068 9158 local.go:66] [etcd] Checking etcd cluster health
I0302 22:43:41.968089 9158 local.go:69] creating etcd client that connects to etcd pods
I0302 22:43:41.986715 9158 etcd.go:106] etcd endpoints read from pods: https://10.0.2.49:2379,https://10.0.225.90:2379,https://10.0.247.138:2379
error execution phase check-etcd: error syncing endpoints with etc: dial tcp 10.0.2.49:2379: connect: no route to host
The 10.0.2.49 node is the one that died. These nodes are all running in an AWS AutoScaling group, so I don't have control over the addresses.
I have drained and deleted the dead master node using kubectl drain and kubectl delete; and I have used etcdctl to make sure the dead node was not in the member list.
Why is it still trying to connect to that node's etcd?

It is still trying to connect to the member because etcd maintains a list of members in its store -- that's how it knows to vote on quorum decisions. I don't believe etcd is unique in that way -- most distributed key-value stores know their member list.
The fine manual shows how to remove a dead member. Note that the etcd FAQ recommends removing the unhealthy member first and only then adding its replacement, so that the quorum size does not grow while a member is down.
There is also a project, etcdadm, that is designed to smooth over some of the rough edges of etcd cluster management, but I haven't used it, so I can't say what it is and isn't good at.
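To illustrate the member-removal step, here is a minimal sketch that looks up the ID of a dead member in the output of etcdctl member list (v3 API, default "simple" format). The sample output, the member names and the helper name find_member_id are made up for illustration; only the IP addresses come from the question.

```python
from typing import Optional
from urllib.parse import urlparse

# Hypothetical `etcdctl member list` output (simple format):
# ID, status, name, peer URLs, client URLs
SAMPLE = """\
2f4b1c0d9a7e6b51, started, master-a, https://10.0.2.49:2380, https://10.0.2.49:2379
8e9e05c52164694d, started, master-b, https://10.0.225.90:2380, https://10.0.225.90:2379
c3697a4fd7a20dcd, started, master-c, https://10.0.247.138:2380, https://10.0.247.138:2379
"""


def find_member_id(member_list: str, dead_ip: str) -> Optional[str]:
    """Return the ID of the member whose peer URL host equals dead_ip, or None."""
    for line in member_list.splitlines():
        # Fields are separated by ", "; several URLs inside one field use a bare ",".
        fields = line.strip().split(", ")
        if len(fields) < 4:
            continue
        member_id, peer_urls = fields[0], fields[3]
        for url in peer_urls.split(","):
            # Compare the parsed host exactly so 10.0.2.4 does not match 10.0.2.49.
            if urlparse(url).hostname == dead_ip:
                return member_id
    return None


if __name__ == "__main__":
    print(find_member_id(SAMPLE, "10.0.2.49"))
```

The returned ID is what you would then pass to etcdctl member remove, run against one of the surviving members.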

The problem turned out to be that the failed node was still listed in the kubeadm-config ConfigMap. Further investigation led me to the following thread, which discusses the same problem:
https://github.com/kubernetes/kubeadm/issues/1300
The solution that worked for me was to edit the ConfigMap manually.
kubectl -n kube-system get cm kubeadm-config -o yaml > tmp-kubeadm-config.yaml
manually edit tmp-kubeadm-config.yaml to remove the old server
kubectl -n kube-system apply -f tmp-kubeadm-config.yaml
I believe updating the etcd member list is still necessary to ensure cluster stability, but it wasn't the full solution.
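For anyone who would rather script the manual edit, here is a rough sketch of it in Python using only the standard library. It assumes the ClusterStatus document inside kubeadm-config has an apiEndpoints mapping keyed by node name, with an unquoted advertiseAddress per entry. The sample text and the helper name remove_api_endpoint are hypothetical, so check the layout in your own ConfigMap before relying on it.

```python
# Hypothetical ClusterStatus content, as found under data.ClusterStatus in kubeadm-config.
SAMPLE_STATUS = """\
apiEndpoints:
  master-a:
    advertiseAddress: 10.0.2.49
    bindPort: 6443
  master-b:
    advertiseAddress: 10.0.225.90
    bindPort: 6443
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterStatus
"""


def _indent(line: str) -> int:
    return len(line) - len(line.lstrip())


def remove_api_endpoint(cluster_status: str, dead_ip: str) -> str:
    """Drop the apiEndpoints entry whose advertiseAddress equals dead_ip."""
    lines = cluster_status.splitlines()
    out = []
    in_endpoints = False
    base_indent = 0
    i = 0
    while i < len(lines):
        line = lines[i]
        if line.strip() == "apiEndpoints:":
            in_endpoints, base_indent = True, _indent(line)
            out.append(line)
            i += 1
            continue
        if in_endpoints and line.strip() and _indent(line) <= base_indent:
            in_endpoints = False  # left the apiEndpoints mapping
        if in_endpoints and line.strip():
            # `line` is a node-name key; collect the lines nested under it.
            j = i + 1
            while j < len(lines) and (not lines[j].strip() or _indent(lines[j]) > _indent(line)):
                j += 1
            block = lines[i:j]
            if not any(l.strip() == f"advertiseAddress: {dead_ip}" for l in block):
                out.extend(block)
            i = j
            continue
        out.append(line)
        i += 1
    return "\n".join(out) + "\n"


if __name__ == "__main__":
    print(remove_api_endpoint(SAMPLE_STATUS, "10.0.2.49"))
```

This only transforms the text; the result still has to be put back into the ConfigMap and applied with kubectl as described above.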

Related

Recovering Kubernetes cluster without certs

I have the following scenario in the lab and would like to see if it's possible to recover. The cluster is broken, which is expected, since I was testing how far I could go with breaking the cluster and still be able to recover.
Env:
Kubernetes 1.16.3
Kubespray
I was experimenting a bit and don't have any data on this cluster but I am still very curious if it's possible to recover. I have a healthy 3 node etcd cluster with the original configuration (all namespaces, workloads, configmaps etc). I don't have the original SSL certs for the control plane.
I removed all nodes from the cluster (kubeadm reset). I have the original manifests and kubelet config and am trying to re-init the master nodes. It has been more successful than I expected, but it is not where I want it to be.
After successful kubeadm init, the kubelet and control plane containers start successfully but the corresponding pods are not created. I am able to use the kube API with kubectl and see the nodes, namespaces, deployments, etc.
In the kube-system namespace all daemonsets still exist but the pods won't start with the following message:
49m Warning FailedCreate daemonset/kube-proxy Error creating: Timeout: request did not complete within requested timeout
The kubelet logs the following about the control plane pods:
Jul 21 22:30:02 k8s-master-4 kubelet[13791]: E0721 22:30:02.088787 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-scheduler-k8s-master-4_kube-system(3e128801ef687b022f6c8ae175c9c56d)": Timeout: request did not complete within requested timeout
Jul 21 22:30:53 k8s-master-4 kubelet[13791]: E0721 22:30:53.089517 13791 kubelet.go:1664] Failed creating a mirror pod for "kube-controller-manager-k8s-master-4_kube-system(da5cfae13814fa171a320ce0605de98f)": Timeout: request did not complete within requested timeout
During the kubeadm reset/init process I already took some steps to get to where I am now: deleting serviceaccounts to reset the tokens and deleting some configmaps (kubeadm etc.).
My question is: is it possible to recover the control plane without the certs? Even if it's a complicated process, I would still like to know.
All help appreciated
Henro
"Is it possible to recover the control plane without the certs?"
Yes, you should be able to. The certs 🔏 are required, but they don't have to be the very same ones that you initially created the cluster with. All the certificates, including the CA, can be rotated across the board. The kubelet even supports certificate auto-rotation. The configurations need to match everywhere, though: every cert must be signed by the CA that the components trust, and each key/cert pair must come from the same CSR. 🔑
Also, all the components need to use the same CA and be able to authenticate with the API server (kube-controller-manager, kube-scheduler, etc) 🔐. I'm not entirely sure about the logs that you are seeing but it looks like the kube-controller-manager and kube-scheduler are not able to authenticate and join the cluster. So I would take a look at their cert configurations:
/etc/kubernetes/kube-controller-manager.conf
/etc/kubernetes/kube-scheduler.conf
Also, you would find every PKI component that you need to verify under /etc/kubernetes/pki
✌️

Kubernetes V1.16.8 doesn't support 'node-role' label using "--node-labels=node-role.kubernetes.io/master="

I am upgrading a kube-aws v1.15.5 cluster to the next version, 1.16.8.
Use Case:
I want to keep the same node labels for Master and Worker nodes that I'm using in v1.15.
When I tried to upgrade the cluster to v1.16, I found that --node-labels is no longer allowed to set 'node-role' labels.
If I keep the node role as "node-role.kubernetes.io/master", the kubelet fails to start after the upgrade. If I remove the label, the kubectl get node output shows none as the role of the upgraded node.
How do I reproduce?
Before the upgrade I took a backup with 'cp /etc/sysconfig/kubelet /etc/sysconfig/kubelet-bkup' and removed "-role" from it. Once the upgrade completed, I moved the kubelet sysconfig back with 'mv /etc/sysconfig/kubelet-bkup /etc/sysconfig/kubelet', replacing the edited file. Now I am able to see the node role as Master/Worker even after a kubelet service restart.
The Problem I'm facing now?
The upgrade of the existing cluster succeeded, but the cluster runs in AWS on the kube-aws model, so the ASG spins up a new node whenever Cluster-Autoscaler triggers it.
The new node then fails to join the cluster, since the node label "node-role.kubernetes.io/master" still exists in the code base.
How can I add the node-role dynamically in the ASG scale-out process? Any solution would be appreciated.
Note:
(Kubeadm, kubelet, kubectl )- v1.16.8
I have sorted out the issue. I wrote Python code that watches the node events. Whenever the ASG spins up a new node, it joins the cluster with an empty role (""), and the Python code then adds the appropriate label to the node dynamically.
I have also built a Docker image based on that Python node-label script, and it runs as a pod. The pod is deployed into the cluster and does the job of labelling the new nodes.
See my solution on GitHub:
https://github.com/kubernetes/kubernetes/issues/91664
The Docker image is publicly available:
https://hub.docker.com/r/shaikjaffer/node-watcher
Thanks,
Jaffer
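For readers who want a feel for this approach without pulling the image, here is a minimal sketch of the labelling logic only. It is not the code from the linked issue. It assumes the node list is obtained as JSON (for example from kubectl get nodes -o json) and that the caller already knows which role a new node should get. The helper names and the sample data are hypothetical.

```python
from typing import List

ROLE_PREFIX = "node-role.kubernetes.io/"

# Hypothetical, heavily trimmed output of `kubectl get nodes -o json`.
SAMPLE_NODES = {
    "items": [
        {"metadata": {"name": "ip-10-0-0-1", "labels": {ROLE_PREFIX + "master": ""}}},
        {"metadata": {"name": "ip-10-0-0-2", "labels": {"kubernetes.io/os": "linux"}}},
    ]
}


def nodes_without_role(node_list: dict) -> List[str]:
    """Return names of nodes that carry no node-role.kubernetes.io/* label."""
    names = []
    for node in node_list.get("items", []):
        labels = node.get("metadata", {}).get("labels") or {}
        if not any(key.startswith(ROLE_PREFIX) for key in labels):
            names.append(node["metadata"]["name"])
    return names


def label_command(node_name: str, role: str) -> List[str]:
    """Build the kubectl invocation that adds the role label (empty value)."""
    return ["kubectl", "label", "node", node_name, f"{ROLE_PREFIX}{role}=", "--overwrite"]


if __name__ == "__main__":
    for name in nodes_without_role(SAMPLE_NODES):
        # Assumption: every unlabelled node here is a worker.
        print(" ".join(label_command(name, "worker")))
```

Because the label is applied through the API server after the node has joined, rather than by the kubelet itself, it is not subject to the --node-labels restriction described in the question.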

Failed to remove etcd after reset kubeadm

When I run kubeadm reset -f, it reports that the etcd member cannot be removed and that I must remove it manually.
failed to remove etcd member: error syncing endpoints with etc: etcdclient: no available endpoints. Please manually remove this etcd member using etcdctl
Is this a control-plane (master) node?
If not: simply running kubectl delete node <node_id> should suffice (see reference below). This will update etcd and take care of the rest of the cleanup. You'll still have to diagnose what caused the node to fail to reset in the first place if you're hoping to re-add it, but that's a separate problem. See, for example, this discussion on a related issue:
If the node is hard failed and you cannot call kubeadm reset on it, it requires manual steps. You'd have to:
1. Remove the control-plane IP from the kubeadm-config CM ClusterStatus
2. Remove the etcd member using etcdctl
3. Delete the Node object using kubectl (if you don't want the Node around anymore)
1 and 2 apply only to control-plane nodes.
Hope this helps — if you are dealing with a master node, I'd be happy to include examples of what commands to run.

Problem getting pods stats from kubelet and cri-o

We are running Kubernetes with the following configuration:
On-premise Kubernetes 1.11.3, cri-o 1.11.6 and CentOS7 with UEK-4.14.35
I can't get crictl stats to return pod information; it only returns an empty list. Has anyone run into the same problem?
Another issue we have is that when I query the kubelet for stats/summary, it returns an empty pods list.
I think that these two issues are related, although I am not sure which one of them is the problem.
I would recommend checking the kubelet service to verify its health status and to look for any suspicious events within the cluster. I assume that the CRI-O runtime relies on the kubelet as the main source of Pod information, since the kubelet manages the Pod lifecycle.
systemctl status kubelet -l
journalctl -u kubelet
If you find any errors or suspicious events, share them in a comment below this answer.
Alternatively, you can use metrics-server, which collects Pod metrics in the cluster; this requires enabling the kube-apiserver flags for the Aggregation Layer. Here is a good article about Horizontal Pod Autoscaling and monitoring resources via Prometheus.
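To compare what the kubelet reports before and after any fix, a small script can list the pods present in a stats/summary response. This is a minimal sketch, assuming the response has already been saved as JSON and follows the Summary API shape (a top-level "pods" list whose items carry a "podRef" with "name" and "namespace"). The sample document and the helper name pods_in_summary are made up.

```python
import json
from typing import List

# Hypothetical, trimmed stats/summary response.
SAMPLE_SUMMARY = json.dumps({
    "node": {"nodeName": "node-1"},
    "pods": [
        {"podRef": {"name": "kube-proxy-abcde", "namespace": "kube-system", "uid": "1"}},
        {"podRef": {"name": "web-0", "namespace": "default", "uid": "2"}},
    ],
})


def pods_in_summary(summary_json: str) -> List[str]:
    """Return 'namespace/name' for every pod listed in a stats/summary document."""
    summary = json.loads(summary_json)
    result = []
    for pod in summary.get("pods") or []:
        ref = pod.get("podRef", {})
        result.append(f"{ref.get('namespace', '?')}/{ref.get('name', '?')}")
    return result


if __name__ == "__main__":
    pods = pods_in_summary(SAMPLE_SUMMARY)
    print(pods if pods else "no pods reported: same symptom as in the question")
```

An empty result here corresponds to the empty pods list described in the question, so it gives a quick yes/no check while going through the kubelet logs.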

Kubernetes "Unable to register node" with cloud-provider=aws

I'm trying to run kubelet with --cloud-provider=aws flag but it fails with the following error:
kubelet_node_status.go:107] Unable to register node "ip-172-28-68-69.eu-west-1.compute.internal" with API server: nodes "ip-172-28-68-69.eu-west-1.compute.internal" is forbidden: node "k8s-master.my.fqdn" cannot modify node "ip-172-28-68-69.eu-west-1.compute.internal"
I already tried to set the --hostname-override flag to "k8s-master.my.fqdn", with no success.
(kubectl get nodes:
NAME          STATUS   ROLES    AGE   VERSION
k8s.my.fqdn   Ready    <none>   29m   v1.8.1)
How should I start the kubelet so that it registers successfully with AWS?
I solved my issue in this way:
Don't change the default Amazon hostname to your own, because the --hostname-override flag isn't working.
Initialize the node like this: kubeadm init --pod-network-cidr=10.233.0.0/16 --node-name=$(curl http://169.254.169.254/latest/meta-data/local-hostname) or simply use kubespray as a cluster management solution.
BTW, if you want to integrate with Amazon, it's better to leave the Amazon hostname as is. I found the same advice in the kubespray docs:
The next step is to make sure the hostnames in your inventory file are identical to your internal hostnames in AWS. This may look something like ip-111-222-333-444.us-west-2.compute.internal
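As a quick sanity check on that advice, the sketch below derives the default internal hostname AWS assigns for a private IPv4 address, so it can be compared with the node name the kubelet tries to register. It assumes the default IP-based naming and the default DHCP options (us-east-1 uses the ec2.internal domain, other regions use <region>.compute.internal). The helper name aws_private_dns_name is hypothetical.

```python
import ipaddress


def aws_private_dns_name(private_ip: str, region: str) -> str:
    """Return the default AWS private DNS hostname for an instance's private IPv4."""
    ipaddress.IPv4Address(private_ip)  # raises ValueError on a malformed address
    host = "ip-" + private_ip.replace(".", "-")
    if region == "us-east-1":
        # us-east-1 is the one region with a different default domain.
        return f"{host}.ec2.internal"
    return f"{host}.{region}.compute.internal"


if __name__ == "__main__":
    print(aws_private_dns_name("172.28.68.69", "eu-west-1"))
```

For the node in the question this yields ip-172-28-68-69.eu-west-1.compute.internal, the name that appears in the "Unable to register node" error, which is why registering under a custom FQDN was rejected.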