Kubeadm join a new master node fails because of "Initial timeout of 40s passed"

I have a master node and it works fine; when I run kubectl get nodes it shows the master node. Now I want to add a new master node with the following command:
kubeadm join 45.82.137.112:8443 --token 61vi23.i1qy9k2hvqc9k8ib --discovery-token-ca-cert-hash sha256:40617af1ebd8893c1df42f2d26c5f18e05be91b4e2c9b69adbeab1edff7a51ab --control-plane --certificate-key 4aafd2369fa85eb2feeacd69a7d1cfe683771181e3ee781ce806905b74705fe8
where 45.82.137.112 is my HAProxy IP; I copied this command from the output after creating the first master node.
After running this command I get the following error:
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
After that, my first master node also goes down and fails. Everything on master1 is fine until I run the join command for another master node.
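For context, an HAProxy front end for this kind of setup would listen on 8443 and pass TCP through to each apiserver on 6443; a minimal sketch (the backend name and master IPs are assumptions for illustration, not values from the question):

frontend kube-apiserver
    bind *:8443
    mode tcp
    default_backend kube-masters
backend kube-masters
    mode tcp
    balance roundrobin
    server master1 10.0.0.11:6443 check
    server master2 10.0.0.12:6443 check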

I've solved the problem; my kubeadm version is 1.20.1.
This is my join command:
kubeadm join 192.168.43.122:6444 \
    --token 689yfz.w60ihod0js5zcina \
    --discovery-token-ca-cert-hash sha256:532de9882f2b417515203dff99203d7d7f3dd00a88eb2e8f6cbf5ec998827537 \
    --control-plane \
    --certificate-key 8792f355dc22227029a091895adf9f84be6eea9e8e65f0da4ad510843e54fbcf \
    --apiserver-advertise-address 192.168.43.123
I just added the flag:
--apiserver-advertise-address 192.168.43.123
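Without --apiserver-advertise-address, kubeadm advertises the interface of the machine's default route, which on multi-homed machines may not be the address the other masters can reach. To see which addresses the control-plane components actually advertise after a join, the static pod manifests on the new master can be checked; a quick sketch (paths are the kubeadm defaults):

grep -- --advertise-address /etc/kubernetes/manifests/kube-apiserver.yaml
grep -- --initial-advertise-peer-urls /etc/kubernetes/manifests/etcd.yaml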


Is it possible to promote a Kubernetes worker node to master?

Is it possible to promote a Kubernetes worker node to master to quickly recover from the loss of a master (1 of 3) and restore safety to the cluster? Preferably without disrupting all the pods already running on it. Bare metal deployment. Tx.
It doesn't look like a worker node can be promoted to master in general. However, it is easy to sort out for a specific case:
Control plane node disappears from the network
Node is manually drained and deleted: k drain node2.example.com --ignore-daemonsets --delete-local-data && k delete node node2.example.com
Some time later it reboots and rejoins the cluster
Check that it has rejoined the etcd cluster:
# k exec -it etcd-node1.example.com -n kube-system -- /bin/sh
# etcdctl --endpoints 127.0.0.1:2379 --cacert /etc/kubernetes/pki/etcd/ca.crt \
--cert /etc/kubernetes/pki/etcd/server.crt --key /etc/kubernetes/pki/etcd/server.key \
member list
506801cdae97607b, started, node1.example.com, https://65.21.128.36:2380, https://xxx:2379, false
8051adea81dc4c6a, started, node2.example.com, https://95.217.56.177:2380, https://xxx:2379, false
ccd32aaf544c8ef9, started, node3.example.com, https://65.21.121.254:2380, https://xxx:2379, false
If it is part of the cluster then re-label it:
k label node node2.example.com node-role.kubernetes.io/control-plane=
k label node node2.example.com node-role.kubernetes.io/master=
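To confirm the node shows up as a control-plane member again after re-labelling, something like this should list the expected roles (same k alias for kubectl as above):

k get nodes
k get node node2.example.com --show-labels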

Kubeadm join: Fails while creating HA cluster with multiple master nodes

I have 5 VMs in GCP, of which three are supposed to be master1, master2, master3 and the other two are worker nodes (worker1 & worker2). I have created a TCP load balancer (LB) to enable load balancing for the master nodes. The LB has two sections:
i) frontend ii) backend
In the backend I have defined all the master IPs, and in the frontend I assigned a static public IP and set 6443 as the LB port.
On master1, I successfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is master1's internal IP and 10.244.0.0/16 is the pod network CIDR for kube-flannel.
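As a side note, flannel's stock manifest assumes exactly this 10.244.0.0/16 pod CIDR, so after init it can be applied unchanged; using the pinned manifest URL that appears elsewhere on this page:

kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml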
kubeadm init runs successfully and prints two kubeadm join commands: one to join a new control plane node and the other to join a worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transferring the certificates from one control plane to another; I am doing it manually.
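For reference, the manual route typically means copying this set of files from the first control plane into the same paths on the joining node before running kubeadm join (paths assume the default kubeadm layout):

/etc/kubernetes/pki/ca.crt
/etc/kubernetes/pki/ca.key
/etc/kubernetes/pki/sa.key
/etc/kubernetes/pki/sa.pub
/etc/kubernetes/pki/front-proxy-ca.crt
/etc/kubernetes/pki/front-proxy-ca.key
/etc/kubernetes/pki/etcd/ca.crt
/etc/kubernetes/pki/etcd/ca.key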
But when I run the above kubeadm join command to add a new control plane on one of my other master nodes, say master2, I get an error like the following:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused
I just came across the same issue - in my case the problem was that when hitting GCP's load balancer from an instance which is also a target of that load balancer, the request always lands back on the instance it was sent from.
Based on this:
You run kubeadm init on Node A using LB_IP. LB_IP resolves to Node A. Everything works as expected, since you are creating a new cluster.
You run kubeadm join on Node B using LB_IP. LB_IP resolves to Node B, while the master you just initialized is on Node A. Node B has nothing listening on port 6443, so you get a connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the load balancer targets; you can re-add it right after a successful join.
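If the LB is backed by a GCP target pool, that remove/re-add dance can be scripted; a sketch where the pool name, instance name and zone are placeholders, not values from the question:

gcloud compute target-pools remove-instances k8s-master-pool \
    --instances=master2 --instances-zone=us-central1-a
# run kubeadm join on master2 here, then re-add it:
gcloud compute target-pools add-instances k8s-master-pool \
    --instances=master2 --instances-zone=us-central1-a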

Node joined the master successfully, but kubectl get nodes fails with "The connection to the server localhost:8080 was refused"

I'm using two virtual machines running CentOS 8:
master-node:
kubeadm init
node-1:
kubeadm join
node-1 joined successfully, and the join output said to run "kubectl get nodes" to see the result.
But running kubectl get nodes gives "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
I checked my config with kubectl config view and got this result:
apiVersion: v1
clusters: null
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null
I ran ls /etc/kubernetes/ and it shows only kubelet.conf.
From what I see, you are trying to use kubectl on the worker node after a successful kubeadm join.
kubeadm init generates the admin credentials/config files that are used to connect to the cluster, and you were expecting kubeadm join to create similar credentials so you can run kubectl commands from the worker node. The kubeadm join command does not place any admin credentials on worker nodes (where applications run) for security reasons.
If you want them there, you need to copy them from the master to the worker manually (or create new ones).
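A sketch of that manual copy, assuming root SSH access from the worker to the master (the hostname is a placeholder):

mkdir -p ~/.kube
scp root@master-node:/etc/kubernetes/admin.conf ~/.kube/config
kubectl get nodes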
Based on the writing, once kubeadm init completes, the master node is initialized and its components are set up.
Running kubeadm join on the worker node joins that node to the previous master.
After this step, if you run kubectl get nodes and encounter the above-mentioned issue, it is because the cluster config for kubectl is missing.
The default config is /etc/kubernetes/admin.conf, which can be set via the environment variable KUBECONFIG.
Or the simplest way would be to copy this file into the .kube folder:
cp -f /etc/kubernetes/admin.conf ~/.kube/config
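Either route works; the environment-variable form mentioned above, or the copy form that kubeadm init itself prints at the end of its output for non-root users:

export KUBECONFIG=/etc/kubernetes/admin.conf
# or:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config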

Unable to run pods on new node

I had to replace a node (server) with a new one, keeping the same node name. What I did was:
master> kubectl delete no srv1 (removing old node)
srv1> kubeadm join... (joining new node)
After the new node joined the cluster, no pods could be created, failing with:
Warning FailedCreatePodSandBox 16s kubelet, srv1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b85728b51a18533e9d57f6a1b1808dbb5ad72bff4d516217de04e7dad4ce358d" network for pod "dpl-6f56777485-6jzm6": NetworkPlugin cni failed to set up pod "dpl-6f56777485-6jzm6_default" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.16.1/24
Ideally, when performing a task such as replacing a node, the steps below should be considered:
Drain the node: kubectl drain NODE_NAME
Reset the node: kubeadm reset on the old node (an optional step, if the old node is still accessible)
Finally, delete it: kubectl delete node NODE_NAME
Things to consider when replacing an old node with a new one:
The new node should have the same name as the old node, i.e. echo $HOSTNAME should return the same value.
The new node should have the same IP as the old one.
This is because these form the node's identity.
Finally, in a scenario where you have already performed kubectl delete node ... and replaced the node with a new one:
curl -LO https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
kubectl delete -f kube-flannel.yml
[perform the following on the nodes that are having problems]
sudo ip link del cni0
sudo ip link del flannel.1
sudo systemctl restart network
[re-apply network plugin]
kubectl apply -f kube-flannel.yml
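After re-applying flannel, the bridge should come back with an address inside the node's assigned podCIDR; a quick way to verify (srv1 is the node name from the question):

kubectl get node srv1 -o jsonpath='{.spec.podCIDR}'
ip -4 addr show cni0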

Pods are restarted automatically on a node newly added to an existing kubeadm cluster

I recently added a node to an existing kubeadm cluster using
kubeadm join --token (TOKEN) (MASTER IP):6443
with
--discovery-token-ca-cert-hash.
The node attached successfully and is listed in kubectl get nodes.
Now pods are assigned to the node, but those pods are restarted automatically and it seems they cannot communicate with pods on other nodes either.
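A reasonable first diagnostic for these symptoms is to check whether the CNI pods on the new node are healthy and why the pods are being restarted; a minimal sketch, where <NEW_NODE> and <POD> are placeholders:

kubectl get pods -n kube-system -o wide | grep <NEW_NODE>
kubectl describe pod <POD>
kubectl logs <POD> --previous

Cross-node pod traffic also requires the overlay ports (e.g. UDP 8472 for flannel VXLAN) to be open between the nodes, which is worth ruling out.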