Kubeadm join: Fails while creating HA cluster with multiple master nodes - kubernetes

I have 5 VMs in GCP, of which three are supposed to be master1, master2, and master3, and the other two are worker nodes (worker1 & worker2). I have created a TCP Load Balancer (LB) to enable load balancing for the master nodes. The LB has two sections:
i) frontend ii) backend
In the backend I have defined all the master IPs. In the frontend I generated a static public IP and set 6443 as the LB port.
On master1, I successfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is master1's internal IP and 10.244.0.0/16 is the pod network CIDR for kube-flannel.
kubeadm init runs successfully and prints two kubeadm join commands, one to join a new control plane and the other to join a new worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transferring the certificates from one control plane to another. I am doing it manually.
But when I run the above kubeadm join command to add a new control plane on one of my other master nodes, say master2, I get an error like the following:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused

I just came across the same issue - in my case the problem was that when hitting GCP's load balancer from an instance which is also a target of that load balancer, the request always lands back on the same instance from which it was sent.
Based on this:
you run kubeadm init on Node A using LB_IP. LB_IP gets resolved to Node A. Everything works as expected, as you are creating a new cluster.
you run kubeadm join on Node B using LB_IP. LB_IP gets resolved to Node B, while the master you just initialized is on Node A. Node B doesn't have anything running on port 6443, thus you get connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the loadbalancer targets. You can re-add it right after successful join.
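For reference, a minimal sketch of that workaround with the gcloud CLI, assuming the LB backend is an unmanaged instance group named k8s-masters in zone us-central1-a and the joining node is master2 (all placeholder names for your own setup):
# temporarily take the joining node out of the LB backend
gcloud compute instance-groups unmanaged remove-instances k8s-masters \
    --instances=master2 --zone=us-central1-a
# run the kubeadm join ... --control-plane command on master2, then re-add it
gcloud compute instance-groups unmanaged add-instances k8s-masters \
    --instances=master2 --zone=us-central1-a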

Related

Kubeadm join a new master node fails because of "Initial timeout of 40s passed"

I have a master node and it works fine. When I run kubectl get nodes it shows the master node.
Now I want to add a new master node with the following command:
kubeadm join 45.82.137.112:8443 --token 61vi23.i1qy9k2hvqc9k8ib --discovery-token-ca-cert-hash sha256:40617af1ebd8893c1df42f2d26c5f18e05be91b4e2c9b69adbeab1edff7a51ab --control-plane --certificate-key 4aafd2369fa85eb2feeacd69a7d1cfe683771181e3ee781ce806905b74705fe8
where 45.82.137.112 is my HAProxy IP; I copied this command after creating the first master node.
After running this command I get the following error:
[etcd] Announced new etcd member joining to the existing etcd cluster
[etcd] Creating static Pod manifest for "etcd"
[etcd] Waiting for the new etcd member to join the cluster. This can take up to 40s
[kubelet-check] Initial timeout of 40s passed.
My first master node also goes down and fails. Everything on master1 is fine until I use the join command for another master node.
I've solved the problem; my kubeadm version is 1.20.1.
This is my join command:
kubeadm join 192.168.43.122:6444 \
  --token 689yfz.w60ihod0js5zcina \
  --discovery-token-ca-cert-hash sha256:532de9882f2b417515203dff99203d7d7f3dd00a88eb2e8f6cbf5ec998827537 \
  --control-plane \
  --certificate-key 8792f355dc22227029a091895adf9f84be6eea9e8e65f0da4ad510843e54fbcf \
  --apiserver-advertise-address 192.168.43.123
I just added the flag
--apiserver-advertise-address 192.168.43.123

node joined successfully to master node, but got error when kubectl get nodes "The connection to the server localhost:8080 was refused"

I'm using two virtual machines running CentOS 8.
master-node:
kubeadm init
node-1:
kubeadm join
node-1 joined successfully, and the output said to run "kubectl get nodes".
But running kubectl get nodes returns "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
I checked my config using kubectl config view and got this result:
apiVersion: v1
clusters: null
contexts: null
current-context: ""
kind: Config
preferences: {}
users: null
I ran ls /etc/kubernetes/ and it shows only kubelet.conf.
From what I see, you are trying to use kubectl on a worker node after a successful kubeadm join.
kubeadm init generates admin credentials/config files that are used to connect to the cluster, and you were expecting kubeadm join to create similar credentials so you can run kubectl commands from the worker node. The kubeadm join command does not place any admin credentials on worker nodes (where applications run; for security reasons).
If you want them there you need to copy them from master to worker manually (or create new ones).
Based on what you describe, once kubeadm init is completed the master node is initialized and its components are set up.
Running kubeadm join on the worker node joins that node to the existing master.
After this step, if you're running kubectl get nodes on the master and encountering the above-mentioned issue, it's because the cluster config is missing for kubectl.
The default config is /etc/kubernetes/admin.conf, which can be set via the KUBECONFIG environment variable.
Or, the simplest way is to copy this file into the .kube folder:
cp -f /etc/kubernetes/admin.conf ~/.kube/config
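For completeness, the full sequence that kubeadm init itself prints for a regular (non-root) user is roughly the following (a minimal sketch, run on the machine where you want to use kubectl):
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config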

kubernetes - Couldn't join master node - error execution phase preflight: couldn't validate the identity of the API Server

I am a novice to k8s, so this might be a very simple issue for someone with expertise in k8s.
I am working with two nodes
master - 2cpu, 2 GB memory
worker - 1 cpu, 1 GB memory
OS - ubuntu - hashicorp/bionic64
I set up the master node successfully and I can see it is up and running:
vagrant@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 29m v1.18.2
Here is the token which I generated:
vagrant@master:~$ kubeadm token create --print-join-command
W0419 13:45:52.513532 16403 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
Issue - But when I try to join it from the worker node I get:
vagrant@worker:~$ sudo kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
W0419 13:46:17.651819 15987 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server: Get https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher
Here are the ports which are occupied
10.0.2.15:2379
10.0.2.15:2380
10.0.2.15:68
Note: I am using the CNI from -
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Here are the mistakes I realized I made during my Kubernetes installation -
(For detailed installation steps follow - Steps for Installation)
But here are the key mistakes I made -
Mistake 1 - Since I was working on VMs, I had multiple ethernet adapters on both VMs (master as well as worker). By default the CNI always takes eth0, but in our case it should be eth1:
1: lo: <LOOPBACK,UP,LOWER_UP>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:bb:14:75 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:fb:48:77 brd ff:ff:ff:ff:ff:ff
inet 100.0.0.1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP>
Mistake 2 - I was initializing kubeadm without --apiserver-advertise-address and --pod-network-cidr.
So here is the kubeadm command I used -
[vagrant@master ~]$ sudo kubeadm init --apiserver-advertise-address=100.0.0.1 --pod-network-cidr=10.244.0.0/16
Mistake 3 - Since we have multiple ethernet adapters in our VMs, I couldn't find a way to set extra args to switch from eth0 to eth1 in the calico.yaml configuration.
So I used the Flannel CNI instead:
[vagrant@master ~]$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
and in the args section added --iface=eth1:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth1
And it worked after that
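For completeness, after editing the downloaded manifest the change is applied in the usual way (a minimal sketch; the file name matches the wget above):
kubectl apply -f kube-flannel.yml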
It worked for me using this --apiserver-advertise-address:
sudo kubeadm init --apiserver-advertise-address=172.16.28.10 --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
I used Calico and it worked for me.
On the member node, run the join:
kubeadm join 172.16.28.10:6443 --token 2jm9hd.o2gulx4x1b8l1t5d \
  --discovery-token-ca-cert-hash sha256:b8b679e86c4d228bfa486086f18dcac4760e5871e8dd023ec166acfd93781595
I ran into the same problem while setting up the Kubernetes master and worker nodes. I got the same error while adding the worker nodes to the master node. I just stopped firewalld, tried adding the nodes again, and IT WORKED. Hope this helps someone.
Thanks
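A less drastic alternative to disabling firewalld is to open just the ports kubeadm needs; a minimal sketch following the standard kubeadm port requirements (adjust to your setup):
# on the master (control plane)
sudo firewall-cmd --permanent --add-port=6443/tcp       # kube-apiserver
sudo firewall-cmd --permanent --add-port=2379-2380/tcp  # etcd
sudo firewall-cmd --permanent --add-port=10250/tcp      # kubelet API
sudo firewall-cmd --reload
# on the workers
sudo firewall-cmd --permanent --add-port=10250/tcp
sudo firewall-cmd --permanent --add-port=30000-32767/tcp  # NodePort services
sudo firewall-cmd --reload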
When joining worker (slave) nodes to the master node, I received the error below:
====================================================================
TASK [Joining worker nodes with kubernetes master] ***********************************************
fatal: [ip-0-0-0-0.ec2.internal]: FAILED! => {"changed": true, "cmd": "grep -i -A2 'kubeadm join' join_token|bash", "delta": "0:05:07.728008", "end": "2022-01-15 11:41:28.615600", "msg": "non-zero return code", "rc": 1, "start": "2022-01-15 11:36:20.887592", "stderr": "error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks", "stdout_lines": ["[preflight] Running pre-flight checks"]}
===============================================================================
If you observe the error carefully, you will see the IP address and port number as below:
https://139.41.82.79:6443
Solution:
Please follow the steps below to resolve this:
Go to your security groups.
Add an entry for port 6443 in your security group for the custom source 0.0.0.0/0, and save the details.
Rerun the playbook to join the worker nodes to the master.
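The same change can be made from the AWS CLI; a minimal sketch, with sg-0123456789abcdef0 as a placeholder for your security group ID:
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp --port 6443 \
    --cidr 0.0.0.0/0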
Run the command 'kubectl config view' or 'kubectl cluster-info' to check the IP address of the Kubernetes control plane. In my case it is 10.0.0.2.
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.0.0.2:6443
Or
$ kubectl cluster-info
Kubernetes control plane is running at https://10.0.0.2:6443
Tried to telnet the Kubernetes control plane.
telnet 10.0.0.2 6443
Trying 10.0.0.2...
Press Control + C on your keyboard to terminate the telnet command.
Go to your Firewall Rules and add port 6443 and make sure to allow all instances in the network.
Then try to telnet the Kubernetes control plane once again and you should be able to connect now:
$ telnet 10.0.0.2 6443
Trying 10.0.0.2...
Connected to 10.0.0.2.
Escape character is '^]'.
Try to join the worker nodes now. You can run the command 'kubeadm token create --print-join-command' to create a new token in case you forgot to save the old one.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s Ready control-plane 57m v1.25.0
wk8s-node-0 Ready 36m v1.25.0
wk8s-node-1 Ready 35m v1.25.0
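If these nodes run on GCP (the "Firewall Rules" wording above suggests so), the same rule can be created from the CLI; a minimal sketch with placeholder names and ranges:
gcloud compute firewall-rules create allow-k8s-apiserver \
    --network=default --direction=INGRESS \
    --allow=tcp:6443 --source-ranges=10.0.0.0/8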
I ran into a similar issue; the problem was that my node VM's timezone was different. I corrected the time on the node and it worked!
Hope it may help someone.
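A quick way to check and correct this, assuming systemd-based nodes with NTP available (a minimal sketch):
timedatectl status                # compare the clock and sync state on each node
sudo timedatectl set-timezone UTC # make all nodes agree on a timezone
sudo timedatectl set-ntp true     # enable NTP synchronization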
Simple fix: expose port 6443 in the Security Group of my AWS EC2 instance.
If the master is on Ubuntu 21.04 64-bit (VM), then opening up the firewall for the port used for joining the cluster, using the command sudo ufw allow 6443, will help.
In addition to the good examples given for this issue by @Rahul Wagh - I'll just add the fact that I got this error because I ran export KUBECONFIG against the wrong kubeconfig file, which was configured with an api-server endpoint that wasn't reachable.
First, check if you can connect from the worker node to the master with telnet:
telnet Master_IP 6443
If you can, then on the master run kubeadm init with --apiserver-advertise-address=Master_IP and --pod-network-cidr=10.244.0.0/16.
N.B.: 10.244.0.0/16 is the pod network CIDR used with Vagrant most of the time. Save the join command.
Then create the pod network using a CNI (Flannel/Calico/Weave Net).
Now, from the worker node, run the join command you saved (here the address will usually be 10.0.2.15):
kubeadm join 10.0.2.15:6443 --token 7gm48u.6ffny379c1mw3hpu \
  --discovery-token-ca-cert-hash sha256:936daab57e3302ed7b70f665af3d041736e265d19a16abc710fa0efbf318b5bf

adding master to Kubernetes cluster: cluster doesn't have a stable controlPlaneEndpoint address

How can I add a second master to the control plane of an existing Kubernetes 1.14 cluster?
The available documentation apparently assumes that both masters (in stacked control plane and etcd nodes) are created at the same time. I have created my first master already a while ago with kubeadm init --pod-network-cidr=10.244.0.0/16, so I don't have a kubeadm-config.yaml as referred to by this documentation.
I have tried the following instead:
kubeadm join ... --token ... --discovery-token-ca-cert-hash ... \
--experimental-control-plane --certificate-key ...
The part kubeadm join ... --token ... --discovery-token-ca-cert-hash ... is what is suggested when running kubeadm token create --print-join-command on the first master; it normally serves for adding another worker. --experimental-control-plane is for adding another master instead. The key in --certificate-key ... is as suggested by running kubeadm init phase upload-certs --experimental-upload-certs on the first master.
I receive the following errors:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver.
The recommended driver is "systemd". Please follow the guide at
https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable
controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
What does it mean for my cluster not to have a stable controlPlaneEndpoint address? Could this be related to controlPlaneEndpoint in the output from kubectl -n kube-system get configmap kubeadm-config -o yaml currently being an empty string? How can I overcome this situation?
As per HA - Create load balancer for kube-apiserver:
In a cloud environment you should place your control plane nodes behind a TCP forwarding load balancer. This load balancer distributes traffic to all healthy control plane nodes in its target list. The health check for an apiserver is a TCP check on the port the kube-apiserver listens on (default value :6443).
The load balancer must be able to communicate with all control plane nodes on the apiserver port. It must also allow incoming traffic on its listening port.
Make sure the address of the load balancer always matches the address of kubeadm's ControlPlaneEndpoint.
To set ControlPlaneEndpoint config, you should use kubeadm with the --config flag. Take a look here for a config file example:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
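For illustration, if you save that snippet as kubeadm-config.yaml (a placeholder name), a fresh cluster with a stable endpoint would be initialized along these lines; on kubeadm 1.14 the certificate-upload flag is still called --experimental-upload-certs (renamed --upload-certs in later releases):
kubeadm init --config kubeadm-config.yaml --experimental-upload-certs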
Kubeadm config file examples are scattered across many documentation sections. I recommend that you read the /apis/kubeadm/v1beta1 GoDoc, which has fully populated examples of the YAML files used by multiple kubeadm configuration types.
If you are configuring a self-hosted control-plane, consider using the kubeadm alpha selfhosting feature:
[..] key components such as the API server, controller manager, and scheduler run as DaemonSet pods configured via the Kubernetes API instead of static pods configured in the kubelet via static files.
This PR (#59371) may clarify the differences of using a self-hosted config.
You need to copy the certificates (etcd/API server/CA, etc.) from the existing master and place them on the second master.
Then run the kubeadm init script; since the certs are already present, the cert creation step is skipped and the rest of the cluster initialization steps resume.
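For reference, these are the files kubeadm expects to be shared among control-plane nodes (per the kubeadm HA documentation), all under /etc/kubernetes/pki on the first master. A minimal sketch of copying them manually, assuming root SSH access and a new control-plane node called master2 (a placeholder):
# run on master1; make sure the target directories exist first
ssh root@master2 "mkdir -p /etc/kubernetes/pki/etcd"
scp /etc/kubernetes/pki/ca.{crt,key} root@master2:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/sa.{key,pub} root@master2:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/front-proxy-ca.{crt,key} root@master2:/etc/kubernetes/pki/
scp /etc/kubernetes/pki/etcd/ca.{crt,key} root@master2:/etc/kubernetes/pki/etcd/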

Kubernetes worker nodes not automatically being assigned podCidr on kubeadm join

I have a multi-master Kubernetes cluster set up, with one worker node. I set up the cluster with kubeadm. On kubeadm init, I passed --pod-network-cidr=10.244.0.0/16 (using Flannel as the network overlay).
When using kubeadm join on the first worker node, everything worked properly. For some reason when trying to add more workers, none of the nodes are automatically assigned a podCidr.
I used this document to manually patch each worker node, using the
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}' command and things work fine.
But this is not ideal; I am wondering how I can fix my setup so that simply running kubeadm join will automatically assign the podCIDR.
Any help would be greatly appreciated. Thanks!
Edit:
I1003 23:08:55.920623 1 main.go:475] Determining IP address of default interface
I1003 23:08:55.920896 1 main.go:488] Using interface with name eth0 and address
I1003 23:08:55.920915 1 main.go:505] Defaulting external address to interface address ()
I1003 23:08:55.941287 1 kube.go:131] Waiting 10m0s for node controller to sync
I1003 23:08:55.942785 1 kube.go:294] Starting kube subnet manager
I1003 23:08:56.943187 1 kube.go:138] Node controller sync successful
I1003 23:08:56.943212 1 main.go:235] Created subnet manager:
Kubernetes Subnet Manager - kubernetes-worker-06
I1003 23:08:56.943219 1 main.go:238] Installing signal handlers
I1003 23:08:56.943273 1 main.go:353] Found network config - Backend type: vxlan
I1003 23:08:56.943319 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1003 23:08:56.943497 1 main.go:280] Error registering network: failed to acquire lease: node "kube-worker-02" pod cidr not assigned
I1003 23:08:56.943513 1 main.go:333] Stopping shutdownHandler...
I was able to solve my issue. In my multi-master setup, on one of my master nodes, the kube-controller-manager.yaml file (in /etc/kubernetes/manifests) was missing the two following fields:
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
Once I added these fields to the YAML, I restarted the kubelet service and everything worked great when trying to add a new worker node.
This was a mistake on my part, because when initializing one of my master nodes with kubeadm init, I must have forgotten to pass --pod-network-cidr. Oops.
Hope this helps someone out there!
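For reference, here is roughly where those flags sit in the static pod manifest, /etc/kubernetes/manifests/kube-controller-manager.yaml (a sketch of only the relevant part; the surrounding flags will differ per cluster):
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16
    # ...existing flags remain unchanged...
The kubelet watches the static pod manifest directory, so the controller-manager restarts on its own after the edit.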
If you only have a couple of worker nodes like I did, then rather than running kubeadm reset and re-initializing with kubeadm init --pod-network-cidr=10.244.0.0/16, you can do the following on each node and the issue should disappear:
kubectl patch node node2-worker -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
where node2-worker is your node name and not the one shown.
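If several workers need patching, a small loop can assign each one its own subnet; a hedged sketch, assuming the node names below are placeholders and that 10.244.0.0/16 is your cluster CIDR (with Flannel each node normally gets a distinct /24 from that range, and podCIDR can only be set on nodes that do not yet have one):
i=1
for node in node2-worker node3-worker node4-worker; do
  kubectl patch node "$node" -p "{\"spec\":{\"podCIDR\":\"10.244.${i}.0/24\"}}"
  i=$((i+1))
done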
I'm using Kubernetes v1.16 with docker-ce v17.05. The thing is, I only have one master node, which was initialized with the --pod-network-cidr option.
The flannel pod on another worker node failed to sync, according to the kubelet log under /var/log/messages. Checking this pod (with docker logs <container-id>), it turned out that node "<NODE_NAME>" pod cidr not assigned.
I fixed it by manually setting the podCIDR on the worker node, according to this doc.
Although I've not yet figured out why this manual setup is required, because as the doc pointed out:
If kubeadm is being used then pass --pod-network-cidr=10.244.0.0/16 to kubeadm init which will ensure that all nodes are automatically assigned a podCIDR.