Kubernetes worker nodes not automatically being assigned podCidr on kubeadm join - kubernetes

I have a multi-master Kubernetes cluster set up, with one worker node. I set up the cluster with kubeadm. On kubeadm init, I passed the -pod-network-cidr=10.244.0.0/16 (using Flannel as the network overlay).
When using kubeadm join on the first worker node, everything worked properly. For some reason when trying to add more workers, none of the nodes are automatically assigned a podCidr.
I used this document to manually patch each worker node, using the
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}' command and things work fine.
But this is not ideal, I am wondering how I can fix my setup so that just adding the kubeadm join command will automatically assign the podCidr.
Any help would be greatly appreciated. Thanks!
Edit:
I1003 23:08:55.920623 1 main.go:475] Determining IP address of default interface
I1003 23:08:55.920896 1 main.go:488] Using interface with name eth0 and address
I1003 23:08:55.920915 1 main.go:505] Defaulting external address to interface address ()
I1003 23:08:55.941287 1 kube.go:131] Waiting 10m0s for node controller to sync
I1003 23:08:55.942785 1 kube.go:294] Starting kube subnet manager
I1003 23:08:56.943187 1 kube.go:138] Node controller sync successful
I1003 23:08:56.943212 1 main.go:235] Created subnet manager:
Kubernetes Subnet Manager - kubernetes-worker-06
I1003 23:08:56.943219 1 main.go:238] Installing signal handlers
I1003 23:08:56.943273 1 main.go:353] Found network config - Backend type: vxlan
I1003 23:08:56.943319 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1003 23:08:56.943497 1 main.go:280] Error registering network: failed to acquire lease: node "kube-worker-02" pod cidr not assigned
I1003 23:08:56.943513 1 main.go:333] Stopping shutdownHandler...

I was able to solve my issue. In my multi-master setup, on one of my master nodes, the kube-controller-manager.yaml (in /etc/kubernetes/manifest) file was missing the two following fields:
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
Once adding these fields to the yaml, I reset the kubelet service and everything was working great when trying to add a new worker node.
This was a mistake on my part, because when initializing one of my master nodes with kubeadm init, I must of forgot to pass the --pod-network-cidr. Oops.
Hope this helps someone out there!

If you only have a couple of worker nodes like I did rather than kubeadm revert and then passing in the kubeadm init --pod-network-cidr=10.244.0.0/16 you can do the following on each node and issue should disappear:
kubectl patch node node2-worker -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
where node2-worker is your node name and not the one shown.

I'm using kubernetes v1.16 with docker-ce v17.05. The thing is, I only have one master node, which is inited with --pod-network-cidr option.
The flannel pod on another worker node failed to syncing, according to kubelet log under /var/log/message. Checking this pod (with docker logs <container-id>), it turned out that node <NODE_NAME> pod cidr not assigned.
I fixed it by manually set the podCidr to the worker node, according to this doc
Although I've not yet figured out why this manually set-up is required, because as the the doc pointed out:
If kubeadm is being used then pass --pod-network-cidr=10.244.0.0/16 to kubeadm init which will ensure that all nodes are automatically assigned a podCIDR.

Related

Kubeadm join: Fails while creating HA cluster with multiple master nodes

I have 5 Vm in my GCP, out of which three are supposed to be master1, master2, master3 and other two are worker nodes (worker1 & worker 2). I have created a TCP Loadbalancer(LB) to enable load balancing for the master nodes. I have two sections in the LB:
i)frontend ii)backend
In the backend, i have defined all master ips there. And the frontend, i generated a static public ip and given port 6443 as LB port.
In master1, i sucessfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is the master1 internal ip & 10.244.0.0/16 is the network cidr for the kube-flannel.
The kubeadm init runs sucessfully and gives two kubeadm join commands, one to join a new control plane and other to join a new worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transfering the certificates from one control plane to another. I am doing it manually.
But when I run the above kubeadm join command to add a new control plane, on the one of my other master nodes,say master2, I am getting an error like following :
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused
I just came across the same issue - in my case the problem was that when hitting GCP's loadbalancer from an instance which is also a target of this loadbalancer, the request will always land on the same instance from where you sent the request.
Based on this:
you run kubeadm init on Node A using LB_IP. LB_IP gets resolved to Node A. Everything works as expected, as you are creating a new cluster.
you run kubeadm join on Node B using LB_IP. LB_IP gets resolved to Node B, while the master you just initialized is on Node A. Node B doesn't have anything running on port 6443, thus you get connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the loadbalancer targets. You can re-add it right after successful join.

How to convert a Kubernetes non-HA control plane into an HA control plane?

What is the best way to convert a kubernetes non-HA control plane into an HA control plane?
I have started with my cluster as a non-HA control plane - one master node and several worker nodes. The cluster is already running with a lots of services.
Now I would like to add additional master nodes to convert my cluster into a HA control plane. I have setup and configured a load balancer.
But I did not figure out how I can change the -control-plane-endpoint to my load balancer IP address for my existing master node.
Calling kubeadm results in the following error:
sudo kubeadm init --control-plane-endpoint "my-load-balancer:6443" --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The error message seems to be clear as my master is already running.
Is there a way how I can easily tell my existing master node to use the new load balancer to run as a HA control plane?
Best solution in my opinion
The best approach to convert a non-HA control plane to an HA control plane is to create a completely new HA control plane and after that to migrate all your applications there.
Possible solution
Below I will try to help you to achieve your goal but I do not recommend using this procedure on any cluster that will ever be considered as production cluster. It work for my scenario, it also might help you.
Update the kube-apiserver certificate
First of all, kube-apiserver uses a certificate to encrypt control plane traffic and this certificate have something known as SAN (Subject Alternative Name).
SAN is a list of IP addresses that you will use to access the API, so you need to add there IP address of your LoadBalancer and probably the hostname of your LB as well.
To do that, you have to get kubeadm configuration e.g. using command:
$ kubeadm config view > kubeadm-config.yaml
and then add certSANs to kubeadm-config.yaml config file under apiServer section, it should looks like below example: (you may also need to add controlPlaneEndpoint to point to your LB).
apiServer:
certSANs:
- "192.168.0.2" # your LB address
- "loadbalancer" # your LB hostname
extraArgs:
authorization-mode: Node,RBAC
...
controlPlaneEndpoint: "loadbalancer" # your LB DNS name or DNS CNAME
...
Now you can update kube-apiserver cert using:
BUT please remember you must first delete/move your old kube-apiserver cert and key from /etc/kubernetes/pki/ !
$ kubeadm init phase certs apiserver --config kubeadm-config.yaml.
Finally restart your kube-apiserver.
Update the kubelet, the scheduler and the controller manager kubeconfig files
Next step is to update the kubelet, scheduler and controller manager to communicate with the kube-apiserver using LoadBalancer.
All three of these components use standard kubeconfig files:
/etc/kubernetes/kubelet.conf, /etc/kubernetes/scheduler.conf, /etc/kubernetes/controller-manager.conf to communicate with kube-apiserver.
The only thing to do is to edit the server: line to point to LB instead of kube-apiserver directly and then restart these components.
The kubelet is systemd service so to restart it use:
systemctl restart kubelet
the controller manager and schedulers are deployed as pods.
Update the kube-proxy kubeconfig files
Next it is time to update kubeconfig file for kube-proxy and same as before - the only thing to do is to edit the server: line to point to LoadBalancer instead of kube-apiserver directly.
This kubeconfig is in fact a configmap, so you can edit it directly using:
$ kubectl edit cm kube-proxy -n kube-system
or first save it as manifest file:
$ kubectl get cm kube-proxy -n kube-system -o yaml > kube-proxy.yml
and then apply changes.
Don't forget to restart kube-proxy after these changes.
Update the kubeadm-config configmap
At the end upload new kubeadm-config configmap (with certSANs and controlPlaneEndpoint entries) to the cluster, it's especially important when you want to add new node to the cluster.
$ kubeadm config upload from-file --config kubeadm-config.yaml
if command above doesn't work, try this:
$ kubeadm upgrade apply --config kubeadm-config.yaml

Nginx Kubernetes POD stays in ContainerCreating

I was able to setup the Kubernetes Cluster on Centos7 with one master and two worker nodes, however when I try to deploy a pod with nginx, the state of the pod stays in ContainerRunning forever and doesn't seem to get out of it.
For pod network I am using the calico.
Can you please help me resolve this issue? for some reason I don't feel satisfied moving forward without resolving this issue, I tried to check forums etc, since the last two days and this is the last resort that I am reaching out to you.
[root#kube-master ~]# kubectl get pods --all-namespaces
[get pods result][1]
However when I run describe pods I see the below error for the nginx container under events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet,
kube-worker1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb"
network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to
set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect
to Cilium daemon: failed to create cilium agent client after 30.000000
seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config:
dial unix /var/run/cilium/cilium.sock: connect: no such file or
directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
Used the below command to initialize the kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.
your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these 2 have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16 and then edit calico.yaml before applying it as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
hope this helps :)
note: you don't really need to specify --apiserver-advertise-address flag: kubeadm will detect correctly the main IP of the machine most of the time.

adding master to Kubernetes cluster: cluster doesn't have a stable controlPlaneEndpoint address

How can I add a second master to the control plane of an existing Kubernetes 1.14 cluster?
The available documentation apparently assumes that both masters (in stacked control plane and etcd nodes) are created at the same time. I have created my first master already a while ago with kubeadm init --pod-network-cidr=10.244.0.0/16, so I don't have a kubeadm-config.yaml as referred to by this documentation.
I have tried the following instead:
kubeadm join ... --token ... --discovery-token-ca-cert-hash ... \
--experimental-control-plane --certificate-key ...
The part kubeadm join ... --token ... --discovery-token-ca-cert-hash ... is what is suggested when running kubeadm token create --print-join-command on the first master; it normally serves for adding another worker. --experimental-control-plane is for adding another master instead. The key in --certificate-key ... is as suggested by running kubeadm init phase upload-certs --experimental-upload-certs on the first master.
I receive the following errors:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver.
The recommended driver is "systemd". Please follow the guide at
https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable
controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
What does it mean for my cluster not to have a stable controlPlaneEndpoint address? Could this be related to controlPlaneEndpoint in the output from kubectl -n kube-system get configmap kubeadm-config -o yaml currently being an empty string? How can I overcome this situation?
As per HA - Create load balancer for kube-apiserver:
In a cloud environment you should place your control plane nodes behind a TCP forwarding load balancer. This load balancer distributes
traffic to all healthy control plane nodes in its target list. The
health check for an apiserver is a TCP check on the port the
kube-apiserver listens on (default value :6443).
The load balancer must be able to communicate with all control plane nodes on the apiserver port. It must also allow incoming traffic
on its listening port.
Make sure the address of the load balancer
always matches the address of kubeadm’s ControlPlaneEndpoint.
To set ControlPlaneEndpoint config, you should use kubeadm with the --config flag. Take a look here for a config file example:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
Kubeadm config files examples are scattered across many documentation sections. I recommend that you read the /apis/kubeadm/v1beta1 GoDoc, which have fully populated examples of YAML files used by multiple kubeadm configuration types.
If you are configuring a self-hosted control-plane, consider using the kubeadm alpha selfhosting feature:
[..] key components such as the API server, controller manager, and
scheduler run as DaemonSet pods configured via the Kubernetes API
instead of static pods configured in the kubelet via static files.
This PR (#59371) may clarify the differences of using a self-hosted config.
You need to copy the certificates ( etcd/api server/ca etc. ) from the existing master and place on the second master.
then run kubeadm init script. since the certs are already present the cert creation step is skipped and rest of the cluster initialization steps are resumed.

Can I label a worker node before joining Kubernetes master?

I'd like to prepare multiple yaml files customizing arguments of flannel (DaemonSet) and run the flannel pod of the node with yaml matching the condition expressed by the label. Can I label a worker node before joining Kubernetes master ?
You can specify --node-labels when you're kubelet is starting, which will apply the labels to the nodes; but ONLY during registration.
This will not work if your kubelet is starting up and the node is already a member of the cluster.
Kubelet Docs