Adding a master to a Kubernetes cluster: cluster doesn't have a stable controlPlaneEndpoint address

How can I add a second master to the control plane of an existing Kubernetes 1.14 cluster?
The available documentation apparently assumes that both masters (in stacked control plane and etcd nodes) are created at the same time. I created my first master a while ago with kubeadm init --pod-network-cidr=10.244.0.0/16, so I don't have a kubeadm-config.yaml as referred to by that documentation.
I have tried the following instead:
kubeadm join ... --token ... --discovery-token-ca-cert-hash ... \
--experimental-control-plane --certificate-key ...
The part kubeadm join ... --token ... --discovery-token-ca-cert-hash ... is what is suggested when running kubeadm token create --print-join-command on the first master; it normally serves for adding another worker. --experimental-control-plane is for adding another master instead. The key in --certificate-key ... is as suggested by running kubeadm init phase upload-certs --experimental-upload-certs on the first master.
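In full, the sequence I ran was roughly the following (all cluster-specific values elided, matching the placeholders above):
# on the existing (first) master:
kubeadm token create --print-join-command
kubeadm init phase upload-certs --experimental-upload-certs
# on the machine that should become the second master, combining the two outputs:
kubeadm join ... --token ... --discovery-token-ca-cert-hash ... \
    --experimental-control-plane --certificate-key ...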
I receive the following errors:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver.
The recommended driver is "systemd". Please follow the guide at
https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable
controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
What does it mean for my cluster not to have a stable controlPlaneEndpoint address? Could this be related to controlPlaneEndpoint in the output from kubectl -n kube-system get configmap kubeadm-config -o yaml currently being an empty string? How can I overcome this situation?

As per HA - Create load balancer for kube-apiserver:
In a cloud environment you should place your control plane nodes behind a TCP forwarding load balancer. This load balancer distributes traffic to all healthy control plane nodes in its target list. The health check for an apiserver is a TCP check on the port the kube-apiserver listens on (default value :6443).
The load balancer must be able to communicate with all control plane nodes on the apiserver port. It must also allow incoming traffic on its listening port.
Make sure the address of the load balancer always matches the address of kubeadm’s ControlPlaneEndpoint.
To set the controlPlaneEndpoint configuration, you should use kubeadm with the --config flag. Take a look at this config file example:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
Kubeadm config file examples are scattered across many documentation sections. I recommend that you read the /apis/kubeadm/v1beta1 GoDoc, which has fully populated examples of the YAML files used by the various kubeadm configuration types.
If you are configuring a self-hosted control-plane, consider using the kubeadm alpha selfhosting feature:
[..] key components such as the API server, controller manager, and
scheduler run as DaemonSet pods configured via the Kubernetes API
instead of static pods configured in the kubelet via static files.
This PR (#59371) may clarify the differences of using a self-hosted config.

You need to copy the certificates (etcd, API server, CA, etc.) from the existing master and place them on the second master.
Then run the kubeadm init script. Since the certs are already present, the certificate creation step is skipped and the rest of the cluster initialization steps resume.
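A rough sketch of the copy step, following the certificate list from the official kubeadm HA guide (the hostname new-master and the login user are placeholders):
# run as root on the existing master
USER=ubuntu             # adjust to your login user
NEW_MASTER=new-master   # placeholder hostname of the second master
scp /etc/kubernetes/pki/ca.crt /etc/kubernetes/pki/ca.key "$USER"@"$NEW_MASTER":
scp /etc/kubernetes/pki/sa.key /etc/kubernetes/pki/sa.pub "$USER"@"$NEW_MASTER":
scp /etc/kubernetes/pki/front-proxy-ca.crt /etc/kubernetes/pki/front-proxy-ca.key "$USER"@"$NEW_MASTER":
scp /etc/kubernetes/pki/etcd/ca.crt "$USER"@"$NEW_MASTER":etcd-ca.crt
scp /etc/kubernetes/pki/etcd/ca.key "$USER"@"$NEW_MASTER":etcd-ca.key
# on the new master, move the files back into /etc/kubernetes/pki/ (and pki/etcd/) before running kubeadm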

Related

Kubeadm join: Fails while creating HA cluster with multiple master nodes

I have 5 VMs in GCP, of which three are supposed to be master1, master2 and master3, and the other two are worker nodes (worker1 & worker2). I have created a TCP load balancer (LB) to enable load balancing for the master nodes. The LB has two sections:
i) frontend ii) backend
In the backend, I have defined all the master IPs. For the frontend, I generated a static public IP and set port 6443 as the LB port.
On master1, I successfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is master1's internal IP and 10.244.0.0/16 is the network CIDR for kube-flannel.
kubeadm init runs successfully and prints two kubeadm join commands, one to join a new control plane and the other to join a new worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transferring the certificates from one control plane to another. I am doing it manually.
But when I run the above kubeadm join command to add a new control plane on one of my other master nodes, say master2, I get an error like the following:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused
I just came across the same issue. In my case the problem was that when hitting GCP's load balancer from an instance which is also a target of that load balancer, the request always lands back on the same instance from which it was sent.
Based on this:
you run kubeadm init on Node A using LB_IP. LB_IP gets resolved to Node A. Everything works as expected, as you are creating a new cluster.
you run kubeadm join on Node B using LB_IP. LB_IP gets resolved to Node B, while the master you just initialized is on Node A. Node B doesn't have anything running on port 6443 yet, so you get the connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the load balancer targets. You can re-add it right after a successful join.
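If the backend happens to be an unmanaged instance group, the temporary removal can look roughly like this (the group, zone and instance names are placeholders):
# take the joining node out of the LB backend for the duration of the join
gcloud compute instance-groups unmanaged remove-instances master-group \
    --zone us-central1-a --instances master2
# ... run kubeadm join --control-plane on master2 ...
gcloud compute instance-groups unmanaged add-instances master-group \
    --zone us-central1-a --instances master2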

How to convert a Kubernetes non-HA control plane into an HA control plane?

What is the best way to convert a kubernetes non-HA control plane into an HA control plane?
I started my cluster as a non-HA control plane: one master node and several worker nodes. The cluster is already running a lot of services.
Now I would like to add additional master nodes to convert my cluster into an HA control plane. I have set up and configured a load balancer.
But I have not figured out how I can change the --control-plane-endpoint to my load balancer IP address for my existing master node.
Calling kubeadm results in the following error:
sudo kubeadm init --control-plane-endpoint "my-load-balancer:6443" --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The error message seems to be clear as my master is already running.
Is there a way to easily tell my existing master node to use the new load balancer and run as an HA control plane?
Best solution in my opinion
The best approach to convert a non-HA control plane to an HA control plane is to create a completely new HA control plane and after that to migrate all your applications there.
Possible solution
Below I will try to help you achieve your goal, but I do not recommend using this procedure on any cluster that will ever be considered a production cluster. It worked in my scenario, and it might help you as well.
Update the kube-apiserver certificate
First of all, kube-apiserver uses a certificate to encrypt control plane traffic, and this certificate has something known as a SAN (Subject Alternative Name).
The SAN is a list of IP addresses and hostnames that can be used to access the API, so you need to add the IP address of your LoadBalancer there, and probably the hostname of your LB as well.
To do that, you first have to get the kubeadm configuration, e.g. using the command:
$ kubeadm config view > kubeadm-config.yaml
and then add certSANs to the kubeadm-config.yaml file under the apiServer section. It should look like the example below (you may also need to add controlPlaneEndpoint to point to your LB):
apiServer:
  certSANs:
  - "192.168.0.2"  # your LB address
  - "loadbalancer" # your LB hostname
  extraArgs:
    authorization-mode: Node,RBAC
  ...
controlPlaneEndpoint: "loadbalancer" # your LB DNS name or DNS CNAME
...
Now you can update the kube-apiserver cert, BUT please remember you must first delete/move your old kube-apiserver cert and key from /etc/kubernetes/pki/:
$ kubeadm init phase certs apiserver --config kubeadm-config.yaml
Finally, restart your kube-apiserver.
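Putting those steps together, a minimal sketch (assuming the config was saved as kubeadm-config.yaml and the control plane runs as kubeadm static pods):
# move the old cert and key out of the way
sudo mv /etc/kubernetes/pki/apiserver.crt /etc/kubernetes/pki/apiserver.crt.bak
sudo mv /etc/kubernetes/pki/apiserver.key /etc/kubernetes/pki/apiserver.key.bak
# regenerate the cert with the new SANs
sudo kubeadm init phase certs apiserver --config kubeadm-config.yaml
# restart kube-apiserver by briefly removing its static pod manifest
sudo mv /etc/kubernetes/manifests/kube-apiserver.yaml /tmp/
sleep 20
sudo mv /tmp/kube-apiserver.yaml /etc/kubernetes/manifests/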
Update the kubelet, the scheduler and the controller manager kubeconfig files
The next step is to update the kubelet, scheduler and controller manager so that they communicate with the kube-apiserver through the LoadBalancer.
All three of these components use standard kubeconfig files:
/etc/kubernetes/kubelet.conf, /etc/kubernetes/scheduler.conf, /etc/kubernetes/controller-manager.conf to communicate with kube-apiserver.
The only thing to do is to edit the server: line in each file to point to the LB instead of the kube-apiserver directly, and then restart these components.
The kubelet is a systemd service, so restart it with:
systemctl restart kubelet
The controller manager and scheduler run as static pods, so restart them by recreating their pods (see the sketch below).
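A rough sketch of those edits and restarts (the old apiserver IP and the loadbalancer name are placeholders):
# point the three kubeconfigs at the load balancer instead of the local apiserver
for f in kubelet.conf scheduler.conf controller-manager.conf; do
    sudo sed -i 's#server: https://<OLD_MASTER_IP>:6443#server: https://loadbalancer:6443#' /etc/kubernetes/$f
done
sudo systemctl restart kubelet
# recreate the scheduler and controller manager static pods
sudo mv /etc/kubernetes/manifests/kube-scheduler.yaml /etc/kubernetes/manifests/kube-controller-manager.yaml /tmp/
sleep 20
sudo mv /tmp/kube-scheduler.yaml /tmp/kube-controller-manager.yaml /etc/kubernetes/manifests/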
Update the kube-proxy kubeconfig file
Next it is time to update the kubeconfig file for kube-proxy, and same as before, the only thing to do is to edit the server: line to point to the LoadBalancer instead of the kube-apiserver directly.
This kubeconfig is in fact stored in a ConfigMap, so you can edit it directly using:
$ kubectl edit cm kube-proxy -n kube-system
or first save it as manifest file:
$ kubectl get cm kube-proxy -n kube-system -o yaml > kube-proxy.yml
and then apply the changes.
Don't forget to restart kube-proxy after these changes.
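For example, after editing the ConfigMap, one way to recreate the kube-proxy pods is:
$ kubectl rollout restart daemonset kube-proxy -n kube-system
# or simply: kubectl delete pod -n kube-system -l k8s-app=kube-proxy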
Update the kubeadm-config configmap
At the end, upload the new kubeadm-config ConfigMap (with the certSANs and controlPlaneEndpoint entries) to the cluster. This is especially important when you want to add a new node to the cluster.
$ kubeadm config upload from-file --config kubeadm-config.yaml
If the command above doesn't work, try this:
$ kubeadm upgrade apply --config kubeadm-config.yaml
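Either way, you can verify that the new controlPlaneEndpoint and certSANs entries actually landed in the cluster configuration with:
$ kubectl -n kube-system get cm kubeadm-config -o yaml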

Something seems to be catching TCP traffic to pods

I'm trying to deploy Kubernetes with Calico (IPIP) using kubeadm. After the deployment is done, I'm deploying Calico using these manifests:
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
Before applying them, I'm editing CALICO_IPV4POOL_CIDR and setting it to 10.250.0.0/17, as well as running kubeadm init --pod-network-cidr 10.250.0.0/17.
After a few seconds the CoreDNS pods (getting, for example, the address 10.250.2.2) start restarting with the error 10.250.2.2:8080 connection refused.
Now a bit of digging:
from any node in the cluster, ping 10.250.2.2 works and reaches the pod (tcpdump in the pod's network namespace shows it).
from a different pod (on a different node), curl 10.250.2.2:8080 works fine
from any node, curl 10.250.2.2:8080 fails with connection refused
Because it's a CoreDNS pod, it listens on port 53 on both UDP and TCP, so I've tried netcat from the nodes:
nc 10.250.2.2 53 - connection refused
nc -u 10.250.2.2 53 - works
Now I've run tcpdump on each interface on the source node for port 8080, and the curl to the CoreDNS pod doesn't even seem to leave the node... so, iptables?
I've also tried Weave, Canal and Flannel; all seem to have the same issue.
I've run out of ideas by now... any pointers, please?
This seems to be a problem with the Calico installation; CoreDNS Pods are sensitive to whether the CNI network Pods are functioning correctly.
For a proper CNI network plugin installation you have to pass the --pod-network-cidr flag to the kubeadm init command and afterwards apply the same value to the CALICO_IPV4POOL_CIDR parameter inside calico.yaml, as sketched below.
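With the CIDR from the question, keeping the two values in sync would look roughly like this:
kubeadm init --pod-network-cidr=10.250.0.0/17
# and in calico.yaml, before applying it:
#   - name: CALICO_IPV4POOL_CIDR
#     value: "10.250.0.0/17"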
Moreover, for a successful Pod network installation you have to apply some RBAC rules to grant sufficient permissions in compliance with the general cluster security restrictions, as described in the official Kubernetes documentation:
For Calico to work correctly, you need to pass
--pod-network-cidr=192.168.0.0/16 to kubeadm init or update the calico.yml file to match your Pod network. Note that Calico works on
amd64 only.
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml
kubectl apply -f https://docs.projectcalico.org/v3.3/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
In your case I would switch to the latest Calico version rather than the v3.3 given in the example.
If the Pod network plugin installation looks correct on your side, please update the question with your current environment setup and Kubernetes component versions along with their health statuses.

Kubernetes: Trying to add second master node to K8S master using stacked control plane instructions

I was following instructions at https://kubernetes.io/docs/setup/independent/high-availability/#stacked-control-plane-and-etcd-nodes and I can't get the secondary master node to join the primary master.
$> kubeadm join LB_IP:6443 --token TOKEN --discovery-token-ca-cert-hash sha256:HASH --experimental-control-plane
[preflight] running pre-flight checks
[discovery] Trying to connect to API Server "LB_IP:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://LB_IP:6443"
[discovery] Requesting info from "https://LB_IP:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "LB_IP:6443"
[discovery] Successfully established connection with API Server "LB_IP:6443"
[join] Reading configuration from the cluster...
[join] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance on a cluster that doesn't use an external etcd
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The cluster uses an external etcd.
* The certificates that must be shared among control plane instances are provided.
Here is my admin init config:
apiVersion: kubeadm.k8s.io/v1alpha3
kind: ClusterConfiguration
kubernetesVersion: "1.12.3"
apiServer:
  certSANs:
  - "LB_IP"
controlPlaneEndpoint: "LB_IP:6443"
networking:
  podSubnet: "192.168.128.0/17"
  serviceSubnet: "192.168.0.0/17"
And I initialized the primary master node like:
kubeadm init --config=./kube-adm-config.yaml
I have also copied all the certs to the secondary node and kubectl works on the secondary:
[root@secondary ~]# kubectl get nodes
NAME           STATUS   ROLES    AGE   VERSION
primary.fqdn   Ready    master   8h    v1.12.3
I was really hoping to not set up external etcd nodes. The instructions seem pretty straightforward and I don't understand what I am missing.
Any advice to help get this stacked control plane multi-master setup with local etcd to work would be appreciated. Or any debugging ideas. Or at least "stacked control plane doesn't work, you must use external etcd".
Upgrading to k8s version 1.13.0 resolved my issue. I think the instructions were specifically for this newer version.

Kubernetes worker nodes not automatically being assigned podCidr on kubeadm join

I have a multi-master Kubernetes cluster set up, with one worker node. I set up the cluster with kubeadm. On kubeadm init, I passed --pod-network-cidr=10.244.0.0/16 (using Flannel as the network overlay).
When using kubeadm join on the first worker node, everything worked properly. For some reason, when trying to add more workers, none of the new nodes is automatically assigned a podCIDR.
I used this document to manually patch each worker node, using the
kubectl patch node <NODE_NAME> -p '{"spec":{"podCIDR":"<SUBNET>"}}' command and things work fine.
But this is not ideal; I am wondering how I can fix my setup so that simply running the kubeadm join command will automatically assign the podCIDR.
Any help would be greatly appreciated. Thanks!
Edit: here is the log from the flannel pod on one of the failing workers:
I1003 23:08:55.920623 1 main.go:475] Determining IP address of default interface
I1003 23:08:55.920896 1 main.go:488] Using interface with name eth0 and address
I1003 23:08:55.920915 1 main.go:505] Defaulting external address to interface address ()
I1003 23:08:55.941287 1 kube.go:131] Waiting 10m0s for node controller to sync
I1003 23:08:55.942785 1 kube.go:294] Starting kube subnet manager
I1003 23:08:56.943187 1 kube.go:138] Node controller sync successful
I1003 23:08:56.943212 1 main.go:235] Created subnet manager:
Kubernetes Subnet Manager - kubernetes-worker-06
I1003 23:08:56.943219 1 main.go:238] Installing signal handlers
I1003 23:08:56.943273 1 main.go:353] Found network config - Backend type: vxlan
I1003 23:08:56.943319 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false
E1003 23:08:56.943497 1 main.go:280] Error registering network: failed to acquire lease: node "kube-worker-02" pod cidr not assigned
I1003 23:08:56.943513 1 main.go:333] Stopping shutdownHandler...
I was able to solve my issue. In my multi-master setup, on one of my master nodes, the kube-controller-manager.yaml file (in /etc/kubernetes/manifests) was missing the following two flags (see the excerpt below):
--allocate-node-cidrs=true
--cluster-cidr=10.244.0.0/16
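In the static pod manifest these end up as extra flags on the kube-controller-manager command, roughly like this (illustrative excerpt, not a complete manifest):
# excerpt from /etc/kubernetes/manifests/kube-controller-manager.yaml
spec:
  containers:
  - command:
    - kube-controller-manager
    - --allocate-node-cidrs=true
    - --cluster-cidr=10.244.0.0/16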
Once I added these flags to the yaml, I restarted the kubelet service and everything worked great when adding a new worker node.
This was a mistake on my part: when initializing one of my master nodes with kubeadm init, I must have forgotten to pass --pod-network-cidr. Oops.
Hope this helps someone out there!
If you only have a couple of worker nodes like I did, then rather than running kubeadm reset and then kubeadm init --pod-network-cidr=10.244.0.0/16 again, you can do the following on each node and the issue should disappear:
kubectl patch node node2-worker -p '{"spec":{"podCIDR":"10.244.0.0/16"}}'
where node2-worker is your own node name, not the one shown here.
I'm using Kubernetes v1.16 with docker-ce v17.05. The thing is, I only have one master node, which was initialized with the --pod-network-cidr option.
The flannel pod on another worker node failed to sync, according to the kubelet log under /var/log/messages. Checking this pod (with docker logs <container-id>), it turned out the error was node <NODE_NAME> pod cidr not assigned.
I fixed it by manually setting the podCIDR on the worker node, according to this doc.
Although I have not yet figured out why this manual setup is required, because as the doc points out:
If kubeadm is being used then pass --pod-network-cidr=10.244.0.0/16 to kubeadm init which will ensure that all nodes are automatically assigned a podCIDR.