How to change name of a kubernetes node - kubernetes

I have a running node in a kubernetes cluster. Is there a way I can change its name?
I have tried to
delete the node using kubectl delete
change the name in the node's manifest
add the node back.
But the node won't start.
Anyone know how it should be done?
Thanks

Usualy it's kubelet that is responsible for registering the node under particular name, so you should make changes to your nodes kubelet configuration and then it should pop up as new node.

Changing the node's name is not possible at the moment, it requires you to remove and rejoin the node.
You need to make sure the hostname is changed to the new name, remove the node, reset it and rejoin it.
(you will notice that with the command : kubectl edit node , you will get an error if you try and save the name:
A copy of your changes has been stored to "/tmp/kubectl-edit-qlh54.yaml"
error: At least one of apiVersion, kind and name was changed
)
Ideally you have removed the running pods on it.
You can try to run kubectl drain <node_name_to_rename> . Proceed at your own risk if that doesn't complete . --ignore-daemon-sets can be used to ignore possible issues for pods that cannot be evicted.
In short, for a node that has been renamed and is out of the cluster on CentOS 7:
kubectl delete node <original-nodename>
Then on the node that you want to rejoin, as root:
kubeadm reset
check the output and see if it applies to your setup (for potential further cleanup).
Now generate the join command on the master node:
export KUBECONFIG=/etc/kubernetes/admin.conf #(or wherever you have it)
kubeadm token create --print-join-command
Run the output on the worker node you have just reset:
kubeadm join <masternode_ip_address>:6443 --token somegeneratedtoken --discovery-token-ca-cert-hash sha256:somesha256hashthatyougotfromtheabovecommand
If you run kubectl get nodes it should show up now with the new name
output in my case:
W0220 10:43:23.286109 11473 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Enjoy your renamed node!
Based on source: https://www.youtube.com/watch?v=TqoA9HwFLVU

Related

How to convert a Kubernetes non-HA control plane into an HA control plane?

What is the best way to convert a kubernetes non-HA control plane into an HA control plane?
I have started with my cluster as a non-HA control plane - one master node and several worker nodes. The cluster is already running with a lots of services.
Now I would like to add additional master nodes to convert my cluster into a HA control plane. I have setup and configured a load balancer.
But I did not figure out how I can change the -control-plane-endpoint to my load balancer IP address for my existing master node.
Calling kubeadm results in the following error:
sudo kubeadm init --control-plane-endpoint "my-load-balancer:6443" --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The error message seems to be clear as my master is already running.
Is there a way how I can easily tell my existing master node to use the new load balancer to run as a HA control plane?
Best solution in my opinion
The best approach to convert a non-HA control plane to an HA control plane is to create a completely new HA control plane and after that to migrate all your applications there.
Possible solution
Below I will try to help you to achieve your goal but I do not recommend using this procedure on any cluster that will ever be considered as production cluster. It work for my scenario, it also might help you.
Update the kube-apiserver certificate
First of all, kube-apiserver uses a certificate to encrypt control plane traffic and this certificate have something known as SAN (Subject Alternative Name).
SAN is a list of IP addresses that you will use to access the API, so you need to add there IP address of your LoadBalancer and probably the hostname of your LB as well.
To do that, you have to get kubeadm configuration e.g. using command:
$ kubeadm config view > kubeadm-config.yaml
and then add certSANs to kubeadm-config.yaml config file under apiServer section, it should looks like below example: (you may also need to add controlPlaneEndpoint to point to your LB).
apiServer:
certSANs:
- "192.168.0.2" # your LB address
- "loadbalancer" # your LB hostname
extraArgs:
authorization-mode: Node,RBAC
...
controlPlaneEndpoint: "loadbalancer" # your LB DNS name or DNS CNAME
...
Now you can update kube-apiserver cert using:
BUT please remember you must first delete/move your old kube-apiserver cert and key from /etc/kubernetes/pki/ !
$ kubeadm init phase certs apiserver --config kubeadm-config.yaml.
Finally restart your kube-apiserver.
Update the kubelet, the scheduler and the controller manager kubeconfig files
Next step is to update the kubelet, scheduler and controller manager to communicate with the kube-apiserver using LoadBalancer.
All three of these components use standard kubeconfig files:
/etc/kubernetes/kubelet.conf, /etc/kubernetes/scheduler.conf, /etc/kubernetes/controller-manager.conf to communicate with kube-apiserver.
The only thing to do is to edit the server: line to point to LB instead of kube-apiserver directly and then restart these components.
The kubelet is systemd service so to restart it use:
systemctl restart kubelet
the controller manager and schedulers are deployed as pods.
Update the kube-proxy kubeconfig files
Next it is time to update kubeconfig file for kube-proxy and same as before - the only thing to do is to edit the server: line to point to LoadBalancer instead of kube-apiserver directly.
This kubeconfig is in fact a configmap, so you can edit it directly using:
$ kubectl edit cm kube-proxy -n kube-system
or first save it as manifest file:
$ kubectl get cm kube-proxy -n kube-system -o yaml > kube-proxy.yml
and then apply changes.
Don't forget to restart kube-proxy after these changes.
Update the kubeadm-config configmap
At the end upload new kubeadm-config configmap (with certSANs and controlPlaneEndpoint entries) to the cluster, it's especially important when you want to add new node to the cluster.
$ kubeadm config upload from-file --config kubeadm-config.yaml
if command above doesn't work, try this:
$ kubeadm upgrade apply --config kubeadm-config.yaml

MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found

I am trying to configure ceph on kubernetes cluster using rook, I have run the following commands:
kubectl apply -f common.yaml
kubectl apply -f operator.yaml
kubectl apply -f cluster.yaml
I have three worker nodes with atached volumes and on master, all the created pods are running except the rook-ceph-crashcollector pods for the three nodes, when I describe these pods I get this message
MountVolume.SetUp failed for volume "rook-ceph-crash-collector-keyring" : secret "rook-ceph-crash-collector-keyring" not found
However all the nodes are running and working
It is hard to exactly tell what might be the cause of this but there are few possibilities:
Cluster networking problem between nodes
Some possible leftover sockets in the /var/lib/kubelet directory related to rook ceph.
A bug when connecting to an external Ceph cluster.
In order to fix your issue you can:
Use Flannel and make sure it is using the right interface. Check the kube-flannel.yml file and see if it uses the --iface= option. Or alternatively try to use Calico.
Clear the ./var/lib/rook/, ./var/lib/kubelet/plugins/ and ./var/lib/kubelet/plugins_registry/ directories and reinstall the rook service.
Create the rook-ceph-crash-collector-keyring secret manually by executing: kubectl -n rook-ceph create secret generic rook-ceph-crash-collector-keyring.

Nginx Kubernetes POD stays in ContainerCreating

I was able to setup the Kubernetes Cluster on Centos7 with one master and two worker nodes, however when I try to deploy a pod with nginx, the state of the pod stays in ContainerRunning forever and doesn't seem to get out of it.
For pod network I am using the calico.
Can you please help me resolve this issue? for some reason I don't feel satisfied moving forward without resolving this issue, I tried to check forums etc, since the last two days and this is the last resort that I am reaching out to you.
[root#kube-master ~]# kubectl get pods --all-namespaces
[get pods result][1]
However when I run describe pods I see the below error for the nginx container under events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet,
kube-worker1 (combined from similar events): Failed to create pod
sandbox: rpc error: code = Unknown desc = failed to set up sandbox
container
"ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb"
network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to
set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect
to Cilium daemon: failed to create cilium agent client after 30.000000
seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config:
dial unix /var/run/cilium/cilium.sock: connect: no such file or
directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
Used the below command to initialize the kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.
your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these 2 have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16 and then edit calico.yaml before applying it as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
hope this helps :)
note: you don't really need to specify --apiserver-advertise-address flag: kubeadm will detect correctly the main IP of the machine most of the time.

adding master to Kubernetes cluster: cluster doesn't have a stable controlPlaneEndpoint address

How can I add a second master to the control plane of an existing Kubernetes 1.14 cluster?
The available documentation apparently assumes that both masters (in stacked control plane and etcd nodes) are created at the same time. I have created my first master already a while ago with kubeadm init --pod-network-cidr=10.244.0.0/16, so I don't have a kubeadm-config.yaml as referred to by this documentation.
I have tried the following instead:
kubeadm join ... --token ... --discovery-token-ca-cert-hash ... \
--experimental-control-plane --certificate-key ...
The part kubeadm join ... --token ... --discovery-token-ca-cert-hash ... is what is suggested when running kubeadm token create --print-join-command on the first master; it normally serves for adding another worker. --experimental-control-plane is for adding another master instead. The key in --certificate-key ... is as suggested by running kubeadm init phase upload-certs --experimental-upload-certs on the first master.
I receive the following errors:
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver.
The recommended driver is "systemd". Please follow the guide at
https://kubernetes.io/docs/setup/cri/
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight:
One or more conditions for hosting a new control plane instance is not satisfied.
unable to add a new control plane instance a cluster that doesn't have a stable
controlPlaneEndpoint address
Please ensure that:
* The cluster has a stable controlPlaneEndpoint address.
* The certificates that must be shared among control plane instances are provided.
What does it mean for my cluster not to have a stable controlPlaneEndpoint address? Could this be related to controlPlaneEndpoint in the output from kubectl -n kube-system get configmap kubeadm-config -o yaml currently being an empty string? How can I overcome this situation?
As per HA - Create load balancer for kube-apiserver:
In a cloud environment you should place your control plane nodes behind a TCP forwarding load balancer. This load balancer distributes
traffic to all healthy control plane nodes in its target list. The
health check for an apiserver is a TCP check on the port the
kube-apiserver listens on (default value :6443).
The load balancer must be able to communicate with all control plane nodes on the apiserver port. It must also allow incoming traffic
on its listening port.
Make sure the address of the load balancer
always matches the address of kubeadm’s ControlPlaneEndpoint.
To set ControlPlaneEndpoint config, you should use kubeadm with the --config flag. Take a look here for a config file example:
apiVersion: kubeadm.k8s.io/v1beta1
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "LOAD_BALANCER_DNS:LOAD_BALANCER_PORT"
Kubeadm config files examples are scattered across many documentation sections. I recommend that you read the /apis/kubeadm/v1beta1 GoDoc, which have fully populated examples of YAML files used by multiple kubeadm configuration types.
If you are configuring a self-hosted control-plane, consider using the kubeadm alpha selfhosting feature:
[..] key components such as the API server, controller manager, and
scheduler run as DaemonSet pods configured via the Kubernetes API
instead of static pods configured in the kubelet via static files.
This PR (#59371) may clarify the differences of using a self-hosted config.
You need to copy the certificates ( etcd/api server/ca etc. ) from the existing master and place on the second master.
then run kubeadm init script. since the certs are already present the cert creation step is skipped and rest of the cluster initialization steps are resumed.

Unable to see join nodes in Kubernetes master

This is my worker node:
root#ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking in master nodes using command kubectl get nodes, I can only able to see master:
ivum01#ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
For question answer:
docker kubelet kubeadm kubectl installed fine
kubectl get node can not see the current added node; of cause kubectl get pods --all-namespaces has no result for this node;
docker which in current node has no report for kubeadm command(means no k8s images pull, no running container for that)
must import is kubelet not running in work node
run kubelet output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
same as this issue said
tear down and reset cluster(kubeadm reset) and redo that has no problem in my case.
I had this problem and it was solved by ensuring that cgroup driver on the worker nodes also were set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node