My kubernetes cluster IP address changed and now kubectl will no longer connect - kubernetes

Running under Ubuntu I used kubeadm init to setup my cluster (master node) and copied over the /etc/kubernetes/admin.conf $HOME/.kube/config and all was well when using kubectl.
However after a reboot my master node has had an IP address change which is not the same as what is in $HOME/.kube/config so now I can no longer connect kubectl
So how do I regenerate the admin.conf now that I have a new IP address? Running kubeadm init will just kill everything which is not what I want.

I found this solution on the internet and it works for me:
systemctl stop kubelet docker
cd /etc/
mv kubernetes kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
mkdir -p kubernetes
cp -r kubernetes-backup/pki kubernetes
rm kubernetes/pki/{apiserver.*,etcd/peer.*}
systemctl start docker
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
#Run "kubeadm reset" on all nodes if was this error "error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists"
cp kubernetes/admin.conf ~/.kube/config
kubectl get nodes --sort-by=.metadata.creationTimestamp
kubectl delete node $(kubectl get nodes -o jsonpath='{.items[(#.status.conditions[0].status=="Unknown")].metadata.name}')
kubectl get pods --all-namespaces
After These, Join your Slaves to Master.
Reference: https://medium.com/#juniarto.samsudin/ip-address-changes-in-kubernetes-master-node-11527b867e88

The following command can be used to regenerate admin.conf
kubeadm alpha phase kubeconfig admin --apiserver-advertise-address <new_ip>
However, if you use an IP instead of a hostname, your API-server certificate will be invalid. So, either regenerate your certs ( kubeadm alpha phase certs renew apiserver ), use hostnames instead of IPs or add the insecure --insecure-skip-tls-verify flag when using kubectl

You do not want to use kubeadm reset. That will reset everything and you would have to start configuring your cluster again.
Well, in your scenario, please have a look on the steps below:
nano /etc/hosts (update your new IP against YOUR_HOSTNAME)
nano /etc/kubernetes/config (configuration settings related to your cluster) here in this file look for the following params and update accordingly
KUBE_MASTER="--master=http://YOUR_HOSTNAME:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://YOUR_HOSTNAME:2379" #2379 is default port
nano /etc/etcd/etcd.conf (conf related to etcd)
KUBE_ETCD_SERVERS="--etcd-servers=http://YOUR_HOSTNAME/WHERE_EVER_ETCD_HOSTED:2379"
2379 is default port for etcd. and you can have multiple etcd servers defined here comma separated
Restart kubelet, apiserver, etcd services.
It is good to use hostname instead of IP to avoid such scenarios.
Hope it helps!

Related

Kubernetes Cluster- Worker node error while using kubectl

I have created a Kubernetes cluster with 2 nodes, one Master node and one Worker node (2 different VMs).
The worker node has joined the cluster successfully, so when I run the commanad:
kubectl get nodes in my master node it appears the 2 nodes exists in the cluster!
However, when I run the command kubectl apply -f https://k8s.io/examples/controllers/nginx-deployment.yaml from my worker node terminal, in order to create a deployment in the worker node, I have the following error:
The connection to the server localhost:8080 was refused. - did you specify the right host or port?
Any help what is going on here?
It looks like you have issue with your kubeconfig file, as usually localhost:8080 is the default server to connect in absence of this file . Generally, Kubernetes uses this file to store cluster authentication information and a list of contexts to which kubectl refers when running commands - that's why kubectl can't work properly without this file.
To check the presence of kubeconfig file, enter this command: kubectl config view.
Or just check the presence of the file named config in the $HOME/.kube directory, which is the default location for kubeconfig file.
If it is absent, you would need to copy the config file to your node, e.g.:
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
sudo service kubelet restart
It is also possible to generate config file in a more difficult way - instead of copying - as described here.
The easy way to do it is to copy the config from master node usually found here : /etc/kubernetes/admin.conf , to whetever node you want to configure kubectl ( even on master node) . The location to be copied is : $HOME/.kube/config
Also, you can this command from master node by specify nodeselector or label.
Assign POD node

How to convert a Kubernetes non-HA control plane into an HA control plane?

What is the best way to convert a kubernetes non-HA control plane into an HA control plane?
I have started with my cluster as a non-HA control plane - one master node and several worker nodes. The cluster is already running with a lots of services.
Now I would like to add additional master nodes to convert my cluster into a HA control plane. I have setup and configured a load balancer.
But I did not figure out how I can change the -control-plane-endpoint to my load balancer IP address for my existing master node.
Calling kubeadm results in the following error:
sudo kubeadm init --control-plane-endpoint "my-load-balancer:6443" --upload-certs
[init] Using Kubernetes version: v1.20.1
[preflight] Running pre-flight checks
[WARNING SystemVerification]: missing optional cgroups: hugetlb
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR Port-6443]: Port 6443 is in use
[ERROR Port-10259]: Port 10259 is in use
[ERROR Port-10257]: Port 10257 is in use
[ERROR FileAvailable--etc-kubernetes-manifests-kube-apiserver.yaml]: /etc/kubernetes/manifests/kube-apiserver.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-controller-manager.yaml]: /etc/kubernetes/manifests/kube-controller-manager.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-kube-scheduler.yaml]: /etc/kubernetes/manifests/kube-scheduler.yaml already exists
[ERROR FileAvailable--etc-kubernetes-manifests-etcd.yaml]: /etc/kubernetes/manifests/etcd.yaml already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR Port-2379]: Port 2379 is in use
[ERROR Port-2380]: Port 2380 is in use
[ERROR DirAvailable--var-lib-etcd]: /var/lib/etcd is not empty
[preflight] If you know what you are doing, you can make a check non-fatal with `--ignore-preflight-errors=...`
To see the stack trace of this error execute with --v=5 or higher
The error message seems to be clear as my master is already running.
Is there a way how I can easily tell my existing master node to use the new load balancer to run as a HA control plane?
Best solution in my opinion
The best approach to convert a non-HA control plane to an HA control plane is to create a completely new HA control plane and after that to migrate all your applications there.
Possible solution
Below I will try to help you to achieve your goal but I do not recommend using this procedure on any cluster that will ever be considered as production cluster. It work for my scenario, it also might help you.
Update the kube-apiserver certificate
First of all, kube-apiserver uses a certificate to encrypt control plane traffic and this certificate have something known as SAN (Subject Alternative Name).
SAN is a list of IP addresses that you will use to access the API, so you need to add there IP address of your LoadBalancer and probably the hostname of your LB as well.
To do that, you have to get kubeadm configuration e.g. using command:
$ kubeadm config view > kubeadm-config.yaml
and then add certSANs to kubeadm-config.yaml config file under apiServer section, it should looks like below example: (you may also need to add controlPlaneEndpoint to point to your LB).
apiServer:
certSANs:
- "192.168.0.2" # your LB address
- "loadbalancer" # your LB hostname
extraArgs:
authorization-mode: Node,RBAC
...
controlPlaneEndpoint: "loadbalancer" # your LB DNS name or DNS CNAME
...
Now you can update kube-apiserver cert using:
BUT please remember you must first delete/move your old kube-apiserver cert and key from /etc/kubernetes/pki/ !
$ kubeadm init phase certs apiserver --config kubeadm-config.yaml.
Finally restart your kube-apiserver.
Update the kubelet, the scheduler and the controller manager kubeconfig files
Next step is to update the kubelet, scheduler and controller manager to communicate with the kube-apiserver using LoadBalancer.
All three of these components use standard kubeconfig files:
/etc/kubernetes/kubelet.conf, /etc/kubernetes/scheduler.conf, /etc/kubernetes/controller-manager.conf to communicate with kube-apiserver.
The only thing to do is to edit the server: line to point to LB instead of kube-apiserver directly and then restart these components.
The kubelet is systemd service so to restart it use:
systemctl restart kubelet
the controller manager and schedulers are deployed as pods.
Update the kube-proxy kubeconfig files
Next it is time to update kubeconfig file for kube-proxy and same as before - the only thing to do is to edit the server: line to point to LoadBalancer instead of kube-apiserver directly.
This kubeconfig is in fact a configmap, so you can edit it directly using:
$ kubectl edit cm kube-proxy -n kube-system
or first save it as manifest file:
$ kubectl get cm kube-proxy -n kube-system -o yaml > kube-proxy.yml
and then apply changes.
Don't forget to restart kube-proxy after these changes.
Update the kubeadm-config configmap
At the end upload new kubeadm-config configmap (with certSANs and controlPlaneEndpoint entries) to the cluster, it's especially important when you want to add new node to the cluster.
$ kubeadm config upload from-file --config kubeadm-config.yaml
if command above doesn't work, try this:
$ kubeadm upgrade apply --config kubeadm-config.yaml

When I start prepare kubernetes in aws the errors shown below

Below are the commands and their outputs:
root#k8s-master:~# sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
cp: cannot stat '/etc/kubernetes/admin.conf': No such file or directory
root#k8s-master:~# kubectl get services -n kube-system
The connection to the server localhost:8080 was refused - did you specify the right host or port?
It looks like you are not running EKS. Otherwise you cannot access the masters. With EKS, the masters are managed by AWS and you can't ssh to them
your kubectl commands makes a call to the kubernetes api server. So you have to check if it is running on localhost on port 8080.

How to join worker node in existing cluster?

I was facing some issues while joining worker node in existing cluster.
Please find below the details of my scenario.
I've Created a HA clusters with 4 master and 3 worker.
I removed 1 master node.
Removed node was not a part of cluster now and reset was successful.
Now joining the removed node as worker node in existing cluster.
I'm firing below command
kubeadm join --token giify2.4i6n2jmc7v50c8st 192.168.230.207:6443 --discovery-token-ca-cert-hash sha256:dd431e6e19db45672add3ab0f0b711da29f1894231dbeb10d823ad833b2f6e1b
In above command - 192.168.230.207 is cluster IP
Result of above command
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://192.168.230.206:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 192.168.230.206:6443: connect: connection refused
Already tried Steps
ted this file(kubectl -n kube-system get cm kubeadm-config -o yaml) using kubeadm patch and removed references of removed node("192.168.230.206")
We are using external etcd so checked member list to confirm removed node is not a part of etcd now. Fired below command etcdctl --endpoints=https://cluster-ip --ca-file=/etc/etcd/pki/ca.pem --cert-file=/etc/etcd/pki/client.pem --key-file=/etc/etcd/pki/client-key.pem member list
Can someone please help me resolve this issue as I'm not able to join this node?
In addition to #P Ekambaram answer, I assume that you probably have completely dispose of all redundant data from previous kubeadm join setup.
Remove cluster entries via kubeadm command on worker node: kubeadm reset;
Wipe all redundant data residing on worker node: rm -rf /etc/kubernetes; rm -rf ~/.kube;
Try to re-join worker node.
Use these instructions one after another to completely remove all old installation on worker node.....
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker.service
yum remove kubeadm kubectl kubelet kubernetes-cni kube*
yum autoremove
rm -rf ~/.kube
then reinstall using
yum install -y kubelet kubeadm kubectl
reboot
systemctl start docker && systemctl enable docker
systemctl start kubelet && systemctl enable kubelet
then use kubeadm join command
fix these problems and run join command
docker service is not enabled, please run 'systemctl enable docker.service'
detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd".
kubelet service is not enabled, please run 'systemctl enable kubelet.service
kubeadm join try to reach 192.168.230.206 whereas new IP is 192.168.230.207.
In addition to change in cm kubeadm-config, you may change your cluster IP address (cluster.server) in this configmap
kubectl edit cm -n kube-public cluster-info

Cannot connect to Kubernetes api on AWS vm's

I have deployed Kubernetes using the link Kubernetes official page
I see that Kubernetes is deployed because in the end i got this
Your Kubernetes master has initialized successfully!
To start using your cluster, you need to run the following as a regular user:
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
You should now deploy a pod network to the cluster.
Run "kubectl apply -f [podnetwork].yaml" with one of the options listed at:
https://kubernetes.io/docs/concepts/cluster-administration/addons/
You can now join any number of machines by running the following on each node
as root:
kubeadm join 172.16.32.101:6443 --token ma1d4q.qemewtyhkjhe1u9f --discovery-token-ca-cert-hash sha256:408b1fdf7a5ea5f282741db91ebc5aa2823802056ea9da843b8ff52b1daff240
when i do kubectl get pods it thorws this error
# kubectl get pods
The connection to the server 127.0.0.1:6553 was refused - did you specify the right host or port?
When I do see the cluster-info it says as follows
kubectl cluster-info
Kubernetes master is running at https://127.0.0.1:6553
But when i see the config it shows as follows
apiVersion: v1
clusters:
- cluster:
certificate-authority-data: LS0tLS1CRUdJTiBDRVJUSUZJQ0FURS0tLS0tCk1JSUN5RENDQWJDZ0F3SUJBZ0lCQURBTkJna3Foa2lHOXcwQkFRc0ZBREFWTVJNd0VRWURWUVFERXdwcmRXSmwKY201bGRHVnpNQjRYRFRFNE1EWXlNREEyTURJd04xb1hEVEk0TURZeE56QTJNREl3TjFvd0ZURVRNQkVHQTFVRQpBeE1LYTNWaVpYSnVaWFJsY3pDQ0FTSXdEUVlKS29aSWh2Y05BUUVCQlFBRGdnRVBBRENDQVFvQ2dnRUJBT0ZXCkxWQkJoWmZCQms4bXJrV0w2MmFGd2U0cUYvZkRHekJidnE5TFpGN3M4UW1rdDJVUlo5YmtSdWxzSlBrUXV1U2IKKy93YWRoM054S0JTQUkrTDIrUXdyaDVLSy9lU0pvbjl5TXJlWnhmRFdPTno2Y3c4K2txdnh5akVsRUdvSEhPYQpjZHpuZnJHSXVZS3lwcm1GOEIybys0VW9ldytWVUsxRG5Ra3ZwSUZmZ1VjVWF4UjVMYTVzY2ZLNFpweTU2UE4wCjh1ZjdHSkhJZFhNdXlVZVpFT3Z3ay9uUTM3S1NlWHVhcUlsWlFqcHQvN0RrUmFZeGdTWlBqSHd5c0tQOHMzU20KZHJoeEtyS0RPYU1Wczd5a2xSYjhzQjZOWDB6UitrTzhRNGJOUytOYVBwbXFhb3hac1lGTmhCeGJUM3BvUXhkQwpldmQyTmVlYndSWGJPV3hSVzNjQ0F3RUFBYU1qTUNFd0RnWURWUjBQQVFIL0JBUURBZ0trTUE4R0ExVWRFd0VCCi93UUZNQU1CQWY4d0RRWUpLb1pJaHZjTkFRRUxCUUFEZ2dFQkFDTFBlT0s5MUdsdFJJTjdmMWZsNmlINTg0c1UKUWhBdW1xRTJkUHFNT0ZWWkxjbDVJZlhIQ1dGeE13OEQwOG1NNTZGUTNEVDd4bi9lNk5aK1l2M1JrK2hBdmdaQgpaQk9XT3k4UFJDOVQ2S1NrYjdGTDRSWDBEamdSeE5WTFIvUHd1TUczK3V2ZFhjeXhTYVJJRUtrLzYxZjJsTGZmCjNhcTdrWkV3a05pWXMwOVh0YVZGQ21UaTd0M0xrc1NsbDZXM0NTdVlEYlRQSzJBWjUzUVhhMmxYUlZVZkhCMFEKMHVOQWE3UUtscE9GdTF2UDBDRU1GMzc4MklDa1kzMDBHZlFEWFhiODA5MXhmcytxUjFQbEhJSHZKOGRqV29jNApvdTJ1b2dHc2tGTDhGdXVFTTRYRjhSV0grZXpJRkRjV1JsZnJmVHErZ2s2aUs4dGpSTUNVc2lFNEI5QT0KLS0tLS1FTkQgQ0VSVElGSUNBVEUtLS0tLQo=
**server: https://172.16.32.101:6443**
Even telnet shows that there is a process running on 6443 but not on 6553
how can change the port and how can I fix the issue??
Any help would be of great use
Thanks in advance.
It looks like your last kubectl config interferes with the previous clusters configurations.
It is possible to have settings for several different clusters in one .kube/config or in separate files.
But in some cases, you may want to manage only the cluster you've just created.
Note: After tearing down the exited cluster using kubeadm reset followed by initializing fresh cluster using kubeadm init, new certificates will be generated. To operate the new cluster, you have to update kubectl configuration or replace it with the new one.
To clean up old kubectl configurations and apply the last one, run the following commands:
rm -rf $HOME/.kube
unset KUBECONFIG
# Check if you have KUBECONFIG configured in profile dot files and comment or remove it.
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
It gives you up-to-date configuration for the last cluster you've created using kubeadm tool.
Note: You should copy kubectl configuration for all users accounts which you are going to use to manage the cluster.
Here are some examples of how to manage config file using the command line.
I figured out the issue it is because of the firewall in the machine I could join nodes to the cluster once I allowed traffic via port 6443. I didn't fix the issue with this post but for beginners use this K8's on AWS for a better idea.
Thanks for the help guys...!!!