Pods are restarted automatically on a node newly added to an existing kubeadm cluster - kubernetes

Recently added a node to an existing kubeadm cluster using
kubeadm join --token (TOKEN) (MASTER IP):6443
and with
--discovery-token-ca-cert-hash.
The node attached successfully and it is listed in "kubectl get nodes".
Now pods are being assigned to the node, but they are restarted automatically, and it seems they also cannot communicate with pods on the other nodes.
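A few read-only commands that are commonly used to narrow this kind of problem down (the node, pod and namespace names below are placeholders, not taken from the question); on a newly joined node the usual suspects are the CNI pod scheduled to that node and the kubelet itself:
# Check whether the kube-system pods (CNI plugin, kube-proxy) on the new node are crash-looping
kubectl get pods -n kube-system -o wide | grep <new-node-name>
# Look at restart reasons and the logs of the previous container instance for one restarting pod
kubectl describe pod <pod-name> -n <namespace>
kubectl logs <pod-name> -n <namespace> --previous
# On the new node itself, watch the kubelet logs for CNI or certificate errors
journalctl -u kubelet -f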

Related

Kubeadm join: Fails while creating HA cluster with multiple master nodes

I have 5 VMs in my GCP project; three are supposed to be masters (master1, master2, master3) and the other two are worker nodes (worker1 & worker2). I have created a TCP Load Balancer (LB) to enable load balancing for the master nodes. The LB has two sections:
i) frontend ii) backend
In the backend I have defined all the master IPs. In the frontend I generated a static public IP and set port 6443 as the LB port.
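For reference, a target-pool-based TCP load balancer like the one described above can be created with commands roughly along these lines (a sketch only; the names, region and zone are made up, and your setup may instead use a backend service with instance groups):
# Target pool containing the three masters
gcloud compute target-pools create k8s-api-pool --region us-central1
gcloud compute target-pools add-instances k8s-api-pool --instances master1,master2,master3 --instances-zone us-central1-a
# Static frontend IP and a forwarding rule listening on 6443
gcloud compute addresses create k8s-api-ip --region us-central1
gcloud compute forwarding-rules create k8s-api-fr --region us-central1 --ports 6443 --address k8s-api-ip --target-pool k8s-api-pool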
On master1, I successfully ran the kubeadm init command as follows:
kubeadm init --control-plane-endpoint="<LB_IP>:6443" --apiserver-advertise-address=10.128.0.2 --pod-network-cidr=10.244.0.0/16
where 10.128.0.2 is master1's internal IP and 10.244.0.0/16 is the pod network CIDR for kube-flannel.
kubeadm init runs successfully and prints two kubeadm join commands: one to join a new control plane node and the other to join a new worker node.
You can now join any number of control-plane nodes by copying certificate authorities
and service account keys on each node and then running the following as root:
kubeadm join LB_IP:6443 --token znnlha.6Gfn1vlkunwpz36b \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae67e9d64cb585458a194018f3ba5a82ac4U \
--control-plane
Then you can join any number of worker nodes by running the following on each as root:
kubeadm join LB_IP:6443 --token znnlha.6sfn1vlkunwpz36A \
--discovery-token-ca-cert-hash sha256:dc8834a2a5b4ada38a1ab9831e4cae68e9d64cb585458a194018f3ba5a82ac4e
I am not using --upload-certs for transferring the certificates from one control plane to another; I am copying them manually.
But when I run the above kubeadm join command to add a new control plane on one of my other master nodes, say master2, I get an error like the following:
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get "https://LB_IP:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config?timeout=10s": dial tcp LB_IP:6443: connect: connection refused
I just came across the same issue - in my case the problem was that when hitting GCP's load balancer from an instance that is also a target of that load balancer, the request always lands back on the instance it was sent from.
Based on this:
you run kubeadm init on Node A using LB_IP. LB_IP gets resolved to Node A. Everything works as expected, as you are creating a new cluster.
you run kubeadm join on Node B using LB_IP. LB_IP gets resolved to Node B, while the master you just initialized is on Node A. Node B has nothing listening on port 6443, so you get the connection refused error.
A simple solution is to remove the instance you are running kubeadm join on from the load balancer targets. You can re-add it right after a successful join.
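Assuming a target-pool-based TCP load balancer (the pool and zone names below are placeholders), removing and re-adding the joining instance looks roughly like this:
# Take master2 out of the load balancer targets
gcloud compute target-pools remove-instances <pool-name> --instances master2 --instances-zone <zone>
# ... run the kubeadm join command on master2 ...
# Put it back once the join has succeeded
gcloud compute target-pools add-instances <pool-name> --instances master2 --instances-zone <zone>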

Kubernetes node without master

The cluster consists of one master and one worker node. If the master is down and the worker is restarted, no workloads (deployments) are started on boot. Is it possible, and if so how, to make the worker resume its last state without the master?
Kubernetes 1.18.3
The worker node has kubelet, kubectl and kubeadm installed.
Ideally you should have more than one node (typically an odd number like 3 or 5) serving as master, accessible from the worker nodes via a load balancer.
The state is stored in etcd, which the worker nodes access via the API server. So without a running master there is no way for workers to know the desired state.
Although it's not recommended, you can use static Pods as a potential solution here. Static Pods are managed directly by the kubelet daemon on a specific node, without the API server observing them. Unlike Pods that are managed by the control plane (for example, via a Deployment), the kubelet itself watches each static Pod (and restarts it if it crashes).
The caveat of using static Pods is that, since they do not depend on the API server, they cannot be managed with kubectl or other Kubernetes API clients.
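As an illustration only (the image and file name are arbitrary), a static Pod is created by dropping a manifest into the kubelet's static pod path, which on a kubeadm-provisioned node defaults to /etc/kubernetes/manifests:
# Write a static Pod manifest; the kubelet picks it up on its own
# and restarts the pod if it crashes, even with the API server unreachable.
cat <<EOF >/etc/kubernetes/manifests/static-web.yaml
apiVersion: v1
kind: Pod
metadata:
  name: static-web
spec:
  containers:
  - name: web
    image: nginx
    ports:
    - containerPort: 80
EOF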

GKE does not automatically start a new node after I delete a node from the GKE cluster

I created a cluster:
gcloud container clusters create test
so there will be 3 nodes:
kubectl get nodes
NAME STATUS ROLES AGE VERSION
gke-test-default-pool-cec920a8-9cgz Ready <none> 23h v1.9.7-gke.5
gke-test-default-pool-cec920a8-nh0s Ready <none> 23h v1.9.7-gke.5
gke-test-default-pool-cec920a8-q83b Ready <none> 23h v1.9.7-gke.5
then I delete a node from the cluster
kubectl delete node gke-test-default-pool-cec920a8-9cgz
node "gke-test-default-pool-cec920a8-9cgz" deleted
no new node is created.
Then I delete all nodes. Still, no new node is created.
kubectl get nodes
No resources found.
Am I doing something wrong? I assumed it would automatically bring up a new node if a node died.
After running kubectl delete node gke-test-default-pool-cec920a8-9cgz run gcloud compute instances delete gke-test-default-pool-cec920a8-9cgz
This will actually delete the VM (kubectl delete node only "disconnects" it from the cluster). GCP will then recreate the VM and it will automatically rejoin the cluster.
Kubernetes is a system for managing workloads, not machines. A Kubernetes Node object reflects the state of the underlying infrastructure.
As such, Node objects are managed automatically by Kubernetes. "kubectl delete node" simply removes a serialized object from Kubernetes' etcd storage; it does nothing to delete the VM on the GCE side where the node is hosted, and it is not meant to be used to remove nodes. The node pool itself carries the declared desired state, which cannot be altered by the "kubectl delete node" command.
If you want to remove a node, you should resize the node pool (the underlying instance group) instead.
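For a GKE cluster like the one above, resizing goes through the node pool rather than kubectl (the cluster and pool names match the example; the target size and zone are placeholders):
# Shrink the default pool of the "test" cluster from 3 nodes to 2
gcloud container clusters resize test --node-pool default-pool --num-nodes 2 --zone <zone>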

How to start & stop Kubernetes 1.8.5 cluster?

Question
What are the commands to start/stop a K8S cluster? After completing the installation following "Using kubeadm to Create a Cluster", I restarted the CentOS server and the K8S cluster is not running after the restart.
The Fedora (Single Node) guide lists services to restart, but no such services are installed by kubeadm.
Failed to restart etcd.service: Unit not found.
Failed to restart kube-apiserver.service: Unit not found.
Failed to restart kube-controller-manager.service: Unit not found.
Environment
CentOS 7 on VirtualBox. K8S 1.8.5
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 36m v1.8.5
node01 Ready <none> 35m v1.8.5
node02 Ready <none> 35m v1.8.5
Since you are using kubeadm to initialize and administer the k8s cluster: as I understand it, kubeadm uses the following approach.
systemd manages only the kubelet service on the node.
The kubelet creates and manages the k8s control plane components (kube-apiserver, kube-controller-manager, etcd and kube-scheduler) as static pods; kube-proxy runs separately as a DaemonSet.
The kubelet reads their manifest files (YAML) from /etc/kubernetes/manifests.
So if you want to stop the control plane components, you just need to move these manifest files to another directory; the kubelet will tear the corresponding static pods down, and moving the files back will start them again.
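A sketch of what that looks like on the master (the backup directory name is arbitrary):
# Stop: move the static pod manifests away; the kubelet removes the control plane pods
mkdir -p /etc/kubernetes/manifests.stopped
mv /etc/kubernetes/manifests/*.yaml /etc/kubernetes/manifests.stopped/
systemctl stop kubelet
# Start: bring the kubelet back and restore the manifests
systemctl start kubelet
mv /etc/kubernetes/manifests.stopped/*.yaml /etc/kubernetes/manifests/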

How to change name of a kubernetes node

I have a running node in a kubernetes cluster. Is there a way I can change its name?
I have tried to
delete the node using kubectl delete
change the name in the node's manifest
add the node back.
But the node won't start.
Anyone know how it should be done?
Thanks
Usually it's the kubelet that is responsible for registering the node under a particular name, so you should make the change in your node's kubelet configuration and then it should pop up as a new node.
Changing a node's name in place is not possible at the moment; it requires you to remove and rejoin the node.
You need to make sure the hostname is changed to the new name, then remove the node, reset it and rejoin it.
(You will notice that with kubectl edit node you get an error if you try to save a changed name:
A copy of your changes has been stored to "/tmp/kubectl-edit-qlh54.yaml"
error: At least one of apiVersion, kind and name was changed
)
Ideally you have already removed the running pods from it.
You can try to run kubectl drain <node_name_to_rename>. Proceed at your own risk if that doesn't complete. --ignore-daemonsets can be used to skip DaemonSet-managed pods, which cannot be evicted.
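A minimal sketch of that drain step (the node name is a placeholder; which extra flag you need depends on your kubectl version):
# Evict what can be evicted and cordon the node; DaemonSet pods are skipped
kubectl drain <node_name_to_rename> --ignore-daemonsets
# Older kubectl versions may also need --delete-local-data (renamed to --delete-emptydir-data
# in newer releases) if some pods use emptyDir volumes.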
In short, for a node that has been renamed and is out of the cluster on CentOS 7:
kubectl delete node <original-nodename>
Then on the node that you want to rejoin, as root:
kubeadm reset
check the output and see if it applies to your setup (for potential further cleanup).
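If the host itself has not been renamed yet, this is a sensible point to do it, before generating and running the join command (hostnamectl is assumed to be available, as it is on CentOS 7; the new name is a placeholder):
hostnamectl set-hostname <new-nodename>
# kubeadm join will start the kubelet, which registers the node under the new hostname.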
Now generate the join command on the master node:
export KUBECONFIG=/etc/kubernetes/admin.conf #(or wherever you have it)
kubeadm token create --print-join-command
Run the output on the worker node you have just reset:
kubeadm join <masternode_ip_address>:6443 --token somegeneratedtoken --discovery-token-ca-cert-hash sha256:somesha256hashthatyougotfromtheabovecommand
If you run kubectl get nodes, the node should now show up with the new name.
output in my case:
W0220 10:43:23.286109 11473 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Enjoy your renamed node!
Based on source: https://www.youtube.com/watch?v=TqoA9HwFLVU