Unable to see joined nodes on Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
When I run kubectl get nodes on the master, I can only see the master itself:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the questions from the comments:
docker, kubelet, kubeadm and kubectl installed fine
kubectl get nodes cannot see the newly added node; of course kubectl get pods --all-namespaces shows nothing for this node either;
docker on the worker node shows no trace of the kubeadm join (no k8s images were pulled and no containers are running);
most importantly, kubelet is not running on the worker node;
running kubelet manually outputs:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
the same as described in this issue;
tearing down and resetting the cluster (kubeadm reset) and redoing the setup went through without problems in my case.
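A quick way to see why kubelet is refusing to run on a worker in a case like this is to check the service status and recent logs (standard systemd commands, assuming kubelet was installed as a systemd service):
systemctl status kubelet
journalctl -xeu kubelet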

I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart the kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node
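If the original join command or token from kubeadm init has expired in the meantime, a fresh one can be printed on the master (standard kubeadm command):
kubeadm token create --print-join-command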


Nginx Kubernetes POD stays in ContainerCreating

I was able to set up a Kubernetes cluster on CentOS 7 with one master and two worker nodes. However, when I try to deploy a pod with nginx, the pod stays in ContainerCreating forever and doesn't seem to get out of it.
For the pod network I am using Calico.
Can you please help me resolve this issue? I don't feel comfortable moving forward without resolving it; I have been checking forums for the last two days and this is my last resort.
[root@kube-master ~]# kubectl get pods --all-namespaces
(screenshot of the kubectl get pods output)
However when I run describe pods I see the below error for the nginx container under events section.
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet, kube-worker1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb" network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Hope you can help here.
Edit 1:
The ip address of the master VM is 192.168.40.133
Used the below command to initialize the cluster with kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.
Your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these two have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16, and then edit calico.yaml before applying it, as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
Hope this helps :)
Note: you don't really need to specify the --apiserver-advertise-address flag; kubeadm will correctly detect the machine's main IP most of the time.
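After applying the manifest, one way to confirm the pod network is healthy before scheduling workloads (a rough check; the k8s-app=calico-node label is the one used in the Calico manifest):
kubectl -n kube-system get pods -l k8s-app=calico-node
kubectl get nodes -o wide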

How to join a worker node to an existing cluster?

I am facing some issues while joining a worker node to an existing cluster.
Please find below the details of my scenario.
I've created an HA cluster with 4 masters and 3 workers.
I removed 1 master node.
The removed node is no longer part of the cluster and the reset was successful.
Now I am joining the removed node as a worker node to the existing cluster.
I'm running the below command:
kubeadm join --token giify2.4i6n2jmc7v50c8st 192.168.230.207:6443 --discovery-token-ca-cert-hash sha256:dd431e6e19db45672add3ab0f0b711da29f1894231dbeb10d823ad833b2f6e1b
In the above command, 192.168.230.207 is the cluster IP.
Result of the above command:
[preflight] Running pre-flight checks
[WARNING Service-Docker]: docker service is not enabled, please run 'systemctl enable docker.service'
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
[WARNING FileExisting-tc]: tc not found in system path
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://192.168.230.206:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp 192.168.230.206:6443: connect: connection refused
Steps already tried:
Edited this config (kubectl -n kube-system get cm kubeadm-config -o yaml) using kubeadm patch and removed references to the removed node ("192.168.230.206").
We are using external etcd, so we checked the member list to confirm the removed node is no longer part of etcd. We ran the below command:
etcdctl --endpoints=https://cluster-ip --ca-file=/etc/etcd/pki/ca.pem --cert-file=/etc/etcd/pki/client.pem --key-file=/etc/etcd/pki/client-key.pem member list
Can someone please help me resolve this issue as I'm not able to join this node?
In addition to @P Ekambaram's answer, I assume that you probably need to completely dispose of all redundant data from the previous kubeadm join setup.
Remove cluster entries via the kubeadm command on the worker node: kubeadm reset;
Wipe all redundant data residing on the worker node: rm -rf /etc/kubernetes; rm -rf ~/.kube;
Try to re-join the worker node.
Use these instructions one after another to completely remove the old installation on the worker node:
kubeadm reset
systemctl stop kubelet
systemctl stop docker
rm -rf /var/lib/cni/
rm -rf /var/lib/kubelet/*
rm -rf /etc/cni/
ifconfig cni0 down
ifconfig flannel.1 down
ifconfig docker0 down
ip link delete cni0
ip link delete flannel.1
systemctl start docker.service
yum remove kubeadm kubectl kubelet kubernetes-cni kube*
yum autoremove
rm -rf ~/.kube
then reinstall using
yum install -y kubelet kubeadm kubectl
reboot
systemctl start docker && systemctl enable docker
systemctl start kubelet && systemctl enable kubelet
then use kubeadm join command
Fix these problems and then run the join command:
docker service is not enabled, please run 'systemctl enable docker.service'
detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd" (see the sketch below for switching the driver).
kubelet service is not enabled, please run 'systemctl enable kubelet.service'
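For the cgroup driver warning, the usual approach is to point Docker at the systemd driver via /etc/docker/daemon.json and restart it (a sketch based on the common kubeadm setup guidance; merge with any existing daemon.json contents):
cat <<EOF > /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
systemctl enable docker kubelet
systemctl restart docker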
kubeadm join tries to reach 192.168.230.206 whereas the new IP is 192.168.230.207.
In addition to the change in the kubeadm-config ConfigMap, you may need to change the cluster IP address (cluster.server) in this ConfigMap:
kubectl edit cm -n kube-public cluster-info
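The cluster-info ConfigMap embeds a kubeconfig, and its clusters[].cluster.server field is what joining nodes use for discovery. A quick way to check what it currently points at (a sketch; the old address is the one from the error above):
kubectl -n kube-public get cm cluster-info -o yaml | grep 'server:'
#    server: https://192.168.230.206:6443    <- change this to https://192.168.230.207:6443 via kubectl edit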

Worker node unable to join master node in Kubernetes

I have two network interfaces on my master node:
192.168.56.118
10.0.3.15
While doing kubeadm init on the master node, I got the following command to add workers:
kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
As you can see, it shows the IP 192.168.56.118 to connect from the worker.
But while executing the same on the worker node, I'm getting the following error.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 192.168.56.118:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[discovery] Trying to connect to API Server "192.168.56.118:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.56.118:6443"
[discovery] Requesting info from "https://192.168.56.118:6443" again to validate TLS against the pinned public key
[discovery] Failed to request cluster info, will try again: [Get https://192.168.56.118:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: x509: certificate is valid for 10.96.0.1, 10.0.3.15, not 192.168.56.118]
I tried with the other IP, 10.0.3.15, but it returns a connection refused error, despite the fact that the firewall is disabled on the master.
[root@k8s-worker ~]# kubeadm join --token qr1czu.5lh1nt34ldiauc1u 10.0.3.15:6443 --discovery-token-ca-cert-hash sha256:e5d90dfa0fff67589551559c443762dac3f1e5c7a5d2b4a630e4c0156ad0e16c
[preflight] Running pre-flight checks
[WARNING Service-Kubelet]: kubelet service is not enabled, please run 'systemctl enable kubelet.service'
[discovery] Trying to connect to API Server "10.0.3.15:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.0.3.15:6443"
[discovery] Failed to request cluster info, will try again: [Get https://10.0.3.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 10.0.3.15:6443: connect: connection refused]
How can I force the certificate to accept 192.168.56.118 as valid? Or does anyone have an idea how I can resolve this issue?
You need to provide an extra apiserver certificate SAN (--apiserver-cert-extra-sans <ip_address>) and the API server advertise address (--apiserver-advertise-address) while initialising the cluster using kubeadm init. Your kubeadm init command will look like:
kubeadm init --apiserver-cert-extra-sans 192.168.56.118 --apiserver-advertise-address 192.168.56.118
Once you initialise the cluster with the above command, you will not face the certificate issue while joining the cluster.
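To confirm which addresses the apiserver certificate actually covers, the SANs can be inspected on the master (standard openssl; the path is the default kubeadm location):
openssl x509 -in /etc/kubernetes/pki/apiserver.crt -noout -text | grep -A1 'Subject Alternative Name'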

kubernetes HA: joining worker node error - cluster CA found in cluster-info configmap is invalid: public key

I am trying to set up a 3-node multi-master Kubernetes v1.10.0 cluster using CentOS 7,
following the kubeadm documentation steps. I was able to set up the 3 masters, all of them in Ready status.
https://kubernetes.io/docs/setup/independent/high-availability/#kubeadm-init-master0
While joining the worker node, I am getting a certificate issue; the kubeadm join command gives this error message:
kubeadm join <masterip>:6443 --token v5ylg8.jgqn122kewvobaoo --discovery-token-ca-cert-hash sha256:b01832713190461cc96ca02e5c2b1e578473c6712b0965d2383aa9de9b41d4b6 --ignore-preflight-errors=cri
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 17.09.1-ce. Max validated version: 17.03
[WARNING CRI]: unable to check if the container runtime at "/var/run/dockershim.sock" is running: exit status 1
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
[discovery] Trying to connect to API Server "<masterip>:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://<masterip>:6443"
[discovery] Failed to connect to API Server "<masterip>:6443": cluster CA found in cluster-info configmap is invalid: public key sha256:87d28cf32666a75cb7ed6502ab5c726de29438754c48507687648e84ad9b6693 not pinned
How do I find out the root cause of this certificate error message?
Thanks
SR
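One way to narrow this down is to recompute the discovery hash from the CA certificate the masters are actually serving and compare it with the value passed to --discovery-token-ca-cert-hash (the openssl pipeline below is the standard one from the kubeadm docs; the path assumes the default /etc/kubernetes/pki layout). If the values differ, the cluster-info ConfigMap and the join command are pinned to different CAs.
openssl x509 -pubkey -in /etc/kubernetes/pki/ca.crt | openssl rsa -pubin -outform der 2>/dev/null | openssl dgst -sha256 -hex | sed 's/^.* //'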

How to change the name of a Kubernetes node

I have a running node in a kubernetes cluster. Is there a way I can change its name?
I have tried to
delete the node using kubectl delete
change the name in the node's manifest
add the node back.
But the node won't start.
Anyone know how it should be done?
Thanks
Usually it's kubelet that is responsible for registering the node under a particular name, so you should make changes to your node's kubelet configuration and then it should pop up as a new node.
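For example, with kubeadm-managed kubelets the node name can be overridden through the kubelet's --hostname-override flag, typically via the KUBELET_EXTRA_ARGS environment file (a sketch; the file is /etc/sysconfig/kubelet on RHEL/CentOS and /etc/default/kubelet on Debian/Ubuntu, and the node name below is a placeholder):
# /etc/sysconfig/kubelet
KUBELET_EXTRA_ARGS=--hostname-override=new-node-name
Then reload and restart kubelet:
systemctl daemon-reload
systemctl restart kubelet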
Changing the node's name is not possible at the moment; it requires you to remove and rejoin the node.
You need to make sure the hostname is changed to the new name, remove the node, reset it and rejoin it.
(You will notice that with the command kubectl edit node, you will get an error if you try to change and save the name:
A copy of your changes has been stored to "/tmp/kubectl-edit-qlh54.yaml"
error: At least one of apiVersion, kind and name was changed
)
Ideally you will have removed the running pods from it first.
You can try to run kubectl drain <node_name_to_rename>. Proceed at your own risk if that doesn't complete. --ignore-daemonsets can be used to ignore possible issues for pods that cannot be evicted.
In short, for a node that has been renamed and is out of the cluster on CentOS 7:
kubectl delete node <original-nodename>
Then on the node that you want to rejoin, as root:
kubeadm reset
check the output and see if it applies to your setup (for potential further cleanup).
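If the hostname itself still needs to be changed before rejoining, that can be done on CentOS 7 with hostnamectl (the name below is a placeholder):
hostnamectl set-hostname new-nodename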
Now generate the join command on the master node:
export KUBECONFIG=/etc/kubernetes/admin.conf #(or wherever you have it)
kubeadm token create --print-join-command
Run the output on the worker node you have just reset:
kubeadm join <masternode_ip_address>:6443 --token somegeneratedtoken --discovery-token-ca-cert-hash sha256:somesha256hashthatyougotfromtheabovecommand
If you run kubectl get nodes, it should now show up with the new name.
output in my case:
W0220 10:43:23.286109 11473 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -oyaml'
[kubelet-start] Downloading configuration for the kubelet from the "kubelet-config-1.17" ConfigMap in the kube-system namespace
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
This node has joined the cluster:
* Certificate signing request was sent to apiserver and a response was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster.
Enjoy your renamed node!
Based on source: https://www.youtube.com/watch?v=TqoA9HwFLVU