Kubernetes network plugin

I have installed a Kubernetes cluster of 3 nodes with the Calico network plugin.
For some reason I decided to completely remove Kubernetes and reinstall it with a different network plugin: Flannel.
All seemed fine until I tried to deploy my first container.
kubectl describe pod/cassandra returns the following error:
Unknown desc = [failed to set up sandbox container "957f68c3cbe9b230b0e2bd6729a12c340f903de568622e28e335f7b48563a445" network for pod "cassandra-d7db46b86-dz7ck": networkPlugin cni failed to set up pod "cassandra-d7db46b86-dz7ck_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes"), failed to clean up sandbox container "957f68c3cbe9b230b0e2bd6729a12c340f903de568622e28e335f7b48563a445" network for pod "cassandra-d7db46b86-dz7ck": networkPlugin cni failed to teardown pod "cassandra-d7db46b86-dz7ck_default" network: error getting ClusterInformation: Get https://[10.96.0.1]:443/apis/crd.projectcalico.org/v1/clusterinformations/default: x509: certificate signed by unknown authority (possibly because of "crypto/rsa: verification error" while trying to verify candidate authority certificate "kubernetes")]
Normal SandboxChanged 3s (x3 over 18s) kubelet, <node name> Pod sandbox changed, it will be killed and re-created.
Reading the errors, it seems that the Calico plugin is still being used by Kubernetes, even though I removed it and installed the Flannel plugin.
How can I clean up this mess?

Clear the IP routes: ip route flush proto bird
Remove all Calico links on all nodes:
ip link list | grep cali | awk '{print $2}' | cut -c 1-15 | xargs -I {} ip link delete {}
Remove the ipip module: modprobe -r ipip
Remove the Calico configs:
rm /etc/cni/net.d/10-calico.conflist && rm /etc/cni/net.d/calico-kubeconfig
Restart the kubelet service: systemctl restart kubelet
After this, install Flannel.
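For reference, a minimal sketch of verifying the cleanup and installing Flannel (the manifest URL is the one used elsewhere in this thread; adjust it to your Kubernetes version):
# on each node: confirm no Calico CNI config is left behind
ls /etc/cni/net.d/
# on the master: install Flannel (assumes the cluster was initialized with --pod-network-cidr=10.244.0.0/16)
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml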

Can you try rejoining the compute/worker nodes (remove them from the cluster and join them again)? It worked in one of my cases before. A sketch of that sequence is below.
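A minimal sketch of that sequence, with placeholder names (on older kubectl versions the drain flag is --delete-local-data instead of --delete-emptydir-data):
# on the master: drain and remove the node
kubectl drain <node-name> --ignore-daemonsets --delete-emptydir-data
kubectl delete node <node-name>
# on the worker: reset and rejoin
sudo kubeadm reset
sudo kubeadm join <master-ip>:6443 --token <token> --discovery-token-ca-cert-hash sha256:<hash>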

Related

Couldn't join master node - error execution phase preflight: couldn't validate the identity of the API Server

I am a novice to k8s, so this might be a very simple issue for someone with k8s expertise.
I am working with two nodes:
master - 2 CPU, 2 GB memory
worker - 1 CPU, 1 GB memory
OS - Ubuntu - hashicorp/bionic64
I set up the master node successfully and I can see it is up and running:
vagrant@master:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 29m v1.18.2
Here is the token which I have generated:
vagrant@master:~$ kubeadm token create --print-join-command
W0419 13:45:52.513532 16403 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
Issue - But when I try to join from the worker node I get:
vagrant@worker:~$ sudo kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
W0419 13:46:17.651819 15987 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server: Get https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher
Here are the ports which are occupied:
10.0.2.15:2379
10.0.2.15:2380
10.0.2.15:68
Note: I am using the CNI from:
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Here are the mistakes which I realized I made during my Kubernetes installation
(for detailed installation steps, follow: Steps for Installation)
But here are the key mistakes I made:
Mistake 1 - Since I was working on VMs, I had multiple ethernet adapters on both of my VMs (master as well as worker). By default the CNI always takes eth0, but in our case it should be eth1:
1: lo: <LOOPBACK,UP,LOWER_UP>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:bb:14:75 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:fb:48:77 brd ff:ff:ff:ff:ff:ff
inet 100.0.0.1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP>
Mistake 2 - I was initializing kubeadm without --apiserver-advertise-address and --pod-network-cidr.
So here is the kubeadm command which I used:
[vagrant@master ~]$ sudo kubeadm init --apiserver-advertise-address=100.0.0.1 --pod-network-cidr=10.244.0.0/16
Mistake 3 - Since we have multiple ethernet adapters in our VMs, I couldn't find a way to set extra args to switch from eth0 to eth1 in the calico.yaml configuration.
So I used the Flannel CNI:
[vagrant@master ~]$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
and in the args section added --iface=eth1:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth1
And it worked after that.
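If you want to make that edit non-interactively, a small sketch (the indentation after a\ is an assumption; check the args block in your copy of kube-flannel.yml before applying):
# append --iface=eth1 right after the existing --kube-subnet-mgr argument (GNU sed)
sed -i '/- --kube-subnet-mgr/a\        - --iface=eth1' kube-flannel.yml
kubectl apply -f kube-flannel.yml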
It worked for me using this --apiserver-advertise-address:
sudo kubeadm init --apiserver-advertise-address=172.16.28.10 --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
I used Calico and it worked for me.
On the member node, to join:
kubeadm join 172.16.28.10:6443 --token 2jm9hd.o2gulx4x1b8l1t5d --discovery-token-ca-cert-hash sha256:b8b679e86c4d228bfa486086f18dcac4760e5871e8dd023ec166acfd93781595
I ran into the same problem while setting up the Kubernetes master and worker nodes. I got the same error while adding the worker nodes to the master node. I just stopped firewalld, tried adding the nodes again, and then it worked. Hope this helps someone.
Thanks
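Rather than disabling firewalld entirely, a sketch of opening just the ports kubeadm documents for a control-plane node (adjust to your topology):
sudo firewall-cmd --permanent --add-port=6443/tcp       # Kubernetes API server
sudo firewall-cmd --permanent --add-port=2379-2380/tcp  # etcd client/peer
sudo firewall-cmd --permanent --add-port=10250/tcp      # kubelet API
sudo firewall-cmd --reload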
When joining the master node with slave nodes, I received the below error:
====================================================================
TASK [Joining worker nodes with kubernetes master] ***********************************************
fatal: [ip-0-0-0-0.ec2.internal]: FAILED! => {"changed": true, "cmd": "grep -i -A2 'kubeadm join' join_token|bash", "delta": "0:05:07.728008", "end": "2022-01-15 11:41:28.615600", "msg": "non-zero return code", "rc": 1, "start": "2022-01-15 11:36:20.887592", "stderr": "error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks", "stdout_lines": ["[preflight] Running pre-flight checks"]}
===============================================================================
If you observe the error carefully, you will see the IP address and port number as below:
https://139.41.82.79:6443
Solution:
Please follow the below steps to resolve:
Go to your security groups
Add an entry for port 6443 in your security group for the custom source 0.0.0.0/0 (see the CLI sketch after these steps), and save the details.
Rerun the YAML playbook to join the worker nodes with the master.
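If you prefer the CLI, a one-line sketch with the AWS CLI (the security group ID is a placeholder, and 0.0.0.0/0 opens the port to everyone, so narrow the CIDR where you can):
aws ec2 authorize-security-group-ingress --group-id sg-0123456789abcdef0 --protocol tcp --port 6443 --cidr 0.0.0.0/0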
Run the command 'kubectl config view' or 'kubectl cluster-info' to check the IP address of the Kubernetes control plane. In my case it is 10.0.0.2.
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.0.0.2:6443
Or
$ kubectl cluster-info
Kubernetes control plane is running at https://10.0.0.2:6443
Tried to telnet the Kubernetes control plane.
telnet 10.0.0.2 6443
Trying 10.0.0.2...
Press Control+C on your keyboard to terminate the telnet command.
Go to your firewall rules and add port 6443, making sure to allow all instances in the network.
Then try to telnet the Kubernetes control plane once again and you should be able to connect now:
$ telnet 10.0.0.2 6443
Trying 10.0.0.2...
Connected to 10.0.0.2.
Escape character is '^]'.
Try to join the worker nodes now. You can run the command 'kubeadm token create --print-join-command' to create a new token, in case you forgot to save the old one.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster
$ kubectl get nodes
NAME          STATUS   ROLES           AGE   VERSION
k8s           Ready    control-plane   57m   v1.25.0
wk8s-node-0   Ready    <none>          36m   v1.25.0
wk8s-node-1   Ready    <none>          35m   v1.25.0
I ran into a similar issue; the problem was that my node VM's timezone was different. I corrected the time on the node and it worked!
Hope it may help someone.
Simple fix: expose port 6443 in the Security Group of my AWS EC2 instance.
If the master is on Ubuntu 21.04 64-bit (VM), then opening up the firewall for the port used for joining the cluster, with sudo ufw allow 6443, will help.
In addition to the good examples given for this issue by @Rahul Wagh, I'll just add the fact that I got this error because I ran export KUBECONFIG on the wrong kubeconfig file, which was configured with an api-server endpoint that wasn't reachable.
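A quick way to sanity-check which kubeconfig is active and which endpoint it points at (a sketch, assuming kubectl is on the PATH):
# show the kubeconfig currently selected via the environment
echo $KUBECONFIG
# print the API server endpoint of the active context
kubectl config view --minify -o jsonpath='{.clusters[0].cluster.server}'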
First, check whether you can connect from the worker node to the master with telnet:
telnet Master_IP 6443
If you can, then on the master run kubeadm init with --apiserver-advertise-address=Master_IP and --pod-network-cidr=10.244.0.0/16.
N.B.: 10.244.0.0/16 is the network CIDR for Vagrant most of the time. Save the join command.
Then create the pod network using a CNI (Flannel/Calico/WeaveNet).
Now run the join command which you saved, from the worker node (here 10.0.2.15 will be the address most of the time):
kubeadm join 10.0.2.15:6443 --token 7gm48u.6ffny379c1mw3hpu --discovery-token-ca-cert-hash sha256:936daab57e3302ed7b70f665af3d041736e265d19a16abc710fa0efbf318b5bf

Nginx Kubernetes POD stays in ContainerCreating

I was able to set up the Kubernetes cluster on CentOS 7 with one master and two worker nodes; however, when I try to deploy a pod with nginx, the pod stays in ContainerCreating forever and doesn't seem to get out of it.
For the pod network I am using Calico.
Can you please help me resolve this issue? For some reason I don't feel comfortable moving forward without resolving it. I have been checking forums etc. for the last two days, and reaching out here is my last resort.
[root@kube-master ~]# kubectl get pods --all-namespaces
(screenshot of the get pods output omitted)
However, when I run describe pods, I see the below error for the nginx container under the events section:
Warning FailedCreatePodSandBox 41s (x8 over 11m) kubelet, kube-worker1 (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "ac77a42270009cba0c508e2fd82a84d6caef287bdb117d288d5193960b52abcb" network for pod "nginx-6db489d4b7-2r4d2": networkPlugin cni failed to set up pod "nginx-6db489d4b7-2r4d2_default" network: unable to connect to Cilium daemon: failed to create cilium agent client after 30.000000 seconds timeout: Get http:///var/run/cilium/cilium.sock/v1/config: dial unix /var/run/cilium/cilium.sock: connect: no such file or directory
Hope you can help here.
Edit 1:
The IP address of the master VM is 192.168.40.133.
Used the below command to initialize kubeadm:
kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-advertise-address 192.168.40.133
Used the below command to install the pod network:
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
The kubeadm init above gave me the join command that I used to join the workers into the cluster.
All the VMs are connected to host and bridged network adapters.
Your pod subnet (specified by --pod-network-cidr) clashes with the network your VMs are located in: these two have to be distinct. Use something else for the pod subnet, for example 10.244.0.0/16, and then edit calico.yaml before applying it, as described in the official docs:
POD_CIDR="10.244.0.0/16"
kubeadm init --pod-network-cidr=${POD_CIDR} --apiserver-advertise-address 192.168.40.133
curl https://docs.projectcalico.org/manifests/calico.yaml -O
sed -i -e "s?192.168.0.0/16?${POD_CIDR}?g" calico.yaml
kubectl apply -f calico.yaml
hope this helps :)
note: you don't really need to specify the --apiserver-advertise-address flag: kubeadm will correctly detect the main IP of the machine most of the time.
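To double-check which pod subnet actually took effect after re-initializing, a quick sanity check (standard fields and flags; output will vary by cluster):
# podCIDR assigned to each node
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
# cluster-level CIDR as passed to the controller manager
kubectl cluster-info dump | grep -m1 -- --cluster-cidr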

Unable to run pods on new node

I had to replace a node (server) with a new one, keeping the same node name. What I did was:
master> kubectl delete no srv1 (removing the old node)
srv1> kubeadm join... (joining the new node)
After the new node joined the cluster, no pods could be created.
Warning FailedCreatePodSandBox 16s kubelet, srv1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "b85728b51a18533e9d57f6a1b1808dbb5ad72bff4d516217de04e7dad4ce358d" network for pod "dpl-6f56777485-6jzm6": NetworkPlugin cni failed to set up pod "dpl-6f56777485-6jzm6_default" network: failed to set bridge addr: "cni0" already has an IP address different from 10.244.16.1/24
Ideally, when performing a task like "replacing a node", the steps below should be followed:
Drain the node: kubectl drain NODE_NAME
Reset that node: run kubeadm reset on the old node (optional step, if the old node is still accessible)
Finally: kubectl delete node NODE_NAME
Things to consider when replacing an old node with a new one:
The new node should have the same name as the old node, i.e. echo $HOSTNAME should remain the same.
The new node should have the same IP as the old one.
This is because these are the node's identity.
Finally, in a scenario where you have already performed kubectl delete node ... and replaced the node with a new one:
curl -LO https://raw.githubusercontent.com/coreos/flannel/62e44c867a2846fefb68bd5f178daf4da3095ccb/Documentation/kube-flannel.yml
kubectl delete -f kube-flannel.yml
[perform the steps below on the nodes which are having problems]
sudo ip link del cni0
sudo ip link del flannel.1
sudo systemctl restart network
[re-apply network plugin]
kubectl apply -f kube-flannel.yml
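Afterwards, a quick way to confirm the node's networking recovered (a sketch; pod names will differ in your cluster):
# flannel pods should be Running on every node, including the replaced one
kubectl get pods -n kube-system -o wide | grep flannel
# the replaced node should report Ready
kubectl get nodes -o wide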

My kubernetes cluster IP address changed and now kubectl will no longer connect

Running under Ubuntu, I used kubeadm init to set up my cluster (master node) and copied /etc/kubernetes/admin.conf to $HOME/.kube/config, and all was well when using kubectl.
However, after a reboot my master node's IP address changed, so it no longer matches what is in $HOME/.kube/config and kubectl can no longer connect.
So how do I regenerate admin.conf now that I have a new IP address? Running kubeadm init again would just kill everything, which is not what I want.
I found this solution on the internet and it works for me:
systemctl stop kubelet docker
cd /etc/
mv kubernetes kubernetes-backup
mv /var/lib/kubelet /var/lib/kubelet-backup
mkdir -p kubernetes
cp -r kubernetes-backup/pki kubernetes
rm kubernetes/pki/{apiserver.*,etcd/peer.*}
systemctl start docker
kubeadm init --ignore-preflight-errors=DirAvailable--var-lib-etcd
#Run "kubeadm reset" on all nodes if was this error "error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR FileAvailable--etc-kubernetes-kubelet.conf]: /etc/kubernetes/kubelet.conf already exists
[ERROR Port-10250]: Port 10250 is in use
[ERROR FileAvailable--etc-kubernetes-pki-ca.crt]: /etc/kubernetes/pki/ca.crt already exists"
cp kubernetes/admin.conf ~/.kube/config
kubectl get nodes --sort-by=.metadata.creationTimestamp
kubectl delete node $(kubectl get nodes -o jsonpath='{.items[?(@.status.conditions[0].status=="Unknown")].metadata.name}')
kubectl get pods --all-namespaces
After these steps, join your slaves to the master.
Reference: https://medium.com/@juniarto.samsudin/ip-address-changes-in-kubernetes-master-node-11527b867e88
The following command can be used to regenerate admin.conf
kubeadm alpha phase kubeconfig admin --apiserver-advertise-address <new_ip>
However, if you use an IP instead of a hostname, your API-server certificate will be invalid. So either regenerate your certs (kubeadm alpha phase certs renew apiserver), use hostnames instead of IPs, or add the --insecure-skip-tls-verify flag when using kubectl.
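On newer kubeadm releases the alpha phases were promoted, so the rough equivalents would be (a sketch; verify the flags against the kubeadm version you run):
# regenerate admin.conf pointing at the new address
sudo kubeadm init phase kubeconfig admin --apiserver-advertise-address <new_ip>
# renew the apiserver serving certificate
sudo kubeadm certs renew apiserver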
You do not want to use kubeadm reset. That will reset everything and you would have to start configuring your cluster again.
Well, in your scenario, please have a look at the steps below:
nano /etc/hosts (update your new IP against YOUR_HOSTNAME)
nano /etc/kubernetes/config (configuration settings related to your cluster); in this file, look for the following params and update accordingly:
KUBE_MASTER="--master=http://YOUR_HOSTNAME:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://YOUR_HOSTNAME:2379" #2379 is the default port
nano /etc/etcd/etcd.conf (conf related to etcd):
KUBE_ETCD_SERVERS="--etcd-servers=http://YOUR_HOSTNAME_OR_WHEREVER_ETCD_IS_HOSTED:2379"
2379 is the default port for etcd, and you can have multiple etcd servers defined here, comma-separated.
Restart the kubelet, apiserver, and etcd services.
It is good to use a hostname instead of an IP to avoid such scenarios.
Hope it helps!

Unable to see join nodes in Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking on the master node using the command kubectl get nodes, I am only able to see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ivum01-hp-pro-3330-sff Ready master 36m v1.10.0
To answer the questions:
docker, kubelet, kubeadm and kubectl installed fine;
kubectl get nodes could not see the newly added node; of course, kubectl get pods --all-namespaces had no results for this node;
docker on the new node showed no activity from the kubeadm command (meaning no k8s images pulled, no containers running for it);
most important: kubelet was not running on the worker node.
Running kubelet gave this output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
The same as this issue said, tearing down and resetting the cluster (kubeadm reset) and redoing it fixed the problem in my case.
I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart the kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node
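Note that current Kubernetes docs recommend the opposite direction: switch Docker to the systemd cgroup driver rather than moving kubelet to cgroupfs. A sketch of that (back up any existing daemon.json first):
cat <<EOF | sudo tee /etc/docker/daemon.json
{
  "exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
sudo systemctl restart docker
sudo systemctl restart kubelet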