"kubectl get nodes" shows NotReady always even after giving the appropriate IP - kubernetes

I am trying to set up a Kubernetes cluster for testing purposes, with a master and one minion. When I run kubectl get nodes it always says NotReady. This is the minion's configuration in /etc/kubernetes/kubelet:
KUBELET_ADDRESS="--address=0.0.0.0"
KUBELET_PORT="--port=10250"
KUBELET_HOSTNAME="--hostname-override=centos-minion"
KUBELET_API_SERVER="--api-servers=http://centos-master:8080"
KUBELET_ARGS=""
When the kubelet service is started, the following logs can be seen:
Mar 16 13:29:49 centos-minion kubelet: E0316 13:29:49.126595 53912 event.go:202] Unable to write event: 'Post http://centos-master:8080/api/v1/namespaces/default/events: dial tcp 10.143.219.12:8080: i/o timeout' (may retry after sleeping)
Mar 16 13:16:01 centos-minion kube-proxy: E0316 13:16:01.195731 53595 event.go:202] Unable to write event: 'Post http://localhost:8080/api/v1/namespaces/default/events: dial tcp [::1]:8080: getsockopt: connection refused' (may retry after sleeping)
Following is the config on master /etc/kubernetes/apiserver
KUBE_API_ADDRESS="--bind-address=0.0.0.0"
KUBE_API_PORT="--port=8080"
KUBELET_PORT="--kubelet-port=10250"
KUBE_SERVICE_ADDRESSES="--service-cluster-ip-range=10.254.0.0/16"
/etc/kubernetes/config
KUBE_ETCD_SERVERS="--etcd-servers=http://centos-master:2379"
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=0"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://centos-master:8080"
On master following processes are properly running
kube 5657 1 0 Mar15 ? 00:12:05 /usr/bin/kube-apiserver --logtostderr=true --v=0 --etcd-servers=http://centos-master:2379 --address=0.0.0.0 --port=8080 --kubelet-port=10250 --allow-privileged=false --service-cluster-ip-range=10.254.0.0/16
kube 5690 1 1 Mar15 ? 00:16:01 /usr/bin/kube-controller-manager --logtostderr=true --v=0 --master=http://centos-master:8080
kube 5723 1 0 Mar15 ? 00:02:23 /usr/bin/kube-scheduler --logtostderr=true --v=0 --master=http://centos-master:8080
So I still do not know what is missing.

I was having the same issue when setting up Kubernetes on Fedora following the steps on kubernetes.io.
The tutorial leaves KUBELET_ARGS="--cgroup-driver=systemd" commented out in the node's /etc/kubernetes/kubelet; if you uncomment it, you will see the node status become Ready.
Hope this helps.
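As a sketch, the fix amounts to uncommenting that line and restarting kubelet; the sed pattern below assumes the line is commented out with a single leading '#':

```shell
# Uncomment the cgroup-driver line in the node's kubelet config (assumes a leading '#')
sudo sed -i 's/^#\(KUBELET_ARGS="--cgroup-driver=systemd"\)/\1/' /etc/kubernetes/kubelet

# Restart kubelet so the new argument takes effect
sudo systemctl restart kubelet

# The node should move to Ready shortly afterwards
kubectl get nodes
```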

Rejoin the worker nodes to the master.
My install is on three physical machines. One master and two workers. All needed reboots.
You will need your join token, which you may no longer have handy:
sudo kubeadm token list
Copy the TOKEN field from the output, which looks like this (no, that's not my real token):
TOKEN                    TTL   EXPIRES                     USAGES                   DESCRIPTION                                                EXTRA GROUPS
ow3v08ddddgmgzfkdkdkd7   18h   2018-07-30T12:39:53-05:00   authentication,signing   The default bootstrap token generated by 'kubeadm init'.   system:bootstrappers:kubeadm:default-node-token
Then join the cluster. The master node IP is the real IP address of your machine:
sudo kubeadm join --token <YOUR TOKEN HASH> <MASTER_NODE_IP>:6443 --discovery-token-unsafe-skip-ca-verification

Then restart the kubelet service on the node (systemctl enable kubelet && systemctl restart kubelet). After that you should see your node in "Ready" status.
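If kubeadm token list comes back empty (bootstrap tokens expire after 24 hours by default), a fresh token and the full join command can be generated in one step. A sketch of the whole rejoin flow:

```shell
# On the master: print a ready-made join command with a fresh token
sudo kubeadm token create --print-join-command

# On each worker: run the printed join command, then make sure kubelet is up
sudo systemctl enable kubelet
sudo systemctl restart kubelet

# Back on the master: confirm the workers report Ready
kubectl get nodes
```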

Related

The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?

I have set up a Kubernetes cluster with Kubespray.
After I restart a node and check its status, I get the following:
$ kubectl get nodes
The connection to the server 127.0.0.1:6443 was refused - did you specify the right host or port?
Environment:
OS : CentOS 7
Kubespray
kubelet version: 1.22.3
Need your help on this.
Regards,
Zain
This worked for me; I'm using minikube.
When checking the status by running minikube status, you'll probably get something like this:
E0121 07:14:19.882656 7165 status.go:415] kubeconfig endpoint: got:
127.0.0.1:55900, want: 127.0.0.1:49736
type: Control Plane
host: Running
kubelet: Stopped
apiserver: Stopped
kubeconfig: Misconfigured
To fix it, I just followed these steps:
minikube update-context
minikube start
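A sketch of that recovery, including a heavier fallback if update-context alone doesn't clear the Misconfigured state (note that the delete step wipes the local cluster):

```shell
# Repoint the kubeconfig entry at minikube's current API server endpoint
minikube update-context

# Restart the control plane and kubelet
minikube start

# Last resort if the status is still Misconfigured (destroys local cluster state):
# minikube delete && minikube start
```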
The steps below may solve your issue.
kubelet may be down; run these commands on the master node:
1. sudo -i
2. swapoff -a
3. exit
4. strace -eopenat kubectl version
Then try using kubectl get nodes.
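One caveat: swapoff -a only lasts until the next reboot. To keep kubelet happy permanently, the swap entry in /etc/fstab also needs commenting out; a sketch:

```shell
# Disable swap immediately
sudo swapoff -a

# Comment out any swap entries so the change survives reboots
sudo sed -i '/\sswap\s/ s/^/#/' /etc/fstab

# Restart kubelet and re-check the node status
sudo systemctl restart kubelet
kubectl get nodes
```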
Thank you Sai for your inputs. The journalctl -xeu kubelet output showed: Error while dialing dial unix /var/run/cri-dockerd.sock: connect: no such file or directory. I restarted and enabled the cri-dockerd services:
sudo systemctl enable cri-dockerd.service
sudo systemctl restart cri-dockerd.service
then sudo systemctl start kubelet. Finally it works for me.
#kubectl cluster-info
Kubernetes control plane is running at https://127.0.0.1:6443
This link will give more info: https://github.com/kubernetes-sigs/kubespray/issues/8734
Regards, Zain
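The recovery above can be sketched as one sequence; the socket path is the one from the journalctl error message:

```shell
# Bring up the cri-dockerd shim that kubelet dials via /var/run/cri-dockerd.sock
sudo systemctl enable cri-dockerd.service
sudo systemctl restart cri-dockerd.service

# Confirm the socket exists before restarting kubelet
test -S /var/run/cri-dockerd.sock && echo "socket present"

# Start kubelet and verify the node recovers
sudo systemctl start kubelet
kubectl get nodes
```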

Can't access pod of another Node

I installed a 3-node Kubernetes cluster with the RKE tool. The installation was successful with no errors, but I'm unable to ping from one pod to another pod.
If I ping a pod running on the worker2 node (node IP 10.222.22.47) I get a response, but there are no responses from pods running on worker1 (node IP 10.222.22.46).
My Pods are as follows -
I also noticed that some pods have been given node IP addresses. The node IP addresses are:
Master1=10.222.22.45
Worker1=10.222.22.46
Worker2=10.222.22.47
cluster_cidr: 10.42.0.0/16
service_cluster_ip_range: 10.43.0.0/16
cluster_dns_server: 10.43.0.10
Overlay network - canal
OS- CentOS Linux release 7.8.2003
Kubernetes - v1.20.8 installed with rke tool
Docker - 20.10.7
Sysctl entries in all nodes
firewall was disabled in all nodes before install.
Check - sysctl net.bridge.bridge-nf-call-ip6tables
net.bridge.bridge-nf-call-ip6tables = 1
Check - sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
From the description and screenshots provided, some addons and controllers on master01 do not seem ready; they might need to be reconfigured.
You can also use routes to connect to pods on worker1. Check this for detailed instructions.
The reason turned out to be that the UDP ports were blocked.
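For context: canal's flannel backend tunnels pod-to-pod traffic over VXLAN, which by default uses UDP port 8472, so that port must be open between all nodes. A firewalld sketch, assuming firewalld is the firewall in play:

```shell
# Allow flannel's default VXLAN port between nodes (run on every node)
sudo firewall-cmd --permanent --add-port=8472/udp
sudo firewall-cmd --reload

# Verify the rule is active
sudo firewall-cmd --list-ports
```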

kubernetes - Unable to join master node - error execution phase preflight: couldn't validate the identity of the API Server

I am a novice at k8s, so this might be a very simple issue for someone with expertise in k8s.
I am working with two nodes:
master - 2 CPU, 2 GB memory
worker - 1 CPU, 1 GB memory
OS - ubuntu - hashicorp/bionic64
I set up the master node successfully and I can see it is up and running:
vagrant@master:~$ kubectl get nodes
NAME     STATUS   ROLES    AGE   VERSION
master   Ready    master   29m   v1.18.2
Here is the token which I have generated:
vagrant@master:~$ kubeadm token create --print-join-command
W0419 13:45:52.513532 16403 configset.go:202] WARNING: kubeadm cannot validate component configs for API groups [kubelet.config.k8s.io kubeproxy.config.k8s.io]
kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
Issue - But when I try to join from the worker node, I get:
vagrant@worker:~$ sudo kubeadm join 10.0.2.15:6443 --token xuz63z.todnwgijqb3z1vhz --discovery-token-ca-cert-hash sha256:d4dadda6fa90c94eca1c8dcd3a441af24bb0727ffc45c0c27161ee8f7e883521
W0419 13:46:17.651819 15987 join.go:346] [preflight] WARNING: JoinControlPane.controlPlane settings will be ignored when control-plane flag is not set.
[preflight] Running pre-flight checks
[WARNING IsDockerSystemdCheck]: detected "cgroupfs" as the Docker cgroup driver. The recommended driver is "systemd". Please follow the guide at https://kubernetes.io/docs/setup/cri/
error execution phase preflight: couldn't validate the identity of the API Server: Get https://10.0.2.15:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s: dial tcp 10.0.2.15:6443: connect: connection refused
To see the stack trace of this error execute with --v=5 or higher
Here are the ports which are occupied
10.0.2.15:2379
10.0.2.15:2380
10.0.2.15:68
Note: I am using the CNI from -
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
Here are the mistakes I realized I made during my Kubernetes installation -
(For detailed installation steps follow - Steps for Installation )
But here are the key mistakes I made -
Mistake 1 - Since I was working on VMs, I had multiple ethernet adapters on both VMs (master as well as worker). By default the CNI always takes eth0, but in our case it should be eth1:
1: lo: <LOOPBACK,UP,LOWER_UP>
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:bb:14:75 brd ff:ff:ff:ff:ff:ff
inet 10.0.2.15
3: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether 08:00:27:fb:48:77 brd ff:ff:ff:ff:ff:ff
inet 100.0.0.1
4: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP>
Mistake 2 - I was initializing kubeadm without --apiserver-advertise-address and --pod-network-cidr.
So here is the kubeadm command I used instead -
[vagrant@master ~]$ sudo kubeadm init --apiserver-advertise-address=100.0.0.1 --pod-network-cidr=10.244.0.0/16
Mistake 3 - Since we have multiple ethernet adapters in our VMs, I couldn't find a way to set up extra args in the calico.yml configuration to switch from eth0 to eth1.
So I used the flannel CNI instead:
[vagrant@master ~]$ wget https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
and in the args section added --iface=eth1:
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=eth1
And it worked after that
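For what it's worth, calico can also be pinned to a specific interface without switching CNIs, via the IP_AUTODETECTION_METHOD setting on the calico-node container; a sketch using kubectl set env, assuming the standard calico-node DaemonSet in kube-system:

```shell
# Tell calico-node to autodetect its address on eth1 instead of the default interface
kubectl set env daemonset/calico-node -n kube-system \
    IP_AUTODETECTION_METHOD=interface=eth1

# The DaemonSet pods roll automatically; wait for them to come back up
kubectl rollout status daemonset/calico-node -n kube-system
```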
It worked for me using this --apiserver-advertise-address:
sudo kubeadm init --apiserver-advertise-address=172.16.28.10 --pod-network-cidr=192.168.0.0/16
kubectl apply -f https://docs.projectcalico.org/v3.8/manifests/calico.yaml
I used calico and it worked for me.
On the member node, to join:
kubeadm join 172.16.28.10:6443 --token 2jm9hd.o2gulx4x1b8l1t5d --discovery-token-ca-cert-hash sha256:b8b679e86c4d228bfa486086f18dcac4760e5871e8dd023ec166acfd93781595
I ran into the same problem while setting up the Kubernetes master and worker nodes. I got the same error while adding the worker nodes to the master node. I stopped firewalld, then tried adding the nodes again, and it worked. Hope this helps someone.
Thanks
When joining the master node with the slave nodes, I received the error below:
====================================================================
TASK [Joining worker nodes with kubernetes master] ***********************************************
fatal: [ip-0-0-0-0.ec2.internal]: FAILED! => {"changed": true, "cmd": "grep -i -A2 'kubeadm join' join_token|bash", "delta": "0:05:07.728008", "end": "2022-01-15 11:41:28.615600", "msg": "non-zero return code", "rc": 1, "start": "2022-01-15 11:36:20.887592", "stderr": "error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)\nTo see the stack trace of this error execute with --v=5 or higher", "stderr_lines": ["error execution phase preflight: couldn't validate the identity of the API Server: Get \"https://139.41.82.79:6443/api/v1/namespaces/kube-public/configmaps/cluster-info?timeout=10s\": net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)", "To see the stack trace of this error execute with --v=5 or higher"], "stdout": "[preflight] Running pre-flight checks", "stdout_lines": ["[preflight] Running pre-flight checks"]}
===============================================================================
If you observe the error carefully, you will see the IP address and port number:
https://139.41.82.79:6443
Solution:
Please follow the below steps to resolve:
Go to your security groups
Add an entry for port 6443 in your security group, with source 0.0.0.0/0, and save the details.
Rerun the yaml script to join the slave nodes with the master.
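The console steps above can also be scripted with the AWS CLI; a sketch, where the security group ID is a placeholder for your own:

```shell
# Open the API server port to all sources (tighten the CIDR for production)
aws ec2 authorize-security-group-ingress \
    --group-id sg-0123456789abcdef0 \
    --protocol tcp \
    --port 6443 \
    --cidr 0.0.0.0/0
```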
Run the command 'kubectl config view' or 'kubectl cluster-info' to check the IP address of the Kubernetes control plane. In my case it is 10.0.0.2.
$ kubectl config view
apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: DATA+OMITTED
    server: https://10.0.0.2:6443
Or
$ kubectl cluster-info
Kubernetes control plane is running at https://10.0.0.2:6443
Tried to telnet the Kubernetes control plane.
telnet 10.0.0.2 6443
Trying 10.0.0.2...
Press Control + C on your keyboard to terminate the telnet command.
Go to your Firewall Rules and add port 6443 and make sure to allow all instances in the network.
Then try to telnet the Kubernetes control plane once again and you should be able to connect now:
$ telnet 10.0.0.2 6443
Trying 10.0.0.2...
Connected to 10.0.0.2.
Escape character is '^]'.
Try to join the worker nodes now. You can run the command 'kubeadm token create --print-join-command' to create new token just in case you forgot to save the old one.
Run 'kubectl get nodes' on the control-plane to see this node join the cluster
$ kubectl get nodes
NAME          STATUS   ROLES           AGE   VERSION
k8s           Ready    control-plane   57m   v1.25.0
wk8s-node-0   Ready    <none>          36m   v1.25.0
wk8s-node-1   Ready    <none>          35m   v1.25.0
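If the cluster runs on GCP (where the console has a Firewall Rules page like the one mentioned above), the firewall step can be scripted with gcloud; the rule name, network, and source range below are assumptions to adjust:

```shell
# Allow TCP 6443 to all instances in the network from within the VPC
gcloud compute firewall-rules create allow-k8s-apiserver \
    --network=default \
    --allow=tcp:6443 \
    --source-ranges=10.0.0.0/8
```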
I ran into a similar issue; the problem was that my node VM's time was different. I corrected the time on the node and it worked!
Hope it may help someone.
Simple fix: expose port 6443 in the Security Group of my AWS EC2 instance.
If the master is on Ubuntu 21.04 64-bit (VM), then opening up the firewall for the port used for joining the cluster, with sudo ufw allow 6443, will help.
In addition to the good examples given for this issue by @Rahul Wagh - I'll just add that I got this error because I ran export KUBECONFIG on the wrong kubeconfig file, which was configured with an api-server endpoint that wasn't reachable.
First, check whether you can connect from the worker node to the master with telnet:
telnet Master_IP 6443
If you can, then on the master run kubeadm with --apiserver-advertise-address=Master_IP and --pod-network-cidr=10.244.0.0/16.
N.B.: 10.244.0.0/16 is the network CIDR for vagrant most of the time. Save the join command.
Then create the pod network using a CNI (flannel/Calico/WeaveNet).
Now, from the worker node, run the join command you saved (here 10.0.2.15 will be the master address most of the time):
kubeadm join 10.0.2.15:6443 --token 7gm48u.6ffny379c1mw3hpu --discovery-token-ca-cert-hash sha256:936daab57e3302ed7b70f665af3d041736e265d19a16abc710fa0efbf318b5bf
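If telnet isn't installed on the worker, the same reachability check can be done with nc, or with bash's built-in /dev/tcp as a fallback; a sketch:

```shell
# Check that the API server port is reachable from the worker
nc -zv 10.0.2.15 6443

# Pure-bash fallback when nc is missing
timeout 3 bash -c 'cat < /dev/null > /dev/tcp/10.0.2.15/6443' && echo "reachable"
```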

Unable to see joined nodes in Kubernetes master

This is my worker node:
root@ivu:~# kubeadm join 10.16.70.174:6443 --token hl36mu.0uptj0rp3x1lfw6n --discovery-token-ca-cert-hash sha256:daac28160d160f938b82b8c720cfc91dd9e6988d743306f3aecb42e4fb114f19 --ignore-preflight-errors=swap
[preflight] Running pre-flight checks.
[WARNING Swap]: running with swap on is not supported. Please disable swap
[WARNING FileExisting-crictl]: crictl not found in system path
Suggestion: go get github.com/kubernetes-incubator/cri-tools/cmd/crictl
[discovery] Trying to connect to API Server "10.16.70.174:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://10.16.70.174:6443"
[discovery] Requesting info from "https://10.16.70.174:6443" again to validate TLS against the pinned public key
[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server "10.16.70.174:6443"
[discovery] Successfully established connection with API Server "10.16.70.174:6443"
This node has joined the cluster:
* Certificate signing request was sent to master and a response
was received.
* The Kubelet was informed of the new secure connection details.
Run 'kubectl get nodes' on the master to see this node join the cluster.
While checking on the master node using kubectl get nodes, I am only able to see the master:
ivum01@ivum01-HP-Pro-3330-SFF:~$ kubectl get nodes
NAME                     STATUS   ROLES    AGE   VERSION
ivum01-hp-pro-3330-sff   Ready    master   36m   v1.10.0
To answer the questions:
docker, kubelet, kubeadm and kubectl installed fine;
kubectl get nodes cannot see the newly added node; of course kubectl get pods --all-namespaces has no results for this node either;
docker on the new node shows no activity from the kubeadm command (no k8s images pulled, no running containers);
most importantly, kubelet is not running on the worker node.
Running kubelet gives this output:
Failed to get system container stats for "/user.slice/user-1000.slice/session-1.scope": failed to get cgroup stats for "/user.slice/user-1000.slice/session-1.scope": failed to get container info for "/user.slice/user-1000.slice/session-1.scope": unknown container "/user.slice/user-1000.slice/session-1.scope"
which is the same as described in this issue.
Tearing down and resetting the cluster (kubeadm reset) and redoing the join fixed it in my case.
I had this problem and it was solved by ensuring that the cgroup driver on the worker nodes was also set properly.
check with:
docker info | grep -i cgroup
cat /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
set it with:
sed -i "s/cgroup-driver=systemd/cgroup-driver=cgroupfs/g" /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
then restart kubelet service and rejoin the cluster:
systemctl daemon-reload
systemctl restart kubelet
kubeadm reset
kubeadm join ...
Info from docs: https://kubernetes.io/docs/tasks/tools/install-kubeadm/#configure-cgroup-driver-used-by-kubelet-on-master-node

skydns is not able to resolve dns in kubernetes cluster

I am setting up skydns for Kubernetes following this template: http://kubernetes.io/docs/getting-started-guides/docker-multinode/skydns.yaml.in. But it is not able to resolve DNS. After looking for solutions I also added -kube_master_url to the kube2sky arguments, but the issue remains. Here are the skydns logs:
2016/04/23 02:49:26 skydns: falling back to default configuration, could not read from etcd: 501: All the given peers are not reachable (failed to propose on members [http://127.0.0.1:4001] twice [last error: Get http://127.0.0.1:4001/v2/keys/skydns/config?quorum=false&recursive=false&sorted=false: dial tcp 127.0.0.1:4001: connection refused]) [0]
2016/04/23 02:49:26 skydns: ready for queries on cluster.local. for tcp://0.0.0.0:53 [rcache 0]
2016/04/23 02:49:26 skydns: ready for queries on cluster.local. for udp://0.0.0.0:53 [rcache 0]
2016/04/23 02:49:33 skydns: error from backend: 501: All the given peers are not reachable (failed to propose on members [http://127.0.0.1:4001] twice [last error: Get http://127.0.0.1:4001/v2/keys/skydns/local/cluster/svc/default/kubernetes?quorum=false&recursive=true&sorted=false: dial tcp 127.0.0.1:4001: connection refused]) [0]
Any pointers?
Kube2sky logs:
I0423 02:49:39.286489 1 kube2sky.go:436] Etcd server found: http://127.0.0.1:4001
I0423 02:49:40.295909 1 kube2sky.go:503] Using http://172.17.0.1:8080 for kubernetes master
I0423 02:49:40.296183 1 kube2sky.go:504] Using kubernetes API v1
I had some iptables rules set which were blocking the connection on the docker0 interface, where the Kubernetes services including skydns were running. After flushing the rules it worked. Thus it appears the first problem was with my local setup rather than skydns.
However, the problem reappeared after installing a local docker registry, and I got this error:
I0427 20:30:45.183419 1 kube2sky.go:627] Ignoring error while waiting for service default/kubernetes: Get https://10.0.0.1:443/api/v1/namespaces/default/services/kubernetes: x509: certificate signed by unknown authority. Sleeping 1s before retrying.
As a workaround I deleted the secret using kubectl delete secrets/default-token-q4siz, then restarted skydns, and it started working again.
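A sketch of that workaround; the secret name comes from the post above, and the pod label selector for skydns is an assumption that depends on how the addon was deployed:

```shell
# Delete the stale default service-account token; it is recreated automatically
kubectl delete secrets/default-token-q4siz

# Restart skydns by deleting its pod (the label selector here is an assumption)
kubectl delete pod -l k8s-app=kube-dns --namespace=kube-system
```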