Kubernetes: can't join on different subnet - TLS Bootstrap timeout - kubernetes

I have two Ubuntu 18.04 Server machines on AWS (the network configuration is fine; I can even connect between them over SSH, but they are on different subnets of the same LAN). The Ubuntu firewall is also disabled.
M1: 172.31.32.210/255.255.240.0 -> 172.31.32.0/20 
M2: 172.31.20.59/255.255.240.0 -> 172.31.16.0/20
The command I execute on the master:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-cert-extra-sans=172.31.32.210
# After that
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
I noticed that, as they are on different subnets, I need to install Calico to enable communication between them: https://docs.projectcalico.org/getting-started/kubernetes/quickstart
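For reference, the simplest manifest-based Calico install at the time was a single apply on the master once kubeadm init finished (the linked quickstart may instead describe the operator-based flow, and the manifest URL may have moved since):
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml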
After making all that, I run the kubeadm join command returned by the init procedure, and the following message appears. There is no way to make the connection:
ubuntu#ip-175-31-20-59:~$ sudo kubeadm join 172.31.45.77:6443 --token yht6uv.zrynwczvad9ra5e4 --discovery-token-ca-cert-hash sha256:6f4f3e98067151768d1339b52159b5469cb83511ad6ea31dc26e15e8631074f6
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
I have followed many tutorials (this or this, for example), but I always run into the same problem: the TLS Bootstrap timeout when I run the join command on the worker. Any ideas?
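For reference, the kubelet's own logs on the failing worker are where the underlying failure usually shows up; these are standard systemd commands, not specific to this setup:
sudo systemctl status kubelet
sudo journalctl -xeu kubelet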

Related

Getting error in kubernetes in master node from worker node. Error from server: error dialing backend: dial tcp privateip:10250: i/o timeout

I have created a deployment. When I try to access the pod with the command
kubectl exec -it podname bash
it gives the error "Error from server: error dialing backend: dial tcp privateip:10250: i/o timeout".
I have allowed all the ports in the security group and disabled the firewalld service on both the master node and the worker node, but the problem remains the same.
I am expecting to get a shell inside the container from my master node.
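Since kubectl exec is proxied by the API server to the kubelet listening on port 10250 of the target node, a quick reachability test from the master can confirm whether that port is actually open (the address below is a placeholder for the worker's private IP):
nc -vz <worker-private-ip> 10250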

Unable to start Kube cluster

I am trying to set up a Kubernetes cluster using Oracle VM VirtualBox. The kubeadm init command fails to start the cluster.
It waits on the following:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
Then it fails because of the following:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
OS: Ubuntu 16.04 (Xenial), Docker: 18.09.7, Kubernetes: v1.23.5, CNI: Flannel
OS: Ubuntu 16.04 (Xenial), Docker: 20.10.7, Kubernetes: v1.23.5, CNI: Calico
What I tried so far, with the help of Google:
- turning off swap (which was already done)
- the Kubernetes/Docker version combinations listed above
- restarting the kubelet service
- ensuring that the static IPs were allocated, and the other prerequisites
- other bits I do not remember
Can anyone assist? I am new to Kube.
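For reference, the checks listed above can be verified explicitly with standard commands rather than from memory:
sudo swapon --show              # no output means swap is really off
docker info | grep -i cgroup    # the cgroup driver should match the kubelet's
sudo journalctl -xeu kubelet    # the kubelet usually logs the exact reason it exits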

k8s: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused

When installing a k8s node, this happened during the join:
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
Thank you!
The issue was caused by a Docker version mismatch on one of the machines. The problem was resolved after reinstalling Docker with the correct version.
In my case I fixed it by disabling swap with:
swapoff -a
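Note that swapoff -a only lasts until the next reboot; commenting out the swap entry in /etc/fstab keeps it off permanently:
sudo sed -i '/ swap / s/^/#/' /etc/fstab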

Kubeadm init issue

Config/version data:
OS: RHEL 7.5
Environment: on-prem server
Docker: 19
Kubernetes: 1.18
Console output:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
What is going wrong, and how can I resolve it?
Based on the information you provided, there are a couple of things that can be done here.
First, you can check whether Docker's native cgroup driver and the kubelet's cgroup driver are consistent.
You can view the config of the kubelet by running:
cat /var/lib/kubelet/kubeadm-flags.env
To check docker config you can simply use:
docker info | grep Cgroup
If you need to change the Docker side, you can do it like this:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
To change the kubelet cgroup driver, edit:
vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf
and update KUBELET_CGROUP_ARGS=--cgroup-driver=<systemd or cgroupfs>
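After changing either side, reload systemd and restart the affected services so the new driver takes effect:
sudo systemctl daemon-reload
sudo systemctl restart docker
sudo systemctl restart kubelet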
A second possible solution is disabling swap.
You can do that with these commands:
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
Reboot the machine after that, then perform kubeadm reset and try to initialize the cluster with kubeadm init again.
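For reference, that reset/re-init sequence looks like this (any flags from your original kubeadm init command would go on the second line):
sudo kubeadm reset -f    # -f skips the confirmation prompt
sudo kubeadm init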

Kubernetes cluster recreated from snapshots issue

OVERVIEW: I am studying for the Kubernetes Administrator certification. To complete the training course, I created a two-node Kubernetes cluster on Google Cloud, 1 master and 1 worker. As I don't want to leave the instances alive all the time, I took snapshots of them so I can deploy new instances with the Kubernetes cluster already set up. I am aware that I would need to update the ens4 IP used by kubectl, as this will have changed, which I did.
ISSUE: When I run kubectl get pods --all-namespaces I get the error "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
QUESTION: Has anyone had similar issues, and do you know if it's possible to recreate a Kubernetes cluster from snapshots?
Adding -v=10 to the command shows the URL matches the info in the .kube/config file:
kubectl get pods --all-namespaces -v=10
I0214 17:11:35.317678 6246 loader.go:375] Config loaded from file: /home/student/.kube/config
I0214 17:11:35.321941 6246 round_trippers.go:423] curl -k -v -XGET -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" -H "Accept: application/json, */*" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.333308 6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 11 milliseconds
I0214 17:11:35.333335 6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.333422 6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused
I0214 17:11:35.333858 6246 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.334234 6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 0 milliseconds
I0214 17:11:35.334254 6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.334281 6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused
I0214 17:11:35.334303 6246 shortcut.go:89] Error loading discovery information: Get https://k8smaster:6443/api?timeout=32s: dial tcp 10.128.0.7:6443: connect: connection refused
I replicated your issue and wrote down this step-by-step debugging process so you can see what my thinking was.
I created a 2-node cluster (master + worker) with kubeadm and made a snapshot.
Then I deleted all the nodes and recreated them from the snapshots.
After recreating the master node from its snapshot, I started seeing the same error you are seeing:
[kmaster ~]$ kubectl get po -v=10
I0217 11:04:38.397823 3372 loader.go:375] Config loaded from file: /home/user/.kube/config
I0217 11:04:38.398909 3372 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (linux/amd64) kubernetes/06ad960" 'https://10.156.0.20:6443/api?timeout=32s'
^C
The connection was hanging, so I interrupted it (Ctrl+C).
The first thing I noticed was that the IP address kubectl was connecting to was different from the node IP, so I modified the .kube/config file to provide the proper IP.
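The field in question is the server URL in the kubeconfig; a quick way to check it (the IP shown is the old one from the logs above, which is what gets replaced):
grep 'server:' ~/.kube/config
# server: https://10.156.0.20:6443   <- change this to the node's new IP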
After doing this, here is what running kubectl showed:
$ kubectl get po -v=10
I0217 11:26:57.020744 15929 loader.go:375] Config loaded from file: /home/user/.kube/config
...
I0217 11:26:57.025155 15929 helpers.go:221] Connection error: Get https://10.156.0.23:6443/api?timeout=32s: dial tcp 10.156.0.23:6443: connect: connection refused
F0217 11:26:57.025201 15929 helpers.go:114] The connection to the server 10.156.0.23:6443 was refused - did you specify the right host or port?
As you can see, the connection to the apiserver was being refused, so I checked whether the apiserver was running:
$ sudo docker ps -a | grep apiserver
5e957ff48d11 90d27391b780 "kube-apiserver --ad…" 24 seconds ago Exited (2) 3 seconds ago k8s_kube-apiserver_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_14
d78e179f1565 k8s.gcr.io/pause:3.1 "/pause" 26 minutes ago Up 26 minutes k8s_POD_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_1
The apiserver was exiting for some reason.
I checked its logs (only the relevant lines are included for readability):
$ sudo docker logs 5e957ff48d11
...
W0217 11:30:46.710541 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to {https://127.0.0.1:2379 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp 127.0.0.1:2379: connect: connection refused". Reconnecting...
panic: context deadline exceeded
Notice the apiserver was trying to connect to etcd (note the port: 2379) and receiving connection refused.
My first guess was that etcd wasn't running, so I checked the etcd container:
$ sudo docker ps -a | grep etcd
4a249cb0743b 303ce5db0e90 "etcd --advertise-cl…" 2 minutes ago Exited (1) 2 minutes ago k8s_etcd_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_19
b89b7e7227de k8s.gcr.io/pause:3.1 "/pause" 30 minutes ago Up 30 minutes k8s_POD_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_1
I was right: Exited (1) 2 minutes ago. I checked its logs:
$ sudo docker logs 4a249cb0743b
...
2020-02-17 11:34:31.493215 C | etcdmain: listen tcp 10.156.0.20:2380: bind: cannot assign requested address
etcd was trying to bind to the old IP address.
I modified /etc/kubernetes/manifests/etcd.yaml and changed the old IP address to the new one everywhere in the file.
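One way to make that replacement in bulk, using the old and new IPs from the logs above (a sketch; check the file manually afterwards):
sudo grep -rn '10.156.0.20' /etc/kubernetes/manifests/
sudo sed -i 's/10.156.0.20/10.156.0.23/g' /etc/kubernetes/manifests/etcd.yaml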
A quick sudo docker ps | grep etcd showed it was running.
After a while, the apiserver also started running.
Then I tried running kubectl:
$ kubectl get po
Unable to connect to the server: x509: certificate is valid for 10.96.0.1, 10.156.0.20, not 10.156.0.23
Invalid apiserver certificate. The SSL certificate was generated for the old IP, so that means I need to generate a new certificate with the new IP.
$ sudo kubeadm init phase certs apiserver
...
[certs] Using existing apiserver certificate and key on disk
That's not what I expected. I wanted to generate new certificates, not use old ones.
I deleted old certificates:
$ sudo rm /etc/kubernetes/pki/apiserver.crt \
/etc/kubernetes/pki/apiserver.key
And tried to generate certificates one more time:
$ sudo kubeadm init phase certs apiserver
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kmaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs [10.96.0.1 10.156.0.23]
Looks good. Now let's try using kubectl:
$ kubectl get no
NAME STATUS ROLES AGE VERSION
instance-21 Ready master 102m v1.17.3
instance-22 Ready <none> 95m v1.17.3
As you can see, it's working now.
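One extra note in case kubectl still reports the old certificate after regenerating it: the apiserver only reads its serving certificate at startup, so its static-pod container may need a restart. With the Docker runtime used throughout this answer, killing the container makes the kubelet recreate it (the container ID is whatever docker ps shows):
sudo docker ps | grep kube-apiserver
sudo docker kill <container-id>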