Kubeadm init issue - kubernetes

Kubeadm init is failing.
Configuration and versions:
OS: RHEL 7.5
Environment: on-prem server
Docker version: 19
Kubernetes version: 1.18
Console output:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get http://localhost:10248/healthz: dial tcp [::1]:10248: connect: connection refused.
What is going wrong, and how can I resolve it?

Based on the information you provided, there are a couple of things that can be done here.
First, check that Docker's native cgroup driver and the kubelet's cgroup driver are consistent.
You can view the kubelet's configuration by running:
cat /var/lib/kubelet/kubeadm-flags.env
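The flag to look for is --cgroup-driver. On a kubeadm 1.18 setup the file typically contains something like the following (the exact flags vary by version, so treat this only as an illustration):
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --network-plugin=cni --pod-infra-container-image=k8s.gcr.io/pause:3.2"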
To check the Docker configuration, simply use:
docker info | grep Cgroup
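On a correctly configured host the output looks something like this (the value may be either cgroupfs or systemd; systemd is the driver kubeadm recommends):
Cgroup Driver: systemd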
If you need to change Docker's cgroup driver, you can do it like this:
cat << EOF > /etc/docker/daemon.json
{
"exec-opts": ["native.cgroupdriver=systemd"]
}
EOF
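After writing daemon.json, restart Docker so the new driver takes effect:
sudo systemctl daemon-reload
sudo systemctl restart docker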
To change the kubelet's cgroup driver, edit:
`vi /etc/systemd/system/kubelet.service.d/10-kubeadm.conf`
and update KUBELET_CGROUP_ARGS=--cgroup-driver=<systemd or cgroupfs>.
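Afterwards, reload systemd and restart the kubelet so the change takes effect:
sudo systemctl daemon-reload
sudo systemctl restart kubelet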
A second possible solution is disabling swap.
You can do that with these commands:
sudo swapoff -a
sudo sed -i '/ swap / s/^/#/' /etc/fstab
Reboot the machine after that, then perform a kubeadm reset and try to initialize the cluster again with kubeadm init.
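For completeness, a minimal sketch of that reset-and-retry sequence (add whatever flags your kubeadm init normally needs, e.g. --pod-network-cidr):
sudo kubeadm reset -f
sudo kubeadm init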

Related

Getting an error on the Kubernetes master node from a worker node: Error from server: error dialing backend: dial tcp privateip:10250: i/o timeout

I have created a deployment. When I try to access a pod with the command
"kubectl exec -it podname bash"
it gives the error "Error from server: error dialing backend: dial tcp privateip:10250: i/o timeout".
I have allowed all the ports in the security group and also disabled the firewalld service on both the master node and the worker node, but the problem remains the same.
I expect to be able to get a shell inside the container from my master node.
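One quick way to narrow this down is to test the kubelet port directly from the master node; <worker-private-ip> below is a placeholder for the worker's private IP:
nc -vz <worker-private-ip> 10250
If that also times out, the blockage is at the network level (security group, NACL, routing, or a host firewall) rather than inside Kubernetes.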

Unable to start Kube cluster

I am trying to set up a Kubernetes cluster using Oracle VM VirtualBox. The kubeadm command fails to start the cluster.
It waits on the following:
[wait-control-plane] Waiting for the kubelet to boot up the control plane as static Pods from directory "/etc/kubernetes/manifests". This can take up to 4m0s
Then it fails with:
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
OS: Ubuntu 16.04-xenial, Docker version: 18.09.7, Kube version: Kubernetes v1.23.5, Cluster type: Flannel
OS: Ubuntu 16.04-xenial, Docker version: 20.10.7, Kube version: Kubernetes v1.23.5, Cluster type: Calico
What I have tried so far, with the help of Google:
turning off swap - which was already done
the combinations of Docker and Kubernetes versions listed above
restarting the kubelet service
other bits I do not remember
ensuring that the static IPs have been allocated, and other prerequisites
Can anyone assist? I am new to Kube.
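One starting point when the 10248 health check fails is the kubelet itself; its journal usually states exactly why it keeps exiting (cgroup driver mismatch, swap still enabled, a bad config file, and so on):
systemctl status kubelet
journalctl -xeu kubelet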

Kubernetes: can't join on different subnet - TLS Bootstrap timeout

I have two Ubuntu 18.04 Server machines on AWS (the network configuration is okay; I am even able to connect between them through SSH, but they are on different subnets of the same LAN). The Ubuntu firewall is also disabled.
M1: 172.31.32.210/255.255.240.0 -> 172.31.32.0/20 
M2: 172.31.20.59/255.255.240.0 -> 172.31.16.0/20
The command I execute on the master:
sudo kubeadm init --pod-network-cidr=192.168.0.0/16 --apiserver-cert-extra-sans=172.31.32.210
# After that
mkdir -p $HOME/.kube
sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config
sudo chown $(id -u):$(id -g) $HOME/.kube/config
I noticed that, as they are on different subnets, I need to install Calico to enable communication between them: https://docs.projectcalico.org/getting-started/kubernetes/quickstart
After doing all that, I run the kubeadm join command returned by the init procedure, and the following messages appear. There is no way to make the connection:
ubuntu@ip-175-31-20-59:~$ sudo kubeadm join 172.31.45.77:6443 --token yht6uv.zrynwczvad9ra5e4 --discovery-token-ca-cert-hash sha256:6f4f3e98067151768d1339b52159b5469cb83511ad6ea31dc26e15e8631074f6
[preflight] Running pre-flight checks
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp 127.0.0.1:10248: connect: connection refused.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
To see the stack trace of this error execute with --v=5 or higher
I have followed many tutorials (this or this, for example), but I always hit the same problem: the TLS Bootstrap times out when I run the join command on the worker. Any ideas?
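When a join stalls at the TLS Bootstrap, two checks on the worker usually narrow things down: whether the control-plane endpoint from the join command is reachable at all, and what the kubelet logs while failing its 10248 health check:
nc -vz 172.31.45.77 6443
journalctl -xeu kubelet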

k8s: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused

When installing a k8s node, the following happened:
[preflight] Reading configuration from the cluster...
[preflight] FYI: You can look at this config file with 'kubectl -n kube-system get cm kubeadm-config -o yaml'
[kubelet-start] Writing kubelet configuration to file "/var/lib/kubelet/config.yaml"
[kubelet-start] Writing kubelet environment file with flags to file "/var/lib/kubelet/kubeadm-flags.env"
[kubelet-start] Starting the kubelet
[kubelet-start] Waiting for the kubelet to perform the TLS Bootstrap...
[kubelet-check] Initial timeout of 40s passed.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
[kubelet-check] It seems like the kubelet isn't running or healthy.
[kubelet-check] The HTTP call equal to 'curl -sSL http://localhost:10248/healthz' failed with error: Get "http://localhost:10248/healthz": dial tcp [::1]:10248: connect: connection refused.
error execution phase kubelet-start: error uploading crisocket: timed out waiting for the condition
Thank you!
The issue was caused by a Docker version mismatch on one of the machines. The problem was resolved after reinstalling the correct Docker version.
In my case I fixed it by disabling swap with:
swapoff -a

Power cycle of one of the worker nodes makes it useless, since after the restart its pods get stuck in the "ContainerCreating" state

Scenario:
One of the worker nodes goes down due to a power cycle while the master is scheduling pods between the worker nodes.
Once the worker node comes back up after the power cycle, the master is able to schedule the remaining pods to it.
However, all the pods scheduled to that worker node are stuck in the "ContainerCreating" state for a long time, which makes the node useless after the power cycle.
Cluster Details:
Docker Version: 18.06.1-ce
Kubernetes version: v1.14.0
Helm version: v2.12.1
Host OS: Centos 7
Cloud being used: (put bare-metal if not on a public cloud)
Installation method: Ansible Script
Kubelet log:
Line 322: Jul 26 15:44:57 k8sworker3 kubelet[1832]: E0726 15:44:57.842527 1832 cni.go:331] Error adding logging_filebeat-kvdjg/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 326: Jul 26 15:44:57 k8sworker3 kubelet[1832]: weave-cni: unable to release IP address: Delete http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Line 342: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.073865 1832 cni.go:331] Error adding vz1-db-backup_vz1-warrior-job-5d242b94c6ba2500011bfedc-1564172937569-pwpq2/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160 to network weave-net/weave: unable to allocate IP address: Post http://127.0.0.1:6784/ip/a991d0c781d5c3ec6c2dca9753fc8a1a2958b762a75b3d619f3da3744c41d160: dial tcp 127.0.0.1:6784: connect: connection refused
Line 349: Jul 26 15:44:58 k8sworker3 kubelet[1832]: E0726 15:44:58.093351 1832 remote_runtime.go:109] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to set up sandbox container "acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df" network for pod "filebeat-kvdjg": NetworkPlugin cni failed to set up pod "filebeat-kvdjg_logging" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/acb6582a56c6d77fdb3364a1e0ab1dd47d31f63c64eb197903dc82007be4c7df: dial tcp 127.0.0.1:6784: connect: connection refused
Please suggest how to prevent this issue.
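As a starting point, the connection refused errors on 127.0.0.1:6784 suggest the Weave Net pod on that node is not serving its local API after the power cycle, so inspecting that pod is a reasonable first step; <weave-pod-on-k8sworker3> below is a placeholder for the actual pod name:
kubectl -n kube-system get pods -l name=weave-net -o wide
kubectl -n kube-system logs <weave-pod-on-k8sworker3> -c weave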