Internet connectivity loss in Kubernetes cluster - kubernetes

I have a master and two-worker-node architecture. I started with a single node and later added two worker nodes using kubeadm; I can see the worker nodes with kubectl get nodes.
But the cluster loses internet connectivity soon after the workers join.
I found this error when I try to deploy an nginx web server to a worker.
I have installed Calico as described here.
But kubectl get pods --all-namespaces does not show a CoreDNS pod.
When I use route -n on a worker node :
$route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 192.168.100.1 0.0.0.0 UG 100 0 0 ens32
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.100.0 0.0.0.0 255.255.255.0 U 0 0 0 ens32
192.168.100.1 192.168.100.222 255.255.255.255 UGH 0 0 0 tunl0
192.168.100.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens32
Here 192.168.100.1 is my network gateway server and 192.168.100.222 is my master node.
All 3 nodes are running Ubuntu 18.04 LTS. As I described earlier, I had a single-node cluster and then added two worker nodes using kubeadm join.
I am really new to Kubernetes. What am I missing here?
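A few checks that often narrow this down (a minimal sketch; calicoctl is only available if you installed it separately) are to confirm the Calico and CoreDNS pods are healthy and that the pod CIDR does not overlap your host network 192.168.100.0/24, since a tunl0 route to your gateway suggests pod routing is capturing host traffic:
kubectl -n kube-system get pods -o wide                    # calico-node and coredns status per node
kubectl cluster-info dump | grep -m1 -- --cluster-cidr     # pod CIDR handed to the controller-manager
calicoctl get ippool -o wide                               # Calico IP pools and their CIDRs
ip route get 8.8.8.8                                       # which device/gateway outbound traffic really uses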

Related

How to add IP route(s) so that Kubernetes cluster addresses go through the appropriate adapter

I have installed a Kubernetes cluster (one master and one worker node) on stand-alone CentOS 8 servers as per the instructions in the link below.
https://www.tecmint.com/install-a-kubernetes-cluster-on-centos-8/
The Weave Net CNI plugin was installed as per the above link. Now I can see the new network adapter below on our K8s master and worker-node servers.
weave: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1376
inet 10.32.0.1 netmask 255.240.0.0 broadcast 10.47.255.255
inet6 fe80::a07d:21ff:fef1:4656 prefixlen 64 scopeid 0x20<link>
ether a2:7d:21:f1:46:56 txqueuelen 1000 (Ethernet)
RX packets 141 bytes 13322 (13.0 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 48 bytes 4896 (4.7 KiB)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
But the problem is that from the host server I am unable to ping or access any of our remote site/location IPs (ping output below), whereas local IPs are pingable and accessible.
ping -c 4 120.121.5.48
PING 120.121.5.48 (120.121.5.48) 56(84) bytes of data.
From 10.32.0.1 icmp_seq=1 Destination Host Unreachable
From 10.32.0.1 icmp_seq=2 Destination Host Unreachable
From 10.32.0.1 icmp_seq=3 Destination Host Unreachable
From 10.32.0.1 icmp_seq=4 Destination Host Unreachable
--- 120.121.5.48 ping statistics ---
4 packets transmitted, 0 received, +4 errors, 100% packet loss, time 2999ms
pipe 4
Also, when I try to connect to our remote LDAP server from the host via telnet, it shows the error message below.
# telnet 120.121.5.48 389
Trying 120.121.5.48...
telnet: connect to address 120.121.5.48: No route to host
Our K8s master and worker-node servers have 23 network adapters, with statically configured IPs. Does any additional configuration need to be made so that the K8s CNI network is reachable alongside the default routing?
The ip route show and route -n output is as follows.
# ip route show
default via 45.46.47.1 dev ens1f0 proto static metric 100
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
45.46.47.0/24 dev ens1f0 proto kernel scope link src 45.46.47.48 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
192.168.122.0/24 dev virbr0 proto kernel scope link src 192.168.122.1 linkdown
# route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 45.46.47.1 0.0.0.0 UG 100 0 0 ens1f0
10.32.0.0 0.0.0.0 255.255.255.0 U 10 0 0 ens1f0
10.32.0.0 0.0.0.0 255.240.0.0 U 0 0 0 weave
45.46.47.0 0.0.0.0 255.255.255.0 U 100 0 0 ens1f0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.122.0 0.0.0.0 255.255.255.0 U 0 0 0 virbr0
I tried pointing the weave subnet at the default gateway with the command below. It executed successfully, but the problem remains.
ip route add 10.32.0.0/24 via 45.46.47.1 dev ens1f0 metric 100
If I bring the interface down with ifconfig weave down, everything works fine. But I need the Weave Net adapter to use the Kubernetes cluster, so please help me add IP route(s) so that the Kubernetes cluster addresses go through the appropriate adapter and I can access both our local and remote servers.
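Before adding routes by hand, it can help to ask the kernel which interface and source address it would actually pick for one of the unreachable destinations (the IP below is the remote LDAP server from the question):
ip route get 120.121.5.48    # shows the device, gateway and source IP the kernel selects
If the output names weave or a 10.32.x.x source address, the problem is route/source selection on the host rather than the remote network itself.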
I have changed the CNI plugin from Weave Net to Flannel, and now it is working as expected.
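For anyone reproducing the switch, a rough sketch (the Flannel manifest URL is the one published in the upstream README at the time of writing; verify it, and note that the default manifest assumes the pod network 10.244.0.0/16):
kubectl -n kube-system delete daemonset weave-net          # remove the Weave Net daemonset
sudo rm -f /etc/cni/net.d/10-weave.conflist                # on every node: drop the old Weave CNI config
kubectl apply -f https://github.com/flannel-io/flannel/releases/latest/download/kube-flannel.yml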

No route to host from some Kubernetes containers to other containers in the same cluster

This is a Kubespray deployment using Calico. All the defaults were left as-is, except that there is a proxy. Kubespray ran to the end without issues.
Access to Kubernetes services started failing, and after investigation there was no route to host for the coredns service. Accessing a K8s service by IP worked. Everything else seems correct, so I am left with a cluster that works, but without DNS.
Here is some background information:
Starting up a busybox container:
# nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
Now the output when explicitly specifying the CoreDNS service IP:
# nslookup kubernetes.default 10.233.0.3
;; connection timed out; no servers could be reached
Notice that telnet to the Kubernetes API works:
# telnet 10.233.0.1 443
Connected to 10.233.0.1
kube-proxy logs:
10.233.0.3 is the service IP for coredns. The last line looks concerning, even though it is INFO.
$ kubectl logs kube-proxy-45v8n -nkube-system
I1114 14:19:29.657685 1 node.go:135] Successfully retrieved node IP: X.59.172.20
I1114 14:19:29.657769 1 server_others.go:176] Using ipvs Proxier.
I1114 14:19:29.664959 1 server.go:529] Version: v1.16.0
I1114 14:19:29.665427 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I1114 14:19:29.669508 1 config.go:313] Starting service config controller
I1114 14:19:29.669566 1 shared_informer.go:197] Waiting for caches to sync for service config
I1114 14:19:29.669602 1 config.go:131] Starting endpoints config controller
I1114 14:19:29.669612 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I1114 14:19:29.769705 1 shared_informer.go:204] Caches are synced for service config
I1114 14:19:29.769756 1 shared_informer.go:204] Caches are synced for endpoints config
I1114 14:21:29.666256 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.124.23:53
I1114 14:21:29.666380 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.122.11:53
All pods are running without crashes or restarts, and services otherwise behave correctly.
IPVS looks correct, and the CoreDNS service is defined there:
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
-> RemoteAddress:Port Forward Weight ActiveConn InActConn
TCP 10.233.0.1:443 rr
-> x.59.172.19:6443 Masq 1 0 0
-> x.59.172.20:6443 Masq 1 1 0
TCP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 0
-> 10.233.124.24:53 Masq 1 0 0
TCP 10.233.0.3:9153 rr
-> 10.233.122.12:9153 Masq 1 0 0
-> 10.233.124.24:9153 Masq 1 0 0
TCP 10.233.51.168:3306 rr
-> x.59.172.23:6446 Masq 1 0 0
TCP 10.233.53.155:44134 rr
-> 10.233.89.20:44134 Masq 1 0 0
UDP 10.233.0.3:53 rr
-> 10.233.122.12:53 Masq 1 0 314
-> 10.233.124.24:53 Masq 1 0 312
Host routing also looks correct.
# ip r
default via x.59.172.17 dev ens3 proto dhcp src x.59.172.22 metric 100
10.233.87.0/24 via x.59.172.21 dev tunl0 proto bird onlink
blackhole 10.233.89.0/24 proto bird
10.233.89.20 dev calib88cf6925c2 scope link
10.233.89.21 dev califdffa38ed52 scope link
10.233.122.0/24 via x.59.172.19 dev tunl0 proto bird onlink
10.233.124.0/24 via x.59.172.20 dev tunl0 proto bird onlink
x.59.172.16/28 dev ens3 proto kernel scope link src x.59.172.22
x.59.172.17 dev ens3 proto dhcp scope link src x.59.172.22 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
I have redeployed this same cluster in separate environments, both with flannel and with calico using iptables instead of ipvs. I also temporarily disabled the docker HTTP proxy after the deployment. None of this made any difference.
Also:
kube_service_addresses: 10.233.0.0/18
kube_pods_subnet: 10.233.64.0/18
(They do not overlap)
What is the next step in debugging this issue?
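One way to narrow this down (a sketch reusing the pod and service IPs from your ipvsadm output; it needs dig from dnsutils/bind-utils on the node) is to query a CoreDNS pod IP directly, bypassing the ClusterIP/IPVS layer:
dig +short kubernetes.default.svc.cluster.local @10.233.122.12    # one CoreDNS pod IP
dig +short kubernetes.default.svc.cluster.local @10.233.0.3       # the CoreDNS service IP
If the pod IP answers but the service IP does not, the problem is in the service path; if neither answers, it is pod-to-pod networking (for example the IPIP/tunl0 traffic being dropped between hosts).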
I highly recommend you avoid using the latest busybox image to troubleshoot DNS. There are a few issues reported regarding nslookup on versions newer than 1.28.
v 1.28.4
user@node1:~$ kubectl exec -ti busybox busybox | head -1
BusyBox v1.28.4 (2018-05-22 17:00:17 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
v 1.31.1
user@node1:~$ kubectl exec -ti busyboxlatest busybox | head -1
BusyBox v1.31.1 (2019-10-28 18:40:01 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busyboxlatest -- nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
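If you want to repeat the test with a known-good image, a quick sketch (the pod name busybox128 is just an example):
kubectl run busybox128 --image=busybox:1.28 --restart=Never -- sleep 3600
kubectl exec -ti busybox128 -- nslookup kubernetes.default
kubectl delete pod busybox128          # clean up when done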
Going deeper and exploring more possibilities, I've reproduced your problem on GCP and after some digging I was able to figure out what is causing this communication problem.
GCE (Google Compute Engine) blocks traffic between hosts by default; we have to allow Calico traffic to flow between containers on different hosts.
According to the Calico documentation, you can do it by creating a firewall rule that allows this traffic:
gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:
gcloud compute firewall-rules list
This is not present in the most recent Calico documentation, but it is still true and necessary.
Before creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
After creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
It doesn't matter whether you bootstrap your cluster using kubespray or kubeadm; this problem will happen because Calico needs to communicate between nodes and GCE blocks that by default.
This is what worked for me. I had installed my k8s cluster using kubespray, configured with Calico as the CNI and containerd as the container runtime:
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
[delete coredns pod]
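For the bracketed step, one way to do it without looking up pod names (assuming the standard CoreDNS Deployment and labels in kube-system):
kubectl -n kube-system rollout restart deployment coredns
# or delete the pods by label and let the Deployment recreate them
kubectl -n kube-system delete pod -l k8s-app=kube-dns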

Ping other pod in the same & different node

I would like to ping pod B (on node 1) from pod A (on node 0), but it is unreachable. Pinging a pod on the same node is unreachable too.
I am setting up a new cluster to try Kubernetes, following Kelsey Hightower's guide.
I have tried to use this link as my reference Kubernetes: Can't ping pods across nodes
Node - IP Private - IP Pod
worker-0 - 10.240.0.20 - 10.200.0.0/24
worker-1 - 10.240.0.21 - 10.200.1.0/24
worker-2 - 10.240.0.22 - 10.200.2.0/24
route -n
worker-0
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.240.0.1 0.0.0.0 UG 100 0 0 ens4
10.200.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cnio0
10.240.0.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens4
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
worker-1
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.240.0.1 0.0.0.0 UG 100 0 0 ens4
10.200.1.0 0.0.0.0 255.255.255.0 U 0 0 0 cnio0
10.240.0.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens4
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
worker-2
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.240.0.1 0.0.0.0 UG 100 0 0 ens4
10.200.2.0 0.0.0.0 255.255.255.0 U 0 0 0 cnio0
10.240.0.1 0.0.0.0 255.255.255.255 UH 100 0 0 ens4
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
I have done setting up VPC Network Routes like this link.
After that i followed this reference Kubernetes: Can't ping pods across nodes
route add -net 10.200.1.0 netmask 255.255.255.0 gw 10.240.0.21 on worker-0
The result is
SIOCADDRT: Network is unreachable
I tried this on worker-0, worker-1 and worker-2 and got the same result,
even though worker-0 can reach worker-1 (10.240.0.21) with ping.
My expectation is that from pod A (on worker-0, pod IP 10.200.0.3) I can ping pod B (on worker-1, pod IP 10.200.1.3), and also pod C on worker-0, the same node as pod A.
Should this step use Calico or Flannel, or should pods on different nodes be pingable without Calico or Flannel (with only the CNI plugin configuration)?
Additional Information
I am using Docker, not runc & containerd.
So I installed Docker manually from this link.
In kubelet.service, --container-runtime=remote became --container-runtime=docker.
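For reference, a minimal sketch of the kubelet.service fragment that change implies (dockershim-era flags; the paths and the rest of the flag set are assumptions, since the actual unit file isn't shown here):
[Service]
ExecStart=/usr/local/bin/kubelet \
  --config=/var/lib/kubelet/kubelet-config.yaml \
  --container-runtime=docker \
  --network-plugin=cni \
  --cni-conf-dir=/etc/cni/net.d \
  --cni-bin-dir=/opt/cni/bin \
  --kubeconfig=/var/lib/kubelet/kubeconfig \
  --v=2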
Try adding the routes like this:
Worker-0:
$ sudo route add -net 10.200.1.0 netmask 255.255.255.0 gw 10.240.0.21
$ sudo route add -net 10.200.2.0 netmask 255.255.255.0 gw 10.240.0.22
Worker-1:
$ sudo route add -net 10.200.0.0 netmask 255.255.255.0 gw 10.240.0.20
$ sudo route add -net 10.200.2.0 netmask 255.255.255.0 gw 10.240.0.22
Worker-2:
$ sudo route add -net 10.200.0.0 netmask 255.255.255.0 gw 10.240.0.20
$ sudo route add -net 10.200.1.0 netmask 255.255.255.0 gw 10.240.0.21
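Note that route add -net ... gw 10.240.0.21 fails with SIOCADDRT: Network is unreachable when the node has no on-link route covering 10.240.0.21 (on GCE each instance typically only has a /32 route to its gateway), so if the workers are GCE instances as in Kelsey Hightower's guide, the more reliable option is to create the equivalent VPC routes. A sketch, assuming the network is named kubernetes-the-hard-way as in that guide:
gcloud compute routes create kubernetes-route-10-200-0-0-24 \
  --network kubernetes-the-hard-way --next-hop-address 10.240.0.20 --destination-range 10.200.0.0/24
gcloud compute routes create kubernetes-route-10-200-1-0-24 \
  --network kubernetes-the-hard-way --next-hop-address 10.240.0.21 --destination-range 10.200.1.0/24
gcloud compute routes create kubernetes-route-10-200-2-0-24 \
  --network kubernetes-the-hard-way --next-hop-address 10.240.0.22 --destination-range 10.200.2.0/24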

kubernetes : unable to join a node

I have created a master and am trying to join a node to create a cluster. When I try the join command I get the error below. Both nodes are on the same network. The error message indicates that no route exists to the host, and I'm not sure how to establish one. Any help is appreciated.
sudo kubeadm join --token d23afe.14fde99cd03def7e 192.168.178.24:6443 --discovery-token-ca-cert-hash sha256:6a5e2674825e683bbdfe9bab512b03c556bcf89d8648317a64372bb44746bb39
[preflight] Running pre-flight checks.
[WARNING SystemVerification]: docker version is greater than the most recently validated version. Docker version: 18.02.0-ce. Max validated version: 17.03
[WARNING FileExisting-crictl]: crictl not found in system path
[discovery] Trying to connect to API Server "192.168.178.24:6443"
[discovery] Created cluster-info discovery client, requesting info from "https://192.168.178.24:6443"
[discovery] Failed to request cluster info, will try again: [Get https://192.168.178.24:6443/api/v1/namespaces/kube-public/configmaps/cluster-info: dial tcp 192.168.178.24:6443: getsockopt: no route to host]
Here's the output of sudo route; unfortunately, I don't have enough knowledge to troubleshoot from this output.
sudo route
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
default 192.168.178.1 0.0.0.0 UG 202 0 0 eth0
10.32.0.0 0.0.0.0 255.240.0.0 U 0 0 0 weave
link-local 0.0.0.0 255.255.0.0 U 205 0 0 datapath
link-local 0.0.0.0 255.255.0.0 U 210 0 0 vethwe-datapath
link-local 0.0.0.0 255.255.0.0 U 211 0 0 vethwe-bridge
link-local 0.0.0.0 255.255.0.0 U 212 0 0 vxlan-6784
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.178.0 0.0.0.0 255.255.255.0 U 202 0 0 eth0
I managed to identify the issue: it was with the Weave Net plugin. I did a teardown and reinstalled the plugin, and was then able to join the node. Thanks all for your suggestions.
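For completeness, a rough sketch of that teardown and reinstall (the manifest URL is the one from the Weave Net docs; adjust if you installed from a different manifest):
kubectl delete -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# on each node, clear the stale CNI config and Weave state before reinstalling
sudo rm -f /etc/cni/net.d/10-weave.conflist
sudo rm -rf /var/lib/weave
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"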

Kubernetes not able to connect to pod using service IP

I am following the document below to create the service. The pod is running on both nodes, but when I make a request using the master node IP, I get no response.
curl http://192.168.15.101:30534
https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
# kubectl cluster-info
Kubernetes master is running at https://192.168.15.101:6443
KubeDNS is running at https://192.168.15.101:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns
Here are my routes:
# ip r
default via 192.168.15.1 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.0.0/16 dev flannel.1
10.244.0.0/14 dev docker0 proto kernel scope link src 10.244.0.1
169.254.0.0/16 dev eth1 scope link metric 1003
192.168.15.0/24 dev eth1 proto kernel scope link src 192.168.15.101
kubectl get pods --selector="run=load-balancer-example" --output=wide
NAME READY STATUS RESTARTS AGE IP NODE
hello-world-3272482377-225z8 1/1 Running 1 18h 10.244.1.81 node-01
hello-world-3272482377-f6qqd 1/1 Running 1 18h 10.244.2.78 node-02
I checked the iptables rules; NAT is set up for the cluster IP.
How can I troubleshoot this connection issue?
Thanks
-SR
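A hedged first round of checks, reusing names and IPs from the question (the service name example-service and container port 8080 come from the linked task and may differ in your cluster):
kubectl get svc example-service -o wide          # confirm type NodePort and the node port (30534 here)
kubectl describe svc example-service             # Endpoints should list 10.244.1.81:8080 and 10.244.2.78:8080
curl http://10.244.1.81:8080                     # from a node: hit a pod IP directly, bypassing the service
curl http://192.168.15.101:30534                 # then retry the NodePort on the master
kubectl -n kube-system logs -l k8s-app=kube-proxy --tail=50   # look for kube-proxy errors programming the rules
If the pod IP responds but the NodePort does not, the issue is in the kube-proxy/iptables path; if the pod IP does not respond either, it is the pod network (flannel) between nodes.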