No route to host from some Kubernetes containers to other containers in same cluster - kubernetes

This is a Kubespray deployment using Calico. All the defaults were left as-is, except that there is a proxy. Kubespray ran to the end without issues.
Access to Kubernetes services started failing and, after investigation, I found there was no route to host for the coredns service. Accessing a K8S service by IP worked. Everything else seems to be correct, so I am left with a cluster that works, but without DNS.
Here is some background information:
Starting up a busybox container:
# nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
Now the output while explicitly specifying the IP of the CoreDNS service:
# nslookup kubernetes.default 10.233.0.3
;; connection timed out; no servers could be reached
Notice that telnet to the Kubernetes API works:
# telnet 10.233.0.1 443
Connected to 10.233.0.1
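The telnet check above can also be scripted for images that ship without telnet. This is only a minimal sketch, assuming Python is available in the pod; `tcp_reachable` is a hypothetical helper name, and a successful TCP connect says nothing about UDP DNS traffic.

```python
import socket


def tcp_reachable(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port opens within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts and "No route to host".
        return False
```

With the addresses from the question, `tcp_reachable("10.233.0.1", 443)` would correspond to the telnet test, and `tcp_reachable("10.233.0.3", 53)` would probe the CoreDNS service over TCP.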
kube-proxy logs:
10.233.0.3 is the service IP for coredns. The last line looks concerning, even though it is INFO.
$ kubectl logs kube-proxy-45v8n -nkube-system
I1114 14:19:29.657685 1 node.go:135] Successfully retrieved node IP: X.59.172.20
I1114 14:19:29.657769 1 server_others.go:176] Using ipvs Proxier.
I1114 14:19:29.664959 1 server.go:529] Version: v1.16.0
I1114 14:19:29.665427 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I1114 14:19:29.669508 1 config.go:313] Starting service config controller
I1114 14:19:29.669566 1 shared_informer.go:197] Waiting for caches to sync for service config
I1114 14:19:29.669602 1 config.go:131] Starting endpoints config controller
I1114 14:19:29.669612 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I1114 14:19:29.769705 1 shared_informer.go:204] Caches are synced for service config
I1114 14:19:29.769756 1 shared_informer.go:204] Caches are synced for endpoints config
I1114 14:21:29.666256 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.124.23:53
I1114 14:21:29.666380 1 graceful_termination.go:93] lw: remote out of the list: 10.233.0.3:53/TCP/10.233.122.11:53
All pods are running without crashing/restarts etc. and otherwise services behave correctly.
IPVS looks correct. CoreDNS service is defined there:
# ipvsadm -ln
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  10.233.0.1:443 rr
  -> x.59.172.19:6443             Masq    1      0          0
  -> x.59.172.20:6443             Masq    1      1          0
TCP  10.233.0.3:53 rr
  -> 10.233.122.12:53             Masq    1      0          0
  -> 10.233.124.24:53             Masq    1      0          0
TCP  10.233.0.3:9153 rr
  -> 10.233.122.12:9153           Masq    1      0          0
  -> 10.233.124.24:9153           Masq    1      0          0
TCP  10.233.51.168:3306 rr
  -> x.59.172.23:6446             Masq    1      0          0
TCP  10.233.53.155:44134 rr
  -> 10.233.89.20:44134           Masq    1      0          0
UDP  10.233.0.3:53 rr
  -> 10.233.122.12:53             Masq    1      0          314
  -> 10.233.124.24:53             Masq    1      0          312
Host routing also looks correct.
# ip r
default via x.59.172.17 dev ens3 proto dhcp src x.59.172.22 metric 100
10.233.87.0/24 via x.59.172.21 dev tunl0 proto bird onlink
blackhole 10.233.89.0/24 proto bird
10.233.89.20 dev calib88cf6925c2 scope link
10.233.89.21 dev califdffa38ed52 scope link
10.233.122.0/24 via x.59.172.19 dev tunl0 proto bird onlink
10.233.124.0/24 via x.59.172.20 dev tunl0 proto bird onlink
x.59.172.16/28 dev ens3 proto kernel scope link src x.59.172.22
x.59.172.17 dev ens3 proto dhcp scope link src x.59.172.22 metric 100
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1 linkdown
I have redeployed this same cluster in separate environments with flannel, and with calico using iptables instead of ipvs. I have also temporarily disabled the docker http proxy after deploying. None of this makes any difference.
Also:
kube_service_addresses: 10.233.0.0/18
kube_pods_subnet: 10.233.64.0/18
(They do not overlap)
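The non-overlap claim can be double-checked mechanically. A small sketch using Python's standard `ipaddress` module with the two ranges quoted above; the sample addresses are the CoreDNS service IP and one of the pod endpoints from the ipvsadm output.

```python
import ipaddress

# Values from the Kubespray configuration quoted above.
kube_service_addresses = ipaddress.ip_network("10.233.0.0/18")
kube_pods_subnet = ipaddress.ip_network("10.233.64.0/18")

# False means the two ranges share no addresses.
ranges_overlap = kube_service_addresses.overlaps(kube_pods_subnet)
print("overlap:", ranges_overlap)

# The CoreDNS service IP should sit in the service range,
# and a CoreDNS pod endpoint in the pod range.
service_ip_ok = ipaddress.ip_address("10.233.0.3") in kube_service_addresses
pod_ip_ok = ipaddress.ip_address("10.233.122.12") in kube_pods_subnet
print("service IP in service range:", service_ip_ok)
print("pod IP in pod range:", pod_ip_ok)
```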
What is the next step in debugging this issue?

I highly recommend that you avoid using the latest busybox image to troubleshoot DNS. There are a few issues reported regarding nslookup on versions newer than 1.28.
v 1.28.4
user@node1:~$ kubectl exec -ti busybox busybox | head -1
BusyBox v1.28.4 (2018-05-22 17:00:17 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
v 1.31.1
user@node1:~$ kubectl exec -ti busyboxlatest busybox | head -1
BusyBox v1.31.1 (2019-10-28 18:40:01 UTC) multi-call binary.
user@node1:~$ kubectl exec -ti busyboxlatest -- nslookup kubernetes.default
Server: 169.254.25.10
Address: 169.254.25.10:53
** server can't find kubernetes.default: NXDOMAIN
*** Can't find kubernetes.default: No answer
command terminated with exit code 1
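To make the version check above repeatable, the banner line can be parsed. This is just a sketch: the function names are made up for illustration, and the 1.28 cut-off is simply the one stated in this answer, not an official compatibility boundary.

```python
import re

# Assumption taken from this answer: nslookup in BusyBox releases newer
# than the 1.28 series can give misleading results for cluster names.
LAST_KNOWN_GOOD = (1, 28)


def busybox_version(banner: str) -> tuple:
    """Extract (major, minor, patch) from a 'BusyBox vX.Y.Z ...' banner line."""
    match = re.search(r"BusyBox v(\d+)\.(\d+)\.(\d+)", banner)
    if match is None:
        raise ValueError(f"not a BusyBox banner: {banner!r}")
    return tuple(int(part) for part in match.groups())


def nslookup_is_suspect(banner: str) -> bool:
    """True when the image is newer than the 1.28 series."""
    return busybox_version(banner)[:2] > LAST_KNOWN_GOOD
```

Feeding it the two banners shown above flags only the v1.31.1 image.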
Going deeper and exploring more possibilities, I've reproduced your problem on GCP and after some digging I was able to figure out what is causing this communication problem.
GCE (Google Compute Engine) blocks traffic between hosts by default; we have to allow Calico traffic to flow between containers on different hosts.
According to the calico documentation, you can do it by creating a firewall rule that allows this communication:
gcloud compute firewall-rules create calico-ipip --allow 4 --network "default" --source-ranges "10.128.0.0/9"
You can verify the rule with this command:
gcloud compute firewall-rules list
This is no longer present in the most recent calico documentation, but it is still true and necessary.
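One thing worth checking with this rule is that `--source-ranges` actually covers your node addresses, because the outer header of an IP-in-IP packet carries the node IPs, not the pod IPs. A quick sketch with hypothetical node addresses (replace them with your own); `covered` is an illustrative helper name.

```python
import ipaddress

# Range taken from the gcloud command above.
SOURCE_RANGE = ipaddress.ip_network("10.128.0.0/9")


def covered(node_ip: str) -> bool:
    """True when traffic from node_ip would match the firewall rule's source range."""
    return ipaddress.ip_address(node_ip) in SOURCE_RANGE


# Hypothetical node addresses, for illustration only.
for node_ip in ("10.128.0.5", "192.168.1.10"):
    print(node_ip, covered(node_ip))
```

If a node falls outside the range, the rule would need a different `--source-ranges` value for that network.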
Before creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
After creating firewall rule:
user@node1:~$ kubectl exec -ti busybox2 -- nslookup kubernetes.default
Server: 10.233.0.3
Address 1: 10.233.0.3 coredns.kube-system.svc.cluster.local
Name: kubernetes.default
Address 1: 10.233.0.1 kubernetes.default.svc.cluster.local
It doesn't matter whether you bootstrap your cluster using kubespray or kubeadm: this problem will happen because calico needs to communicate between nodes and GCE blocks that traffic by default.

This is what worked for me. I installed my k8s cluster using kubespray, configured with calico as the CNI and containerd as the container runtime:
iptables -P INPUT ACCEPT
iptables -P FORWARD ACCEPT
iptables -P OUTPUT ACCEPT
iptables -F
[delete coredns pod]

Related

Can't resolve dns in kubernetes

I use the following commands to check for DNS issues in my k8s cluster:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
The nslookup result is:
;; connection timed out; no servers could be reached
command terminated with exit code 1
dnsutils.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
NOTE: this machine has all ports disabled by default, so I already asked our IT admin to open the ports listed in the check-required-ports doc. I'm not sure if this matters.
With the following command I can get the pod IPs of coredns:
kubectl get pods -n kube-system -o wide | grep core
coredns-7877db9d45-swb6c 1/1 Running 0 2m58s 10.244.1.8 node2 <none> <none>
coredns-7877db9d45-zwc8v 1/1 Running 0 2m57s 10.244.0.6 node1 <none> <none>
Here, 10.244.0.6 is on my master node while 10.244.1.8 is on my worker node.
Then, if I directly specify the coredns pod IP:
Master node, OK:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.0.6
Server: 10.244.0.6
Address: 10.244.0.6#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Worker node, not OK:
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.1.8
;; connection timed out; no servers could be reached
command terminated with exit code 1
So the question narrows down to: why does CoreDNS on the worker node not work? Is there anything I need to pay attention to?
Environment:
OS: ubuntu18.04
K8S: v1.21.0
Cluster boot command:
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Finally, I found the root cause: it is a hardware firewall issue. See this excerpt from the flannel documentation:
Firewalls
When using udp backend, flannel uses UDP port 8285 for sending encapsulated packets.
When using vxlan backend, kernel uses UDP port 8472 for sending encapsulated packets.
Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.
Make sure that your firewall rules allow traffic from pod network cidr visit your kubernetes master node.
When the nslookup client is on the same node as the DNS server, the traffic never passes through the firewall, so everything is OK.
When the nslookup client is not on the same node as the DNS server, the firewall blocks the traffic, so the DNS server cannot be reached.
After opening the ports, everything works.
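The same-node versus cross-node reasoning can be sketched in a few lines. This assumes flannel hands each node a /24 out of 10.244.0.0/16, which matches the pod IPs in the question, and that the dnsutils pod runs on node1 (an inference from the lookup that worked, not something the question states). The helper names are made up for illustration.

```python
import ipaddress

# Encapsulation ports quoted from the flannel documentation above.
FLANNEL_UDP_PORTS = {"udp": 8285, "vxlan": 8472}

# Assumption: one /24 per node, consistent with 10.244.0.6 on node1
# and 10.244.1.8 on node2 in the question.
NODE_SUBNETS = {
    "node1": ipaddress.ip_network("10.244.0.0/24"),
    "node2": ipaddress.ip_network("10.244.1.0/24"),
}


def node_of(pod_ip: str) -> str:
    """Return the node whose pod subnet contains pod_ip."""
    addr = ipaddress.ip_address(pod_ip)
    for node, subnet in NODE_SUBNETS.items():
        if addr in subnet:
            return node
    raise LookupError(f"{pod_ip} is not in any known node subnet")


def crosses_overlay(client_node: str, server_pod_ip: str) -> bool:
    """True when the query has to be encapsulated and sent to another host."""
    return node_of(server_pod_ip) != client_node


print(crosses_overlay("node1", "10.244.0.6"))  # stays on the node
print(crosses_overlay("node1", "10.244.1.8"))  # needs the overlay port open
```

Only the second lookup depends on the firewall allowing the port listed in `FLANNEL_UDP_PORTS` for the backend in use.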

Can we set up a k8s bare metal server to run a BIND DNS server (named) and access it from the outside on port 53?

I have set up a k8s cluster on 2 bare metal servers (1 master and 1 worker) using kubespray with default settings (kube_proxy_mode: iptables and dns_mode: coredns), and I would like to run a BIND DNS server inside it to manage a couple of domain names.
I deployed a helloworld web app with helm 3 for testing. Everything works like a charm (HTTP, HTTPS, Let's Encrypt through cert-manager).
kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.4", GitCommit:"8d8aa39598534325ad77120c120a22b3a990b5ea", GitTreeState:"clean", BuildDate:"2020-03-12T21:03:42Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"16", GitVersion:"v1.16.7", GitCommit:"be3d344ed06bff7a4fc60656200a93c74f31f9a4", GitTreeState:"clean", BuildDate:"2020-02-11T19:24:46Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster Ready master 22d v1.16.7
k8sslave Ready <none> 21d v1.16.7
Using a Helm 3 chart, I deployed an image of my BIND DNS server (named) in the default namespace, with a service exposing port 53 of the bind app container.
I have tested the DNS resolution with a pod and the bind service; it works well. Here is the test of the bind k8s service from the master node:
kubectl -n default get svc bind -o wide
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
bind ClusterIP 10.233.31.255 <none> 53/TCP,53/UDP 4m5s app=bind,release=bind
kubectl get endpoints bind
NAME ENDPOINTS AGE
bind 10.233.75.239:53,10.233.93.245:53,10.233.75.239:53 + 1 more... 4m12s
export SERVICE_IP=`kubectl get services bind -o go-template='{{.spec.clusterIP}}{{"\n"}}'`
nslookup www.example.com ${SERVICE_IP}
Server: 10.233.31.255
Address: 10.233.31.255#53
Name: www.example.com
Address: 176.31.XXX.XXX
So the bind DNS app is deployed and is working fine through the bind k8s service.
For the next step, I followed the https://kubernetes.github.io/ingress-nginx/user-guide/exposing-tcp-udp-services/ documentation to set up the Nginx Ingress Controller (both configmap and service) to handle TCP/UDP requests on port 53 and redirect them to the bind DNS app.
When I test the name resolution from an external computer it does not work:
nslookup www.example.com <IP of the k8s master>
;; connection timed out; no servers could be reached
I dug into the k8s configuration, logs, etc. and found error messages in the kube-proxy logs:
ps auxw | grep kube-proxy
root 19984 0.0 0.2 141160 41848 ? Ssl Mar26 19:39 /usr/local/bin/kube-proxy --config=/var/lib/kube-proxy/config.conf --hostname-override=k8smaster
journalctl --since "2 days ago" | grep kube-proxy
<NOTHING RETURNED>
KUBEPROXY_FIRST_POD=`kubectl get pods -n kube-system -l k8s-app=kube-proxy -o go-template='{{range .items}}{{.metadata.name}}{{"\n"}}{{end}}' | head -n 1`
kubectl logs -n kube-system ${KUBEPROXY_FIRST_POD}
I0326 22:26:03.491900 1 node.go:135] Successfully retrieved node IP: 91.121.XXX.XXX
I0326 22:26:03.491957 1 server_others.go:150] Using iptables Proxier.
I0326 22:26:03.492453 1 server.go:529] Version: v1.16.7
I0326 22:26:03.493179 1 conntrack.go:52] Setting nf_conntrack_max to 262144
I0326 22:26:03.493647 1 config.go:131] Starting endpoints config controller
I0326 22:26:03.493663 1 config.go:313] Starting service config controller
I0326 22:26:03.493669 1 shared_informer.go:197] Waiting for caches to sync for endpoints config
I0326 22:26:03.493679 1 shared_informer.go:197] Waiting for caches to sync for service config
I0326 22:26:03.593986 1 shared_informer.go:204] Caches are synced for endpoints config
I0326 22:26:03.593992 1 shared_informer.go:204] Caches are synced for service config
E0411 17:02:48.113935 1 proxier.go:927] can't open "externalIP for ingress-nginx/ingress-nginx:bind-udp" (91.121.XXX.XXX:53/udp), skipping this externalIP: listen udp 91.121.XXX.XXX:53: bind: address already in use
E0411 17:02:48.119378 1 proxier.go:927] can't open "externalIP for ingress-nginx/ingress-nginx:bind-tcp" (91.121.XXX.XXX:53/tcp), skipping this externalIP: listen tcp 91.121.XXX.XXX:53: bind: address already in use
Then I looked for what was already using port 53:
netstat -lpnt | grep 53
tcp 0 0 0.0.0.0:5355 0.0.0.0:* LISTEN 1682/systemd-resolv
tcp 0 0 87.98.XXX.XXX:53 0.0.0.0:* LISTEN 19984/kube-proxy
tcp 0 0 169.254.25.10:53 0.0.0.0:* LISTEN 14448/node-cache
tcp6 0 0 :::9253 :::* LISTEN 14448/node-cache
tcp6 0 0 :::9353 :::* LISTEN 14448/node-cache
A look on the proc 14448/node-cache:
cat /proc/14448/cmdline
/node-cache-localip169.254.25.10-conf/etc/coredns/Corefile-upstreamsvccoredns
So coredns is already handling port 53, which is normal because it is the k8s internal DNS service.
The CoreDNS documentation (https://github.com/coredns/coredns/blob/master/README.md) mentions a -dns.port option for using a different port. However, kubespray has 3 jinja templates (https://github.com/kubernetes-sigs/kubespray/tree/release-2.12/roles/kubernetes-apps/ansible/templates) for creating the coredns configmap, services etc., similar to https://kubernetes.io/docs/tasks/administer-cluster/dns-custom-nameservers/#coredns, and everything in them is hardcoded to port 53.
So my question is: is there a k8s cluster configuration or workaround that lets me run my own DNS server and expose it on port 53?
Maybe:
Set up coredns to use a different port than 53? That seems hard, and I'm really not sure it makes sense!
I could set up my bind k8s service to expose port 5353 and configure the nginx ingress controller to handle port 5353 and redirect to port 53 of the app. But this would require iptables rules to route external DNS requests received on port 53 to my bind k8s service on port 5353. What would the iptables config be (INPUT, PREROUTING or FORWARD)? Would this kind of network configuration break coredns?
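Note that `grep 53` also matches ports 5355, 9253 and 9353, which makes the netstat listing harder to read. A small sketch that keeps only listeners bound to exactly port 53; `listeners_on_port` is a hypothetical helper, and the sample text is the netstat output above.

```python
SAMPLE = """\
tcp 0 0 0.0.0.0:5355 0.0.0.0:* LISTEN 1682/systemd-resolv
tcp 0 0 87.98.XXX.XXX:53 0.0.0.0:* LISTEN 19984/kube-proxy
tcp 0 0 169.254.25.10:53 0.0.0.0:* LISTEN 14448/node-cache
tcp6 0 0 :::9253 :::* LISTEN 14448/node-cache
tcp6 0 0 :::9353 :::* LISTEN 14448/node-cache
"""


def listeners_on_port(netstat_output: str, port: int) -> list:
    """Return (local_address, pid/program) pairs listening on exactly `port`."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Expected columns: proto recv-q send-q local foreign state pid/program
        if len(fields) < 7 or fields[5] != "LISTEN":
            continue
        local = fields[3]
        # The port is whatever follows the last colon (works for IPv6 too).
        if local.rsplit(":", 1)[-1] == str(port):
            found.append((local, fields[6]))
    return found


for local, program in listeners_on_port(SAMPLE, 53):
    print(local, program)
```

On the output above this leaves two entries: kube-proxy on the 87.98.XXX.XXX address and node-cache on 169.254.25.10.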
Regards,
Chris
I suppose your nginx-ingress doesn't work as expected. You need a load balancer provider, such as MetalLB, for your bare metal k8s cluster to receive external connections on ports like 53. You also don't need nginx-ingress in front of bind: just change the bind Service type from ClusterIP to LoadBalancer and make sure the Service gets an external IP. The documentation of your helm chart may help with switching to LoadBalancer.

Kubernetes pods cannot communicate outbound

I have installed Kubernetes v1.13.10 on a group of VMs running CentOS 7. When I deploy pods, they can connect to one another but cannot connect to anything outside of the cluster. The CoreDNS pods have these errors in the log:
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:48105->10.20.10.52:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:49098->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:53113->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:39648->10.20.10.51:53: i/o timeout
The IPs 10.20.10.51 and 10.20.10.52 are the internal DNS servers and are reachable from the nodes. I did a Wireshark capture from the DNS servers, and I see the traffic is coming in from the CoreDNS pod IP address 172.21.0.33. There would be no route for the DNS servers to get back to that IP as it isn't routable outside of the Kubernetes cluster.
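To see at a glance which source and upstream addresses are involved, the CoreDNS error lines can be reduced to distinct flows. This is a rough sketch: the regular expression is written against the log format shown above, the helper names are made up, and 172.21.0.0/16 is assumed to be the pod subnet.

```python
import ipaddress
import re

# Assumption: 172.21.0.0/16 is the pod subnet of this cluster.
POD_SUBNET = ipaddress.ip_network("172.21.0.0/16")

FLOW = re.compile(r"read udp (\d+\.\d+\.\d+\.\d+):\d+->(\d+\.\d+\.\d+\.\d+):(\d+)")


def unanswered_flows(log_text: str) -> set:
    """Return distinct (source_ip, upstream_ip, upstream_port) triples that timed out."""
    flows = set()
    for line in log_text.splitlines():
        match = FLOW.search(line)
        if match and "i/o timeout" in line:
            flows.add((match.group(1), match.group(2), int(match.group(3))))
    return flows


def is_pod_address(ip: str) -> bool:
    """True when ip belongs to the (assumed) pod subnet."""
    return ipaddress.ip_address(ip) in POD_SUBNET
```

If the source address of a timed-out flow is a pod address and the capture on the DNS server shows that same address, no source NAT happened on the way out.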
My understanding is that an iptables rule should NAT the pod IPs to the address of the node when a pod is trying to communicate outbound (correct?). Below is the POSTROUTING chain in iptables:
[root@kube-aci-1 ~]# iptables -t nat -L POSTROUTING -v --line-number
Chain POSTROUTING (policy ACCEPT 23 packets, 2324 bytes)
num pkts bytes target prot opt in out source destination
1 1990 166K KUBE-POSTROUTING all -- any any anywhere anywhere /* kubernetes postrouting rules */
2 0 0 MASQUERADE all -- any ens192.152 172.21.0.0/16 anywhere
Line 1 was added by kube-proxy and line 2 was a line I manually added to try to nat anything coming from the pod subnet 172.21.0.0/16 to the node interface ens192.152, but that didn't work.
Here's the kube-proxy logs:
[root@kube-aci-1 ~]# kubectl logs kube-proxy-llq22 -n kube-system
W1117 16:31:59.225870 1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.232006 1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.233727 1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.235700 1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.255278 1 server_others.go:296] Flag proxy-mode="" unknown, assuming iptables proxy
I1117 16:31:59.289360 1 server_others.go:148] Using iptables Proxier.
I1117 16:31:59.296021 1 server_others.go:178] Tearing down inactive rules.
I1117 16:31:59.324352 1 server.go:484] Version: v1.13.10
I1117 16:31:59.335846 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1117 16:31:59.336443 1 config.go:102] Starting endpoints config controller
I1117 16:31:59.336466 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1117 16:31:59.336493 1 config.go:202] Starting service config controller
I1117 16:31:59.336499 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1117 16:31:59.436617 1 controller_utils.go:1034] Caches are synced for service config controller
I1117 16:31:59.436739 1 controller_utils.go:1034] Caches are synced for endpoints config controller
I have tried flushing the iptables nat table as well as restarted kube-proxy on all nodes, but the problem still persisted. Any clues in the output above, or thoughts on further troubleshooting?
Output of kubectl get nodes:
[root@kube-aci-1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-aci-1 Ready master 85d v1.13.10 10.10.52.217 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
kube-aci-2 Ready <none> 85d v1.13.10 10.10.52.218 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
Turns out it is necessary to use a subnet that is routable on the network with the CNI in use if outbound communication from pods is necessary. I made the subnet routable on the external network and the pods can now communicate outbound.

Flannel restarts very often

Flannel on the node keeps restarting.
The log is as follows:
root@debian:~# docker logs faa668852544
I0425 07:14:37.721766 1 main.go:514] Determining IP address of default interface
I0425 07:14:37.724855 1 main.go:527] Using interface with name eth0 and address 192.168.50.19
I0425 07:14:37.815135 1 main.go:544] Defaulting external address to interface address (192.168.50.19)
E0425 07:15:07.825910 1 main.go:241] Failed to create SubnetManager: error retrieving pod spec for 'kube-system/kube-flannel-ds-arm-bg9rn': Get https://10.96.0.1:443/api/v1/namespaces/kube-system/pods/kube-flannel-ds-arm-bg9rn: dial tcp 10.96.0.1:443: i/o timeout
master configuration:
ubuntu: 16.04
node:
embedded system with debian rootfs(linux4.9).
kubernetes version:v1.14.1
docker version:18.09
flannel version:v0.11.0
I expect flannel to run normally on the node.
First, for flannel to work correctly, you must pass --pod-network-cidr=10.244.0.0/16 to kubeadm init.
kubeadm init --pod-network-cidr=10.244.0.0/16
Set /proc/sys/net/bridge/bridge-nf-call-iptables to 1 by running
sysctl net.bridge.bridge-nf-call-iptables=1
Next, create the clusterrole and clusterrolebinding as follows:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/a70459be0084506e4ec919aa1c114638878db11b/Documentation/kube-flannel.yml
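To confirm that the sysctl step above actually took effect on a node, the value can be read back from /proc. A minimal sketch, assuming a Linux host; `bridge_nf_enabled` is an illustrative name, and a missing file is treated as "not enabled" since the entry typically only exists once the bridge netfilter module is loaded.

```python
from pathlib import Path

# Path named in the answer above.
BRIDGE_NF_CALL_IPTABLES = Path("/proc/sys/net/bridge/bridge-nf-call-iptables")


def bridge_nf_enabled(path: Path = BRIDGE_NF_CALL_IPTABLES) -> bool:
    """Return True if the setting file exists and contains 1."""
    try:
        return path.read_text().strip() == "1"
    except FileNotFoundError:
        # Typically means the bridge netfilter module is not loaded.
        return False
```

Calling `bridge_nf_enabled()` on each node gives a quick yes or no before digging further into flannel itself.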

Kubernetes not able to connect to pod using service IP

I am following the document below to create the service:
https://kubernetes.io/docs/tasks/access-application-cluster/service-access-application-cluster/
The pod is running on both nodes. When I make a request using the master node IP, I do not get any response:
curl http://192.168.15.101:30534
# kubectl cluster-info
Kubernetes master is running at https://192.168.15.101:6443
KubeDNS is running at https://192.168.15.101:6443/api/v1/proxy/namespaces/kube-system/services/kube-dns
Here are my routes:
# ip r
default via 192.168.15.1 dev eth1
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.0.0/16 dev flannel.1
10.244.0.0/14 dev docker0 proto kernel scope link src 10.244.0.1
169.254.0.0/16 dev eth1 scope link metric 1003
192.168.15.0/24 dev eth1 proto kernel scope link src 192.168.15.101
kubectl get pods --selector="run=load-balancer-example" --output=wide
NAME READY STATUS RESTARTS AGE IP NODE
hello-world-3272482377-225z8 1/1 Running 1 18h 10.244.1.81 node-01
hello-world-3272482377-f6qqd 1/1 Running 1 18h 10.244.2.78 node-02
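Three of the routes above cover overlapping ranges (10.244.0.0/24, 10.244.0.0/16 and 10.244.0.0/14), so it may help to work out which device each pod IP would leave through. The sketch below applies plain longest-prefix matching to the table; it ignores metrics and policy routing, and `egress_device` is a made-up helper name.

```python
import ipaddress

# (prefix, device) pairs copied from the `ip r` output above;
# the default route is written as 0.0.0.0/0.
ROUTES = [
    ("0.0.0.0/0", "eth1"),
    ("10.0.2.0/24", "eth0"),
    ("10.244.0.0/24", "cni0"),
    ("10.244.0.0/16", "flannel.1"),
    ("10.244.0.0/14", "docker0"),
    ("169.254.0.0/16", "eth1"),
    ("192.168.15.0/24", "eth1"),
]


def egress_device(dest: str) -> str:
    """Pick the device of the most specific route that contains dest."""
    addr = ipaddress.ip_address(dest)
    candidates = [
        (ipaddress.ip_network(prefix), dev)
        for prefix, dev in ROUTES
        if addr in ipaddress.ip_network(prefix)
    ]
    network, dev = max(candidates, key=lambda item: item[0].prefixlen)
    return dev


# Pod IPs from the kubectl output above.
for pod_ip in ("10.244.1.81", "10.244.2.78"):
    print(pod_ip, "->", egress_device(pod_ip))
```

Under these assumptions both pod IPs resolve to flannel.1, while anything in 10.245.0.0 to 10.247.255.255 would go to docker0.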
I checked the iptables rules; NAT is set up for the cluster IP.
How can I troubleshoot this connection issue?
Thanks
-SR