K8 DNS not resolving - kubernetes

My K8 DNS isn't resolving, thus I did follow the debugging steps as mentioned here. As I am new to K8, can someone point me to the issue I am facing? I cant extract any useful information out of the debugging steps.
cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
kubectl version
Client Version: version.Info{Major:"1", Minor:"20",
GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:02:01Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
kubectl get namespace
NAME STATUS AGE
default Active 7d4h
kubectl get pods dnsutils
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 18 18h
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local
cluster.local
options ndots:5
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS
AGE
coredns-74ff55c5b-6vsml 1/1 Running 12 7d4h
coredns-74ff55c5b-mww7g 1/1 Running 12 7d4h
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.0.1:16732 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.006053527s
[INFO] 127.0.0.1:58672 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.00658948s
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.0.62:56364 - 32900 "HINFO IN 2808379183970575835.6786373795048579500. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.004922932s
[INFO] 127.0.0.1:48277 - 32900 "HINFO IN 2808379183970575835.6786373795048579500. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.007889024s
[INFO] 10.244.0.62:49106 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.005058199s
kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 7d4h
monitoring-influxdb ClusterIP 10.102.51.183 <none> 8086/TCP 4d21h
kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.244.0.45:53,10.244.0.47:53,10.244.0.45:53 + 3 more... 7d4h
cat /run/systemd/resolve/resolv.conf
nameserver 8.8.8.8
nameserver 2001:4860:4860::8888
cat /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 2001:4860:4860::8888
cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
It is kinda odd, that both resolv.conf have different values. Also, I have no clue (if I would have to set the DNS IP manually) which IP to choose.
kubeadm config view
apiServer:
extraArgs:
authorization-mode: Node,RBAC
timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
type: CoreDNS
etcd:
local:
dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.5
networking:
dnsDomain: cluster.local
podSubnet: 10.244.0.0/16
serviceSubnet: 10.96.0.0/12
scheduler: {}
Update
The dnsutils assigned pod's IP is 10.244.2.20 and not reachable from the single k8 master node.
ping 10.244.2.20

There were several issues with my configuration. First off: I did use an incompatible docker version (20.10.5) which isn't supported yet. Hence, I don't know whether this issue also arises when using a supported docker version. However, even with this incompatible docker version, I was able to fix the issue with following steps:
1. DNS misconfiguration
I don't know who/what will set the resolved.conf's DNS entries, but my entry was clearly wrong. First, we need to obtain the K8's DNS Cluster-IP Address:
kubectl get services --all-namespaces -o wide
You will receive all services within all namespaces, including the kube-dns Cluster-IP. In my case It looks like following
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 11d k8s-app=kube-dns
kube-system monitoring-influxdb ClusterIP 10.102.51.183 <none> 8086/TCP 9d k8s-app=influxdb
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.110.126.218 <none> 8000/TCP 11d k8s-app=dashboard-metrics-scraper
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.98.164.199 <none> 443/TCP 11d k8s-app=kubernetes-dashboard
Use that DNS within your resolved.conf file. Where that file is located, depends on your OS. In my case (Ubuntu 20.04) /etc/systemd/resolved.conf.
nano /etc/systemd/resolved.conf
[Resolve]
DNS=10.96.0.10 8.8.8.8 2001:4860:4860::8888
2. Re-Join all nodes
I did use UFW next to IPTables, which was somehow messing with the configuration. Hence, I did remove all nodes, installed a fresh OS and re-joined the cluster; without activating UFW.
3. Forward packet policy
In some versions docker modifies the iptables, such that packets will be dropped in packet-forward scenarios. Override this behaviour on all nodes with:
iptables -P FORWARD ACCEPT
Just to be sure, also enable ipv4 forwarding with:
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf

What is the operating system you are using. I was using redhat enterprise Linux and had the similar error.
I have removed everything in /etc/resolv.conf and kept only ip of dns server and it worked.
What is the network policy you are using, for me calico didn't work. I used kube-router with above /etc/resolv.conf setting.
Thanks.

Related

Kubernetes: Can't Resolve Hostnames from within pods

new to Kubernetes, but have used K3s a little in the past. Just setup a K8s cluster. None of my pods can do DNS lookups, even to google, or to an internal domain.
I init'd with: --pod-network-cidr=10.244.0.0/16. Metal-LB is installed (10.7.7.10-10.7.7.254) and the nodes and master are running with IPs 10.7.50.X/16 and 10.7.60.X/16 respectively. Flannel is setup with the default Kube-Flannel: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
So far it's just 1 master with 2 nodes.
Versions:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.22.1
Troubleshooting commands:
$ kubectl describe service kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: prometheus.io/port: 9153
prometheus.io/scrape: true
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.96.0.10
IPs: 10.96.0.10
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.244.1.20:9153,10.244.2.28:9153
Session Affinity: None
Events: <none>
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84f8874d6d-jgvwk 1/1 Running 1 (115m ago) 21h 10.244.1.20 k-w-001 <none> <none>
coredns-84f8874d6d-qh2f4 1/1 Running 1 (115m ago) 21h 10.244.2.28 k-w-002 <none> <none>
etcd-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-apiserver-k-m-001 1/1 Running 11 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-controller-manager-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-286dc 1/1 Running 10 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-rbmhx 1/1 Running 6 (114m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-flannel-ds-vjl7l 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-948z8 1/1 Running 8 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-proxy-l7h64 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-pqmsr 1/1 Running 4 (115m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-scheduler-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
metrics-server-6dfddc5fb8-47mnb 0/1 Running 3 (115m ago) 2d20h 10.244.1.21 k-w-001 <none> <none>
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-jgvwk
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-qh2f4
[INFO] plugin/ready: Still waiting on: "kubernetes"
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
These were ran seconds apart:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
Here are some more tests:
$ kubectl exec -ti busybox -- nslookup google.com
;; connection timed out; no servers could be reached
command terminated with exit code 1
$ kubectl exec -ti busybox -- nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: google.com
Address: 142.251.33.78
*** Can't find google.com: No answer
$ kubectl exec -ti busybox -- ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=116 time=6.437 ms
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
command terminated with exit code 1
I also noticed that the kube-dns service has the app selector set to k8s-app=kube-dns and coredns has the label k8s-app=kube-dns, is this correct?
The pods running in the kube-system namespace seem to have 2 different IP ranges. One is using the Node's IP, and the other is using Flannels.
I'm not sure what's happening here, being new to Kubernetes, but it appears like the DNS pods or service are not working at all.
Edit:
Further info:
$ sudo ufw status
Status: inactive
Issue was actually Flannel. DNS queries worked fine until the nodes were restarted, and then all pod queries failed until the Flannel pods were restarted.
Man this was a rabbit hole.
See: https://github.com/flannel-io/flannel/issues/1321

Kubernetes DNS lookg not working from worker node - connection timed out; no servers could be reached

I have build new Kubernetes cluster v1.20.1 single master and single node with Calico CNI.
I deployed the busybox pod in default namespace.
# kubectl get pods busybox -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 12m 10.203.0.129 node02 <none> <none>
nslookup not working
kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
cluster is running RHEL 8 with latest update
followed this steps: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
nslookup command not able to reach nameserver
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
resolve.conf file
# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
DNS pods running
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-472vx 1/1 Running 1 85m
coredns-74ff55c5b-c75bq 1/1 Running 1 85m
DNS pod logs
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
Service is defined
# kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 86m
**I can see the endpoints of DNS pod**
# kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more... 86m
enabled the logging, but didn't see traffic coming to DNS pod
# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
I can ping DNS POD
# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms
nmap show port filtered
# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
53/tcp filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper
Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds
If I schedule the POD on master node, nslookup works nmap show port open?
# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).
PORT STATE SERVICE
53/tcp open domain
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds
Why nslookup from POD running on worker node is not working? how to troubleshoot this issue?
I re-build the server two times, still same issue.
Thanks
SR
Update adding kubeadm config file
# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
kubeletExtraArgs:
cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
dnsDomain: cluster.local
podSubnet: 10.0.0.0/14
serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs
"
First of all, according to the docs - please note that Calico and kubeadm support Centos/RHEL 7+.
In both Calico and kubeadm documentation we can see that they only support RHEL7+.
By default RHEL8 uses nftables instead of iptables ( we can still use iptables but "iptables" on RHEL8 is actually using the kernel's nft framework in the background - look at "Running Iptables on RHEL 8").
9.2.1. nftables replaces iptables as the default network packet filtering framework
I believe that nftables may cause this network issues because as we can find on nftables adoption page:
Kubernetes does not support nftables yet.
Note: For now I highly recommend you to use RHEL7 instead of RHEL8.
With that in mind, I'll present some information that may help you with RHEL8.
I have reproduced your issue and found a solution that works for me.
First I opened ports required by Calico - these ports can be found
here under "Network requirements".
As workaround:
Next I reverted to the old iptables backend on all cluster
nodes, you can easily do so by setting FirewallBackend in
/etc/firewalld/firewalld.conf to iptables as described
here.
Finally I restarted firewalld to make the new rules active.
I've tried nslookup from Pod running on worker node (kworker) and it seems to work correctly.
root#kmaster:~# kubectl get pod,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/web 1/1 Running 0 112s 10.99.32.1 kworker <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.99.0.1 <none> 443/TCP 5m51s <none>
root#kmaster:~# kubectl exec -it web -- bash
root#web:/# nslookup kubernetes.default
Server: 10.99.0.10
Address: 10.99.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.99.0.1
root#web:/#
In my situation, we're using the K3S cluster. And the new agent couldn't make the default(ClusterFirst) DNS query. After lots of research, I found I need to change the kube-proxy cluster-cidr args to make the DNS work successfully.
Hope this info is useful for others.
I ran into the same issue setting up a vanilla kubeadm 1.25 cluster on RHEL8 and #matt_j's answer lead me to another solution that avoids nftables by using ipvs mode in kube-proxy.
Just modify the kube-proxy ConfigMap in kube-system namespace so the config.conf file has this value;
...
data:
config.conf:
...
mode: "ipvs"
...
And ensure kube-proxy or your nodes are restarted.

Can not connect to external location inside a Kubernetes Pod - DNS Issues

i have the following problem. I have a namespace "qa". Pods inside this namespace can communicate with each other.
For Example
kubectl exec -it qa-file-watcher-85575bd8f7-npkns -n qa /bin/bash
root#qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup qa-kafka-broker
root#qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup qa-kafka-broker
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: qa-kafka-broker.qa.svc.cluster.local
Address: 10.102.218.167
But if i try to connect to an external service e.g. 8.8.8.8 oder security.debian.org for apt-get update i get the following errors
root#qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup 8.8.8.8
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find 8.8.8.8.in-addr.arpa: SERVFAIL
root#qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup security.debian.org
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find security.debian.org.eu-central-1.compute.internal: SERVFAIL
Here are some informations about the setup. I use a bitnami/kubernetes image on a EC2-instance on AWS
bitnami#ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
bitnami#ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
bitnami#ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 172.30.0.2
search xxxxxxxx.compute.internal default.svc.cluster.local svc.cluster.local cluster.local deb.debian.org
options ndots:5 single request-reopen
DNS=8.8.8.8
there are running coredns on the kubernetes with the following config
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
Corefile: |
.:53 {
log
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
metadata:
creationTimestamp: "2020-02-25T12:52:17Z"
name: coredns
namespace: kube-system
resourceVersion: "31099780"
selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
uid: 26a6800a-2ceb-4f29-ab85-82beaec0add8
Anyone has an idea what is going wrong here? If more detailed informations are needed pleas let me know.
Greetings and Thanks
EDIT:
this are the pods which are running on the namespace kube-system
bitnami#ip-172-30-0-120:~/deployments/qa-deployment$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-5glwz 1/1 Running 0 151m
coredns-6955765f44-hf2hd 1/1 Running 0 151m
etcd-ip-172-30-0-120 1/1 Running 4 9d
heapster-744b794df7-v2vz9 1/1 Running 1 9d
kube-apiserver-ip-172-30-0-120 1/1 Running 4 9d
kube-controller-manager-ip-172-30-0-120 1/1 Running 7 9d
kube-proxy-lfstn 1/1 Running 1 9d
kube-scheduler-ip-172-30-0-120 1/1 Running 6 9d
kubernetes-dashboard-8f7798644-m7r8x 1/1 Running 13 9d
kubernetes-metrics-scraper-6b97c6d857-nl98d 1/1 Running 0 8d
local-volume-provisioner-69vrv 1/1 Running 33 9d
monitoring-grafana-845bc8df5f-62d4x 1/1 Running 1 9d
monitoring-influxdb-56d9446bd9-wlrd5 1/1 Running 1 9d
nginx-ingress-controller-574d4c9dcf-fmdgm 1/1 Running 1 9d
registry-86c45b9d9b-pm6zj 1/1 Running 0 7d23h
weave-net-g78mj 2/2 Running 5 9d
and this is the log from the core-dns
...
...
...
[INFO] 10.32.0.35:49254 - 6294 "AAAA IN monitoring.xxxxxx.de.qa.svc.cluster.local. udp 66 false 512" NXDOMAIN qr,aa,rd 159 0.000297909s
[INFO] 10.32.0.35:55396 - 52809 "A IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000152558s
[INFO] 10.32.0.35:55396 - 36432 "AAAA IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000384192s
[INFO] 10.32.0.31:54436 - 61896 "AAAA IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000274796s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. AAAA: read udp 10.32.0.30:41402->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 64312 "A IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000270418s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. A: read udp 10.32.0.30:43606->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 8384 "AAAA IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000560668s
[INFO] 10.32.0.31:54436 - 60087 "A IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000566155s
EDIT2:
I cant go inside the coredns pod with
bitnami#ip-172-30-0-120:~/deployments/qa-deployment$ kubectl exec -it coredns-6955765f44-5glwz -n kube-system bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "2a604d5b8cfad5341acc0d548412f8376fdf063bf97d92d1aaa501841f959671": OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown
The resolve.conf inside the pod file-watcher-service in the namespace qa:
root#qa-file-watcher-service-7b7d47c67d-fjb8m:/etc# cat resolv.conf
search qa.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal default.svc.cluster.local
nameserver 10.96.0.10
options ndots:5

"nslookup: read: Connection refused" from inside of a pod in Kubernetes (K8S) cluster (DNS problem)

Problem
I have custom installation of k8s cluster with 1 master and 1 node on AWS ec2 based on Centos 7. It uses Core-DNS (pods running fine with no errors in logs)
Inside of a node pod when calling e.g. nslookup google.com
the output is nslookup: write to '10.96.0.10': Connection refused
;; connection timed out; no servers could be reached
For example, pinging inside of a pod ping 8.8.8.8 works fine:
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=50 time=1.330 ms
/etc/resolv.conf inside a pod it looks like:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
This command works fine from the node itself nslookup google.com:
Server: 172.31.0.2
Address: 172.31.0.2#53
Non-authoritative answer:
Name: google.com
Address: 172.217.15.110
Name: google.com
Address: 2607:f8b0:4004:801::200e
Kubelet config kubectl get configmap kubelet-config-1.17 -n kube-system -o yaml returns
data:
kubelet: |
apiVersion: kubelet.config.k8s.io/v1beta1
authentication:
anonymous:
enabled: false
webhook:
cacheTTL: 0s
enabled: true
x509:
clientCAFile: /etc/kubernetes/pki/ca.crt
authorization:
mode: Webhook
webhook:
cacheAuthorizedTTL: 0s
cacheUnauthorizedTTL: 0s
clusterDNS:
- 10.96.0.10
clusterDomain: cluster.local
cpuManagerReconcilePeriod: 0s
evictionPressureTransitionPeriod: 0s
fileCheckFrequency: 0s
healthzBindAddress: 127.0.0.1
healthzPort: 10248
httpCheckFrequency: 0s
imageMinimumGCAge: 0s
kind: KubeletConfiguration
nodeStatusReportFrequency: 0s
nodeStatusUpdateFrequency: 0s
rotateCertificates: true
runtimeRequestTimeout: 0s
staticPodPath: /etc/kubernetes/manifests
streamingConnectionIdleTimeout: 0s
syncFrequency: 0s
volumeStatsAggPeriod: 0s
kind: ConfigMap
Pods in kube namespace kubectl get pods -n kube-system look like this:
coredns-6955765f44-qdbgx 1/1 Running 6 11d
coredns-6955765f44-r4v7n 1/1 Running 6 11d
etcd-ip-172-31-42-121.ec2.internal 1/1 Running 7 11d
kube-apiserver-ip-172-31-42-121.ec2.internal 1/1 Running 7 11d
kube-controller-manager-ip-172-31-42-121.ec2.internal 1/1 Running 6 11d
kube-proxy-lrpd9 1/1 Running 6 11d
kube-proxy-z55cv 1/1 Running 6 11d
kube-scheduler-ip-172-31-42-121.ec2.internal 1/1 Running 6 11d
weave-net-bdn5n 2/2 Running 0 39h
weave-net-z7mks 2/2 Running 5 39h
UPDATE
From the pod if I do ip route it returns:
default via 10.32.0.1 dev eth0
10.32.0.0/12 dev eth0 scope link src 10.32.0.16
From master:
default via 172.31.32.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.32.0/20 dev eth0 proto kernel scope link src 172.31.42.121
From node:
default via 172.31.32.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.32.0/20 dev eth0 proto kernel scope link src 172.31.46.62
CoreDNS configmap kubectl -n kube-system get configmap coredns -oyaml is:
apiVersion: v1
data:
Corefile: |
.:53 {
log
errors
health {
lameduck 5s
}
ready
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
fallthrough in-addr.arpa ip6.arpa
ttl 30
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
kind: ConfigMap
So why nslookup google.com doesn't work inside of a pod??
Installation of k8s cluster was wrong, ansible script should contain correct private IPs of master and nodes on ec2 vms.
dev-kubernetes-master ansible_host=34.233.207.xxx private_ip=172.31.37.xx
dev-kubernetes-slave ansible_host=52.6.10.xxx private_ip=172.31.42.xxx
I've reinstalled cluster with correct private ips specified (before there was no private ip at all) and the problem has gone.

coredns do not resolve service name correctly

i use Kubernetes v1.11.3 ,it use coredns to resolve host or service name,but i find in pod ,the resolve not work correctly,
# kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50d <none>
kube-system calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 50d k8s-app=calico-etcd
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 50d k8s-app=kube-dns
kube-system kubelet ClusterIP None <none> 10250/TCP 32d <none>
testalex grafana NodePort 10.96.51.173 <none> 3000:30002/TCP 2d app=grafana
testalex k8s-alert NodePort 10.108.150.47 <none> 9093:30093/TCP 13m app=alertmanager
testalex prometheus NodePort 10.96.182.108 <none> 9090:30090/TCP 16m app=prometheus
following command no response
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping k8s-alert
PING k8s-alert.testalex.svc.cluster.local (10.108.150.47) 56(84) bytes of data.
and no cordons output log
# kubectl logs coredns-78fcdf6894-h78sd -n kube-system
i think maybe something is wrong,but i can not locate the problem,another question is why the two coredns pods on the master node,it suppose to one on each node
UPDATE
it seems coredns work fine ,but i do not understand the ping command no return
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.121.3:53,192.168.121.4:53,192.168.121.3:53 + 1 more... 50d
also dns server can not be reached
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8000ms
i think maybe i misconfig the network
this is my cluster init command
kubeadm init --kubernetes-version=v1.11.3 --apiserver-advertise-address=10.100.1.20 --pod-network-cidr=172.16.0.0/16
and this is calico ip pool set
# kubectl exec -it calico-node-77m9l -n kube-system /bin/sh
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-77m9l -n kube-system' to see all of the containers in this pod.
/ # cd /tmp
/tmp # ls
calicoctl tunl-ip
/tmp # ./calicoctl get ipPool
CIDR
192.168.0.0/16
You can start by checking if the dns is working
Run the nslookup on kubernetes.default from inside the pod k8s-monitor-7ddcb74b87-n6jsd, check if it is working.
[root#k8s-monitor-7ddcb74b87-n6jsd /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If this returns output that means everything is working from the coredns. If output is not okay, then look into the the resolve.conf inside the pod k8s-monitor-7ddcb74b87-n6jsd, it should return output something like this:
[root#metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
At last check the coredns endpoints are exposed using:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify if queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns
Hope this helps.
EDIT:
You might be having this issue, Please have a look:
https://github.com/kubernetes/kubeadm/issues/1056
You cannot ping ipaddress or hostname of service cluster always,since it is virtual ip
service’s cluster IP is a virtual IP, and only has meaning when combined with the service port.You can try the same via srv recored(combination of virtual ip and port)(refer kubernetes in action by mark luksa)
Thanks for the answer. This is the output. IP-s certainly not real.
[root#master ~]# nslookup kubernetes.default
Server: 203.150.92.12
Address: 203.150.92.12#53
** server can't find kubernetes.default: NXDOMAIN
[root#master ~]# kubectl cluster-info
Kubernetes master is running at https://203.150.72.81:6443
coredns is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
kubernetes-dashboard is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
metrics-server is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#master ~]# cat /etc/resolv.conf
search invalid
nameserver 203.150.92.12
nameserver 203.150.92.10
nameserver 1111:c207::2:55
[root#master ~]# kubectl get ep kube-dns --namespace=kube-system
Error from server (NotFound): endpoints "kube-dns" not found
[root#master ~]#
I think the reason why you cannot get ping working is because you are using iptables to redirect the request to service cluster IP to the correct pods. The iptables rule will only redirect the traffic to the service cluster IP with the exported ports. The icmp request is never been redirected to the real endpoints.