CoreDNS does not resolve service names correctly - Kubernetes

I use Kubernetes v1.11.3, which uses CoreDNS to resolve host and service names, but I find that resolution does not work correctly inside pods:
# kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50d <none>
kube-system calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 50d k8s-app=calico-etcd
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 50d k8s-app=kube-dns
kube-system kubelet ClusterIP None <none> 10250/TCP 32d <none>
testalex grafana NodePort 10.96.51.173 <none> 3000:30002/TCP 2d app=grafana
testalex k8s-alert NodePort 10.108.150.47 <none> 9093:30093/TCP 13m app=alertmanager
testalex prometheus NodePort 10.96.182.108 <none> 9090:30090/TCP 16m app=prometheus
The following command hangs with no response:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping k8s-alert
PING k8s-alert.testalex.svc.cluster.local (10.108.150.47) 56(84) bytes of data.
and there is no output in the CoreDNS log:
# kubectl logs coredns-78fcdf6894-h78sd -n kube-system
I think something may be wrong, but I cannot locate the problem. Another question: why are both CoreDNS pods on the master node? I expected one on each node.
UPDATE
It seems CoreDNS works fine, but I do not understand why the ping command gets no reply.
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
[root#k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.121.3:53,192.168.121.4:53,192.168.121.3:53 + 1 more... 50d
Also, the DNS server cannot be reached:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root#k8s-monitor-7ddcb74b87-n6jsd /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[root#k8s-monitor-7ddcb74b87-n6jsd /]# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8000ms
I think maybe I misconfigured the network.
This is my cluster init command:
kubeadm init --kubernetes-version=v1.11.3 --apiserver-advertise-address=10.100.1.20 --pod-network-cidr=172.16.0.0/16
and this is the Calico IP pool:
# kubectl exec -it calico-node-77m9l -n kube-system /bin/sh
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-77m9l -n kube-system' to see all of the containers in this pod.
/ # cd /tmp
/tmp # ls
calicoctl tunl-ip
/tmp # ./calicoctl get ipPool
CIDR
192.168.0.0/16

You can start by checking whether DNS is working.
Run nslookup kubernetes.default from inside the pod k8s-monitor-7ddcb74b87-n6jsd and check whether it works:
[root#k8s-monitor-7ddcb74b87-n6jsd /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If this returns output, that means CoreDNS is working. If the output is not okay, then look at the resolv.conf inside the pod k8s-monitor-7ddcb74b87-n6jsd; it should look something like this:
[root#metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
Finally, check that the CoreDNS endpoints are exposed using:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify whether queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns.
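For example (a minimal sketch; keep the rest of your existing Corefile as it is), edit the ConfigMap with
kubectl -n kube-system edit configmap coredns
and add log at the top of the server block, so it starts like this:
.:53 {
    log
    errors
    ...
}
After a minute or so, run a lookup from the pod again and check kubectl logs on the CoreDNS pods for query lines.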
Hope this helps.
EDIT:
You might be having this issue, Please have a look:
https://github.com/kubernetes/kubeadm/issues/1056

You cannot always ping the IP address or hostname of a cluster Service, since it is a virtual IP.
A Service's cluster IP is a virtual IP and only has meaning when combined with the service port. You can try the same via an SRV record (a combination of the virtual IP and port) (see Kubernetes in Action by Marko Lukša).
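For example (a sketch only; the port name "http" and the tools available inside the pod are assumptions, while the service name and port come from the listing above), test the service's TCP port instead of ICMP, or query the SRV record for a named port:
curl -v http://k8s-alert.testalex.svc.cluster.local:9093/
nslookup -type=SRV _http._tcp.k8s-alert.testalex.svc.cluster.local
If the TCP connection succeeds while ping times out, the service and DNS are fine and only ICMP goes unanswered.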

Thanks for the answer. This is the output (the IPs are certainly not real).
[root#master ~]# nslookup kubernetes.default
Server: 203.150.92.12
Address: 203.150.92.12#53
** server can't find kubernetes.default: NXDOMAIN
[root#master ~]# kubectl cluster-info
Kubernetes master is running at https://203.150.72.81:6443
coredns is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
kubernetes-dashboard is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
metrics-server is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root#master ~]# cat /etc/resolv.conf
search invalid
nameserver 203.150.92.12
nameserver 203.150.92.10
nameserver 1111:c207::2:55
[root#master ~]# kubectl get ep kube-dns --namespace=kube-system
Error from server (NotFound): endpoints "kube-dns" not found
[root#master ~]#

I think the reason ping does not work is that iptables is used to redirect requests sent to the service cluster IP to the correct pods. The iptables rules only redirect traffic to the service cluster IP on the exposed ports; an ICMP echo request is never redirected to the real endpoints.
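You can see this on a node (assuming kube-proxy runs in iptables mode and you have root): list the NAT rules kube-proxy created for the service IP, and you will find entries only for the declared TCP/UDP ports, nothing that would match ICMP:
sudo iptables -t nat -L KUBE-SERVICES -n | grep 10.108.150.47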

Related

kube service domain name not working, but clusterIP does work with Jupyter Enterprise Gateway

I have a Jupyter notebook setup in the jupyter namespace on a kubernetes cluster, and Jupyter Enterprise Gateway setup in the enterprise-gateway namespace as a Service in the same cluster.
If I configure the notebook to connect to the enterprise-gateway service using the clusterIP it works fine.
--gateway-url=http://172.20.186.249:8888
but if I switch to using the service domain name the notebook receives a 503 Connection Refused error
--gateway-url=http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888
When I use busybox to check the Kubernetes DNS, the domain resolves as expected.
kubectl -n default exec -ti busybox nslookup enterprise-gateway.enterprise-gateway
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server: 172.20.0.10
Address 1: 172.20.0.10 kube-dns.kube-system.svc.cluster.local
Name: enterprise-gateway.enterprise-gateway
Address 1: 172.20.186.249 enterprise-gateway.enterprise-gateway.svc.cluster.local
How do I get the domain name to work?
The Service config for the JEG looks like this...
kubectl describe svc enterprise-gateway --namespace enterprise-gateway
Name: enterprise-gateway
Namespace: enterprise-gateway
Labels: app=enterprise-gateway
app.kubernetes.io/managed-by=Helm
chart=enterprise-gateway-2.6.0
component=enterprise-gateway
heritage=Helm
release=enterprise-gateway
Annotations: meta.helm.sh/release-name: enterprise-gateway
meta.helm.sh/release-namespace: enterprise-gateway
Selector: app=enterprise-gateway
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.250.15
IPs: 172.20.250.15
Port: http 8888/TCP
TargetPort: 8888/TCP
NodePort: http 31366/TCP
Endpoints: 10.1.16.136:8888,10.1.2.228:8888,10.1.30.90:8888
Port: response 8877/TCP
TargetPort: 8877/TCP
NodePort: response 31201/TCP
Endpoints: 10.1.16.136:8877,10.1.2.228:8877,10.1.30.90:8877
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>
OK, I don't know where to start; I have a bunch of findings. I will start with the eye-catching one. I have a working test project I can share later on, and I can elaborate more in this answer if needed.
Step 1
I see a mismatch in your IPs. The DNS lookup did not resolve the service DNS name to the correct IP.
Address 1: 172.20.186.249 is different than IP: 172.20.250.15
To debug DNS:
kubectl exec "YOURPODNAME" cat /etc/resolv.conf
Verify that a search path and a name server are set up correctly
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Check if the kube-dns/CoreDNS pods are running:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
....
kube-dns-86f4d74b45-2qkfd 3/3 Running 232 133d
kube-proxy-b2frq 1/1 Running 0 15m
...
If the pod is running, there might be something wrong with the global DNS service
kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP
You might also need to check whether DNS endpoints are exposed:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 172.17.0.5:53,172.17.0.5:53 133d
These debugging actions will usually indicate the problem with your DNS configuration, or it will simply show you that a DNS add-on should be enabled in your cluster configuration.
Step 2
When using busybox to check the Kubernetes DNS, this seems incorrect: looking at Address 1: 172.20.186.249, I'm expecting to get an IP like 10.X.X.X.
1. Install dnsutils on the pod as shown below:
kubectl exec --stdin --tty "YOURPODNAME" -- sh -c "apt update && apt-get -y install dnsutils"
2. Open a shell in the pod:
kubectl exec -it "YOURPODNAME" -- /bin/bash
3. Inside the pod, again (weird), run apt-get install dnsutils.
4. Stay inside the pod and run nslookup "YOURSERVICENAME"; you will get an IP and a Name (DNS).
5. Check this IP, since it needs to match the IP in the service description: kubectl describe svc "YOURSERVICENAME" shows it, and it should be the same as in #4.
Step 3
Once you have Step 2 solved, you will be able to use the service name (FQDN) returned in Step 2, item #4.
To be continued...

Kubernetes DNS lookup not working from worker node - connection timed out; no servers could be reached

I have built a new Kubernetes cluster, v1.20.1, with a single master and a single worker node, using the Calico CNI.
I deployed the busybox pod in default namespace.
# kubectl get pods busybox -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 12m 10.203.0.129 node02 <none> <none>
nslookup not working
kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
The cluster is running RHEL 8 with the latest updates.
I followed these steps: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
The nslookup command is not able to reach the nameserver:
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
The resolv.conf file:
# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
DNS pods running
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-472vx 1/1 Running 1 85m
coredns-74ff55c5b-c75bq 1/1 Running 1 85m
DNS pod logs
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
Service is defined
# kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 86m
I can see the endpoints of the DNS pods:
# kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more... 86m
I enabled logging, but didn't see traffic coming to the DNS pods:
# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
I can ping the DNS pod:
# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms
nmap shows the ports as filtered:
# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
53/tcp filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper
Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds
If I schedule the pod on the master node, nslookup works and nmap shows the port as open:
# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).
PORT STATE SERVICE
53/tcp open domain
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds
Why is nslookup not working from a pod running on the worker node? How do I troubleshoot this issue?
I rebuilt the servers two times, still the same issue.
Thanks
SR
Update: adding the kubeadm config file
# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
  - system:bootstrappers:kubeadm:default-node-token
  token: abcdef.0123456789abcdef
  ttl: 24h0m0s
  usages:
  - signing
  - authentication
kind: InitConfiguration
nodeRegistration:
  criSocket: unix:///run/containerd/containerd.sock
  taints:
  - effect: NoSchedule
    key: node-role.kubernetes.io/master
  kubeletExtraArgs:
    cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
  dnsDomain: cluster.local
  podSubnet: 10.0.0.0/14
  serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs"
First of all, according to the docs, please note that Calico and kubeadm support CentOS/RHEL 7+.
In both the Calico and kubeadm documentation we can see that they only list RHEL 7 as supported.
By default, RHEL 8 uses nftables instead of iptables (we can still use iptables, but "iptables" on RHEL 8 is actually using the kernel's nft framework in the background; see "Running Iptables on RHEL 8").
9.2.1. nftables replaces iptables as the default network packet filtering framework
I believe that nftables may cause these network issues because, as we can find on the nftables adoption page:
Kubernetes does not support nftables yet.
Note: for now, I highly recommend using RHEL 7 instead of RHEL 8.
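You can quickly check which backend the iptables binary on a node is actually using (the output format varies slightly by version):
iptables --version
On RHEL 8 this typically prints something like iptables v1.8.x (nf_tables), whereas the legacy backend prints (legacy).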
With that in mind, I'll present some information that may help you with RHEL8.
I have reproduced your issue and found a solution that works for me.
First I opened the ports required by Calico; these ports can be found here under "Network requirements".
As a workaround:
Next I reverted to the old iptables backend on all cluster nodes; you can easily do so by setting FirewallBackend in /etc/firewalld/firewalld.conf to iptables, as described here.
Finally I restarted firewalld to make the new rules active.
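As a concrete sketch of that workaround (assuming firewalld is the active firewall; run this on every node):
sudo sed -i 's/^FirewallBackend=.*/FirewallBackend=iptables/' /etc/firewalld/firewalld.conf
sudo systemctl restart firewalld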
I've tried nslookup from Pod running on worker node (kworker) and it seems to work correctly.
root#kmaster:~# kubectl get pod,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/web 1/1 Running 0 112s 10.99.32.1 kworker <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.99.0.1 <none> 443/TCP 5m51s <none>
root#kmaster:~# kubectl exec -it web -- bash
root#web:/# nslookup kubernetes.default
Server: 10.99.0.10
Address: 10.99.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.99.0.1
root#web:/#
In my situation we are using a K3s cluster, and a new agent couldn't make the default (ClusterFirst) DNS queries. After lots of research, I found I needed to change the kube-proxy cluster-cidr argument to make DNS work successfully.
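For illustration only (the flag value is a placeholder; K3s passes extra kube-proxy arguments through --kube-proxy-arg, and its default pod CIDR is 10.42.0.0/16), the override looks roughly like this on an agent:
k3s agent --kube-proxy-arg=cluster-cidr=10.42.0.0/16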
Hope this info is useful for others.
I ran into the same issue setting up a vanilla kubeadm 1.25 cluster on RHEL 8, and #matt_j's answer led me to another solution that avoids nftables by using ipvs mode in kube-proxy.
Just modify the kube-proxy ConfigMap in the kube-system namespace so the config.conf file has this value:
...
data:
  config.conf: |-
    ...
    mode: "ipvs"
    ...
And ensure kube-proxy or your nodes are restarted.
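A minimal sketch of that change with kubectl (the ConfigMap and DaemonSet names are the kubeadm defaults):
kubectl -n kube-system edit configmap kube-proxy
kubectl -n kube-system rollout restart daemonset kube-proxy
The first command is where you set mode: "ipvs" in config.conf; the second recreates the kube-proxy pods so they pick the change up.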

How can I get CoreDNS to resolve on my Raspberry Pi Kubernetes cluster?

I've followed a number of online tutorials to set up a Kubernetes cluster on four Raspberry Pi 4s. I ended up using Flannel as the networking plugin as that seems to be the only one that actually works on RPi, with a pod network CIDR of 10.244.0.0/16, per this guide from 2017. Most everything is working... all of the base pods in the kube-system namespace are running/healthy, and I can pull down images and launch new containers. At first I wasn't able to get any pod logs, but that was quickly remedied by opening up port 10250 on each node.
But there still seems to be a problem with DNS resolution. I should clarify that DNS resolution on the hosts clearly does work, as the cluster is able to download any container image I specify. But once a container is running, it isn't able to "dial out" to anything. As a test, I'm running the arm32v7/buildpack-deps:latest container in a pod. It pulls the image from Docker hub just fine. But when I shell into it and simply type curl https://www.google.com it hangs before eventually timing out. And the same is true of any pod I launch that needs to interact with the external Internet: they hang and hang and hang.
Here are all the networking-related commands I've already run on each node:
sudo iptables -P FORWARD ACCEPT
sudo iptables -A FORWARD -i cni0 -j ACCEPT
sudo iptables -A FORWARD -o cni0 -j ACCEPT
sudo ufw allow ssh
sudo ufw allow 443 # can't remember why i ran this one
sudo ufw allow 6443
sudo ufw allow 8080 # this one might not be strictly necessary, either
sudo ufw allow 10250
sudo ufw default allow routed
sudo ufw enable
I'm not entirely sure that the last two iptables commands did anything; I grabbed them from the comment section of that guide I linked to earlier. I know that guide assumes one is using kube-dns but it's also 3 years old so I am using the (newer) default, coredns, instead.
What am I missing? I feel like I'm so close to having this cluster fully operational, but obviously I need functioning DNS!
UPDATE: I know that it's a DNS problem, and not general Internet connectivity, for two reasons: (1) the cluster itself can pull down any image I specify from Dockerhub, and (2) when I shell into a running container that has curl and execute curl -H "Host: www.google.com" 142.250.73.206, it successfully returns the Google homepage HTML. But as mentioned if I try and do my earlier curl command using the hostname, that times out.
Create a simple Pod to use as a test environment for DNS diagnosing:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
      - sleep
      - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
kubectl apply -f dnsutils.yaml
Check the status of Pod
$ kubectl get pods dnsutils
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 0 <some-time>
Once that Pod is running, you can exec nslookup in that environment. If you see something like the following, DNS is working correctly.
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
Name: kubernetes.default
Address 1: 10.0.0.1
If the nslookup command fails, check the following:
Take a look inside the resolv.conf file.
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
Verify that the search path and name server are set up like the following (note that search path may vary for different cloud providers):
search default.svc.cluster.local svc.cluster.local cluster.local google.internal c.gce_project_id.internal
nameserver 10.0.0.10
options ndots:5
Errors such as the following indicate a problem with the CoreDNS (or kube-dns) add-on or with associated Services:
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
nslookup: can't resolve 'kubernetes.default'
OR
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
Check if the DNS pod is running
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
...
coredns-7b96bf9f76-5hsxb 1/1 Running 0 1h
coredns-7b96bf9f76-mvmmt 1/1 Running 0 1h
...
Check for errors in the DNS pod
Here is an example of a healthy CoreDNS log:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
2018/08/15 14:37:17 [INFO] CoreDNS-1.2.2
2018/08/15 14:37:17 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.2
linux/amd64, go1.10.3, 2e322f6
2018/08/15 14:37:17 [INFO] plugin/reload: Running configuration MD5 = 24e6c59e83ce706f07bcc82c31b1ea1c
Verify that the DNS service is up by using the kubectl get service command.
$ kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
...
kube-dns ClusterIP 10.0.0.10 <none> 53/UDP,53/TCP 1h
...
You can verify that DNS endpoints are exposed by using the kubectl get endpoints command.
$ kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify if queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns. To edit it, use the command:
$ kubectl -n kube-system edit configmap coredns
Then add log in the Corefile section per the example below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
After saving the changes, it may take up to a minute or two for Kubernetes to propagate these changes to the CoreDNS pods.
Next, make some queries and view the logs per the sections above in this document. If CoreDNS pods are receiving the queries, you should see them in the logs.
Here is an example of a query in the log:
.:53
2018/08/15 14:37:15 [INFO] CoreDNS-1.2.0
2018/08/15 14:37:15 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.0
linux/amd64, go1.10.3, 2e322f6
2018/09/07 15:29:04 [INFO] plugin/reload: Running configuration MD5 = 162475cdf272d8aa601e6fe67a6ad42f
2018/09/07 15:29:04 [INFO] Reloading complete
172.17.0.18:41675 - [07/Sep/2018:15:29:11 +0000] 59925 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000066649s
As pointed out in the comments: The configuration of kubeadm seems fine.
Your pods have the correct /etc/resolv.conf and they should work.
It's pretty hard to clearly determine the problem; many things could be happening here.
My guess: there is something not right with ufw.
You can easily prove it: disable ufw on all nodes (with ufw disable).
I'm not a hundred percent sure which ports are needed. I'm using iptables for my single-node k8s, and at the start I had many problems with FORWARD vs INPUT rules. In Docker all ports are forwarded.
So I guess there is something wrong with the FORWARD rules and/or the DNS ports (53/udp and 53/tcp).
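A quick way to test that theory (a sketch; it assumes ufw is the only firewall in play and should be run on every node):
sudo ufw disable
If DNS starts working with ufw off, re-enable it and open the DNS ports instead:
sudo ufw allow 53/udp
sudo ufw allow 53/tcp
sudo ufw default allow routed
sudo ufw enable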
Good luck.

Kubespray : Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all the changes I did in the k8s-cluster.yml file.
All the pods are running, and so are the services:
[centos#cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 46h
default netchecker-service NodePort 10.233.13.213 <none> 8081:31081/TCP 46h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 46h
kube-system dashboard-metrics-scraper ClusterIP 10.233.59.12 <none> 8000/TCP 46h
kube-system kubernetes-dashboard ClusterIP 10.233.63.20 <none> 443/TCP 46h
But the netchecker API gives the following answer:
[root#localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent :
[centos#cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos#cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolution with the following:
[centos#cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos#cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of the nodelocaldns, but it doesn't seem to query the coredns service deployed.
When I use nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, with the coredns IP, I get a correct answer.
What can be wrong with my configuration?
Thanks in advance
UPDATE: The Flannel plugin has an issue, and a fix has to be applied on all nodes of the cluster. Once done, the pods successfully report back to the netchecker server.

Kubernetes pods can't ping each other using ClusterIP

I'm trying to ping the kube-dns service from a dnstools pod using the cluster IP assigned to the kube-dns service. The ping request times out. From the same dnstools pod, I tried to curl the kube-dns service using the exposed port, but that timed out as well.
Following is the output of kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/busybox 1/1 Running 62 2d14h 192.168.1.37 kubenode <none>
default pod/dnstools 1/1 Running 0 2d13h 192.168.1.45 kubenode <none>
default pod/nginx-deploy-7c45b84548-ckqzb 1/1 Running 0 6d11h 192.168.1.5 kubenode <none>
default pod/nginx-deploy-7c45b84548-vl4kh 1/1 Running 0 6d11h 192.168.1.4 kubenode <none>
dmi pod/elastic-deploy-5d7c85b8c-btptq 1/1 Running 0 2d14h 192.168.1.39 kubenode <none>
kube-system pod/calico-node-68lc7 2/2 Running 0 6d11h 10.62.194.5 kubenode <none>
kube-system pod/calico-node-9c2jz 2/2 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/coredns-5c98db65d4-5nprd 1/1 Running 0 6d12h 192.168.0.2 kubemaster <none>
kube-system pod/coredns-5c98db65d4-5vw95 1/1 Running 0 6d12h 192.168.0.3 kubemaster <none>
kube-system pod/etcd-kubemaster 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-apiserver-kubemaster 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-controller-manager-kubemaster 1/1 Running 1 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-proxy-9hcgv 1/1 Running 0 6d11h 10.62.194.5 kubenode <none>
kube-system pod/kube-proxy-bxw9s 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-scheduler-kubemaster 1/1 Running 1 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/tiller-deploy-767d9b9584-5k95j 1/1 Running 0 3d9h 192.168.1.8 kubenode <none>
nginx-ingress pod/nginx-ingress-66wts 1/1 Running 0 5d17h 192.168.1.6 kubenode <none>
In the above output, why do some pods have an IP assigned in the 192.168.0.0/24 subnet whereas others have an IP that is equal to the IP address of my node/master? (10.62.194.4 is the IP of my master, 10.62.194.5 is the IP of my node)
This is the config.yml I used to initialize the cluster using kubeadm init --config=config.yml
apiServer:
  certSANs:
  - 10.62.194.4
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: dev-cluster
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.15.1
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Result of kubectl get svc --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d12h <none>
default service/nginx-deploy ClusterIP 10.97.5.194 <none> 80/TCP 5d17h run=nginx
dmi service/elasticsearch ClusterIP 10.107.84.159 <none> 9200/TCP,9300/TCP 2d14h app=dmi,component=elasticse
dmi service/metric-server ClusterIP 10.106.117.2 <none> 8098/TCP 2d14h app=dmi,component=metric-se
kube-system service/calico-typha ClusterIP 10.97.201.232 <none> 5473/TCP 6d12h k8s-app=calico-typha
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d12h k8s-app=kube-dns
kube-system service/tiller-deploy ClusterIP 10.98.133.94 <none> 44134/TCP 3d9h app=helm,name=tiller
The command I ran was kubectl exec -ti dnstools -- curl 10.96.0.10:53
EDIT:
I raised this question because I got this error when trying to resolve service names from within the cluster. I was under the impression that I got this error because I cannot ping the DNS server from a pod.
Output of kubectl exec -ti dnstools -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
Output of kubectl exec dnstools cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local reddog.microsoft.com
options ndots:5
Result of kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.0.2:53,192.168.0.3:53,192.168.0.2:53 + 3 more... 6d13h
EDIT:
Ping-ing the CoreDNS pod directly using its Pod IP times out as well:
/ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
^C
--- 192.168.0.2 ping statistics ---
24 packets transmitted, 0 packets received, 100% packet loss
EDIT:
I think something has gone wrong when I was setting up the cluster. Below are the steps I took when setting up the cluster:
Edit host files on master and worker to include the IP's and hostnames of the nodes
Disabled swap using swapoff -a and disabled swap permanently by editing /etc/fstab
Install docker prerequisites using apt-get install apt-transport-https ca-certificates curl software-properties-common -y
Added Docker GPG key using curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
Added Docker repo using add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Install Docker using apt-get update -y; apt-get install docker-ce -y
Install Kubernetes prerequisites using curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
Added Kubernetes repo using echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update repo and install Kubernetes components using apt-get update -y; apt-get install kubelet kubeadm kubectl -y
Configure master node:
kubeadm init --apiserver-advertise-address=10.62.194.4 --apiserver-cert-extra-sans=10.62.194.4 --pod-network-cidr=192.168.0.0/16
Copy Kube config to $HOME: mkdir -p $HOME/.kube; sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config; sudo chown $(id -u):$(id -g) $HOME/.kube/config
Installed Calico using kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml; kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
On node:
On the node I did the kubeadm join command using the command printed out from kubeadm token create --print-join-command on the master
The Kubernetes system pods get assigned the host IP since they provide low-level services that are not dependent on an overlay network (or, in the case of Calico, even provide the overlay network). They have the IP of the node where they run.
A common pod uses the overlay network and gets assigned an IP from the Calico range, not from the metal node it runs on.
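You can confirm this yourself (a small check; the pod name is taken from the listing above):
kubectl -n kube-system get pod etcd-kubemaster -o jsonpath='{.spec.hostNetwork}'
It prints true for host-network pods; for ordinary pods on the overlay network the field is unset.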
You can't access DNS (port 53) with HTTP using curl. You can use dig to query a DNS resolver.
A service IP is not reachable by ping since it is a virtual IP just used as a routing handle for the iptables rules setup by kube-proxy, therefore a TCP connection works, but ICMP not.
You can ping a pod IP though, since it is assigned from the overlay network.
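For example (assuming the dnstools image ships dig, which it normally does), query the DNS service on its port instead of pinging it:
kubectl exec -ti dnstools -- dig @10.96.0.10 kubernetes.default.svc.cluster.local +short
A reply here proves the service IP and port are reachable even though ICMP is not.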
You should check from the same namespace.
Currently, you are in the default namespace and curl a service in the kube-system namespace.
If you check from the same namespace, I think it will work.
In some cases the local host that Elasticsearch publishes is not routable/accessible from other hosts. In these cases you will have to configure network.publish_host in the yml config file, in order for Elasticsearch to use and publish the right address.
Try configuring network.publish_host to the right public address.
See more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#advanced-network-settings
Note that control plane components like the API server and etcd that run on the master node are bound to the host network, and hence you see the IP address of the master server.
On the other hand, the apps that you deployed get their IPs from the pod subnet range; those differ from the cluster node IPs.
Try the steps below to test whether DNS is working or not:
deploy nginx.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
      volumes:
      - name: www
        emptyDir: {}
kubectl create -f nginx.yaml
master $ kubectl get po
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 1m
web-1 1/1 Running 0 1m
master $ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35m
nginx ClusterIP None <none> 80/TCP 2m
master $ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm
If you don't see a command prompt, try pressing enter.
/ # nslookup nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
Address 2: 10.40.0.2 web-1.nginx.default.svc.cluster.local
/ #
/ # nslookup web-0.nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
/ # nslookup web-0.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx.default.svc.cluster.local
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local