Kubernetes v1.18: DNS resolution records for Pods

The question concerns DNS resolution for Pods in Kubernetes. Here is a statement from the official doc (choose v1.18 from the top-right dropdown list):
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods
Pods.
A/AAAA records
Any pods created by a Deployment or DaemonSet have the following DNS resolution available:
pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example.
Here is my Kubernetes environment:
master $ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
After creating a simple deployment with kubectl create deploy nginx --image=nginx, I create a busybox pod in the test namespace to run nslookup, like this:
kubectl create ns test
cat <<EOF | kubectl apply -n test -f -
apiVersion: v1
kind: Pod
metadata:
  name: busybox1
  labels:
    name: busybox
spec:
  containers:
  - image: busybox:1.28
    command:
    - sleep
    - "3600"
    name: busybox
EOF
Then I run nslookup as follows, per the official doc's pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example:
master $ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-f89759699-h8cj9 1/1 Running 0 12m 10.244.1.4 node01 <none> <none>
master $ kubectl get deploy -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
nginx 1/1 1 1 17m nginx nginx app=nginx
master $ kubectl exec -it busybox1 -n test -- nslookup 10.244.1.4.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve '10.244.1.4.nginx.default.svc.cluster.local'
command terminated with exit code 1
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve '10-244-1-4.nginx.default.svc.cluster.local'
command terminated with exit code 1
Question 1:
Why did the nslookup for that name fail? Did I do something wrong?
Continuing to explore DNS names for pods, I tried this:
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.default.pod.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.default.pod.cluster.local
Address 1: 10.244.1.4
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.test.pod.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.test.pod.cluster.local
Address 1: 10.244.1.4
Question 2:
Why did nslookup 10-244-1-4.test.pod.cluster.local succeed even though the pod with IP 10.244.1.4 is in the default namespace?

Regarding your first question: as far as I could check, your assumptions are right, and it seems the documentation isn't accurate. The A/AAAA reference for pods is new in the 1.18 documentation, so I highly encourage you to open an issue here so the developers can take a closer look at it.
In the meantime, I recommend referring to the 1.17 documentation on this topic, as it reflects the actual behavior.
In 1.17 we can see this note:
Note: Because A or AAAA records are not created for Pod names, hostname is required for the Pod’s A or AAAA record to be created. A Pod with no hostname but with subdomain will only create the A or AAAA record for the headless service (default-subdomain.my-namespace.svc.cluster-domain.example), pointing to the Pod’s IP address. Also, Pod needs to become ready in order to have a record unless publishNotReadyAddresses=True is set on the Service.
As far as I could check, this is still true on 1.18, despite what the documentation says.
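If you want a per-Pod A record that actually resolves, the note above describes the recipe: a headless Service plus hostname and subdomain on the Pod. A minimal sketch of that setup (the names here are illustrative, not from your cluster):
cat <<EOF | kubectl apply -n test -f -
apiVersion: v1
kind: Service
metadata:
  name: default-subdomain
spec:
  selector:
    name: busybox
  clusterIP: None   # headless Service
  ports:
  - port: 1234      # port number is arbitrary for this illustration
---
apiVersion: v1
kind: Pod
metadata:
  name: busybox2
  labels:
    name: busybox
spec:
  hostname: busybox-2
  subdomain: default-subdomain   # must match the headless Service name
  containers:
  - image: busybox:1.28
    command:
    - sleep
    - "3600"
    name: busybox
EOF
If the 1.17 note holds, busybox-2.default-subdomain.test.svc.cluster.local should then resolve to the Pod's IP once the Pod is ready.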
Question two points in the same direction, and you can open an issue for it as well, but I personally don't see any practical reason to use IP-based DNS names. These names exist for Kubernetes internal use, and using them gives you no advantage.
The best approach is to use Service-based DNS names in Kubernetes; they have proven to be very reliable.
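For example, after exposing the nginx Deployment as a Service (a sketch; it assumes the default Service name nginx in the default namespace):
kubectl expose deploy nginx --port=80
kubectl exec -it busybox1 -n test -- nslookup nginx.default.svc.cluster.local
This name stays stable across Pod restarts, unlike any IP-derived record.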

For question 1, it may be a doc inaccuracy. If I create a ClusterIP Service for the deployment:
kubectl expose deploy nginx --name=front-end --port=80
Then I can see this name:
kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.front-end.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.front-end.default.svc.cluster.local
Address 1: 10.244.1.4 10-244-1-4.front-end.default.svc.cluster.local
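As a side note, the dashed label in that record is just the Pod IP with the dots replaced by dashes. A small sketch to derive it from the pod shown above:
POD_IP=$(kubectl get pod nginx-f89759699-h8cj9 -o jsonpath='{.status.podIP}')
echo "${POD_IP//./-}.front-end.default.svc.cluster.local"
# prints 10-244-1-4.front-end.default.svc.cluster.local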

Related

kube service domain name not working, but clusterIP does work with Jupyter Enterprise Gateway

I have a Jupyter notebook setup in the jupyter namespace on a kubernetes cluster, and Jupyter Enterprise Gateway setup in the enterprise-gateway namespace as a Service in the same cluster.
If I configure the notebook to connect to the enterprise-gateway service using the clusterIP it works fine.
--gateway-url=http://172.20.186.249:8888
but if I switch to using the service domain name the notebook receives a 503 Connection Refused error
--gateway-url=http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888
When I use a busybox pod to check the Kubernetes DNS, the domain resolves as expected.
kubectl -n default exec -ti busybox nslookup enterprise-gateway.enterprise-gateway
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server: 172.20.0.10
Address 1: 172.20.0.10 kube-dns.kube-system.svc.cluster.local
Name: enterprise-gateway.enterprise-gateway
Address 1: 172.20.186.249 enterprise-gateway.enterprise-gateway.svc.cluster.local
How do I get the domain name to work?
The Service config for the JEG looks like this...
kubectl describe svc enterprise-gateway --namespace enterprise-gateway
Name: enterprise-gateway
Namespace: enterprise-gateway
Labels: app=enterprise-gateway
app.kubernetes.io/managed-by=Helm
chart=enterprise-gateway-2.6.0
component=enterprise-gateway
heritage=Helm
release=enterprise-gateway
Annotations: meta.helm.sh/release-name: enterprise-gateway
meta.helm.sh/release-namespace: enterprise-gateway
Selector: app=enterprise-gateway
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.250.15
IPs: 172.20.250.15
Port: http 8888/TCP
TargetPort: 8888/TCP
NodePort: http 31366/TCP
Endpoints: 10.1.16.136:8888,10.1.2.228:8888,10.1.30.90:8888
Port: response 8877/TCP
TargetPort: 8877/TCP
NodePort: response 31201/TCP
Endpoints: 10.1.16.136:8877,10.1.2.228:8877,10.1.30.90:8877
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>
OK, I don't know where to start; I have a bunch of findings. I will start with the eye-catching one. I have a working test project I can share later on, and I can elaborate more in this answer if needed.
Step 1
1. I see a mismatch in your IPs: the DNS lookup did not resolve the service DNS name to the correct IP.
Address 1: 172.20.186.249 is different from IP: 172.20.250.15
To debug DNS:
kubectl exec "YOURPODNAME" cat /etc/resolv.conf
Verify that a search path and a name server are set up correctly
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Check if the kube-dns/CoreDNS pods are running:
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
....
kube-dns-86f4d74b45-2qkfd 3/3 Running 232 133d
kube-proxy-b2frq 1/1 Running 0 15m
...
If the pod is running, there might be something wrong with the global DNS Service:
kubectl get svc --namespace=kube-system
NAME       TYPE        CLUSTER-IP   EXTERNAL-IP   PORT(S)         AGE
kube-dns   ClusterIP   10.96.0.10   <none>        53/UDP,53/TCP
You might also need to check whether DNS endpoints are exposed:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 172.17.0.5:53,172.17.0.5:53 133d
These debugging actions will usually indicate the problem with your DNS configuration, or it will simply show you that a DNS add-on should be enabled in your cluster configuration.
Step 2
When using busybox to check the Kubernetes DNS, this looks incorrect: for Address 1: 172.20.186.249 I'm expecting an IP like 10.X.X.X.
Install dnsutils in the pod as shown below (the install has to run inside the pod, so wrap it in a shell):
kubectl exec --stdin --tty "YOURPODNAME" -- sh -c "apt update && apt-get -y install dnsutils"
Alternatively, open a shell in the pod and run the install there:
kubectl exec -it "YOURPODNAME" -- /bin/bash
apt update && apt-get -y install dnsutils
Stay inside the pod and run nslookup "YOURSERVICENAME"; you will get an IP and a Name (DNS).
Check this IP, since it needs to match the IP in the Service description:
kubectl describe svc "YOURSERVICENAME"; the IP field should match the address returned by nslookup (see the sketch below).
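To compare the two addresses without eyeballing the describe output, something like this should work ("YOURSERVICENAME" and "YOURPODNAME" are placeholders, as above):
kubectl get svc "YOURSERVICENAME" -o jsonpath='{.spec.clusterIP}'
kubectl exec -it "YOURPODNAME" -- nslookup "YOURSERVICENAME"
# the Address printed by nslookup should match the clusterIP printed above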
Step 3
Once Step 2 is solved, you will be able to use the Service name (FQDN) returned by nslookup in Step 2.

Can't resolve dns in kubernetes

I use the following commands to check the DNS issue in my k8s cluster:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
The nslookup result is:
;; connection timed out; no servers could be reached
command terminated with exit code 1
dnsutils.yaml:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
NOTE: these machines block all ports by default, so I asked our IT admin to open the required ports per the check-required-ports doc; I'm not sure if this matters.
And with the following I can get the pod IPs of CoreDNS:
kubectl get pods -n kube-system -o wide | grep core
coredns-7877db9d45-swb6c 1/1 Running 0 2m58s 10.244.1.8 node2 <none> <none>
coredns-7877db9d45-zwc8v 1/1 Running 0 2m57s 10.244.0.6 node1 <none> <none>
Here, 10.244.0.6 is on my master node while 10.244.1.8 is on my worker node.
Then if I directly specify coredns pod ip:
Querying the CoreDNS pod on the master node works:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.0.6
Server: 10.244.0.6
Address: 10.244.0.6#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
Querying the CoreDNS pod on the worker node does not:
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.1.8
;; connection timed out; no servers could be reached
command terminated with exit code 1
So the question narrows down to: why does CoreDNS on the worker node not work? Is there anything I need to pay attention to?
Environment:
OS: ubuntu18.04
K8S: v1.21.0
Cluster boot command:
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
Finally, I found the root cause: it is a hardware firewall issue. See this:
Firewalls
When using udp backend, flannel uses UDP port 8285 for sending encapsulated packets.
When using vxlan backend, kernel uses UDP port 8472 for sending encapsulated packets.
Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.
Make sure that your firewall rules allow traffic from the pod network CIDR to reach your Kubernetes master node.
When the nslookup client is on the same node as the DNS server, the traffic never crosses the firewall, so everything is OK.
When the nslookup client is not on the same node as the DNS server, the traffic is blocked by the firewall, so the DNS server can't be reached.
After opening the ports, everything works now.
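For reference, had this been a host-level firewall rather than a hardware one, the equivalent openings would look roughly like this (a sketch using ufw; the vxlan backend is assumed, matching the flannel setup above):
sudo ufw allow 8472/udp               # vxlan backend encapsulation
sudo ufw allow 8285/udp               # only needed for the udp backend
sudo ufw allow from 10.244.0.0/16     # let the pod network CIDR reach the node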

How can I get CoreDNS to resolve on my Raspberry Pi Kubernetes cluster?

I've followed a number of online tutorials to set up a Kubernetes cluster on four Raspberry Pi 4s. I ended up using Flannel as the networking plugin as that seems to be the only one that actually works on RPi, with a pod network CIDR of 10.244.0.0/16, per this guide from 2017. Most everything is working... all of the base pods in the kube-system namespace are running/healthy, and I can pull down images and launch new containers. At first I wasn't able to get any pod logs, but that was quickly remedied by opening up port 10250 on each node.
But there still seems to be a problem with DNS resolution. I should clarify that DNS resolution on the hosts clearly does work, as the cluster is able to download any container image I specify. But once a container is running, it isn't able to "dial out" to anything. As a test, I'm running the arm32v7/buildpack-deps:latest container in a pod. It pulls the image from Docker Hub just fine. But when I shell into it and simply type curl https://www.google.com it hangs before eventually timing out. And the same is true of any pod I launch that needs to interact with the external Internet: they hang and hang and hang.
Here are all the networking-related commands I've already run on each node:
sudo iptables -P FORWARD ACCEPT
sudo iptables -A FORWARD -i cni0 -j ACCEPT
sudo iptables -A FORWARD -o cni0 -j ACCEPT
sudo ufw allow ssh
sudo ufw allow 443 # can't remember why i ran this one
sudo ufw allow 6443
sudo ufw allow 8080 # this one might not be strictly necessary, either
sudo ufw allow 10250
sudo ufw default allow routed
sudo ufw enable
I'm not entirely sure that the last two iptables commands did anything; I grabbed them from the comment section of that guide I linked to earlier. I know that guide assumes one is using kube-dns but it's also 3 years old so I am using the (newer) default, coredns, instead.
What am I missing? I feel like I'm so close to having this cluster fully operational, but obviously I need functioning DNS!
UPDATE: I know that it's a DNS problem, and not general Internet connectivity, for two reasons: (1) the cluster itself can pull down any image I specify from Dockerhub, and (2) when I shell into a running container that has curl and execute curl -H "Host: www.google.com" 142.250.73.206, it successfully returns the Google homepage HTML. But as mentioned if I try and do my earlier curl command using the hostname, that times out.
Create a simple Pod to use as a test environment for DNS diagnosing:
apiVersion: v1
kind: Pod
metadata:
  name: dnsutils
  namespace: default
spec:
  containers:
  - name: dnsutils
    image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
    command:
    - sleep
    - "3600"
    imagePullPolicy: IfNotPresent
  restartPolicy: Always
kubectl apply -f dnsutils.yaml
Check the status of the Pod:
$ kubectl get pods dnsutils
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 0 <some-time>
Once that Pod is running, you can exec nslookup in that environment. If you see something like the following, DNS is working correctly.
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
Name: kubernetes.default
Address 1: 10.0.0.1
If the nslookup command fails, check the following:
Take a look inside the resolv.conf file.
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
Verify that the search path and name server are set up like the following (note that search path may vary for different cloud providers):
search default.svc.cluster.local svc.cluster.local cluster.local google.internal c.gce_project_id.internal
nameserver 10.0.0.10
options ndots:5
Errors such as the following indicate a problem with the CoreDNS (or kube-dns) add-on or with associated Services:
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
nslookup: can't resolve 'kubernetes.default'
OR
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
Check if the DNS pod is running
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
...
coredns-7b96bf9f76-5hsxb 1/1 Running 0 1h
coredns-7b96bf9f76-mvmmt 1/1 Running 0 1h
...
Check for errors in the DNS pod
Here is an example of a healthy CoreDNS log:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
2018/08/15 14:37:17 [INFO] CoreDNS-1.2.2
2018/08/15 14:37:17 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.2
linux/amd64, go1.10.3, 2e322f6
2018/08/15 14:37:17 [INFO] plugin/reload: Running configuration MD5 = 24e6c59e83ce706f07bcc82c31b1ea1c
Verify that the DNS service is up by using the kubectl get service command.
$ kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
...
kube-dns ClusterIP 10.0.0.10 <none> 53/UDP,53/TCP 1h
...
You can verify that DNS endpoints are exposed by using the kubectl get endpoints command.
$ kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify if queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns. To edit it, use the command:
$ kubectl -n kube-system edit configmap coredns
Then add log in the Corefile section per the example below:
apiVersion: v1
kind: ConfigMap
metadata:
  name: coredns
  namespace: kube-system
data:
  Corefile: |
    .:53 {
        log
        errors
        health
        kubernetes cluster.local in-addr.arpa ip6.arpa {
          pods insecure
          upstream
          fallthrough in-addr.arpa ip6.arpa
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
After saving the changes, it may take up to a minute or two for Kubernetes to propagate these changes to the CoreDNS pods.
Next, make some queries and view the logs per the sections above in this document. If CoreDNS pods are receiving the queries, you should see them in the logs.
Here is an example of a query in the log:
.:53
2018/08/15 14:37:15 [INFO] CoreDNS-1.2.0
2018/08/15 14:37:15 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.0
linux/amd64, go1.10.3, 2e322f6
2018/09/07 15:29:04 [INFO] plugin/reload: Running configuration MD5 = 162475cdf272d8aa601e6fe67a6ad42f
2018/09/07 15:29:04 [INFO] Reloading complete
172.17.0.18:41675 - [07/Sep/2018:15:29:11 +0000] 59925 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000066649s
As pointed out in the comments: the kubeadm configuration seems fine.
Your pods have the correct /etc/resolv.conf, and they should work.
It's pretty hard to determine the problem clearly; many things could be happening here.
My guess: something is not right with ufw.
You can easily prove it: disable ufw on all nodes (with ufw disable).
I'm not a hundred percent sure which ports are needed. I'm using iptables for my single-node k8s, and at the start I had many problems with FORWARD vs INPUT rules. In Docker, all ports are forwarded.
So my guess is there is something wrong with the FORWARD rules and/or the DNS ports (53/udp and 53/tcp).
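If disabling ufw confirms the guess, a narrower follow-up could be to allow just the suspected traffic and re-enable it (a sketch, not a verified rule set for this cluster):
sudo ufw allow 53/udp
sudo ufw allow 53/tcp
sudo ufw allow in on cni0
sudo ufw allow out on cni0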
Good luck.

Kubespray : Netchecker connectivity check fails

I deployed a Kubernetes (v1.17.5) cluster on OpenStack instances using Kubespray. Those instances are CentOS 7.6.1811 qcow2 images imported in Glance.
The install was successful, and I can see my nodes and pods with kubectl commands.
I used the deploy_netchecker option to deploy NetChecker and test the network within my cluster, and set network_plugin="flannel".
I also tried kube_proxy_mode="iptables", but it doesn't seem to affect the result.
That's pretty much all the changes I did in the k8s-cluster.yml file.
All the pods are running, and so are the services:
[centos@cl1-master-0 ~]$ kubectl get svc --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default kubernetes ClusterIP 10.233.0.1 <none> 443/TCP 46h
default netchecker-service NodePort 10.233.13.213 <none> 8081:31081/TCP 46h
kube-system coredns ClusterIP 10.233.0.3 <none> 53/UDP,53/TCP,9153/TCP 46h
kube-system dashboard-metrics-scraper ClusterIP 10.233.59.12 <none> 8000/TCP 46h
kube-system kubernetes-dashboard ClusterIP 10.233.63.20 <none> 443/TCP 46h
But the netchecker API gives the following answer:
[root@localhost ~]# curl http://X.X.X.X:31081/api/v1/connectivity_check
{"Message":"Connectivity check fails. Reason: there are absent or outdated pods; look up the payload","Absent":["netchecker-agent-hostnet-kk56x","netchecker-agent-hostnet-klldn","netchecker-agent-hostnet-r2vqs","netchecker-agent-hostnet-wqhjs"],"Outdated":["netchecker-agent-4jsgf","netchecker-agent-c9pcf","netchecker-agent-hostnet-jzbfv","netchecker-agent-vxgpf"]}
For an unknown reason, I cannot access the API from a cluster node with localhost, so I used a floating IP with OpenStack.
Here are some logs from the agent :
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-vjnwl_d8290268-3ea4-4e3c-acb4-295ab162a735/netchecker-agent/0.log
{"log":"I0701 13:04:01.814246 1 agent.go:135] Response status code: 200\n","stream":"stderr","time":"2020-07-01T13:04:01.81437579Z"}
{"log":"I0701 13:04:01.814272 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:04:01.814393199Z"}
{"log":"I0701 13:04:16.817398 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-vjnwl\n","stream":"stderr","time":"2020-07-01T13:04:16.817786735Z"}
[centos@cl1-master-0 ~]$ sudo vi /var/log/pods/default_netchecker-agent-hostnet-klldn_d5fa6e72-885f-44e1-97a6-880a25e6d6d6/netchecker-agent/0.log
{"log":"E0701 13:05:22.804428 1 agent.go:133] Error while sending info. Details: Post http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn: dial tcp 10.233.13.213:8081: i/o timeout\n","stream":"stderr","time":"2020-07-01T13:05:22.805138032Z"}
{"log":"I0701 13:05:22.804474 1 agent.go:128] Sleep for 15 second(s)\n","stream":"stderr","time":"2020-07-01T13:05:22.805190295Z"}
{"log":"I0701 13:05:37.807140 1 agent.go:55] Send payload via URL: http://netchecker-service:8081/api/v1/agents/netchecker-agent-hostnet-klldn\n","stream":"stderr","time":"2020-07-01T13:05:37.807309111Z"}
Logs from the server do not indicate any error.
I tried to check DNS resolution with the following:
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- /bin/sh
/ $ nslookup kubernetes.default
Server: 169.254.25.10
Address 1: 169.254.25.10
nslookup: can't resolve 'kubernetes.default'
[centos@cl1-master-0 ~]$ kubectl exec -it netchecker-agent-4jsgf -- cat /etc/resolv.conf
nameserver 169.254.25.10
search default.svc.cluster.local svc.cluster.local cluster.local openstacklocal
options ndots:5
169.254.25.10 is the IP of nodelocaldns, but it doesn't seem to forward queries to the deployed coredns service.
When I use nslookup netchecker-service.default.svc.cluster.local 10.233.0.3, i.e. the coredns IP, I get a correct answer.
What can be wrong with my configuration?
Thanks in advance
UPDATE: The Flannel plugin has an issue, and there is a fix to apply on all nodes of the cluster. Once that is done, the pods successfully report back to the netchecker server.

coredns do not resolve service name correctly

I use Kubernetes v1.11.3, which uses CoreDNS to resolve hostnames and service names, but I find that inside a pod, resolution does not work correctly.
# kubectl get services --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 50d <none>
kube-system calico-etcd ClusterIP 10.96.232.136 <none> 6666/TCP 50d k8s-app=calico-etcd
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 50d k8s-app=kube-dns
kube-system kubelet ClusterIP None <none> 10250/TCP 32d <none>
testalex grafana NodePort 10.96.51.173 <none> 3000:30002/TCP 2d app=grafana
testalex k8s-alert NodePort 10.108.150.47 <none> 9093:30093/TCP 13m app=alertmanager
testalex prometheus NodePort 10.96.182.108 <none> 9090:30090/TCP 16m app=prometheus
The following command gets no response:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root@k8s-monitor-7ddcb74b87-n6jsd /]# ping k8s-alert
PING k8s-alert.testalex.svc.cluster.local (10.108.150.47) 56(84) bytes of data.
and there is no coredns output in the log:
# kubectl logs coredns-78fcdf6894-h78sd -n kube-system
I think maybe something is wrong, but I cannot locate the problem. Another question: why are both coredns pods on the master node? I supposed there would be one on each node.
UPDATE
It seems coredns works fine, but I do not understand why the ping command gets no reply:
[root@k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
[root@k8s-monitor-7ddcb74b87-n6jsd yum.repos.d]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
# kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.121.3:53,192.168.121.4:53,192.168.121.3:53 + 1 more... 50d
Also, the DNS server cannot be reached:
# kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex /bin/bash
[root@k8s-monitor-7ddcb74b87-n6jsd /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search testalex.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
[root@k8s-monitor-7ddcb74b87-n6jsd /]# ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10) 56(84) bytes of data.
^C
--- 10.96.0.10 ping statistics ---
9 packets transmitted, 0 received, 100% packet loss, time 8000ms
I think maybe I misconfigured the network.
This is my cluster init command:
kubeadm init --kubernetes-version=v1.11.3 --apiserver-advertise-address=10.100.1.20 --pod-network-cidr=172.16.0.0/16
and this is the Calico IP pool setting:
# kubectl exec -it calico-node-77m9l -n kube-system /bin/sh
Defaulting container name to calico-node.
Use 'kubectl describe pod/calico-node-77m9l -n kube-system' to see all of the containers in this pod.
/ # cd /tmp
/tmp # ls
calicoctl tunl-ip
/tmp # ./calicoctl get ipPool
CIDR
192.168.0.0/16
You can start by checking whether DNS is working. Run nslookup on kubernetes.default from inside the pod k8s-monitor-7ddcb74b87-n6jsd and check if it works:
[root@k8s-monitor-7ddcb74b87-n6jsd /]# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
If this returns output, that means CoreDNS is working. If the output is not okay, then look at /etc/resolv.conf inside the pod k8s-monitor-7ddcb74b87-n6jsd; it should look something like this:
[root@metrics-master-2 /]# cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
Lastly, check that the coredns endpoints are exposed using:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify whether queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns.
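To edit it and then watch for the queries, the same commands from the answer above apply:
kubectl -n kube-system edit configmap coredns
kubectl logs --namespace=kube-system -l k8s-app=kube-dns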
Hope this helps.
EDIT:
You might be having this issue; please have a look:
https://github.com/kubernetes/kubeadm/issues/1056
You cannot always ping the IP address or hostname of a Service in the cluster, since it is a virtual IP.
A Service's cluster IP is a virtual IP and only has meaning when combined with the service port. You can try the same via the SRV record (the combination of virtual IP and port); see Kubernetes in Action by Marko Lukša.
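So instead of ping, exercise the Service through one of its ports; a sketch (curl must exist in the image, and the SRV port name http is an assumption, since the Service's port name isn't shown):
# TCP request to the service port instead of ICMP
kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex -- curl -s http://k8s-alert:9093
# SRV lookup: the combination of virtual IP and port mentioned above
kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex -- nslookup -type=srv _http._tcp.k8s-alert.testalex.svc.cluster.local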
Thanks for the answer. This is the output (the IPs are certainly not real).
[root@master ~]# nslookup kubernetes.default
Server: 203.150.92.12
Address: 203.150.92.12#53
** server can't find kubernetes.default: NXDOMAIN
[root@master ~]# kubectl cluster-info
Kubernetes master is running at https://203.150.72.81:6443
coredns is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/coredns:dns/proxy
kubernetes-dashboard is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy
metrics-server is running at https://203.150.72.81:6443/api/v1/namespaces/kube-system/services/https:metrics-server:/proxy
To further debug and diagnose cluster problems, use 'kubectl cluster-info dump'.
[root@master ~]# cat /etc/resolv.conf
search invalid
nameserver 203.150.92.12
nameserver 203.150.92.10
nameserver 1111:c207::2:55
[root@master ~]# kubectl get ep kube-dns --namespace=kube-system
Error from server (NotFound): endpoints "kube-dns" not found
[root@master ~]#
I think the reason you cannot get ping working is that iptables is used to redirect requests for the service cluster IP to the correct pods. The iptables rules only redirect traffic addressed to the service cluster IP on the exposed ports; an ICMP request is never redirected to the real endpoints.
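A quick way to see this is to test the cluster IP over TCP on one of its exposed ports, where the iptables rules do apply; a minimal sketch, assuming bash and timeout exist in the image (the pod prompts above suggest they do):
# should connect even though ping to the same IP fails
kubectl exec -it k8s-monitor-7ddcb74b87-n6jsd -n testalex -- timeout 2 bash -c '</dev/tcp/10.108.150.47/9093' && echo "port reachable"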