Can't resolve dns in kubernetes - kubernetes

I use next command to check dns issue in my k8s:
kubectl apply -f https://k8s.io/examples/admin/dns/dnsutils.yaml
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
The nslookup result is:
;; connection timed out; no servers could be reached
command terminated with exit code 1
dnsutils.yaml:
apiVersion: v1
kind: Pod
metadata:
name: dnsutils
namespace: default
spec:
containers:
- name: dnsutils
image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
restartPolicy: Always
NOTE: it's a machine which default disable all ports, so I ask our IT admin already open the port based on next doc check-required-ports, I'm not sure if this matters.
And use next I could get the pod ip of coredns.
kubectl get pods -n kube-system -o wide | grep core
coredns-7877db9d45-swb6c 1/1 Running 0 2m58s 10.244.1.8 node2 <none> <none>
coredns-7877db9d45-zwc8v 1/1 Running 0 2m57s 10.244.0.6 node1 <none> <none>
Here, 10.244.0.6 is my master while 10.244.1.8 is my working node.
Then if I directly specify coredns pod ip:
master node ok:
kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.0.6
Server: 10.244.0.6
Address: 10.244.0.6#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
work node not ok:
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default 10.244.1.8
;; connection timed out; no servers could be reached
command terminated with exit code 1
So, the question narrow down to why COREDNS on work node not works? Anything I need to pay attention?
Environment:
OS: ubuntu18.04
K8S: v1.21.0
Cluster boot command:
kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml

Finally, I find the root cause, this is hardware firewall issue, see this:
Firewalls
When using udp backend, flannel uses UDP port 8285 for sending encapsulated packets.
When using vxlan backend, kernel uses UDP port 8472 for sending encapsulated packets.
Make sure that your firewall rules allow this traffic for all hosts participating in the overlay network.
Make sure that your firewall rules allow traffic from pod network cidr visit your kubernetes master node.
When nslookup client on the same node of dns server, it won't trigger firewall block, so everything is ok.
When nslookup client not on the same node of dns server, it will trigger firewall block, so we can't access dns server.
So, after open the ports, everything ok now.

Related

kube service domain name not working, but clusterIP does work with Jupyter Enterprise Gateway

I have a Jupyter notebook setup in the jupyter namespace on a kubernetes cluster, and Jupyter Enterprise Gateway setup in the enterprise-gateway namespace as a Service in the same cluster.
If I configure the notebook to connect to the enterprise-gateway service using the clusterIP it works fine.
--gateway-url=http://172.20.186.249:8888
but if I switch to using the service domain name the notebook receives a 503 Connection Refused error
--gateway-url=http://enterprise-gateway.enterprise-gateway.svc.cluster.local:8888
When I use busybox check to check the kubernetes dns, the domain resolves as expected.
kubectl -n default exec -ti busybox nslookup enterprise-gateway.enterprise-gateway
kubectl exec [POD] [COMMAND] is DEPRECATED and will be removed in a future version. Use kubectl exec [POD] -- [COMMAND] instead.
Server: 172.20.0.10
Address 1: 172.20.0.10 kube-dns.kube-system.svc.cluster.local
Name: enterprise-gateway.enterprise-gateway
Address 1: 172.20.186.249 enterprise-gateway.enterprise-gateway.svc.cluster.local
How do I get the domain name to work?
The Service config for the JEG looks like this...
kubectl describe svc enterprise-gateway --namespace enterprise-gateway
Name: enterprise-gateway
Namespace: enterprise-gateway
Labels: app=enterprise-gateway
app.kubernetes.io/managed-by=Helm
chart=enterprise-gateway-2.6.0
component=enterprise-gateway
heritage=Helm
release=enterprise-gateway
Annotations: meta.helm.sh/release-name: enterprise-gateway
meta.helm.sh/release-namespace: enterprise-gateway
Selector: app=enterprise-gateway
Type: NodePort
IP Family Policy: SingleStack
IP Families: IPv4
IP: 172.20.250.15
IPs: 172.20.250.15
Port: http 8888/TCP
TargetPort: 8888/TCP
NodePort: http 31366/TCP
Endpoints: 10.1.16.136:8888,10.1.2.228:8888,10.1.30.90:8888
Port: response 8877/TCP
TargetPort: 8877/TCP
NodePort: response 31201/TCP
Endpoints: 10.1.16.136:8877,10.1.2.228:8877,10.1.30.90:8877
Session Affinity: ClientIP
External Traffic Policy: Cluster
Events: <none>
Ok, i dont know where to start i have a bunch of findings. I will start with the eye catcher one, i have a working test project i can share later on and i have to elaborate more in this answer if needed.
Step1
1- I see a mismatch on your IPs. The DNS lookup did not resolved the service DNS to the correct IP.
Address 1: 172.20.186.249 is different than IP: 172.20.250.15
To debug DNS:
kubectl exec "YOURPODNAME" cat /etc/resolv.conf
Verify that a search path and a name server are set up correctly
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
check if the kubedns/coredns pods are running
kubectl get pods --namespace=kube-system
NAME READY STATUS RESTARTS AGE
....
kube-dns-86f4d74b45-2qkfd 3/3 Running 232 133d
kube-proxy-b2frq 1/1 Running 0 15m
...
If the pod is running, there might be something wrong with the global DNS service
kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP
You might also need to check whether DNS endpoints are exposed:
kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 172.17.0.5:53,172.17.0.5:53 133d
These debugging actions will usually indicate the problem with your DNS configuration, or it will simply show you that a DNS add-on should be enabled in your cluster configuration.
Step 2
When using busybox to check the kubernetes dns
This seems incorrect when looking Address 1: 172.20.186.249 im expecting to get an IP 10.X.X.X
Install dnsutils on the pod as pointed below
kubectl exec --stdin --tty "YOURPODNAME" -- apt update && sudo
apt-get -y install dnsutils
kubectl exec -it "YOURPODNAME" -- /bin/bash
Inside the pod and again (weird) run apt-get install dnsutils
Stay inside the pod and run nslookup "YOURSERVICENAME" you will get
an IP and a Name(DNS).
Check this IP since it needs to match with the IP of the service description.
kubectl describe svc "YOURSERVICENAME", the IP should be the same as #4
What you must see:
Step 3
Once you have Step #2 solved you will be able to use the service
name(FQDN) returned in Step 2 item #4
To be continued...

How to expose traefik v2 dashboard in k3d/k3s via configuration?

*Cross-posted to k3d github discussions, to a thread in Rancher forums, and to traefik's community discussion board
Tutorials from 2020 refer to editing the traefik configmap. Where did it go?
The traefik installation instructions refer to a couple of ways to expose the dashboard:
This works, but isn't persistent: Using a 1-time command kubectl -n kube-system port-forward $(kubectl -n kube-system get pods --selector "app.kubernetes.io/name=traefik" --output=name) 9000:9000
I cannot get this to work: Creating an "IngressRoute" yaml file and applying it to the cluster. This might be due to the Klipper LB and/or my ignorance.
No configmap in use by traefik deployment
For a 2-server, 2-agent cluster... kubectl -n kube-system describe deploy traefik does not show any configmap:
Volumes:
data:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
tmp:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
SizeLimit: <unset>
Priority Class Name: system-cluster-critical
No "traefik" configmap
And, kubectl get -n kube-system cm shows:
NAME DATA AGE
chart-content-traefik 0 28m
chart-content-traefik-crd 0 28m
chart-values-traefik 1 28m
chart-values-traefik-crd 0 28m
cluster-dns 2 28m
coredns 2 28m
extension-apiserver-authentication 6 28m
k3s 0 28m
k3s-etcd-snapshots 0 28m
kube-root-ca.crt 1 27m
local-path-config 4 28m
No configmap in use by traefik pods
Describing the pod doesn't turn up anything either. kubectl -n kube-system describe pod traefik-.... results in no configmap too.
Traefik ports in use, but not responding
However, I did see arguments to the traefik pod with ports in use:
--entryPoints.traefik.address=:9000/tcp
--entryPoints.web.address=:8000/tcp
--entryPoints.websecure.address=:8443/tcp
However, these ports are not exposed. So, I tried port-forward with kubectl port-forward pods/traefik-97b44b794-r9srz 9000:9000 8000:8000 8443:8443 -n kube-system --address 0.0.0.0, but when I curl -v localhost:9000 (or 8000 or 8443) and curl -v localhost:9000/dashboard, I get "404 Not Found".
After getting a terminal to traefik, I discovered the local ports that are open, but it seems nothing is responding:
/ $ netstat -lntu
Active Internet connections (only servers)
Proto Recv-Q Send-Q Local Address Foreign Address State
tcp 0 0 :::8443 :::* LISTEN
tcp 0 0 :::8000 :::* LISTEN
tcp 0 0 :::9000 :::* LISTEN
/ $ wget localhost:9000
Connecting to localhost:9000 ([::1]:9000)
wget: server returned error: HTTP/1.1 404 Not Found
/ $ wget localhost:8000
Connecting to localhost:8000 ([::1]:8000)
wget: server returned error: HTTP/1.1 404 Not Found
/ $ wget localhost:8443
Connecting to localhost:8443 ([::1]:8443)
wget: server returned error: HTTP/1.1 404 Not Found
Versions
k3d version v4.4.7
k3s version v1.21.2-k3s1 (default)
I found a solution and hopefully someone find a better one soon
you need to control your k3s cluster from your pc and not to ssh into master node, so add /etc/rancher/k3s/k3s.yaml into your local ~/.kube/config (in order to port forward in last step into your pc)
now get your pod name as follows:
kubectl get pod -n kube-system
and seach for traefik-something-somethingElse
mine was traefik-97b44b794-bsvjn
now this part is needed from your local pc
kubectl port-forward traefik-97b44b794-bsvjn -n kube-system 9000:9000
get http://localhost:9000/dashboard/ in your favorite browser and don't forget the second slash
enjoy the dashboard
please note you have to enable the dashboard first in /var/lib/rancher/k3s/server/manifests/traefik.yaml by adding
dashboard:
enabled: true
Jakub's answer is pretty good. But one thing that is unfortunate about it is that if k3s restarts, the configs get reset. According to the k3s docs, if you create a custom file called /var/lib/rancher/k3s/server/manifests/traefik-config.yaml, k3s' traefik will automatically update with this new config and use its values. Here is what I have:
apiVersion: helm.cattle.io/v1
kind: HelmChartConfig
metadata:
name: traefik
namespace: kube-system
spec:
valuesContent: |-
dashboard:
enabled: true
ports:
traefik:
expose: true # this is not recommended in production deployments, but I want to be able to see my dashboard locally
logs:
access:
enabled: true
With this setup, you can skip the port-forwarding and just go to http://localhost:9000/dashboard/ directly!
for the current latest version of k3s (1.21.4):
according to traefik's installation guide (https://doc.traefik.io/traefik/getting-started/install-traefik/#exposing-the-traefik-dashboard), create dashboard.yaml with the proper CRD content, and run
kubectl apply -f dashboard.yaml
create DNS record or modify host file of the hostname - ip mapping for you set up in last step

Kubernetes DNS lookg not working from worker node - connection timed out; no servers could be reached

I have build new Kubernetes cluster v1.20.1 single master and single node with Calico CNI.
I deployed the busybox pod in default namespace.
# kubectl get pods busybox -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
busybox 1/1 Running 0 12m 10.203.0.129 node02 <none> <none>
nslookup not working
kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
cluster is running RHEL 8 with latest update
followed this steps: https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
nslookup command not able to reach nameserver
# kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
resolve.conf file
# kubectl exec -ti dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local
nameserver 10.96.0.10
options ndots:5
DNS pods running
# kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-472vx 1/1 Running 1 85m
coredns-74ff55c5b-c75bq 1/1 Running 1 85m
DNS pod logs
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
Service is defined
# kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 86m
**I can see the endpoints of DNS pod**
# kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.203.0.5:53,10.203.0.6:53,10.203.0.5:53 + 3 more... 86m
enabled the logging, but didn't see traffic coming to DNS pod
# kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
I can ping DNS POD
# kubectl exec -i -t dnsutils -- ping 10.203.0.5
PING 10.203.0.5 (10.203.0.5): 56 data bytes
64 bytes from 10.203.0.5: seq=0 ttl=62 time=6.024 ms
64 bytes from 10.203.0.5: seq=1 ttl=62 time=6.052 ms
64 bytes from 10.203.0.5: seq=2 ttl=62 time=6.175 ms
64 bytes from 10.203.0.5: seq=3 ttl=62 time=6.000 ms
^C
--- 10.203.0.5 ping statistics ---
4 packets transmitted, 4 packets received, 0% packet loss
round-trip min/avg/max = 6.000/6.062/6.175 ms
nmap show port filtered
# ke netshoot-6f677d4fdf-5t5cb -- nmap 10.203.0.5
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:29 UTC
Nmap scan report for 10.203.0.5
Host is up (0.0060s latency).
Not shown: 997 closed ports
PORT STATE SERVICE
53/tcp filtered domain
8080/tcp filtered http-proxy
8181/tcp filtered intermapper
Nmap done: 1 IP address (1 host up) scanned in 14.33 seconds
If I schedule the POD on master node, nslookup works nmap show port open?
# ke netshoot -- bash
bash-5.0# nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.96.0.1
nmap -p 53 10.96.0.10
Starting Nmap 7.80 ( https://nmap.org ) at 2021-01-15 22:46 UTC
Nmap scan report for kube-dns.kube-system.svc.cluster.local (10.96.0.10)
Host is up (0.000098s latency).
PORT STATE SERVICE
53/tcp open domain
Nmap done: 1 IP address (1 host up) scanned in 0.14 seconds
Why nslookup from POD running on worker node is not working? how to troubleshoot this issue?
I re-build the server two times, still same issue.
Thanks
SR
Update adding kubeadm config file
# cat kubeadm-config.yaml
---
apiVersion: kubeadm.k8s.io/v1beta2
bootstrapTokens:
- groups:
- system:bootstrappers:kubeadm:default-node-token
token: abcdef.0123456789abcdef
ttl: 24h0m0s
usages:
- signing
- authentication
kind: InitConfiguration
nodeRegistration:
criSocket: unix:///run/containerd/containerd.sock
taints:
- effect: NoSchedule
key: node-role.kubernetes.io/master
kubeletExtraArgs:
cgroup-driver: "systemd"
---
apiVersion: kubeadm.k8s.io/v1beta2
kind: ClusterConfiguration
kubernetesVersion: stable
controlPlaneEndpoint: "master01:6443"
networking:
dnsDomain: cluster.local
podSubnet: 10.0.0.0/14
serviceSubnet: 10.96.0.0/12
---
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
mode: "ipvs
"
First of all, according to the docs - please note that Calico and kubeadm support Centos/RHEL 7+.
In both Calico and kubeadm documentation we can see that they only support RHEL7+.
By default RHEL8 uses nftables instead of iptables ( we can still use iptables but "iptables" on RHEL8 is actually using the kernel's nft framework in the background - look at "Running Iptables on RHEL 8").
9.2.1. nftables replaces iptables as the default network packet filtering framework
I believe that nftables may cause this network issues because as we can find on nftables adoption page:
Kubernetes does not support nftables yet.
Note: For now I highly recommend you to use RHEL7 instead of RHEL8.
With that in mind, I'll present some information that may help you with RHEL8.
I have reproduced your issue and found a solution that works for me.
First I opened ports required by Calico - these ports can be found
here under "Network requirements".
As workaround:
Next I reverted to the old iptables backend on all cluster
nodes, you can easily do so by setting FirewallBackend in
/etc/firewalld/firewalld.conf to iptables as described
here.
Finally I restarted firewalld to make the new rules active.
I've tried nslookup from Pod running on worker node (kworker) and it seems to work correctly.
root#kmaster:~# kubectl get pod,svc -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
pod/web 1/1 Running 0 112s 10.99.32.1 kworker <none> <none>
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
service/kubernetes ClusterIP 10.99.0.1 <none> 443/TCP 5m51s <none>
root#kmaster:~# kubectl exec -it web -- bash
root#web:/# nslookup kubernetes.default
Server: 10.99.0.10
Address: 10.99.0.10#53
Name: kubernetes.default.svc.cluster.local
Address: 10.99.0.1
root#web:/#
In my situation, we're using the K3S cluster. And the new agent couldn't make the default(ClusterFirst) DNS query. After lots of research, I found I need to change the kube-proxy cluster-cidr args to make the DNS work successfully.
Hope this info is useful for others.
I ran into the same issue setting up a vanilla kubeadm 1.25 cluster on RHEL8 and #matt_j's answer lead me to another solution that avoids nftables by using ipvs mode in kube-proxy.
Just modify the kube-proxy ConfigMap in kube-system namespace so the config.conf file has this value;
...
data:
config.conf:
...
mode: "ipvs"
...
And ensure kube-proxy or your nodes are restarted.

How can I get CoreDNS to resolve on my Raspberry Pi Kubernetes cluster?

I've followed a number of online tutorials to set up a Kubernetes cluster on four Raspberry Pi 4s. I ended up using Flannel as the networking plugin as that seems to be the only one that actually works on RPi, with a pod network CIDR of 10.244.0.0/16, per this guide from 2017. Most everything is working... all of the base pods in the kube-system namespace are running/healthy, and I can pull down images and launch new containers. At first I wasn't able to get any pod logs, but that was quickly remedied by opening up port 10250 on each node.
But there still seems to be a problem DNS resolution. I should clarify that DNS resolution on the hosts clearly does work, as the cluster is able to download any container image I specify. But once a container is running, it isn't able to "dial out" to anything. As a test, I'm running the arm32v7/buildpack-deps:latest container in a pod. It pulls the image from Docker hub just fine. But when I shell into it and simply type curl https://www.google.com it hangs before eventually timing out. And the same is true of any pod I launch that needs to interact with the external Internet: they hang and hang and hang.
Here are all the networking-related commands I've already run on each node:
sudo iptables -P FORWARD ACCEPT
sudo iptables -A FORWARD -i cni0 -j ACCEPT
sudo iptables -A FORWARD -o cni0 -j ACCEPT
sudo ufw allow ssh
sudo ufw allow 443 # can't remember why i ran this one
sudo ufw allow 6443
sudo ufw allow 8080 # this one might not be strictly necessary, either
sudo ufw allow 10250
sudo ufw default allow routed
sudo ufw enable
I'm not entirely sure that the last two iptables commands did anything; I grabbed them from the comment section of that guide I linked to earlier. I know that guide assumes one is using kube-dns but it's also 3 years old so I am using the (newer) default, coredns, instead.
What am I missing? I feel like I'm so close to having this cluster fully operational, but obviously I need functioning DNS!
UPDATE: I know that it's a DNS problem, and not general Internet connectivity, for two reasons: (1) the cluster itself can pull down any image I specify from Dockerhub, and (2) when I shell into a running container that has curl and execute curl -H "Host: www.google.com" 142.250.73.206, it successfully returns the Google homepage HTML. But as mentioned if I try and do my earlier curl command using the hostname, that times out.
Create a simple Pod to use as a test environment for DNS diagnosing:
apiVersion: v1
kind: Pod
metadata:
name: dnsutils
namespace: default
spec:
containers:
- name: dnsutils
image: gcr.io/kubernetes-e2e-test-images/dnsutils:1.3
command:
- sleep
- "3600"
imagePullPolicy: IfNotPresent
restartPolicy: Always
kubectl apply -f dnsutils.yaml
Check the status of Pod
$ kubectl get pods dnsutils
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 0 <some-time>
Once that Pod is running, you can exec nslookup in that environment. If you see something like the following, DNS is working correctly.
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
Name: kubernetes.default
Address 1: 10.0.0.1
If the nslookup command fails, check the following:
Take a look inside the resolv.conf file.
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
Verify that the search path and name server are set up like the following (note that search path may vary for different cloud providers):
search default.svc.cluster.local svc.cluster.local cluster.local google.internal c.gce_project_id.internal
nameserver 10.0.0.10
options ndots:5
Errors such as the following indicate a problem with the CoreDNS (or kube-dns) add-on or with associated Services:
$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
Server: 10.0.0.10
Address 1: 10.0.0.10
nslookup: can't resolve 'kubernetes.default'
OR
Server: 10.0.0.10
Address 1: 10.0.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve 'kubernetes.default'
Check if the DNS pod is running
$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
...
coredns-7b96bf9f76-5hsxb 1/1 Running 0 1h
coredns-7b96bf9f76-mvmmt 1/1 Running 0 1h
...
Check for errors in the DNS pod
Here is an example of a healthy CoreDNS log:
$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
2018/08/15 14:37:17 [INFO] CoreDNS-1.2.2
2018/08/15 14:37:17 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.2
linux/amd64, go1.10.3, 2e322f6
2018/08/15 14:37:17 [INFO] plugin/reload: Running configuration MD5 = 24e6c59e83ce706f07bcc82c31b1ea1c
Verify that the DNS service is up by using the kubectl get service command.
$ kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
...
kube-dns ClusterIP 10.0.0.10 <none> 53/UDP,53/TCP 1h
...
You can verify that DNS endpoints are exposed by using the kubectl get endpoints command.
$ kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.180.3.17:53,10.180.3.17:53 1h
You can verify if queries are being received by CoreDNS by adding the log plugin to the CoreDNS configuration (aka Corefile). The CoreDNS Corefile is held in a ConfigMap named coredns. To edit it, use the command:
$ kubectl -n kube-system edit configmap coredns
Then add log in the Corefile section per the example below:
apiVersion: v1
kind: ConfigMap
metadata:
name: coredns
namespace: kube-system
data:
Corefile: |
.:53 {
log
errors
health
kubernetes cluster.local in-addr.arpa ip6.arpa {
pods insecure
upstream
fallthrough in-addr.arpa ip6.arpa
}
prometheus :9153
forward . /etc/resolv.conf
cache 30
loop
reload
loadbalance
}
After saving the changes, it may take up to minute or two for Kubernetes to propagate these changes to the CoreDNS pods.
Next, make some queries and view the logs per the sections above in this document. If CoreDNS pods are receiving the queries, you should see them in the logs.
Here is an example of a query in the log:
.:53
2018/08/15 14:37:15 [INFO] CoreDNS-1.2.0
2018/08/15 14:37:15 [INFO] linux/amd64, go1.10.3, 2e322f6
CoreDNS-1.2.0
linux/amd64, go1.10.3, 2e322f6
2018/09/07 15:29:04 [INFO] plugin/reload: Running configuration MD5 = 162475cdf272d8aa601e6fe67a6ad42f
2018/09/07 15:29:04 [INFO] Reloading complete
172.17.0.18:41675 - [07/Sep/2018:15:29:11 +0000] 59925 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000066649s
As pointed out in the comments: The configuration of kubeadm seems fine.
Your pods have the correct /etc/resolv.conf and they should work.
It's pretty hard to clarily determine the problem - many things can be happend here.
My guess: There something not right with ufw.
You can easily proof it: Disable ufw on all nodes (with ufw disable).
I'm not hundred percent sure which ports are needed. I'm using iptables for my single node k8s and at the start I had many problems FORWARD vs INPUT rules. In docker all ports are forwarded.
So I guess there is something wrong with FORWARD-rules and/or the dns-ports (53/udp and 53/tcp).
Good luck.

kubernetes v1.18: DNS resolution records for Pods

The question is for pods DNS resolution in kubernetes. A statement from official doc here (choose v1.18 from top right dropdown list):
https://kubernetes.io/docs/concepts/services-networking/dns-pod-service/#pods
Pods.
A/AAAA records
Any pods created by a Deployment or DaemonSet have the following DNS resolution available:
pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example.
Here is my kubernetes environments:
master $ kubectl version
Client Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:58:59Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"18", GitVersion:"v1.18.0", GitCommit:"9e991415386e4cf155a24b1da15becaa390438d8", GitTreeState:"clean", BuildDate:"2020-03-25T14:50:46Z", GoVersion:"go1.13.8", Compiler:"gc", Platform:"linux/amd64"}
After I create a simple deployment using kubectl create deploy nginx --image=nginx, then I create a busybox pod in test namespace to do nslookup like this:
kubectl create ns test
cat <<EOF | kubectl apply -n test -f -
apiVersion: v1
kind: Pod
metadata:
name: busybox1
labels:
name: busybox
spec:
containers:
- image: busybox:1.28
command:
- sleep
- "3600"
name: busybox
EOF
Then I do nslookup like this, according to the offical doc pod-ip-address.deployment-name.my-namespace.svc.cluster-domain.example:
master $ kubectl get pods -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
nginx-f89759699-h8cj9 1/1 Running 0 12m 10.244.1.4 node01 <none> <none>
master $ kubectl get deploy -o wide
NAME READY UP-TO-DATE AVAILABLE AGE CONTAINERS IMAGES SELECTOR
nginx 1/1 1 1 17m nginx nginx app=nginx
master $ kubectl exec -it busybox1 -n test -- nslookup 10.244.1.4.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve '10.244.1.4.nginx.default.svc.cluster.local'
command terminated with exit code 1
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
nslookup: can't resolve '10-244-1-4.nginx.default.svc.cluster.local'
command terminated with exit code 1
Question 1:
Why nslookup for the name failed? Is there something I did wrong?
When I continue to explore the dns name for pods, I did this:
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.default.pod.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.default.pod.cluster.local
Address 1: 10.244.1.4
master $ kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.test.pod.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.test.pod.cluster.local
Address 1: 10.244.1.4
Questions 2:
Why nslookup 10-244-1-4.test.pod.cluster.local succeeded even the pod of 10.244.1.4 is in default namespace?
Regarding your first question, as far as I could check your assumptions are right, it seems like the documentation isn't accurate. The A/AAAA reference for pods is something new in the documentation (1.18). For that I highly encourage you to open an issue here so the developers can take a closer look into it.
I recommend you to refer to 1.17 documentation on that regard as it's to be reflecting the actual thing.
In 1.17 we can see this note:
Note: Because A or AAAA records are not created for Pod names, hostname is required for the Pod’s A or AAAA record to be created. A Pod with no hostname but with subdomain will only create the A or AAAA record for the headless service (default-subdomain.my-namespace.svc.cluster-domain.example), pointing to the Pod’s IP address. Also, Pod needs to become ready in order to have a record unless publishNotReadyAddresses=True is set on the Service.
As far as I could check this is still true on 1.18 despite of what the documentation is saying.
Regarding question two is going to the same direction and you can also open an issue but I personally don't see any practical reason for the usage of IP Based DNS names. These names are there for kubernetes internal use and using it isn't giving you any advantage.
The best scenario is to use service based dns names on Kubernetes. It's proven to be very reliable.
For questions 1, it may be a doc inaccuracy. If I create a ClusterIP service for the deployment:
kubectl expose deploy nginx --name=front-end --port=80
Then I can see this name:
kubectl exec -it busybox1 -n test -- nslookup 10-244-1-4.front-end.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: 10-244-1-4.front-end.default.svc.cluster.local
Address 1: 10.244.1.4 10-244-1-4.front-end.default.svc.cluster.local