I'm new to Kubernetes, though I have used K3s a little in the past. I just set up a K8s cluster. None of my pods can do DNS lookups, neither for google.com nor for an internal domain.
I initialized the cluster with --pod-network-cidr=10.244.0.0/16. MetalLB is installed (10.7.7.10-10.7.7.254), and the master and the worker nodes have IPs in 10.7.50.X/16 and 10.7.60.X/16 respectively. Flannel is set up with the default kube-flannel manifest: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
So far it's just 1 master with 2 nodes.
Versions:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.22.1
Troubleshooting commands:
$ kubectl describe service kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: prometheus.io/port: 9153
prometheus.io/scrape: true
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.96.0.10
IPs: 10.96.0.10
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.244.1.20:9153,10.244.2.28:9153
Session Affinity: None
Events: <none>
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84f8874d6d-jgvwk 1/1 Running 1 (115m ago) 21h 10.244.1.20 k-w-001 <none> <none>
coredns-84f8874d6d-qh2f4 1/1 Running 1 (115m ago) 21h 10.244.2.28 k-w-002 <none> <none>
etcd-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-apiserver-k-m-001 1/1 Running 11 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-controller-manager-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-286dc 1/1 Running 10 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-rbmhx 1/1 Running 6 (114m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-flannel-ds-vjl7l 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-948z8 1/1 Running 8 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-proxy-l7h64 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-pqmsr 1/1 Running 4 (115m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-scheduler-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
metrics-server-6dfddc5fb8-47mnb 0/1 Running 3 (115m ago) 2d20h 10.244.1.21 k-w-001 <none> <none>
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-jgvwk
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-qh2f4
[INFO] plugin/ready: Still waiting on: "kubernetes"
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
These were run seconds apart:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
Here are some more tests:
$ kubectl exec -ti busybox -- nslookup google.com
;; connection timed out; no servers could be reached
command terminated with exit code 1
$ kubectl exec -ti busybox -- nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: google.com
Address: 142.251.33.78
*** Can't find google.com: No answer
$ kubectl exec -ti busybox -- ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=116 time=6.437 ms
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
command terminated with exit code 1
I also noticed that the kube-dns service has its selector set to k8s-app=kube-dns and the CoreDNS pods carry the label k8s-app=kube-dns. Is this correct?
The pods running in the kube-system namespace seem to have 2 different IP ranges. Some use the node's IP, and the others use Flannel's range.
I'm not sure what's happening here, being new to Kubernetes, but it appears that the DNS pods or the DNS service are not working at all.
Edit:
Further info:
$ sudo ufw status
Status: inactive
Issue was actually Flannel. DNS queries worked fine until the nodes were restarted, and then all pod queries failed until the Flannel pods were restarted.
Man this was a rabbit hole.
See: https://github.com/flannel-io/flannel/issues/1321
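The alternating results in the question (one lookup gets a reply, the next one times out) would be consistent with only one of the two CoreDNS endpoints being reachable from the client pod. A minimal sketch for checking that, and for applying the workaround described above, might look like this. It assumes the `busybox` pod, the endpoint IPs and the `kube-flannel-ds` DaemonSet name that appear in the question's output; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch only: pod name, endpoint IPs and DaemonSet name are taken from the
# output in the question; the function names are hypothetical.

# Query every CoreDNS endpoint directly, bypassing the kube-dns ClusterIP,
# to see which backends actually answer from inside the pod network.
probe_dns_endpoints() {
    pod=$1
    shift
    for ip in "$@"; do
        echo "== CoreDNS endpoint $ip =="
        kubectl exec "$pod" -- nslookup kubernetes.default.svc.cluster.local "$ip" \
            || echo "no answer from $ip"
    done
}

# Recreate the Flannel pods on every node (the workaround from the edit above).
restart_flannel() {
    kubectl -n kube-system rollout restart daemonset/kube-flannel-ds
}

# Example:
#   probe_dns_endpoints busybox 10.244.1.20 10.244.2.28
#   restart_flannel
```

If one endpoint answers and the other does not, the problem is more likely in pod-to-pod traffic between nodes than in CoreDNS itself.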
I'm a newbie at Kubernetes.
I set up a local cluster with 1 master and 2 workers (worker1, worker2) using kubeadm and VirtualBox.
I chose containerd as my container runtime.
I'm facing a networking issue that is driving me crazy.
I can't ping any outside address from pods because DNS is not resolving.
I used the following to set up the cluster:
kubeadm init --apiserver-advertise-address=10.16.10.10 --apiserver-cert-extra-sans=10.16.10.10 --node-name=master0 --pod-network-cidr=10.244.0.0/16
Swap and SELinux are disabled.
I'm using flannel.
[masterk8s@master0 .kube]$ kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
master0 Ready control-plane,master 3h26m v1.23.1 10.16.10.10 <none> CentOS Linux 7 (Core) 3.10.0-1160.49.1.el7.x86_64 containerd://1.4.12
worker1 Ready <none> 169m v1.23.1 10.16.10.11 <none> CentOS Linux 7 (Core) 3.10.0-1160.49.1.el7.x86_64 containerd://1.4.12
worker2 Ready <none> 161m v1.23.1 10.16.10.12 <none> CentOS Linux 7 (Core) 3.10.0-1160.49.1.el7.x86_64 containerd://1.4.12
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
default pod/dnsutils 1/1 Running 1 (59m ago) 119m 10.244.3.2 worker1 <none> <none>
default pod/nginx 1/1 Running 0 11s 10.244.4.2 worker2 <none> <none>
kube-system pod/coredns-64897985d-lnzs7 1/1 Running 0 126m 10.244.0.2 master0 <none> <none>
kube-system pod/coredns-64897985d-vfngl 1/1 Running 0 126m 10.244.0.3 master0 <none> <none>
kube-system pod/etcd-master0 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
kube-system pod/kube-apiserver-master0 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
kube-system pod/kube-controller-manager-master0 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
kube-system pod/kube-flannel-ds-6g4dm 1/1 Running 0 81m 10.16.10.12 worker2 <none> <none>
kube-system pod/kube-flannel-ds-lvgpf 1/1 Running 0 89m 10.16.10.11 worker1 <none> <none>
kube-system pod/kube-flannel-ds-pkm4k 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
kube-system pod/kube-proxy-8gnfx 1/1 Running 0 89m 10.16.10.11 worker1 <none> <none>
kube-system pod/kube-proxy-cbws6 1/1 Running 0 81m 10.16.10.12 worker2 <none> <none>
kube-system pod/kube-proxy-fxvm5 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
kube-system pod/kube-scheduler-master0 1/1 Running 1 (125m ago) 126m 10.16.10.10 master0 <none> <none>
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 126m <none>
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 126m k8s-app=kube-dns
cat /usr/lib/systemd/system/kubelet.service.d/10-kubeadm.conf
# Note: This dropin only works with kubeadm and kubelet v1.11+
[Service]
Environment="KUBELET_KUBECONFIG_ARGS=--bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf"
Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/kubelet/config.yaml"
# This is a file that "kubeadm init" and "kubeadm join" generates at runtime, populating the KUBELET_KUBEADM_ARGS variable dynamically
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
# This is a file that the user can use for overrides of the kubelet args as a last resort. Preferably, the user should use
# the .NodeRegistration.KubeletExtraArgs object in the configuration files instead. KUBELET_EXTRA_ARGS should be sourced from this file.
EnvironmentFile=-/etc/sysconfig/kubelet
ExecStart=
ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET_EXTRA_ARGS
master:
[masterk8s@master0 .kube]$ ip r
default via 10.0.2.2 dev enp0s3
default via 10.16.10.1 dev enp0s9 proto static metric 102
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 100
10.16.10.0/24 dev enp0s9 proto kernel scope link src 10.16.10.10 metric 102
10.244.0.0/24 dev cni0 proto kernel scope link src 10.244.0.1
10.244.3.0/24 via 10.244.3.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.100 metric 101
worker1:
[workerk8s@worker1 ~]$ ip r
default via 10.0.2.2 dev enp0s3 proto dhcp metric 100
default via 10.16.10.1 dev enp0s9 proto static metric 102
10.0.2.0/24 dev enp0s3 proto kernel scope link src 10.0.2.15 metric 100
10.16.10.0/24 dev enp0s9 proto kernel scope link src 10.16.10.11 metric 102
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.3.0/24 dev cni0 proto kernel scope link src 10.244.3.1
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
192.168.56.0/24 dev enp0s8 proto kernel scope link src 192.168.56.101 metric 101
I can reach kube-dns cluster-IP from master:
[masterk8s@master0 .kube]$ telnet 10.96.0.10 53
Trying 10.96.0.10...
Connected to 10.96.0.10.
Escape character is '^]'.
But cannot from worker:
[workerk8s@worker1 ~]$ telnet 10.96.0.10 53
Trying 10.96.0.10...
^C
I used dnsutils pod from kubernetes (https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/) to do some tests:
(This pod's been deployed on worker1 but same issue for worker2)
[masterk8s@master0 .kube]$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
^C
command terminated with exit code 1
[masterk8s@master0 .kube]$ kubectl exec -i -t dnsutils -- cat /etc/resolv.conf
search default.svc.cluster.local svc.cluster.local cluster.local Home
nameserver 10.96.0.10
options ndots:5
There is connectivity between the nodes, but pods on different nodes can't ping each other. Example:
default pod/dnsutils 1/1 Running 1 (59m ago) 119m 10.244.3.2 worker1 <none> <none>
default pod/nginx 1/1 Running 0 11s 10.244.4.2 worker2 <none> <none>
10.244.3.2 is only reachable from worker1 and 10.244.4.2 is only reachable from worker2.
My guess is that something is wrong with kube-proxy, but I don't know what it could be.
I can't see any errors in pod logs.
Any suggestions?
Thanks
EDITED:
SOLVED
Flannel was using the wrong interface. My nodes have 3 network interfaces, so I specified the correct one with --iface in the kube-flannel container args:
name: kube-flannel
image: quay.io/coreos/flannel:v0.15.1
command:
- /opt/bin/flanneld
args:
- --ip-masq
- --kube-subnet-mgr
- --iface=enp0s9
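To decide which value to pass to `--iface`, it can help to find the interface that carries the node's INTERNAL-IP as reported by `kubectl get nodes -o wide`. A small sketch, assuming the `ip` tool from iproute2 is available on the node; `iface_for_ip` is a hypothetical helper name, not part of any tool.

```shell
#!/bin/sh
# iface_for_ip IP
# Reads `ip -o -4 addr show` output on stdin and prints the name of the
# interface that holds the given IPv4 address (hypothetical helper).
iface_for_ip() {
    awk -v want="$1" '
        $3 == "inet" {
            split($4, cidr, "/")      # $4 looks like 10.16.10.10/24
            if (cidr[1] == want) print $2
        }'
}

# Example, run on master0 from the question:
#   ip -o -4 addr show | iface_for_ip 10.16.10.10
```

Judging by the routing tables in the question, where 10.16.10.0/24 sits on enp0s9, this should name the same interface that was passed to `--iface`.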
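I also realized that firewalld was blocking requests to DNS, and solved that by enabling masquerading (see "How can I use Flannel without disabling firewalld (Kubernetes)"). A --permanent change only takes effect at runtime after a reload:
firewall-cmd --add-masquerade --permanent
firewall-cmd --reload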
I installed a clean K8s cluster on virtual machines (Debian 10). After installing it and integrating it into my environment, I checked connectivity from inside my Alpine test image. Outgoing traffic did not work, and nothing showed up in the CoreDNS log. As a workaround, I overwrote /etc/resolv.conf in my build image and replaced the DNS entries (e.g. set 1.1.1.1 as the nameserver). With that temporary "hack", connections to the internet work perfectly. But the workaround is not a long-term solution, and I want to do this the official way. In the K8s CoreDNS documentation I found the forward section, and I interpret it as an option to forward queries to the predefined local resolver. I suspect that forwarding to the local resolv.conf, or the resolution process itself, is not working correctly. Can anyone help me solve this issue?
Basic setup:
K8s version: 1.19.0
K8s setup: 1 master + 2 worker nodes
Based on: Debian 10 VMs
CNI: Flannel
Status of CoreDNS Pods
kube-system coredns-xxxx 1/1 Running 1 26h
kube-system coredns-yyyy 1/1 Running 1 26h
CoreDNS Log:
.:53
[INFO] plugin/reload: Running configuration MD5 = 4e235fcc3696966e76816bcd9034ebc7
CoreDNS-1.6.7
CoreDNS config:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: ""
  name: coredns
  namespace: kube-system
  resourceVersion: "219"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: xxx
Output from the Alpine image:
/ # nslookup -debug google.de
;; connection timed out; no servers could be reached
Output of the pod's resolv.conf
/ # cat /etc/resolv.conf
nameserver 10.96.0.10
search development.svc.cluster.local svc.cluster.local cluster.local invalid
options ndots:5
Output of host resolv.conf
cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 213.136.95.11
nameserver 213.136.95.10
search invalid
Output of host /run/flannel/subnet.env
cat /run/flannel/subnet.env
FLANNEL_NETWORK=10.244.0.0/16
FLANNEL_SUBNET=10.244.0.1/24
FLANNEL_MTU=1450
FLANNEL_IPMASQ=true
Output of kubectl get pods -n kube-system -o wide
coredns-54694b8f47-4sm4t 1/1 Running 0 14d 10.244.1.48 xxx3-node-1 <none> <none>
coredns-54694b8f47-6c7zh 1/1 Running 0 14d 10.244.0.43 xxx2-master <none> <none>
coredns-54694b8f47-lcthf 1/1 Running 0 14d 10.244.2.88 xxx4-node-2 <none> <none>
etcd-xxx2-master 1/1 Running 7 27d xxx.xx.xx.xxx xxx2-master <none> <none>
kube-apiserver-xxx2-master 1/1 Running 7 27d xxx.xx.xx.xxx xxx2-master <none> <none>
kube-controller-manager-xxx2-master 1/1 Running 7 27d xxx.xx.xx.xxx xxx2-master <none> <none>
kube-flannel-ds-amd64-4w8zl 1/1 Running 8 28d xxx.xx.xx.xxx xxx2-master <none> <none>
kube-flannel-ds-amd64-w7m44 1/1 Running 7 28d xxx.xx.xx.xxx xxx3-node-1 <none> <none>
kube-flannel-ds-amd64-xztqm 1/1 Running 6 28d xxx.xx.xx.xxx xxx4-node-2 <none> <none>
kube-proxy-dfs85 1/1 Running 4 28d xxx.xx.xx.xxx xxx4-node-2 <none> <none>
kube-proxy-m4hl2 1/1 Running 4 28d xxx.xx.xx.xxx xxx3-node-1 <none> <none>
kube-proxy-s7p4s 1/1 Running 8 28d xxx.xx.xx.xxx xxx2-master <none> <none>
kube-scheduler-xxx2-master 1/1 Running 7 27d xxx.xx.xx.xxx xxx2-master <none> <none>
Problem:
The (two) CoreDNS pods were deployed only on the master node. You can check where they are running with this command:
kubectl get pods -n kube-system -o wide | grep coredns
Solution:
I solved the problem by scaling up the CoreDNS pods and editing the deployment configuration. The following commands must be executed:
kubectl edit deployment coredns -n kube-system
Set the replicas value to the number of nodes, e.g. 3.
kubectl patch deployment coredns -n kube-system -p "{\"spec\":{\"template\":{\"metadata\":{\"annotations\":{\"force-update/updated-at\":\"$(date +%s)\"}}}}}"
kubectl get pods -n kube-system -o wide | grep coredns
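The same steps can also be done without opening an editor. The following is only a sketch of an alternative, not the method from the linked source: it uses `kubectl scale` and `kubectl rollout restart` (the latter needs kubectl 1.15 or newer), and `scale_coredns` is a hypothetical helper name. The label `k8s-app=kube-dns` is an assumption about how the CoreDNS pods are labelled in this cluster.

```shell
#!/bin/sh
# Sketch only: scale_coredns is a hypothetical helper name.
# Sets the CoreDNS replica count, triggers a rolling restart so the pods are
# rescheduled, and then lists where the CoreDNS pods ended up.
scale_coredns() {
    replicas=$1
    kubectl -n kube-system scale deployment coredns --replicas="$replicas" &&
    kubectl -n kube-system rollout restart deployment coredns &&
    kubectl -n kube-system get pods -o wide -l k8s-app=kube-dns
}

# Example for a cluster with 3 nodes:
#   scale_coredns 3
```

Whether the replicas end up on different nodes is up to the scheduler, so it is worth checking the NODE column afterwards.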
Source
https://blog.dbi-services.com/kubernetes-dns-resolution-using-coredns-force-update-deployment/
Hint
If you still have a problem with your coreDNS and your DNS resolution works sporadically, take a look at this post.
I'm trying to ping the kube-dns service from a dnstools pod using the cluster IP assigned to the kube-dns service. The ping request times out. From the same dnstools pod, I tried to curl the kube-dns service using the exposed port, but that timed out as well.
Following is the output of kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default pod/busybox 1/1 Running 62 2d14h 192.168.1.37 kubenode <none>
default pod/dnstools 1/1 Running 0 2d13h 192.168.1.45 kubenode <none>
default pod/nginx-deploy-7c45b84548-ckqzb 1/1 Running 0 6d11h 192.168.1.5 kubenode <none>
default pod/nginx-deploy-7c45b84548-vl4kh 1/1 Running 0 6d11h 192.168.1.4 kubenode <none>
dmi pod/elastic-deploy-5d7c85b8c-btptq 1/1 Running 0 2d14h 192.168.1.39 kubenode <none>
kube-system pod/calico-node-68lc7 2/2 Running 0 6d11h 10.62.194.5 kubenode <none>
kube-system pod/calico-node-9c2jz 2/2 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/coredns-5c98db65d4-5nprd 1/1 Running 0 6d12h 192.168.0.2 kubemaster <none>
kube-system pod/coredns-5c98db65d4-5vw95 1/1 Running 0 6d12h 192.168.0.3 kubemaster <none>
kube-system pod/etcd-kubemaster 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-apiserver-kubemaster 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-controller-manager-kubemaster 1/1 Running 1 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-proxy-9hcgv 1/1 Running 0 6d11h 10.62.194.5 kubenode <none>
kube-system pod/kube-proxy-bxw9s 1/1 Running 0 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/kube-scheduler-kubemaster 1/1 Running 1 6d12h 10.62.194.4 kubemaster <none>
kube-system pod/tiller-deploy-767d9b9584-5k95j 1/1 Running 0 3d9h 192.168.1.8 kubenode <none>
nginx-ingress pod/nginx-ingress-66wts 1/1 Running 0 5d17h 192.168.1.6 kubenode <none>
In the above output, why do some pods have an IP assigned in the 192.168.0.0/24 subnet whereas others have an IP that is equal to the IP address of my node/master? (10.62.194.4 is the IP of my master, 10.62.194.5 is the IP of my node)
This is the config.yml I used to initialize the cluster using kubeadm init --config=config.yml
apiServer:
  certSANs:
  - 10.62.194.4
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: dev-cluster
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.15.1
networking:
  dnsDomain: cluster.local
  podSubnet: 192.168.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Result of kubectl get svc --all-namespaces -o wide
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
default service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 6d12h <none>
default service/nginx-deploy ClusterIP 10.97.5.194 <none> 80/TCP 5d17h run=nginx
dmi service/elasticsearch ClusterIP 10.107.84.159 <none> 9200/TCP,9300/TCP 2d14h app=dmi,component=elasticse
dmi service/metric-server ClusterIP 10.106.117.2 <none> 8098/TCP 2d14h app=dmi,component=metric-se
kube-system service/calico-typha ClusterIP 10.97.201.232 <none> 5473/TCP 6d12h k8s-app=calico-typha
kube-system service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 6d12h k8s-app=kube-dns
kube-system service/tiller-deploy ClusterIP 10.98.133.94 <none> 44134/TCP 3d9h app=helm,name=tiller
The command I ran was kubectl exec -ti dnstools -- curl 10.96.0.10:53
EDIT:
I raised this question because I got this error when trying to resolve service names from within the cluster. I was under the impression that I got this error because I cannot ping the DNS server from a pod.
Output of kubectl exec -ti dnstools -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
Output of kubectl exec dnstools cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local reddog.microsoft.com
options ndots:5
Result of kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 192.168.0.2:53,192.168.0.3:53,192.168.0.2:53 + 3 more... 6d13h
EDIT:
Ping-ing the CoreDNS pod directly using its Pod IP times out as well:
/ # ping 192.168.0.2
PING 192.168.0.2 (192.168.0.2): 56 data bytes
^C
--- 192.168.0.2 ping statistics ---
24 packets transmitted, 0 packets received, 100% packet loss
EDIT:
I think something has gone wrong when I was setting up the cluster. Below are the steps I took when setting up the cluster:
Edit host files on master and worker to include the IP's and hostnames of the nodes
Disabled swap using swapoff -a and disabled swap permanently by editing /etc/fstab
Install docker prerequisites using apt-get install apt-transport-https ca-certificates curl software-properties-common -y
Added Docker GPG key using curl -fsSL https://download.docker.com/linux/ubuntu/gpg | apt-key add -
Added Docker repo using add-apt-repository "deb [arch=amd64] https://download.docker.com/linux/ubuntu $(lsb_release -cs) stable"
Install Docker using apt-get update -y; apt-get install docker-ce -y
Install Kubernetes prerequisites using curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
Added Kubernetes repo using echo 'deb http://apt.kubernetes.io/ kubernetes-xenial main' | sudo tee /etc/apt/sources.list.d/kubernetes.list
Update repo and install Kubernetes components using apt-get update -y; apt-get install kubelet kubeadm kubectl -y
Configure master node:
kubeadm init --apiserver-advertise-address=10.62.194.4 --apiserver-cert-extra-sans=10.62.194.4 --pod-network-cidr=192.168.0.0/16
Copy Kube config to $HOME: mkdir -p $HOME/.kube; sudo cp -i /etc/kubernetes/admin.conf $HOME/.kube/config; sudo chown $(id -u):$(id -g) $HOME/.kube/config
Installed Calico using kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/rbac-kdd.yaml; kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubernetes-datastore/calico-networking/1.7/calico.yaml
On node:
On the node I did the kubeadm join command using the command printed out from kubeadm token create --print-join-command on the master
The Kubernetes system pods are assigned the host IP because they provide low-level services that do not depend on an overlay network (or, in the case of Calico, even provide the overlay network). They have the IP of the node they run on.
A regular pod uses the overlay network and is assigned an IP from the Calico range, not from the node it runs on.
You can't access DNS (port 53) with HTTP using curl. You can use dig to query a DNS resolver.
A service IP is not reachable by ping, since it is a virtual IP used only as a routing handle for the iptables rules set up by kube-proxy. A TCP connection to a service port therefore works, but ICMP does not.
You can ping a pod IP though, since it is assigned from the overlay network.
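A DNS-level check along these lines could look like the sketch below. It assumes the `dnstools` pod from the question and that its image ships `dig`; `query_cluster_dns` is a hypothetical helper name, and the default server is the kube-dns ClusterIP shown in the question.

```shell
#!/bin/sh
# Sketch only: assumes the dnstools pod from the question and that its image
# includes dig; query_cluster_dns is a hypothetical helper name.
query_cluster_dns() {
    pod=$1
    name=$2
    server=${3:-10.96.0.10}   # kube-dns ClusterIP from the question
    # UDP query (dig's default), then the same query over TCP.
    kubectl exec "$pod" -- dig +short "@$server" "$name"
    kubectl exec "$pod" -- dig +short +tcp "@$server" "$name"
}

# Example:
#   query_cluster_dns dnstools kubernetes.default.svc.cluster.local
```

Passing a CoreDNS pod IP as the third argument queries that pod directly instead of going through the service IP.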
You should check from within the same namespace.
Currently you are in the default namespace and are curling a service in the kube-system namespace.
I think it will work if you check from the same namespace.
In some cases the local host address that Elasticsearch publishes is not routable or accessible from other hosts. In these cases you will have to configure network.publish_host in the yml config file so that Elasticsearch uses and publishes the right address.
Try configuring network.publish_host to the right public address.
See more here:
https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-network.html#advanced-network-settings
Note that control plane components such as the API server and etcd, which run on the master node, are bound to the host network. That is why you see the IP address of the master server for them.
On the other hand, the apps that you deploy get their IPs from the pod subnet range, which differs from the cluster nodes' IPs.
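One way to see this distinction directly is to list each pod's `hostNetwork` setting next to its pod IP and node IP. A minimal sketch using `kubectl`'s `custom-columns` output; `list_pod_networks` is a hypothetical helper name.

```shell
#!/bin/sh
# Sketch only: list_pod_networks is a hypothetical helper name.
# Pods with HOSTNETWORK=true share the node's network namespace, so their
# POD_IP equals NODE_IP; for the others the column shows <none>.
list_pod_networks() {
    kubectl get pods --all-namespaces \
        -o custom-columns='NAMESPACE:.metadata.namespace,NAME:.metadata.name,HOSTNETWORK:.spec.hostNetwork,POD_IP:.status.podIP,NODE_IP:.status.hostIP'
}

# Example:
#   list_pod_networks
```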
Try the steps below to test whether DNS is working.
deploy nginx.yaml
apiVersion: v1
kind: Service
metadata:
  name: nginx
  labels:
    app: nginx
spec:
  ports:
  - port: 80
    name: web
  clusterIP: None
  selector:
    app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: web
  labels:
    app: nginx
spec:
  serviceName: "nginx"
  replicas: 2
  selector:
    matchLabels:
      app: nginx
  template:
    metadata:
      labels:
        app: nginx
    spec:
      containers:
      - name: nginx
        image: k8s.gcr.io/nginx-slim:0.8
        ports:
        - containerPort: 80
          name: web
        volumeMounts:
        - name: www
          mountPath: /usr/share/nginx/html
      volumes:
      - name: www
        emptyDir: {}
kubectl create -f nginx.yaml
master $ kubectl get po
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 1m
web-1 1/1 Running 0 1m
master $ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35m
nginx ClusterIP None <none> 80/TCP 2m
master $ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm
If you don't see a command prompt, try pressing enter.
/ # nslookup nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
Address 2: 10.40.0.2 web-1.nginx.default.svc.cluster.local
/ #
/ # nslookup web-0.nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
/ # nslookup web-0.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx.default.svc.cluster.local
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
I'm trying to set up a basic Kubernetes cluster on an (Ubuntu 16) VM. I've just followed the getting started docs and would expect a working cluster, but unfortunately, no such luck: no pods seem to be able to connect to the Kubernetes API. Since I'm new to Kubernetes, it is very tough for me to find where things are going wrong.
Provision script:
apt-get update && apt-get upgrade -y
apt-get install -y apt-transport-https curl
curl -s https://packages.cloud.google.com/apt/doc/apt-key.gpg | apt-key add -
cat <<EOF >/etc/apt/sources.list.d/kubernetes.list
deb https://apt.kubernetes.io/ kubernetes-xenial main
EOF
apt-get update
apt-get install -y kubelet kubeadm kubectl docker.io
apt-mark hold kubelet kubeadm kubectl
swapoff -a
sysctl net.bridge.bridge-nf-call-iptables=1
kubeadm init
mkdir -p /home/ubuntu/.kube
cp -i /etc/kubernetes/admin.conf /home/ubuntu/.kube/config
chown -R ubuntu:ubuntu /home/ubuntu/.kube
runuser -l ubuntu -c "kubectl apply -f \"https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')\""
runuser -l ubuntu -c "kubectl taint nodes --all node-role.kubernetes.io/master-"
Installation seems fine.
ubuntu@packer-Ubuntu-16:~$ kubectl get pods -o wide --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
kube-system coredns-86c58d9df4-lbp46 0/1 CrashLoopBackOff 7 18m 10.32.0.2 packer-ubuntu-16 <none> <none>
kube-system coredns-86c58d9df4-t8nnn 0/1 CrashLoopBackOff 7 18m 10.32.0.3 packer-ubuntu-16 <none> <none>
kube-system etcd-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-apiserver-packer-ubuntu-16 1/1 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-controller-manager-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-proxy-dwhhf 1/1 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system kube-scheduler-packer-ubuntu-16 1/1 Running 0 17m 145.100.100.100 packer-ubuntu-16 <none> <none>
kube-system weave-net-sfvz5 2/2 Running 0 18m 145.100.100.100 packer-ubuntu-16 <none> <none>
Question: is it normal that the Kubernetes pods have as IP the ip of eth0 of the host (145.100.100.100)? Seems weird to me, I would expect them to have a virtual IP?
As you can see the coredns pod is crashing, because, well, it cannot reach the API.
This is as I understand it, the service:
ubuntu@packer-Ubuntu-16:~$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 22m
CoreDNS crashing, because API is unreachable:
ubuntu@packer-Ubuntu-16:~$ kubectl logs -n kube-system coredns-86c58d9df4-lbp46
.:53
2018-12-06T12:54:28.481Z [INFO] CoreDNS-1.2.6
2018-12-06T12:54:28.481Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
E1206 12:54:53.482269 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1206 12:54:53.482363 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:311: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1206 12:54:53.482540 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I tried launching a simple Alpine pod/container, and indeed 10.96.0.1 doesn't respond to pings or anything else.
I'm stuck here. I've tried to google a lot but nothing comes up and my understanding is pretty basic. I guess something's up with the networking, but I don't know what (for me it seems suspicious that when doing get pods, the pods show up with the host IP, but perhaps this is normal also?)
I found that the problem is caused by the host's iptables rules.
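The answer does not say which rules were responsible. As one hedged starting point, a default FORWARD policy of DROP on the host is a commonly reported reason for pod traffic being discarded, so it may be worth checking first. The sketch below parses the output of `iptables -S FORWARD`; `forward_policy` is a hypothetical helper name, and this check is an assumption about where to look, not the fix the author applied.

```shell
#!/bin/sh
# forward_policy
# Reads `iptables -S FORWARD` output on stdin and prints the chain's default
# policy (ACCEPT or DROP). Hypothetical helper for illustration only.
forward_policy() {
    awk '$1 == "-P" && $2 == "FORWARD" { print $3; exit }'
}

# Example (needs root on the host):
#   sudo iptables -S FORWARD | forward_policy
```

If this prints DROP, the remaining FORWARD rules decide whether pod and service traffic is let through, so they are the next thing to inspect.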
I created a kubeadm (Kubernetes 1.8) cluster on my Fedora machine with one vagrant node. The cluster is running fine but I am facing a weird issue when I test my dns:
$ kubectl exec busybox -- nslookup friendservice.mynamespace
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: friendservice.mynamespace
Address 1: 10.44.0.2 friendservice-0.friendservice.mynamespace.svc.cluster.local
$ kubectl -n mynamespace exec userservice-0 -- nslookup friendservice.mynamespace
nslookup: can't resolve '(null)': Name does not resolve
Name: friendservice
Address 1: 10.44.0.2 friendservice-0.friendservice.mynamespace.svc.cluster.local
nslookup from a busybox pod in the default namespace of a service running in the mynamespace namespace is working fine, but it seems when I try to do nslookup of a service in the same custom namespace (mynamespace) then dns first fails to resolve but then resolves. What am I missing here?
$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
default busybox 1/1 Running 2 2h
kube-system etcd-fed-master 1/1 Running 6 2h
kube-system kube-apiserver-fed-master 1/1 Running 0 2h
kube-system kube-controller-manager-fed-master 1/1 Running 0 2h
kube-system kube-dns-545bc4bfd4-jkhrr 3/3 Running 0 2h
kube-system kube-proxy-5vcvr 1/1 Running 0 2h
kube-system kube-proxy-f4765 1/1 Running 0 2h
kube-system kube-scheduler-fed-master 1/1 Running 1 2h
kube-system weave-net-jw647 2/2 Running 0 2h
kube-system weave-net-z25rv 2/2 Running 0 2h
mynamespace friendservice-0 1/1 Running 5 10m
mynamespace userservice-0 1/1 Running 0 26m
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl -n mynamespace exec bookentryservice-0 -- cat /etc/resolv.conf
nameserver 10.96.0.10
search mynamespace.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
Any help will be greatly appreciated.
This is a problem with Alpine Linux and its musl library. It has broken DNS functionality and it has been this way for years and they apparently aren't really bothered to fix it.
https://github.com/gliderlabs/docker-alpine/blob/master/docs/caveats.md#dns
https://github.com/gliderlabs/docker-alpine/issues/8
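When comparing resolver behaviour between images, it can help to see which names are derived from a pod's /etc/resolv.conf in the first place. The sketch below approximates glibc-style search-list expansion with `ndots`; it is an illustration only, other resolvers (musl included) may order or skip candidates differently, and `candidates` is a hypothetical helper name.

```shell
#!/bin/sh
# candidates NAME [RESOLV_CONF]
# Prints, in order, the fully qualified names a glibc-style resolver would
# try for NAME. Hypothetical helper; an approximation, not a real resolver.
candidates() {
    name=$1
    conf=${2:-/etc/resolv.conf}

    # A trailing dot means the name is already fully qualified.
    case $name in
        *.) printf '%s\n' "$name"; return 0 ;;
    esac

    # ndots defaults to 1 when there is no "options ndots:N" entry.
    ndots=$(awk '$1 == "options" {
                     for (i = 2; i <= NF; i++)
                         if ($i ~ /^ndots:/) n = substr($i, 7)
                 }
                 END { print (n == "" ? 1 : n) }' "$conf")
    # The last "search" line wins.
    search=$(awk '$1 == "search" { $1 = ""; s = $0 } END { print s }' "$conf")
    # Number of dots in the name.
    dots=$(( $(printf '%s' "$name" | tr -cd '.' | wc -c) ))

    # Enough dots: the name as given is tried first.
    if [ "$dots" -ge "$ndots" ]; then printf '%s.\n' "$name"; fi
    for domain in $search; do
        printf '%s.%s.\n' "$name" "$domain"
    done
    # Too few dots: the name as given is tried last.
    if [ "$dots" -lt "$ndots" ]; then printf '%s.\n' "$name"; fi
}

# Example, inside a pod:
#   candidates friendservice.mynamespace
```

With the resolv.conf shown for the pods in mynamespace, `friendservice.mynamespace` has fewer than 5 dots, so the search domains are tried first and `friendservice.mynamespace.svc.cluster.local.` is the second candidate.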