Cannot connect to an external location from inside a Kubernetes Pod - DNS issues - kubernetes

I have the following problem. I have a namespace "qa". Pods inside this namespace can communicate with each other.
For Example
kubectl exec -it qa-file-watcher-85575bd8f7-npkns -n qa /bin/bash
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup qa-kafka-broker
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: qa-kafka-broker.qa.svc.cluster.local
Address: 10.102.218.167
But if I try to connect to an external service, e.g. 8.8.8.8 or security.debian.org for apt-get update, I get the following errors:
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup 8.8.8.8
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find 8.8.8.8.in-addr.arpa: SERVFAIL
root@qa-file-watcher-85575bd8f7-npkns:/usr/src/app# nslookup security.debian.org
Server: 10.96.0.10
Address: 10.96.0.10#53
** server can't find security.debian.org.eu-central-1.compute.internal: SERVFAIL
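Two generic checks can narrow this down from inside the pod (standard isolation steps, not output from this cluster). A trailing dot suppresses the search-list expansion that is visible above in security.debian.org.eu-central-1.compute.internal, and naming an external server bypasses the cluster DNS entirely:
nslookup security.debian.org.
nslookup security.debian.org 8.8.8.8
If the second command succeeds, the pod has outbound connectivity and the problem sits between CoreDNS and its upstream.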
Here is some information about the setup. I use a bitnami/kubernetes image on an EC2 instance on AWS.
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ kubectl version
Client Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:14:22Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"17", GitVersion:"v1.17.3", GitCommit:"06ad960bfd03b39c8310aaf92d1e7c12ce618213", GitTreeState:"clean", BuildDate:"2020-02-11T18:07:13Z", GoVersion:"go1.13.6", Compiler:"gc", Platform:"linux/amd64"}
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/os-release
NAME="Ubuntu"
VERSION="16.04.6 LTS (Xenial Xerus)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 16.04.6 LTS"
VERSION_ID="16.04"
HOME_URL="http://www.ubuntu.com/"
SUPPORT_URL="http://help.ubuntu.com/"
BUG_REPORT_URL="http://bugs.launchpad.net/ubuntu/"
VERSION_CODENAME=xenial
UBUNTU_CODENAME=xenial
bitnami@ip-172-30-0-120:~/buildAgent/work/aad99852b1e5781f$ cat /etc/resolv.conf
# Dynamic resolv.conf(5) file for glibc resolver(3) generated by resolvconf(8)
# DO NOT EDIT THIS FILE BY HAND -- YOUR CHANGES WILL BE OVERWRITTEN
nameserver 172.30.0.2
search xxxxxxxx.compute.internal default.svc.cluster.local svc.cluster.local cluster.local deb.debian.org
options ndots:5 single request-reopen
DNS=8.8.8.8
CoreDNS is running on the cluster with the following config:
# Please edit the object below. Lines beginning with a '#' will be ignored,
# and an empty file will abort the edit. If an error occurs while saving this file will be
# reopened with the relevant failures.
#
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
metadata:
  creationTimestamp: "2020-02-25T12:52:17Z"
  name: coredns
  namespace: kube-system
  resourceVersion: "31099780"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: 26a6800a-2ceb-4f29-ab85-82beaec0add8
Does anyone have an idea what is going wrong here? If more detailed information is needed, please let me know.
Greetings and thanks
EDIT:
These are the pods running in the kube-system namespace:
bitnami@ip-172-30-0-120:~/deployments/qa-deployment$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-6955765f44-5glwz 1/1 Running 0 151m
coredns-6955765f44-hf2hd 1/1 Running 0 151m
etcd-ip-172-30-0-120 1/1 Running 4 9d
heapster-744b794df7-v2vz9 1/1 Running 1 9d
kube-apiserver-ip-172-30-0-120 1/1 Running 4 9d
kube-controller-manager-ip-172-30-0-120 1/1 Running 7 9d
kube-proxy-lfstn 1/1 Running 1 9d
kube-scheduler-ip-172-30-0-120 1/1 Running 6 9d
kubernetes-dashboard-8f7798644-m7r8x 1/1 Running 13 9d
kubernetes-metrics-scraper-6b97c6d857-nl98d 1/1 Running 0 8d
local-volume-provisioner-69vrv 1/1 Running 33 9d
monitoring-grafana-845bc8df5f-62d4x 1/1 Running 1 9d
monitoring-influxdb-56d9446bd9-wlrd5 1/1 Running 1 9d
nginx-ingress-controller-574d4c9dcf-fmdgm 1/1 Running 1 9d
registry-86c45b9d9b-pm6zj 1/1 Running 0 7d23h
weave-net-g78mj 2/2 Running 5 9d
and this is the log from CoreDNS:
...
...
...
[INFO] 10.32.0.35:49254 - 6294 "AAAA IN monitoring.xxxxxx.de.qa.svc.cluster.local. udp 66 false 512" NXDOMAIN qr,aa,rd 159 0.000297909s
[INFO] 10.32.0.35:55396 - 52809 "A IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000152558s
[INFO] 10.32.0.35:55396 - 36432 "AAAA IN monitoring.xxxxxx.de.svc.cluster.local. udp 63 false 512" NXDOMAIN qr,aa,rd 156 0.000384192s
[INFO] 10.32.0.31:54436 - 61896 "AAAA IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000274796s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. AAAA: read udp 10.32.0.30:41402->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 64312 "A IN xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. udp 74 false 512" NOERROR - 0 2.000270418s
[ERROR] plugin/errors: 2 xxxxxx.cq5rq6zjwmfc.eu-central-1.rds.amazonaws.com. A: read udp 10.32.0.30:43606->172.30.0.2:53: i/o timeout
[INFO] 10.32.0.31:54436 - 8384 "AAAA IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000560668s
[INFO] 10.32.0.31:54436 - 60087 "A IN postgres.qa.svc.cluster.local. udp 47 false 512" NOERROR qr,aa,rd 146 2.000566155s
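The [ERROR] lines above show CoreDNS forwarding to 172.30.0.2:53 (the node's resolver, via forward . /etc/resolv.conf) and timing out. A minimal sketch of a workaround, assuming that upstream really is unreachable from the pod network, is to forward to an explicit public resolver instead and recreate the CoreDNS pods:
kubectl -n kube-system edit configmap coredns
# in the Corefile, change:  forward . /etc/resolv.conf
# to, for example:          forward . 8.8.8.8 8.8.4.4
kubectl -n kube-system delete pod -l k8s-app=kube-dns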
EDIT2:
I can't get a shell inside the coredns pod with
bitnami@ip-172-30-0-120:~/deployments/qa-deployment$ kubectl exec -it coredns-6955765f44-5glwz -n kube-system bash
error: Internal error occurred: error executing command in container: failed to exec in container: failed to start exec "2a604d5b8cfad5341acc0d548412f8376fdf063bf97d92d1aaa501841f959671": OCI runtime exec failed: exec failed: container_linux.go:349: starting container process caused "exec: \"bash\": executable file not found in $PATH": unknown
The resolv.conf inside the pod file-watcher-service in the namespace qa:
root@qa-file-watcher-service-7b7d47c67d-fjb8m:/etc# cat resolv.conf
search qa.svc.cluster.local svc.cluster.local cluster.local eu-central-1.compute.internal default.svc.cluster.local
nameserver 10.96.0.10
options ndots:5

Related

Kubernetes: Can't Resolve Hostnames from within pods

I'm new to Kubernetes, but have used K3s a little in the past. I just set up a K8s cluster. None of my pods can do DNS lookups, even to Google or to an internal domain.
I init'd with --pod-network-cidr=10.244.0.0/16. MetalLB is installed (10.7.7.10-10.7.7.254), and the master and nodes are running with IPs 10.7.50.X/16 and 10.7.60.X/16 respectively. Flannel is set up with the default kube-flannel manifest: https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
So far it's just 1 master with 2 nodes.
Versions:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:44:22Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl version
Client Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:45:37Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"22", GitVersion:"v1.22.1", GitCommit:"632ed300f2c34f6d6d15ca4cef3d3c7073412212", GitTreeState:"clean", BuildDate:"2021-08-19T15:39:34Z", GoVersion:"go1.16.7", Compiler:"gc", Platform:"linux/amd64"}
$ kubelet --version
Kubernetes v1.22.1
Troubleshooting commands:
$ kubectl describe service kube-dns -n kube-system
Name: kube-dns
Namespace: kube-system
Labels: k8s-app=kube-dns
kubernetes.io/cluster-service=true
kubernetes.io/name=CoreDNS
Annotations: prometheus.io/port: 9153
prometheus.io/scrape: true
Selector: k8s-app=kube-dns
Type: ClusterIP
IP Family Policy: SingleStack
IP Families: IPv4
IP: 10.96.0.10
IPs: 10.96.0.10
Port: dns 53/UDP
TargetPort: 53/UDP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: dns-tcp 53/TCP
TargetPort: 53/TCP
Endpoints: 10.244.1.20:53,10.244.2.28:53
Port: metrics 9153/TCP
TargetPort: 9153/TCP
Endpoints: 10.244.1.20:9153,10.244.2.28:9153
Session Affinity: None
Events: <none>
$ kubectl get pods -n kube-system -o wide
NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE READINESS GATES
coredns-84f8874d6d-jgvwk 1/1 Running 1 (115m ago) 21h 10.244.1.20 k-w-001 <none> <none>
coredns-84f8874d6d-qh2f4 1/1 Running 1 (115m ago) 21h 10.244.2.28 k-w-002 <none> <none>
etcd-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-apiserver-k-m-001 1/1 Running 11 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-controller-manager-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-286dc 1/1 Running 10 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-flannel-ds-rbmhx 1/1 Running 6 (114m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-flannel-ds-vjl7l 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-948z8 1/1 Running 8 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
kube-proxy-l7h64 1/1 Running 4 (115m ago) 2d21h 10.7.60.12 k-w-002 <none> <none>
kube-proxy-pqmsr 1/1 Running 4 (115m ago) 2d21h 10.7.60.11 k-w-001 <none> <none>
kube-scheduler-k-m-001 1/1 Running 12 (15m ago) 2d22h 10.7.50.11 k-m-001 <none> <none>
metrics-server-6dfddc5fb8-47mnb 0/1 Running 3 (115m ago) 2d20h 10.244.1.21 k-w-001 <none> <none>
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-jgvwk
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
$ kubectl logs --namespace=kube-system coredns-84f8874d6d-qh2f4
[INFO] plugin/ready: Still waiting on: "kubernetes"
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.8.4
linux/amd64, go1.16.4, 053c4d5
These were run seconds apart:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
Here are some more tests:
$ kubectl exec -ti busybox -- nslookup google.com
;; connection timed out; no servers could be reached
command terminated with exit code 1
$ kubectl exec -ti busybox -- nslookup google.com 8.8.8.8
Server: 8.8.8.8
Address: 8.8.8.8:53
Non-authoritative answer:
Name: google.com
Address: 142.251.33.78
*** Can't find google.com: No answer
$ kubectl exec -ti busybox -- ping 8.8.8.8
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=116 time=6.437 ms
$ kubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
Server: 10.96.0.10
Address: 10.96.0.10:53
*** Can't find kubernetes.default: No answer
*** Can't find kubernetes.default: No answer
$ kubectl exec -ti busybox -- nslookup kubernetes.default 10.96.0.10
;; connection timed out; no servers could be reached
command terminated with exit code 1
I also noticed that the kube-dns service has its selector set to k8s-app=kube-dns and CoreDNS has the label k8s-app=kube-dns. Is this correct?
The pods running in the kube-system namespace seem to have 2 different IP ranges: one is using the node's IP, and the other is using Flannel's.
I'm not sure what's happening here, being new to Kubernetes, but it appears the DNS pods or service are not working at all.
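Regarding the selector question above: a Service selects pods by label, so kube-dns with selector k8s-app=kube-dns matching CoreDNS pods labelled k8s-app=kube-dns is exactly how kubeadm sets it up. A quick generic check that the selector resolves to the CoreDNS pods:
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl -n kube-system get endpoints kube-dns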
Edit:
Further info:
$ sudo ufw status
Status: inactive
The issue was actually Flannel. DNS queries worked fine until the nodes were restarted, and then all pod queries failed until the Flannel pods were restarted.
Man this was a rabbit hole.
See: https://github.com/flannel-io/flannel/issues/1321
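For reference, a sketch of the recovery described there, assuming the labels from the default kube-flannel.yml (app=flannel) and the kubeadm CoreDNS deployment: delete the pods and let their controllers recreate them.
kubectl -n kube-system delete pod -l app=flannel
kubectl -n kube-system delete pod -l k8s-app=kube-dns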

K8 DNS not resolving

My K8s DNS isn't resolving, so I followed the debugging steps mentioned here. As I am new to K8s, can someone point me to the issue I am facing? I can't extract any useful information from the debugging steps.
cat /etc/os-release
NAME="Ubuntu"
VERSION="20.04.2 LTS (Focal Fossa)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 20.04.2 LTS"
kubectl version
Client Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:10:43Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
Server Version: version.Info{Major:"1", Minor:"20", GitVersion:"v1.20.5", GitCommit:"6b1d87acf3c8253c123756b9e61dac642678305f", GitTreeState:"clean", BuildDate:"2021-03-18T01:02:01Z", GoVersion:"go1.15.8", Compiler:"gc", Platform:"linux/amd64"}
kubectl get namespace
NAME STATUS AGE
default Active 7d4h
kubectl get pods dnsutils
NAME READY STATUS RESTARTS AGE
dnsutils 1/1 Running 18 18h
kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
kubectl exec -ti dnsutils -- cat /etc/resolv.conf
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5
kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-74ff55c5b-6vsml 1/1 Running 12 7d4h
coredns-74ff55c5b-mww7g 1/1 Running 12 7d4h
kubectl logs --namespace=kube-system -l k8s-app=kube-dns
.:53
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.0.1:16732 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.006053527s
[INFO] 127.0.0.1:58672 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.00658948s
[INFO] plugin/reload: Running configuration MD5 = db32ca3650231d74073ff4cf814959a7
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.0.62:56364 - 32900 "HINFO IN 2808379183970575835.6786373795048579500. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.004922932s
[INFO] 127.0.0.1:48277 - 32900 "HINFO IN 2808379183970575835.6786373795048579500. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.007889024s
[INFO] 10.244.0.62:49106 - 59651 "HINFO IN 6307445054232439722.7934820194057826263. udp 57 false 512" NXDOMAIN qr,rd,ra 132 0.005058199s
kubectl get svc --namespace=kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 7d4h
monitoring-influxdb ClusterIP 10.102.51.183 <none> 8086/TCP 4d21h
kubectl get endpoints kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.244.0.45:53,10.244.0.47:53,10.244.0.45:53 + 3 more... 7d4h
cat /run/systemd/resolve/resolv.conf
nameserver 8.8.8.8
nameserver 2001:4860:4860::8888
cat /etc/systemd/resolved.conf
[Resolve]
DNS=8.8.8.8 2001:4860:4860::8888
cat /etc/resolv.conf
nameserver 127.0.0.53
options edns0 trust-ad
It is kind of odd that the two resolv.conf files have different values. Also, if I had to set the DNS IP manually, I have no clue which IP to choose.
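For context, the 127.0.0.53 entry is the local stub listener of systemd-resolved, which is why the two files differ by design. On Ubuntu 20.04 the upstream servers it actually forwards to can be listed with:
resolvectl status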
kubeadm config view
apiServer:
  extraArgs:
    authorization-mode: Node,RBAC
  timeoutForControlPlane: 4m0s
apiVersion: kubeadm.k8s.io/v1beta2
certificatesDir: /etc/kubernetes/pki
clusterName: kubernetes
controllerManager: {}
dns:
  type: CoreDNS
etcd:
  local:
    dataDir: /var/lib/etcd
imageRepository: k8s.gcr.io
kind: ClusterConfiguration
kubernetesVersion: v1.20.5
networking:
  dnsDomain: cluster.local
  podSubnet: 10.244.0.0/16
  serviceSubnet: 10.96.0.0/12
scheduler: {}
Update
The dnsutils pod's assigned IP is 10.244.2.20 and it is not reachable from the single K8s master node:
ping 10.244.2.20
There were several issues with my configuration. First off, I used an incompatible Docker version (20.10.5) which isn't supported yet. Hence, I don't know whether this issue also arises when using a supported Docker version. However, even with this incompatible Docker version, I was able to fix the issue with the following steps:
1. DNS misconfiguration
I don't know who/what sets the resolved.conf DNS entries, but my entry was clearly wrong. First, we need to obtain the K8s DNS cluster IP address:
kubectl get services --all-namespaces -o wide
You will receive all services within all namespaces, including the kube-dns cluster IP. In my case it looks like the following:
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE SELECTOR
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 11d k8s-app=kube-dns
kube-system monitoring-influxdb ClusterIP 10.102.51.183 <none> 8086/TCP 9d k8s-app=influxdb
kubernetes-dashboard dashboard-metrics-scraper ClusterIP 10.110.126.218 <none> 8000/TCP 11d k8s-app=dashboard-metrics-scraper
kubernetes-dashboard kubernetes-dashboard ClusterIP 10.98.164.199 <none> 443/TCP 11d k8s-app=kubernetes-dashboard
Use that DNS IP within your resolved.conf file. Where that file is located depends on your OS; in my case (Ubuntu 20.04) it is /etc/systemd/resolved.conf.
nano /etc/systemd/resolved.conf
[Resolve]
DNS=10.96.0.10 8.8.8.8 2001:4860:4860::8888
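For the change to take effect, systemd-resolved has to be restarted (a standard step, implied but not shown above):
sudo systemctl restart systemd-resolved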
2. Re-Join all nodes
I used UFW next to iptables, which was somehow messing with the configuration. Hence, I removed all nodes, installed a fresh OS, and re-joined the cluster without activating UFW.
3. Forward packet policy
In some versions Docker modifies iptables such that packets are dropped in packet-forwarding scenarios. Override this behaviour on all nodes with:
iptables -P FORWARD ACCEPT
Just to be sure, also enable IPv4 forwarding with:
echo "net.ipv4.ip_forward=1" >> /etc/sysctl.conf
What operating system are you using? I was using Red Hat Enterprise Linux and had a similar error.
I removed everything in /etc/resolv.conf and kept only the IP of the DNS server, and it worked.
What network plugin are you using? For me Calico didn't work; I used kube-router with the above /etc/resolv.conf setting.
Thanks.

"nslookup: read: Connection refused" from inside of a pod in Kubernetes (K8S) cluster (DNS problem)

Problem
I have a custom installation of a k8s cluster with 1 master and 1 node on AWS EC2, based on CentOS 7. It uses CoreDNS (the pods are running fine, with no errors in the logs).
Inside a pod on the node, calling e.g. nslookup google.com outputs:
nslookup: write to '10.96.0.10': Connection refused
;; connection timed out; no servers could be reached
For example, pinging from inside a pod with ping 8.8.8.8 works fine:
PING 8.8.8.8 (8.8.8.8): 56 data bytes
64 bytes from 8.8.8.8: seq=0 ttl=50 time=1.330 ms
/etc/resolv.conf inside a pod looks like:
nameserver 10.96.0.10
search default.svc.cluster.local svc.cluster.local cluster.local ec2.internal
options ndots:5
This command works fine from the node itself, nslookup google.com:
Server: 172.31.0.2
Address: 172.31.0.2#53
Non-authoritative answer:
Name: google.com
Address: 172.217.15.110
Name: google.com
Address: 2607:f8b0:4004:801::200e
Kubelet config kubectl get configmap kubelet-config-1.17 -n kube-system -o yaml returns
data:
  kubelet: |
    apiVersion: kubelet.config.k8s.io/v1beta1
    authentication:
      anonymous:
        enabled: false
      webhook:
        cacheTTL: 0s
        enabled: true
      x509:
        clientCAFile: /etc/kubernetes/pki/ca.crt
    authorization:
      mode: Webhook
      webhook:
        cacheAuthorizedTTL: 0s
        cacheUnauthorizedTTL: 0s
    clusterDNS:
    - 10.96.0.10
    clusterDomain: cluster.local
    cpuManagerReconcilePeriod: 0s
    evictionPressureTransitionPeriod: 0s
    fileCheckFrequency: 0s
    healthzBindAddress: 127.0.0.1
    healthzPort: 10248
    httpCheckFrequency: 0s
    imageMinimumGCAge: 0s
    kind: KubeletConfiguration
    nodeStatusReportFrequency: 0s
    nodeStatusUpdateFrequency: 0s
    rotateCertificates: true
    runtimeRequestTimeout: 0s
    staticPodPath: /etc/kubernetes/manifests
    streamingConnectionIdleTimeout: 0s
    syncFrequency: 0s
    volumeStatsAggPeriod: 0s
kind: ConfigMap
Pods in the kube-system namespace (kubectl get pods -n kube-system) look like this:
coredns-6955765f44-qdbgx 1/1 Running 6 11d
coredns-6955765f44-r4v7n 1/1 Running 6 11d
etcd-ip-172-31-42-121.ec2.internal 1/1 Running 7 11d
kube-apiserver-ip-172-31-42-121.ec2.internal 1/1 Running 7 11d
kube-controller-manager-ip-172-31-42-121.ec2.internal 1/1 Running 6 11d
kube-proxy-lrpd9 1/1 Running 6 11d
kube-proxy-z55cv 1/1 Running 6 11d
kube-scheduler-ip-172-31-42-121.ec2.internal 1/1 Running 6 11d
weave-net-bdn5n 2/2 Running 0 39h
weave-net-z7mks 2/2 Running 5 39h
UPDATE
If I run ip route from the pod, it returns:
default via 10.32.0.1 dev eth0
10.32.0.0/12 dev eth0 scope link src 10.32.0.16
From master:
default via 172.31.32.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.32.0/20 dev eth0 proto kernel scope link src 172.31.42.121
From node:
default via 172.31.32.1 dev eth0
10.32.0.0/12 dev weave proto kernel scope link src 10.32.0.1
172.17.0.0/16 dev docker0 proto kernel scope link src 172.17.0.1
172.31.32.0/20 dev eth0 proto kernel scope link src 172.31.46.62
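Note that the master and node both report src 10.32.0.1 on the 10.32.0.0/12 Weave route, i.e. the same Weave address on two hosts; that suggests the Weave peers never connected to each other, which fits the mis-set private IPs described in the resolution below. A generic way to inspect Weave's peer view, assuming the standard weave-net DaemonSet:
kubectl exec -n kube-system weave-net-bdn5n -c weave -- /home/weave/weave --local status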
CoreDNS configmap kubectl -n kube-system get configmap coredns -oyaml is:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        log
        errors
        health {
           lameduck 5s
        }
        ready
        kubernetes cluster.local in-addr.arpa ip6.arpa {
           pods insecure
           fallthrough in-addr.arpa ip6.arpa
           ttl 30
        }
        prometheus :9153
        forward . /etc/resolv.conf
        cache 30
        loop
        reload
        loadbalance
    }
kind: ConfigMap
So why doesn't nslookup google.com work inside a pod?
The installation of the k8s cluster was wrong: the Ansible script should contain the correct private IPs of the master and nodes on the EC2 VMs.
dev-kubernetes-master ansible_host=34.233.207.xxx private_ip=172.31.37.xx
dev-kubernetes-slave ansible_host=52.6.10.xxx private_ip=172.31.42.xxx
I've reinstalled the cluster with the correct private IPs specified (before, there were no private IPs at all) and the problem is gone.
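A quick generic check that the fix took effect: the INTERNAL-IP column should now show the EC2 private addresses.
kubectl get nodes -o wide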

DNS resolve problem in kubernetes cluster

We have a Kubernetes cluster consisting of four worker nodes and one master node. On worker1 and worker2 we can't resolve DNS names, but on the two other nodes everything is OK! I followed the instructions in the official documentation here and realized that the queries from worker1 and worker2 are not received by the coredns pods.
I repeat: everything is good on worker3 and worker4; I have a problem with worker1 and worker2. For example, when I run the busybox container on worker1 and do nslookup kubernetes.default, it doesn't return anything, but when it runs on worker3, DNS resolution is OK.
Cluster information:
$ kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.1", GitCommit:"4ed3216f3ec431b140b1d899130a69fc671678f4", GitTreeState:"clean", BuildDate:"2018-10-05T16:43:08Z", GoVersion:"go1.10.4", Compiler:"gc", Platform:"linux/amd64"}
$ kubectl get pod -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-6dtrc 1/1 Running 5 82d
coredns-576cbf47c7-jvx5l 1/1 Running 6 82d
etcd-master 1/1 Running 35 298d
kube-apiserver-master 1/1 Running 14 135m
kube-controller-manager-master 1/1 Running 42 298d
kube-proxy-22f49 1/1 Running 9 91d
kube-proxy-2s9sx 1/1 Running 34 298d
kube-proxy-jh2m7 1/1 Running 5 81d
kube-proxy-rc5r8 1/1 Running 5 63d
kube-proxy-vg8jd 1/1 Running 6 104d
kube-scheduler-master 1/1 Running 39 298d
kubernetes-dashboard-65c76f6c97-7cwwp 1/1 Running 45 293d
tiller-deploy-779784fbd6-dzq7k 1/1 Running 5 87d
weave-net-556ml 2/2 Running 12 66d
weave-net-h9km9 2/2 Running 15 81d
weave-net-s88z4 2/2 Running 0 145m
weave-net-smrgc 2/2 Running 14 63d
weave-net-xf6ng 2/2 Running 15 82d
$ kubectl logs coredns-576cbf47c7-6dtrc -n kube-system | tail -20
10.44.0.28:32837 - [14/Dec/2019:12:22:51 +0000] 2957 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000661167s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 46278 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000440918s
10.44.0.28:51373 - [14/Dec/2019:12:25:09 +0000] 47697 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.00059741s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 33222 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.00044739s
10.44.0.28:44969 - [14/Dec/2019:12:27:27 +0000] 52126 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000310494s
10.44.0.28:39392 - [14/Dec/2019:12:29:11 +0000] 41041 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000481309s
10.44.0.28:40999 - [14/Dec/2019:12:29:11 +0000] 695 "AAAA IN spark-master.svc.cluster.local. udp 48 false 512" NXDOMAIN qr,aa,rd,ra 141 0.000247078s
10.44.0.28:54835 - [14/Dec/2019:12:29:12 +0000] 59604 "AAAA IN spark-master. udp 30 false 512" NXDOMAIN qr,rd,ra 106 0.020408006s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 53244 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.000209231s
10.44.0.28:38604 - [14/Dec/2019:12:29:15 +0000] 23079 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,rd,ra 149 0.000191722s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 15451 "AAAA IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 149 0.000383919s
10.44.0.28:57478 - [14/Dec/2019:12:32:15 +0000] 45086 "A IN spark-master.default.svc.cluster.local. udp 56 false 512" NOERROR qr,aa,rd,ra 110 0.001197812s
10.40.0.34:54678 - [14/Dec/2019:12:52:31 +0000] 6509 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000522769s
10.40.0.34:60234 - [14/Dec/2019:12:52:31 +0000] 15538 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000851171s
10.40.0.34:43989 - [14/Dec/2019:12:52:31 +0000] 2712 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000306038s
10.40.0.34:59265 - [14/Dec/2019:12:52:31 +0000] 23765 "A IN kubernetes.default.svc.cluster.local. udp 54 false 512" NOERROR qr,aa,rd,ra 106 0.000274748s
10.40.0.34:45622 - [14/Dec/2019:13:26:31 +0000] 38766 "AAAA IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000436681s
10.40.0.34:42759 - [14/Dec/2019:13:26:31 +0000] 56753 "A IN kubernetes.default.svc.monitoring.svc.cluster.local. udp 69 false 512" NXDOMAIN qr,aa,rd,ra 162 0.000706638s
10.40.0.34:39563 - [14/Dec/2019:13:26:31 +0000] 37876 "AAAA IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000445999s
10.40.0.34:57224 - [14/Dec/2019:13:26:31 +0000] 33157 "A IN kubernetes.default.svc.svc.cluster.local. udp 58 false 512" NXDOMAIN qr,aa,rd,ra 151 0.000536896s
$ kubectl get svc -n kube-system
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 298d
kubernetes-dashboard ClusterIP 10.96.204.236 <none> 443/TCP 298d
tiller-deploy ClusterIP 10.110.41.66 <none> 44134/TCP 123d
$ kubectl get ep kube-dns --namespace=kube-system
NAME ENDPOINTS AGE
kube-dns 10.32.0.98:53,10.44.0.21:53,10.32.0.98:53 + 1 more... 298d
When busybox is on worker1:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
nslookup: can't resolve 'kubernetes.default'
command terminated with exit code 1
But when busybox is on worker3:
$ kubectl exec -ti busybox -- nslookup kubernetes.default
Server: 10.96.0.10
Address 1: 10.96.0.10
Name: kubernetes.default
Address 1: 10.96.0.1 kubernetes.default.svc.cluster.local
All Nodes are : Ubuntu 16.04
The content of /etc/resolv.conf is the same for all pods.
The only difference I can find is in the kube-proxy logs:
The working node kube-proxy logs:
$ kubectl logs kube-proxy-vg8jd -n kube-system
W1214 06:12:19.201889 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1214 06:12:19.321747 1 server_others.go:148] Using iptables Proxier.
W1214 06:12:19.332725 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1214 06:12:19.332949 1 server_others.go:178] Tearing down inactive rules.
I1214 06:12:20.557875 1 server.go:447] Version: v1.12.1
I1214 06:12:20.601081 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1214 06:12:20.601393 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1214 06:12:20.601958 1 conntrack.go:83] Setting conntrack hashsize to 32768
I1214 06:12:20.602234 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1214 06:12:20.602300 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1214 06:12:20.602544 1 config.go:202] Starting service config controller
I1214 06:12:20.602561 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1214 06:12:20.602585 1 config.go:102] Starting endpoints config controller
I1214 06:12:20.602619 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1214 06:12:20.702774 1 controller_utils.go:1034] Caches are synced for service config controller
I1214 06:12:20.702827 1 controller_utils.go:1034] Caches are synced for endpoints config controller
The non-working node's kube-proxy logs:
$ kubectl logs kube-proxy-fgzpf -n kube-system
W1215 12:47:12.660749 1 server_others.go:295] Flag proxy-mode="" unknown, assuming iptables proxy
I1215 12:47:12.679348 1 server_others.go:148] Using iptables Proxier.
W1215 12:47:12.679538 1 proxier.go:317] clusterCIDR not specified, unable to distinguish between internal and external traffic
I1215 12:47:12.679665 1 server_others.go:178] Tearing down inactive rules.
E1215 12:47:12.760702 1 proxier.go:529] Error removing iptables rules in ipvs proxier: error deleting chain "KUBE-MARK-MASQ": exit status 1: iptables: Too many links.
I1215 12:47:12.799926 1 server.go:447] Version: v1.12.1
I1215 12:47:12.832047 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_max' to 131072
I1215 12:47:12.833067 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1215 12:47:12.833266 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_established' to 86400
I1215 12:47:12.833498 1 conntrack.go:98] Set sysctl 'net/netfilter/nf_conntrack_tcp_timeout_close_wait' to 3600
I1215 12:47:12.833934 1 config.go:202] Starting service config controller
I1215 12:47:12.834061 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1215 12:47:12.834253 1 config.go:102] Starting endpoints config controller
I1215 12:47:12.834338 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1215 12:47:12.934408 1 controller_utils.go:1034] Caches are synced for service config controller
I1215 12:47:12.934564 1 controller_utils.go:1034] Caches are synced for endpoints config controller
Line five (the iptables error) doesn't appear in the first log. I don't know whether that is related to the issue or not.
Any suggestions are welcomed.
The double svc.svc in kubernetes.default.svc.svc.cluster.local looks strange. Check whether that is the same in the coredns-576cbf47c7-6dtrc pod.
Shut down the coredns-576cbf47c7-6dtrc pod to guarantee that the single remaining DNS instance will be answering the DNS queries from all worker nodes.
According to the docs, problems like this "... indicate a problem with the coredns/kube-dns add-on or associated Services". Restarting coredns may solve the issue.
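On this cluster (v1.12, which predates kubectl rollout restart), restarting CoreDNS means deleting the pods and letting the Deployment recreate them, for example:
kubectl -n kube-system delete pod -l k8s-app=kube-dns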
I'd add checking and comparing /etc/resolv.conf on the nodes to the list of things to look into.
Looks like these commands can help:
sudo update-alternatives --set iptables /usr/sbin/iptables-legacy
sudo update-alternatives --set ip6tables /usr/sbin/ip6tables-legacy
sudo update-alternatives --set arptables /usr/sbin/arptables-legacy
sudo update-alternatives --set ebtables /usr/sbin/ebtables-legacy
Also, if there is a failing flannel pod, we can check the logs of its container.
Running sudo ip link delete flannel.1 on the failing node then allows the pod to be recreated successfully after deleting the failing pod.

Kubernetes v1.12 dashboard is running but timeout occurred while accessing it via api server proxy

I have a Windows 10 Home (1803 update) host machine, VirtualBox 5.22, and 2 guest Ubuntu 18.04.1 servers.
Each guest has 2 networks: NAT (host IP 10.0.2.15) and shared host-only with gateway IP 192.168.151.1.
I set IPs:
for the k8s master (ubuk8sma): 192.168.151.21
for worker1 (ubuk8swrk1): 192.168.151.22
I kept Docker as is; the version is 18.09.0.
I installed k8s version stable-1.12 on the master and worker. The init for the master is:
K8S_POD_CIDR='10.244.0.0/16'
K8S_IP_ADDR='192.168.151.21'
K8S_VER='stable-1.12' # or latest
sudo kubeadm init --pod-network-cidr=${K8S_POD_CIDR} --apiserver-advertise-address=${K8S_IP_ADDR} --kubernetes-version ${K8S_VER} --ignore-preflight-errors=all
Why I set the "ignore errors" flag:
[ERROR SystemVerification]: unsupported docker version: 18.09.0
I was reluctant to reinstall a fully k8s-compatible Docker version (maybe not a very smart move; I'm usually just eager to try the latest stuff).
For CNI I installed the flannel network:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
After installing worker1, the node state looks like:
u1@ubuk8sma:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuk8sma Ready master 6d v1.12.2
ubuk8swrk1 Ready <none> 4d1h v1.12.2
No big issues showed up. The next thing I wanted was a visualization of this pretty k8s bundle ecosystem, so I headed towards installing the k8s dashboard.
I followed the "defaults" path, with zero intervention, if possible. I used this YAML:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
At a basic level it looks installed, deployed to a worker pod, and running. From the pod list info:
u1@ubuk8sma:~$ kubectl get all --namespace=kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-4tzm9 1/1 Running 5 6d
pod/coredns-576cbf47c7-tqtpw 1/1 Running 5 6d
pod/etcd-ubuk8sma 1/1 Running 7 6d
pod/kube-apiserver-ubuk8sma 1/1 Running 7 6d
pod/kube-controller-manager-ubuk8sma 1/1 Running 11 6d
pod/kube-flannel-ds-amd64-rt442 1/1 Running 3 4d1h
pod/kube-flannel-ds-amd64-zx78x 1/1 Running 5 6d
pod/kube-proxy-6b6mc 1/1 Running 6 6d
pod/kube-proxy-zcchn 1/1 Running 3 4d1h
pod/kube-scheduler-ubuk8sma 1/1 Running 10 6d
pod/kubernetes-dashboard-77fd78f978-crl7b 1/1 Running 1 2d1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d
service/kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 2d1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds-amd64 2 2 2 2 2 beta.kubernetes.io/arch=amd64 6d
...
daemonset.apps/kube-proxy 2 2 2 2 2 <none> 6d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 6d
deployment.apps/kubernetes-dashboard 1 1 1 1 2d1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-576cbf47c7 2 2 2 6d
replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 2d1h
I started a proxy for validation of both the API server and the dashboard service:
kubectl proxy
Version check for API server:
u1@ubuk8sma:~$ curl http://localhost:8001/version
{
"major": "1",
"minor": "12",
"gitVersion": "v1.12.2",
"gitCommit": "17c77c7898218073f14c8d573582e8d2313dc740",
"gitTreeState": "clean",
"buildDate": "2018-10-24T06:43:59Z",
"goVersion": "go1.10.4",
"compiler": "gc",
"platform": "linux/amd64"
}
And here is the problem I'm writing this question about:
u1@ubuk8sma:~$ curl "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
Error: 'dial tcp 10.244.1.8:8443: i/o timeout'
Trying to reach: 'https://10.244.1.8:8443/'
Fragment of Pod info:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://fb0937959c7680046130e670c483877e4c0f1854870cb0b20ed4fe066d72df18
    image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
    imageID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:1d2e1229a918f4bc38b5a3f9f5f11302b3e71f8397b492afac7f273a0008776a
    lastState:
      terminated:
        containerID: docker://f85e1cc50f59adbd8a13d42694aef7c5e726c07b3d852a26288c4bfc1124c718
        exitCode: 2
        finishedAt: 2018-11-30T06:53:21Z
        reason: Error
        startedAt: 2018-11-29T07:16:07Z
    name: kubernetes-dashboard
    ready: true
    restartCount: 1
    state:
      running:
        startedAt: 2018-11-30T06:53:23Z
  hostIP: 10.0.2.15
  phase: Running
  podIP: 10.244.1.8
  qosClass: BestEffort
  startTime: 2018-11-29T07:16:04Z
Docker check on worker1 node:
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
fb0937959c... sha256:0dab2435c100... "/dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates" 27 hours ago Up 27 hours k8s_kubernetes-dashboard_kube...
Tried to check Pod logs, no luck:
DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
kubectl -n kube-system logs $DASHBOARD_POD_NAME
Error from server (NotFound): the server could not find the requested resource ( pods/log kubernetes-dashboard-77fd78f978-crl7b)
Tried to wget from API server:
API_SRV_POD_NAME='kube-apiserver-ubuk8sma'
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME wget https://10.244.1.8:8443/
No response.
Tried to check dashboard service existence, no luck:
u1@ubuk8sma:~$ kubectl get svc $DASHBOARD_SVC_NAME
Error from server (NotFound): services "kubernetes-dashboard" not found
Checked IP route table on API server:
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME ip route show
default via 10.0.2.2 dev enp0s3 src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 scope link src 10.0.2.15
10.0.2.2 dev enp0s3 scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.151.0/24 dev enp0s8 scope link src 192.168.151.21
For reference, enp0s3 is the NAT NIC adapter, enp0s8 the host-only one.
I see the flannel route 10.244.1.x. It seems the issue is hardly about network misconfiguration (but I can be wrong).
So, the dashboard pod looks like it is running, but it has some errors and I cannot diagnose which ones. Could you help find the root cause and ideally get the dashboard service running without errors?
Thanks in advance, folks!
Update1:
I see events on master:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 11h kubelet, ubuk8swrk1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43191144d447d0e9da52c8b6600bd96a23fab1e96c79af8c8fedc4e4e50882c7" network for pod "kubernetes-dashboard-77fd78f978-crl7b": NetworkPlugin cni failed to set up pod "kubernetes-dashboard-77fd78f978-crl7b_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 11h (x4 over 11h) kubelet, ubuk8swrk1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 11h kubelet, ubuk8swrk1 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0" already present on machine
Normal Created 11h kubelet, ubuk8swrk1 Created container
Normal Started 11h kubelet, ubuk8swrk1 Started container
The error about the absence of subnet.env is a bit strange, as both master and minion have it (well, maybe it is created on the fly):
u1@ubuk8swrk1:~$ ls -la /run/flannel/subnet.env
-rw-r--r-- 1 root root 96 Dec 3 08:15 /run/flannel/subnet.env
This is the dashboard deployment descriptor:
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=kubernetes-dashboard
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=kubernetes-dashboard
Service Account: kubernetes-dashboard
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubernetes-dashboard-77fd78f978 (1/1 replicas created)
Events: <none>
This is a reduced description of the pods (the original YAML is 35K, too much to share):
Name: coredns-576cbf47c7-4tzm9
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Status: Running
IP: 10.244.0.14
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0efcd043407d93fb9d052045828489f6b99bb59b4f0882ec89e1897071609b77
Image: k8s.gcr.io/coredns:1.2.2
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 6
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: etcd-ubuk8sma
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: ubuk8sma/10.0.2.15
Labels: component=etcd
tier=control-plane
Status: Running
IP: 10.0.2.15
Containers:
etcd:
Container ID: docker://ba2bdcf5fa558beabdd8578628d71480d595d5ee3bb5c4edf42407419010144b
Image: k8s.gcr.io/etcd:3.2.24
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:905d7ca17fd02bc24c0eba9a062753aba15db3e31422390bc3238eb762339b20
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://127.0.0.1:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://127.0.0.1:2380
--initial-cluster=ubuk8sma=https://127.0.0.1:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
--name=ubuk8sma
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 03 Dec 2018 08:12:56 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 28 Nov 2018 09:31:46 +0000
Finished: Mon, 03 Dec 2018 08:12:35 +0000
Ready: True
Restart Count: 8
Liveness: exec [/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo] delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: kube-apiserver-ubuk8sma
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
Containers:
kube-apiserver:
Container ID: docker://099b2a30772b969c3919b57fd377980673f03a820afba6034daa70f011271a52
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.151.21
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Mon, 03 Dec 2018 08:13:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 8
Liveness: http-get https://192.168.151.21:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Events: <none>
Name: kube-flannel-ds-amd64-rt442
Namespace: kube-system
Node: ubuk8swrk1/10.0.2.15
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://a6377b0fe1b040235c24e9ca19455c56e77daecf688b212cfea5553b6e59ff68
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Ready: True
Restart Count: 4
Containers:
kube-flannel:
Container ID: docker://f7029bc2144c1ab8654407d742c1079df0059d418b7ba86b886091b5ad8c34a3
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 4
Events: <none>
Name: kube-proxy-6b6mc
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
The biggest suspect is the node IP. I see 10.0.2.15 (the NAT IP) everywhere, but the host-only NIC should be used. I had a long story of setting up the network properly for my Ubuntu VMs.
I edited /etc/netplan/01-netcfg.yaml before the k8s setup (thanks to https://askubuntu.com/questions/984445/netplan-configuration-on-ubuntu-17-04-virtual-machine?rq=1 for the help). Example for the master config:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      dhcp6: yes
      routes:
      - to: 0.0.0.0/0
        via: 10.0.2.2
        metric: 0
    enp0s8:
      dhcp4: no
      dhcp6: no
      addresses: [192.168.151.21/24]
      routes:
      - to: 192.168.151.1/24
        via: 192.168.151.1
        metric: 100
Only after this and a few more changes did the NAT and host-only networks start working together. NAT remains the default net adapter; likely that's why its IP is everywhere. For the API server I set --advertise-address=192.168.151.21 explicitly, which reduced the use of the NAT IP at least for it.
So maybe the root cause is different, but the current question is how to reconfigure the networks to replace the NAT IP with the host-only one. I already tried this in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.151.21"
Restarted kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
Didn't help. Restarted the VMs. Didn't help (I only expected kubelet-related changes, but nothing changed). Explored a few configs (5+) for potential changes, no luck.
Update2:
I mentioned the NAT address config issue above. I resolved it by editing the /etc/default/kubelet config. I found that idea in the comments of this article:
https://medium.com/@joatmon08/playing-with-kubeadm-in-vagrant-machines-part-2-bac431095706
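Presumably the edit was along these lines; the exact file content is not in the post, and the IP below is inferred from the worker address given earlier:
# /etc/default/kubelet on worker1 (IP inferred, not from the original post)
KUBELET_EXTRA_ARGS=--node-ip=192.168.151.22
followed by the same daemon-reload and kubelet restart as above.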
The dashboard config part now has the proper IP:
hostIP: 192.168.151.22
phase: Running
podIP: 10.244.1.13
Then I went into the Docker container for the API server and tried to reach the pod IP via wget, ping, and traceroute. Timeouts everywhere. Routes:
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
10.0.2.2 0.0.0.0 255.255.255.255 UH 100 0 0 enp0s3
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 10.244.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.151.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s8
Attempt to perform curl call from master VM:
u1@ubuk8sma:~$ curl -v -i -kSs "https://192.168.151.21:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/" -H "$K8S_AUTH_HEADER"
...
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x565072b5a750)
> GET /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ HTTP/2
> Host: 192.168.151.21:6443
> User-Agent: curl/7.58.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1.....
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
HTTP/2 503
< content-type: text/plain; charset=utf-8
content-type: text/plain; charset=utf-8
< content-length: 92
content-length: 92
< date: Tue, 04 Dec 2018 08:44:25 GMT
date: Tue, 04 Dec 2018 08:44:25 GMT
<
Error: 'dial tcp 10.244.1.13:8443: i/o timeout'
* Connection #0 to host 192.168.151.21 left intact
Trying to reach: 'https://10.244.1.13:8443/'
Service info for dashboard:
u1@ubuk8sma:~$ kubectl -n kube-system get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 5d
A bit more detail:
u1@ubuk8sma:~$ kubectl -n kube-system describe services kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.103.36.134
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.1.13:8443
Session Affinity: None
Events: <none>
Also, I tried to get a shell, both via kubectl and docker. For any usual Linux command I see this 'OCI runtime exec failed' issue:
u1@ubuk8sma:~$ DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
u1@ubuk8sma:~$ kubectl -v=9 -n kube-system exec "$DASHBOARD_POD_NAME" -- env
I1204 09:57:17.673345 23517 loader.go:359] Config loaded from file /home/u1/.kube/config
I1204 09:57:17.679526 23517 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b'
I1204 09:57:17.703924 23517 round_trippers.go:405] GET https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b 200 OK in 23 milliseconds
I1204 09:57:17.703957 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.703971 23517 round_trippers.go:414] Content-Length: 3435
I1204 09:57:17.703984 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
I1204 09:57:17.703997 23517 round_trippers.go:414] Content-Type: application/json
I1204 09:57:17.704866 23517 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kubernetes-dashboard-77fd78f978-crl7b","generateName":"kubernetes-dashboard-77fd78f978-","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b","uid":"a1d005b8-f3a6-11e8-a2d0-08002783a80f"...
I1204 09:57:17.740811 23517 round_trippers.go:386] curl -k -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true'
I1204 09:57:17.805528 23517 round_trippers.go:405] POST https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true 101 Switching Protocols in 64 milliseconds
I1204 09:57:17.805565 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.805581 23517 round_trippers.go:414] Connection: Upgrade
I1204 09:57:17.805594 23517 round_trippers.go:414] Upgrade: SPDY/3.1
I1204 09:57:17.805607 23517 round_trippers.go:414] X-Stream-Protocol-Version: v4.channel.k8s.io
I1204 09:57:17.805620 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"env\": executable file not found in $PATH": unknown
F1204 09:57:18.088488 23517 helpers.go:119] command terminated with exit code 126
So, I cannot reach the pod and cannot get a shell there. But at least I see some logs:
u1@ubuk8sma:~$ kubectl -n kube-system logs -p $DASHBOARD_POD_NAME
2018/12/03 08:15:16 Starting overwatch
2018/12/03 08:15:16 Using in-cluster config to connect to apiserver
2018/12/03 08:15:16 Using service account token for csrf signing
2018/12/03 08:15:16 No request provided. Skipping authorization
2018/12/03 08:15:16 Successful initial request to the apiserver, version: v1.12.2
2018/12/03 08:15:16 Generating JWE encryption key
2018/12/03 08:15:16 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/12/03 08:15:16 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/12/03 08:15:18 Initializing JWE encryption key from synchronized object
2018/12/03 08:15:18 Creating in-cluster Heapster client
2018/12/03 08:15:19 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/03 08:15:19 Auto-generating certificates
2018/12/03 08:15:19 Successfully created certificates
2018/12/03 08:15:19 Serving securely on HTTPS port: 8443
2018/12/03 08:15:49 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
No ideas for now where to go further to fix this timeout.