I installed minikube on my Mac. I can see that a CoreDNS service is in use with minikube; I confirmed that by checking the coredns pod logs, where my DNS queries end up. Why does the cluster info say I am using KubeDNS when it is actually CoreDNS? Is it something I can ignore, since it looks like a plain naming issue?
My cluster info is as follows:
$ kubectl cluster-info
Kubernetes master is running at https://192.168.64.2:8443
KubeDNS is running at https://192.168.64.2:8443/api/v1/namespaces/kube-system/services/kube-dns:dns/proxy
CoreDNS pod logs
2020-02-14T14:33:49.106Z [INFO] 172.17.0.4:46239 - 734 "A IN hello-world-77b74d7cc8-6t5wt.default.svc.cluster.local. udp 72 false 512" NXDOMAIN qr,aa,rd 165 0.000177567s
2020-02-14T17:10:21.597Z [INFO] 172.17.0.4:52399 - 22998 "A IN hello-world-77b74d7cc8-6t5wt.cluster.local.default.svc.cluster.local. udp 86 false 512" NXDOMAIN qr,aa,rd 179 0.008847724s
2020-02-14T17:10:21.605Z [INFO] 172.17.0.4:59674 - 3370 "A IN hello-world-77b74d7cc8-6t5wt.cluster.local.cluster.local. udp 74 false 512" NXDOMAIN qr,aa,rd 167 0.000221285s
2020-02-14T17:10:21.606Z [INFO] 172.17.0.4:39439 - 62070 "A IN hello-world-77b74d7cc8-6t5wt.cluster.local. udp 60 false 512" NXDOMAIN qr,aa,rd 153 0.000156948s
2020-02-14T17:10:30.699Z [INFO] 172.17.0.4:42925 - 36746 "A IN hello-world-77b74d7cc8-6t5wt.svc.cluster.local.cluster.local. udp 78 false 512" NXDOMAIN qr,aa,rd 171 0.00038
Minikube version
$ minikube version
minikube version: v1.5.2
commit: 792dbf92a1de583fcee76f8791cff12e0c9440ad-dirty
The interface listing shows the information below:
$ ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP group default qlen 1000
link/ether 5a:2f:e8:1f:d2:c6 brd ff:ff:ff:ff:ff:ff
inet 192.168.64.2/24 brd 192.168.64.255 scope global dynamic eth0
valid_lft 55045sec preferred_lft 55045sec
inet6 fe80::582f:e8ff:fe1f:d2c6/64 scope link
valid_lft forever preferred_lft forever
When I ssh into minikube, I see the kubelet and CoreDNS processes running:
$ sudo ps aux | grep dns
root 2541 12.8 4.3 1357844 86832 ? Ssl 10:11 83:44 /var/lib/minikube/binaries/v1.16.2/kubelet --authorization-mode=Webhook --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --cgroup-driver=cgroupfs --client-ca-file=/var/lib/minikube/certs/ca.crt --cluster-dns=10.96.0.10 --cluster-domain=cluster.local --config=/var/lib/kubelet/config.yaml --container-runtime=docker --fail-swap-on=false --hostname-override=minikube --kubeconfig=/etc/kubernetes/kubelet.conf --node-ip=192.168.64.2 --pod-manifest-path=/etc/kubernetes/manifests
root 3565 0.4 1.1 146036 23192 ? Ssl 10:11 3:14 /coredns -conf /etc/coredns/Corefile
root 3572 0.4 1.1 146036 22768 ? Ssl 10:11 3:07 /coredns -conf /etc/coredns/Corefile
docker 12531 0.0 0.0 11408 556 pts/0 S+ 21:03 0:00 grep dns
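For reference, the kubelet's --cluster-dns=10.96.0.10 flag above is what ends up as the nameserver in each pod's /etc/resolv.conf; it can be checked from any pod (the pod name below is just the one from my logs, and the output shown is roughly what to expect, assuming the container image has cat):
kubectl exec hello-world-77b74d7cc8-6t5wt -- cat /etc/resolv.conf
# expect something like:
#   nameserver 10.96.0.10
#   search default.svc.cluster.local svc.cluster.local cluster.local
#   options ndots:5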
When I look at the pods running
$ kubectl get pods -n kube-system
NAME READY STATUS RESTARTS AGE
coredns-5644d7b6d9-mkcgk 1/1 Running 1 70d
coredns-5644d7b6d9-q4jr9 1/1 Running 1 70d
etcd-minikube 1/1 Running 2 70d
kube-addon-manager-minikube 1/1 Running 1 70d
kube-apiserver-minikube 1/1 Running 2 70d
kube-controller-manager-minikube 1/1 Running 2 26h
kube-proxy-7qp8b 1/1 Running 1 70d
kube-scheduler-minikube 1/1 Running 5 70d
storage-provisioner 1/1 Running 3 70d
Update: after looking at the CoreDNS deployment manifest, following Shahed's answer:
apiVersion: apps/v1
kind: Deployment
metadata:
annotations:
deployment.kubernetes.io/revision: "3"
creationTimestamp: "2019-12-06T19:47:59Z"
generation: 5
labels:
k8s-app: kube-dns
name: coredns
namespace: kube-system
resourceVersion: "10113"
selfLink: /apis/apps/v1/namespaces/kube-system/deployments/coredns
uid: ba1ef689-fe70-4e48-9e6f-5f659226722f
spec:
progressDeadlineSeconds: 600
replicas: 2
revisionHistoryLimit: 10
selector:
matchLabels:
k8s-app: kube-dns
strategy:
rollingUpdate:
maxSurge: 25%
maxUnavailable: 1
type: RollingUpdate
template:
metadata:
creationTimestamp: null
labels:
k8s-app: kube-dns
spec:
containers:
- args:
- -conf
- /etc/coredns/Corefile
image: k8s.gcr.io/coredns:1.6.2
imagePullPolicy: IfNotPresent
livenessProbe:
failureThreshold: 5
httpGet:
path: /health
port: 8080
scheme: HTTP
initialDelaySeconds: 60
periodSeconds: 10
successThreshold: 1
timeoutSeconds: 5
name: coredns
ports:
- containerPort: 53
CoreDNS has been the default DNS server since Kubernetes 1.11; earlier installations used kube-dns.
So, as per your terminal outputs, those are right: your minikube is using CoreDNS pods to serve DNS.
Is it something that I could ignore as it looks plainly naming issue!!
You are absolutely correct; it is just a naming issue.
The Service keeps the name kube-dns for compatibility, so that existing clients can still reach the DNS service under that name after the move from kube-dns to CoreDNS.
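A quick way to confirm this on your own cluster: the Service is named kube-dns, but its selector matches the CoreDNS pods via their k8s-app=kube-dns label (exact output will differ):
kubectl -n kube-system describe svc kube-dns | grep -i selector
# Selector:  k8s-app=kube-dns
kubectl -n kube-system get pods -l k8s-app=kube-dns
# coredns-5644d7b6d9-mkcgk   1/1   Running   ...
# coredns-5644d7b6d9-q4jr9   1/1   Running   ...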
Related
I am trying to troubleshoot a DNS issue in our Kubernetes cluster, v1.19. There are 3 nodes (1 controller, 2 workers), all running vanilla Ubuntu 20.04, using the Calico network plugin with MetalLB for inbound load balancing. This is all hosted on premise and has full access to the internet. There is also a proxy server (Traefik) in front of it that handles SSL for the Kubernetes cluster and other services in the infrastructure.
The issue appeared when I upgraded the Helm chart for the pod that connects to the Redis pod; otherwise everything had been running happily for the past 36 days.
The log of one of the pods shows an error indicating that it cannot resolve the Redis service:
2020-11-09 00:00:00 [1] [verbose]: [Cache] Attempting connection to redis.
2020-11-09 00:00:00 [1] [verbose]: [Cache] Successfully connected to redis.
2020-11-09 00:00:00 [1] [verbose]: [PubSub] Attempting connection to redis.
2020-11-09 00:00:00 [1] [verbose]: [PubSub] Successfully connected to redis.
2020-11-09 00:00:00 [1] [warn]: Secret key is weak. Please consider lengthening it for better security.
2020-11-09 00:00:00 [1] [verbose]: [Database] Connecting to database...
2020-11-09 00:00:00 [1] [info]: [Database] Successfully connected .
2020-11-09 00:00:00 [1] [verbose]: [Database] Ran 0 migration(s).
2020-11-09 00:00:00 [1] [verbose]: Sending request for public key.
Error: getaddrinfo EAI_AGAIN oct-2020-redis-master
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26) {
errno: -3001,
code: 'EAI_AGAIN',
syscall: 'getaddrinfo',
hostname: 'oct-2020-redis-master'
}
[ioredis] Unhandled error event: Error: getaddrinfo EAI_AGAIN oct-2020-redis-master
at GetAddrInfoReqWrap.onlookup [as oncomplete] (dns.js:67:26)
Error: connect ETIMEDOUT
at Socket.<anonymous> (/app/node_modules/ioredis/built/redis/index.js:307:37)
at Object.onceWrapper (events.js:421:28)
at Socket.emit (events.js:315:20)
at Socket.EventEmitter.emit (domain.js:486:12)
at Socket._onTimeout (net.js:483:8)
at listOnTimeout (internal/timers.js:554:17)
at processTimers (internal/timers.js:497:7) {
errorno: 'ETIMEDOUT',
code: 'ETIMEDOUT',
syscall: 'connect'
}
I have gone through the steps outlined in https://kubernetes.io/docs/tasks/administer-cluster/dns-debugging-resolution/
ubuntu@k8-01:~$ kubectl exec -i -t dnsutils -- nslookup kubernetes.default
;; connection timed out; no servers could be reached
command terminated with exit code 1
ubuntu@k8-01:~$ kubectl get pods --namespace=kube-system -l k8s-app=kube-dns
NAME READY STATUS RESTARTS AGE
coredns-f9fd979d6-lfm5t 1/1 Running 17 37d
coredns-f9fd979d6-sw2qp 1/1 Running 18 37d
ubuntu@k8-01:~$ kubectl logs --namespace=kube-system -l k8s-app=kube-dns
CoreDNS-1.7.0
linux/amd64, go1.14.4, f59c03d
[INFO] Reloading
[INFO] plugin/health: Going into lameduck mode for 5s
[INFO] plugin/reload: Running configuration MD5 = 3d3f6363f05ccd60e0f885f0eca6c5ff
[INFO] Reloading complete
[INFO] 10.244.210.238:34288 - 28733 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.001300712s
[INFO] 10.244.210.238:44532 - 12032 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.001279312s
[INFO] 10.244.210.235:44595 - 65094 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.000163001s
[INFO] 10.244.210.235:55945 - 20758 "A IN oct-2020-redis-master.default.svc.cluster.local. udp 75 false 512" NOERROR qr,aa,rd 148 0.000141202s
ubuntu@k8-01:~$ kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default oct-2020-api ClusterIP 10.107.89.213 <none> 80/TCP 37d
default oct-2020-nginx-ingress-controller LoadBalancer 10.110.235.175 192.168.2.150 80:30194/TCP,443:31514/TCP 37d
default oct-2020-nginx-ingress-default-backend ClusterIP 10.98.147.246 <none> 80/TCP 37d
default oct-2020-redis-headless ClusterIP None <none> 6379/TCP 37d
default oct-2020-redis-master ClusterIP 10.109.58.236 <none> 6379/TCP 37d
default oct-2020-webclient ClusterIP 10.111.204.251 <none> 80/TCP 37d
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 37d
kube-system coredns NodePort 10.101.104.114 <none> 53:31245/UDP 15h
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP,9153/TCP 37d
When I enter the pod:
/app # grep "nameserver" /etc/resolv.conf
nameserver 10.96.0.10
/app # nslookup
BusyBox v1.31.1 () multi-call binary.
Usage: nslookup [-type=QUERY_TYPE] [-debug] HOST [DNS_SERVER]
Query DNS about HOST
QUERY_TYPE: soa,ns,a,aaaa,cname,mx,txt,ptr,any
/app # ping 10.96.0.10
PING 10.96.0.10 (10.96.0.10): 56 data bytes
^C
--- 10.96.0.10 ping statistics ---
3 packets transmitted, 0 packets received, 100% packet loss
/app # nslookup oct-20-redis-master
;; connection timed out; no servers could be reached
Any ideas on troubleshooting would be greatly appreciated.
To answer my own question, I deleted the DNS pods and then it worked again. The command was the following:
kubectl delete pod coredns-f9fd979d6-sw2qp --namespace=kube-system
This doesn't get to the underlying problem of why this is happening, or why Kubernetes isn't detecting that something is wrong with those pods and recreating them. I am going to keep digging into this and put some more instrumentation on the DNS pods to see what is actually causing the problem.
If anyone has any ideas on instrumentation to hook up, or on what to look at specifically, that would be appreciated.
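A less surgical alternative I may try next time: restart the whole CoreDNS Deployment instead of deleting one pod by name, and query a CoreDNS pod IP directly to see whether the pods themselves or the service VIP (10.96.0.10) is the broken hop. A rough sketch, where <coredns-pod-ip> is a placeholder taken from the -o wide output:
kubectl -n kube-system rollout restart deployment coredns
kubectl -n kube-system get pods -l k8s-app=kube-dns -o wide
kubectl exec -i -t dnsutils -- nslookup kubernetes.default <coredns-pod-ip>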
This is how we test DNS.
Create the service and StatefulSet below:
apiVersion: v1
kind: Service
metadata:
name: nginx
labels:
app: nginx
spec:
ports:
- port: 80
name: web
clusterIP: None
selector:
app: nginx
---
apiVersion: apps/v1
kind: StatefulSet
metadata:
name: web
labels:
app: nginx
spec:
serviceName: "nginx"
replicas: 2
selector:
matchLabels:
app: nginx
template:
metadata:
labels:
app: nginx
spec:
containers:
- name: nginx
image: k8s.gcr.io/nginx-slim:0.8
ports:
- containerPort: 80
name: web
volumeMounts:
- name: www
mountPath: /usr/share/nginx/html
volumes:
- name: www
emptyDir:
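Assuming the manifest above is saved to a file (the name web.yaml here is just an example), apply it with:
kubectl apply -f web.yaml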
Then run the tests below:
master $ kubectl get po
NAME READY STATUS RESTARTS AGE
web-0 1/1 Running 0 1m
web-1 1/1 Running 0 1m
master $ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 35m
nginx ClusterIP None <none> 80/TCP 2m
master $ kubectl run -i --tty --image busybox:1.28 dns-test --restart=Never --rm
If you don't see a command prompt, try pressing enter.
/ # nslookup nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
Address 2: 10.40.0.2 web-1.nginx.default.svc.cluster.local
/ #
/ # nslookup web-0.nginx
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
/ # nslookup web-0.nginx.default.svc.cluster.local
Server: 10.96.0.10
Address 1: 10.96.0.10 kube-dns.kube-system.svc.cluster.local
Name: web-0.nginx.default.svc.cluster.local
Address 1: 10.40.0.1 web-0.nginx.default.svc.cluster.local
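Once done, the test resources can be cleaned up (the dns-test pod was started with --rm, so it is removed automatically when you exit the shell):
kubectl delete statefulset web
kubectl delete service nginx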
Related: Kubernetes v1.12 dashboard is running but timeout occurred while accessing it via api server proxy (started 2018-12-01)
I have a Windows 10 Home (1803 update) host machine, VirtualBox 5.22, and 2 Ubuntu 18.04.1 guest servers.
Each guest has 2 networks: NAT (host IP 10.0.2.15) and shared host-only with gateway IP 192.168.151.1.
I set IPs:
for k8s master(ubuk8sma) - 192.168.151.21
for worker1 (ubuk8swrk1) - 192.168.151.22
I left Docker as is; the version is 18.09.0.
I installed k8s version stable-1.12 on the master and the worker. For the master, the init is:
K8S_POD_CIDR='10.244.0.0/16'
K8S_IP_ADDR='192.168.151.21'
K8S_VER='stable-1.12' # or latest
sudo kubeadm init --pod-network-cidr=${K8S_POD_CIDR} --apiserver-advertise-address=${K8S_IP_ADDR} --kubernetes-version ${K8S_VER} --ignore-preflight-errors=all
Why I set the "ignore errors" flag:
[ERROR SystemVerification]: unsupported docker version: 18.09.0
I was reluctant to reinstall a Docker version fully compatible with k8s (maybe not a very smart move; I'm usually eager to try the latest stuff).
For CNI I installed flannel network:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
After installing worker1, the node state looks like:
u1@ubuk8sma:~$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuk8sma Ready master 6d v1.12.2
ubuk8swrk1 Ready <none> 4d1h v1.12.2
No big issues showed up. Next, I wanted a visualization of this pretty k8s bundle ecosystem, so I headed towards installing the k8s dashboard.
I followed the "defaults" path, with zero intervention where possible. I used this yaml:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
At a basic level it looks installed, deployed to a worker Pod, and running. From the pod list info:
u1@ubuk8sma:~$ kubectl get all --namespace=kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-4tzm9 1/1 Running 5 6d
pod/coredns-576cbf47c7-tqtpw 1/1 Running 5 6d
pod/etcd-ubuk8sma 1/1 Running 7 6d
pod/kube-apiserver-ubuk8sma 1/1 Running 7 6d
pod/kube-controller-manager-ubuk8sma 1/1 Running 11 6d
pod/kube-flannel-ds-amd64-rt442 1/1 Running 3 4d1h
pod/kube-flannel-ds-amd64-zx78x 1/1 Running 5 6d
pod/kube-proxy-6b6mc 1/1 Running 6 6d
pod/kube-proxy-zcchn 1/1 Running 3 4d1h
pod/kube-scheduler-ubuk8sma 1/1 Running 10 6d
pod/kubernetes-dashboard-77fd78f978-crl7b 1/1 Running 1 2d1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d
service/kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 2d1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds-amd64 2 2 2 2 2 beta.kubernetes.io/arch=amd64 6d
...
daemonset.apps/kube-proxy 2 2 2 2 2 <none> 6d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 6d
deployment.apps/kubernetes-dashboard 1 1 1 1 2d1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-576cbf47c7 2 2 2 6d
replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 2d1h
I started the proxy to validate both the API server and the dashboard service:
kubectl proxy
Version check for API server:
u1@ubuk8sma:~$ curl http://localhost:8001/version
{
"major": "1",
"minor": "12",
"gitVersion": "v1.12.2",
"gitCommit": "17c77c7898218073f14c8d573582e8d2313dc740",
"gitTreeState": "clean",
"buildDate": "2018-10-24T06:43:59Z",
"goVersion": "go1.10.4",
"compiler": "gc",
"platform": "linux/amd64"
}
And here is the problem I'm writing this question about:
u1@ubuk8sma:~$ curl "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
Error: 'dial tcp 10.244.1.8:8443: i/o timeout'
Trying to reach: 'https://10.244.1.8:8443/'
Fragment of Pod info:
status:
conditions:
- lastProbeTime: null
lastTransitionTime: 2018-11-29T07:16:04Z
status: "True"
type: Initialized
- lastProbeTime: null
lastTransitionTime: 2018-11-30T06:53:24Z
status: "True"
type: Ready
- lastProbeTime: null
lastTransitionTime: 2018-11-30T06:53:24Z
status: "True"
type: ContainersReady
- lastProbeTime: null
lastTransitionTime: 2018-11-29T07:16:04Z
status: "True"
type: PodScheduled
containerStatuses:
- containerID: docker://fb0937959c7680046130e670c483877e4c0f1854870cb0b20ed4fe066d72df18
image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
imageID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:1d2e1229a918f4bc38b5a3f9f5f11302b3e71f8397b492afac7f273a0008776a
lastState:
terminated:
containerID: docker://f85e1cc50f59adbd8a13d42694aef7c5e726c07b3d852a26288c4bfc1124c718
exitCode: 2
finishedAt: 2018-11-30T06:53:21Z
reason: Error
startedAt: 2018-11-29T07:16:07Z
name: kubernetes-dashboard
ready: true
restartCount: 1
state:
running:
startedAt: 2018-11-30T06:53:23Z
hostIP: 10.0.2.15
phase: Running
podIP: 10.244.1.8
qosClass: BestEffort
startTime: 2018-11-29T07:16:04Z
Docker check on worker1 node:
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
fb0937959c... sha256:0dab2435c100... "/dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates" 27 hours ago Up 27 hours k8s_kubernetes-dashboard_kube...
Tried to check Pod logs, no luck:
DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
kubectl -n kube-system logs $DASHBOARD_POD_NAME
Error from server (NotFound): the server could not find the requested resource ( pods/log kubernetes-dashboard-77fd78f978-crl7b)
Tried to wget from API server:
API_SRV_POD_NAME='kube-apiserver-ubuk8sma'
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME wget https://10.244.1.8:8443/
No response.
Tried to check dashboard service existence, no luck:
u1@ubuk8sma:~$ kubectl get svc $DASHBOARD_SVC_NAME
Error from server (NotFound): services "kubernetes-dashboard" not found
Checked IP route table on API server:
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME ip route show
default via 10.0.2.2 dev enp0s3 src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 scope link src 10.0.2.15
10.0.2.2 dev enp0s3 scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.151.0/24 dev enp0s8 scope link src 192.168.151.21
For reference, enp0s3 is the NAT NIC adapter and enp0s8 the host-only one.
I see the flannel route for 10.244.1.x, so the issue hardly seems to be a network misconfiguration (but I could be wrong).
So, the dashboard Pod looks like it is running, but it has some errors that I cannot diagnose. Could you help to find the root cause and, ideally, make the dashboard service run without errors?
Thanks in advance, folks!
Update1:
I see events on master:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 11h kubelet, ubuk8swrk1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43191144d447d0e9da52c8b6600bd96a23fab1e96c79af8c8fedc4e4e50882c7" network for pod "kubernetes-dashboard-77fd78f978-crl7b": NetworkPlugin cni failed to set up pod "kubernetes-dashboard-77fd78f978-crl7b_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 11h (x4 over 11h) kubelet, ubuk8swrk1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 11h kubelet, ubuk8swrk1 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0" already present on machine
Normal Created 11h kubelet, ubuk8swrk1 Created container
Normal Started 11h kubelet, ubuk8swrk1 Started container
The error about the absence of subnet.env is a bit strange, as both the master and the minion have it (well, maybe it is created on the fly):
u1@ubuk8swrk1:~$ ls -la /run/flannel/subnet.env
-rw-r--r-- 1 root root 96 Dec 3 08:15 /run/flannel/subnet.env
This is the dashboard deployment descriptor:
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=kubernetes-dashboard
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=kubernetes-dashboard
Service Account: kubernetes-dashboard
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubernetes-dashboard-77fd78f978 (1/1 replicas created)
Events: <none>
This is a reduced description of the pods (the original yaml is 35K, too much to share):
Name: coredns-576cbf47c7-4tzm9
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Status: Running
IP: 10.244.0.14
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0efcd043407d93fb9d052045828489f6b99bb59b4f0882ec89e1897071609b77
Image: k8s.gcr.io/coredns:1.2.2
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 6
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: etcd-ubuk8sma
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: ubuk8sma/10.0.2.15
Labels: component=etcd
tier=control-plane
Status: Running
IP: 10.0.2.15
Containers:
etcd:
Container ID: docker://ba2bdcf5fa558beabdd8578628d71480d595d5ee3bb5c4edf42407419010144b
Image: k8s.gcr.io/etcd:3.2.24
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:905d7ca17fd02bc24c0eba9a062753aba15db3e31422390bc3238eb762339b20
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://127.0.0.1:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://127.0.0.1:2380
--initial-cluster=ubuk8sma=https://127.0.0.1:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
--name=ubuk8sma
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 03 Dec 2018 08:12:56 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 28 Nov 2018 09:31:46 +0000
Finished: Mon, 03 Dec 2018 08:12:35 +0000
Ready: True
Restart Count: 8
Liveness: exec [/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo] delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: kube-apiserver-ubuk8sma
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
Containers:
kube-apiserver:
Container ID: docker://099b2a30772b969c3919b57fd377980673f03a820afba6034daa70f011271a52
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.151.21
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Mon, 03 Dec 2018 08:13:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 8
Liveness: http-get https://192.168.151.21:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Events: <none>
Name: kube-flannel-ds-amd64-rt442
Namespace: kube-system
Node: ubuk8swrk1/10.0.2.15
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://a6377b0fe1b040235c24e9ca19455c56e77daecf688b212cfea5553b6e59ff68
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Ready: True
Restart Count: 4
Containers:
kube-flannel:
Container ID: docker://f7029bc2144c1ab8654407d742c1079df0059d418b7ba86b886091b5ad8c34a3
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 4
Events: <none>
Name: kube-proxy-6b6mc
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
The biggest suspect is the node IP: I see 10.0.2.15 (the NAT IP) everywhere, but the host-only NIC should be used. Setting up networking properly for my Ubuntu VMs was a long story.
I edited /etc/netplan/01-netcfg.yaml before the k8s setup (thanks to https://askubuntu.com/questions/984445/netplan-configuration-on-ubuntu-17-04-virtual-machine?rq=1 for the help). Example master config:
network:
version: 2
renderer: networkd
ethernets:
enp0s3:
dhcp4: yes
dhcp6: yes
routes:
- to: 0.0.0.0/0
via: 10.0.2.2
metric: 0
enp0s8:
dhcp4: no
dhcp6: no
addresses: [192.168.151.21/24]
routes:
- to: 192.168.151.1/24
via: 192.168.151.1
metric: 100
Only after this and a few more changes did the NAT and host-only networks start working together. NAT remains the default net adapter, which is likely why its IP shows up everywhere. For the API server I set --advertise-address=192.168.151.21 explicitly; that reduced the use of the NAT IP at least for it.
So, maybe the root cause is different, but the current question is how to reconfigure the networking to replace the NAT IP with the host-only one. I already tried this in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.151.21"
Restarted kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
It didn't help. I restarted the VMs; that didn't help either (I only expected kubelet-related changes, but nothing changed). I explored a few configs (5+) for potential changes, with no luck.
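For reference, the address the kubelet actually registered for each node can be seen in the INTERNAL-IP column of:
kubectl get nodes -o wide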
Update2:
I mentioned the NAT address config issue above. I resolved it by editing the /etc/default/kubelet config. I found that idea in the comments on this article:
https://medium.com/@joatmon08/playing-with-kubeadm-in-vagrant-machines-part-2-bac431095706
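For completeness, the change boils down to passing the host-only address as the kubelet's node IP; on a kubeadm install, /etc/default/kubelet is sourced by the kubelet systemd unit, so the file ends up containing something like this (the IP here is the worker's host-only address from above):
KUBELET_EXTRA_ARGS=--node-ip=192.168.151.22
followed by sudo systemctl daemon-reload and sudo systemctl restart kubelet.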
The dashboard pod status now shows the proper host IP:
hostIP: 192.168.151.22
phase: Running
podIP: 10.244.1.13
Then I went into the API server's docker container and tried to reach the podIP via wget, ping, and traceroute. Timeouts everywhere. Routes:
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
10.0.2.2 0.0.0.0 255.255.255.255 UH 100 0 0 enp0s3
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 10.244.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.151.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s8
Attempt to perform curl call from master VM:
u1@ubuk8sma:~$ curl -v -i -kSs "https://192.168.151.21:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/" -H "$K8S_AUTH_HEADER"
...
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x565072b5a750)
> GET /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ HTTP/2
> Host: 192.168.151.21:6443
> User-Agent: curl/7.58.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1.....
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
HTTP/2 503
< content-type: text/plain; charset=utf-8
content-type: text/plain; charset=utf-8
< content-length: 92
content-length: 92
< date: Tue, 04 Dec 2018 08:44:25 GMT
date: Tue, 04 Dec 2018 08:44:25 GMT
<
Error: 'dial tcp 10.244.1.13:8443: i/o timeout'
* Connection #0 to host 192.168.151.21 left intact
Trying to reach: 'https://10.244.1.13:8443/'
Service info for dashboard:
u1@ubuk8sma:~$ kubectl -n kube-system get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 5d
A bit more detail:
u1@ubuk8sma:~$ kubectl -n kube-system describe services kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.103.36.134
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.1.13:8443
Session Affinity: None
Events: <none>
Also, I tried to get a shell, both via kubectl and docker. For any usual Linux command I see this 'OCI runtime exec failed' issue:
u1@ubuk8sma:~$ DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
u1@ubuk8sma:~$ kubectl -v=9 -n kube-system exec "$DASHBOARD_POD_NAME" -- env
I1204 09:57:17.673345 23517 loader.go:359] Config loaded from file /home/u1/.kube/config
I1204 09:57:17.679526 23517 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b'
I1204 09:57:17.703924 23517 round_trippers.go:405] GET https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b 200 OK in 23 milliseconds
I1204 09:57:17.703957 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.703971 23517 round_trippers.go:414] Content-Length: 3435
I1204 09:57:17.703984 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
I1204 09:57:17.703997 23517 round_trippers.go:414] Content-Type: application/json
I1204 09:57:17.704866 23517 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kubernetes-dashboard-77fd78f978-crl7b","generateName":"kubernetes-dashboard-77fd78f978-","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b","uid":"a1d005b8-f3a6-11e8-a2d0-08002783a80f"...
I1204 09:57:17.740811 23517 round_trippers.go:386] curl -k -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true'
I1204 09:57:17.805528 23517 round_trippers.go:405] POST https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true 101 Switching Protocols in 64 milliseconds
I1204 09:57:17.805565 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.805581 23517 round_trippers.go:414] Connection: Upgrade
I1204 09:57:17.805594 23517 round_trippers.go:414] Upgrade: SPDY/3.1
I1204 09:57:17.805607 23517 round_trippers.go:414] X-Stream-Protocol-Version: v4.channel.k8s.io
I1204 09:57:17.805620 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"env\": executable file not found in $PATH": unknown
F1204 09:57:18.088488 23517 helpers.go:119] command terminated with exit code 126
So, I cannot reach the pod and cannot get a shell there. But at least I see some logs:
u1@ubuk8sma:~$ kubectl -n kube-system logs -p $DASHBOARD_POD_NAME
2018/12/03 08:15:16 Starting overwatch
2018/12/03 08:15:16 Using in-cluster config to connect to apiserver
2018/12/03 08:15:16 Using service account token for csrf signing
2018/12/03 08:15:16 No request provided. Skipping authorization
2018/12/03 08:15:16 Successful initial request to the apiserver, version: v1.12.2
2018/12/03 08:15:16 Generating JWE encryption key
2018/12/03 08:15:16 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/12/03 08:15:16 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/12/03 08:15:18 Initializing JWE encryption key from synchronized object
2018/12/03 08:15:18 Creating in-cluster Heapster client
2018/12/03 08:15:19 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/03 08:15:19 Auto-generating certificates
2018/12/03 08:15:19 Successfully created certificates
2018/12/03 08:15:19 Serving securely on HTTPS port: 8443
2018/12/03 08:15:49 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
No ideas where to go further for now to fix this timeout.
I am trying to build a 3 master, 3 worker Kubernetes Cluster, with 3 separate etcd servers.
[root@K8sMaster01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster01 Ready master 5h v1.11.1
k8smaster02 Ready master 4h v1.11.1
k8smaster03 Ready master 4h v1.11.1
k8snode01 Ready <none> 4h v1.11.1
k8snode02 Ready <none> 4h v1.11.1
k8snode03 Ready <none> 4h v1.11.1
I have spent weeks trying to get this to work, but cannot get beyond one problem:
the containers/pods cannot access the API server.
[root@K8sMaster01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
[root@K8sMaster01 ~]# cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[root@K8sMaster01 ~]# uname -a
Linux K8sMaster01 4.16.3-301.fc28.x86_64 #1 SMP Mon Apr 23 21:59:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-c2wbh 1/1 Running 1 4h
coredns-78fcdf6894-psbtq 1/1 Running 1 4h
heapster-77f99d6b7c-5pxj6 1/1 Running 0 4h
kube-apiserver-k8smaster01 1/1 Running 1 4h
kube-apiserver-k8smaster02 1/1 Running 1 4h
kube-apiserver-k8smaster03 1/1 Running 1 4h
kube-controller-manager-k8smaster01 1/1 Running 1 4h
kube-controller-manager-k8smaster02 1/1 Running 1 4h
kube-controller-manager-k8smaster03 1/1 Running 1 4h
kube-flannel-ds-amd64-542x6 1/1 Running 0 4h
kube-flannel-ds-amd64-6dw2g 1/1 Running 4 4h
kube-flannel-ds-amd64-h6j9b 1/1 Running 1 4h
kube-flannel-ds-amd64-mgggx 1/1 Running 0 3h
kube-flannel-ds-amd64-p8xfk 1/1 Running 0 4h
kube-flannel-ds-amd64-qp86h 1/1 Running 4 4h
kube-proxy-4bqxh 1/1 Running 0 3h
kube-proxy-56p4n 1/1 Running 0 3h
kube-proxy-7z8p7 1/1 Running 0 3h
kube-proxy-b59ns 1/1 Running 0 3h
kube-proxy-fc6zg 1/1 Running 0 3h
kube-proxy-wrxg7 1/1 Running 0 3h
kube-scheduler-k8smaster01 1/1 Running 1 4h
kube-scheduler-k8smaster02 1/1 Running 1 4h
kube-scheduler-k8smaster03 1/1 Running 1 4h
**kubernetes-dashboard-6948bdb78-4f7qj 1/1 Running 19 1h**
node-problem-detector-v0.1-77fdw 1/1 Running 0 4h
node-problem-detector-v0.1-96pld 1/1 Running 1 4h
node-problem-detector-v0.1-ctnfn 1/1 Running 0 3h
node-problem-detector-v0.1-q2xvw 1/1 Running 0 4h
node-problem-detector-v0.1-vvf4j 1/1 Running 1 4h
traefik-ingress-controller-7w44f 1/1 Running 0 4h
traefik-ingress-controller-8cprj 1/1 Running 1 4h
traefik-ingress-controller-f6c7q 1/1 Running 0 3h
traefik-ingress-controller-tf8zw 1/1 Running 0 4h
kube-ops-view-6744bdc77d-2x5w8 1/1 Running 0 2h
kube-ops-view-redis-74578dcc5d-5fnvf 1/1 Running 0 2h
The kubernetes-dashboard will not start, and the same goes for kube-ops-view. CoreDNS also has errors. To me, all of this points at something network-related. I have tried:
sudo iptables -P FORWARD ACCEPT
sudo iptables --policy FORWARD ACCEPT
sudo iptables -A FORWARD -o flannel.1 -j ACCEPT
CoreDNS gives these errors in the logs:
[root@K8sMaster01 ~]# kubectl logs coredns-78fcdf6894-c2wbh -n kube-system
.:53
2018/08/26 15:15:28 [INFO] CoreDNS-1.1.3
2018/08/26 15:15:28 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/26 15:15:28 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
E0826 17:12:19.624560 1 reflector.go:322] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to watch *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=556&timeoutSeconds=389&watch=true: dial tcp 10.96.0.1:443: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:46862->10.4.4.28:53: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:46690->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:60267->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:41482->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:58042->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:53149->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:36861->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:43235->10.4.4.28:53: i/o timeout
The dashboard:
[root@K8sMaster01 ~]# kubectl logs kubernetes-dashboard-6948bdb78-4f7qj -n kube-system
2018/08/26 20:10:31 Starting overwatch
2018/08/26 20:10:31 Using in-cluster config to connect to apiserver
2018/08/26 20:10:31 Using service account token for csrf signing
2018/08/26 20:10:31 No request provided. Skipping authorization
2018/08/26 20:11:01 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
kube-ops-view:
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 141, wait 63 seconds)
10.96.3.1 - - [2018-08-26 20:12:34] "GET /health HTTP/1.1" 200 117 0.001002
10.96.3.1 - - [2018-08-26 20:12:44] "GET /health HTTP/1.1" 200 117 0.000921
10.96.3.1 - - [2018-08-26 20:12:54] "GET /health HTTP/1.1" 200 117 0.000926
10.96.3.1 - - [2018-08-26 20:13:04] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:14] "GET /health HTTP/1.1" 200 117 0.000942
10.96.3.1 - - [2018-08-26 20:13:24] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:34] "GET /health HTTP/1.1" 200 117 0.000939
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 142, wait 61 seconds)
Flannel has created the networks:
[root@K8sMaster01 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
    link/ether 00:50:56:9a:80:f7 brd ff:ff:ff:ff:ff:ff
    inet 10.34.88.182/24 brd 10.34.88.255 scope global dynamic ens192
       valid_lft 7071sec preferred_lft 7071sec
    inet 10.10.40.90/24 brd 10.10.40.255 scope global ens192:1
       valid_lft forever preferred_lft forever
    inet6 fe80::250:56ff:fe9a:80f7/64 scope link
       valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
    link/ether 02:42:cf:ec:b3:ee brd ff:ff:ff:ff:ff:ff
    inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
       valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
    link/ether 06:df:1e:87:b8:ee brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.0/32 scope global flannel.1
       valid_lft forever preferred_lft forever
    inet6 fe80::4df:1eff:fe87:b8ee/64 scope link
       valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
    link/ether 0a:58:0a:60:00:01 brd ff:ff:ff:ff:ff:ff
    inet 10.96.0.1/24 scope global cni0
       valid_lft forever preferred_lft forever
    inet6 fe80::8c77:39ff:fe6e:8710/64 scope link
       valid_lft forever preferred_lft forever
7: veth9527916b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 46:62:b6:b8:b9:ac brd ff:ff:ff:ff:ff:ff link-netnsid 1
    inet6 fe80::4462:b6ff:feb8:b9ac/64 scope link
       valid_lft forever preferred_lft forever
8: veth6e6f08f5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
    link/ether 3e:a5:4b:8d:11:ce brd ff:ff:ff:ff:ff:ff link-netnsid 2
    inet6 fe80::3ca5:4bff:fe8d:11ce/64 scope link
       valid_lft forever preferred_lft forever
I can ping the IP:
[root@K8sMaster01 ~]# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
64 bytes from 10.96.0.1: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 10.96.0.1: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 10.96.0.1: icmp_seq=3 ttl=64 time=0.042 ms
and telnet to the port:
[root@K8sMaster01 ~]# telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
Someone PLEASE save my bank holiday weekend and tell me what is going wrong!
As requested here is my get services:
[root@K8sMaster01 ~]# kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default blackbox-database ClusterIP 10.110.56.121 <none> 3306/TCP 5h
default kube-ops-view ClusterIP 10.105.35.23 <none> 82/TCP 1d
default kube-ops-view-redis ClusterIP 10.107.254.193 <none> 6379/TCP 1d
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d
kube-system heapster ClusterIP 10.103.5.79 <none> 80/TCP 1d
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 1d
kube-system kubernetes-dashboard ClusterIP 10.96.220.152 <none> 443/TCP 1d
kube-system traefik-ingress-service ClusterIP 10.102.84.167 <none> 80/TCP,8080/TCP 1d
liab-live-bb blackbox-application ClusterIP 10.98.40.25 <none> 8080/TCP 5h
liab-live-bb blackbox-database ClusterIP 10.108.43.196 <none> 3306/TCP 5h
Telnet to port 46690:
[root@K8sMaster01 ~]# telnet 10.96.0.7 46690
Trying 10.96.0.7...
(no response)
Today I tried deploying two of my applications to the cluster, as can be seen in the get services output. The "app" is unable to connect to the "db": it cannot resolve the DB service name. I believe I have an issue with the networking, but I'm not sure whether it is at the host level or within the Kubernetes layer. I did notice my resolv.conf files were not pointing to localhost, and found some changes to make to the coredns config. When I looked at its configuration it was trying to point to an IPv6 address, so I changed it to this:
apiVersion: v1
data:
Corefile: |
.:53 {
errors
health
kubernetes cluster.local 10.96.0.0/12 {
pods insecure
}
prometheus :9153
proxy 10.4.4.28
cache 30
reload
}
kind: ConfigMap
metadata:
creationTimestamp: 2018-08-27T12:28:57Z
name: coredns
namespace: kube-system
resourceVersion: "174571"
selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
uid: c5016361-a9f4-11e8-b0b4-0050569afad9
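Since the Corefile above enables the reload plugin, CoreDNS should pick up ConfigMap edits on its own after a short delay. If it doesn't, restarting the DNS pods forces the new config to load (a sketch, using the standard k8s-app=kube-dns label; the Deployment recreates the pods):
kubectl -n kube-system delete pod -l k8s-app=kube-dns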
Currently I am setting up Kubernetes in a 1 master, 2 node environment.
I successfully initialized the master and added the nodes to the cluster:
kubectl get nodes
When I joined the nodes to the cluster, the kube-proxy pod started successfully, but the kube-flannel pod gets an error and runs into a CrashLoopBackOff.
flannel-pod.log:
I0613 09:03:36.820387 1 main.go:475] Determining IP address of default interface,
I0613 09:03:36.821180 1 main.go:488] Using interface with name ens160 and address 172.17.11.2,
I0613 09:03:36.821233 1 main.go:505] Defaulting external address to interface address (172.17.11.2),
I0613 09:03:37.015163 1 kube.go:131] Waiting 10m0s for node controller to sync,
I0613 09:03:37.015436 1 kube.go:294] Starting kube subnet manager,
I0613 09:03:38.015675 1 kube.go:138] Node controller sync successful,
I0613 09:03:38.015767 1 main.go:235] Created subnet manager: Kubernetes Subnet Manager - caasfaasslave1.XXXXXX.local,
I0613 09:03:38.015828 1 main.go:238] Installing signal handlers,
I0613 09:03:38.016109 1 main.go:353] Found network config - Backend type: vxlan,
I0613 09:03:38.016281 1 vxlan.go:120] VXLAN config: VNI=1 Port=0 GBP=false DirectRouting=false,
E0613 09:03:38.016872 1 main.go:280] Error registering network: failed to acquire lease: node "caasfaasslave1.XXXXXX.local" pod cidr not assigned,
I0613 09:03:38.016966 1 main.go:333] Stopping shutdownHandler...,
On the node, I can verify that the podCIDR is available:
kubectl get nodes -o jsonpath='{.items[*].spec.podCIDR}'
172.17.12.0/24
On the master's kube-controller-manager, the pod CIDR is also there:
[root@caasfaasmaster manifests]# cat kube-controller-manager.yaml
apiVersion: v1
kind: Pod
metadata:
annotations:
scheduler.alpha.kubernetes.io/critical-pod: ""
creationTimestamp: null
labels:
component: kube-controller-manager
tier: control-plane
name: kube-controller-manager
namespace: kube-system
spec:
containers:
- command:
- kube-controller-manager
- --leader-elect=true
- --controllers=*,bootstrapsigner,tokencleaner
- --cluster-signing-key-file=/etc/kubernetes/pki/ca.key
- --address=127.0.0.1
- --use-service-account-credentials=true
- --kubeconfig=/etc/kubernetes/controller-manager.conf
- --root-ca-file=/etc/kubernetes/pki/ca.crt
- --service-account-private-key-file=/etc/kubernetes/pki/sa.key
- --cluster-signing-cert-file=/etc/kubernetes/pki/ca.crt
- --allocate-node-cidrs=true
- --cluster-cidr=172.17.12.0/24
- --node-cidr-mask-size=24
env:
- name: http_proxy
value: http://ntlmproxy.XXXXXX.local:3154
- name: https_proxy
value: http://ntlmproxy.XXXXXX.local:3154
- name: no_proxy
value: .XXXXX.local,172.17.11.0/24,172.17.12.0/24
image: k8s.gcr.io/kube-controller-manager-amd64:v1.10.4
livenessProbe:
failureThreshold: 8
httpGet:
host: 127.0.0.1
path: /healthz
port: 10252
scheme: HTTP
initialDelaySeconds: 15
timeoutSeconds: 15
name: kube-controller-manager
resources:
requests:
cpu: 200m
volumeMounts:
- mountPath: /etc/kubernetes/pki
name: k8s-certs
readOnly: true
- mountPath: /etc/ssl/certs
name: ca-certs
readOnly: true
- mountPath: /etc/kubernetes/controller-manager.conf
name: kubeconfig
readOnly: true
- mountPath: /etc/pki
name: ca-certs-etc-pki
readOnly: true
hostNetwork: true
volumes:
- hostPath:
path: /etc/pki
type: DirectoryOrCreate
name: ca-certs-etc-pki
- hostPath:
path: /etc/kubernetes/pki
type: DirectoryOrCreate
name: k8s-certs
- hostPath:
path: /etc/ssl/certs
type: DirectoryOrCreate
name: ca-certs
- hostPath:
path: /etc/kubernetes/controller-manager.conf
type: FileOrCreate
name: kubeconfig
status: {}
XXXXX for anonymization
I initialized the master with the following kubeadm command (which also went through without any errors):
kubeadm init --pod-network-cidr=172.17.12.0/24 --service-cidr=172.17.11.129/25 --service-dns-domain=dcs.XXXXX.local
Does anyone know what could cause my issues and how to fix them?
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE
kube-system etcd-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-apiserver-caasfaasmaster.XXXXXX.local 1/1 Running 1 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-controller-manager-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-dns-75c5968bf9-qfh96 3/3 Running 0 16h 172.17.12.2 caasfaasmaster.XXXXXX.local
kube-system kube-flannel-ds-4b6kf 0/1 CrashLoopBackOff 205 16h 172.17.11.2 caasfaasslave1.XXXXXX.local
kube-system kube-flannel-ds-j2fz6 0/1 CrashLoopBackOff 191 16h 172.17.11.3 caasfassslave2.XXXXXX.local
kube-system kube-flannel-ds-qjd89 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-proxy-h4z54 1/1 Running 0 16h 172.17.11.3 caasfassslave2.XXXXXX.local
kube-system kube-proxy-sjwl2 1/1 Running 0 16h 172.17.11.2 caasfaasslave1.XXXXXX.local
kube-system kube-proxy-zc5xh 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
kube-system kube-scheduler-caasfaasmaster.XXXXXX.local 1/1 Running 0 16h 172.17.11.1 caasfaasmaster.XXXXXX.local
"Failed to acquire lease" simply means the pod didn't get a podCIDR. It happened to me as well: although the manifest on the master node said the podCIDR settings were there, it still wasn't working and flannel kept going into CrashLoopBackOff.
This is what I did to fix it.
From the master node, first find out the cluster CIDR flannel is using:
sudo cat /etc/kubernetes/manifests/kube-controller-manager.yaml | grep -i cluster-cidr
Output:
- --cluster-cidr=172.168.10.0/24
Then run the following from the master node:
kubectl patch node slave-node-1 -p '{"spec":{"podCIDR":"172.168.10.0/24"}}'
where,
slave-node-1 is the node where acquiring the lease is failing,
podCIDR is the CIDR you found with the previous command.
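To confirm the patch took effect, the node's podCIDR can be read back (using the same node name and CIDR as in the example above):
kubectl get node slave-node-1 -o jsonpath='{.spec.podCIDR}'
# 172.168.10.0/24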
Hope this helps.
According to Flannel documentation:
At the bare minimum, you must tell flannel an IP range (subnet) that
it should use for the overlay. Here is an example of the minimum
flannel configuration:
{ "Network": "10.1.0.0/16" }
Therefore, you need to specify a network for pods with a minimum size of /16, and it should not be a part of your existing network because Flannel uses encapsulation to connect pods on different nodes to one overlay network.
Here is the part of documentation which describes it:
With Docker, each container is assigned an IP address that can be used
to communicate with other containers on the same host. For
communicating over a network, containers are tied to the IP addresses
of the host machines and must rely on port-mapping to reach the
desired container. This makes it difficult for applications running
inside containers to advertise their external IP and port as that
information is not available to them.
flannel solves the problem by giving each container an IP that can be
used for container-to-container communication. It uses packet
encapsulation to create a virtual overlay network that spans the whole
cluster. More specifically, flannel gives each host an IP subnet
(/24 by default) from which the Docker daemon is able to allocate
IPs to the individual containers.
In other words, you should recreate your cluster with settings like these:
kubeadm init --pod-network-cidr=10.17.0.0/16 --service-cidr=10.18.0.0/24 --service-dns-domain=dcs.XXXXX.local
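After recreating the cluster with a /16 pod network and reapplying flannel, each node should receive its own /24 slice of that range; one way to check the per-node allocations (a sketch):
kubectl get nodes -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.spec.podCIDR}{"\n"}{end}'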
Running the Guestbook Kubernetes app with the Kube-Sky add-on. On the guestbook app page I get this error in the JavaScript console:
Fatal error: Uncaught exception 'Predis\Connection\ConnectionException' with message 'php_network_getaddresses: getaddrinfo failed: Name or service not known [tcp://redis-slave:6379]' in /vendor/predis/predis/lib/Predis/Connection/AbstractConnection.php:141
The following are all the changes (vs. the provided templates) relating to DNS that I can find in my setup:
diff -r sample-configs/unmodified/cloud-init/node.yaml sample-configs/defaults/cloud-init/node.yaml
88a91,92
> --cluster_dns=10.100.88.88 \
> --cluster_domain=cluster.local \
diff -r sample-configs/unmodified/skydns-controller.yaml sample-configs/defaults/skydns-controller.yaml
11c11
< replicas: {{ pillar['dns_replicas'] }}
---
> replicas: 3
50c50,51
< - -domain={{ pillar['dns_domain'] }}
---
> - -domain=cluster.local
> - -kube_master_url=http://$(KUBERNETES_MASTER_IPV4):8080
62c63
< - -domain={{ pillar['dns_domain'] }}.
---
> - -domain=cluster.local.
91c92
< - -cmd=nslookup kubernetes.default.svc.{{ pillar['dns_domain'] }} 127.0.0.1 >/dev/null
---
> - -cmd=nslookup kubernetes.default.svc.cluster.local 127.0.0.1 >/dev/null
diff -r sample-configs/unmodified/skydns-service.yaml sample-configs/defaults/skydns-service.yaml
13c13,14
< clusterIP: {{ pillar['dns_server'] }}
---
> clusterIP: 10.100.88.88
> type: NodePort
17a19
> nodePort:
20a23
> nodePort:
\ No newline at end of file
./sample-configs/unmodified/cloud-init/master.yaml: --service-cluster-ip-range=10.100.0.0/16 \
NAMESPACE NAME READY STATUS RESTARTS AGE NODE
default frontend-jw0ud 1/1 Running 0 48m $publicip.23
default frontend-mwu18 1/1 Running 0 48m $publicip.23
default frontend-o33ei 1/1 Running 0 48m $publicip.26
default redis-master-ubpga 1/1 Running 0 46m $publicip.23
default redis-slave-7aqp9 1/1 Running 0 46m $publicip.97
default redis-slave-w6rn3 1/1 Running 0 46m $publicip.26
default redis-slave-wny9v 1/1 Running 0 46m $publicip.26
kube-system kube-dns-v9-jek26 4/4 Running 0 50m $publicip.23
kube-system kube-dns-v9-ua150 4/4 Running 0 50m $publicip.26
kube-system kube-dns-v9-ycloq 4/4 Running 0 50m $publicip.97
NAMESPACE NAME CLUSTER_IP EXTERNAL_IP PORT(S) SELECTOR AGE
default frontend 10.100.221.197 nodes 80/TCP name=frontend 46m
default kubernetes 10.100.0.1 <none> 443/TCP <none> 1h
default redis-master 10.100.151.114 <none> 6379/TCP name=redis-master 46m
default redis-slave 10.100.223.227 nodes 6379/TCP name=redis-slave 46m
kube-system kube-dns 10.100.88.88 nodes 53/UDP,53/TCP k8s-app=kube-dns 46m
NAMESPACE CONTROLLER CONTAINER(S) IMAGE(S) SELECTOR REPLICAS AGE
default frontend php-redis kubernetes/example-guestbook-php-redis:v2 name=frontend 3 48m
default redis-master master redis name=redis-master 1 46m
default redis-slave worker kubernetes/redis-slave:v2 name=redis-slave 3 46m
kube-system kube-dns-v9 etcd gcr.io/google_containers/etcd:2.0.9 k8s-app=kube-dns 3 50m
kube2sky gcr.io/google_containers/kube2sky:1.11
skydns gcr.io/google_containers/skydns:2015-10-13-8c72f8c
healthz gcr.io/google_containers/exechealthz:1.0
$ rkubectl exec busybox -- nslookup kubernetes
Server: 10.100.88.88
Address 1: 10.100.88.88
nslookup: can't resolve 'kubernetes'
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
$ rkubectl exec busybox -- ping -w 1 10.100.88.88
PING 10.100.88.88 (10.100.88.88): 56 data bytes
--- 10.100.88.88 ping statistics ---
1 packets transmitted, 0 packets received, 100% packet loss
error: error executing remote command: Error executing command in container: Error executing in Docker Container: 1
$ rkubectl exec busybox -- route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 172.17.42.1 0.0.0.0 UG 0 0 0 eth0
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 eth0
$ rkubectl exec busybox -- ifconfig
eth0 Link encap:Ethernet HWaddr 02:42:AC:11:00:1A
inet addr:172.17.0.26 Bcast:0.0.0.0 Mask:255.255.0.0
$ rkubectl exec busybox -- cat /etc/resolv.conf
nameserver 10.100.88.88
nameserver 8.8.8.8
nameserver 8.8.4.4
search default.svc.cluster.local svc.cluster.local cluster.local
options ndots:5