Kubernetes API timeout when deploying ingress-controller - kubernetes

I am using a self-managed bare-metal Kubernetes cluster with the following three nodes:
NAME    STATUS   ROLES    AGE   VERSION
ruby0   Ready    master   17h   v1.18.5
ruby1   Ready    <none>   17h   v1.18.5
ruby2   Ready    <none>   17h   v1.18.5
They are hosted on three identical Debian systems with the following kernel (uname -a):
Linux ruby0 4.19.0-9-amd64 #1 SMP Debian 4.19.118-2+deb10u1 (2020-06-07) x86_64 GNU/Linux
To set up the nodes, I followed the basic installation guide of Imixs-Cloud. As mentioned in the tutorial, I deployed the flannel pod network. I finished the installation by deploying a traefik ingress controller, which is explained here. I left out 011-persistencevolume.yaml for simplicity.
After that, I got the following traefik service and pod (as well as a deployment and a replicaset):
NAMESPACE     NAME              TYPE           CLUSTER-IP      EXTERNAL-IP      PORT(S)                                     AGE
kube-system   service/traefik   LoadBalancer   10.110.149.28   192.168.10.127   80:31298/TCP,443:32383/TCP,8100:31372/TCP   68m
NAMESPACE     NAME                          READY   STATUS    RESTARTS   AGE
kube-system   pod/traefik-b9b444b89-w9b6h   1/1     Running   0          68m
When I try to access the traefik service on one of the given ports, either with a browser or with curl, I get a timeout with no response. The traefik pod logs the following errors:
Trace[1402685037]: [30.000537725s] [30.000537725s] END
E0707 08:08:49.147428 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Service: Get "https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.148535 1 trace.go:116] Trace[84373527]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.148006738 +0000 UTC m=+4340.222316899) (total time: 30.000486829s):
Trace[84373527]: [30.000486829s] [30.000486829s] END
E0707 08:08:49.148560 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Secret: Get "https://10.96.0.1:443/api/v1/secrets?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.149667 1 trace.go:116] Trace[1117097236]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.149071171 +0000 UTC m=+4340.223381353) (total time: 30.000553113s):
Trace[1117097236]: [30.000553113s] [30.000553113s] END
E0707 08:08:49.149693 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get "https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
I0707 08:08:49.150743 1 trace.go:116] Trace[1553411795]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105 (started: 2020-07-07 08:08:19.150285239 +0000 UTC m=+4340.224595418) (total time: 30.000416845s):
Trace[1553411795]: [30.000416845s] [30.000416845s] END
E0707 08:08:49.150770 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: Failed to list *v1alpha1.TraefikService: Get "https://10.96.0.1:443/apis/traefik.containo.us/v1alpha1/traefikservices?limit=500&resourceVersion=0": dial tcp 10.96.0.1:443: i/o timeout
10.244.1.1 - - [07/Jul/2020:08:08:53 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 873 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:08:56 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 874 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:09:03 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 875 "ping#internal" "-" 0ms
10.244.1.1 - - [07/Jul/2020:08:09:06 +0000] "GET /ping HTTP/1.1" 200 2 "-" "-" 876 "ping#internal" "-" 0ms
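The reflector errors in this log all have the same shape, so a few lines of Python can confirm that every failed list call targets the same address. This is only a minimal sketch, assuming the usual klog header layout (severity letter plus month/day, time, thread id, source file). The `parse_reflector_error` helper is a made-up name for illustration, not part of any Kubernetes tooling.

```python
import re

# klog header: severity + MMDD, time, thread id, "file:line]", then the message.
KLOG_RE = re.compile(
    r"^(?P<severity>[IWEF])(?P<date>\d{4}) (?P<time>[\d:.]+)\s+(?P<tid>\d+) "
    r"(?P<source>[^\]]+)\] (?P<message>.*)$"
)
# Reflector failure: "Failed to list <kind>: ... dial tcp <host>:<port>: <reason>"
DIAL_RE = re.compile(
    r"Failed to list (?P<kind>\S+): .*dial tcp (?P<target>[\d.]+:\d+): (?P<reason>.+)$"
)


def parse_reflector_error(line):
    """Return (severity, kind, target, reason), or None if the line does not match."""
    header = KLOG_RE.match(line)
    if header is None:
        return None
    dial = DIAL_RE.search(header.group("message"))
    if dial is None:
        return None
    return (header.group("severity"), dial.group("kind"),
            dial.group("target"), dial.group("reason"))


# Sample line in the same format as the traefik log.
sample = (
    'E0707 08:08:49.147428 1 reflector.go:153] '
    'pkg/mod/k8s.io/client-go@v0.17.3/tools/cache/reflector.go:105: '
    'Failed to list *v1.Service: Get '
    '"https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0": '
    'dial tcp 10.96.0.1:443: i/o timeout'
)
print(parse_reflector_error(sample))
```

Running every E-level line through the helper and collecting the distinct `target` values should show whether all failures are the same TCP dial timeout against 10.96.0.1:443, which would point at pod-to-service networking rather than at traefik itself.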
For some reason, traefik cannot access the kubernetes API:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 18h
I have seen this behaviour on my cluster before, for example when I installed MetalLB and it tried to read the configmap through the Kubernetes API.
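One way to narrow this down is to repeat the failing step by hand: a plain TCP connect to the service IP with a timeout, which is what "dial tcp 10.96.0.1:443: i/o timeout" reports. The sketch below assumes a pod or node where Python is available (the traefik image itself may not have it), and `tcp_reachable` is a hypothetical helper name.

```python
import socket


def tcp_reachable(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts, refused connections and unreachable networks alike.
        return False


# Example call, to be run from inside the cluster:
#   tcp_reachable("10.96.0.1", 443)
```

If the call returns True on the node but False inside a pod on the same node, the problem is most likely in the pod network or in the kube-proxy rules rather than in the API server.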

Related

Kubernetes cluster on bare metal by kubeadm

I'm trying to create a single control-plane cluster with kubeadm on 3 bare-metal nodes (1 master and 2 workers) running Debian 10 with Docker as the container runtime. Each node has an external IP and an internal IP.
I want the cluster to run on the internal network and be accessible from the Internet.
I used this command for that (please correct me if something is wrong):
kubeadm init --control-plane-endpoint=10.10.0.1 --apiserver-cert-extra-sans={public_DNS_name},10.10.0.1 --pod-network-cidr=192.168.0.0/16
I got:
kubectl get no -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
dev-k8s-master-0.public.dns Ready master 16h v1.18.2 10.10.0.1 <none> Debian GNU/Linux 10 (buster) 4.19.0-8-amd64 docker://19.3.8
The init phase completed successfully and the cluster is accessible from the Internet. All pods are up and running except coredns, which should start running once networking is applied.
kubectl apply -f https://docs.projectcalico.org/v3.11/manifests/calico.yaml
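A quick sanity check for this kind of setup is to make sure the pod CIDR, the service CIDR and the node addresses do not overlap. The following is a minimal sketch using Python's standard `ipaddress` module. The service range 10.96.0.0/12 is an assumption (it is kubeadm's default and matches the 10.96.0.1 address in the logs), and the node is checked as a single host because the internal netmask is not given in the question.

```python
import ipaddress


def find_overlaps(ranges):
    """Return sorted pairs of names whose networks overlap."""
    nets = {name: ipaddress.ip_network(cidr) for name, cidr in ranges.items()}
    names = sorted(nets)
    return [(a, b) for i, a in enumerate(names) for b in names[i + 1:]
            if nets[a].overlaps(nets[b])]


ranges = {
    "pod": "192.168.0.0/16",    # --pod-network-cidr from the kubeadm init command
    "service": "10.96.0.0/12",  # assumed: kubeadm's default service CIDR
    "node": "10.10.0.1/32",     # the control-plane endpoint, as a single host
}
print(find_overlaps(ranges))  # an empty list means no ranges collide
```

For the values in this question the list is empty, so overlapping address ranges are unlikely to be the cause here.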
After networking was applied, the coredns pods are still not ready:
kubectl get po -A
NAMESPACE     NAME                                       READY   STATUS             RESTARTS   AGE
kube-system   calico-kube-controllers-75d56dfc47-g8g9g   0/1     CrashLoopBackOff   192        16h
kube-system   calico-node-22gtx                          1/1     Running            0          16h
kube-system   coredns-66bff467f8-87vd8                   0/1     Running            0          16h
kube-system   coredns-66bff467f8-mv8d9                   0/1     Running            0          16h
kube-system   etcd-dev-k8s-master-0                      1/1     Running            0          16h
kube-system   kube-apiserver-dev-k8s-master-0            1/1     Running            0          16h
kube-system   kube-controller-manager-dev-k8s-master-0   1/1     Running            0          16h
kube-system   kube-proxy-lp6b8                           1/1     Running            0          16h
kube-system   kube-scheduler-dev-k8s-master-0            1/1     Running            0          16h
Some logs from failed pods:
kubectl -n kube-system logs calico-kube-controllers-75d56dfc47-g8g9g
2020-04-22 08:24:55.853 [INFO][1] main.go 88: Loaded configuration from environment config=&config.Config{LogLevel:"info", ReconcilerPeriod:"5m", CompactionPeriod:"10m", EnabledControllers:"node", WorkloadEndpointWorkers:1, ProfileWorkers:1, PolicyWorkers:1, NodeWorkers:1, Kubeconfig:"", HealthEnabled:true, SyncNodeLabels:true, DatastoreType:"kubernetes"}
2020-04-22 08:24:55.855 [INFO][1] k8s.go 228: Using Calico IPAM
W0422 08:24:55.855525 1 client_config.go:541] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
2020-04-22 08:24:55.856 [INFO][1] main.go 109: Ensuring Calico datastore is initialized
2020-04-22 08:25:05.857 [ERROR][1] client.go 255: Error getting cluster information config ClusterInformation="default" error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
2020-04-22 08:25:05.857 [FATAL][1] main.go 114: Failed to initialize Calico datastore error=Get https://10.96.0.1:443/apis/crd.projectcalico.org/v1/clusterinformations/default: context deadline exceeded
coredns:
[INFO] plugin/ready: Still waiting on: "kubernetes"
I0422 08:29:12.275344 1 trace.go:116] Trace[1050055850]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.274382393 +0000 UTC m=+59491.429700922) (total time: 30.000897581s):
Trace[1050055850]: [30.000897581s] [30.000897581s] END
E0422 08:29:12.275388 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.276163 1 trace.go:116] Trace[188478428]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.275499997 +0000 UTC m=+59491.430818380) (total time: 30.000606394s):
Trace[188478428]: [30.000606394s] [30.000606394s] END
E0422 08:29:12.276198 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0422 08:29:12.277424 1 trace.go:116] Trace[16697023]: "Reflector ListAndWatch" name:pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105 (started: 2020-04-22 08:28:42.276675998 +0000 UTC m=+59491.431994406) (total time: 30.000689778s):
Trace[16697023]: [30.000689778s] [30.000689778s] END
E0422 08:29:12.277452 1 reflector.go:153] pkg/mod/k8s.io/client-go@v0.17.2/tools/cache/reflector.go:105: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
[INFO] plugin/ready: Still waiting on: "kubernetes"
Any thoughts on what's wrong?
This answer is to call attention to @florin's suggestion:
I've seen similar behavior when I had multiple public interfaces on the node and Calico selected the wrong one.
What I did was set IP_AUTODETECTION_METHOD in the Calico config.
From the Calico configuration reference on IP_AUTODETECTION_METHOD:
The method to use to autodetect the IPv4 address for this host. This is only used when the IPv4 address is being autodetected. See IP Autodetection methods for details of the valid methods.
Learn more here: https://docs.projectcalico.org/reference/node/configuration#ip-autodetection-methods
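Among the documented autodetection methods, can-reach=DESTINATION relies on the host's routing to decide which local address would be used to reach a given destination. The sketch below shows the same idea in plain Python. It is not Calico's implementation, and `source_ip_for` is a hypothetical helper name.

```python
import socket


def source_ip_for(destination, port=80):
    """Return the local IPv4 address the kernel would use to reach destination."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        # connect() on a UDP socket sends no packets; it only makes the kernel
        # pick a route and therefore a source address.
        sock.connect((destination, port))
        return sock.getsockname()[0]


print(source_ip_for("127.0.0.1"))
```

On a multi-homed node, calling it with the internal IP of another node shows which local address the routing table selects for that peer. That is a reasonable hint as to whether a can-reach setting would pick the intended interface.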
I faced the same problem, and the following worked for me. Try this on your master node:
$ sudo iptables -P INPUT ACCEPT
$ sudo iptables -P FORWARD ACCEPT
$ sudo iptables -P OUTPUT ACCEPT
$ sudo iptables -F

Kubernetes v1.12 dashboard is running but timeout occurred while accessing it via api server proxy

Started: 2018-12-01
My host machine runs Windows 10 Home (1803 update) with VirtualBox 5.22 and two guest Ubuntu 18.04.1 servers.
Each guest has two networks: NAT (host IP 10.0.2.15) and a shared host-only network with gateway IP 192.168.151.1.
I set the IPs:
for k8s master(ubuk8sma) - 192.168.151.21
for worker1 (ubuk8swrk1) - 192.168.151.22
I left Docker as is; the version is 18.09.0.
I installed k8s version stable-1.12 on the master and the worker. The master init is:
K8S_POD_CIDR='10.244.0.0/16'
K8S_IP_ADDR='192.168.151.21'
K8S_VER='stable-1.12' # or latest
sudo kubeadm init --pod-network-cidr=${K8S_POD_CIDR} --apiserver-advertise-address=${K8S_IP_ADDR} --kubernetes-version ${K8S_VER} --ignore-preflight-errors=all
Why I set the "ignore errors" flag:
[ERROR SystemVerification]: unsupported docker version: 18.09.0
I was reluctant to reinstall a Docker version fully compatible with k8s (maybe not a very smart move, but I'm usually eager to try the latest stuff).
For CNI I installed the flannel network:
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/k8s-manifests/kube-flannel-rbac.yml
After installing worker1, the node state looks like this:
u1@ubuk8sma:~$ kubectl get nodes
NAME         STATUS   ROLES    AGE    VERSION
ubuk8sma     Ready    master   6d     v1.12.2
ubuk8swrk1   Ready    <none>   4d1h   v1.12.2
No big issues showed up. Next I wanted a visualization of this pretty k8s ecosystem, so I headed towards installing the k8s dashboard.
I followed the "defaults" path, with zero intervention where possible. I used this yaml:
kubectl create -f https://raw.githubusercontent.com/kubernetes/dashboard/master/src/deploy/recommended/kubernetes-dashboard.yaml
At a basic level it looks installed, deployed to a worker pod, and running. From the pod list info:
u1@ubuk8sma:~$ kubectl get all --namespace=kube-system
NAME READY STATUS RESTARTS AGE
pod/coredns-576cbf47c7-4tzm9 1/1 Running 5 6d
pod/coredns-576cbf47c7-tqtpw 1/1 Running 5 6d
pod/etcd-ubuk8sma 1/1 Running 7 6d
pod/kube-apiserver-ubuk8sma 1/1 Running 7 6d
pod/kube-controller-manager-ubuk8sma 1/1 Running 11 6d
pod/kube-flannel-ds-amd64-rt442 1/1 Running 3 4d1h
pod/kube-flannel-ds-amd64-zx78x 1/1 Running 5 6d
pod/kube-proxy-6b6mc 1/1 Running 6 6d
pod/kube-proxy-zcchn 1/1 Running 3 4d1h
pod/kube-scheduler-ubuk8sma 1/1 Running 10 6d
pod/kubernetes-dashboard-77fd78f978-crl7b 1/1 Running 1 2d1h
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
service/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 6d
service/kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 2d1h
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
daemonset.apps/kube-flannel-ds-amd64 2 2 2 2 2 beta.kubernetes.io/arch=amd64 6d
...
daemonset.apps/kube-proxy 2 2 2 2 2 <none> 6d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deployment.apps/coredns 2 2 2 2 6d
deployment.apps/kubernetes-dashboard 1 1 1 1 2d1h
NAME DESIRED CURRENT READY AGE
replicaset.apps/coredns-576cbf47c7 2 2 2 6d
replicaset.apps/kubernetes-dashboard-77fd78f978 1 1 1 2d1h
I started the proxy to validate both the API server and the dashboard service:
kubectl proxy
Version check for API server:
u1@ubuk8sma:~$ curl http://localhost:8001/version
{
"major": "1",
"minor": "12",
"gitVersion": "v1.12.2",
"gitCommit": "17c77c7898218073f14c8d573582e8d2313dc740",
"gitTreeState": "clean",
"buildDate": "2018-10-24T06:43:59Z",
"goVersion": "go1.10.4",
"compiler": "gc",
"platform": "linux/amd64"
}
And here is the problem I'm writing this question about:
u1@ubuk8sma:~$ curl "http://localhost:8001/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/"
Error: 'dial tcp 10.244.1.8:8443: i/o timeout'
Trying to reach: 'https://10.244.1.8:8443/'
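For reference, the service segment of the proxy URL used in this request follows the pattern [scheme:]name[:port]. The sketch below rebuilds the path from its parts. `service_proxy_path` is a hypothetical helper written for illustration, not a kubectl or client-library function.

```python
def service_proxy_path(namespace, service, scheme="", port=""):
    """Build the API-server proxy path for a Service.

    The service segment has the form [scheme:]name[:port]. When a scheme is
    given, the trailing colon stays even if the port is empty.
    """
    if scheme:
        segment = f"{scheme}:{service}:{port}"
    elif port:
        segment = f"{service}:{port}"
    else:
        segment = service
    return f"/api/v1/namespaces/{namespace}/services/{segment}/proxy/"


url = "http://localhost:8001" + service_proxy_path(
    "kube-system", "kubernetes-dashboard", scheme="https")
print(url)
```

The printed URL is identical to the one passed to curl, so the request itself is well formed. The timeout happens later, when the API server dials the pod IP.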
Fragment of Pod info:
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: 2018-11-30T06:53:24Z
    status: "True"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: 2018-11-29T07:16:04Z
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: docker://fb0937959c7680046130e670c483877e4c0f1854870cb0b20ed4fe066d72df18
    image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
    imageID: docker-pullable://k8s.gcr.io/kubernetes-dashboard-amd64@sha256:1d2e1229a918f4bc38b5a3f9f5f11302b3e71f8397b492afac7f273a0008776a
    lastState:
      terminated:
        containerID: docker://f85e1cc50f59adbd8a13d42694aef7c5e726c07b3d852a26288c4bfc1124c718
        exitCode: 2
        finishedAt: 2018-11-30T06:53:21Z
        reason: Error
        startedAt: 2018-11-29T07:16:07Z
    name: kubernetes-dashboard
    ready: true
    restartCount: 1
    state:
      running:
        startedAt: 2018-11-30T06:53:23Z
  hostIP: 10.0.2.15
  phase: Running
  podIP: 10.244.1.8
  qosClass: BestEffort
  startTime: 2018-11-29T07:16:04Z
Docker check on worker1 node:
CONTAINER ID IMAGE COMMAND CREATED STATUS NAMES
fb0937959c... sha256:0dab2435c100... "/dashboard --insecure-bind-address=0.0.0.0 --bind-address=0.0.0.0 --auto-generate-certificates" 27 hours ago Up 27 hours k8s_kubernetes-dashboard_kube...
Tried to check Pod logs, no luck:
DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
kubectl -n kube-system logs $DASHBOARD_POD_NAME
Error from server (NotFound): the server could not find the requested resource ( pods/log kubernetes-dashboard-77fd78f978-crl7b)
Tried to wget from API server:
API_SRV_POD_NAME='kube-apiserver-ubuk8sma'
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME wget https://10.244.1.8:8443/
No response.
Tried to check that the dashboard service exists, no luck:
u1@ubuk8sma:~$ kubectl get svc $DASHBOARD_SVC_NAME
Error from server (NotFound): services "kubernetes-dashboard" not found
Checked IP route table on API server:
kubectl -n 'kube-system' exec -ti $API_SRV_POD_NAME ip route show
default via 10.0.2.2 dev enp0s3 src 10.0.2.15 metric 100
10.0.2.0/24 dev enp0s3 scope link src 10.0.2.15
10.0.2.2 dev enp0s3 scope link src 10.0.2.15 metric 100
10.244.0.0/24 dev cni0 scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
172.17.0.0/16 dev docker0 scope link src 172.17.0.1
192.168.151.0/24 dev enp0s8 scope link src 192.168.151.21
For reference, enp0s3 is the NAT NIC adapter and enp0s8 is the host-only one.
I see the flannel route for 10.244.1.x. It seems the issue is hardly a network misconfiguration (but I could be wrong).
So the dashboard pod appears to be running, but something is failing and I cannot diagnose what. Could you help find the root cause and, ideally, make the dashboard service run without errors?
Thanks in advance, folks!
Update1:
I see events on master:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 11h kubelet, ubuk8swrk1 Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "43191144d447d0e9da52c8b6600bd96a23fab1e96c79af8c8fedc4e4e50882c7" network for pod "kubernetes-dashboard-77fd78f978-crl7b": NetworkPlugin cni failed to set up pod "kubernetes-dashboard-77fd78f978-crl7b_kube-system" network: open /run/flannel/subnet.env: no such file or directory
Normal SandboxChanged 11h (x4 over 11h) kubelet, ubuk8swrk1 Pod sandbox changed, it will be killed and re-created.
Normal Pulled 11h kubelet, ubuk8swrk1 Container image "k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0" already present on machine
Normal Created 11h kubelet, ubuk8swrk1 Created container
Normal Started 11h kubelet, ubuk8swrk1 Started container
The error about the missing subnet.env is a bit strange, as both master and minion have it (well, maybe it is created on the fly):
u1@ubuk8swrk1:~$ ls -la /run/flannel/subnet.env
-rw-r--r-- 1 root root 96 Dec 3 08:15 /run/flannel/subnet.env
This is the dashboard deployment descriptor:
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: deployment.kubernetes.io/revision: 1
Selector: k8s-app=kubernetes-dashboard
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: RollingUpdate
MinReadySeconds: 0
RollingUpdateStrategy: 25% max unavailable, 25% max surge
Pod Template:
Labels: k8s-app=kubernetes-dashboard
Service Account: kubernetes-dashboard
Containers:
kubernetes-dashboard:
Image: k8s.gcr.io/kubernetes-dashboard-amd64:v1.10.0
Port: 8443/TCP
Host Port: 0/TCP
Args:
--auto-generate-certificates
Liveness: http-get https://:8443/ delay=30s timeout=30s period=10s #success=1 #failure=3
Environment: <none>
Mounts:
/certs from kubernetes-dashboard-certs (rw)
/tmp from tmp-volume (rw)
Volumes:
kubernetes-dashboard-certs:
Type: Secret (a volume populated by a Secret)
SecretName: kubernetes-dashboard-certs
Optional: false
tmp-volume:
Type: EmptyDir (a temporary directory that shares a pod's lifetime)
Medium:
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubernetes-dashboard-77fd78f978 (1/1 replicas created)
Events: <none>
This is a reduced description of the pods (the original yaml is 35K, too much to share):
Name: coredns-576cbf47c7-4tzm9
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Labels: k8s-app=kube-dns
pod-template-hash=576cbf47c7
Status: Running
IP: 10.244.0.14
Controlled By: ReplicaSet/coredns-576cbf47c7
Containers:
coredns:
Container ID: docker://0efcd043407d93fb9d052045828489f6b99bb59b4f0882ec89e1897071609b77
Image: k8s.gcr.io/coredns:1.2.2
Ports: 53/UDP, 53/TCP, 9153/TCP
Host Ports: 0/UDP, 0/TCP, 0/TCP
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 6
Liveness: http-get http://:8080/health delay=60s timeout=5s period=10s #success=1 #failure=5
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: etcd-ubuk8sma
Namespace: kube-system
Priority: 2000000000
PriorityClassName: system-cluster-critical
Node: ubuk8sma/10.0.2.15
Labels: component=etcd
tier=control-plane
Status: Running
IP: 10.0.2.15
Containers:
etcd:
Container ID: docker://ba2bdcf5fa558beabdd8578628d71480d595d5ee3bb5c4edf42407419010144b
Image: k8s.gcr.io/etcd:3.2.24
Image ID: docker-pullable://k8s.gcr.io/etcd@sha256:905d7ca17fd02bc24c0eba9a062753aba15db3e31422390bc3238eb762339b20
Port: <none>
Host Port: <none>
Command:
etcd
--advertise-client-urls=https://127.0.0.1:2379
--cert-file=/etc/kubernetes/pki/etcd/server.crt
--client-cert-auth=true
--data-dir=/var/lib/etcd
--initial-advertise-peer-urls=https://127.0.0.1:2380
--initial-cluster=ubuk8sma=https://127.0.0.1:2380
--key-file=/etc/kubernetes/pki/etcd/server.key
--listen-client-urls=https://127.0.0.1:2379
--listen-peer-urls=https://127.0.0.1:2380
--name=ubuk8sma
--peer-cert-file=/etc/kubernetes/pki/etcd/peer.crt
--peer-client-cert-auth=true
--peer-key-file=/etc/kubernetes/pki/etcd/peer.key
--peer-trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
--snapshot-count=10000
--trusted-ca-file=/etc/kubernetes/pki/etcd/ca.crt
State: Running
Started: Mon, 03 Dec 2018 08:12:56 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Wed, 28 Nov 2018 09:31:46 +0000
Finished: Mon, 03 Dec 2018 08:12:35 +0000
Ready: True
Restart Count: 8
Liveness: exec [/bin/sh -ec ETCDCTL_API=3 etcdctl --endpoints=https://[127.0.0.1]:2379 --cacert=/etc/kubernetes/pki/etcd/ca.crt --cert=/etc/kubernetes/pki/etcd/healthcheck-client.crt --key=/etc/kubernetes/pki/etcd/healthcheck-client.key get foo] delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Mounts:
/etc/kubernetes/pki/etcd from etcd-certs (rw)
/var/lib/etcd from etcd-data (rw)
Conditions:
Type Status
Initialized True
Ready True
ContainersReady True
PodScheduled True
Events: <none>
Name: kube-apiserver-ubuk8sma
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
Containers:
kube-apiserver:
Container ID: docker://099b2a30772b969c3919b57fd377980673f03a820afba6034daa70f011271a52
Image: k8s.gcr.io/kube-apiserver:v1.12.2
Image ID: docker-pullable://k8s.gcr.io/kube-apiserver@sha256:094929baf3a7681945d83a7654b3248e586b20506e28526121f50eb359cee44f
Port: <none>
Host Port: <none>
Command:
kube-apiserver
--authorization-mode=Node,RBAC
--advertise-address=192.168.151.21
--allow-privileged=true
--client-ca-file=/etc/kubernetes/pki/ca.crt
--enable-admission-plugins=NodeRestriction
--enable-bootstrap-token-auth=true
--etcd-cafile=/etc/kubernetes/pki/etcd/ca.crt
--etcd-certfile=/etc/kubernetes/pki/apiserver-etcd-client.crt
--etcd-keyfile=/etc/kubernetes/pki/apiserver-etcd-client.key
--etcd-servers=https://127.0.0.1:2379
--insecure-port=0
--kubelet-client-certificate=/etc/kubernetes/pki/apiserver-kubelet-client.crt
--kubelet-client-key=/etc/kubernetes/pki/apiserver-kubelet-client.key
--kubelet-preferred-address-types=InternalIP,ExternalIP,Hostname
--proxy-client-cert-file=/etc/kubernetes/pki/front-proxy-client.crt
--proxy-client-key-file=/etc/kubernetes/pki/front-proxy-client.key
--requestheader-allowed-names=front-proxy-client
--requestheader-client-ca-file=/etc/kubernetes/pki/front-proxy-ca.crt
--requestheader-extra-headers-prefix=X-Remote-Extra-
--requestheader-group-headers=X-Remote-Group
--requestheader-username-headers=X-Remote-User
--secure-port=6443
--service-account-key-file=/etc/kubernetes/pki/sa.pub
--service-cluster-ip-range=10.96.0.0/12
--tls-cert-file=/etc/kubernetes/pki/apiserver.crt
--tls-private-key-file=/etc/kubernetes/pki/apiserver.key
State: Running
Started: Mon, 03 Dec 2018 08:13:00 +0000
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 8
Liveness: http-get https://192.168.151.21:6443/healthz delay=15s timeout=15s period=10s #success=1 #failure=8
Environment: <none>
Events: <none>
Name: kube-flannel-ds-amd64-rt442
Namespace: kube-system
Node: ubuk8swrk1/10.0.2.15
Status: Running
IP: 10.0.2.15
Controlled By: DaemonSet/kube-flannel-ds-amd64
Init Containers:
install-cni:
Container ID: docker://a6377b0fe1b040235c24e9ca19455c56e77daecf688b212cfea5553b6e59ff68
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
State: Terminated
Reason: Completed
Exit Code: 0
Ready: True
Restart Count: 4
Containers:
kube-flannel:
Container ID: docker://f7029bc2144c1ab8654407d742c1079df0059d418b7ba86b886091b5ad8c34a3
Image: quay.io/coreos/flannel:v0.10.0-amd64
Port: <none>
Host Port: <none>
Command:
/opt/bin/flanneld
Args:
--ip-masq
--kube-subnet-mgr
State: Running
Last State: Terminated
Reason: Error
Exit Code: 255
Ready: True
Restart Count: 4
Events: <none>
Name: kube-proxy-6b6mc
Namespace: kube-system
Node: ubuk8sma/10.0.2.15
Status: Running
IP: 10.0.2.15
The biggest suspect is the node IP. I see 10.0.2.15 (the NAT IP) everywhere, but the host-only NIC should be used. I had a long story of setting up the network properly for my Ubuntu VMs.
I edited /etc/netplan/01-netcfg.yaml before the k8s setup (thanks to https://askubuntu.com/questions/984445/netplan-configuration-on-ubuntu-17-04-virtual-machine?rq=1 for the help). Example of the master config:
network:
  version: 2
  renderer: networkd
  ethernets:
    enp0s3:
      dhcp4: yes
      dhcp6: yes
      routes:
      - to: 0.0.0.0/0
        via: 10.0.2.2
        metric: 0
    enp0s8:
      dhcp4: no
      dhcp6: no
      addresses: [192.168.151.21/24]
      routes:
      - to: 192.168.151.1/24
        via: 192.168.151.1
        metric: 100
Only after this and a few more changes did the NAT and host-only networks start to work together. NAT remains the default network adapter, which is likely why its IP is everywhere. For the API server I set --advertise-address=192.168.151.21 explicitly, which at least stopped it from using the NAT IP.
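One detail in the netplan file that may be worth a second look is the enp0s8 route destination, 192.168.151.1/24: it is a host address with a /24 mask, not a network address. The sketch below uses Python's standard `ipaddress` module to show the difference. Whether netplan or networkd normalises or rejects such a value is not established here, so treat this only as a check on the notation.

```python
import ipaddress

route_to = "192.168.151.1/24"  # the "to:" value of the enp0s8 route

try:
    ipaddress.ip_network(route_to)  # strict by default
    is_network_address = True
except ValueError:
    # Raised because .1 sets host bits inside a /24.
    is_network_address = False

# ip_interface accepts host bits and reports the enclosing network.
enclosing = ipaddress.ip_interface(route_to).network
print(is_network_address, enclosing)
```

The enclosing network is 192.168.151.0/24, which is already on-link through the `addresses` entry of enp0s8, so that extra route is probably redundant.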
So the root cause may be different, but the current question is how to reconfigure the networks to replace the NAT IP with the host-only one. I already tried this in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
Environment="KUBELET_EXTRA_ARGS=--node-ip=192.168.151.21"
Restarted kubelet:
sudo systemctl daemon-reload
sudo systemctl restart kubelet
That didn't help. Restarting the VMs didn't help either (I expected only kubelet-related changes, but nothing changed). I explored a few configs (5+) for potential changes, no luck.
Update2:
I mentioned the NAT address config issue above. I resolved it by editing the /etc/default/kubelet config. I found the idea in the comments on this article:
https://medium.com/@joatmon08/playing-with-kubeadm-in-vagrant-machines-part-2-bac431095706
The dashboard config part now has the proper IP:
hostIP: 192.168.151.22
phase: Running
podIP: 10.244.1.13
Then I went into the Docker container for the API server and tried to reach the podIP via wget, ping and traceroute. Timeouts everywhere. Routes:
/ # route -n
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
0.0.0.0 10.0.2.2 0.0.0.0 UG 100 0 0 enp0s3
10.0.2.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s3
10.0.2.2 0.0.0.0 255.255.255.255 UH 100 0 0 enp0s3
10.244.0.0 0.0.0.0 255.255.255.0 U 0 0 0 cni0
10.244.1.0 10.244.1.0 255.255.255.0 UG 0 0 0 flannel.1
172.17.0.0 0.0.0.0 255.255.0.0 U 0 0 0 docker0
192.168.151.0 0.0.0.0 255.255.255.0 U 0 0 0 enp0s8
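To read this table the way the kernel does, pick the most specific route that contains the destination. The sketch below is a simplified longest-prefix match over the rows shown here. It ignores metrics and gateways, and the 8.8.8.8 destination is only an arbitrary external address added for illustration.

```python
import ipaddress

# (destination/prefix, interface) pairs copied from the `route -n` output.
ROUTES = [
    ("0.0.0.0/0", "enp0s3"),
    ("10.0.2.0/24", "enp0s3"),
    ("10.0.2.2/32", "enp0s3"),
    ("10.244.0.0/24", "cni0"),
    ("10.244.1.0/24", "flannel.1"),
    ("172.17.0.0/16", "docker0"),
    ("192.168.151.0/24", "enp0s8"),
]


def egress_interface(ip):
    """Pick the interface of the most specific route containing ip."""
    addr = ipaddress.ip_address(ip)
    matches = [(ipaddress.ip_network(net), dev) for net, dev in ROUTES
               if addr in ipaddress.ip_network(net)]
    return max(matches, key=lambda m: m[0].prefixlen)[1]


for ip in ("10.244.1.13", "192.168.151.22", "8.8.8.8"):
    print(ip, "->", egress_interface(ip))
```

The pod IP 10.244.1.13 resolves to flannel.1, so the master does hand the packet to the overlay. The table alone does not show which physical interface carries the flannel tunnel traffic between the VMs.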
Attempt to perform a curl call from the master VM:
u1@ubuk8sma:~$ curl -v -i -kSs "https://192.168.151.21:6443/api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/" -H "$K8S_AUTH_HEADER"
...
* SSL certificate verify result: unable to get local issuer certificate (20), continuing anyway.
* Using HTTP2, server supports multi-use
* Connection state changed (HTTP/2 confirmed)
* Copying HTTP/2 data in stream buffer to connection buffer after upgrade: len=0
* Using Stream ID: 1 (easy handle 0x565072b5a750)
> GET /api/v1/namespaces/kube-system/services/https:kubernetes-dashboard:/proxy/ HTTP/2
> Host: 192.168.151.21:6443
> User-Agent: curl/7.58.0
> Accept: */*
> Authorization: Bearer eyJhbGciOiJSUzI1.....
>
* Connection state changed (MAX_CONCURRENT_STREAMS updated)!
< HTTP/2 503
HTTP/2 503
< content-type: text/plain; charset=utf-8
content-type: text/plain; charset=utf-8
< content-length: 92
content-length: 92
< date: Tue, 04 Dec 2018 08:44:25 GMT
date: Tue, 04 Dec 2018 08:44:25 GMT
<
Error: 'dial tcp 10.244.1.13:8443: i/o timeout'
* Connection #0 to host 192.168.151.21 left intact
Trying to reach: 'https://10.244.1.13:8443/'
Service info for dashboard:
u1@ubuk8sma:~$ kubectl -n kube-system get service kubernetes-dashboard
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes-dashboard ClusterIP 10.103.36.134 <none> 443/TCP 5d
A few more details:
u1@ubuk8sma:~$ kubectl -n kube-system describe services kubernetes-dashboard
Name: kubernetes-dashboard
Namespace: kube-system
Labels: k8s-app=kubernetes-dashboard
Annotations: <none>
Selector: k8s-app=kubernetes-dashboard
Type: ClusterIP
IP: 10.103.36.134
Port: <unset> 443/TCP
TargetPort: 8443/TCP
Endpoints: 10.244.1.13:8443
Session Affinity: None
Events: <none>
I also tried to open a shell, both via kubectl and docker. For any usual Linux command I see this 'OCI runtime exec failed' issue:
u1@ubuk8sma:~$ DASHBOARD_POD_NAME='kubernetes-dashboard-77fd78f978-crl7b'
u1@ubuk8sma:~$ kubectl -v=9 -n kube-system exec "$DASHBOARD_POD_NAME" -- env
I1204 09:57:17.673345 23517 loader.go:359] Config loaded from file /home/u1/.kube/config
I1204 09:57:17.679526 23517 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b'
I1204 09:57:17.703924 23517 round_trippers.go:405] GET https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b 200 OK in 23 milliseconds
I1204 09:57:17.703957 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.703971 23517 round_trippers.go:414] Content-Length: 3435
I1204 09:57:17.703984 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
I1204 09:57:17.703997 23517 round_trippers.go:414] Content-Type: application/json
I1204 09:57:17.704866 23517 request.go:942] Response Body: {"kind":"Pod","apiVersion":"v1","metadata":{"name":"kubernetes-dashboard-77fd78f978-crl7b","generateName":"kubernetes-dashboard-77fd78f978-","namespace":"kube-system","selfLink":"/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b","uid":"a1d005b8-f3a6-11e8-a2d0-08002783a80f"...
I1204 09:57:17.740811 23517 round_trippers.go:386] curl -k -v -XPOST -H "X-Stream-Protocol-Version: v4.channel.k8s.io" -H "X-Stream-Protocol-Version: v3.channel.k8s.io" -H "X-Stream-Protocol-Version: v2.channel.k8s.io" -H "X-Stream-Protocol-Version: channel.k8s.io" -H "User-Agent: kubectl/v1.12.2 (linux/amd64) kubernetes/17c77c7" 'https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true'
I1204 09:57:17.805528 23517 round_trippers.go:405] POST https://192.168.151.21:6443/api/v1/namespaces/kube-system/pods/kubernetes-dashboard-77fd78f978-crl7b/exec?command=env&container=kubernetes-dashboard&container=kubernetes-dashboard&stderr=true&stdout=true 101 Switching Protocols in 64 milliseconds
I1204 09:57:17.805565 23517 round_trippers.go:411] Response Headers:
I1204 09:57:17.805581 23517 round_trippers.go:414] Connection: Upgrade
I1204 09:57:17.805594 23517 round_trippers.go:414] Upgrade: SPDY/3.1
I1204 09:57:17.805607 23517 round_trippers.go:414] X-Stream-Protocol-Version: v4.channel.k8s.io
I1204 09:57:17.805620 23517 round_trippers.go:414] Date: Tue, 04 Dec 2018 09:57:17 GMT
OCI runtime exec failed: exec failed: container_linux.go:348: starting container process caused "exec: \"env\": executable file not found in $PATH": unknown
F1204 09:57:18.088488 23517 helpers.go:119] command terminated with exit code 126
So I cannot reach the pod and cannot open a shell in it. But at least I see some logs:
u1@ubuk8sma:~$ kubectl -n kube-system logs -p $DASHBOARD_POD_NAME
2018/12/03 08:15:16 Starting overwatch
2018/12/03 08:15:16 Using in-cluster config to connect to apiserver
2018/12/03 08:15:16 Using service account token for csrf signing
2018/12/03 08:15:16 No request provided. Skipping authorization
2018/12/03 08:15:16 Successful initial request to the apiserver, version: v1.12.2
2018/12/03 08:15:16 Generating JWE encryption key
2018/12/03 08:15:16 New synchronizer has been registered: kubernetes-dashboard-key-holder-kube-system. Starting
2018/12/03 08:15:16 Starting secret synchronizer for kubernetes-dashboard-key-holder in namespace kube-system
2018/12/03 08:15:18 Initializing JWE encryption key from synchronized object
2018/12/03 08:15:18 Creating in-cluster Heapster client
2018/12/03 08:15:19 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
2018/12/03 08:15:19 Auto-generating certificates
2018/12/03 08:15:19 Successfully created certificates
2018/12/03 08:15:19 Serving securely on HTTPS port: 8443
2018/12/03 08:15:49 Metric client health check failed: the server could not find the requested resource (get services heapster). Retrying in 30 seconds.
For now I have no idea where to go from here to fix this timeout.

Kubernetes using KVM instances on OpenStack via KubeAdm

I have successfully deployed a "working" Kubernetes cluster, using the Horizon interface to create the Linux instances and configuring the hosts according to https://kubernetes.io/docs/setup/independent/high-availability/
I can now say I have a Kubernetes cluster:
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
kube-apiserver-1 Ready master 1d v1.12.2
kube-apiserver-2 Ready master 1d v1.12.2
kube-apiserver-3 Ready master 1d v1.12.2
kube-node-1 Ready <none> 21h v1.12.2
kube-node-2 Ready <none> 21h v1.12.2
kube-node-3 Ready <none> 21h v1.12.2
kube-node-4 Ready <none> 21h v1.12.2
However, getting beyond this point has proven to be quite a struggle. I cannot create usable services, and CoreDNS, which is an essential component, seems unusable:
$ kubectl -n kube-system get pods
NAME READY STATUS RESTARTS AGE
coredns-576cbf47c7-4gdnc 0/1 CrashLoopBackOff 288 23h
coredns-576cbf47c7-x4h4v 0/1 CrashLoopBackOff 288 23h
kube-apiserver-kube-apiserver-1 1/1 Running 0 1d
kube-apiserver-kube-apiserver-2 1/1 Running 0 1d
kube-apiserver-kube-apiserver-3 1/1 Running 0 1d
kube-controller-manager-kube-apiserver-1 1/1 Running 3 1d
kube-controller-manager-kube-apiserver-2 1/1 Running 1 1d
kube-controller-manager-kube-apiserver-3 1/1 Running 0 1d
kube-flannel-ds-amd64-2zdtd 1/1 Running 0 20h
kube-flannel-ds-amd64-7l5mr 1/1 Running 0 20h
kube-flannel-ds-amd64-bmvs9 1/1 Running 0 1d
kube-flannel-ds-amd64-cmhkg 1/1 Running 0 1d
...
Errors in the pod indicate that it cannot reach the kubernetes service:
$ kubectl -n kube-system logs coredns-576cbf47c7-4gdnc
E1121 18:04:48.928055 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928688 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928917 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.929869 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.930819 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:19.931517 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932159 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.932722 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:05:50.933179 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/21 18:06:07 [INFO] SIGTERM: Shutting down servers then terminating
E1121 18:06:21.933058 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.934010 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:06:21.935107 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:350: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
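To see at a glance which resources and which target the reflector keeps failing on, the repeated errors can be tallied. This is a minimal Python sketch, assuming the log text was saved from `kubectl logs`; the function name `summarize_reflector_errors` and the regular expression are illustrative and are not part of CoreDNS or client-go:

```python
import re
from collections import Counter

# Pattern inferred from the reflector error lines shown above (an assumption,
# not an official log format). The URL may or may not be wrapped in quotes.
REFLECTOR_ERROR = re.compile(
    r'Failed to list \*v1\.(?P<kind>\w+): Get "?https?://(?P<target>[^/\s"]+)\S*: (?P<cause>.*)$'
)


def summarize_reflector_errors(log_text):
    """Count reflector failures per (resource kind, host:port, final error reason)."""
    counts = Counter()
    for line in log_text.splitlines():
        match = REFLECTOR_ERROR.search(line)
        if match is None:
            continue  # not a reflector list failure
        # Keep only the last segment of the error chain, e.g. "i/o timeout".
        reason = match.group("cause").rsplit(": ", 1)[-1]
        counts[(match.group("kind"), match.group("target"), reason)] += 1
    return counts


SAMPLE = """\
E1121 18:04:48.928055 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:355: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E1121 18:04:48.928688 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:348: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
2018/11/21 18:06:07 [INFO] SIGTERM: Shutting down servers then terminating
"""

for (kind, target, reason), n in summarize_reflector_errors(SAMPLE).items():
    print(f"{kind:<10} {target:<15} {reason} x{n}")
```

If every entry names the same target and the same reason, as it does here, the problem is more likely reachability of that one address than anything specific to a resource type.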
$ kubectl -n kube-system describe pod/coredns-576cbf47c7-dk7sh
...
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 25m default-scheduler Successfully assigned kube-system/coredns-576cbf47c7-dk7sh to kube-node-3
Normal Pulling 25m kubelet, kube-node-3 pulling image "k8s.gcr.io/coredns:1.2.2"
Normal Pulled 25m kubelet, kube-node-3 Successfully pulled image "k8s.gcr.io/coredns:1.2.2"
Normal Created 20m (x3 over 25m) kubelet, kube-node-3 Created container
Normal Killing 20m (x2 over 22m) kubelet, kube-node-3 Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
Normal Pulled 20m (x2 over 22m) kubelet, kube-node-3 Container image "k8s.gcr.io/coredns:1.2.2" already present on machine
Normal Started 20m (x3 over 25m) kubelet, kube-node-3 Started container
Warning Unhealthy 4m (x36 over 24m) kubelet, kube-node-3 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 17s (x22 over 8m) kubelet, kube-node-3 Back-off restarting failed container
The kubernetes service is there and seems to be properly autoconfigured:
$ kubectl get svc
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 23h
$ kubectl describe svc/kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443
Session Affinity: None
Events: <none>
$ kubectl get endpoints
NAME ENDPOINTS AGE
kubernetes 192.168.5.19:6443,192.168.5.24:6443,192.168.5.29:6443 23h
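Since the Service maps 10.96.0.1:443 to the three endpoints on port 6443, one way to narrow things down is to try a plain TCP connect to both the ClusterIP and each endpoint. The sketch below uses only Python's standard `socket` module; the helper names `probe_tcp` and `report` are made up for illustration, and the result is only meaningful when run from inside a pod's network namespace, not from the node itself:

```python
import socket

# Addresses taken from the `kubectl describe svc/kubernetes` output above.
TARGETS = [
    ("10.96.0.1", 443),      # ClusterIP of the kubernetes Service
    ("192.168.5.19", 6443),  # API server endpoints behind it
    ("192.168.5.24", 6443),
    ("192.168.5.29", 6443),
]


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connection and classify the outcome."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"   # something answered with a reset
    except socket.timeout:
        return "timeout"   # the handshake never completed
    except OSError as exc:
        return f"error: {exc}"


def report(targets, timeout=3.0):
    """Probe every (host, port) pair and return {"host:port": outcome}."""
    return {f"{host}:{port}": probe_tcp(host, port, timeout) for host, port in targets}
```

Calling `report(TARGETS)` from a pod and comparing the rows is the point of the exercise: if the endpoints are "open" but the ClusterIP is "timeout", the Service translation or the pod network is a more likely suspect than the API server itself. This is a heuristic, not a definitive diagnosis.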
I have a nagging suspicion that I am missing something in the network layer and that this issue has something to do with Neutron. There are plenty of HOWTOs on how to install Kubernetes using other tools and how to install it in OpenStack but I have yet to find one guide that explains how to install it by creating KVMs using the Horizon interface and dealing with security groups and network issues. By the way, ALL IPv4/TCP ports are open between the Masters and Nodes.
Is there anyone out there with a guide that explains this scenario?
The issue here was a polluted etcd cluster. As soon as I rebuilt the EXTERNAL etcd cluster and started from scratch using these instructions: https://kubernetes.io/docs/setup/independent/high-availability/#external-etcd all items were working as expected. There does not seem to be a tool available to reset the etcd entries for a flannel pod network.

kube-dns Failed to list *v1.Endpoints getsockopt: connection refused

I have a Kubernetes cluster (v1.10) using flannel as the CNI provider (not sure if that is relevant, but it might be). I am trying to deploy kube-dns, but it goes into CrashLoopBackOff and the logs for the kubedns container repeatedly show:
I0423 17:46:47.045712 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:47.545729 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.045723 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0423 17:46:48.545749 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0423 17:46:49.019286 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
E0423 17:46:49.019325 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: connection refused
I0423 17:46:49.045731 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0423 17:46:49.545707 1 dns.go:167] Timeout waiting for initialization
Nothing in my kube-dns manifest refers to port 443 and kube-apiserver is configured for 6443. What is it trying to get a connection to that is being refused?
I also don't know whether it has anything to do with the kube-dns pod having an ip of 10.88.0.3:
kubectl -n kube-system -o wide get pods
NAME READY STATUS RESTARTS AGE IP NODE
kube-dns-564f9d98-lt9js 2/3 CrashLoopBackOff 13 18m 10.88.0.3 worker1
kube-flannel-ds-5bqm6 1/1 Running 0 35m 10.240.0.12 controller2
kube-flannel-ds-djmld 1/1 Running 0 35m 10.240.0.11 controller1
kube-flannel-ds-nbfhp 1/1 Running 0 35m 10.240.0.23 worker3
kube-flannel-ds-prxdr 1/1 Running 0 35m 10.240.0.22 worker2
kube-flannel-ds-x9cdq 1/1 Running 0 35m 10.240.0.21 worker1
kube-flannel-ds-zjbgb 1/1 Running 0 35m 10.240.0.13 controller3
Again, where is this coming from? It's not something I have configured, and it does not sit within either my service network or my pod network CIDR range:
kubernetes_dns_domain: kubernetes.local
kubernetes_dns_ip: "{{ kubernetes_cluster_subnet }}.10"
kubernetes_cluster_subnet: 10.96.0
kubernetes_pod_network_cidr: 10.244.0.0/16
kubernetes_service_ip: "{{ kubernetes_cluster_subnet }}.1"
kubernetes_service_ip_range: "{{ kubernetes_cluster_subnet }}.0/24"
kubernetes_service_node_port_range: 30000-32767
kubernetes_secure_port: 6443
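The statement that 10.88.0.3 lies outside both ranges can be checked mechanically. Here is a small sketch using Python's standard `ipaddress` module; the dictionary labels and the function name `containing_networks` are illustrative, and the service range is the `kubernetes_service_ip_range` template with `10.96.0` substituted:

```python
import ipaddress

# CIDR ranges derived from the variables above.
NETWORKS = {
    "pod network": ipaddress.ip_network("10.244.0.0/16"),
    "service network": ipaddress.ip_network("10.96.0.0/24"),
}


def containing_networks(ip, networks=NETWORKS):
    """Return the names of all configured networks that contain the address."""
    addr = ipaddress.ip_address(ip)
    return [name for name, net in networks.items() if addr in net]


print(containing_networks("10.88.0.3"))   # kube-dns pod IP        → []
print(containing_networks("10.96.0.10"))  # kubernetes_dns_ip      → ['service network']
```

An empty list for the pod IP suggests the address was assigned by something other than the configured pod network, although this check alone does not say by what.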
I'm thoroughly confused and would be grateful for any explanation of what is going on.
kube_dns_version: 1.14.10
flannel_version: v0.10.0
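On the port question: the 443 in those errors most likely comes from the built-in `kubernetes` Service rather than from the kube-dns manifest. In-cluster clients normally build the API server URL from the `KUBERNETES_SERVICE_HOST` and `KUBERNETES_SERVICE_PORT` environment variables that are injected into every pod, and that Service listens on 443 and forwards to the API server's secure port (6443 here). A rough Python sketch of that lookup follows; the function name `in_cluster_api_url` is hypothetical, and the real logic lives in the Kubernetes client libraries:

```python
import os


def in_cluster_api_url(environ=None):
    """Build the API server URL the way an in-cluster client would find it."""
    env = os.environ if environ is None else environ
    host = env.get("KUBERNETES_SERVICE_HOST")
    port = env.get("KUBERNETES_SERVICE_PORT")
    if not host or not port:
        raise RuntimeError("KUBERNETES_SERVICE_HOST/PORT not set; not running in a pod?")
    if ":" in host:
        host = f"[{host}]"  # IPv6 literals need brackets inside a URL
    return f"https://{host}:{port}"


# Values a pod in this cluster would be expected to see (assumed from the config above).
print(in_cluster_api_url({
    "KUBERNETES_SERVICE_HOST": "10.96.0.1",
    "KUBERNETES_SERVICE_PORT": "443",
}))
# → https://10.96.0.1:443
```

So a refused connection to 10.96.0.1:443 points at the path between the pod and that Service address, not at a wrong port in the kube-dns configuration.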

kubeadm upgrade to 1.9.1 kube-dns failure

I attempted to upgrade from 1.7 to 1.9 using kubeadm, and kube-dns started crash-looping. I removed the deployment and applied a new one using the latest YAML for kube-dns (replacing the clusterIP with 10.96.0.10 and the domain with cluster.local).
The kubedns container fails after not being able to get a valid response from the api server. The 10.96.0.1 ip does respond to a wget on the 443 port from all servers in the cluster (403 forbidden response).
E0104 21:51:42.732805 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 21:51:42.732971 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
Is this a connection issue, configuration issue, or a security model change that is causing the errors in the log?
Thanks.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
ubuntu80 Ready master 165d v1.9.1
ubuntu81 Ready <none> 165d v1.9.1
ubuntu82 Ready <none> 165d v1.9.1
ubuntu83 Ready <none> 163d v1.9.1
$ kubectl get all --namespace=kube-system
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/kube-flannel-ds 4 4 4 0 4 beta.kubernetes.io/arch=amd64 165d
ds/kube-proxy 4 4 4 4 4 <none> 165d
ds/traefik-ingress-controller 3 3 3 3 3 <none> 165d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 1 1 1 0 1h
deploy/tiller-deploy 1 1 1 1 163d
NAME DESIRED CURRENT READY AGE
rs/kube-dns-6c857864fb 1 1 0 1h
rs/tiller-deploy-3341511835 1 1 1 105d
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
ds/kube-flannel-ds 4 4 4 0 4 beta.kubernetes.io/arch=amd64 165d
ds/kube-proxy 4 4 4 4 4 <none> 165d
ds/traefik-ingress-controller 3 3 3 3 3 <none> 165d
NAME DESIRED CURRENT UP-TO-DATE AVAILABLE AGE
deploy/kube-dns 1 1 1 0 1h
deploy/tiller-deploy 1 1 1 1 163d
NAME DESIRED CURRENT READY AGE
rs/kube-dns-6c857864fb 1 1 0 1h
rs/tiller-deploy-3341511835 1 1 1 105d
NAME READY STATUS RESTARTS AGE
po/etcd-ubuntu80 1/1 Running 1 16d
po/kube-apiserver-ubuntu80 1/1 Running 1 2h
po/kube-controller-manager-ubuntu80 1/1 Running 1 2h
po/kube-dns-6c857864fb-grhxp 1/3 CrashLoopBackOff 52 1h
po/kube-flannel-ds-07npj 2/2 Running 32 165d
po/kube-flannel-ds-169lh 2/2 Running 26 165d
po/kube-flannel-ds-50c56 2/2 Running 27 163d
po/kube-flannel-ds-wkd7j 2/2 Running 29 165d
po/kube-proxy-495n7 1/1 Running 1 2h
po/kube-proxy-9g7d2 1/1 Running 1 2h
po/kube-proxy-d856z 1/1 Running 0 2h
po/kube-proxy-kzmcc 1/1 Running 0 2h
po/kube-scheduler-ubuntu80 1/1 Running 1 2h
po/tiller-deploy-3341511835-m3x26 1/1 Running 2 58d
po/traefik-ingress-controller-51r7d 1/1 Running 4 105d
po/traefik-ingress-controller-sf6lc 1/1 Running 4 105d
po/traefik-ingress-controller-xz1rt 1/1 Running 5 105d
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
svc/kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 1h
svc/kubernetes-dashboard ClusterIP 10.101.112.198 <none> 443/TCP 165d
svc/tiller-deploy ClusterIP 10.98.117.242 <none> 44134/TCP 163d
svc/traefik-web-ui ClusterIP 10.110.215.194 <none> 80/TCP 165d
$ kubectl logs --namespace=kube-system $(kubectl get pods --namespace=kube-system -l k8s-app=kube-dns -o name) -c kubedns
I0104 21:51:12.730927 1 dns.go:48] version: 1.14.6-3-gc36cb11
I0104 21:51:12.731643 1 server.go:69] Using configuration read from directory: /kube-dns-config with period 10s
I0104 21:51:12.731673 1 server.go:112] FLAG: --alsologtostderr="false"
I0104 21:51:12.731679 1 server.go:112] FLAG: --config-dir="/kube-dns-config"
I0104 21:51:12.731683 1 server.go:112] FLAG: --config-map=""
I0104 21:51:12.731686 1 server.go:112] FLAG: --config-map-namespace="kube-system"
I0104 21:51:12.731688 1 server.go:112] FLAG: --config-period="10s"
I0104 21:51:12.731693 1 server.go:112] FLAG: --dns-bind-address="0.0.0.0"
I0104 21:51:12.731695 1 server.go:112] FLAG: --dns-port="10053"
I0104 21:51:12.731713 1 server.go:112] FLAG: --domain="cluster.local."
I0104 21:51:12.731717 1 server.go:112] FLAG: --federations=""
I0104 21:51:12.731723 1 server.go:112] FLAG: --healthz-port="8081"
I0104 21:51:12.731726 1 server.go:112] FLAG: --initial-sync-timeout="1m0s"
I0104 21:51:12.731729 1 server.go:112] FLAG: --kube-master-url=""
I0104 21:51:12.731733 1 server.go:112] FLAG: --kubecfg-file=""
I0104 21:51:12.731735 1 server.go:112] FLAG: --log-backtrace-at=":0"
I0104 21:51:12.731740 1 server.go:112] FLAG: --log-dir=""
I0104 21:51:12.731743 1 server.go:112] FLAG: --log-flush-frequency="5s"
I0104 21:51:12.731746 1 server.go:112] FLAG: --logtostderr="true"
I0104 21:51:12.731748 1 server.go:112] FLAG: --nameservers=""
I0104 21:51:12.731751 1 server.go:112] FLAG: --stderrthreshold="2"
I0104 21:51:12.731753 1 server.go:112] FLAG: --v="2"
I0104 21:51:12.731756 1 server.go:112] FLAG: --version="false"
I0104 21:51:12.731761 1 server.go:112] FLAG: --vmodule=""
I0104 21:51:12.731798 1 server.go:194] Starting SkyDNS server (0.0.0.0:10053)
I0104 21:51:12.731979 1 server.go:213] Skydns metrics enabled (/metrics:10055)
I0104 21:51:12.731987 1 dns.go:146] Starting endpointsController
I0104 21:51:12.731991 1 dns.go:149] Starting serviceController
I0104 21:51:12.732457 1 logs.go:41] skydns: ready for queries on cluster.local. for tcp://0.0.0.0:10053 [rcache 0]
I0104 21:51:12.732467 1 logs.go:41] skydns: ready for queries on cluster.local. for udp://0.0.0.0:10053 [rcache 0]
I0104 21:51:13.232355 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:13.732395 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:14.232389 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:14.732389 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:15.232369 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:42.732629 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
E0104 21:51:42.732805 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:147: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
E0104 21:51:42.732971 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:150: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0104 21:51:43.232257 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:51.232379 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:51.732371 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:51:52.232390 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:52:11.732376 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
I0104 21:52:12.232382 1 dns.go:173] Waiting for services and endpoints to be initialized from apiserver...
F0104 21:52:12.732377 1 dns.go:167] Timeout waiting for initialization
$ kubectl describe po/kube-dns-6c857864fb-grhxp --namespace=kube-system
Name: kube-dns-6c857864fb-grhxp
Namespace: kube-system
Node: ubuntu82/10.80.82.1
Start Time: Fri, 05 Jan 2018 01:55:48 +0530
Labels: k8s-app=kube-dns
pod-template-hash=2741342096
Annotations: scheduler.alpha.kubernetes.io/critical-pod=
Status: Running
IP: 10.244.2.12
Controlled By: ReplicaSet/kube-dns-6c857864fb
Containers:
kubedns:
Container ID: docker://3daa4233f54fa251abdcdfe73d2e71179356f5da45983d19fe66a3f18bab8d13
Image: gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-kube-dns-amd64@sha256:f5bddc71efe905f4e4b96f3ca346414be6d733610c1525b98fff808f93966680
Ports: 10053/UDP, 10053/TCP, 10055/TCP
Args:
--domain=cluster.local.
--dns-port=10053
--config-dir=/kube-dns-config
--v=2
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 255
Started: Fri, 05 Jan 2018 03:21:12 +0530
Finished: Fri, 05 Jan 2018 03:22:12 +0530
Ready: False
Restart Count: 26
Limits:
memory: 170Mi
Requests:
cpu: 100m
memory: 70Mi
Liveness: http-get http://:10054/healthcheck/kubedns delay=60s timeout=5s period=10s #success=1 #failure=5
Readiness: http-get http://:8081/readiness delay=3s timeout=5s period=10s #success=1 #failure=3
Environment:
PROMETHEUS_PORT: 10055
Mounts:
/kube-dns-config from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
dnsmasq:
Container ID: docker://a40a34e6fdf7176ea148fdb1f21d157c5d264e44bd14183ed9d19164a742fb65
Image: gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64@sha256:6cfb9f9c2756979013dbd3074e852c2d8ac99652570c5d17d152e0c0eb3321d6
Ports: 53/UDP, 53/TCP
Args:
-v=2
-logtostderr
-configDir=/etc/k8s/dns/dnsmasq-nanny
-restartDnsmasq=true
--
-k
--cache-size=1000
--no-negcache
--log-facility=-
--server=/cluster.local/127.0.0.1#10053
--server=/in-addr.arpa/127.0.0.1#10053
--server=/ip6.arpa/127.0.0.1#10053
State: Running
Started: Fri, 05 Jan 2018 03:24:44 +0530
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Fri, 05 Jan 2018 03:17:33 +0530
Finished: Fri, 05 Jan 2018 03:19:33 +0530
Ready: True
Restart Count: 27
Requests:
cpu: 150m
memory: 20Mi
Liveness: http-get http://:10054/healthcheck/dnsmasq delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/etc/k8s/dns/dnsmasq-nanny from kube-dns-config (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
sidecar:
Container ID: docker://c05b33a08344f15b0d1a1e8fee39cc05b6d9de6a24db6d2cd05e92c2706fc03c
Image: gcr.io/google_containers/k8s-dns-sidecar-amd64:1.14.7
Image ID: docker-pullable://gcr.io/google_containers/k8s-dns-sidecar-amd64@sha256:f80f5f9328107dc516d67f7b70054354b9367d31d4946a3bffd3383d83d7efe8
Port: 10054/TCP
Args:
--v=2
--logtostderr
--probe=kubedns,127.0.0.1:10053,kubernetes.default.svc.cluster.local,5,SRV
--probe=dnsmasq,127.0.0.1:53,kubernetes.default.svc.cluster.local,5,SRV
State: Running
Started: Fri, 05 Jan 2018 02:09:25 +0530
Last State: Terminated
Reason: Error
Exit Code: 2
Started: Fri, 05 Jan 2018 01:55:50 +0530
Finished: Fri, 05 Jan 2018 02:08:20 +0530
Ready: True
Restart Count: 1
Requests:
cpu: 10m
memory: 20Mi
Liveness: http-get http://:10054/metrics delay=60s timeout=5s period=10s #success=1 #failure=5
Environment: <none>
Mounts:
/var/run/secrets/kubernetes.io/serviceaccount from kube-dns-token-cpzzw (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kube-dns-config:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-dns
Optional: true
kube-dns-token-cpzzw:
Type: Secret (a volume populated by a Secret)
SecretName: kube-dns-token-cpzzw
Optional: false
QoS Class: Burstable
Node-Selectors: <none>
Tolerations: CriticalAddonsOnly
node.kubernetes.io/not-ready:NoExecute for 300s
node.kubernetes.io/unreachable:NoExecute for 300s
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 46m (x57 over 1h) kubelet, ubuntu82 Readiness probe failed: Get http://10.244.2.12:8081/readiness: dial tcp 10.244.2.12:8081: getsockopt: connection refused
Warning Unhealthy 36m (x42 over 1h) kubelet, ubuntu82 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning BackOff 31m (x162 over 1h) kubelet, ubuntu82 Back-off restarting failed container
Normal Killing 26m (x13 over 1h) kubelet, ubuntu82 Killing container with id docker://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Normal SuccessfulMountVolume 21m kubelet, ubuntu82 MountVolume.SetUp succeeded for volume "kube-dns-token-cpzzw"
Normal SuccessfulMountVolume 21m kubelet, ubuntu82 MountVolume.SetUp succeeded for volume "kube-dns-config"
Normal Pulled 21m kubelet, ubuntu82 Container image "gcr.io/google_containers/k8s-dns-dnsmasq-nanny-amd64:1.14.7" already present on machine
Normal Started 21m kubelet, ubuntu82 Started container
Normal Created 21m kubelet, ubuntu82 Created container
Normal Started 19m (x2 over 21m) kubelet, ubuntu82 Started container
Normal Created 19m (x2 over 21m) kubelet, ubuntu82 Created container
Normal Pulled 19m (x2 over 21m) kubelet, ubuntu82 Container image "gcr.io/google_containers/k8s-dns-kube-dns-amd64:1.14.7" already present on machine
Warning Unhealthy 19m (x4 over 20m) kubelet, ubuntu82 Liveness probe failed: HTTP probe failed with statuscode: 503
Warning Unhealthy 16m (x22 over 21m) kubelet, ubuntu82 Readiness probe failed: Get http://10.244.2.12:8081/readiness: dial tcp 10.244.2.12:8081: getsockopt: connection refused
Normal Killing 6m (x6 over 19m) kubelet, ubuntu82 Killing container with id docker://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning BackOff 1m (x65 over 20m) kubelet, ubuntu82 Back-off restarting failed container
Kubedns 1.14.7 does not work well with kubernetes 1.9.1. In my case, kubedns was trying to connect to apiserver using 443 and not, as configured, 6443.
When I changed the image version to 1.14.8 (the newest release on the kube-dns GitHub page), kubedns recognized the apiserver port properly, and I have had no problems since:
kubectl edit deploy kube-dns --namespace=kube-system
# change the image version to 1.14.8 and it works
Yes, I saw issues with kube-dns 1.14.7 too. Use the latest kube-dns version, 1.14.8, from https://github.com/kubernetes/dns/releases by doing:
kubectl edit deploy kube-dns --namespace=kube-system
# change the image version in the "Image:" field to 1.14.8
If the issue persists, also run:
kubectl create configmap --namespace=kube-system kube-dns
kubectl delete pod <name of kube-dns pod> --namespace=kube-system
# kube-dns should restart and work now
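For a non-interactive alternative to `kubectl edit`, the same change can be expressed with `kubectl set image`. The Python sketch below only builds and prints the command line, it does not run it. The container names and image repositories are copied from the `kubectl describe` output in the question. Bumping all three containers is an assumption, since the answers do not say whether only `kubedns` needs the new tag:

```python
import shlex

REGISTRY = "gcr.io/google_containers"

# container name -> image repository, as listed in the pod description
KUBE_DNS_IMAGES = {
    "kubedns": "k8s-dns-kube-dns-amd64",
    "dnsmasq": "k8s-dns-dnsmasq-nanny-amd64",
    "sidecar": "k8s-dns-sidecar-amd64",
}


def set_image_command(tag, namespace="kube-system", deployment="kube-dns"):
    """Return the argv for a `kubectl set image` call that moves every container to `tag`."""
    args = ["kubectl", "set", "image", f"deployment/{deployment}"]
    args += [f"{name}={REGISTRY}/{repo}:{tag}" for name, repo in KUBE_DNS_IMAGES.items()]
    args.append(f"--namespace={namespace}")
    return args


# Print the command for review; pass the list to subprocess.run() to apply it.
print(shlex.join(set_image_command("1.14.8")))
```

Printing first and running second keeps the change reviewable, which matters when editing a component in kube-system.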