I just upgraded my 1.10.0 Kubernetes cluster to 1.10.12.
I also updated a node or two to the same version.
However, I now see that:
kube-proxy-r5ts5 0/1 CrashLoopBackOff 5 3m gpu03
showing the logs gives:
# kubectl -n kube-system logs -f kube-proxy-r5ts5
error: unrecognized key:
Help? I do not know how to troubleshoot this further.
Coincidentally, I added a new node at the same time, and I see that weave also has problems starting:
# kubectl -n kube-system logs -f weave-net-mb299 weave
FATA: 2018/12/20 01:43:35.703088 [kube-peers] Could not get peers: Get dial tcp i/o timeout
Failed to get peers
# kubectl -n kube-system logs -f weave-net-mb299 weave-npc
ERROR: logging before flag.Parse: E1220 01:44:02.447197 28249 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:230: Failed to list *v1.NetworkPolicy: Get dial tcp i/o timeout
I guess this is because kube-proxy isn't up.
# kubectl -n kube-system describe pods kube-proxy-r5ts5
Name: kube-proxy-r5ts5
Namespace: kube-system
Node: gpu02/
Start Time: Thu, 20 Dec 2018 02:01:10 +0000
Labels: controller-revision-hash=3231443654
Annotations: <none>
Status: Running
Controlled By: DaemonSet/kube-proxy
Container ID: docker://1bcfca6db8f68d7130de86947343a24f9fc23b506ea295509933473f3d830845
Image: gcr.io/google_containers/kube-proxy-amd64:v1.10.12
Image ID: docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:a9ed73c3526033cd3cf732b4a84de9d211f425ef08cce4f0535617cadf0f4200
Port: <none>
Host Port: <none>
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 20 Dec 2018 02:04:00 +0000
Finished: Thu, 20 Dec 2018 02:04:00 +0000
Ready: False
Restart Count: 5
Environment: <none>
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/kube-proxy from kube-proxy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-m4hvr (ro)
Type Status
Initialized True
Ready False
PodScheduled True
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-proxy
Optional: false
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
Type: HostPath (bare host directory volume)
Path: /lib/modules
Type: Secret (a volume populated by a Secret)
SecretName: kube-proxy-token-m4hvr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "xtables-lock"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "lib-modules"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "kube-proxy"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "kube-proxy-token-m4hvr"
Normal Started 2m (x4 over 3m) kubelet, gpu02 Started container
Warning BackOff 2m (x7 over 3m) kubelet, gpu02 Back-off restarting failed container
Normal Pulled 2m (x5 over 3m) kubelet, gpu02 Container image "gcr.io/google_containers/kube-proxy-amd64:v1.10.12" already present on machine
Normal Created 2m (x5 over 3m) kubelet, gpu02 Created container
Probably not related, but I did have problems with cri-tools and kubeadm join saying that it couldn't find dockershim.sock, so I ran rpm -e --nodeps cri-tools and that appeared to fix the join. I'm pretty sure the Docker subsystem is working, as I can see other Kubernetes pods on the machine (e.g. k8s_POD_weave-net-mb299_kube-system, k8s_weave-npc_weave-net-mb299_kube-system).
A snapshot of the logs from one of the minions:
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.459850 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637709 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637826 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637852 10526 kuberuntime_manager.go:646] createPodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637947 10526 pod_workers.go:186] Error syncing pod bd2287cb-0475-11e9-90de-fa163e21c438 ("hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"hub-85c95bbd57-bx4sr_jupyter-prod\" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.661793 10526 container.go:507] Failed to update stats for container "/libcontainer_14802_systemd_test_default.slice": read /sys/fs/cgroup/cpu,cpuacct/libcontainer_14802_systemd_test_default.slice/cpuacct.usage: no such device, continuing to push stats
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745423 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745492 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745526 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745640 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.858313 10526 pod_container_deletor.go:77] Container "e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616" not found in pod's containers
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.934213 10526 pod_container_deletor.go:77] Container "de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e" not found in pod's containers
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696842 10526 cni.go:259] Error adding network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696892 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697306 10526 container.go:393] Failed to create summary reader for "/libcontainer_14936_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697520 10526 container.go:393] Failed to create summary reader for "/libcontainer_14941_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708833 10526 cni.go:259] Error adding network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708860 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.860952 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861039 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861067 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861167 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused"
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954796 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954851 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post dial tcp getsockopt: connection refused
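The kubelet log above repeats the same CNI failure many times for only a couple of pods. A minimal sketch, assuming you have the log lines as text, that collapses them into a per-pod count so the affected pods stand out. The function name and the regular expression are my own, written against the message format shown above, not anything kubelet provides:

```python
import re
from collections import Counter

# Matches the pod name in kubelet's "NetworkPlugin cni failed to set up pod" errors.
# The name may be wrapped in plain quotes or in backslash-escaped quotes.
CNI_FAILURE = re.compile(r'NetworkPlugin cni failed to set up pod \\?"([^"\\]+)\\?"')

def count_cni_failures(lines):
    """Return a Counter mapping pod name -> number of log lines reporting a CNI setup failure."""
    counts = Counter()
    for line in lines:
        # Some lines quote the same error twice, so count each line once per pod.
        for pod in set(CNI_FAILURE.findall(line)):
            counts[pod] += 1
    return counts
```

For example, you could save the output of journalctl -u kubelet to a file, pass the open file object to count_cni_failures, and print the result of .most_common().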


coredns connection refused error while setting up kubernetes cluster

I've got a kubernetes cluster set up with kubeadm. I haven't deployed any pods yet, but the coredns pods are stuck in a ContainerCreating status.
[root@master-node ~]# kubectl get -A pods
kube-system coredns-64897985d-f5kjh 0/1 ContainerCreating 0 151m
kube-system coredns-64897985d-xz9nt 0/1 ContainerCreating 0 151m
When I check it out with kubectl describe I see this:
[root@master-node ~]# kubectl describe -n kube-system pod coredns-64897985d-f5kjh
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedCreatePodSandBox 22m (x570 over 145m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4974dadd11fecf1ebfbcccd75701641b752426808889895672f34e6934776207": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "bce2558b24468c0d0e83fe1eedf2fa70108420a466d000b74ceaf351e595007d": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e53e79bc3642c9a0c2b240dc174931af9f5dddf7d5b7df50382fcb3fea351df9": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b6da6e72057c3b48ac6ced3ba6b81917111e94c20216b65126a2733462139ed1": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 18m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "09416534b75ef7beea279f9389eb1a732b6a288c3b170a489e04cce01c294fa2": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "411fe06179ab24a3999b1c034bc99452d99249bbb6cb966b496f7a8b467e1806": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "e0fc2a5d4852cd31eca4b473f614cadcb9235a2a325c01b469110bfd6bbf9a3b": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4528997239e55f7ef546c0af9cc7c12cf5fe4942a370ed2a772ba7fc405773d2": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "b534273b4fe3b893cdeac05555e47429bc7578c1e0c0095481fe155637f0c4ae": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 17m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "afc479a4bfa16cef4367ecfee74333dfa9bbf12c59995446792f22c8e39ca16d": unable to allocate IP address: Post "": dial tcp connect: connection refused
Warning FailedCreatePodSandBox 3m50s (x61 over 16m) kubelet (combined from similar events): Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a9254528ba611403a9b2293a2201c8758ff4adf75fd4a1d2b9690d15446cc92a": unable to allocate IP address: Post "": dial tcp connect: connection refused
Any idea what could be causing this?
Turns out this is a firewall issue. I was using Weave Net as my CNI, which requires port 6784 to be open to work. You can see this in the error, where the Post request fails with connection refused (pretty obvious in hindsight). I fixed it by opening port 6784 on my firewall. For firewalld, I did:
firewall-cmd --permanent --add-port=6784/tcp
firewall-cmd --reload
This might be a security problem. The Weave Net docs said something about how this port should only be accessible to certain processes; I'm not sure of the details. Security isn't a big concern for my application, so I didn't look into it further.
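To confirm whether the port is actually reachable before and after changing the firewall, a quick TCP check can help. This is only a sketch: the function name is mine, and the 127.0.0.1 address in the example is an assumption, so substitute whatever address appears in your own error message.

```python
import socket

def tcp_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers connection refused, timeouts, and unreachable hosts.
        return False

if __name__ == "__main__":
    # 127.0.0.1 is an assumed address; 6784 is the port named in the answer above.
    print(tcp_port_open("127.0.0.1", 6784))
```

A False result only tells you the connection failed. It does not distinguish a firewall block from the Weave process simply not listening.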

Failed to create pod sandbox [flannel]

I am running into this error on random pods. Thank you @matthew-l-daniel for the comment, as I didn't know where to start.
Here are the contents of /opt/cni/bin on the node:
:/opt/cni/bin$ ls
bridge host-local loopback
Here are the kubelet logs for a container that failed.
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924370 32233 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924380 32233 kuberuntime_manager.go:647] createPodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924427 32233 pod_workers.go:186] Error syncing pod d8acae2f-24a2-11e9-b79c-0a0d1213cce2 ("postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)"), skipping: failed to "CreatePodSandbox" for "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"postgres-core-0\": Error response from daemon: grpc: the connection is unavailable"
As for flannel container logs, there are many flannel pods running - and all are healthy.
Kubernetes v 1.10.11
Docker version 17.03.2-ce, build f5ec1e2
Flannel logs
E0130 15:34:16.536354 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 15:34:16.536411 1 vxlan_network.go:191] failed to delete vxlanRoute ( -> no such process
E0130 17:33:44.848163 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 17:33:44.848219 1 vxlan_network.go:191] failed to delete vxlanRoute ( -> no such process
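Since the /opt/cni/bin listing above shows only bridge, host-local and loopback, one thing that may be worth checking on each node is whether every plugin named in the CNI config is present as an executable. A minimal sketch; the function is hypothetical, and the expected plugin names have to come from your own config under /etc/cni/net.d (the "flannel" name in the usage note is an assumption). It does not by itself explain the "grpc: the connection is unavailable" error.

```python
import os

def missing_cni_plugins(cni_bin_dir, expected):
    """Return the names in `expected` that are not executable files in cni_bin_dir."""
    missing = []
    for name in expected:
        path = os.path.join(cni_bin_dir, name)
        if not (os.path.isfile(path) and os.access(path, os.X_OK)):
            missing.append(name)
    return missing
```

For example, missing_cni_plugins("/opt/cni/bin", ["bridge", "host-local", "loopback", "flannel"]) would list any of those four names that are absent on the node.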

Why does Kubernetes v1.13's API randomly go down?

cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
Clean install for kubernetes using kubeadm init following the steps directly in the docs. Tried with flannel, weavenet and Calico.
After about 5-10 minutes, after a watch kubectl get nodes, I'll get these messages at random, leaving me with an inaccessible cluster that I can't apply any .yml files to.
Unable to connect to the server: net/http: TLS handshake timeout
The connection to the server was refused - did you specify the right host or port?
Unable to connect to the server: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""
kubelet is fine aside from it showing it can't get random services from (the master node)
[root@play ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
Active: active (running) since Mon 2018-12-10 13:57:17 EST; 21min ago
Docs: https://kubernetes.io/docs/
Main PID: 3411939 (kubelet)
CGroup: /system.slice/kubelet.service
└─3411939 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-...
Dec 10 14:18:53 play kubelet[3411939]: E1210 14:18:53.811213 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.011239 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get refused
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.211160 3411939 reflector.go:134] object-"kube-system"/"kube-proxy-token-n5qjm": Failed to list *v1.Secret: Get
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.411190 3411939 reflector.go:134] object-"kube-system"/"coredns-token-7qjzv": Failed to list *v1.Secret: Get
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.611103 3411939 reflector.go:134] object-"kube-system"/"coredns": Failed to list *v1.ConfigMap: Get
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.811105 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.011204 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.211132 3411939 reflector.go:134] object-"kube-system"/"weave-net-token-5zb86": Failed to list *v1.Secret: Get
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.411281 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.611125 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get refused
Hint: Some lines were ellipsized, use -l to show in full.
A Docker container that runs CoreDNS shows problems getting resources from what looks like anything in the default Kubernetes Service subnet CIDR range (shown here for a VPS on a separate hosting provider, using a local IP):
2018-12-10T10:34:52.589Z [INFO] CoreDNS-1.2.6
2018-12-10T10:34:52.589Z [INFO] linux/amd64, go1.11.2, 756749c
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
E1210 10:55:53.286644 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get dial tcp connect: connection refused
E1210 10:55:53.290019 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get dial tcp connect: connection refused
The kube-apiserver log just shows random failures, and it looks like it's mixing in IPv6.
I1210 19:23:09.067462 1 trace.go:76] Trace[1029933921]: "Get /api/v1/nodes/play" (started: 2018-12-10 19:23:00.256692931 +0000 UTC m=+188.530973072) (total time: 8.810746081s):
Trace[1029933921]: [8.810746081s] [8.810715241s] END
E1210 19:23:09.068687 1 available_controller.go:316] v2beta1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.069678 1 available_controller.go:316] v1. failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1./status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.073019 1 available_controller.go:316] v1beta1.apiextensions.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apiextensions.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.074112 1 available_controller.go:316] v1beta1.batch failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.batch/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.075151 1 available_controller.go:316] v2beta2.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta2.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.077408 1 available_controller.go:316] v1.authorization.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authorization.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.078457 1 available_controller.go:316] v1.networking.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.networking.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.079449 1 available_controller.go:316] v1beta1.coordination.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.coordination.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.080558 1 available_controller.go:316] v1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.081628 1 available_controller.go:316] v1beta1.scheduling.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.scheduling.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.082803 1 available_controller.go:316] v1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.083845 1 available_controller.go:316] v1beta1.events.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.events.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.084882 1 available_controller.go:316] v1beta1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.085985 1 available_controller.go:316] v1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.087019 1 available_controller.go:316] v1beta1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.088113 1 available_controller.go:316] v1beta1.certificates.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.certificates.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.089164 1 available_controller.go:316] v1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.090268 1 available_controller.go:316] v1beta1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
W1210 19:23:28.996746 1 controller.go:181] StopReconciling() timed out
And I'm out of troubleshooting steps.
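Because the API server log above shows requests going to [::1]:6443, one low-cost thing to look at is how names such as localhost resolve on the host. The sketch below is my own helper, not part of any Kubernetes tooling, and it is offered as a diagnostic idea rather than a confirmed cause:

```python
import socket

def resolved_addresses(hostname, port=6443):
    """Return the distinct (family label, address) pairs the resolver gives for hostname."""
    labels = {socket.AF_INET: "IPv4", socket.AF_INET6: "IPv6"}
    results = []
    for family, _type, _proto, _canon, sockaddr in socket.getaddrinfo(
            hostname, port, type=socket.SOCK_STREAM):
        pair = (labels.get(family, str(family)), sockaddr[0])
        if pair not in results:
            results.append(pair)
    return results
```

Calling resolved_addresses("localhost") on the master shows whether the resolver returns ::1, 127.0.0.1, or both, and in which order.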

network service in kubernetes worker nodes

I have installed a 3-server Kubernetes setup by following https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
I created the Calico network service on the master node. My question: should I create the Calico service on the worker nodes as well?
I am getting the error below on the worker node when I create a pod:
Jul 11 13:25:05 ip-172-31-20-212 kubelet: I0711 13:25:05.144142 23325 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=calico-node pod=calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)
Jul 11 13:25:05 ip-172-31-20-212 kubelet: E0711 13:25:05.144169 23325 pod_workers.go:186] Error syncing pod e17770a3-8507-11e8-962c-0ac29e406ef0 ("calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)"
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.221953 23325 cni.go:280] Error deleting network: context deadline exceeded
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222595 23325 remote_runtime.go:115] StopPodSandbox "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-k2xsm_default" network: context deadline exceeded
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222630 23325 kuberuntime_manager.go:799] Failed to stop sandbox {"docker" "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326"}
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222664 23325 kuberuntime_manager.go:594] killPodWithSyncResult failed: failed to "KillPodSandbox" for "67e18616-850d-11e8-962c-0ac29e406ef0" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"test2-597bdc85dc-k2xsm_default\" network: context deadline exceeded"
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222685 23325 pod_workers.go:186] Error syncing pod 67e18616-850d-11e8-962c-0ac29e406ef0 ("test2-597bdc85dc-k2xsm_default(67e18616-850d-11e8-962c-0ac29e406ef0)"), skipping: failed to "KillPodSandbox" for "67e18616-850d-11e8-962c-0ac29e406ef0" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"test2-597bdc85dc-k2xsm_default\" network: context deadline exceeded"
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.007944 23325 cni.go:280] Error deleting network: context deadline exceeded
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.008783 23325 remote_runtime.go:115] StopPodSandbox "4b14d68c7bc892594dedd1f62d92414574a3fb00873a805b62707c7a63bfdfe7" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-qmc85_default" network: context deadline exceeded
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.008819 23325 kuberuntime_gc.go:153] Failed to stop sandbox "4b14d68c7bc892594dedd1f62d92414574a3fb00873a805b62707c7a63bfdfe7" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-qmc85_default" network: context deadline exceeded
Jul 11 13:25:19 ip-172-31-20-212 kubelet: W0711 13:25:19.145386 23325 cni.go:243] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326"
I tried to install the Calico network on the worker nodes as well with the command below, but had no luck and got this error:
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp connect: connection refused
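Every node does need the Calico node agent running, but the manifest deploys it as a DaemonSet, so it only has to be applied once, from a machine where kubectl can reach the API server (normally the master), not on each worker.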
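The localhost:8080 in the errors above is the address kubectl falls back to when it finds no kubeconfig, which is typical on a worker node. A minimal sketch of a check for that, assuming only the two most common lookup locations (the KUBECONFIG variable and ~/.kube/config). kubeconfig_in_use is a hypothetical helper, not part of kubectl, and it ignores the --kubeconfig flag and multi-file KUBECONFIG lists:

```shell
# Hypothetical helper: print the kubeconfig that kubectl would most likely pick up.
# Simplified: only checks $KUBECONFIG and ~/.kube/config.
kubeconfig_in_use() {
  if [ -n "${KUBECONFIG:-}" ]; then
    printf '%s\n' "$KUBECONFIG"
  elif [ -f "$HOME/.kube/config" ]; then
    printf '%s\n' "$HOME/.kube/config"
  else
    # No config found: this is the case that produces the errors above.
    printf '%s\n' "none: kubectl will try http://localhost:8080"
  fi
}

kubeconfig_in_use
```

If this prints "none" on the worker, run the kubectl apply from the master instead. On a kubeadm master the admin kubeconfig is normally at /etc/kubernetes/admin.conf.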

Kubernetes Authentication issue

I started looking at different ways of using authentication on Kubernetes. Of course, I started with the simplest option, a static password file. Basically, I created a file named users.csv with the following content:
When I start minikube using this file, it hangs at the "Starting cluster components" step. The command I use is:
minikube --extra-config=apiserver.Authentication.PasswordFile.BasicAuthFile=~/temp/users.csv start
After a while (~ 10 minutes), the minikube start command fails with the following error message:
E0523 10:23:57.391692 30932 util.go:151] Error uploading error message: : Post https://clouderrorreporting.googleapis.com/v1beta1/projects/k8s-minikube/events:report?key=AIzaSyACUwzG0dEPcl-eOgpDKnyKoUFgHdfoFuA: x509: certificate signed by unknown authority
I can see that there are several errors in the log (minikube logs):
May 23 09:47:32 minikube kubelet[3301]: E0523 09:47:32.473157 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get dial tcp getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.414460 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get dial tcp getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.470604 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get dial tcp getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.474548 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get dial tcp getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: I0523 09:47:34.086654 3301 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
May 23 09:47:34 minikube kubelet[3301]: I0523 09:47:34.090697 3301 kubelet_node_status.go:82] Attempting to register node minikube
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.091108 3301 kubelet_node_status.go:106] Unable to register node "minikube" with API server: Post dial tcp getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.370484 3301 event.go:209] Unable to write event: 'Patch dial tcp getsockopt: connection refused' (may retry after sleeping)
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.419833 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get dial tcp getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.472826 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get dial tcp getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.479619 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get dial tcp getsockopt: connection refused
I also logged into the minikube VM (minikube ssh) and noticed that the apiserver docker container is down. Looking at the logs of this container, I see the following error:
error: unknown flag: --Authentication.PasswordFile.BasicAuthFile
Therefore, I changed my command to something like:
minikube start --extra-config=apiserver.basic-auth-file=~/temp/users.csv
It failed again, but now the container shows a different error. The error is no longer about an invalid flag. Instead, it complains that the file is not found (no such file or directory). I also tried to specify a file on the minikube VM (/var/lib/localkube), but I had the same issue.
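One thing worth ruling out with the "no such file or directory" error is the ~ in the flag value. This is only a sketch of the shell behaviour, not a confirmed cause of the failure above: bash and sh expand a tilde only at the start of a word (or in a variable assignment), so in the middle of an --extra-config=... argument it is passed through literally, while $HOME is expanded anywhere. The variable names unexpanded and expanded are illustrative only.

```shell
# The tilde sits in the middle of the word, so the shell leaves it as a literal "~".
unexpanded=$(printf '%s' --extra-config=apiserver.basic-auth-file=~/temp/users.csv)

# $HOME is a parameter expansion and is replaced wherever it appears.
expanded=$(printf '%s' --extra-config=apiserver.basic-auth-file="$HOME/temp/users.csv")

printf '%s\n' "$unexpanded" "$expanded"
```

Even with an absolute path, the file still has to exist at that path inside the container that runs the apiserver, since a path on the host is not visible there by default.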
The minikube version is:
minikube version: v0.26.0
When I start minikube without considering the authentication, it works fine. Are there any other steps that I need to do?
You will need to mount the file into the docker container that runs the apiserver. Please see a hack that worked: https://github.com/kubernetes/minikube/issues/1898#issuecomment-402714802