Network service in Kubernetes worker nodes

I set up a three-server Kubernetes cluster by following https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
I deployed the Calico network add-on from the master node. My question: should I install Calico on the worker nodes as well?
I am getting the error below on a worker node when I create a pod:
Jul 11 13:25:05 ip-172-31-20-212 kubelet: I0711 13:25:05.144142 23325 kuberuntime_manager.go:767] Back-off 5m0s restarting failed container=calico-node pod=calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)
Jul 11 13:25:05 ip-172-31-20-212 kubelet: E0711 13:25:05.144169 23325 pod_workers.go:186] Error syncing pod e17770a3-8507-11e8-962c-0ac29e406ef0 ("calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)"), skipping: failed to "StartContainer" for "calico-node" with CrashLoopBackOff: "Back-off 5m0s restarting failed container=calico-node pod=calico-node-ngwhq_kube-system(e17770a3-8507-11e8-962c-0ac29e406ef0)"
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.221953 23325 cni.go:280] Error deleting network: context deadline exceeded
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222595 23325 remote_runtime.go:115] StopPodSandbox "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-k2xsm_default" network: context deadline exceeded
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222630 23325 kuberuntime_manager.go:799] Failed to stop sandbox {"docker" "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326"}
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222664 23325 kuberuntime_manager.go:594] killPodWithSyncResult failed: failed to "KillPodSandbox" for "67e18616-850d-11e8-962c-0ac29e406ef0" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"test2-597bdc85dc-k2xsm_default\" network: context deadline exceeded"
Jul 11 13:25:07 ip-172-31-20-212 kubelet: E0711 13:25:07.222685 23325 pod_workers.go:186] Error syncing pod 67e18616-850d-11e8-962c-0ac29e406ef0 ("test2-597bdc85dc-k2xsm_default(67e18616-850d-11e8-962c-0ac29e406ef0)"), skipping: failed to "KillPodSandbox" for "67e18616-850d-11e8-962c-0ac29e406ef0" with KillPodSandboxError: "rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod \"test2-597bdc85dc-k2xsm_default\" network: context deadline exceeded"
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.007944 23325 cni.go:280] Error deleting network: context deadline exceeded
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.008783 23325 remote_runtime.go:115] StopPodSandbox "4b14d68c7bc892594dedd1f62d92414574a3fb00873a805b62707c7a63bfdfe7" from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-qmc85_default" network: context deadline exceeded
Jul 11 13:25:12 ip-172-31-20-212 kubelet: E0711 13:25:12.008819 23325 kuberuntime_gc.go:153] Failed to stop sandbox "4b14d68c7bc892594dedd1f62d92414574a3fb00873a805b62707c7a63bfdfe7" before removing: rpc error: code = Unknown desc = NetworkPlugin cni failed to teardown pod "test2-597bdc85dc-qmc85_default" network: context deadline exceeded
Jul 11 13:25:19 ip-172-31-20-212 kubelet: W0711 13:25:19.145386 23325 cni.go:243] CNI failed to retrieve network namespace path: cannot find network namespace for the terminated container "22fe8b5db360011aa79afadfe91a46bfef0322092478d378ef657d3babfc1326"
I tried to install the Calico network on the worker nodes as well with the command below, but no luck; I get this error:
kubectl apply -f https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml
unable to recognize "https://docs.projectcalico.org/v3.1/getting-started/kubernetes/installation/hosted/kubeadm/1.7/calico.yaml": Get http://localhost:8080/api?timeout=32s: dial tcp 127.0.0.1:8080: connect: connection refused
(the same error is repeated for each object in the manifest)

Calico does need to run on every node, but you don't install it separately on each worker: the calico.yaml manifest creates a DaemonSet, so applying it once from the master schedules a calico-node pod on every node, including workers that join later. The "connection refused" on localhost:8080 means kubectl on the worker has no kubeconfig and is falling back to the insecure default endpoint; run kubectl from the master instead (or copy /etc/kubernetes/admin.conf to the worker and point KUBECONFIG at it). The CrashLoopBackOff of the calico-node pod on the worker is a separate problem; read that pod's container logs rather than re-applying the manifest.
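The failing kubectl apply on the worker is a kubectl configuration problem, not a Calico one. A minimal sketch of how kubectl picks its endpoint (illustrative only, not the full client-go resolution order; the path is the conventional default):

```shell
# With no kubeconfig, kubectl falls back to the insecure default
# http://localhost:8080 -- nothing listens there, hence "connection refused".
KUBECONFIG="${KUBECONFIG:-$HOME/.kube/config}"
if [ -f "$KUBECONFIG" ]; then
  grep 'server:' "$KUBECONFIG"   # the API endpoint kubectl will actually use
else
  echo "no kubeconfig at $KUBECONFIG -> kubectl defaults to localhost:8080"
fi
```

Once kubectl works (from the master, or after copying /etc/kubernetes/admin.conf into ~/.kube/config on the worker), kubectl get pods -n kube-system -o wide should show one calico-node pod per node, and kubectl -n kube-system logs calico-node-ngwhq -c calico-node will show why the worker's copy is crash-looping.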

Related

Error: getsockopt: connection refused - Kubernetes apiserver

I am facing an unexpected issue today. Earlier today I noticed that "kubectl get pods" was returning "Unable to connect to the server: EOF". On further investigation I found that the Kubernetes apiserver is unable to connect to 127.0.0.1:443. I have been unable to resolve this problem; any assistance would be highly appreciated. Below are the logs I found.
Nov 20 18:13:30 ip-172-31-152-166.us-west-2.compute.internal kubelet[6398]: E1120 18:13:30.362106 6398 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:481: Failed to list *v1.Node: Get https://127.0.0.1/api/v1/nodes?fieldSelector=metadata.name%3Dip-172-31-152-166.us-west-2.compute.internal&limit=500&resourceVersion=0: dial tcp 127.0.0.1:443: getsockopt: connection refused
Nov 20 18:13:30 ip-172-31-152-166.us-west-2.compute.internal kubelet[6398]: E1120 18:13:30.362928 6398 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://127.0.0.1/api/v1/pods?fieldSelector=spec.nodeName%3Dip-172-31-152-166.us-west-2.compute.internal&limit=500&resourceVersion=0: dial tcp 127.0.0.1:443: getsockopt: connection refused
Nov 20 18:13:30 ip-172-31-152-166.us-west-2.compute.internal kubelet[6398]: E1120 18:13:30.363719 6398 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:472: Failed to list *v1.Service: Get https://127.0.0.1/api/v1/services?limit=500&resourceVersion=0: dial tcp 127.0.0.1:443: getsockopt: connection refused

kube-proxy failing to start "error: unrecognized key:"

I just upgraded my 1.10.0 Kubernetes cluster to 1.10.12.
I also updated a node or two to the same version.
However, I now see that:
kube-proxy-r5ts5 0/1 CrashLoopBackOff 5 3m 134.79.129.110 gpu03
Showing the logs gives:
# kubectl -n kube-system logs -f kube-proxy-r5ts5
error: unrecognized key:
Help? I do not know how to troubleshoot this further.
Coincidentally, I added a new node at the same time, and I see that Weave also has problems starting:
# kubectl -n kube-system logs -f weave-net-mb299 weave
FATA: 2018/12/20 01:43:35.703088 [kube-peers] Could not get peers: Get https://10.96.0.1:443/api/v1/nodes: dial tcp 10.96.0.1:443: i/o timeout
Failed to get peers
# kubectl -n kube-system logs -f weave-net-mb299 weave-npc
ERROR: logging before flag.Parse: E1220 01:44:02.447197 28249 reflector.go:205] github.com/weaveworks/weave/prog/weave-npc/main.go:230: Failed to list *v1.NetworkPolicy: Get https://10.96.0.1:443/apis/networking.k8s.io/v1/networkpolicies?resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I guess this is because kube-proxy isn't up.
# kubectl -n kube-system describe pods kube-proxy-r5ts5
Name: kube-proxy-r5ts5
Namespace: kube-system
Node: gpu02/134.79.129.96
Start Time: Thu, 20 Dec 2018 02:01:10 +0000
Labels: controller-revision-hash=3231443654
k8s-app=kube-proxy
pod-template-generation=4
Annotations: <none>
Status: Running
IP: 134.79.129.96
Controlled By: DaemonSet/kube-proxy
Containers:
kube-proxy:
Container ID: docker://1bcfca6db8f68d7130de86947343a24f9fc23b506ea295509933473f3d830845
Image: gcr.io/google_containers/kube-proxy-amd64:v1.10.12
Image ID: docker-pullable://gcr.io/google_containers/kube-proxy-amd64@sha256:a9ed73c3526033cd3cf732b4a84de9d211f425ef08cce4f0535617cadf0f4200
Port: <none>
Host Port: <none>
Command:
/usr/local/bin/kube-proxy
--config=/var/lib/kube-proxy/config.conf
State: Waiting
Reason: CrashLoopBackOff
Last State: Terminated
Reason: Error
Exit Code: 1
Started: Thu, 20 Dec 2018 02:04:00 +0000
Finished: Thu, 20 Dec 2018 02:04:00 +0000
Ready: False
Restart Count: 5
Environment: <none>
Mounts:
/lib/modules from lib-modules (ro)
/run/xtables.lock from xtables-lock (rw)
/var/lib/kube-proxy from kube-proxy (rw)
/var/run/secrets/kubernetes.io/serviceaccount from kube-proxy-token-m4hvr (ro)
Conditions:
Type Status
Initialized True
Ready False
PodScheduled True
Volumes:
kube-proxy:
Type: ConfigMap (a volume populated by a ConfigMap)
Name: kube-proxy
Optional: false
xtables-lock:
Type: HostPath (bare host directory volume)
Path: /run/xtables.lock
HostPathType: FileOrCreate
lib-modules:
Type: HostPath (bare host directory volume)
Path: /lib/modules
HostPathType:
kube-proxy-token-m4hvr:
Type: Secret (a volume populated by a Secret)
SecretName: kube-proxy-token-m4hvr
Optional: false
QoS Class: BestEffort
Node-Selectors: <none>
Tolerations: node-role.kubernetes.io/master:NoSchedule
node.cloudprovider.kubernetes.io/uninitialized=true:NoSchedule
node.kubernetes.io/disk-pressure:NoSchedule
node.kubernetes.io/memory-pressure:NoSchedule
node.kubernetes.io/not-ready:NoExecute
node.kubernetes.io/unreachable:NoExecute
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "xtables-lock"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "lib-modules"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "kube-proxy"
Normal SuccessfulMountVolume 3m kubelet, gpu02 MountVolume.SetUp succeeded for volume "kube-proxy-token-m4hvr"
Normal Started 2m (x4 over 3m) kubelet, gpu02 Started container
Warning BackOff 2m (x7 over 3m) kubelet, gpu02 Back-off restarting failed container
Normal Pulled 2m (x5 over 3m) kubelet, gpu02 Container image "gcr.io/google_containers/kube-proxy-amd64:v1.10.12" already present on machine
Normal Created 2m (x5 over 3m) kubelet, gpu02 Created container
Probably not related, but I did have problems with cri-tools, and kubeadm join saying that it couldn't find dockershim.sock; an rpm -e --nodeps cri-tools appeared to fix the join. I'm pretty sure the Docker subsystem is working, as I can see other Kubernetes pods on the machine (e.g. k8s_POD_weave-net-mb299_kube-system, k8s_weave-npc_weave-net-mb299_kube-system).
A snapshot of the logs from one of the minions:
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.459850 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637709 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637826 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637852 10526 kuberuntime_manager.go:646] createPodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.637947 10526 pod_workers.go:186] Error syncing pod bd2287cb-0475-11e9-90de-fa163e21c438 ("hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"hub-85c95bbd57-bx4sr_jupyter-prod\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.661793 10526 container.go:507] Failed to update stats for container "/libcontainer_14802_systemd_test_default.slice": read /sys/fs/cgroup/cpu,cpuacct/libcontainer_14802_systemd_test_default.slice/cpuacct.usage: no such device, continuing to push stats
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745423 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745492 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745526 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:19 gpu01 kubelet[10526]: E1220 08:41:19.745640 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.858313 10526 pod_container_deletor.go:77] Container "e7ba3feb145f2004ac730c96eb6e1f7c91ad30d70515984de37d325b98abb616" not found in pod's containers
Dec 20 08:41:19 gpu01 kubelet[10526]: W1220 08:41:19.934213 10526 pod_container_deletor.go:77] Container "de1f07ee792f8d2e666efffdf756774ebab0558e279e6f0e8375d520ca7cb63e" not found in pod's containers
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696842 10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.696892 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697306 10526 container.go:393] Failed to create summary reader for "/libcontainer_14936_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: W1220 08:41:20.697520 10526 container.go:393] Failed to create summary reader for "/libcontainer_14941_systemd_test_default.slice": none of the resources are being tracked.
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708833 10526 cni.go:259] Error adding network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.708860 10526 cni.go:227] Error while adding to cni network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.860952 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861039 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861067 10526 kuberuntime_manager.go:646] createPodSandbox for pod "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "nvidia-device-plugin-daemonset-ljmv9_kube-system" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.861167 10526 pod_workers.go:186] Error syncing pod ad93d43c-f986-11e8-a0db-fa163e21c438 ("nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)"), skipping: failed to "CreatePodSandbox" for "nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)" with CreatePodSandboxError: "CreatePodSandbox for pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system(ad93d43c-f986-11e8-a0db-fa163e21c438)\" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod \"nvidia-device-plugin-daemonset-ljmv9_kube-system\" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/264521a208ca5f0a3081b5b40e6f0176624c44ee40d0b02e31e4f148194faa78: dial tcp 127.0.0.1:6784: getsockopt: connection refused"
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954796 10526 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
Dec 20 08:41:20 gpu01 kubelet[10526]: E1220 08:41:20.954851 10526 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "hub-85c95bbd57-bx4sr_jupyter-prod(bd2287cb-0475-11e9-90de-fa163e21c438)" failed: rpc error: code = Unknown desc = NetworkPlugin cni failed to set up pod "hub-85c95bbd57-bx4sr_jupyter-prod" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/296ffa649c2fdb61d7b0e10aa9e0051fbcb2931a0f12dc471820a0b58ad4fc4a: dial tcp 127.0.0.1:6784: getsockopt: connection refused
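For the "error: unrecognized key:" crash itself, a commonly reported cause after a 1.10.x kubeadm upgrade is the kube-proxy ConfigMap carrying featureGates: "" (an empty string where the newer kube-proxy expects a map), which it rejects at startup. A sketch of the cleanup against a local copy of the config; the sample content below is an assumption based on that report, not taken from this cluster:

```shell
# Sample of an offending kube-proxy config.conf: featureGates as an empty
# string instead of a map is what the newer kube-proxy chokes on.
cat > /tmp/config.conf <<'EOF'
apiVersion: kubeproxy.config.k8s.io/v1alpha1
kind: KubeProxyConfiguration
featureGates: ""
mode: iptables
EOF
# Drop the empty key (rewriting it as `featureGates: {}` also works).
sed -i '/featureGates: ""/d' /tmp/config.conf
```

On the cluster, the equivalent edit is kubectl -n kube-system edit configmap kube-proxy, then kubectl -n kube-system delete pod -l k8s-app=kube-proxy so the DaemonSet recreates the pods with the fixed config. Once kube-proxy is healthy, Weave should be able to reach 10.96.0.1:443 again, which would clear the downstream CNI "unable to allocate IP address" errors.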

Why does the Kubernetes v1.13 API randomly go down?

cat /etc/redhat-release
CentOS Linux release 7.6.1810 (Core)
Clean install of Kubernetes using kubeadm init, following the steps directly from the docs. Tried with Flannel, Weave Net, and Calico.
After about 5-10 minutes of watch kubectl get nodes, I'll get these messages at random, leaving me with an inaccessible cluster that I can't apply any .yml files to.
Unable to connect to the server: net/http: TLS handshake timeout
The connection to the server 66.70.180.162:6443 was refused - did you specify the right host or port?
Unable to connect to the server: http2: server sent GOAWAY and closed the connection; LastStreamID=1, ErrCode=NO_ERROR, debug=""
The kubelet is fine, aside from showing that it can't fetch various resources from 66.70.180.162 (the master node):
[root@play ~]# systemctl status kubelet
● kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/etc/systemd/system/kubelet.service; enabled; vendor preset: disabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Mon 2018-12-10 13:57:17 EST; 21min ago
Docs: https://kubernetes.io/docs/
Main PID: 3411939 (kubelet)
CGroup: /system.slice/kubelet.service
└─3411939 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --config=/var/lib/kubelet/config.yaml --cgroup-driver=systemd --network-...
Dec 10 14:18:53 play kubelet[3411939]: E1210 14:18:53.811213 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/c...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.011239 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://66.70.180.162:6443/api/v1/...nection refused
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.211160 3411939 reflector.go:134] object-"kube-system"/"kube-proxy-token-n5qjm": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.411190 3411939 reflector.go:134] object-"kube-system"/"coredns-token-7qjzv": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube-sy...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.611103 3411939 reflector.go:134] object-"kube-system"/"coredns": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/conf...
Dec 10 14:18:54 play kubelet[3411939]: E1210 14:18:54.811105 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:453: Failed to list *v1.Node: Get https://66.70.180.162:6443/api/v1/nod...nection refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.011204 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://66.70.180.162:6443/api...nection refused
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.211132 3411939 reflector.go:134] object-"kube-system"/"weave-net-token-5zb86": Failed to list *v1.Secret: Get https://66.70.180.162:6443/api/v1/namespaces/kube-...
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.411281 3411939 reflector.go:134] object-"kube-system"/"kube-proxy": Failed to list *v1.ConfigMap: Get https://66.70.180.162:6443/api/v1/namespaces/kube-system/c...
Dec 10 14:18:55 play kubelet[3411939]: E1210 14:18:55.611125 3411939 reflector.go:134] k8s.io/kubernetes/pkg/kubelet/kubelet.go:444: Failed to list *v1.Service: Get https://66.70.180.162:6443/api/v1/...nection refused
Hint: Some lines were ellipsized, use -l to show in full.
The Docker container that runs CoreDNS shows issues getting resources from what looks like anything in the k8s default Service subnet CIDR range (this is a VPS on a separate hosting provider using a local IP here):
.:53
2018-12-10T10:34:52.589Z [INFO] CoreDNS-1.2.6
2018-12-10T10:34:52.589Z [INFO] linux/amd64, go1.11.2, 756749c
CoreDNS-1.2.6
linux/amd64, go1.11.2, 756749c
[INFO] plugin/reload: Running configuration MD5 = f65c4821c8a9b7b5eb30fa4fbc167769
...
E1210 10:55:53.286644 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
E1210 10:55:53.290019 1 reflector.go:205] github.com/coredns/coredns/plugin/kubernetes/controller.go:318: Failed to list *v1.Namespace: Get https://10.96.0.1:443/api/v1/namespaces?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: connect: connection refused
The kube-apiserver is showing random failures, and it looks like it's mixing in IPv6:
I1210 19:23:09.067462 1 trace.go:76] Trace[1029933921]: "Get /api/v1/nodes/play" (started: 2018-12-10 19:23:00.256692931 +0000 UTC m=+188.530973072) (total time: 8.810746081s):
Trace[1029933921]: [8.810746081s] [8.810715241s] END
E1210 19:23:09.068687 1 available_controller.go:316] v2beta1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.069678 1 available_controller.go:316] v1. failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1./status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.073019 1 available_controller.go:316] v1beta1.apiextensions.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apiextensions.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.074112 1 available_controller.go:316] v1beta1.batch failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.batch/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.075151 1 available_controller.go:316] v2beta2.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v2beta2.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.077408 1 available_controller.go:316] v1.authorization.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authorization.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.078457 1 available_controller.go:316] v1.networking.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.networking.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.079449 1 available_controller.go:316] v1beta1.coordination.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.coordination.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.080558 1 available_controller.go:316] v1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.081628 1 available_controller.go:316] v1beta1.scheduling.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.scheduling.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.082803 1 available_controller.go:316] v1.autoscaling failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.autoscaling/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.083845 1 available_controller.go:316] v1beta1.events.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.events.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.084882 1 available_controller.go:316] v1beta1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.085985 1 available_controller.go:316] v1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.087019 1 available_controller.go:316] v1beta1.apps failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.apps/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.088113 1 available_controller.go:316] v1beta1.certificates.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.certificates.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.089164 1 available_controller.go:316] v1.storage.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1.storage.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
E1210 19:23:09.090268 1 available_controller.go:316] v1beta1.authentication.k8s.io failed with: Put https://[::1]:6443/apis/apiregistration.k8s.io/v1/apiservices/v1beta1.authentication.k8s.io/status: dial tcp [::1]:6443: connect: connection refused
W1210 19:23:28.996746 1 controller.go:181] StopReconciling() timed out
And I'm out of troubleshooting steps.
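None of this is certain from the logs alone, but when a kubeadm apiserver answers for a few minutes and then drops connections (TLS handshake timeout, GOAWAY, connection refused on [::1]:6443), a usual first step is to check whether the control-plane static pods are being killed and restarted by the kubelet. A hedged diagnostic sketch (assumes the Docker runtime and CentOS 7 log paths):

```shell
# Control-plane components on a kubeadm master are static pods run by the
# kubelet; if kube-apiserver or etcd keeps dying, the kubelet restarts it
# and the API flaps on and off exactly like this.
if command -v docker >/dev/null 2>&1; then
  docker ps -a --filter name=k8s_kube-apiserver || true  # fresh container IDs = crash loop
  docker ps -a --filter name=k8s_etcd || true
fi
# Two common culprits on a fresh CentOS 7 install: swap left enabled, and
# memory pressure OOM-killing etcd or the apiserver.
swapon --show 2>/dev/null | wc -l    # should print 0 lines: kubelet wants swap off
grep -i 'out of memory' /var/log/messages 2>/dev/null | tail -5 || true
```

If the apiserver container is being recreated, docker logs on the most recent dead container usually shows whether it exited on its own (e.g. etcd unreachable) or was killed from outside (OOM).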

Kubernetes Authentication issue

I started looking at different ways of using authentication on Kubernetes. Of course, I started with the simplest option: a static password file. Basically, I created a file named users.csv with the following content:
mauro,maurosil,maurosil123,group_mauro
When I start minikube using this file, it hangs while starting the cluster components. The command I use is:
minikube --extra-config=apiserver.Authentication.PasswordFile.BasicAuthFile=~/temp/users.csv start
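As an aside, the static password file that kube-apiserver's basic-auth support expects puts the password first, then the user name, then a uid, with an optional quoted group list. A quick local sanity check of the field order (the entry below is a hypothetical rewrite of the users.csv above into that order):

```shell
# Format expected by kube-apiserver's --basic-auth-file:
#   password,user,uid,"group1,group2"
# (groups are optional; quote them if there is more than one)
cat > users.csv <<'EOF'
maurosil123,mauro,1000,"group_mauro"
EOF

# Verify the field order: column 2 is the user name, column 3 the uid.
awk -F, '{print "user=" $2 " uid=" $3}' users.csv
# -> user=mauro uid=1000
```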
After a while (~ 10 minutes), the minikube start command fails with the following error message:
E0523 10:23:57.391692 30932 util.go:151] Error uploading error message: : Post https://clouderrorreporting.googleapis.com/v1beta1/projects/k8s-minikube/events:report?key=AIzaSyACUwzG0dEPcl-eOgpDKnyKoUFgHdfoFuA: x509: certificate signed by unknown authority
I can see that there are several errors on the log (minikube logs):
May 23 09:47:32 minikube kubelet[3301]: E0523 09:47:32.473157 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.99.100:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.414460 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://192.168.99.100:8443/api/v1/nodes?fieldSelector=metadata.name%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.470604 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://192.168.99.100:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:33 minikube kubelet[3301]: E0523 09:47:33.474548 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.99.100:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: I0523 09:47:34.086654 3301 kubelet_node_status.go:271] Setting node annotation to enable volume controller attach/detach
May 23 09:47:34 minikube kubelet[3301]: I0523 09:47:34.090697 3301 kubelet_node_status.go:82] Attempting to register node minikube
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.091108 3301 kubelet_node_status.go:106] Unable to register node "minikube" with API server: Post https://192.168.99.100:8443/api/v1/nodes: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.370484 3301 event.go:209] Unable to write event: 'Patch https://192.168.99.100:8443/api/v1/namespaces/default/events/minikube.15313c5b8cf5913c: dial tcp 192.168.99.100:8443: getsockopt: connection refused' (may retry after sleeping)
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.419833 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:460: Failed to list *v1.Node: Get https://192.168.99.100:8443/api/v1/nodes?fieldSelector=metadata.name%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.472826 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/kubelet.go:451: Failed to list *v1.Service: Get https://192.168.99.100:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
May 23 09:47:34 minikube kubelet[3301]: E0523 09:47:34.479619 3301 reflector.go:205] k8s.io/kubernetes/pkg/kubelet/config/apiserver.go:47: Failed to list *v1.Pod: Get https://192.168.99.100:8443/api/v1/pods?fieldSelector=spec.nodeName%3Dminikube&limit=500&resourceVersion=0: dial tcp 192.168.99.100:8443: getsockopt: connection refused
I also logged in to the minikube VM (minikube ssh) and noticed that the apiserver docker container is down. Looking at the logs of this container, I see the following error:
error: unknown flag: --Authentication.PasswordFile.BasicAuthFile
Therefore, I changed my command to something like:
minikube start --extra-config=apiserver.basic-auth-file=~/temp/users.csv
It failed again, but now the container shows a different error. It is no longer the invalid flag; instead, it complains that the file was not found (no such file or directory). I also tried specifying a file on the minikube VM (/var/lib/localkube) but hit the same issue.
The minikube version is:
minikube version: v0.26.0
When I start minikube without the authentication flag, it works fine. Are there any other steps that I need to do?
Mauro
You will need to mount the file into the docker container that runs the apiserver. Please see a hack that worked: https://github.com/kubernetes/minikube/issues/1898#issuecomment-402714802
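One way to get the file where the VM can see it, sketched under the assumption (true of many minikube releases) that anything placed under ~/.minikube/files on the host is copied into the VM at the same relative path on the next start:

```shell
# Stage the file where minikube will sync it into the VM on start:
mkdir -p ~/.minikube/files/etc/kubernetes
cp ~/temp/users.csv ~/.minikube/files/etc/kubernetes/users.csv

# Then point the flag at the in-VM path, not the host path
# (whether the apiserver container can read this path depends on its mounts):
minikube start --extra-config=apiserver.basic-auth-file=/etc/kubernetes/users.csv
```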

What is causing syslog to log lots of "Dial failed: dial tcp x.x.x.x:port: connection refused" messages?

All of the nodes in our AWS kubernetes cluster (Server Version: version.Info{Major:"1", Minor:"0", GitVersion:"v1.0.6", GitCommit:"388061f00f0d9e4d641f9ed4971c775e1654579d", GitTreeState:"clean"}) are getting the following messages written to /var/log/syslog, which fill the disk very quickly (32GB in about 24 hours).
Dec 4 03:13:36 ubuntu kube-proxy[15171]: I1204 03:13:36.961584 15171 proxysocket.go:130] Accepted TCP connection from 172.30.0.164:58063 to 172.30.0.39:33570
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.961775 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.0.7:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.961888 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.2.9:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962104 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.0.7:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962275 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.2.9:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962299 15171 proxysocket.go:133] Failed to connect to balancer: failed to connect to an endpoint.
Dec 4 03:13:36 ubuntu kube-proxy[15171]: I1204 03:13:36.962380 15171 proxysocket.go:130] Accepted TCP connection from 172.30.0.87:29540 to 172.30.0.39:33570
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962630 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.0.7:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962746 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.2.9:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.962958 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.0.7:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.963084 15171 proxysocket.go:99] Dial failed: dial tcp 10.244.2.9:5000: connection refused
Dec 4 03:13:36 ubuntu kube-proxy[15171]: E1204 03:13:36.963105 15171 proxysocket.go:133] Failed to connect to balancer: failed to connect to an endpoint.
We created the cluster using export KUBERNETES_PROVIDER=aws; curl -sS https://get.k8s.io | bash, if that is relevant.
Can anyone point me in the right direction as to the cause?
Port 5000 is usually used by the local Docker registry.
It is an add-on, though.
Is your cluster pulling images from that local registry? If so, is it working? How is it set up?
This link may help you figure out your config issues:
https://github.com/kubernetes/kubernetes/tree/master/cluster/addons/registry
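To narrow this down, it can help to check whether those 10.244.x.x:5000 backends still correspond to running registry pods. A sketch, assuming the add-on runs in the kube-system namespace with "registry" in its resource names:

```shell
# Look for the registry service, its endpoints, and the backing pods.
kubectl get services --namespace=kube-system | grep -i registry
kubectl get endpoints --namespace=kube-system | grep -i registry
kubectl get pods --namespace=kube-system -o wide | grep -i registry

# If the endpoint IPs (e.g. 10.244.0.7, 10.244.2.9) have no Running pod
# behind them, kube-proxy will keep logging those dial failures.
```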