Failed to create pod sandbox [flannel] - kubernetes

I am running into this error on random pods. Thank you #matthew-l-daniel for the comment, as I didn't know where to start.
Here are the contents of /opt/cni/bin on the node:
:/opt/cni/bin$ ls
bridge host-local loopback
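The listing shows bridge, host-local and loopback, but no flannel binary. Whether that is the cause here is not confirmed by the logs. To rule it out, a minimal sketch such as the following could list which expected plugins are absent from the directory. The function name missing_cni_plugins and the plugin list passed to it are assumptions for illustration, not something flannel itself provides.

```shell
# Print every expected CNI plugin that is not an executable file in DIR.
# Usage: missing_cni_plugins DIR PLUGIN...
# (missing_cni_plugins is a hypothetical helper, not part of any CNI tooling.)
missing_cni_plugins() {
    dir=$1
    shift
    for plugin in "$@"; do
        # A plugin counts as present only if it is a regular, executable file.
        if [ ! -f "$dir/$plugin" ] || [ ! -x "$dir/$plugin" ]; then
            echo "$plugin"
        fi
    done
}

# On a node this could be run as (the plugin list is an assumption):
#   missing_cni_plugins /opt/cni/bin bridge host-local loopback flannel
```

An empty result means every plugin named on the command line is present. The names to check should come from the "type" fields in the CNI config under /etc/cni/net.d.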
Here are the kubelet logs for a container that failed.
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924370 32233 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924380 32233 kuberuntime_manager.go:647] createPodSandbox for pod "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "postgres-core-0": Error response from daemon: grpc: the connection is unavailable
Jan 30 15:42:00 ip-172-20-39-216 kubelet[32233]: E0130 15:42:00.924427 32233 pod_workers.go:186] Error syncing pod d8acae2f-24a2-11e9-b79c-0a0d1213cce2 ("postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)"), skipping: failed to "CreatePodSandbox" for "postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"postgres-core-0_service-master-459cf23(d8acae2f-24a2-11e9-b79c-0a0d1213cce2)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"postgres-core-0\": Error response from daemon: grpc: the connection is unavailable"
As for the flannel container logs: there are many flannel pods running, and all are healthy.
Kubernetes v1.10.11
Docker version 17.03.2-ce, build f5ec1e2
Flannel logs
E0130 15:34:16.536354 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 15:34:16.536411 1 vxlan_network.go:191] failed to delete vxlanRoute (100.107.178.0/24 -> 100.107.178.0): no such process
E0130 17:33:44.848163 1 vxlan_network.go:187] DelFDB failed: no such file or directory
E0130 17:33:44.848219 1 vxlan_network.go:191] failed to delete vxlanRoute (100.107.201.0/24 -> 100.107.201.0): no such process

Related

Kubernetes 1.17 containerd 1.2.0 with Calico CNI node not joining to master

I am setting up a Kubernetes cluster on CentOS 8 with containerd and Calico as the CNI. I set up the master node with kubeadm, and it is in Ready status.
When I join the node to the master, the node does not reach Ready status. I see the messages below in the log file.
Jan 14 20:17:29 node02 containerd[1417]: time="2020-01-14T20:17:29.416373526-05:00" level=error msg="RunPodSandbox for &PodSandboxMetadata{Name:calico-node-fbst8,Uid:9c7f6334-d106-48e1-af12-1bcdebc7c2c2,Namespace:kube-system,Attempt:0,} failed, error" error="failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"Invalid unit name 'pod9c7f6334-d106-48e1-af12-1bcdebc7c2c2'\"": unknown"
Jan 14 20:17:29 node02 kubelet[30113]: E0114 20:17:29.416668 30113 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"Invalid unit name 'pod9c7f6334-d106-48e1-af12-1bcdebc7c2c2'\"": unknown
Jan 14 20:17:29 node02 kubelet[30113]: E0114 20:17:29.416742 30113 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "calico-node-fbst8_kube-system(9c7f6334-d106-48e1-af12-1bcdebc7c2c2)" failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"Invalid unit name 'pod9c7f6334-d106-48e1-af12-1bcdebc7c2c2'\"": unknown
Jan 14 20:17:29 node02 kubelet[30113]: E0114 20:17:29.416761 30113 kuberuntime_manager.go:729] createPodSandbox for pod "calico-node-fbst8_kube-system(9c7f6334-d106-48e1-af12-1bcdebc7c2c2)" failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:348: starting container process caused "process_linux.go:279: applying cgroup configuration for process caused \"Invalid unit name 'pod9c7f6334-d106-48e1-af12-1bcdebc7c2c2'\"": unknown
Jan 14 20:17:29 node02 kubelet[30113]: E0114 20:17:29.416819 30113 pod_workers.go:191] Error syncing pod 9c7f6334-d106-48e1-af12-1bcdebc7c2c2 ("calico-node-fbst8_kube-system(9c7f6334-d106-48e1-af12-1bcdebc7c2c2)"), skipping: failed to "CreatePodSandbox" for "calico-node-fbst8_kube-system(9c7f6334-d106-48e1-af12-1bcdebc7c2c2)" with CreatePodSandboxError: "CreatePodSandbox for pod \"calico-node-fbst8_kube-system(9c7f6334-d106-48e1-af12-1bcdebc7c2c2)\" failed: rpc error: code = Unknown desc = failed to start sandbox container: failed to create containerd task: OCI runtime create failed: container_linux.go:348: starting container process caused \"process_linux.go:279: applying cgroup configuration for process caused \\\"Invalid unit name 'pod9c7f6334-d106-48e1-af12-1bcdebc7c2c2'\\\"\": unknown"
Jan 14 20:17:30 node02 containerd[1417]: time="2020-01-14T20:17:30.541254039-05:00" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Jan 14 20:17:30 node02 kubelet[30113]: E0114 20:17:30.541394 30113 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Jan 14 20:17:35 node02 containerd[1417]: time="2020-01-14T20:17:35.541792325-05:00" level=error msg="Failed to load cni configuration" error="cni config load failed: no network config found in /etc/cni/net.d: cni plugin not initialized: failed to load cni config"
Jan 14 20:17:35 node02 kubelet[30113]: E0114 20:17:35.541929 30113 kubelet.go:2183] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:Network plugin returns error: cni plugin not initialized
Any tips to resolve this error?
Did you pass --pod-network-cidr=192.168.0.0/16 to kubeadm init?
Apparently you need to set it.
https://kubernetes.io/docs/setup/production-environment/tools/kubeadm/create-cluster-kubeadm/#pod-network
Because you are not using Docker, you need to set the cgroup driver explicitly.
To use the systemd cgroup driver, set plugins.cri.systemd_cgroup = true in /etc/containerd/config.toml and then run systemctl restart containerd.
You also have to modify the file kubeadm-flags.env in /var/lib/kubelet and set the cgroup driver:
KUBELET_EXTRA_ARGS=--cgroup-driver=systemd
Make sure the above file is referenced in /etc/systemd/system/kubelet.service.d/10-kubeadm.conf:
EnvironmentFile=-/var/lib/kubelet/kubeadm-flags.env
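The kubelet and the container runtime have to use the same cgroup driver, so it can help to check both files before restarting anything. The following is a rough sketch only. Both function names are hypothetical, and the patterns assume the settings are written the way the answers above show them.

```shell
# Succeeds if the kubelet flags file passes --cgroup-driver=systemd.
# (Hypothetical helper; takes the path to kubeadm-flags.env.)
kubelet_uses_systemd_cgroup() {
    grep -q -- '--cgroup-driver=systemd' "$1"
}

# Succeeds if the containerd config has an uncommented systemd_cgroup = true,
# written either under a [plugins.cri] table or as a dotted key.
# (Hypothetical helper; takes the path to config.toml.)
containerd_uses_systemd_cgroup() {
    grep -Eq '^[[:space:]]*([A-Za-z0-9_]+\.)*systemd_cgroup[[:space:]]*=[[:space:]]*true' "$1"
}

# On a node this could be run as:
#   kubelet_uses_systemd_cgroup /var/lib/kubelet/kubeadm-flags.env && echo "kubelet: systemd"
#   containerd_uses_systemd_cgroup /etc/containerd/config.toml && echo "containerd: systemd"
```

If only one of the two checks succeeds, the drivers disagree. After editing either file, the corresponding service presumably needs a restart (systemctl restart containerd, and likewise for the kubelet) before the change takes effect.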

containerCreating after a drain on K8S

After draining one node of my cluster, I get this kind of error:
Failed create pod sandbox: rpc error: code = Unknown desc = failed to start sandbox container for pod "gitlab-postgresql-stolon-keeper-2": Error response from daemon: all SubConns are in TransientFailure, latest connection error: connection error: desc = "transport: Error while dialing dial unix /run/containerd/containerd.sock: connect: connection refused": unavailable
I'm using K8S 15.4 and
Docker v18.09.9
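The error names the socket that could not be reached, which points at the daemon to inspect on the node. As a small sketch, the path can be pulled out of the message like this. The function name dialed_socket is hypothetical, and the pattern assumes the "dial unix PATH: connect:" wording seen in the error above.

```shell
# Read an error message on stdin and print the Unix socket path that the
# client tried to dial. Prints nothing if the message has no such part.
# (dialed_socket is a hypothetical helper name.)
dialed_socket() {
    sed -n 's/.*dial unix \([^:]*\): connect:.*/\1/p'
}

# Possible follow-up on the node, assuming containerd runs as a systemd unit:
#   systemctl status containerd
#   journalctl -u containerd
```

A "connection refused" on a Unix socket usually means the socket file exists but nothing is listening on it, so the state of the containerd service is the likely next thing to look at.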

Kubectl is not able to get cluster info from minikube

I am new to Kubernetes and trying out the minikube tutorial. I have a Mac and have installed minikube, the kubectl CLI and the hyperkit driver. The Docker daemon is running. I started minikube, passing the proxy variables:
minikube start --vm-driver=hyperkit \
--docker-env HTTP_PROXY=http://my-http-proxy-host:my-http-proxy-port \
--docker-env HTTPS_PROXY=https://my-https-proxy-host:my-https-proxy-port
I set kubectl to use the minikube context.
However, when I run:
kubectl cluster-info
I am getting the below error:
Unable to connect to the server: net/http: TLS handshake timeout
Below is the output of minikube logs | grep error:
Oct 21 16:53:22 minikube localkube[3010]: E1021 16:53:22.531735 3010 proxier.go:1701] Failed to delete stale service IP 10.96.0.10 connections, error: error deleting connection tracking state for UDP service IP: 10.96.0.10, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Oct 21 16:53:26 minikube localkube[3010]: E1021 16:53:26.781082 3010 proxier.go:964] Failed to delete kube-system/kube-dns:dns endpoint connections, error: error deleting conntrack entries for UDP peer {10.96.0.10, 172.17.0.3}, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Oct 21 16:53:37 minikube localkube[3010]: E1021 16:53:37.528164 3010 proxier.go:1701] Failed to delete stale service IP 10.96.0.10 connections, error: error deleting connection tracking state for UDP service IP: 10.96.0.10, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.513057 3010 remote_runtime.go:115] StopPodSandbox "9aad7169cf8c357f512575118efdcb88fb796bcc7642c44b5f3b79f2310720ff" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.513654 3010 remote_runtime.go:115] StopPodSandbox "172f08e151ec2a7b12132bb65917b7d993cbba9978f4ff5d6f272fba132317d0" from runtime service failed: rpc error: code = Unknown desc = [failed to get checkpoint for sandbox "172f08e151ec2a7b12132bb65917b7d993cbba9978f4ff5d6f272fba132317d0": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.513758 3010 kuberuntime_manager.go:595] killPodWithSyncResult failed: failed to "KillPodSandbox" for "22d0437c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"172f08e151ec2a7b12132bb65917b7d993cbba9978f4ff5d6f272fba132317d0\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.513805 3010 pod_workers.go:186] Error syncing pod 22d0437c-d333-11e8-b4fc-0800277656c6 ("storage-provisioner_kube-system(22d0437c-d333-11e8-b4fc-0800277656c6)"), skipping: failed to "KillPodSandbox" for "22d0437c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"172f08e151ec2a7b12132bb65917b7d993cbba9978f4ff5d6f272fba132317d0\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.549309 3010 remote_runtime.go:229] StopContainer "7058c1679128f30650a384e0ce3cfe31eff7fb2fbcd144b20fd26e1c94a1d61b" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.550191 3010 kuberuntime_container.go:604] Container "docker://7058c1679128f30650a384e0ce3cfe31eff7fb2fbcd144b20fd26e1c94a1d61b" termination failed with gracePeriod 30: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.549603 3010 remote_runtime.go:229] StopContainer "0837dc4d55ea00fe3367fce3b94e5a1baf5a2ed7f1affe4f315312bf94c68a21" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.554796 3010 kuberuntime_container.go:604] Container "docker://0837dc4d55ea00fe3367fce3b94e5a1baf5a2ed7f1affe4f315312bf94c68a21" termination failed with gracePeriod 30: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.556955 3010 remote_runtime.go:115] StopPodSandbox "69b05b274f31ee80f9e1e732e51d2a1859c81a9deaa913e54dcfafe891340acc" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.561291 3010 remote_runtime.go:115] StopPodSandbox "8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec" from runtime service failed: rpc error: code = Unknown desc = [failed to get checkpoint for sandbox "8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.561411 3010 kuberuntime_manager.go:595] killPodWithSyncResult failed: [failed to "KillContainer" for "kubedns" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:00 minikube localkube[3010]: , failed to "KillContainer" for "dnsmasq" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:00 minikube localkube[3010]: , failed to "KillPodSandbox" for "23825d5c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:00 minikube localkube[3010]: E1023 01:45:00.561442 3010 pod_workers.go:186] Error syncing pod 23825d5c-d333-11e8-b4fc-0800277656c6 ("kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)"), skipping: [failed to "KillContainer" for "kubedns" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:00 minikube localkube[3010]: , failed to "KillContainer" for "dnsmasq" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:00 minikube localkube[3010]: , failed to "KillPodSandbox" for "23825d5c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607038 3010 remote_runtime.go:229] StopContainer "7058c1679128f30650a384e0ce3cfe31eff7fb2fbcd144b20fd26e1c94a1d61b" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607075 3010 kuberuntime_container.go:604] Container "docker://7058c1679128f30650a384e0ce3cfe31eff7fb2fbcd144b20fd26e1c94a1d61b" termination failed with gracePeriod 30: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607162 3010 remote_runtime.go:229] StopContainer "0837dc4d55ea00fe3367fce3b94e5a1baf5a2ed7f1affe4f315312bf94c68a21" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607177 3010 kuberuntime_container.go:604] Container "docker://0837dc4d55ea00fe3367fce3b94e5a1baf5a2ed7f1affe4f315312bf94c68a21" termination failed with gracePeriod 30: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607527 3010 remote_runtime.go:115] StopPodSandbox "69b05b274f31ee80f9e1e732e51d2a1859c81a9deaa913e54dcfafe891340acc" from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.607907 3010 remote_runtime.go:115] StopPodSandbox "8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec" from runtime service failed: rpc error: code = Unknown desc = [failed to get checkpoint for sandbox "8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.608521 3010 kuberuntime_manager.go:595] killPodWithSyncResult failed: [failed to "KillContainer" for "kubedns" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:01 minikube localkube[3010]: , failed to "KillContainer" for "dnsmasq" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:01 minikube localkube[3010]: , failed to "KillPodSandbox" for "23825d5c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.608574 3010 pod_workers.go:186] Error syncing pod 23825d5c-d333-11e8-b4fc-0800277656c6 ("kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)"), skipping: [failed to "KillContainer" for "kubedns" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:01 minikube localkube[3010]: , failed to "KillContainer" for "dnsmasq" with KillContainerError: "rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?"
Oct 23 01:45:01 minikube localkube[3010]: , failed to "KillPodSandbox" for "23825d5c-d333-11e8-b4fc-0800277656c6" with KillPodSandboxError: "rpc error: code = Unknown desc = [failed to get checkpoint for sandbox \"8536992fb81fc810a29b0fa41733a960b5ad422e0b5c8a2a206c1f8d165bc6ec\": key is not found, failed to get sandbox status: Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?]"
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.683566 3010 remote_runtime.go:169] ListPodSandbox with filter &PodSandboxFilter{Id:,State:&PodSandboxStateValue{State:SANDBOX_READY,},LabelSelector:map[string]string{},} from runtime service failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.683676 3010 kuberuntime_sandbox.go:192] ListPodSandbox failed: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:01 minikube localkube[3010]: E1023 01:45:01.683710 3010 kubelet.go:1929] Failed cleaning pods: rpc error: code = Unknown desc = Cannot connect to the Docker daemon at unix:///var/run/docker.sock. Is the docker daemon running?
Oct 23 01:45:05 minikube localkube[3010]: E1023 01:45:05.299705 3010 proxier.go:964] Failed to delete kube-system/kube-dns:dns endpoint connections, error: error deleting conntrack entries for UDP peer {10.96.0.10, 172.17.0.3}, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Oct 23 01:45:09 minikube localkube[3010]: E1023 01:45:09.791294 3010 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-addon-manager-minikube": Error response from daemon: transport is closing
Oct 23 01:45:09 minikube localkube[3010]: E1023 01:45:09.791344 3010 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-addon-manager-minikube": Error response from daemon: transport is closing
Oct 23 01:45:09 minikube localkube[3010]: E1023 01:45:09.791356 3010 kuberuntime_manager.go:647] createPodSandbox for pod "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-addon-manager-minikube": Error response from daemon: transport is closing
Oct 23 01:45:09 minikube localkube[3010]: E1023 01:45:09.791399 3010 pod_workers.go:186] Error syncing pod c4c3188325a93a2d7fb1714e1abf1259 ("kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)"), skipping: failed to "CreatePodSandbox" for "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-addon-manager-minikube\": Error response from daemon: transport is closing"
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.055640 3010 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": Error response from daemon: failed to update store for object type *libnetwork.endpoint: open : no such file or directory
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.055893 3010 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": Error response from daemon: failed to update store for object type *libnetwork.endpoint: open : no such file or directory
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.056004 3010 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": Error response from daemon: failed to update store for object type *libnetwork.endpoint: open : no such file or directory
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.056194 3010 pod_workers.go:186] Error syncing pod 23825d5c-d333-11e8-b4fc-0800277656c6 ("kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)"), skipping: failed to "CreatePodSandbox" for "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-dns-54cccfbdf8-wk7v5\": Error response from daemon: failed to update store for object type *libnetwork.endpoint: open : no such file or directory"
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770086 3010 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/d3249fd7ea20fd2f6331f1cc78f3685b15fc63359992de039188960eafc893cd/start: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770123 3010 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/d3249fd7ea20fd2f6331f1cc78f3685b15fc63359992de039188960eafc893cd/start: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770133 3010 kuberuntime_manager.go:647] createPodSandbox for pod "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod "kube-dns-54cccfbdf8-wk7v5": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/d3249fd7ea20fd2f6331f1cc78f3685b15fc63359992de039188960eafc893cd/start: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770174 3010 pod_workers.go:186] Error syncing pod 23825d5c-d333-11e8-b4fc-0800277656c6 ("kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)"), skipping: failed to "CreatePodSandbox" for "kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-dns-54cccfbdf8-wk7v5_kube-system(23825d5c-d333-11e8-b4fc-0800277656c6)\" failed: rpc error: code = Unknown desc = failed to start sandbox container for pod \"kube-dns-54cccfbdf8-wk7v5\": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/d3249fd7ea20fd2f6331f1cc78f3685b15fc63359992de039188960eafc893cd/start: EOF"
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770502 3010 remote_runtime.go:92] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-addon-manager-minikube": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/create?name=k8s_POD_kube-addon-manager-minikube_kube-system_c4c3188325a93a2d7fb1714e1abf1259_9: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770566 3010 kuberuntime_sandbox.go:54] CreatePodSandbox for pod "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-addon-manager-minikube": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/create?name=k8s_POD_kube-addon-manager-minikube_kube-system_c4c3188325a93a2d7fb1714e1abf1259_9: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770575 3010 kuberuntime_manager.go:647] createPodSandbox for pod "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod "kube-addon-manager-minikube": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/create?name=k8s_POD_kube-addon-manager-minikube_kube-system_c4c3188325a93a2d7fb1714e1abf1259_9: EOF
Oct 23 01:45:10 minikube localkube[3010]: E1023 01:45:10.770602 3010 pod_workers.go:186] Error syncing pod c4c3188325a93a2d7fb1714e1abf1259 ("kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)"), skipping: failed to "CreatePodSandbox" for "kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)" with CreatePodSandboxError: "CreatePodSandbox for pod \"kube-addon-manager-minikube_kube-system(c4c3188325a93a2d7fb1714e1abf1259)\" failed: rpc error: code = Unknown desc = failed to create a sandbox for pod \"kube-addon-manager-minikube\": error during connect: Post http://%2Fvar%2Frun%2Fdocker.sock/v1.31/containers/create?name=k8s_POD_kube-addon-manager-minikube_kube-system_c4c3188325a93a2d7fb1714e1abf1259_9: EOF"
Oct 23 01:45:23 minikube localkube[3010]: E1023 01:45:23.491216 3010 proxier.go:1701] Failed to delete stale service IP 10.96.0.10 connections, error: error deleting connection tracking state for UDP service IP: 10.96.0.10, error: error looking for path of conntrack: exec: "conntrack": executable file not found in $PATH
Oct 23 01:45:35 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37382: remote error: tls: bad certificate
Oct 23 01:45:35 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37386: remote error: tls: bad certificate
Oct 23 01:45:35 minikube localkube[21585]: http: TLS handshake error from 127.0.0.1:34088: remote error: tls: bad certificate
Oct 23 01:45:35 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37384: remote error: tls: bad certificate
Oct 23 01:45:36 minikube localkube[21585]: http: TLS handshake error from 127.0.0.1:34112: remote error: tls: bad certificate
Oct 23 01:45:36 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37414: remote error: tls: bad certificate
Oct 23 01:45:36 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37416: remote error: tls: bad certificate
Oct 23 01:45:36 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37418: remote error: tls: bad certificate
Oct 23 01:45:37 minikube localkube[21585]: http: TLS handshake error from 127.0.0.1:34120: remote error: tls: bad certificate
Oct 23 01:45:37 minikube localkube[21585]: http: TLS handshake error from 172.17.0.2:37422: remote error: tls: bad certificate
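Because the same few errors repeat many times in output like this, grouping the lines by the source location in the log header makes the distinct failures easier to see. This is only a sketch. The function name count_by_source is hypothetical, and it assumes the "file.go:LINE] message" header format visible in the lines above.

```shell
# Read kubelet/localkube log lines on stdin and print how many lines each
# source location (for example proxier.go:1701) produced, most frequent first.
# Lines without a "file.go:LINE] " header are ignored.
# (count_by_source is a hypothetical helper name.)
count_by_source() {
    sed -n 's/.* \([a-z_]*\.go:[0-9]*\)] .*/\1/p' | sort | uniq -c | sort -rn
}

# Possible usage:
#   minikube logs | grep error | count_by_source
```

Continuation lines and the "TLS handshake error" lines carry no such header, so they are not counted and still need to be read separately.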
It appears that minikube itself had issues. I get the same error when I start minikube without the proxy as well.
hyperkit version : hyperkit: v0.20180403-17-g3e954c
minikube version: v0.25.1
kubectl version
Client Version: version.Info{Major:"1", Minor:"12", GitVersion:"v1.12.0", GitCommit:"0ed33881dc4355495f623c6f22e7dd0b7632b7c0", GitTreeState:"clean", BuildDate:"2018-09-28T15:20:58Z", GoVersion:"go1.11", Compiler:"gc", Platform:"darwin/amd64"}
I ran kubectl -v9 cluster-info; the output is below.
I1023 00:11:12.639212 18611 loader.go:359] Config loaded from file /Users/fna516/.kube/config
I1023 00:11:12.640094 18611 loader.go:359] Config loaded from file /Users/fna516/.kube/config
I1023 00:11:12.641924 18611 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.0 (darwin/amd64) kubernetes/0ed3388" 'https://192.168.99.100:8443/api?timeout=32s'
I1023 00:11:22.826784 18611 round_trippers.go:405] GET https://192.168.99.100:8443/api?timeout=32s in 10185 milliseconds
I1023 00:11:22.826808 18611 round_trippers.go:411] Response Headers:
I1023 00:11:22.826900 18611 cached_discovery.go:111] skipped caching discovery info due to Get https://192.168.99.100:8443/api?timeout=32s: net/http: TLS handshake timeout
I1023 00:11:22.827565 18611 loader.go:359] Config loaded from file /Users/fna516/.kube/config
I1023 00:11:22.828233 18611 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.0 (darwin/amd64) kubernetes/0ed3388" 'https://192.168.99.100:8443/api?timeout=32s'
I1023 00:11:32.943036 18611 round_trippers.go:405] GET https://192.168.99.100:8443/api?timeout=32s in 10115 milliseconds
I1023 00:11:32.943069 18611 round_trippers.go:411] Response Headers:
I1023 00:11:32.943112 18611 cached_discovery.go:111] skipped caching discovery info due to Get https://192.168.99.100:8443/api?timeout=32s: net/http: TLS handshake timeout
I1023 00:11:32.943161 18611 shortcut.go:89] Error loading discovery information: Get https://192.168.99.100:8443/api?timeout=32s: net/http: TLS handshake timeout
I1023 00:11:32.943340 18611 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.0 (darwin/amd64) kubernetes/0ed3388" 'https://192.168.99.100:8443/api?timeout=32s'
I1023 00:11:43.397428 18611 round_trippers.go:405] GET https://192.168.99.100:8443/api?timeout=32s in 10454 milliseconds
I1023 00:11:43.397454 18611 round_trippers.go:411] Response Headers:
I1023 00:11:43.397505 18611 cached_discovery.go:111] skipped caching discovery info due to Get https://192.168.99.100:8443/api?timeout=32s: net/http: TLS handshake timeout
I1023 00:11:43.397619 18611 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.0 (darwin/amd64) kubernetes/0ed3388" 'https://192.168.99.100:8443/api?timeout=32s'
I1023 00:11:53.510603 18611 round_trippers.go:405] GET https://192.168.99.100:8443/api?timeout=32s in 10113 milliseconds
I1023 00:11:53.510631 18611 round_trippers.go:411] Response Headers:
I1023 00:11:53.510677 18611 cached_discovery.go:111] skipped caching discovery info due to Get https://192.168.99.100:8443/api?timeout=32s: net/http: TLS handshake timeout
I1023 00:11:53.510771 18611 round_trippers.go:386] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.12.0 (darwin/amd64) kubernetes/0ed3388" 'https://192.168.99.100:8443/api?timeout=32s'

kubelet error: Failed to start ContainerManager failed to initialise top level QOS containers: root container /kubepods doesn't exist

I am trying to use kubelet to start the Kubernetes API server as a static pod, but it fails with the following errors:
I0523 11:13:41.192680 9248 remote_runtime.go:41] Connecting to runtime service /var/run/dockershim.sock
I0523 11:13:41.196764 9248 kuberuntime_manager.go:171] Container runtime docker initialized, version: 1.12.3, apiVersion: 1.24.0
E0523 11:13:41.199242 9248 kubelet.go:1165] Image garbage collection failed: unable to find data for container /
E0523 11:13:41.199405 9248 event.go:208] Unable to write event: 'Post https://127.0.0.1:8443/api/v1/namespaces/default/events: dial tcp 127.0.0.1:8443: getsockopt: connection refused' (may retry after sleeping)
I0523 11:13:41.199529 9248 server.go:869] Started kubelet v1.6.4
I0523 11:13:41.199711 9248 server.go:127] Starting to listen on 0.0.0.0:10250
I0523 11:13:41.200017 9248 kubelet_node_status.go:230] Setting node annotation to enable volume controller attach/detach
I0523 11:13:41.203018 9248 server.go:294] Adding debug handlers to kubelet server.
E0523 11:13:41.207486 9248 kubelet.go:1661] Failed to check if disk space is available for the runtime: failed to get fs info for "runtime": unable to find data for container /
E0523 11:13:41.207554 9248 kubelet.go:1669] Failed to check if disk space is available on the root partition: failed to get fs info for "root": unable to find data for container /
E0523 11:13:41.214231 9248 kubelet.go:1246] Failed to start ContainerManager failed to initialise top level QOS containers: root container /kubepods doesn't exist
The full log is here: https://travis-ci.org/reachlin/k8s0/jobs/235187507
This is the api server deployment yml: https://github.com/reachlin/k8s0/blob/master/roles/k8s/templates/apiserver.yml.j2
Later, I found that the error that actually matters is:
Failed to start ContainerManager failed to initialise top level QOS containers: root container /kubepods doesn't exist
After some research, I found the solution here: https://github.com/kubernetes/kubernetes/issues/43704
The fix is to add these two parameters to kubelet:
--cgroups-per-qos=false
--enforce-node-allocatable=""
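The error says that kubelet expected a top-level /kubepods cgroup and did not find it. As a rough diagnostic before turning the feature off, the sketch below lists which cgroup controllers lack a kubepods directory. It assumes a cgroup v1 host with one directory per controller under /sys/fs/cgroup; the function name and the default path are my own choices for illustration, not anything taken from kubelet.

```python
import os


def controllers_missing_kubepods(cgroup_root="/sys/fs/cgroup", name="kubepods"):
    """Return the controllers under cgroup_root that have no `name` child directory.

    Assumption: cgroup v1 layout, one directory per controller
    (e.g. /sys/fs/cgroup/cpu, /sys/fs/cgroup/memory).
    Hypothetical helper, not part of any Kubernetes tooling.
    """
    missing = []
    for controller in sorted(os.listdir(cgroup_root)):
        controller_path = os.path.join(cgroup_root, controller)
        # Plain files are not controller mount points, so skip them.
        if not os.path.isdir(controller_path):
            continue
        if not os.path.isdir(os.path.join(controller_path, name)):
            missing.append(controller)
    return missing
```

Calling controllers_missing_kubepods() on the node and printing the result shows where the cgroup is absent. An empty list would suggest the cgroup exists and the cause lies elsewhere.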

Kubelet process has high CPU usage over long time

I have a Kubernetes cluster with the Weave CNI plugin, consisting of 3 nodes:
1 master node (virtual machine)
2 bare-metal worker nodes (4-core Xeon with hyper-threading, 8 logical cores)
The trouble is that top shows kubelet at 60-100% CPU usage on the first worker.
In journalctl -u kubelet I see a lot of messages (hundreds every minute):
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075243 3843 docker_sandbox.go:205] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640": Error response from daemon: {"message":"No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075360 3843 remote_runtime.go:109] StopPodSandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.075380 3843 kuberuntime_gc.go:138] Failed to stop sandbox "011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-p6kwb_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 011cf10cf46dbc6bf2e11d1cb562af478eee21eba0c40521bf7af51ee5399640
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076549 3843 docker_sandbox.go:205] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf": Error response from daemon: {"message":"No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076654 3843 remote_runtime.go:109] StopPodSandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.076676 3843 kuberuntime_gc.go:138] Failed to stop sandbox "0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf" before removing: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-6g8jq_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 0125de37634ef7f3aa852c999cfb5849750167b1e3d63293a085ceca416e4ebf
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079585 3843 docker_sandbox.go:205] Failed to stop sandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772": Error response from daemon: {"message":"No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772"}
May 19 09:57:38 kube-worker1 bash[3843]: E0519 09:57:38.079805 3843 remote_runtime.go:109] StopPodSandbox "014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772" from runtime service failed: rpc error: code = 2 desc = NetworkPlugin cni failed to teardown pod "cron-task-2533948c46c1-r30cw_namespace" network: CNI failed to retrieve network namespace path: Error: No such container: 014135ede46ee45c176528da02782a38ded36bd10566f864c147ccb66a617772
This started after some faulty cronetes tasks failed during creation. I removed all the pods with --force, but kubelet still tries to remove them. I also restarted kubelet on that worker, with no result. How can I tell kubelet to forget them?
Version info
Kubernetes v1.6.1
Docker version 1.12.0, build 8eab29e
Linux kube-worker1 4.4.0-72-generic #93-Ubuntu SMP
Container manifest (without metadata)
job:
  apiVersion: batch/v1
  kind: Job
  spec:
    template:
      spec:
        containers:
        - name: cron-task
          image: docker.company.ru/image:v2.3.2
          command: ["rake", "db:refresh_views"]
          env:
          - name: RAILS_ENV
            value: namespace
          - name: CONFIG_PATH
            value: /config
          volumeMounts:
          - name: config
            mountPath: /config
        volumes:
        - name: config
          configMap:
            name: task-conf
        restartPolicy: Never
Also, I didn't find any mention of this pod's name fragment (2533948c46c1) in the cluster's etcd.
Finally I found the solution.
Kubelet stores information about all pods running on the node in
/var/lib/dockershim/sandbox
When I ran ls in that folder, I found files for all the missing pods. I deleted these files, and the log messages disappeared and CPU usage returned to normal (even without a kubelet restart).
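The cleanup can be sketched roughly as follows. This assumes each file in /var/lib/dockershim/sandbox is named after the sandbox (container) ID, which matches the IDs in the log messages, and that you supply the full IDs of containers Docker still knows about, for example from docker ps -aq --no-trunc. Both function names are hypothetical, and deletion is off by default so you can review the list first.

```python
import os

SANDBOX_DIR = "/var/lib/dockershim/sandbox"


def find_stale_checkpoints(live_ids, sandbox_dir=SANDBOX_DIR):
    """Return paths of files in sandbox_dir whose name is not a live container ID.

    Assumption: every file in sandbox_dir is named after a sandbox ID.
    live_ids: full (untruncated) container IDs that Docker still reports.
    """
    live = set(live_ids)
    stale = []
    for entry in sorted(os.listdir(sandbox_dir)):
        path = os.path.join(sandbox_dir, entry)
        if os.path.isfile(path) and entry not in live:
            stale.append(path)
    return stale


def remove_stale_checkpoints(live_ids, sandbox_dir=SANDBOX_DIR, dry_run=True):
    """Delete stale files unless dry_run is True; return the stale paths either way."""
    stale = find_stale_checkpoints(live_ids, sandbox_dir)
    if not dry_run:
        for path in stale:
            os.remove(path)
    return stale
```

Run it with dry_run=True first and compare the returned paths against the sandbox IDs in the kubelet log before deleting anything.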
This seems to be related to the issue "Pods with hostNetwork=true cannot be removed (and generate errors) when using CNI" in Kubernetes 1.6.x. Those messages are not critical anyhow, but of course they are annoying when you are trying to find actual issues.
Try using the most recent version of Kubernetes to mitigate the issues.
I ran into the same problem and did some Go profiling; the cause turned out to be the kubelet PLEG mechanism, and removing '/var/lib/dockershim/sandbox' did the magic.