kubelet restarting randomly - kubernetes

We have a two-node cluster running on GKE, and for a long time we have been suffering random downtime due to what appear to be random restarts of the kubelet service.
These are the logs from the last downtime, taken on the node that was running the Kubernetes system pods at the time:
sudo journalctl -r -u kubelet
Aug 22 04:17:36 gke-app-node1 systemd[1]: Started Kubernetes kubelet.
Aug 22 04:17:36 gke-app-node1 systemd[1]: Stopped Kubernetes kubelet.
Aug 22 04:17:36 gke-app-node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Aug 22 04:10:46 gke-app-node1 systemd[1]: Started Kubernetes kubelet.
Aug 22 04:10:46 gke-app-node1 systemd[1]: Stopped Kubernetes kubelet.
Aug 22 04:10:46 gke-app-node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Aug 22 04:09:48 gke-app-node1 systemd[1]: Started Kubernetes kubelet.
Aug 22 04:09:46 gke-app-node1 systemd[1]: Stopped Kubernetes kubelet.
Aug 22 04:09:44 gke-app-node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Aug 22 04:02:05 gke-app-node1 systemd[1]: Started Kubernetes kubelet.
Aug 22 04:02:03 gke-app-node1 systemd[1]: Stopped Kubernetes kubelet.
Aug 22 04:02:03 gke-app-node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Aug 22 04:01:09 gke-app-node1 systemd[1]: Started Kubernetes kubelet.
Aug 22 04:01:09 gke-app-node1 systemd[1]: Stopped Kubernetes kubelet.
Aug 22 04:01:08 gke-app-node1 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Aug 22 04:00:58 gke-app-node1 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Aug 22 04:00:58 gke-app-node1 systemd[1]: kubelet.service: Unit entered failed state.
Aug 22 04:00:58 gke-app-node1 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Aug 22 04:00:58 gke-app-node1 kubelet[1330]: I0822 04:00:58.104306 1330 server.go:794] GET /healthz: (5.286840082s) 200 [[curl/7.51.0] 127.0.0.1:35924]
Aug 22 04:00:58 gke-app-node1 kubelet[1330]: I0822 04:00:57.981923 1330 docker_server.go:73] Stop docker server
Aug 22 04:00:58 gke-app-node1 kubelet[1330]: I0822 04:00:53.991354 1330 server.go:794] GET /healthz: (5.834296581s) 200 [[Go-http-client/1.1] 127.0.0.1:35926]
Aug 22 04:00:57 gke-app-node1 kubelet[1330]: I0822 04:00:42.636036 1330 fsHandler.go:131] du and find on following dirs took 16.466105259s: [/var/lib/docker/overlay/e496194dfcb8a053050a0eb73965f57b109fe3036c1ffc5b0f12b4a341f13794 /var/lib/docker/containers/b6a212aedf588a1f1d173fd9f4871f678d014e260e8aa6147ad8212619675802]
Aug 22 04:00:39 gke-app-node1 kubelet[1330]: I0822 04:00:38.061492 1330 fsHandler.go:131] du and find on following dirs took 12.246559762s: [/var/lib/docker/overlay/303dc4c5814a0a12a6ac450e5b27327f55a7baa8000c011bd38521f3ff997e0f /var/lib/docker/containers/18a95beaf86b382bb8abc6ee40033020de1da4b54a5ca52e1c61bf7f14d6ef44]
Aug 22 04:00:39 gke-app-node1 kubelet[1330]: I0822 04:00:38.476930 1330 fsHandler.go:131] du and find on following dirs took 11.766408434s: [/var/lib/docker/overlay/86802dda255243388ab86fa8fc403187193f8c4ccdee54d8ca18c021ca35bc36 /var/lib/docker/containers/7fd4d507ec6445035fcb4a60efd4ae68e54052c1cace3268be72954062fed830]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:35.865293 1330 prober.go:106] Readiness probe for "web-deployment-812924635-ntcqw_default(bcf76fb6-8661-11e7-88da-42010a840211):rails-app" failed (failure): Get http://10.48.1.7:80/health_check: net/http: request canceled (Client.Timeout exceeded while awaiting headers)
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: W0822 04:00:35.380953 1330 prober.go:98] No ref for container "docker://e2dcca90c091d2789af9b22e1405cb273f63c399aecde2686ef4b1e8ab9fdc5f" (web-deployment-812924635-ntcqw_default(bcf76fb6-8661-11e7-88da-42010a840211):rails-app)
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:32.514573 1330 fsHandler.go:131] du and find on following dirs took 7.127181023s: [/var/lib/docker/overlay/647f419bce585a3d0f5792376b269704cb358828bc5c4fb5e815bfa23950d511 /var/lib/docker/containers/59f7ada601f38a243daa7154f2ed27790d14d163c4675e26186d9a6d9db0272e]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:32.255644 1330 fsHandler.go:131] du and find on following dirs took 6.72357089s: [/var/lib/docker/overlay/992a65b68531c5ac53e4cd06f7a8f8abe4b908d943b5b9cc38da126b469050b2 /var/lib/docker/containers/2be7aede380d6f3452a5abacc53f9e0a69f8c5ee3dbdf5351a30effdf2d47833]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:32.067601 1330 fsHandler.go:131] du and find on following dirs took 6.511405985s: [/var/lib/docker/overlay/7bc4e00d232b4a22eb64a87ad079970aabb24bde17d3adaa6989840ebc91b96c /var/lib/docker/containers/949c778861a4f86440c5dd21d4daf40e97fb49b9eb1498111d7941ca3e63541a]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.907928 1330 fsHandler.go:131] du and find on following dirs took 6.263993478s: [/var/lib/docker/overlay/303abc540335a9ce7077fd21182845fbff2f06ed9eb1ac8af9effdfd048153b5 /var/lib/docker/containers/6544add2796f365d67d72fe283e083042aa2af82862eb6335295d228efa28d61]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.907845 1330 fsHandler.go:131] du and find on following dirs took 7.630063774s: [/var/lib/docker/overlay/a36c376a7ddd04c168822770866d8c34499ddec7e4039ada579b3d65adc57347 /var/lib/docker/containers/6a606a6c901f8373dff22f94ba77a24956a7b4eed3d0e550be168eeaeed86236]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.902731 1330 fsHandler.go:131] du and find on following dirs took 6.259025553s: [/var/lib/docker/overlay/0a7170e1a42bfa8b0112d8c7bb805da8e4778aa5ce90978d90ed5335929633ff /var/lib/docker/containers/1f68eaa59cab0a0bcdc087e25d18573044b599967a56867d189acd82bc19172b]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.871796 1330 fsHandler.go:131] du and find on following dirs took 6.410999589s: [/var/lib/docker/overlay/25ffbf8bd71e814af8991cc52499286d2d345b3f348fec9358ca366f341ed588 /var/lib/docker/containers/efe1969587c9b0412fe7f7c8c24bbe1326d46f576bddf12f88ae7cd406b6475d]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.871699 1330 fsHandler.go:131] du and find on following dirs took 6.259940483s: [/var/lib/docker/overlay/56909c00ec20b59c1fcb4988cd51fe50ebb467681f37bab3f9061d76993565bc /var/lib/docker/containers/a8d1df672c23313313b511389f6eeb44e78c3f9e4c346d214fc190695f270e5f]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.614518 1330 fsHandler.go:131] du and find on following dirs took 5.982313751s: [/var/lib/docker/overlay/cb057acc4f3a3e91470847f78ffd550b25a24605cec42ee080aaf193933968cf /var/lib/docker/containers/e755c4d88e4e5d4d074806e829b1e83fd52c8e2b1c01c27131222a40b0c6c10a]
Aug 22 04:00:35 gke-app-node1 kubelet[1330]: I0822 04:00:31.837000 1330 fsHandler.go:131] du and find on following dirs took 7.500602734s: [/var/lib/docker/overlay/e9539d9569ccdcc79db1cd4add7036d70ad71391dc30ca16903bdd9bda4d0972 /var/lib/docker/containers/b0a7c955af1ed85f56aeaed1d787794d5ffd04c2a81820465a1e3453242c8a19]
Aug 22 04:00:34 gke-app-node1 kubelet[1330]: I0822 04:00:31.836947 1330 fsHandler.go:131] du and find on following dirs took 6.257091389s: [/var/lib/docker/overlay/200f0f063157381d25001350c34914e020ea16b3f82f7bedf7e4b01d34e513a7 /var/lib/docker/containers/eca7504b7e24332381e459a2f09acc150a5681c148cebc5867ac66021cbe0435]
Aug 22 04:00:33 gke-app-node1 kubelet[1330]: I0822 04:00:31.836787 1330 fsHandler.go:131] du and find on following dirs took 7.286756684s: [/var/lib/docker/overlay/37334712f505b11c7f0b27fb0580eadc0e79fc789dcfafbea1730efd500fb69c /var/lib/docker/containers/4858388c53032331868497859110a7267fef95110a7ab3664aa857a21ee02a3e]
Aug 22 04:00:22 gke-app-node1 kubelet[1330]: I0822 04:00:21.999930 1330 qos_container_manager_linux.go:286] [ContainerManager]: Updated QoS cgroup configuration
Aug 22 04:00:20 gke-app-node1 kubelet[1330]: I0822 04:00:19.598974 1330 server.go:794] GET /healthz: (136.991429ms) 200 [[curl/7.51.0] 127.0.0.1:35888]
Aug 22 04:00:10 gke-app-node1 kubelet[1330]: I0822 04:00:08.024328 1330 server.go:794] GET /healthz: (36.191534ms) 200 [[curl/7.51.0] 127.0.0.1:35868]
Aug 22 04:00:05 gke-app-node1 kubelet[1330]: I0822 04:00:05.861339 1330 server.go:794] GET /stats/summary/: (808.201834ms) 200 [[Go-http-client/1.1] 10.48.0.7:43022]
Aug 22 04:00:03 gke-app-node1 kubelet[1330]: W0822 04:00:02.723586 1330 raw.go:87] Error while processing event ("/sys/fs/cgroup/memory/system.slice/kube-logrotate.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/memory/system.slice/kube-logrotate.service: no such file or directory
Aug 22 04:00:03 gke-app-node1 kubelet[1330]: W0822 04:00:02.723529 1330 raw.go:87] Error while processing event ("/sys/fs/cgroup/blkio/system.slice/kube-logrotate.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/blkio/system.slice/kube-logrotate.service: no such file or directory
Aug 22 04:00:03 gke-app-node1 kubelet[1330]: W0822 04:00:02.622765 1330 raw.go:87] Error while processing event ("/sys/fs/cgroup/cpu,cpuacct/system.slice/kube-logrotate.service": 0x40000100 == IN_CREATE|IN_ISDIR): inotify_add_watch /sys/fs/cgroup/cpu,cpuacct/system.slice/kube-logrotate.service: no such file or directory
sudo dmesg -T
[Tue Aug 22 04:17:31 2017] cbr0: port 4(vethd18b0189) entered disabled state
[Tue Aug 22 04:17:31 2017] device vethd18b0189 left promiscuous mode
[Tue Aug 22 04:17:31 2017] cbr0: port 4(vethd18b0189) entered disabled state
[Tue Aug 22 04:17:40 2017] cbr0: port 6(veth2985149d) entered disabled state
[Tue Aug 22 04:17:40 2017] device veth2985149d left promiscuous mode
[Tue Aug 22 04:17:40 2017] cbr0: port 6(veth2985149d) entered disabled state
[Tue Aug 22 04:17:42 2017] cbr0: port 5(veth2a1d2827) entered disabled state
[Tue Aug 22 04:17:42 2017] device veth2a1d2827 left promiscuous mode
[Tue Aug 22 04:17:42 2017] cbr0: port 5(veth2a1d2827) entered disabled state
[Tue Aug 22 04:17:42 2017] cbr0: port 2(vetha070fbca) entered disabled state
[Tue Aug 22 04:17:42 2017] device vetha070fbca left promiscuous mode
[Tue Aug 22 04:17:42 2017] cbr0: port 2(vetha070fbca) entered disabled state
[Tue Aug 22 04:17:42 2017] cbr0: port 3(veth7e3e663a) entered disabled state
[Tue Aug 22 04:17:42 2017] device veth7e3e663a left promiscuous mode
[Tue Aug 22 04:17:42 2017] cbr0: port 3(veth7e3e663a) entered disabled state
[Tue Aug 22 04:17:57 2017] IPv6: ADDRCONF(NETDEV_UP): eth0: link is not ready
[Tue Aug 22 04:17:57 2017] IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
[Tue Aug 22 04:17:57 2017] device veth215e85fc entered promiscuous mode
[Tue Aug 22 04:17:57 2017] cbr0: port 1(veth215e85fc) entered forwarding state
[Tue Aug 22 04:17:57 2017] cbr0: port 1(veth215e85fc) entered forwarding state
[Tue Aug 22 04:18:12 2017] cbr0: port 1(veth215e85fc) entered forwarding state
And finally, here we can see how the Kubernetes pods were killed around that time:
docker ps -a
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
d7b59cd7116c gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 hours ago Up 4 hours k8s_POD_worker-deployment-3042507874-ghjpd_default_1de2291b-86ef-11e7-88da-42010a840211_0
e6f2563ac3c8 5e65193af899 "/monitor --component" 4 hours ago Up 4 hours k8s_prometheus-to-sd-exporter_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_1
45d7886d0308 487e99ee05d9 "/bin/sh -c '/run.sh " 4 hours ago Up 4 hours k8s_fluentd-gcp_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_1
b5a5c5d085ac gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 hours ago Up 4 hours k8s_POD_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_1
32dcd4d5847c 54d2a8698e3c "/bin/sh -c 'echo -99" 4 hours ago Up 4 hours k8s_kube-proxy_kube-proxy-gke-app-node1_kube-system_ed40100d42c9e285fa1f59ca7a1d8f8d_1
2d055d96b610 gcr.io/google_containers/pause-amd64:3.0 "/pause" 4 hours ago Up 4 hours k8s_POD_kube-proxy-gke-app-node1_kube-system_ed40100d42c9e285fa1f59ca7a1d8f8d_1
2be7aede380d 5e65193af899 "/monitor --component" 2 days ago Exited (0) 4 hours ago k8s_prometheus-to-sd-exporter_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_0
7fd4d507ec64 54d2a8698e3c "/bin/sh -c 'echo -99" 2 days ago Exited (0) 4 hours ago k8s_kube-proxy_kube-proxy-gke-app-node1_kube-system_ed40100d42c9e285fa1f59ca7a1d8f8d_0
cc615ec1e87c efe10ee6727f "/bin/touch /run/xtab" 2 days ago Exited (0) 2 days ago k8s_touch-lock_kube-proxy-gke-app-node1_kube-system_ed40100d42c9e285fa1f59ca7a1d8f8d_0
b0a7c955af1e 487e99ee05d9 "/bin/sh -c '/run.sh " 2 days ago Exited (0) 4 hours ago k8s_fluentd-gcp_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_0
4858388c5303 gcr.io/google_containers/pause-amd64:3.0 "/pause" 2 days ago Exited (0) 4 hours ago k8s_POD_fluentd-gcp-v2.0-5vlrc_kube-system_6cebabe5-84f9-11e7-88da-42010a840211_0
6a606a6c901f gcr.io/google_containers/pause-amd64:3.0 "/pause" 2 days ago Exited (0) 4 hours ago k8s_POD_kube-proxy-gke-app-node1_kube-system_ed40100d42c9e285fa1f59ca7a1d8f8d_0
Our cluster is running Kubernetes 1.7.3 on both the master and the node pools, and it is hosted on GKE (zone europe-west1-d).
Any help would be appreciated, as we don't really know how to debug this problem any further.

Related

Installing K3S offline failed with error: starting kubernetes: preparing server: building kine: dial tcp\\: unknown network tcp\\"

I am installing k3s offline on:
CentOS 7 on arm64;
MySQL 8.0 as the datastore.
I have disabled:
firewall
selinux
swap
I have modified /etc/hosts;
I have installed docker-ce;
I have downloaded:
https://get.k3s.io to install.sh
k3s-airgap-images-arm64.tar to the right place "/var/lib/rancher/k3s/agent/images/"
k3s-arm64, which I made executable with chmod +x and moved to "/usr/local/bin/k3s".
I'm quite sure the MySQL connection is OK. Then I run: INSTALL_K3S_SKIP_DOWNLOAD=true INSTALL_K3S_EXEC='server --docker --datastore-endpoint="mysql://root:root#tcp(172.16.149.139:3306)/k3s"' ./install.sh
But I always get the following error in journalctl:
Nov 19 11:05:52 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:05:52 k3s01 k3s[16058]: time="2020-11-19T11:05:52.883415201+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:05:52 k3s01 k3s[16058]: time="2020-11-19T11:05:52.884004317+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:05:52 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:05:52 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:05:52 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:05:52 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:05:57 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:05:57 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:05:57 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:05:58 k3s01 k3s[16086]: time="2020-11-19T11:05:58.341115144+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:05:58 k3s01 k3s[16086]: time="2020-11-19T11:05:58.345448686+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:05:58 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:05:58 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:05:58 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:05:58 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:03 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:06:03 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:03 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:03 k3s01 k3s[16114]: time="2020-11-19T11:06:03.855567834+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:03 k3s01 k3s[16114]: time="2020-11-19T11:06:03.856344291+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:03 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:03 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:03 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:03 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:08 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:06:08 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:08 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:09 k3s01 k3s[16142]: time="2020-11-19T11:06:09.430387037+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:09 k3s01 k3s[16142]: time="2020-11-19T11:06:09.431185565+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:09 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:09 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:09 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:09 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:14 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:06:14 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:14 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:14 k3s01 k3s[16193]: time="2020-11-19T11:06:14.888534204+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:14 k3s01 k3s[16193]: time="2020-11-19T11:06:14.889537923+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:14 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:14 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:14 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:14 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:19 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:06:19 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:19 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:20 k3s01 k3s[16221]: time="2020-11-19T11:06:20.442535396+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:20 k3s01 k3s[16221]: time="2020-11-19T11:06:20.443421344+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:20 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:20 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:20 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:20 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:24 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:24 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:25 k3s01 k3s[16336]: time="2020-11-19T11:06:25.168513665+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:25 k3s01 k3s[16336]: time="2020-11-19T11:06:25.168946929+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:25 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:25 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:25 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:25 k3s01 systemd[1]: k3s.service failed.
Nov 19 11:06:30 k3s01 systemd[1]: k3s.service holdoff time over, scheduling restart.
Nov 19 11:06:30 k3s01 systemd[1]: Stopped Lightweight Kubernetes.
Nov 19 11:06:30 k3s01 systemd[1]: Starting Lightweight Kubernetes...
Nov 19 11:06:30 k3s01 k3s[16363]: time="2020-11-19T11:06:30.645875517+08:00" level=info msg="Starting k3s v1.19.3+k3s3 (0e4fbfef)"
Nov 19 11:06:30 k3s01 k3s[16363]: time="2020-11-19T11:06:30.649172179+08:00" level=fatal msg="starting kubernetes: preparing server: creating storage endpoint: building kine: dial tcp\\: unknown network tcp\\"
Nov 19 11:06:30 k3s01 systemd[1]: k3s.service: main process exited, code=exited, status=1/FAILURE
Nov 19 11:06:30 k3s01 systemd[1]: Failed to start Lightweight Kubernetes.
Nov 19 11:06:30 k3s01 systemd[1]: Unit k3s.service entered failed state.
Nov 19 11:06:30 k3s01 systemd[1]: k3s.service failed.
I really don't know what's going on; any help would be appreciated.
Finally, I found that I must use K3S_DATASTORE_ENDPOINT='mysql://xxxxxxx' rather than INSTALL_K3S_EXEC='xxx --datastore-endpoint="mysql://xxxxxx"' to avoid this. But I don't understand why that is.
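For reference, a sketch of an invocation along those lines, with the datastore endpoint moved out of INSTALL_K3S_EXEC and into the K3S_DATASTORE_ENDPOINT environment variable. The credentials and host are the placeholders from the command above; the user:pass@tcp(host:port)/db DSN form (with @ rather than #) is an assumption about the intended MySQL endpoint:
# pass the datastore endpoint via the environment instead of the --datastore-endpoint flag
INSTALL_K3S_SKIP_DOWNLOAD=true \
K3S_DATASTORE_ENDPOINT='mysql://root:root@tcp(172.16.149.139:3306)/k3s' \
INSTALL_K3S_EXEC='server --docker' \
./install.sh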

kubelet.service: Unit entered failed state and node NotReady error in Kubernetes cluster

I am trying to deploy Spring Boot microservices in a Kubernetes cluster with 1 master and 2 worker nodes. When I check the node state with the command sudo kubectl get nodes, one of my worker nodes shows NotReady in its status.
To troubleshoot, I ran the following command:
sudo journalctl -u kubelet
The output contains kubelet.service: Unit entered failed state, and the kubelet service stops. The following is what I get from sudo journalctl -u kubelet:
-- Logs begin at Fri 2020-01-03 04:56:18 EST, end at Fri 2020-01-03 05:32:47 EST. --
Jan 03 04:56:25 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:31 MILDEVKUB050 kubelet[970]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --confi
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.053962 970 server.go:416] Version: v1.17.0
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.084061 970 plugins.go:100] No cloud provider specified.
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.235928 970 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:32 MILDEVKUB050 kubelet[970]: I0103 04:56:32.280173 970 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curre
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: I0103 04:56:38.107966 970 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to /
Jan 03 04:56:38 MILDEVKUB050 kubelet[970]: F0103 04:56:38.109401 970 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable swa
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:38 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.901632 1433 server.go:416] Version: v1.17.0
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907654 1433 plugins.go:100] No cloud provider specified.
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.907806 1433 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:48 MILDEVKUB050 kubelet[1433]: I0103 04:56:48.947107 1433 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: I0103 04:56:49.263777 1433 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to
Jan 03 04:56:49 MILDEVKUB050 kubelet[1433]: F0103 04:56:49.264219 1433 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:49 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: Flag --cgroup-driver has been deprecated, This parameter should be set via the config file specified by the Kubelet's --conf
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.712729 1500 server.go:416] Version: v1.17.0
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.714927 1500 plugins.go:100] No cloud provider specified.
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.715248 1500 server.go:821] Client rotation is on, will bootstrap in background
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.763508 1500 certificate_store.go:129] Loading cert/key pair from "/var/lib/kubelet/pki/kubelet-client-curr
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: I0103 04:56:59.956706 1500 server.go:641] --cgroups-per-qos enabled, but --cgroup-root was not specified. defaulting to
Jan 03 04:56:59 MILDEVKUB050 kubelet[1500]: F0103 04:56:59.957078 1500 server.go:273] failed to run Kubelet: running with swap on is not supported, please disable sw
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Main process exited, code=exited, status=255/n/a
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Unit entered failed state.
Jan 03 04:56:59 MILDEVKUB050 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Jan 03 04:57:10 MILDEVKUB050 systemd[1]: Started kubelet: The Kubernetes Node Agent.
The log file keeps repeating: kubelet.service: Unit entered failed state.
I tried restarting the kubelet, but the node state does not change; it stays NotReady.
Updates
When I run the command systemctl list-units --type=swap --state=active, I get the following response:
docker#MILDEVKUB040:~$ systemctl list-units --type=swap --state=active
UNIT LOAD ACTIVE SUB DESCRIPTION
dev-mapper-MILDEVDCR01\x2d\x2dvg\x2dswap_1.swap loaded active active /dev/mapper/MILDEVDCR01--vg-swap_1
Important
Whenever I hit this node-not-ready issue, I have to disable swap, reload the daemon, and restart the kubelet; after that the node becomes Ready again, and then I have to repeat the whole procedure the next time.
How can I find a permanent solution for this?
failed to run Kubelet: running with swap on is not supported, please disable swap
You need to disable swap on the system for the kubelet to work. You can disable swap with sudo swapoff -a.
On systemd-based systems there is another way swap partitions get enabled: swap units, which are activated whenever systemd reloads, even if you have turned swap off with swapoff -a. See:
https://www.freedesktop.org/software/systemd/man/systemd.swap.html
Check whether you have any swap units using systemctl list-units --type=swap --state=active
You can permanently disable any active swap unit with systemctl mask <unit name>.
Note: do not use systemctl disable <unit name> for the swap unit, as it will be activated again when systemd reloads. Use systemctl mask <unit name> only.
To make sure swap doesn't get re-enabled when your system reboots (because of a power cycle or any other reason), remove or comment out the swap entries in /etc/fstab.
Summarizing (a command sketch follows this list):
Run sudo swapoff -a
Check if you have swap units with command systemctl list-units --type=swap --state=active. If there are any active swap units, mask them using systemctl mask <unit name>
Remove swap entries in /etc/fstab
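A minimal sketch of those steps as shell commands, using the swap unit name reported in the systemctl output above; the sed pattern for /etc/fstab is an assumption and should be checked against the actual entries in that file:
sudo swapoff -a
systemctl list-units --type=swap --state=active
# mask the unit shown above so systemd cannot re-activate it on reload
sudo systemctl mask 'dev-mapper-MILDEVDCR01\x2d\x2dvg\x2dswap_1.swap'
# comment out any swap line in /etc/fstab so swap stays off across reboots (-i.bak keeps a backup)
sudo sed -i.bak '/\sswap\s/ s/^/#/' /etc/fstab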
The root cause is the swap space. To disable it completely, follow these steps:
Run swapoff -a: this will disable swap immediately, but swap will be activated again on restart.
Remove any swap entry from /etc/fstab.
Reboot the system.
If the swap is gone, good. If, for some reason, it is still there, you have to remove the swap partition. Repeat steps 1 and 2 and, after that, use fdisk or parted to remove the (now unused) swap partition. Use great care here: removing the wrong partition will have disastrous effects!
Reboot.
This should resolve your issue.
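Before removing anything at the partition level, it can help to double-check which device actually backs the active swap; the commands below only report state and change nothing (on this machine the output above suggests an LVM volume, /dev/mapper/MILDEVDCR01--vg-swap_1, rather than a plain partition):
# show active swap devices and the block device layout
swapon --show
lsblk -o NAME,SIZE,TYPE,FSTYPE,MOUNTPOINT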
Removing /etc/fstab itself will give the VM errors; I think we should find another way to solve this issue. I tried removing the fstab file, and every command (install, ping, and others) errored out.

CephFS mount. Can't read superblock

Any pointers for this issue? I have tried tons of things already, to no avail.
This command fails with the error Can't read superblock:
sudo mount -t ceph worker2:6789:/ /mnt/mycephfs -o name=admin,secret=AQAYjCpcAAAAABAAxs1mrh6nnx+0+1VUqW2p9A==
Some more info that may be helpful:
uname -a
Linux cephfs-test-admin-1 4.14.84-coreos #1 SMP Sat Dec 15 22:39:45 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
Ceph status and ceph osd status all show no issues at all
dmesg | tail
[228343.304863] libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
[228343.322279] libceph: mon0 10.1.96.4:6789 session established
[228343.323622] libceph: client107238 fsid 762e6263-a95c-40da-9813-9df4fef12f53
ceph -s
cluster:
id: 762e6263-a95c-40da-9813-9df4fef12f53
health: HEALTH_WARN
too few PGs per OSD (16 < min 30)
services:
mon: 3 daemons, quorum worker2,worker0,worker1
mgr: worker1(active)
mds: cephfs-1/1/1 up {0=mds-ceph-mds-85b4fbb478-c6jzv=up:active}
osd: 3 osds: 3 up, 3 in
data:
pools: 2 pools, 16 pgs
objects: 21 objects, 2246 bytes
usage: 342 MB used, 76417 MB / 76759 MB avail
pgs: 16 active+clean
ceph osd status
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| id | host | used | avail | wr ops | wr data | rd ops | rd data | state |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
| 0 | worker2 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
| 1 | worker0 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
| 2 | worker1 | 114M | 24.8G | 0 | 0 | 0 | 0 | exists,up |
+----+---------+-------+-------+--------+---------+--------+---------+-----------+
ceph -v
ceph version 12.2.3 (2dab17a455c09584f2a85e6b10888337d1ec8949) luminous (stable)
Some of the syslog output:
Jan 04 21:24:04 worker2 kernel: libceph: resolve 'worker2' (ret=0): 10.1.96.4:0
Jan 04 21:24:04 worker2 kernel: libceph: mon0 10.1.96.4:6789 session established
Jan 04 21:24:04 worker2 kernel: libceph: client159594 fsid 762e6263-a95c-40da-9813-9df4fef12f53
Jan 04 21:24:10 worker2 systemd[1]: Started OpenSSH per-connection server daemon (58.242.83.28:36729).
Jan 04 21:24:11 worker2 sshd[12315]: pam_unix(sshd:auth): authentication failure; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28 us>
Jan 04 21:24:14 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:15 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Failed password for root from 58.242.83.28 port 36729 ssh2
Jan 04 21:24:18 worker2 sshd[12315]: Received disconnect from 58.242.83.28 port 36729:11: [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: Disconnected from authenticating user root 58.242.83.28 port 36729 [preauth]
Jan 04 21:24:18 worker2 sshd[12315]: PAM 2 more authentication failures; logname= uid=0 euid=0 tty=ssh ruser= rhost=58.242.83.28 user=root
Jan 04 21:24:56 worker2 systemd[1]: Started OpenSSH per-connection server daemon (24.114.79.151:58123).
Jan 04 21:24:56 worker2 sshd[12501]: Accepted publickey for core from 24.114.79.151 port 58123 ssh2: RSA SHA256:t4t9yXeR2yC7s9c37mdS/F7koUs2x>
Jan 04 21:24:56 worker2 sshd[12501]: pam_unix(sshd:session): session opened for user core by (uid=0)
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
Jan 04 21:24:56 worker2 systemd[1]: Failed to set up mount unit: Invalid argument
So after digging, the problem turned out to be due to XFS partitioning issues...
I do not know how I missed it at first.
In short:
Trying to create a partition using XFS was failing.
i.e. running mkfs.xfs /dev/vdb1 would simply hang. The OS would still create and mark the partitions properly, but they'd be corrupt - a fact you'd only discover when trying to mount and getting that Can't read superblock error.
So Ceph does this:
1. Run deploy
2. Create XFS partitions mkfs.xfs ...
3. The OS creates those faulty partitions
4. Since you can still read the status of the OSDs just fine, all status reports and logs show no problems (mkfs.xfs did not report errors, it just hung)
5. When you try to mount CephFS or use block storage, the whole thing bombs due to corrupt partitions.
The root cause: still unknown. But I suspect something was not done right at the SSD disk level when provisioning/attaching them from my cloud provider. It now works fine.
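As a hedged sketch of how that kind of corruption could be checked for and fixed on a suspect device (assuming /dev/vdb1 as in the example above, and that nothing - no OSD, no mount - is currently using it):
# no-modify check of the XFS metadata; a never-completed mkfs or corrupt superblock shows up here
sudo xfs_repair -n /dev/vdb1
# if the filesystem is unusable, wipe the old signatures and recreate it
sudo wipefs -a /dev/vdb1
sudo mkfs.xfs -f /dev/vdb1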

Kubernetes master on GCE does not display node on AWS EC2 [closed]

I created a master node on GCE using these commands:
gcloud compute instances create master --machine-type g1-small --zone europe-west1-d
gcloud compute addresses create myexternalip --region europe-west1
gcloud compute target-pools create kubernetes --region europe-west1
gcloud compute target-pools add-instances kubernetes --instances master --instances-zone europe-west1-d
gcloud compute forwarding-rules create kubernetes-forward --address myexternalip --region europe-west1 --ports 1-65535 --target-pool kubernetes
gcloud compute forwarding-rules describe kubernetes-forward
sudo kubeadm init --pod-network-cidr=10.244.0.0/16
kubectl apply -f https://raw.githubusercontent.com/coreos/flannel/master/Documentation/kube-flannel.yml
and opened all firewalls.
After that I created an AWS EC2 instance, opened the firewalls, and ran:
kubeadm join --token 55d287.b540e254a280f853 ip:6443 --discovery-token-unsafe-skip-ca-verification
to connect the instance to the cluster.
But the node is not displayed on the master.
Docker version: 17.12
Kubernetes version: 1.9.3
UPD:
Output from the node on AWS EC2:
systemctl status kubelet.service:
kubelet.service - kubelet: The Kubernetes Node Agent
Loaded: loaded (/lib/systemd/system/kubelet.service; enabled; vendor preset: enabled)
Drop-In: /etc/systemd/system/kubelet.service.d
└─10-kubeadm.conf
Active: active (running) since Sat 2018-02-24 20:23:53 UTC; 23s ago
Docs: http://kubernetes.io/docs/
Main PID: 30678 (kubelet)
Tasks: 5
Memory: 13.4M
CPU: 125ms
CGroup: /system.slice/kubelet.service
└─30678 /usr/bin/kubelet --bootstrap-kubeconfig=/etc/kubernetes/bootstrap-kubelet.conf --kubeconfig=/etc/kubernetes/kubelet.conf --pod-manifest-path=/etc/kubernetes/manifests -
Feb 24 20:23:53 ip-172-31-0-250 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 24 20:23:53 ip-172-31-0-250 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 24 20:23:53 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.420375 30678 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.420764 30678 controller.go:114] kubelet config controller: starting controller
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.420944 30678 controller.go:118] kubelet config controller: validating combination of defaults and flags
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: W0224 20:23:53.425410 30678 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.444969 30678 server.go:182] Version: v1.9.3
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.445274 30678 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:23:53 ip-172-31-0-250 kubelet[30678]: I0224 20:23:53.445565 30678 plugins.go:101] No cloud provider specified.
journalctl -u kubelet:
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: Stopping kubelet: The Kubernetes Node Agent...
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:15:12 ip-172-31-0-250 kubelet[30243]: I0224 20:15:12.819249 30243 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:15:12 ip-172-31-0-250 kubelet[30243]: I0224 20:15:12.821054 30243 controller.go:114] kubelet config controller: starting controller
Feb 24 20:15:12 ip-172-31-0-250 kubelet[30243]: I0224 20:15:12.821243 30243 controller.go:118] kubelet config controller: validating combination of defaults and flags
Feb 24 20:15:12 ip-172-31-0-250 kubelet[30243]: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: kubelet.service: Unit entered failed state.
Feb 24 20:15:12 ip-172-31-0-250 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:15:23 ip-172-31-0-250 kubelet[30304]: I0224 20:15:23.186834 30304 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:15:23 ip-172-31-0-250 kubelet[30304]: I0224 20:15:23.187255 30304 controller.go:114] kubelet config controller: starting controller
Feb 24 20:15:23 ip-172-31-0-250 kubelet[30304]: I0224 20:15:23.187451 30304 controller.go:118] kubelet config controller: validating combination of defaults and flags
Feb 24 20:15:23 ip-172-31-0-250 kubelet[30304]: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: kubelet.service: Unit entered failed state.
Feb 24 20:15:23 ip-172-31-0-250 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:15:33 ip-172-31-0-250 kubelet[30311]: I0224 20:15:33.422948 30311 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:15:33 ip-172-31-0-250 kubelet[30311]: I0224 20:15:33.423349 30311 controller.go:114] kubelet config controller: starting controller
Feb 24 20:15:33 ip-172-31-0-250 kubelet[30311]: I0224 20:15:33.423525 30311 controller.go:118] kubelet config controller: validating combination of defaults and flags
Feb 24 20:15:33 ip-172-31-0-250 kubelet[30311]: error: unable to load client CA file /etc/kubernetes/pki/ca.crt: open /etc/kubernetes/pki/ca.crt: no such file or directory
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: kubelet.service: Main process exited, code=exited, status=1/FAILURE
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: kubelet.service: Unit entered failed state.
Feb 24 20:15:33 ip-172-31-0-250 systemd[1]: kubelet.service: Failed with result 'exit-code'.
Feb 24 20:15:43 ip-172-31-0-250 systemd[1]: kubelet.service: Service hold-off time over, scheduling restart.
Feb 24 20:15:43 ip-172-31-0-250 systemd[1]: Stopped kubelet: The Kubernetes Node Agent.
Feb 24 20:15:43 ip-172-31-0-250 systemd[1]: Started kubelet: The Kubernetes Node Agent.
Feb 24 20:15:43 ip-172-31-0-250 kubelet[30319]: I0224 20:15:43.671742 30319 feature_gate.go:226] feature gates: &{{} map[]}
Feb 24 20:15:43 ip-172-31-0-250 kubelet[30319]: I0224 20:15:43.672195 30319 controller.go:114] kubelet config controller: starting controller
UPD:
The error is on the AWS EC2 instance side, but I can't find what is wrong.
PROBLEM SOLVED
You should initialize kubeadm with the --apiserver-advertise-address flag.
After you create the load balancer, run this command to show the external load balancer IP address:
gcloud compute forwarding-rules describe kubernetes-forward
Then initialize the cluster with this flag:
--apiserver-advertise-address=external_load_balancer_ip
So your kubeadm command looks like this:
sudo kubeadm init --pod-network-cidr=10.244.0.0/16 --apiserver-advertise-address=external_load_balancer_ip
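To confirm the fix, a quick check from the master should now list the AWS worker alongside the GCE master (it should move to Ready once the flannel pods are running on it):
# the ip-172-31-0-250 node from the logs above should appear in this list
kubectl get nodes -o wide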

Failed to start Kubernetes API Server due to unknown reason

The service is not starting, and the listener is not activated on port 8080.
Here is my Kubernetes configuration:
KUBE_LOGTOSTDERR="--logtostderr=true"
KUBE_LOG_LEVEL="--v=0"
KUBE_ALLOW_PRIV="--allow-privileged=false"
KUBE_MASTER="--master=http://centos-master:8080"
KUBE_ETCD_SERVERS="--etcd-servers=http://centos-master:2379"
systemctl status kube-apiserver -l
● kube-apiserver.service - Kubernetes API Server
Loaded: loaded (/usr/lib/systemd/system/kube-apiserver.service; enabled; vendor preset: disabled)
Active: failed (Result: start-limit) since Mon 2017-08-14 12:07:04 +0430; 29s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 2087 ExecStart=/usr/bin/kube-apiserver $KUBE_LOGTOSTDERR $KUBE_LOG_LEVEL $KUBE_ETCD_SERVERS $KUBE_API_ADDRESS $KUBE_API_PORT $KUBELET_PORT $KUBE_ALLOW_PRIV $KUBE_SERVICE_ADDRESSES $KUBE_ADMISSION_CONTROL $KUBE_API_ARGS (code=exited, status=2)
Main PID: 2087 (code=exited, status=2)
Aug 14 12:07:04 centos-master systemd[1]: kube-apiserver.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Aug 14 12:07:04 centos-master systemd[1]: Failed to start Kubernetes API Server.
Aug 14 12:07:04 centos-master systemd[1]: Unit kube-apiserver.service entered failed state.
Aug 14 12:07:04 centos-master systemd[1]: kube-apiserver.service failed.
Aug 14 12:07:04 centos-master systemd[1]: kube-apiserver.service holdoff time over, scheduling restart.
Aug 14 12:07:04 centos-master systemd[1]: start request repeated too quickly for kube-apiserver.service
Aug 14 12:07:04 centos-master systemd[1]: Failed to start Kubernetes API Server.
Aug 14 12:07:04 centos-master systemd[1]: Unit kube-apiserver.service entered failed state.
Aug 14 12:07:04 centos-master systemd[1]: kube-apiserver.service failed.
tail -n 1000 /var/log/messages
resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.240160 606 reflector.go:199] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:466: Failed to list *api.PersistentVolume: Get http://centos-master:8080/api/v1/persistentvolumes?resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.242039 606 reflector.go:199] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:470: Failed to list *api.Service: Get http://centos-master:8080/api/v1/services?resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.242924 606 reflector.go:199] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:457: Failed to list *api.Pod: Get http://centos-master:8080/api/v1/pods?fieldSelector=spec.nodeName%3D%2Cstatus.phase%21%3DFailed%2Cstatus.phase%21%3DSucceeded&resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.269386 606 reflector.go:199] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:473: Failed to list *api.ReplicationController: Get http://centos-master:8080/api/v1/replicationcontrollers?resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.285782 606 reflector.go:199] k8s.io/kubernetes/plugin/pkg/scheduler/factory/factory.go:481: Failed to list *extensions.ReplicaSet: Get http://centos-master:8080/apis/extensions/v1beta1/replicasets?resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
Aug 14 12:12:30 centos-master kube-scheduler: E0814 12:12:30.286529 606 reflector.go:199] pkg/controller/informers/factory.go:89: Failed to list *api.PersistentVolumeClaim: Get http://centos-master:8080/api/v1/persistentvolumeclaims?resourceVersion=0: dial tcp 10.0.2.4:8080: getsockopt: connection refused
systemd[1]: kube-apiserver.service: main process exited, code=exited, status=2/INVALIDARGUMENT
The arguments you're using do not seem valid.
Check the list of valid kube-apiserver arguments in the Kubernetes reference documentation.
You can also follow the Kubernetes The Hard Way guide for a trusted way to run the API server.
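A minimal debugging sketch for pinpointing the rejected argument, assuming the unit layout shown in the status output above (ExecStart=/usr/bin/kube-apiserver with its flags coming from environment files): inspect the unit, read the most recent journal lines, and run the binary by hand with the same flag values so the parser prints the offending argument directly:
# show the full unit file and the environment files it references
systemctl cat kube-apiserver
# read the most recent startup attempts in the journal
sudo journalctl -u kube-apiserver -n 50 --no-pager
# re-run with the values from the configuration shown above; an invalid flag is reported immediately
sudo /usr/bin/kube-apiserver --logtostderr=true --v=0 --allow-privileged=false --etcd-servers=http://centos-master:2379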