Issue while joining the master nodes using k8s on GCP - kubernetes

We are setting up Kubernetes 1.14 in HA mode on GCP, using the stacked etcd topology.
We have created an image with the Kubernetes binaries installed.
We have a Terraform script that creates an instance group with 3 master and 5 worker node instances from that image.
The same Terraform script also creates a TCP load balancer with port 6443 enabled.
I am able to bootstrap one master by running kubeadm init --config=. However, joining the 2nd master fails with the error below.
kubeadm join XX.XX.XX.XX:6443 --token 9a08jv.c0izixklcxtmnze7 --discovery-token-ca-cert-hash sha256:73390a94962247546282a0954cb46f2a282b00534c06aff93773f3fc50aee562 --experimental-control-plane -v 8
I0423 09:50:33.623004 21078 checks.go:382] validating the presence of executable touch
I0423 09:50:33.623063 21078 checks.go:524] running all checks
I0423 09:50:33.656532 21078 checks.go:412] checking whether the given node name is reachable using net.LookupHost
I0423 09:50:33.656705 21078 checks.go:622] validating kubelet version
I0423 09:50:33.716178 21078 checks.go:131] validating if the service is enabled and active
I0423 09:50:33.723119 21078 checks.go:209] validating availability of port 10250
I0423 09:50:33.723377 21078 checks.go:439] validating if the connectivity type is via proxy or direct
I0423 09:50:33.723445 21078 join.go:441] [preflight] Fetching init configuration
I0423 09:50:33.723486 21078 join.go:474] [preflight] Retrieving KubeConfig objects
[preflight] Reading configuration from the cluster…
[preflight] FYI: You can look at this config file with ‘kubectl -n kube-system get cm kubeadm-config -oyaml’
I0423 09:50:33.725538 21078 round_trippers.go:416] GET https://XX.XX.XX.XX:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config
I0423 09:50:33.725564 21078 round_trippers.go:423] Request Headers:
I0423 09:50:33.725570 21078 round_trippers.go:426] Accept: application/json, */*
I0423 09:50:33.725594 21078 round_trippers.go:426] User-Agent: kubeadm/v1.14.0 (linux/amd64) kubernetes/641856d
I0423 09:50:33.725886 21078 round_trippers.go:441] Response Status: in 0 milliseconds
I0423 09:50:33.725903 21078 round_trippers.go:444] Response Headers:
error execution phase preflight: unable to fetch the kubeadm-config ConfigMap: failed to get config map: Get https://XX.XX.XX.XX:6443/api/v1/namespaces/kube-system/configmaps/kubeadm-config: dial tcp XX.XX.XX.XX:6443: connect: connection refused
Note: we faced the same issue on AWS with an NLB load balancer and were able to overcome it by using a Classic Load Balancer instead.
Thanks in advance for your help.


k8s: health check fails via kubelet, but works via curl from worker where kubelet runs

Currently, I am facing an issue with my application: it does not become healthy due to the kubelet not being able to perform a successful health check.
From pod describe:
Warning Unhealthy 84s kubelet Startup probe failed: Get "": dial tcp connect: connection refused
Warning Unhealthy 68s (x3 over 78s) kubelet Liveness probe failed: HTTP probe failed with statuscode: 500
I find this strange, as I can run the health check fine from the worker node where the kubelet is running. So I am wondering what the difference is between running the health check from the worker node via curl and the kubelet doing it.
From worker node where the kubelet is running:
sh-4.4# curl -v
* Trying
* Connected to ( port 7777 (#0)
> GET /healthz/readiness HTTP/1.1
> Host:
> User-Agent: curl/7.61.1
> Accept: */*
< HTTP/1.1 200 OK
< Content-Length: 0
< Connection: close
* Closing connection 0
Can I somehow trace when the kubelet is sending the health probe check? Or maybe get into the kubelet and send it myself from there?
One more detail: my pod has an istio-proxy container inside. It looks like the traffic from the kubelet gets blocked by this istio-proxy.
Setting the following annotation in my deployment:
"rewriteAppHTTPProbe": true
does not help for the kubelet. It did help to get a 200 OK when running the curl command from the worker node.
Also worth noting: we are using the istio-cni plugin to inject the Istio sidecar. I am not sure whether that makes a difference compared with the former approach of injecting via istio-init ...
Any suggestions are welcome :).
The issue appears to be that the istio-cni plugin changes the iptables rules, so the health probe is redirected towards the application.
However, the redirect goes to localhost: and the application is not listening there for the health probes ... only on the : ...
After changing the iptables rules to a more appropriate redirect, the health probe was answered with a 200 OK and the pod became healthy.
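A quick way to confirm this kind of mismatch is to look at which local addresses the application is actually bound to. The sketch below is a minimal, hypothetical helper (not part of Istio or Kubernetes) that parses `ss -ltn`-style output and reports whether a port would accept connections redirected to localhost. The sample output and the address 10.1.2.3 are made up for illustration; port 7777 is the probe port from the curl output above.

```python
# Hypothetical sample of `ss -ltn` output; the addresses are illustrative only.
SAMPLE = """\
State   Recv-Q  Send-Q  Local Address:Port  Peer Address:Port
LISTEN  0       128     10.1.2.3:7777       0.0.0.0:*
LISTEN  0       128     127.0.0.1:15000     0.0.0.0:*
"""

# Addresses that accept a connection aimed at localhost (loopback or wildcard).
LOOPBACK_OR_WILDCARD = {"127.0.0.1", "0.0.0.0", "*", "::", "::1"}


def listen_addresses(ss_output, port):
    """Return the set of local addresses that are listening on `port`."""
    addrs = set()
    for line in ss_output.splitlines():
        fields = line.split()
        # Skip the header and anything that is not a listening socket.
        if len(fields) < 5 or fields[0] != "LISTEN":
            continue
        host, _, found_port = fields[3].rpartition(":")
        if found_port == str(port):
            addrs.add(host.strip("[]"))  # "[::]" becomes "::"
    return addrs


def reachable_via_loopback(ss_output, port):
    """True if a connection redirected to localhost:port would be accepted."""
    return bool(listen_addresses(ss_output, port) & LOOPBACK_OR_WILDCARD)


print(listen_addresses(SAMPLE, 7777))
print(reachable_via_loopback(SAMPLE, 7777))
```

If the second call reports False for the probe port, a probe that is redirected to localhost will be refused even though the same request against the pod IP succeeds, which matches the behaviour described above.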

Kubernetes cluster recreated from snapshots issue

OVERVIEW: I am studying for the Kubernetes Administrator certification. To complete the training course, I created a two-node Kubernetes cluster on Google Cloud, with 1 master and 1 worker. As I don't want to leave the instances running all the time, I took snapshots of them so that I can deploy new instances with the Kubernetes cluster already set up. I am aware that I need to update the ens4 IP used by kubectl, as it will have changed, and I did so.
ISSUE: When I run "kubectl get pods --all-namespaces" I get the error "The connection to the server localhost:8080 was refused - did you specify the right host or port?"
QUESTION: Has anyone had similar issues, and is it possible to recreate a Kubernetes cluster from snapshots?
After adding -v=10 to the command, the URL matches the info in the .kube/config file:
kubectl get pods --all-namespaces -v=10
I0214 17:11:35.317678 6246 loader.go:375] Config loaded from file: /home/student/.kube/config
I0214 17:11:35.321941 6246 round_trippers.go:423] curl -k -v -XGET -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" -H "Accept: application/json, */*" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.333308 6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 11 milliseconds
I0214 17:11:35.333335 6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.333422 6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp connect: connection refused
I0214 17:11:35.333858 6246 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.16.1 (linux/amd64) kubernetes/d647ddb" 'https://k8smaster:6443/api?timeout=32s'
I0214 17:11:35.334234 6246 round_trippers.go:443] GET https://k8smaster:6443/api?timeout=32s in 0 milliseconds
I0214 17:11:35.334254 6246 round_trippers.go:449] Response Headers:
I0214 17:11:35.334281 6246 cached_discovery.go:121] skipped caching discovery info due to Get https://k8smaster:6443/api?timeout=32s: dial tcp connect: connection refused
I0214 17:11:35.334303 6246 shortcut.go:89] Error loading discovery information: Get https://k8smaster:6443/api?timeout=32s: dial tcp connect: connection refused
I replicated your issue and wrote up this step-by-step debugging process so you can follow my thinking.
I created a 2-node cluster (master + worker) with kubeadm and made a snapshot.
Then I deleted all nodes and recreated them from the snapshots.
After recreating the master node from its snapshot, I started seeing the same error you are seeing:
#kmaster ~]$ kubectl get po -v=10
I0217 11:04:38.397823 3372 loader.go:375] Config loaded from file: /home/user/.kube/config
I0217 11:04:38.398909 3372 round_trippers.go:423] curl -k -v -XGET -H "Accept: application/json, */*" -H "User-Agent: kubectl/v1.17.3 (linux/amd64) kubernetes/06ad960" ''
The connection was hanging, so I interrupted it (Ctrl+C).
The first thing I noticed was that the IP address kubectl was connecting to was different from the node IP, so I modified the .kube/config file to provide the proper IP.
After doing this, here is what running kubectl showed:
$ kubectl get po -v=10
I0217 11:26:57.020744 15929 loader.go:375] Config loaded from file: /home/user/.kube/config
I0217 11:26:57.025155 15929 helpers.go:221] Connection error: Get dial tcp connect: connection refused
F0217 11:26:57.025201 15929 helpers.go:114] The connection to the server was refused - did you specify the right host or port?
As you can see, the connection to the apiserver was being refused, so I checked whether the apiserver was running:
$ sudo docker ps -a | grep apiserver
5e957ff48d11 90d27391b780 "kube-apiserver --ad…" 24 seconds ago Exited (2) 3 seconds ago k8s_kube-apiserver_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_14
d78e179f1565 "/pause" 26 minutes ago Up 26 minutes k8s_POD_kube-apiserver-kmaster_kube-system_997514ff25ec38012de6a5be7c43b0ae_1
The apiserver was exiting for some reason.
I checked its logs (I am only including the relevant lines for readability):
$ sudo docker logs 5e957ff48d11
W0217 11:30:46.710541 1 clientconn.go:1120] grpc: addrConn.createTransport failed to connect to { 0 <nil>}. Err :connection error: desc = "transport: Error while dialing dial tcp connect: connection refused". Reconnecting...
panic: context deadline exceeded
Notice that the apiserver was trying to connect to etcd (note the port: 2379) and receiving connection refused.
My first guess was that etcd wasn't running, so I checked the etcd container:
$ sudo docker ps -a | grep etcd
4a249cb0743b 303ce5db0e90 "etcd --advertise-cl…" 2 minutes ago Exited (1) 2 minutes ago k8s_etcd_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_19
b89b7e7227de "/pause" 30 minutes ago Up 30 minutes k8s_POD_etcd-kmaster_kube-system_9018aafee02ebb028a7befd10063ec1e_1
I was right: Exited (1) 2 minutes ago. I checked its logs:
$ sudo docker logs 4a249cb0743b
2020-02-17 11:34:31.493215 C | etcdmain: listen tcp bind: cannot assign requested address
etcd was trying to bind to the old IP address.
I modified /etc/kubernetes/manifests/etcd.yaml and changed the old IP address to the new IP everywhere in the file.
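Replacing the address by hand is error-prone when the old IP is a prefix of another address in the file. As a rough sketch, assuming the old and new IPs are known (10.128.0.2 and 10.128.0.9 below are placeholders, not values from this cluster), a whole-address replacement could look like this:

```python
import re


def replace_ip(text, old_ip, new_ip):
    """Replace every whole occurrence of old_ip in text with new_ip.

    The lookarounds stop 10.128.0.2 from matching inside 10.128.0.21
    or 110.128.0.2, which a plain string replace would corrupt.
    """
    pattern = re.compile(r"(?<![\d.])" + re.escape(old_ip) + r"(?!\d)")
    return pattern.sub(lambda _match: new_ip, text)


# Illustrative fragment only; a real etcd.yaml contains many more fields.
manifest = (
    "- --advertise-client-urls=https://10.128.0.2:2379\n"
    "- --listen-peer-urls=https://10.128.0.2:2380\n"
    "# unrelated address that must stay untouched: 10.128.0.21\n"
)

print(replace_ip(manifest, "10.128.0.2", "10.128.0.9"))
```

In practice the text would be read from and written back to /etc/kubernetes/manifests/etcd.yaml, ideally after taking a backup copy of the file.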
A quick sudo docker ps | grep etcd showed it was running.
After a while the apiserver also started running.
Then I tried running kubectl:
$ kubectl get po
Unable to connect to the server: x509: certificate is valid for,, not
The apiserver certificate is invalid. The SSL certificate was generated for the old IP, which means I need to generate a new certificate with the new IP.
$ sudo kubeadm init phase certs apiserver
[certs] Using existing apiserver certificate and key on disk
That's not what I expected. I wanted to generate new certificates, not use the old ones.
I deleted the old certificates:
$ sudo rm /etc/kubernetes/pki/apiserver.crt \
And tried to generate certificates one more time:
$ sudo kubeadm init phase certs apiserver
[certs] Generating "apiserver" certificate and key
[certs] apiserver serving cert is signed for DNS names [kmaster kubernetes kubernetes.default kubernetes.default.svc kubernetes.default.svc.cluster.local] and IPs []
Looks good. Now let's try using kubectl:
$ kubectl get no
instance-21 Ready master 102m v1.17.3
instance-22 Ready <none> 95m v1.17.3
As you can see, it's working now.

Kubernetes pods cannot communicate outbound

I have installed Kubernetes v1.13.10 on a group of VMs running CentOS 7. When I deploy pods, they can connect to one another but cannot connect to anything outside of the cluster. The CoreDNS pods have these errors in the log:
[ERROR] plugin/errors: 2 A: unreachable backend: read udp> i/o timeout
[ERROR] plugin/errors: 2 AAAA: unreachable backend: read udp> i/o timeout
[ERROR] plugin/errors: 2 AAAA: unreachable backend: read udp> i/o timeout
[ERROR] plugin/errors: 2 A: unreachable backend: read udp> i/o timeout
The IPs and are the internal DNS servers and are reachable from the nodes. I did a Wireshark capture from the DNS servers, and I see the traffic is coming in from the CoreDNS pod IP address There would be no route for the DNS servers to get back to that IP as it isn't routable outside of the Kubernetes cluster.
My understanding is that an iptables rule should be implemented to nat the pod IPs to the address of the node when a pod is trying to communicate outbound (correct?). Below is the POSTROUTING chain in iptables:
[root#kube-aci-1 ~]# iptables -t nat -L POSTROUTING -v --line-number
Chain POSTROUTING (policy ACCEPT 23 packets, 2324 bytes)
num pkts bytes target prot opt in out source destination
1 1990 166K KUBE-POSTROUTING all -- any any anywhere anywhere /* kubernetes postrouting rules */
2 0 0 MASQUERADE all -- any ens192.152 anywhere
Line 1 was added by kube-proxy and line 2 was a line I manually added to try to nat anything coming from the pod subnet to the node interface ens192.152, but that didn't work.
Here are the kube-proxy logs:
[root#kube-aci-1 ~]# kubectl logs kube-proxy-llq22 -n kube-system
W1117 16:31:59.225870 1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.232006 1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.233727 1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.235700 1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.255278 1 server_others.go:296] Flag proxy-mode="" unknown, assuming iptables proxy
I1117 16:31:59.289360 1 server_others.go:148] Using iptables Proxier.
I1117 16:31:59.296021 1 server_others.go:178] Tearing down inactive rules.
I1117 16:31:59.324352 1 server.go:484] Version: v1.13.10
I1117 16:31:59.335846 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1117 16:31:59.336443 1 config.go:102] Starting endpoints config controller
I1117 16:31:59.336466 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1117 16:31:59.336493 1 config.go:202] Starting service config controller
I1117 16:31:59.336499 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1117 16:31:59.436617 1 controller_utils.go:1034] Caches are synced for service config controller
I1117 16:31:59.436739 1 controller_utils.go:1034] Caches are synced for endpoints config controller
I have tried flushing the iptables nat table as well as restarting kube-proxy on all nodes, but the problem still persists. Any clues in the output above, or thoughts on further troubleshooting?
Output of kubectl get nodes:
[root#kube-aci-1 ~]# kubectl get nodes -o wide
kube-aci-1 Ready master 85d v1.13.10 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
kube-aci-2 Ready <none> 85d v1.13.10 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
It turns out that, with the CNI in use, the pod subnet must be routable on the network if pods need to communicate outbound. I made the subnet routable on the external network, and the pods can now communicate outbound.
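The reasoning in the question can be made concrete with a small check. This sketch, using Python's standard ipaddress module, only models the decision of whether a packet leaves the pod network and would therefore need either source NAT or a routable pod subnet; it does not configure anything. The CIDR and addresses are illustrative assumptions, since the real ones are not shown above.

```python
import ipaddress


def leaves_pod_network(src, dst, pod_cidr):
    """True if a packet from a pod IP is headed outside the pod subnet.

    Such traffic keeps the pod IP as its source unless it is masqueraded,
    so replies only come back if the pod subnet is routable externally.
    """
    pods = ipaddress.ip_network(pod_cidr)
    return (ipaddress.ip_address(src) in pods
            and ipaddress.ip_address(dst) not in pods)


POD_CIDR = "10.244.0.0/16"   # assumed pod subnet
COREDNS_POD = "10.244.1.5"   # assumed CoreDNS pod IP
UPSTREAM_DNS = "192.168.1.10"  # assumed internal DNS server

print(leaves_pod_network(COREDNS_POD, UPSTREAM_DNS, POD_CIDR))
print(leaves_pod_network(COREDNS_POD, "10.244.2.7", POD_CIDR))
```

The first call models the CoreDNS query to an upstream DNS server seen in the capture; the second models pod-to-pod traffic, which stays inside the cluster and was already working.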

kube-dns getsockopt no route to host

I'm struggling to understand how to correctly configure kube-dns with flannel on kubernetes 1.10 and containerd as the CRI.
kube-dns fails to run, with the following error:
kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863 1 reflector.go:201] Failed to list *v1.Service: Get dial tcp getsockopt: no route to host
E0424 14:56:35.815863 1 reflector.go:201] Failed to list *v1.Endpoints: Get dial tcp getsockopt: no route to host
I0424 14:56:35.944444 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434 1 dns.go:209] Timeout waiting for initialization
kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 47m (x181 over 3h) kubelet, worker1 Readiness probe failed: Get net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 27m (x519 over 3h) kubelet, worker1 Back-off restarting failed container
Normal Killing 17m (x44 over 3h) kubelet, worker1 Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 12m (x178 over 3h) kubelet, worker1 Liveness probe failed: Get net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 2m (x855 over 3h) kubelet, worker1 Back-off restarting failed container
There is indeed no route to the endpoint:
ip route
default via dev ens160 dev ens160 proto kernel scope link src via dev flannel.1 onlink dev cni0 proto kernel scope link src via dev flannel.1 onlink via dev flannel.1 onlink via dev flannel.1 onlink via dev flannel.1 onlink
What is responsible for configuring the cluster service address range and associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?
The 10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured here too?
cat /etc/cni/net.d/10-containerd-net.conflist
{
  "cniVersion": "0.3.1",
  "name": "containerd-net",
  "plugins": [
    {
      "type": "bridge",
      "bridge": "cni0",
      "isGateway": true,
      "ipMasq": true,
      "promiscMode": true,
      "ipam": {
        "type": "host-local",
        "subnet": "",
        "routes": [
          { "dst": "" }
        ]
      }
    },
    {
      "type": "portmap",
      "capabilities": {"portMappings": true}
    }
  ]
}
Just came across this from 2016:
As of a few weeks ago (I forget the release but it was a 1.2.x where x
!= 0) (#24429) we fixed the routing such that any traffic that arrives
at a node destined for a service IP will be handled as if it came to a
node port. This means you should be able to set up static routes for
your service cluster IP range to one or more nodes and the nodes will
act as bridges. This is the same trick most people do with flannel to
bridge the overlay.
It's imperfect but it works. In the future we will need to get more
precise with the routing if you want optimal behavior (i.e. not losing
the client IP), or we will see more non-kube-proxy implementations of
Is that still relevant? Do I need to setup a static route for the service CIDR? Or is the issue actually with kube-proxy rather than flannel or containerd?
My flannel configuration:
cat /etc/cni/net.d/10-flannel.conflist
{
  "name": "cbr0",
  "plugins": [
    {
      "type": "flannel",
      "delegate": {
        "hairpinMode": true,
        "isDefaultGateway": true
      }
    },
    {
      "type": "portmap",
      "capabilities": {
        "portMappings": true
      }
    }
  ]
}
And kube-proxy:
Description=Kubernetes Kube Proxy
ExecStart=/usr/local/bin/kube-proxy \
--cluster-cidr= \
--feature-gates=SupportIPVSProxyMode=true \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--kubeconfig=/etc/kubernetes/kube-proxy.conf \
--logtostderr=true \
--master= \
--proxy-mode=ipvs \
Having looked at the kube-proxy debugging steps, it appears that kube-proxy cannot contact the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind a HAProxy loadbalancer, which is bound to and forwards round robin to each of the masters on[1|2|3]:6443. This can be seen in the output/configs above.
In kube-proxy.service, I have specified --master= Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
There are two components to this answer, one about running kube-proxy and one about where those :443 URLs are coming from.
First, about kube-proxy: please don't run kube-proxy as a systemd service like that. It is designed to be launched by kubelet in the cluster so that the SDN addresses behave rationally, since they are effectively "fake" addresses. By running kube-proxy outside the control of kubelet, all kinds of weird things are going to happen unless you expend a huge amount of energy to replicate the way that kubelet configures its subordinate docker containers.
Now, about that :443 URL:
E0424 14:56:35.815863 1 reflector.go:201] Failed to list *v1.Service: Get dial tcp getsockopt: no route to host
Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
That is from the Service CIDR of your cluster, which is (and should be) separate from the Pod CIDR which should be separate from the Node's subnets, etc. The .1 of the cluster's Service CIDR is either reserved (or traditionally allocated) to the kubernetes.default.svc.cluster.local Service, with its one Service.port as 443.
I'm not super sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf but since that file is very clearly only supposed to be used by kube-proxy, why not just update the value in the file to remove all doubt?
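To see which address this is for a given cluster, the first usable IP of the Service CIDR can be computed directly. The sketch below uses Python's standard ipaddress module; the CIDR values are examples only and are not taken from the configuration above. It also checks that the Service, Pod and node ranges do not overlap, as recommended.

```python
import ipaddress
from itertools import combinations


def kubernetes_service_ip(service_cidr):
    """Return the .1 address of the Service CIDR.

    This is the ClusterIP normally held by the `kubernetes` Service in the
    default namespace, which in-cluster clients reach on port 443.
    """
    return str(ipaddress.ip_network(service_cidr)[1])


def cidrs_are_disjoint(*cidrs):
    """True if no two of the given networks overlap."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return not any(a.overlaps(b) for a, b in combinations(nets, 2))


# Example ranges (assumptions for illustration).
SERVICE_CIDR = "10.32.0.0/24"
POD_CIDR = "10.200.0.0/16"
NODE_SUBNET = "192.168.1.0/24"

print(kubernetes_service_ip(SERVICE_CIDR))
print(cidrs_are_disjoint(SERVICE_CIDR, POD_CIDR, NODE_SUBNET))
```

If the address printed by the first call matches the one in the "no route to host" errors, the client is using the in-cluster Service address rather than the load balancer address passed via --master.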

Kubernetes unable to retrieve logs

I have a kubeadm cluster deployed in a CentOS VM. While trying to deploy the ingress controller following the instructions on GitHub, I noticed that I'm unable to see logs:
kubectl logs -n ingress-nginx nginx-ingress-controller-697f7c6ddb-x9xkh --previous
Error from server: Get dial tcp getsockopt: connection timed out
In (node1) netstat returns:
tcp6 0 0 :::10250 :::* LISTEN 1068/kubelet
In fact, I'm unable to see any logs, regardless of the status of the pod.
I disabled both firewalld and SELinux.
I used a proxy to let Kubernetes download images; I have since removed the proxy.
When navigating to the URL in the error above, I get Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=proxy)
I'm also able to fetch my nodes:
kubectl get node
k8s-master Ready master 32d v1.9.3
k8s-node1 Ready <none> 30d v1.9.3
k8s-node2 NotReady <none> 32d v1.9.3
getsockopt: connection timed out
is 99.99999% a firewall issue. If it were "connection refused", then showing the output of netstat would be meaningful, but (as you can see) kubelet is listening on that port just fine -- it's the networking configuration between the machine that is running kubectl and "" that is not correctly configured to allow traffic.
The apiserver expects that everyone who would want to view logs (or use kubectl exec) can reach that port on every Node in the cluster; so be sure you don't just fix the firewall rule(s) for that one Node -- fix it for all of them.
This message is from the apiserver running on your master. The command kubectl logs, running on your local machine, fetches logs via the apiserver. So the error message reveals a firewall misconfiguration between the master and the node(s) (port 10250).
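The distinction between the two failure modes can be checked from the master with a plain TCP connect to the kubelet port on each node. This is a minimal sketch using Python's standard socket module; the function name probe_tcp is illustrative, and the node names in the comment are the ones listed by kubectl get node above.

```python
import socket


def probe_tcp(host, port, timeout=3.0):
    """Classify a TCP connection attempt.

    "open"    : something accepted the connection
    "refused" : the host answered but nothing listens on that port
    "timeout" : no answer at all, typically a firewall dropping packets
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except ConnectionRefusedError:
        return "refused"
    except (socket.timeout, TimeoutError):
        return "timeout"
    except OSError as exc:
        # DNS failures, unreachable networks and similar errors.
        return "error: " + str(exc)


# Example use, run from the master:
# for node in ("k8s-node1", "k8s-node2"):
#     print(node, probe_tcp(node, 10250))
```

A "timeout" result for a node whose kubelet is listening points at filtering between the master and that node, while "refused" would point at the kubelet itself.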