kube-dns getsockopt no route to host - kubernetes

I'm struggling to understand how to correctly configure kube-dns with flannel on kubernetes 1.10 and containerd as the CRI.
kube-dns fails to run, with the following error:
kubectl -n kube-system logs kube-dns-595fdb6c46-9tvn9 -c kubedns
I0424 14:56:34.944476 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:35.444469 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:189: Failed to list *v1.Endpoints: Get https://10.96.0.1:443/api/v1/endpoints?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
I0424 14:56:35.944444 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.444462 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
I0424 14:56:36.944507 1 dns.go:219] Waiting for [endpoints services] to be initialized from apiserver...
F0424 14:56:37.444434 1 dns.go:209] Timeout waiting for initialization
kubectl -n kube-system describe pod kube-dns-595fdb6c46-9tvn9
Type Reason Age From Message
---- ------ ---- ---- -------
Warning Unhealthy 47m (x181 over 3h) kubelet, worker1 Readiness probe failed: Get http://10.244.0.2:8081/readiness: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 27m (x519 over 3h) kubelet, worker1 Back-off restarting failed container
Normal Killing 17m (x44 over 3h) kubelet, worker1 Killing container with id containerd://dnsmasq:Container failed liveness probe.. Container will be killed and recreated.
Warning Unhealthy 12m (x178 over 3h) kubelet, worker1 Liveness probe failed: Get http://10.244.0.2:10054/metrics: net/http: request canceled while waiting for connection (Client.Timeout exceeded while awaiting headers)
Warning BackOff 2m (x855 over 3h) kubelet, worker1 Back-off restarting failed container
There is indeed no route to the 10.96.0.1 endpoint:
ip route
default via 10.240.0.254 dev ens160
10.240.0.0/24 dev ens160 proto kernel scope link src 10.240.0.21
10.244.0.0/24 via 10.244.0.0 dev flannel.1 onlink
10.244.0.0/16 dev cni0 proto kernel scope link src 10.244.0.1
10.244.1.0/24 via 10.244.1.0 dev flannel.1 onlink
10.244.2.0/24 via 10.244.2.0 dev flannel.1 onlink
10.244.4.0/24 via 10.244.4.0 dev flannel.1 onlink
10.244.5.0/24 via 10.244.5.0 dev flannel.1 onlink
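For reference, here is a small sketch of how a longest-prefix route lookup would treat the table above. The route list is copied from the `ip route` output; the `lookup` helper is hypothetical and only models prefix matching, not the kernel's full routing logic. It suggests that 10.96.0.1 is not missing from the table but falls through to the default route, which fits with service IPs being virtual addresses that kube-proxy rules are expected to rewrite before routing.

```python
import ipaddress

# Routes copied from the `ip route` output above: (destination, device).
ROUTES = [
    ("0.0.0.0/0", "ens160"),        # default via 10.240.0.254
    ("10.240.0.0/24", "ens160"),
    ("10.244.0.0/24", "flannel.1"),
    ("10.244.0.0/16", "cni0"),
    ("10.244.1.0/24", "flannel.1"),
    ("10.244.2.0/24", "flannel.1"),
    ("10.244.4.0/24", "flannel.1"),
    ("10.244.5.0/24", "flannel.1"),
]


def lookup(dest):
    """Return (prefix, device) of the most specific route matching dest."""
    addr = ipaddress.ip_address(dest)
    matches = []
    for cidr, dev in ROUTES:
        net = ipaddress.ip_network(cidr)
        if addr in net:
            matches.append((net, dev))
    # Longest prefix wins, as in a normal routing decision.
    net, dev = max(matches, key=lambda m: m[0].prefixlen)
    return str(net), dev


print(lookup("10.96.0.1"))    # ('0.0.0.0/0', 'ens160')
print(lookup("10.244.1.5"))   # ('10.244.1.0/24', 'flannel.1')
```

If this model is right, adding a static route for the service range would only change where the untranslated packet is sent; it would not make the virtual IP answer.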
What is responsible for configuring the cluster service address range and associated routes? Is it the container runtime, the overlay network (flannel in this case), or something else? Where should it be configured?
The 10-containerd-net.conflist configures the bridge between the host and my pod network. Can the service network be configured here too?
cat /etc/cni/net.d/10-containerd-net.conflist
{
"cniVersion": "0.3.1",
"name": "containerd-net",
"plugins": [
{
"type": "bridge",
"bridge": "cni0",
"isGateway": true,
"ipMasq": true,
"promiscMode": true,
"ipam": {
"type": "host-local",
"subnet": "10.244.0.0/16",
"routes": [
{ "dst": "0.0.0.0/0" }
]
}
},
{
"type": "portmap",
"capabilities": {"portMappings": true}
}
]
}
Edit:
Just came across this from 2016:
As of a few weeks ago (I forget the release but it was a 1.2.x where x
!= 0) (#24429) we fixed the routing such that any traffic that arrives
at a node destined for a service IP will be handled as if it came to a
node port. This means you should be able to set up static routes for
your service cluster IP range to one or more nodes and the nodes will
act as bridges. This is the same trick most people do with flannel to
bridge the overlay.
It's imperfect but it works. In the future we will need to get more
precise with the routing if you want optimal behavior (i.e. not losing
the client IP), or we will see more non-kube-proxy implementations of
services.
Is that still relevant? Do I need to set up a static route for the service CIDR? Or is the issue actually with kube-proxy rather than flannel or containerd?
My flannel configuration:
cat /etc/cni/net.d/10-flannel.conflist
{
"name": "cbr0",
"plugins": [
{
"type": "flannel",
"delegate": {
"hairpinMode": true,
"isDefaultGateway": true
}
},
{
"type": "portmap",
"capabilities": {
"portMappings": true
}
}
]
}
And kube-proxy:
[Unit]
Description=Kubernetes Kube Proxy
Documentation=https://github.com/kubernetes/kubernetes
[Service]
ExecStart=/usr/local/bin/kube-proxy \
--cluster-cidr=10.244.0.0/16 \
--feature-gates=SupportIPVSProxyMode=true \
--ipvs-min-sync-period=5s \
--ipvs-sync-period=5s \
--ipvs-scheduler=rr \
--kubeconfig=/etc/kubernetes/kube-proxy.conf \
--logtostderr=true \
--master=https://192.168.160.1:6443 \
--proxy-mode=ipvs \
--v=2
Restart=on-failure
RestartSec=5
[Install]
WantedBy=multi-user.target
Edit:
Having looked at the kube-proxy debugging steps, it appears that kube-proxy cannot contact the master. I suspect this is a large part of the problem. I have 3 controller/master nodes behind a HAProxy loadbalancer, which is bound to 192.168.160.1:6443 and forwards round robin to each of the masters on 10.240.0.1[1|2|3]:6443. This can be seen in the output/configs above.
In kube-proxy.service, I have specified --master=192.168.160.1:6443. Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?

There are two components to this answer, one about running kube-proxy and one about where those :443 URLs are coming from.
First, about kube-proxy: please don't run kube-proxy as a systemd service like that. It is designed to be launched by kubelet in the cluster so that the SDN addresses behave rationally, since they are effectively "fake" addresses. By running kube-proxy outside the control of kubelet, all kinds of weird things are going to happen unless you expend a huge amount of energy to replicate the way that kubelet configures its subordinate docker containers.
Now, about that :443 URL:
E0424 14:56:35.815863 1 reflector.go:201] k8s.io/dns/pkg/dns/dns.go:192: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=0: dial tcp 10.96.0.1:443: getsockopt: no route to host
...
Why are connections being attempted to port 443? Can I change this - there doesn't seem to be a port flag? Does it need to be port 443 for some reason?
That 10.96.0.1 is from the Service CIDR of your cluster, which is (and should be) separate from the Pod CIDR, which in turn should be separate from the Nodes' subnets, etc. The .1 address of the cluster's Service CIDR is reserved for (or at least traditionally allocated to) the kubernetes.default.svc.cluster.local Service, whose single Service port is 443.
I'm not super sure why the --master flag doesn't supersede the value in /etc/kubernetes/kube-proxy.conf but since that file is very clearly only supposed to be used by kube-proxy, why not just update the value in the file to remove all doubt?
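As a quick illustration of the ".1 of the Service CIDR" point, here is a minimal sketch that computes the first address of a service range. The range 10.96.0.0/12 is an assumption (it is the kubeadm default and is consistent with the 10.96.0.1 in the log); substitute whatever --service-cluster-ip-range your apiserver actually uses. The helper name is hypothetical.

```python
import ipaddress


def first_service_ip(service_cidr):
    """Return the first usable address of the service range as a string."""
    net = ipaddress.ip_network(service_cidr)
    return str(net.network_address + 1)


# Assumed service CIDR (kubeadm default); not stated in the question.
print(first_service_ip("10.96.0.0/12"))   # 10.96.0.1
```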

Related

Kubernetes: Calico not working on remote worker, local ok

I set up a Kubernetes cluster with Calico.
The setup is "simple"
1x master (local network, ok)
1x node (local network, ok)
1x node (cloud server, not ok)
All debian buster with docker 19.03
On the cloud server the calico pods do not come up:
calico-kube-controllers-token-x:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 47m (x50 over 72m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedMount 43m kubelet MountVolume.SetUp failed for volume "calico-kube-controllers-token-x" : failed to sync secret cache: timed out waiting for the condition
Normal SandboxChanged 3m41s (x78 over 43m) kubelet Pod sandbox changed, it will be killed and re-created.
calico-node-x:
Warning Unhealthy 43m (x5 over 43m) kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp [::1]:9099: connect: connection refused
Warning Unhealthy 14m (x77 over 43m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning BackOff 4m26s (x115 over 39m) kubelet Back-off restarting failed container
My guess is that something is wrong with the IP/network configuration, but I could not figure out what.
The required ports (k8s and BGP) are forwarded from the router; I also tried connecting the master directly to the internet.
--control-plane-endpoint is a hostname and is publicly resolvable.
Calico is using BGP peering (using the public IP as peer).
This entry worries me the most:
kubectl get --raw /api displays the local IP.
I tried to find a way to change this to the public IP of the master, without success.
Anyone got a clue what to try next?
After spending more time on analysis, the problem turned out to be that the API address distributed to the nodes was the local IP, not the DNS name.
I created a VPN with WireGuard from the cloud node to the local master, so that the local IP of the master is reachable from the cloud node.
I don't know if that is the cleanest solution, but it works.
Run this command to verify whether the IP_AUTODETECTION_METHOD environment variable has been set in the calico-node DaemonSet:
kubectl get daemonset/calico-node -n kube-system --output json | jq '.spec.template.spec.containers[].env[] | select(.name | startswith("IP"))'
Run this command on each of your k8s nodes to find the valid network interface:
ifconfig
Explicitly set the IP_AUTODETECTION_METHOD environment variable to make sure calico-node uses the correct network interface on each K8s node:
kubectl set env daemonset/calico-node -n kube-system IP_AUTODETECTION_METHOD=interface=en.*
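Before applying a pattern like en.*, it may help to check which interface names it would select. The sketch below assumes the value after interface= is treated as an unanchored regular expression; that is my assumption about Calico's matching, so verify it against the Calico documentation for your version. The interface names are made-up examples, not taken from the question.

```python
import re


def matching_interfaces(pattern, names):
    """Return the interface names that the regular expression matches."""
    rx = re.compile(pattern)
    # re.search is unanchored; this mirrors the assumption stated above.
    return [name for name in names if rx.search(name)]


# Hypothetical interface names as they might appear in `ifconfig` output.
names = ["lo", "eth0", "ens192", "enp0s8", "docker0", "wg0"]
print(matching_interfaces("en.*", names))   # ['ens192', 'enp0s8']
```

If the node's real interface is eth0 or a WireGuard device such as wg0, the pattern has to be adjusted accordingly.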

Kubernetes pods cannot communicate outbound

I have installed Kubernetes v1.13.10 on a group of VMs running CentOS 7. When I deploy pods, they can connect to one another but cannot connect to anything outside of the cluster. The CoreDNS pods have these errors in the log:
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:48105->10.20.10.52:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:49098->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. AAAA: unreachable backend: read udp 172.21.0.33:53113->10.20.10.51:53: i/o timeout
[ERROR] plugin/errors: 2 app.harness.io.xentaurs.com. A: unreachable backend: read udp 172.21.0.33:39648->10.20.10.51:53: i/o timeout
The IPs 10.20.10.51 and 10.20.10.52 are the internal DNS servers and are reachable from the nodes. I did a Wireshark capture from the DNS servers, and I see the traffic is coming in from the CoreDNS pod IP address 172.21.0.33. There would be no route for the DNS servers to get back to that IP as it isn't routable outside of the Kubernetes cluster.
My understanding is that an iptables rule should NAT the pod IPs to the address of the node when a pod is trying to communicate outbound (correct?). Below is the POSTROUTING chain in iptables:
[root@kube-aci-1 ~]# iptables -t nat -L POSTROUTING -v --line-number
Chain POSTROUTING (policy ACCEPT 23 packets, 2324 bytes)
num pkts bytes target prot opt in out source destination
1 1990 166K KUBE-POSTROUTING all -- any any anywhere anywhere /* kubernetes postrouting rules */
2 0 0 MASQUERADE all -- any ens192.152 172.21.0.0/16 anywhere
Line 1 was added by kube-proxy and line 2 was a line I manually added to try to nat anything coming from the pod subnet 172.21.0.0/16 to the node interface ens192.152, but that didn't work.
Here's the kube-proxy logs:
[root@kube-aci-1 ~]# kubectl logs kube-proxy-llq22 -n kube-system
W1117 16:31:59.225870 1 proxier.go:498] Failed to load kernel module ip_vs with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.232006 1 proxier.go:498] Failed to load kernel module ip_vs_rr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.233727 1 proxier.go:498] Failed to load kernel module ip_vs_wrr with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.235700 1 proxier.go:498] Failed to load kernel module ip_vs_sh with modprobe. You can ignore this message when kube-proxy is running inside container without mounting /lib/modules
W1117 16:31:59.255278 1 server_others.go:296] Flag proxy-mode="" unknown, assuming iptables proxy
I1117 16:31:59.289360 1 server_others.go:148] Using iptables Proxier.
I1117 16:31:59.296021 1 server_others.go:178] Tearing down inactive rules.
I1117 16:31:59.324352 1 server.go:484] Version: v1.13.10
I1117 16:31:59.335846 1 conntrack.go:52] Setting nf_conntrack_max to 131072
I1117 16:31:59.336443 1 config.go:102] Starting endpoints config controller
I1117 16:31:59.336466 1 controller_utils.go:1027] Waiting for caches to sync for endpoints config controller
I1117 16:31:59.336493 1 config.go:202] Starting service config controller
I1117 16:31:59.336499 1 controller_utils.go:1027] Waiting for caches to sync for service config controller
I1117 16:31:59.436617 1 controller_utils.go:1034] Caches are synced for service config controller
I1117 16:31:59.436739 1 controller_utils.go:1034] Caches are synced for endpoints config controller
I have tried flushing the iptables nat table as well as restarted kube-proxy on all nodes, but the problem still persisted. Any clues in the output above, or thoughts on further troubleshooting?
Output of kubectl get nodes:
[root@kube-aci-1 ~]# kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kube-aci-1 Ready master 85d v1.13.10 10.10.52.217 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
kube-aci-2 Ready <none> 85d v1.13.10 10.10.52.218 <none> CentOS Linux 7 (Core) 3.10.0-957.el7.x86_64 docker://1.13.1
It turns out that, with the CNI in use, the pod subnet has to be routable on the external network if pods need to communicate outbound. I made the subnet routable on the external network and the pods can now communicate outbound.
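The Wireshark observation from the question can be expressed as a small check: if the external server sees a source address inside the pod subnet, the packet left the node without being masqueraded, so replies need a route back to that subnet. This is only a sketch; the pod CIDR and addresses come from the question, and the helper name is hypothetical.

```python
import ipaddress

# Pod subnet from the question.
POD_CIDR = ipaddress.ip_network("172.21.0.0/16")


def was_masqueraded(observed_source):
    """True if the source seen by the external server is not a pod IP."""
    return ipaddress.ip_address(observed_source) not in POD_CIDR


# Source seen in the capture on the DNS servers: still the CoreDNS pod IP.
print(was_masqueraded("172.21.0.33"))    # False
# What the source would look like after SNAT to a node address.
print(was_masqueraded("10.10.52.217"))   # True
```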

Kube-state-metrics error: Failed to create client: ... i/o timeout

I'm running Kubernetes in virtual machines and going through the basic tutorials, currently Add logging and metrics to the PHP / Redis Guestbook example. I'm trying to install kube-state-metrics:
git clone https://github.com/kubernetes/kube-state-metrics.git kube-state-metrics
kubectl create -f kube-state-metrics/kubernetes
but it fails.
kubectl describe pod --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7
...
Warning Unhealthy 28m (x8 over 30m) kubelet, kubernetes-node1 Readiness probe failed: Get http://192.168.129.102:8080/healthz: dial tcp 192.168.129.102:8080: connect: connection refused
kubectl logs --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7 -c kube-state-metrics
I0514 17:29:26.980707 1 main.go:85] Using default collectors
I0514 17:29:26.980774 1 main.go:93] Using all namespace
I0514 17:29:26.980780 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0514 17:29:26.980800 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0514 17:29:26.983504 1 main.go:169] Testing communication with server
F0514 17:29:56.984025 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
I'm unsure if this 10.96.0.1 IP is correct. My virtual machines are in a bridged network 10.10.10.0/24 and a host-only network 192.168.59.0/24. When initializing Kubernetes I used the argument --pod-network-cidr=192.168.0.0/16 so that's one more IP range that I'd expect. But 10.96.0.1 looks unfamiliar.
I'm new to Kubernetes, just doing the basic tutorials, so I don't know what to do now. How to fix it or investigate further?
EDIT - additional info:
kubectl get nodes -o wide
NAME STATUS ROLES AGE VERSION INTERNAL-IP EXTERNAL-IP OS-IMAGE KERNEL-VERSION CONTAINER-RUNTIME
kubernetes-master Ready master 15d v1.14.1 10.10.10.11 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node1 Ready <none> 15d v1.14.1 10.10.10.5 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
kubernetes-node2 Ready <none> 15d v1.14.1 10.10.10.98 <none> Ubuntu 18.04.2 LTS 4.15.0-48-generic docker://18.9.2
The command I used to initialize the cluster:
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=192.168.0.0/16
The reason for this is probably that the Pod network overlaps with the Node network: you set the Pod network CIDR to 192.168.0.0/16, which contains your host-only network 192.168.59.0/24.
To solve this you can change the pod network CIDR to 192.168.0.0/24 (not recommended, as this gives you only 256 addresses for your pod networking).
You can also use a different range for Calico. If you want to do it on a running cluster here is an instruction.
Another way I tried:
edit the Calico manifest to use a different range (for example 10.0.0.0/8), run sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=10.0.0.0/8, and apply the manifest after the init.
Another option is to use a different CNI such as Flannel (which uses 10.244.0.0/16).
You can find more information about ranges of CNI plugins here.
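Whichever range you pick, it is worth checking it against every network the nodes already use. The sketch below does that with Python's ipaddress module. The node networks are the two from the question; the service CIDR 10.96.0.0/12 is an assumption (the kubeadm default, consistent with the 10.96.0.1 in the error). The `conflicts` helper is hypothetical.

```python
import ipaddress

# Networks from the question (bridged and host-only).
NODE_NETWORKS = ["10.10.10.0/24", "192.168.59.0/24"]
# Assumed service CIDR (kubeadm default); not stated in the question.
SERVICE_CIDR = "10.96.0.0/12"


def conflicts(pod_cidr):
    """Return the existing networks that overlap the candidate pod CIDR."""
    candidate = ipaddress.ip_network(pod_cidr)
    existing = NODE_NETWORKS + [SERVICE_CIDR]
    return [n for n in existing if candidate.overlaps(ipaddress.ip_network(n))]


print(conflicts("192.168.0.0/16"))   # ['192.168.59.0/24']
print(conflicts("10.244.0.0/16"))    # []
print(conflicts("10.0.0.0/8"))       # ['10.10.10.0/24', '10.96.0.0/12']

# Size of the /24 alternative mentioned above.
print(ipaddress.ip_network("192.168.0.0/24").num_addresses)   # 256
```

Under these assumptions 10.0.0.0/8 would itself collide with the bridged network and the service range, so a narrower range such as 10.244.0.0/16 looks like the safer choice here.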

Coredns in CrashLoopBackOff (kubernetes 1.11)

I'm trying to install Kubernetes on an Ubuntu 16.04 VM, following the instructions at https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/ and using Weave as my pod network add-on.
I'm seeing an issue similar to "coredns pods have CrashLoopBackOff or Error state", but I didn't see a solution there, and the versions I'm using are different:
kubeadm 1.11.4-00
kubectl 1.11.4-00
kubelet 1.11.4-00
kubernetes-cni 0.6.0-00
Docker version 1.13.1-cs8, build 91ca5f2
weave script 2.5.0
weave 2.5.0
I'm running behind a corporate firewall, so I set my proxy variables, then ran kubeadm init as follows:
# echo $http_proxy
http://135.28.13.11:8080
# echo $https_proxy
http://135.28.13.11:8080
# echo $no_proxy
127.0.0.1,135.21.27.139,135.0.0.0/8,10.96.0.0/12,10.32.0.0/12
# kubeadm init --pod-network-cidr=10.32.0.0/12
# kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
# kubectl taint nodes --all node-role.kubernetes.io/master-
Both coredns pods stay in CrashLoopBackOff
# kubectl get pods --all-namespaces -o wide
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default hostnames-674b556c4-2b5h2 1/1 Running 0 5h 10.32.0.6 mtpnjvzonap001 <none>
default hostnames-674b556c4-4bzdj 1/1 Running 0 5h 10.32.0.5 mtpnjvzonap001 <none>
default hostnames-674b556c4-64gx5 1/1 Running 0 5h 10.32.0.4 mtpnjvzonap001 <none>
kube-system coredns-78fcdf6894-s7rvx 0/1 CrashLoopBackOff 18 1h 10.32.0.7 mtpnjvzonap001 <none>
kube-system coredns-78fcdf6894-vxwgv 0/1 CrashLoopBackOff 80 6h 10.32.0.2 mtpnjvzonap001 <none>
kube-system etcd-mtpnjvzonap001 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-apiserver-mtpnjvzonap001 1/1 Running 0 1h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-controller-manager-mtpnjvzonap001 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-proxy-2c4tx 1/1 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
kube-system kube-scheduler-mtpnjvzonap001 1/1 Running 0 1h 135.21.27.139 mtpnjvzonap001 <none>
kube-system weave-net-bpx22 2/2 Running 0 6h 135.21.27.139 mtpnjvzonap001 <none>
coredns pods have this entry in their log
E1114 20:59:13.848196 1 reflector.go:205]
github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed
to list *v1.Service: Get
https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0:
dial tcp 10.96.0.1:443: i/o timeout
This suggests to me that coredns cannot access apiserver pod using its cluster IP:
# kubectl describe svc/kubernetes
Name: kubernetes
Namespace: default
Labels: component=apiserver
provider=kubernetes
Annotations: <none>
Selector: <none>
Type: ClusterIP
IP: 10.96.0.1
Port: https 443/TCP
TargetPort: 6443/TCP
Endpoints: 135.21.27.139:6443
Session Affinity: None
Events: <none>
I also went through the troubleshooting steps at https://kubernetes.io/docs/tasks/debug-application-cluster/troubleshooting/
I created a busybox pod for testing
I created the hostnames deployment successfully
I exposed the hostnames deployment successfully
From the busybox pod, I accessed the hostnames service by its cluster IP successfully
from the node, I accessed the hostnames service by its cluster IP successfully
So in short, I created the hostnames service which had a cluster IP in 10.96.0.0/12 space (as expected), and it works, but for some reason, pods cannot access the apiserver's cluster IP of 10.96.0.1, though from the node I can access 10.96.0.1:
# wget --no-check-certificate https://10.96.0.1/hello
--2018-11-14 21:44:25-- https://10.96.0.1/hello
Connecting to 10.96.0.1:443... connected.
WARNING: cannot verify 10.96.0.1's certificate, issued by ‘CN=kubernetes’:
Unable to locally verify the issuer's authority.
HTTP request sent, awaiting response... 403 Forbidden
2018-11-14 21:44:25 ERROR 403: Forbidden.
Some other things I checked, based on advice from others who reported a similar problem:
# sysctl net.ipv4.conf.all.forwarding
net.ipv4.conf.all.forwarding = 1
# sysctl net.bridge.bridge-nf-call-iptables
net.bridge.bridge-nf-call-iptables = 1
# iptables-save | egrep ':INPUT|:OUTPUT|:POSTROUTING|:FORWARD'
:INPUT ACCEPT [0:0]
:OUTPUT ACCEPT [11:692]
:POSTROUTING ACCEPT [11:692]
:INPUT ACCEPT [1697:364811]
:FORWARD ACCEPT [0:0]
:OUTPUT ACCEPT [1652:363693]
# ls -l /usr/sbin/conntrack
-rwxr-xr-x 1 root root 65632 Jan 24 2016 /usr/sbin/conntrack
# systemctl status firewalld
● firewalld.service
Loaded: not-found (Reason: No such file or directory)
Active: inactive (dead)
I checked the log for kube-proxy, did not see any errors.
I also tried deleting coredns pods, apiserver pod; they are recreated (as expected), but the problem remains.
Here's a copy of the log from the weave container
# kubectl logs -n kube-system weave-net-bpx22 weave
DEBU: 2018/11/14 15:56:10.909921 [kube-peers] Checking peer "aa:53:be:75:71:f7" against list &{[]}
Peer not in list; removing persisted data
INFO: 2018/11/14 15:56:11.041807 Command line options: map[name:aa:53:be:75:71:f7 nickname:mtpnjvzonap001 ipalloc-init:consensus=1 ipalloc-range:10.32.0.0/12 db-prefix:/weavedb/weave-net docker-api: expect-npc:true host-root:/host http-addr:127.0.0.1:6784 metrics-addr:0.0.0.0:6782 conn-limit:100 datapath:datapath no-dns:true port:6783]
INFO: 2018/11/14 15:56:11.042230 weave 2.5.0
INFO: 2018/11/14 15:56:11.198348 Bridge type is bridged_fastdp
INFO: 2018/11/14 15:56:11.198372 Communication between peers is unencrypted.
INFO: 2018/11/14 15:56:11.203206 Our name is aa:53:be:75:71:f7(mtpnjvzonap001)
INFO: 2018/11/14 15:56:11.203249 Launch detected - using supplied peer list: [135.21.27.139]
INFO: 2018/11/14 15:56:11.216398 Checking for pre-existing addresses on weave bridge
INFO: 2018/11/14 15:56:11.229313 [allocator aa:53:be:75:71:f7] No valid persisted data
INFO: 2018/11/14 15:56:11.233391 [allocator aa:53:be:75:71:f7] Initialising via deferred consensus
INFO: 2018/11/14 15:56:11.233443 Sniffing traffic on datapath (via ODP)
INFO: 2018/11/14 15:56:11.234120 ->[135.21.27.139:6783] attempting connection
INFO: 2018/11/14 15:56:11.234302 ->[135.21.27.139:49182] connection accepted
INFO: 2018/11/14 15:56:11.234818 ->[135.21.27.139:6783|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.234843 ->[135.21.27.139:49182|aa:53:be:75:71:f7(mtpnjvzonap001)]: connection shutting down due to error: cannot connect to ourself
INFO: 2018/11/14 15:56:11.236010 Listening for HTTP control messages on 127.0.0.1:6784
INFO: 2018/11/14 15:56:11.236424 Listening for metrics requests on 0.0.0.0:6782
INFO: 2018/11/14 15:56:11.990529 [kube-peers] Added myself to peer list &{[{aa:53:be:75:71:f7 mtpnjvzonap001}]}
DEBU: 2018/11/14 15:56:11.995901 [kube-peers] Nodes that have disappeared: map[]
10.32.0.1
135.21.27.139
DEBU: 2018/11/14 15:56:12.075738 registering for updates for node delete events
INFO: 2018/11/14 15:56:41.279799 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/14 20:52:47.025412 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 01:46:32.842792 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 74.125.196.121:443: i/o timeout
INFO: 2018/11/15 09:06:03.624359 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
INFO: 2018/11/15 14:34:02.070893 Error checking version: Get https://checkpoint-api.weave.works/v1/check/weave-net?arch=amd64&flag_docker-version=none&flag_kernel-version=4.4.0-135-generic&flag_kubernetes-cluster-size=1&flag_kubernetes-cluster-uid=ce66cb23-e825-11e8-abc3-525400314503&flag_kubernetes-version=v1.11.4&os=linux&signature=EJdydeNysrC7LC5xAJAKyDvxXCvkeWUFzepdk3QDfr0%3D&version=2.5.0: dial tcp 172.217.9.147:443: i/o timeout
Here are the events for the 2 coredns pods
# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-6f9q6
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
41m 20h 245 coredns-78fcdf6894-6f9q6.1568eab25f0acb02 Pod spec.containers{coredns} Normal Killing kubelet, mtpnjvzonap001 Killing container with id docker://coredns:Container failed liveness probe.. Container will be killed and recreated.
26m 20h 248 coredns-78fcdf6894-6f9q6.1568ea920f72ddd4 Pod spec.containers{coredns} Normal Pulled kubelet, mtpnjvzonap001 Container image "k8s.gcr.io/coredns:1.1.3" already present on machine
5m 20h 1256 coredns-78fcdf6894-6f9q6.1568eaa1fd9216d2 Pod spec.containers{coredns} Warning Unhealthy kubelet, mtpnjvzonap001 Liveness probe failed: HTTP probe failed with statuscode: 503
1m 19h 2963 coredns-78fcdf6894-6f9q6.1568eb75f2b1af3e Pod spec.containers{coredns} Warning BackOff kubelet, mtpnjvzonap001 Back-off restarting failed container
# kubectl get events -n kube-system --field-selector involvedObject.name=coredns-78fcdf6894-skjwz
LAST SEEN FIRST SEEN COUNT NAME KIND SUBOBJECT TYPE REASON SOURCE MESSAGE
6m 20h 1259 coredns-78fcdf6894-skjwz.1568eaa181fbeffe Pod spec.containers{coredns} Warning Unhealthy kubelet, mtpnjvzonap001 Liveness probe failed: HTTP probe failed with statuscode: 503
1m 19h 2969 coredns-78fcdf6894-skjwz.1568eb7578188f24 Pod spec.containers{coredns} Warning BackOff kubelet, mtpnjvzonap001 Back-off restarting failed container
#
Any help or further troubleshooting steps are welcome
I had the same problem and needed to allow several ports in my firewall: 22, 53, 6443, 6783, 6784, 8285.
I copied the rules from an existing healthy cluster. Probably only 6443, shown above as the target port of the kubernetes service, is required for this error; the others are for other services I run in my cluster.
On Ubuntu this was done with ufw (Uncomplicated Firewall):
ufw allow 22/tcp # allowed for ssh, included in case you had firewall disabled altogether
ufw allow 6443
ufw allow 53
ufw allow 8285
ufw allow 6783
ufw allow 6784
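To see why 6443 matters even though the error mentions port 443, here is a toy model of the translation a service proxy performs. The mapping is copied from the `kubectl describe svc/kubernetes` output in the question; the `resolve` helper is hypothetical and only illustrates the idea, it is not how kube-proxy is implemented.

```python
# ClusterIP:port -> list of backend endpoints, from the question's output.
SERVICES = {
    ("10.96.0.1", 443): [("135.21.27.139", 6443)],
}


def resolve(cluster_ip, port):
    """Return the (ip, port) backends a connection would be rewritten to."""
    return SERVICES.get((cluster_ip, port), [])


# CoreDNS dials 10.96.0.1:443, but the packet that reaches the host
# firewall is addressed to the apiserver endpoint on port 6443.
print(resolve("10.96.0.1", 443))   # [('135.21.27.139', 6443)]
```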

Kubernetes unable to retrieve logs

I have a kubeadm cluster deployed in a CentOS VM. While trying to deploy the ingress controller following the GitHub instructions, I noticed that I'm unable to see logs:
kubectl logs -n ingress-nginx nginx-ingress-controller-697f7c6ddb-x9xkh --previous
Error from server: Get https://192.168.56.34:10250/containerLogs/ingress-nginx/nginx-ingress-controller-697f7c6ddb-x9xkh/nginx-ingress-controller?previous=true: dial tcp 192.168.56.34:10250: getsockopt: connection timed out
In 192.168.56.34 (node1) netstat returns:
tcp6 0 0 :::10250 :::* LISTEN 1068/kubelet
In fact, I'm unable to see any logs, regardless of the status of the pod.
I disabled both firewalld and SELinux.
I used a proxy to let Kubernetes download images; I have now removed the proxy.
When navigating to the URL in the error above, I get Forbidden (user=system:anonymous, verb=get, resource=nodes, subresource=proxy)
I'm also able to fetch my nodes:
kubectl get node
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 32d v1.9.3
k8s-node1 Ready <none> 30d v1.9.3
k8s-node2 NotReady <none> 32d v1.9.3
getsockopt: connection timed out
is 99.99999% a firewall issue. If it were "connection refused", then showing the output of netstat would be meaningful, but (as you can see) kubelet is listening on that port just fine. It's the network between the machine that is running kubectl and "192.168.56.34" that is not configured to allow the traffic.
The apiserver expects that everyone who would want to view logs (or use kubectl exec) can reach that port on every Node in the cluster; so be sure you don't just fix the firewall rule(s) for that one Node -- fix it for all of them.
This message is from the apiserver running on your master. The command kubectl logs, running on your local machine, fetches logs via the apiserver. So the error message reveals a firewall misconfiguration between the master and the node(s) (port 10250)
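To tell the two failure modes apart from the master, a small TCP probe can help. This is a sketch using only the Python standard library; the `probe` helper is hypothetical, and the interpretation in the comments ("timeout" usually means packets are dropped, "refused" means nothing is listening) is the usual rule of thumb rather than a guarantee.

```python
import socket


def probe(host, port, timeout=3.0):
    """Try a TCP connection and classify the result."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"
    except (socket.timeout, TimeoutError):
        # No answer at all: typically a firewall silently dropping packets.
        return "timeout"
    except ConnectionRefusedError:
        # The host answered with a reset: reachable, but nothing listening.
        return "refused"
    except OSError as exc:
        # For example "No route to host" or "Network is unreachable".
        return "error: {}".format(exc)


# Example, run on the master against the kubelet port of node1:
#   probe("192.168.56.34", 10250)
```

Running it on the master rather than on the workstation matches the path that kubectl logs actually takes through the apiserver.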