MetalLB install & configuration on Kubernetes

I installed and configured MetalLB on my Kubernetes cluster, then tried to create a LoadBalancer type service.
(A NodePort type service works fine.)
But the EXTERNAL-IP stays in pending status.
I get the error below in the MetalLB controller pod. Can somebody help me resolve this issue?
I also have a similar issue when I try to install the nginx ingress-controller.
# kubectl logs controller-65db86ddc6-4hkdn -n metallb-system
{"branch":"HEAD","caller":"main.go:142","commit":"v0.9.5","msg":"MetalLB controller starting version 0.9.5 (commit v0.9.5, branch HEAD)","ts":"2021-03-21T09:30:28.244151786Z","version":"0.9.5"}
I0321 09:30:58.442987 1 trace.go:81] Trace[1298498081]: "Reflector pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98 ListAndWatch" (started: 2021-03-21 09:30:28.44033291 +0000 UTC m=+1.093749549) (total time: 30.001755286s):
Trace[1298498081]: [30.001755286s] [30.001755286s] END
E0321 09:30:58.443118 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.Service: Get https://10.96.0.1:443/api/v1/services?limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0321 09:30:58.443263 1 trace.go:81] Trace[2019727887]: "Reflector pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98 ListAndWatch" (started: 2021-03-21 09:30:28.342686736 +0000 UTC m=+0.996103363) (total time: 30.100527846s):
Trace[2019727887]: [30.100527846s] [30.100527846s] END
E0321 09:30:58.443298 1 reflector.go:125] pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98: Failed to list *v1.ConfigMap: Get https://10.96.0.1:443/api/v1/namespaces/metallb-system/configmaps?fieldSelector=metadata.name%3Dconfig&limit=500&resourceVersion=0: dial tcp 10.96.0.1:443: i/o timeout
I0321 09:31:29.444994 1 trace.go:81] Trace[1427131847]: "Reflector pkg/mod/k8s.io/client-go#v0.0.0-20190620085101-78d2af792bab/tools/cache/reflector.go:98 ListAndWatch" (started: 2021-03-21 09:30:59.443509127 +0000 UTC m=+32.096925747) (total time: 30.001450692s):
Trace[1427131847]: [30.001450692s] [30.001450692s] END
Below is my environment.
# kubectl version --short
Client Version: v1.20.4
Server Version: v1.20.4
Calico CNI is installed.
# Installing the Calico network plug-in for the cluster network
kubectl apply -f https://docs.projectcalico.org/manifests/calico.yaml
MetalLB 0.9.5 is installed and configured.
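For reference, MetalLB 0.9.x reads its address pools from a ConfigMap named config in the metallb-system namespace; a minimal Layer 2 example looks roughly like this (the address range below is only a placeholder, not my actual pool):
apiVersion: v1
kind: ConfigMap
metadata:
  namespace: metallb-system
  name: config
data:
  config: |
    address-pools:
    - name: default
      protocol: layer2
      addresses:
      - 192.168.64.200-192.168.64.210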
Access from the node works, as shown below.
# curl -k https://10.96.0.1:443/api/v1/namespaces/metallb-system/configmaps?fieldSelector=metadata.name%3Dconfig&limit=500&resourceVersion=0
{
"kind": "Status",
"apiVersion": "v1",
"metadata": {
},
"status": "Failure",
"message": "configmaps \"config\" is forbidden: User \"system:anonymous\" cannot list resource \"configmaps\" in API group \"\" in the namespace \"metallb-system\"",
"reason": "Forbidden",
"details": {
"name": "config",
"kind": "configmaps"
},
"code": 403
}
But from the pod it is not accessible, as shown below. I think it should work.
# kubectl -n metallb-system exec -it controller-65db86ddc6-4hkdn /bin/sh
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if5: <BROADCAST,MULTICAST,UP,LOWER_UP,M-DOWN> mtu 1480 qdisc noqueue state UP
link/ether 76:54:44:f1:8f:50 brd ff:ff:ff:ff:ff:ff
inet 192.168.41.146/32 brd 192.168.41.146 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::7454:44ff:fef1:8f50/64 scope link
valid_lft forever preferred_lft forever
/bin $ wget --no-check-certificate https://10.96.0.1:443/
Connecting to 10.96.0.1:443 (10.96.0.1:443)
^C
/bin $
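As an additional check, the same API endpoint can be queried from inside the pod with the mounted service-account token (a rough sketch, assuming the image's wget supports --header; in my case even the plain TCP connection above hangs before any HTTP response):
TOKEN=$(cat /var/run/secrets/kubernetes.io/serviceaccount/token)
wget --no-check-certificate -qO- \
  --header "Authorization: Bearer $TOKEN" \
  https://10.96.0.1:443/api/v1/namespaces/metallb-system/configmaps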

I changed my k8s cluster configuration as below. Now it works.
kubeadm init --apiserver-advertise-address=192.168.64.150 --apiserver-cert-extra-sans=192.168.64.150 --node-name kmaster --pod-network-cidr=10.10.0.0/16
cat /etc/hosts
192.168.64.150 kmaster
192.168.64.151 kworker1
And I changed the Calico configuration as below.
- name: CALICO_IPV4POOL_CIDR
  value: "10.10.0.0/16"   ### same CIDR as --pod-network-cidr

What is the output of the ping 10.96.0.1 command below from your MetalLB controller pod?
kubectl -n metallb-system exec controller-65db86ddc6-4hkdn -- ping 10.96.0.1
Please also provide the output of the commands below:
kubectl -n metallb-system exec controller-65db86ddc6-4hkdn -- ip r
kubectl -n metallb-system exec controller-65db86ddc6-4hkdn -- ip n
If you are able to ping but not able to wget, then it is a firewall issue.
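If it does turn out to be a firewall issue, a quick first check on the node looks something like this (firewalld shown as an example; adjust for your distribution):
sudo iptables -L FORWARD -n --line-numbers | head   # FORWARD should not default to DROP for pod traffic
sudo firewall-cmd --list-all                        # only if firewalld is running
In general the API server port (6443), the CNI's own ports, and forwarding between the pod and service CIDRs need to be allowed between nodes.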
URL https://kubernetes.io/docs/tasks/debug-application-cluster/debug-service/ in Kubernetes Documentation covers all scenarios why a service may not be working.

Related

Single node Microk8s multus master interface cannot be reached

I have a single node Microk8s with calico.
I have deployed Multus successfully and I can create pods with the second network interface; it is created successfully in the pod, as I can see the interfaces and the IP addresses correctly assigned. The pods can reach each other on the second interface, but I cannot reach the host's eno8 (IP address 10.128.1.244), the Multus master interface, from the pods. I also cannot reach the pods from outside.
I am new to this kind of deployment and need help figuring out where the problem is.
Thanks.
Here is some details about my environment:
ubuntu@test:$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
test Ready <none> 9d v1.21.4-3+e5758f73ed2a04
ip a on HOST
ubuntu@test:$ ip a
8: eno8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 3c:ec:ef:6c:2c:ff brd ff:ff:ff:ff:ff:ff
inet 10.128.1.244/24 brd 10.128.1.255 scope global eno8
valid_lft forever preferred_lft forever
inet6 fe80::3eec:efff:fe6c:2cff/64 scope link
valid_lft forever preferred_lft forever
ubuntu@test:$ kubectl get pods --all-namespaces | grep -i multus
kube-system kube-multus-ds-amd64-dz42s 1/1 Running 0 175m
Network Deployment:
apiVersion: "k8s.cni.cncf.io/v1"
kind: NetworkAttachmentDefinition
metadata:
name: test-network
spec:
config: '{
"cniVersion": "{{ .Values.Multus_cniVersion}}",
"name": "test-network",
"type": "{{ .Values.Multus_driverType}}",
"master": "{{ .Values.Multus_master_interface}}",
"mode": "{{ .Values.Multus_interface_mode}}",
"ipam": {
"type": "{{ .Values.Multus_ipam_type}}",
"subnet": "{{ .Values.Multus_ipam_subnet}}",
"rangeStart": "{{ .Values.Multus_ipam_rangeStart}}",
"rangeEnd": "{{ .Values.Multus_ipam_rangeStop}}",
"routes": [
{ "dst": "{{ .Values.Multus_defaultRoute}}" }
],
"dns": {"nameservers": ["{{ .Values.Multus_DNS}}"]},
"gateway": "{{ .Values.Multus_ipam_gw}}"
}
}'
Multus_cniVersion: 0.3.1
Multus_driverType: macvlan
Multus_master_interface: eno8
Multus_interface_mode: bridge
Multus_ipam_type: host-local
Multus_ipam_subnet: 10.128.1.0/24
Multus_ipam_rangeStart: 10.128.1.147
Multus_ipam_rangeStop: 10.128.1.156
Multus_defaultRoute: 0.0.0.0/0
Multus_DNS: 10.128.1.1
Multus_ipam_gw: 10.128.1.1
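For completeness, pods pick up the second interface by referencing the NetworkAttachmentDefinition in an annotation, roughly like this (pod name and image are just examples):
apiVersion: v1
kind: Pod
metadata:
  name: multus-test
  annotations:
    k8s.v1.cni.cncf.io/networks: test-network
spec:
  containers:
  - name: shell
    image: busybox
    command: ["sleep", "3600"]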
ubuntu@test:$ kubectl get network-attachment-definitions
NAME AGE
test-network 8m39s
Network description:
ubuntu@test:$ kubectl describe network-attachment-definitions.k8s.cni.cncf.io test-network
Name: test-network
Namespace: default
Labels: app.kubernetes.io/managed-by=Helm
Annotations: meta.helm.sh/release-name: test-demo
meta.helm.sh/release-namespace: default
API Version: k8s.cni.cncf.io/v1
Kind: NetworkAttachmentDefinition
Metadata:
Creation Timestamp: 2021-09-24T12:15:08Z
Generation: 1
Managed Fields:
API Version: k8s.cni.cncf.io/v1
Fields Type: FieldsV1
fieldsV1:
f:metadata:
f:annotations:
.:
f:meta.helm.sh/release-name:
f:meta.helm.sh/release-namespace:
f:labels:
.:
f:app.kubernetes.io/managed-by:
f:spec:
.:
f:config:
Manager: Go-http-client
Operation: Update
Time: 2021-09-24T12:15:08Z
Resource Version: 1062851
Self Link: /apis/k8s.cni.cncf.io/v1/namespaces/default/network-attachment-definitions/test-network
UID: c96f3a0f-b30f-4972-9271-6b2871adf299
Spec:
Config: { "cniVersion": "0.3.1", "name": "test-network", "type": "macvlan", "master": "eno8", "mode": "bridge", "ipam": { "type": "host-local", "subnet": "10.128.1.0/24", "rangeStart": "10.128.1.147", "rangeEnd": "10.128.1.156", "routes": [ { "dst": "0.0.0.0/0" } ], "dns": {"nameservers": ["10.128.1.1"]}, "gateway": "10.128.1.1" } }
Events: <none>
ip a in POD
root@test-deployment-6465bdfccc-k2sst:# ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
3: eth0@if505: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether 22:a8:17:13:35:39 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.1.19.149/32 scope global eth0
valid_lft forever preferred_lft forever
inet6 fe80::20a8:17ff:fe13:3539/64 scope link
valid_lft forever preferred_lft forever
4: eth1@if8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP group default
link/ether de:c1:d7:67:08:93 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 10.128.1.149/24 brd 10.128.1.255 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::dcc1:d7ff:fe67:893/64 scope link
valid_lft forever preferred_lft forever
Ping to eno8 in POD
root@test-deployment-6465bdfccc-g8bd4:# ping 10.128.1.244
PING 10.128.1.244 (10.128.1.244) 56(84) bytes of data.
^X^C
--- 10.128.1.244 ping statistics ---
14 packets transmitted, 0 received, 100% packet loss, time 13313ms
Ping to multus gateway
root@test-deployment-6465bdfccc-k2sst:# ping 10.128.1.1
PING 10.128.1.1 (10.128.1.1) 56(84) bytes of data.
From 10.128.1.149 icmp_seq=1 Destination Host Unreachable
From 10.128.1.149 icmp_seq=2 Destination Host Unreachable
From 10.128.1.149 icmp_seq=3 Destination Host Unreachable
From 10.128.1.149 icmp_seq=4 Destination Host Unreachable
From 10.128.1.149 icmp_seq=5 Destination Host Unreachable
From 10.128.1.149 icmp_seq=6 Destination Host Unreachable
^C
--- 10.128.1.1 ping statistics ---
8 packets transmitted, 0 received, +6 errors, 100% packet loss, time 7164ms
pipe 4
Netstat in the POD
root@test-deployment-6465bdfccc-k2sst:# netstat -rn
Kernel IP routing table
Destination Gateway Genmask Flags MSS Window irtt Iface
0.0.0.0 169.254.1.1 0.0.0.0 UG 0 0 0 eth0
10.128.1.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
169.254.1.1 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
ip r in the POD
root@test-deployment-6465bdfccc-g8bd4:# ip r
default via 169.254.1.1 dev eth0
10.128.1.0/24 dev eth1 proto kernel scope link src 10.128.1.149
169.254.1.1 dev eth0 scope link
Your problem probably stems from the fact that MACVLAN interfaces cannot be reached from the same host through the parent (default route) interface. Say your PC has interface eth0 with IP 10.0.0.2 and you use MACVLAN to give a container an interface whose parent is eth0 (or a sub-interface such as eth0.1) with the IP 10.0.0.3: you won't be able to reach services running on 10.0.0.3 from that same host, but you will from another host. To resolve this, either use IPVLAN in Layer 3 mode to get a fully routable plane, or use a sub-interface with 802.1q trunking (but you will need a switch that supports promiscuous mode on the ports to be able to pass VLAN-tagged traffic). Note that you can't simply do port forwarding to reach the container, because MACVLAN separates the communication at the lower layers.
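A commonly used workaround for the host-to-pod case is to create a macvlan sub-interface on the host itself and route the pod addresses through it, so that host and pods no longer talk via the parent interface directly. A sketch, reusing eno8 and the 10.128.1.0/24 range from the question (the .157 address is just an assumed free IP):
# on the host, as root
ip link add macvlan-shim link eno8 type macvlan mode bridge
ip addr add 10.128.1.157/32 dev macvlan-shim
ip link set macvlan-shim up
# route the Multus range (or individual pod IPs) via the shim interface
ip route add 10.128.1.147/32 dev macvlan-shim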

Kubelet - failed to "CreatePodSandbox" for coredns; failed to set bridge addr: could not add ip addr to "cni0": permission denied

EDIT 1
In response to the comments I have included additional information.
$ kubectl get pods --namespace kube-system
NAME READY STATUS RESTARTS AGE
coredns-66bff467f8-lkwfn 0/1 ContainerCreating 0 7m8s
coredns-66bff467f8-pcn6b 0/1 ContainerCreating 0 7m8s
etcd-masternode 1/1 Running 0 7m16s
kube-apiserver-masternode 1/1 Running 0 7m16s
kube-controller-manager-masternode 1/1 Running 0 7m16s
kube-proxy-7zrjn 1/1 Running 0 7m8s
kube-scheduler-masternode 1/1 Running 0 7m16s
More systemd logs
...
Jun 16 16:18:59 masternode kubelet[6842]: E0616 16:18:59.313433 6842 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-pcn6b_kube-system_d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08_0(cc72c59e22145274e47ca417c274af99591d0008baf2bf13364538b7debb57d3): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:18:59 masternode kubelet[6842]: E0616 16:18:59.313512 6842 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "coredns-66bff467f8-pcn6b_kube-system(d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-pcn6b_kube-system_d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08_0(cc72c59e22145274e47ca417c274af99591d0008baf2bf13364538b7debb57d3): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:18:59 masternode kubelet[6842]: E0616 16:18:59.313532 6842 kuberuntime_manager.go:727] createPodSandbox for pod "coredns-66bff467f8-pcn6b_kube-system(d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-pcn6b_kube-system_d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08_0(cc72c59e22145274e47ca417c274af99591d0008baf2bf13364538b7debb57d3): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:18:59 masternode kubelet[6842]: E0616 16:18:59.313603 6842 pod_workers.go:191] Error syncing pod d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08 ("coredns-66bff467f8-pcn6b_kube-system(d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08)"), skipping: failed to "CreatePodSandbox" for "coredns-66bff467f8-pcn6b_kube-system(d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08)" with CreatePodSandboxError: "CreatePodSandbox for pod \"coredns-66bff467f8-pcn6b_kube-system(d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-pcn6b_kube-system_d5fe7a46-c32d-4fa3-b1b3-fe5a28983e08_0(cc72c59e22145274e47ca417c274af99591d0008baf2bf13364538b7debb57d3): failed to set bridge addr: could not add IP address to \"cni0\": permission denied"
Jun 16 16:19:09 masternode kubelet[6842]: E0616 16:19:09.256408 6842 remote_runtime.go:105] RunPodSandbox from runtime service failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-lkwfn_kube-system_f0187bfd-89a2-474c-b843-b00875183c77_0(1aba005509e85f3ea7da3fc48ab789ae3a10ba0ffefc152d1c4edf65693befe2): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:19:09 masternode kubelet[6842]: E0616 16:19:09.256498 6842 kuberuntime_sandbox.go:68] CreatePodSandbox for pod "coredns-66bff467f8-lkwfn_kube-system(f0187bfd-89a2-474c-b843-b00875183c77)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-lkwfn_kube-system_f0187bfd-89a2-474c-b843-b00875183c77_0(1aba005509e85f3ea7da3fc48ab789ae3a10ba0ffefc152d1c4edf65693befe2): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:19:09 masternode kubelet[6842]: E0616 16:19:09.256525 6842 kuberuntime_manager.go:727] createPodSandbox for pod "coredns-66bff467f8-lkwfn_kube-system(f0187bfd-89a2-474c-b843-b00875183c77)" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-lkwfn_kube-system_f0187bfd-89a2-474c-b843-b00875183c77_0(1aba005509e85f3ea7da3fc48ab789ae3a10ba0ffefc152d1c4edf65693befe2): failed to set bridge addr: could not add IP address to "cni0": permission denied
Jun 16 16:19:09 masternode kubelet[6842]: E0616 16:19:09.256634 6842 pod_workers.go:191] Error syncing pod f0187bfd-89a2-474c-b843-b00875183c77 ("coredns-66bff467f8-lkwfn_kube-system(f0187bfd-89a2-474c-b843-b00875183c77)"), skipping: failed to "CreatePodSandbox" for "coredns-66bff467f8-lkwfn_kube-system(f0187bfd-89a2-474c-b843-b00875183c77)" with CreatePodSandboxError: "CreatePodSandbox for pod \"coredns-66bff467f8-lkwfn_kube-system(f0187bfd-89a2-474c-b843-b00875183c77)\" failed: rpc error: code = Unknown desc = failed to create pod network sandbox k8s_coredns-66bff467f8-lkwfn_kube-system_f0187bfd-89a2-474c-b843-b00875183c77_0(1aba005509e85f3ea7da3fc48ab789ae3a10ba0ffefc152d1c4edf65693befe2): failed to set bridge addr: could not add IP address to \"cni0\": permission denied"
... (repeats over and over again)
I have successfully installed Kubernetes 1.18 with CRI-O 1.18 and set up a cluster using kubeadm init --pod-network-cidr=192.168.0.0/16. However, the coredns pods are stuck in "ContainerCreating". I followed the official Kubernetes install instructions.
What I have tried
I tried installing Calico, but that didn't fix it. I also tried manually bringing the cni0 interface UP, but that didn't work either. The problem apparently lies somewhere with the bridged traffic, but I followed the Kubernetes tutorial and enabled that.
In my research of the problem I stumbled upon promising solutions and tutorials, but none of them solved it. (Rancher GitHub issue, CRI-O GitHub page, Project Calico, Kubernetes tutorial)
Firewall-cmd
$ sudo firewall-cmd --state
running
$ sudo firewall-cmd --version
0.7.0
Systemd logs
(Screenshot of the log omitted; a longer excerpt is included under EDIT 1 above.)
uname -r
4.18.0-147.8.1.el8_1.x86_64 (Centos 8)
CRI-O
crio --version
crio version
Version: 1.18.1
GitCommit: 5cbf694c34f8d1af19eb873e39057663a4830635
GitTreeState: clean
BuildDate: 2020-05-25T19:01:44Z
GoVersion: go1.13.4
Compiler: gc
Platform: linux/amd64
Linkmode: dynamic
runc
$ runc --version
runc version spec: 1.0.1-dev
Kubernetes
1.18
Podman version
1.6.4
iptables/nft
I am using nft with the iptables compatibility layer.
$ iptables --version
iptables v1.8.2 (nf_tables)
Provider of host:
Contabo VPS
sysctl
$ sysctl net.bridge
net.bridge.bridge-nf-call-arptables = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.bridge.bridge-nf-call-iptables = 1
net.bridge.bridge-nf-filter-pppoe-tagged = 0
net.bridge.bridge-nf-filter-vlan-tagged = 0
net.bridge.bridge-nf-pass-vlan-input-dev = 0
$ sysctl net.ipv4.ip_forward
net.ipv4.ip_forward = 1
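For reference, these are normally set persistently the way the kubeadm install docs describe, roughly:
cat <<EOF | sudo tee /etc/modules-load.d/k8s.conf
br_netfilter
EOF
cat <<EOF | sudo tee /etc/sysctl.d/k8s.conf
net.bridge.bridge-nf-call-iptables  = 1
net.bridge.bridge-nf-call-ip6tables = 1
net.ipv4.ip_forward                 = 1
EOF
sudo sysctl --system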
selinux disabled
$ cat /etc/sysconfig/selinux
# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
# enforcing - SELinux security policy is enforced.
# permissive - SELinux prints warnings instead of enforcing.
# disabled - No SELinux policy is loaded.
SELINUX=disabled
# SELINUXTYPE= can take one of these three values:
# targeted - Targeted processes are protected,
# minimum - Modification of targeted policy. Only selected processes are protected.
# mls - Multi Level Security protection.
SELINUXTYPE=targeted
ip addr list
$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP group default qlen 1000
link/ether REDACTED brd ff:ff:ff:ff:ff:ff
inet REDACTED scope global noprefixroute eth0
valid_lft forever preferred_lft forever
3: cni0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default qlen 1000
link/ether c6:00:41:85:da:ad brd ff:ff:ff:ff:ff:ff
inet 10.85.0.1/16 brd 10.85.255.255 scope global noprefixroute cni0
valid_lft forever preferred_lft forever
7: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.249.128/32 brd 192.168.249.128 scope global tunl0
valid_lft forever preferred_lft forever
Holy Hand Grenade of Antioch! I finally fixed it! It only took me, what, about a bazillion years and a restless night. Sweet victory! Well... ehm. On to the solution.
I finally understand the comments by @Arghya Sadhu and @Piotr Malec, and they were right. I didn't configure my CNI plugin correctly. I am using Flannel as the network provider, and it requires the 10.244.0.0/16 subnet. In my crio-bridge.conf, found in /etc/cni/net.d/, the default subnet was different (10.85.0.0/16 or something). I thought it would be enough to specify the CIDR on the kubeadm init command, but I was wrong. You need to set the correct CIDR in crio-bridge.conf and podman.conflist (or similar files in that directory). I had also assumed the files installed with CRI-O came with reasonable defaults and, to be honest, I didn't fully understand what they were for.
Also, something strange happened: according to Flannel the subnet for CRI-O should be a /16, but when I checked the logs with journalctl -u kubelet it mentioned a /24 subnet.
failed to set bridge addr: \"cni0\" already has an IP address different from 10.244.0.1/24"
So I had to change the subnet in crio-bridge.conf to a /24 and it worked. I probably have to change the subnet in podman.conflist too, but I am not sure.
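For reference, the relevant part of crio-bridge.conf then looks roughly like this (a sketch; these are the standard bridge/host-local CNI plugin options and may differ slightly between CRI-O versions):
{
  "cniVersion": "0.3.1",
  "name": "crio-bridge",
  "type": "bridge",
  "bridge": "cni0",
  "isGateway": true,
  "ipMasq": true,
  "ipam": {
    "type": "host-local",
    "subnet": "10.244.0.0/24",
    "routes": [
      { "dst": "0.0.0.0/0" }
    ]
  }
}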
Anyway, thanks to Arghya and Piotr for their help!
To set up a cluster with the Calico network plugin and the cri-o container runtime, I had to:
Add to /etc/crio/crio.conf
[crio.network]
network_dir = "/etc/cni/net.d/"
plugin_dirs = [
"/opt/cni/bin/",
"/usr/libexec/cni/",
]
Add --cgroup-driver=systemd in /var/lib/kubelet/kubeadm-flags.env
KUBELET_KUBEADM_ARGS="--cgroup-driver=systemd --container-runtime=remote --container-runtime-endpoint=/var/run/crio/crio.sock --pod-infra-container-image=k8s.gcr.io/pause:3.5"
Restart kubelet and crio
systemctl daemon-reload && systemctl restart kubelet crio
Initialize cluster
kubeadm init --pod-network-cidr='10.85.0.0/16'
Install calico network plugin
kubectl create -f https://docs.projectcalico.org/manifests/calico.yaml

Calico CNI pod networking not working across different hosts on EKS Kubernetes worker nodes

I am running vanilla EKS Kubernetes at version 1.12.
I've used CNI Genie to allow custom selection of the CNI that pods use when starting and I've installed the standard Calico CNI setup.
With CNI Genie I configured the default CNI to be the AWS CNI (aws-node) and all pods start up as usual and get assigned an IP from my VPC subnets.
I then selectively use calico as the CNI for some basic pods I am testing with. I'm using the default calico 192.168.0.0/16 CIDR range. Everything works great if the pods are on the same EKS worker nodes.
Core DNS is working great too (as long as I keep the coredns pods running on the aws CNI).
However, if a pod moves to a different worker node, then networking between them does not work inside the cluster.
I've checked the routing tables that Calico auto-configures on the worker nodes, and they appear logical to me.
Here is my wide pod listing across all namespaces:
NAMESPACE NAME READY STATUS RESTARTS AGE IP NODE NOMINATED NODE
default hello-node1-865588ccd7-64p5x 1/1 Running 0 31m 192.168.106.129 ip-10-0-2-31.eu-west-2.compute.internal <none>
default hello-node2-dc7bbcb74-gqpwq 1/1 Running 0 17m 192.168.25.193 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system aws-node-cm2dp 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system aws-node-vvvww 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-kube-controllers-56bfccb786-fc2j4 1/1 Running 0 30m 10.0.2.41 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-node-flmnl 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system calico-node-hcmqd 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system coredns-6c64c9f456-g2h9k 1/1 Running 0 30m 10.0.2.204 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system coredns-6c64c9f456-g5lhl 1/1 Running 0 30m 10.0.2.200 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system genie-plugin-hspts 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system genie-plugin-vqd2d 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
kube-system kube-proxy-jm7f7 1/1 Running 0 26m 10.0.3.222 ip-10-0-3-222.eu-west-2.compute.internal <none>
kube-system kube-proxy-nnp76 1/1 Running 0 31m 10.0.2.31 ip-10-0-2-31.eu-west-2.compute.internal <none>
As you can see, the two hello-node pods are using the Calico CNI.
I've exposed the hello-node pods with two services:
NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
hello-node1 ClusterIP 172.20.90.83 <none> 8081/TCP 43m
hello-node2 ClusterIP 172.20.242.22 <none> 8082/TCP 43m
I've confirmed that if I start the hello-node pods with the AWS CNI, I can ping/curl between them using the cluster service names when they run on separate hosts.
Things stop working when I use Calico CNI as above.
I only have two EKS worker hosts in this test cluster. Here is the routing for each:
K8s Worker 1 routes
[ec2-user@ip-10-0-3-222 ~]$ ip route
default via 10.0.3.1 dev eth0
10.0.3.0/24 dev eth0 proto kernel scope link src 10.0.3.222
169.254.169.254 dev eth0
blackhole 192.168.25.192/26 proto bird
192.168.25.193 dev calia0da7d91dc2 scope link
192.168.106.128/26 via 10.0.2.31 dev tunl0 proto bird onlink
K8s Worker 2 routes
[ec2-user@ip-10-0-2-31 ~]$ ip route
default via 10.0.2.1 dev eth0
10.0.2.0/24 dev eth0 proto kernel scope link src 10.0.2.31
10.0.2.41 dev enif4cf9019f11 scope link
10.0.2.200 dev eni412af1a0e55 scope link
10.0.2.204 dev eni04260ebbbe1 scope link
169.254.169.254 dev eth0
192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink
blackhole 192.168.106.128/26 proto bird
192.168.106.129 dev cali19da7817849 scope link
To me, the route:
192.168.25.192/26 via 10.0.3.222 dev tunl0 proto bird onlink
tells me that traffic destined for the 192.168.25.192/26 subnet from this worker (and its containers/pods) should go out to 10.0.3.222 (the AWS VPC ENI of that EC2 host) over the tunl0 interface.
This route is on the EC2 host 10.0.2.31. In other words, when this host's containers talk to containers on the Calico subnet 192.168.25.192/26, the traffic should route to 10.0.3.222 (the ENI IP of my other EKS worker node, where the containers using Calico on that subnet run).
To clarify my testing procedure:
Exec into the hello-node1 pod and curl http://hello-node2:8082 (or ping the Calico-assigned IP address of the hello-node2 pod).
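Concretely, the test looks something like this (pod name taken from the listing above; assuming the image ships curl and ping):
kubectl exec -it hello-node1-865588ccd7-64p5x -- curl -v http://hello-node2:8082
kubectl exec -it hello-node1-865588ccd7-64p5x -- ping 192.168.25.193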
EDIT
To further test this, I've run tcpdump on the host where the hello-node2 pod is running, capturing on port 8080 (the container listens on this port).
I do get activity on the destination host where the test container that I am curling to is running, but it doesn't seem to indicate dropped traffic.
[ec2-user@ip-10-0-3-222 ~]$ sudo tcpdump -vv -x -X -i tunl0 'port 8080'
tcpdump: listening on tunl0, link-type RAW (Raw IP), capture size 262144 bytes
14:32:42.859238 IP (tos 0x0, ttl 254, id 63813, offset 0, flags [DF], proto TCP (6), length 60)
10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf932 (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836614698 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c f945 4000 fe06 9ced 0a00 021f E..<.E#.........
0x0010: c0a8 19c1 7208 1f90 bf1b b32e 0000 0000 ....r...........
0x0020: a002 6d60 f932 0000 0204 0578 0402 080a ..m`.2.....x....
0x0030: a913 4e2a 0000 0000 0103 0307 ..N*........
14:32:43.870168 IP (tos 0x0, ttl 254, id 63814, offset 0, flags [DF], proto TCP (6), length 60)
10.0.2.31.29192 > 192.168.25.193.webcache: Flags [S], cksum 0xf53f (correct), seq 3206263598, win 28000, options [mss 1400,sackOK,TS val 2836615709 ecr 0,nop,wscale 7], length 0
0x0000: 4500 003c f946 4000 fe06 9cec 0a00 021f E..<.F#.........
0x0010: c0a8 19c1 7208 1f90 bf1b b32e 0000 0000 ....r...........
0x0020: a002 6d60 f53f 0000 0204 0578 0402 080a ..m`.?.....x....
0x0030: a913 521d 0000 0000 0103 0307 ..R.........
^C
2 packets captured
2 packets received by filter
0 packets dropped by kernel
Even the calia0da7d91dc2 interface on the host running my target/test pod shows increased RX packets and byte counts whenever I run the curl from the other pod on the other host. Traffic is definitely traversing.
[ec2-user@ip-10-0-3-222 ~]$ ifconfig
calia0da7d91dc2: flags=4163<UP,BROADCAST,RUNNING,MULTICAST> mtu 1440
inet6 fe80::ecee:eeff:feee:eeee prefixlen 64 scopeid 0x20<link>
ether ee:ee:ee:ee:ee:ee txqueuelen 0 (Ethernet)
RX packets 84 bytes 5088 (4.9 KiB)
RX errors 0 dropped 0 overruns 0 frame 0
TX packets 0 bytes 0 (0.0 B)
TX errors 0 dropped 0 overruns 0 carrier 0 collisions 0
What is preventing the networking from working between hosts here? Am I missing something obvious?
Edit 2 - information for Arjun Pandey- parjun8840
Here is some more info about my Calico configuration:
I have disabled source/destination checking on all AWS EC2 worker nodes.
I've followed the latest Calico docs to configure the IP pool for cross-subnet use and NAT for traffic leaving the cluster.
calicoctl configs (note: it seems that the workloadendpoints are non-existent...):
me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool --all-namespaces
NAME AGE
default-ipv4-ippool 1d
me@mine ~ aws-vault exec my-vault-entry -- kubectl get IPPool default-ipv4-ippool -o yaml
apiVersion: crd.projectcalico.org/v1
kind: IPPool
metadata:
annotations:
projectcalico.org/metadata: '{"uid":"41bd2c82-d576-11e9-b1ef-121f3d7b4d4e","creationTimestamp":"2019-09-12T15:59:09Z"}'
creationTimestamp: "2019-09-12T15:59:09Z"
generation: 1
name: default-ipv4-ippool
resourceVersion: "500448"
selfLink: /apis/crd.projectcalico.org/v1/ippools/default-ipv4-ippool
uid: 41bd2c82-d576-11e9-b1ef-121f3d7b4d4e
spec:
blockSize: 26
cidr: 192.168.0.0/16
ipipMode: CrossSubnet
natOutgoing: true
nodeSelector: all()
vxlanMode: Never
me@mine ~ aws-vault exec my-vault-entry -- calicoctl get nodes
NAME
ip-10-254-109-184.ec2.internal
ip-10-254-109-237.ec2.internal
ip-10-254-111-147.ec2.internal
me@mine ~ aws-vault exec my-vault-entry -- calicoctl get workloadendpoints
WORKLOAD NODE NETWORKS INTERFACE
me@mine ~
Here is some network info for a sample host in the cluster and one of the test container's container network:
host ip a
[ec2-user@ip-10-254-109-184 ~]$ ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 02:1b:79:d1:c5:bc brd ff:ff:ff:ff:ff:ff
inet 10.254.109.184/26 brd 10.254.109.191 scope global dynamic eth0
valid_lft 2881sec preferred_lft 2881sec
inet6 fe80::1b:79ff:fed1:c5bc/64 scope link
valid_lft forever preferred_lft forever
3: eni808caba7453@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether c2:be:80:d4:6a:f3 brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet6 fe80::c0be:80ff:fed4:6af3/64 scope link
valid_lft forever preferred_lft forever
5: tunl0@NONE: <NOARP,UP,LOWER_UP> mtu 1440 qdisc noqueue state UNKNOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
inet 192.168.29.128/32 brd 192.168.29.128 scope global tunl0
valid_lft forever preferred_lft forever
6: eth1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc mq state UP group default qlen 1000
link/ether 02:12:58:bb:c6:1a brd ff:ff:ff:ff:ff:ff
inet 10.254.109.137/26 brd 10.254.109.191 scope global eth1
valid_lft forever preferred_lft forever
inet6 fe80::12:58ff:febb:c61a/64 scope link
valid_lft forever preferred_lft forever
7: enia6f1918d9e2@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 96:f5:36:53:e9:55 brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::94f5:36ff:fe53:e955/64 scope link
valid_lft forever preferred_lft forever
8: enia32d23ac2d1@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 9001 qdisc noqueue state UP group default
link/ether 36:5e:34:a7:82:30 brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::345e:34ff:fea7:8230/64 scope link
valid_lft forever preferred_lft forever
9: cali5e7dde1e39e@if4: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether ee:ee:ee:ee:ee:ee brd ff:ff:ff:ff:ff:ff link-netnsid 3
inet6 fe80::ecee:eeff:feee:eeee/64 scope link
valid_lft forever preferred_lft forever
[ec2-user@ip-10-254-109-184 ~]$
nsenter on the test container pid to get ip a info:
[ec2-user@ip-10-254-109-184 ~]$ sudo nsenter -t 15715 -n ip a
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
2: tunl0@NONE: <NOARP> mtu 1480 qdisc noop state DOWN group default qlen 1000
link/ipip 0.0.0.0 brd 0.0.0.0
4: eth0@if9: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1440 qdisc noqueue state UP group default
link/ether 9a:6d:db:06:74:cb brd ff:ff:ff:ff:ff:ff link-netnsid 0
inet 192.168.29.129/32 scope global eth0
valid_lft forever preferred_lft forever
I am not sure about the exact solution right now (I haven't tested Calico on AWS; normally I use amazon-vpc-cni-k8s on AWS and Calico on physical clusters), but below are some quick things we can look into.
Calico AWS requirements: https://docs.projectcalico.org/v2.3/reference/public-cloud/aws
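One requirement that document calls out, and that matches the IP-in-IP (tunl0) routes shown above, is that the worker security groups must allow IP-in-IP traffic (IP protocol 4) between the nodes. A hedged example with the AWS CLI (the group ID is a placeholder; the same rule can be added in the console):
aws ec2 authorize-security-group-ingress \
  --group-id sg-0123456789abcdef0 \
  --protocol 4 \
  --source-group sg-0123456789abcdef0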
kubectl get IPPool --all-namespaces
NAME AGE
default-ipv4-ippool 15d
kubectl get IPPool default-ipv4-ippool -o yaml
~ calicoctl get nodes
NAME
node1
node2
node3
node4
~ calicoctl get workloadendpoints
NODE ORCHESTRATOR WORKLOAD NAME
node2 k8s default.myapp-569c54f85-xtktk eth0
node1 k8s kube-system.calico-kube-controllers-5cbcccc885-b9x8s eth0
node1 k8s kube-system.coredns-fb8b8dcde-2zpw8 eth0
node1 k8s kube-system.coredns-fb8b8dcfg-hc6zv eth0
Also if we can get the detail of container network:
nsenter -t pid -n ip a
And for the host as well:
ip a

Building a Bare Metal Kubernetes Cluster with kubeadm

I am trying to build a 3 master, 3 worker Kubernetes Cluster, with 3 separate etcd servers.
[root@K8sMaster01 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8smaster01 Ready master 5h v1.11.1
k8smaster02 Ready master 4h v1.11.1
k8smaster03 Ready master 4h v1.11.1
k8snode01 Ready <none> 4h v1.11.1
k8snode02 Ready <none> 4h v1.11.1
k8snode03 Ready <none> 4h v1.11.1
I have spent weeks trying to get this to work, but cannot get beyond one problem:
the containers/pods cannot access the API server.
[root@K8sMaster01 ~]# kubeadm version
kubeadm version: &version.Info{Major:"1", Minor:"11", GitVersion:"v1.11.1", GitCommit:"b1b29978270dc22fecc592ac55d903350454310a", GitTreeState:"clean", BuildDate:"2018-07-17T18:50:16Z", GoVersion:"go1.10.3", Compiler:"gc", Platform:"linux/amd64"}
[root@K8sMaster01 ~]# cat /etc/redhat-release
Fedora release 28 (Twenty Eight)
[root@K8sMaster01 ~]# uname -a
Linux K8sMaster01 4.16.3-301.fc28.x86_64 #1 SMP Mon Apr 23 21:59:58 UTC 2018 x86_64 x86_64 x86_64 GNU/Linux
NAME READY STATUS RESTARTS AGE
coredns-78fcdf6894-c2wbh 1/1 Running 1 4h
coredns-78fcdf6894-psbtq 1/1 Running 1 4h
heapster-77f99d6b7c-5pxj6 1/1 Running 0 4h
kube-apiserver-k8smaster01 1/1 Running 1 4h
kube-apiserver-k8smaster02 1/1 Running 1 4h
kube-apiserver-k8smaster03 1/1 Running 1 4h
kube-controller-manager-k8smaster01 1/1 Running 1 4h
kube-controller-manager-k8smaster02 1/1 Running 1 4h
kube-controller-manager-k8smaster03 1/1 Running 1 4h
kube-flannel-ds-amd64-542x6 1/1 Running 0 4h
kube-flannel-ds-amd64-6dw2g 1/1 Running 4 4h
kube-flannel-ds-amd64-h6j9b 1/1 Running 1 4h
kube-flannel-ds-amd64-mgggx 1/1 Running 0 3h
kube-flannel-ds-amd64-p8xfk 1/1 Running 0 4h
kube-flannel-ds-amd64-qp86h 1/1 Running 4 4h
kube-proxy-4bqxh 1/1 Running 0 3h
kube-proxy-56p4n 1/1 Running 0 3h
kube-proxy-7z8p7 1/1 Running 0 3h
kube-proxy-b59ns 1/1 Running 0 3h
kube-proxy-fc6zg 1/1 Running 0 3h
kube-proxy-wrxg7 1/1 Running 0 3h
kube-scheduler-k8smaster01 1/1 Running 1 4h
kube-scheduler-k8smaster02 1/1 Running 1 4h
kube-scheduler-k8smaster03 1/1 Running 1 4h
kubernetes-dashboard-6948bdb78-4f7qj 1/1 Running 19 1h
node-problem-detector-v0.1-77fdw 1/1 Running 0 4h
node-problem-detector-v0.1-96pld 1/1 Running 1 4h
node-problem-detector-v0.1-ctnfn 1/1 Running 0 3h
node-problem-detector-v0.1-q2xvw 1/1 Running 0 4h
node-problem-detector-v0.1-vvf4j 1/1 Running 1 4h
traefik-ingress-controller-7w44f 1/1 Running 0 4h
traefik-ingress-controller-8cprj 1/1 Running 1 4h
traefik-ingress-controller-f6c7q 1/1 Running 0 3h
traefik-ingress-controller-tf8zw 1/1 Running 0 4h
kube-ops-view-6744bdc77d-2x5w8 1/1 Running 0 2h
kube-ops-view-redis-74578dcc5d-5fnvf 1/1 Running 0 2h
The kubernetes-dashboard will not start, and the same is actually true for kube-ops-view. CoreDNS also has errors. All of this looks to me like a networking problem. I have tried:
sudo iptables -P FORWARD ACCEPT
sudo iptables --policy FORWARD ACCEPT
sudo iptables -A FORWARD -o flannel.1 -j ACCEPT
CoreDNS gives these errors in the logs:
[root@K8sMaster01 ~]# kubectl logs coredns-78fcdf6894-c2wbh -n kube-system
.:53
2018/08/26 15:15:28 [INFO] CoreDNS-1.1.3
2018/08/26 15:15:28 [INFO] linux/amd64, go1.10.1, b0fd575c
2018/08/26 15:15:28 [INFO] plugin/reload: Running configuration MD5 = 2a066f12ec80aeb2b92740dd74c17138
CoreDNS-1.1.3
linux/amd64, go1.10.1, b0fd575c
E0826 17:12:19.624560 1 reflector.go:322] github.com/coredns/coredns/plugin/kubernetes/controller.go:313: Failed to watch *v1.Service: Get https://10.96.0.1:443/api/v1/services?resourceVersion=556&timeoutSeconds=389&watch=true: dial tcp 10.96.0.1:443: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:46862->10.4.4.28:53: i/o timeout
2018/08/26 17:35:34 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:46690->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. AAAA: unreachable backend: read udp 10.96.0.7:60267->10.4.4.28:53: i/o timeout
2018/08/26 17:35:37 [ERROR] 2 kube-ops-view-redis.uk.specsavers.com. A: unreachable backend: read udp 10.96.0.7:41482->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:58042->10.4.4.28:53: i/o timeout
2018/08/26 17:36:58 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:53149->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. A: unreachable backend: read udp 10.96.0.7:36861->10.4.4.28:53: i/o timeout
2018/08/26 17:37:01 [ERROR] 2 kube-ops-view-redis.specsavers.local. AAAA: unreachable backend: read udp 10.96.0.7:43235->10.4.4.28:53: i/o timeout
The dashboard:
[root@K8sMaster01 ~]# kubectl logs kubernetes-dashboard-6948bdb78-4f7qj -n kube-system
2018/08/26 20:10:31 Starting overwatch
2018/08/26 20:10:31 Using in-cluster config to connect to apiserver
2018/08/26 20:10:31 Using service account token for csrf signing
2018/08/26 20:10:31 No request provided. Skipping authorization
2018/08/26 20:11:01 Error while initializing connection to Kubernetes apiserver. This most likely means that the cluster is misconfigured (e.g., it has invalid apiserver certificates or service accounts configuration) or the --apiserver-host param points to a server that does not exist. Reason: Get https://10.96.0.1:443/version: dial tcp 10.96.0.1:443: i/o timeout
Refer to our FAQ and wiki pages for more information: https://github.com/kubernetes/dashboard/wiki/FAQ
kube-ops-view:
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 141, wait 63 seconds)
10.96.3.1 - - [2018-08-26 20:12:34] "GET /health HTTP/1.1" 200 117 0.001002
10.96.3.1 - - [2018-08-26 20:12:44] "GET /health HTTP/1.1" 200 117 0.000921
10.96.3.1 - - [2018-08-26 20:12:54] "GET /health HTTP/1.1" 200 117 0.000926
10.96.3.1 - - [2018-08-26 20:13:04] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:14] "GET /health HTTP/1.1" 200 117 0.000942
10.96.3.1 - - [2018-08-26 20:13:24] "GET /health HTTP/1.1" 200 117 0.000924
10.96.3.1 - - [2018-08-26 20:13:34] "GET /health HTTP/1.1" 200 117 0.000939
ERROR:kube_ops_view.update:Failed to query cluster 10-96-0-1:443 (https://10.96.0.1:443): ConnectTimeout (try 142, wait 61 seconds)
Flannel has created the networks:
[root@K8sMaster01 ~]# ip addr show
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN group default qlen 1000
link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
inet 127.0.0.1/8 scope host lo
valid_lft forever preferred_lft forever
inet6 ::1/128 scope host
valid_lft forever preferred_lft forever
2: ens192: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP group default qlen 1000
link/ether 00:50:56:9a:80:f7 brd ff:ff:ff:ff:ff:ff
inet 10.34.88.182/24 brd 10.34.88.255 scope global dynamic ens192
valid_lft 7071sec preferred_lft 7071sec
inet 10.10.40.90/24 brd 10.10.40.255 scope global ens192:1
valid_lft forever preferred_lft forever
inet6 fe80::250:56ff:fe9a:80f7/64 scope link
valid_lft forever preferred_lft forever
3: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN group default
link/ether 02:42:cf:ec:b3:ee brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 brd 172.17.255.255 scope global docker0
valid_lft forever preferred_lft forever
4: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN group default
link/ether 06:df:1e:87:b8:ee brd ff:ff:ff:ff:ff:ff
inet 10.96.0.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::4df:1eff:fe87:b8ee/64 scope link
valid_lft forever preferred_lft forever
5: cni0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UP group default qlen 1000
link/ether 0a:58:0a:60:00:01 brd ff:ff:ff:ff:ff:ff
inet 10.96.0.1/24 scope global cni0
valid_lft forever preferred_lft forever
inet6 fe80::8c77:39ff:fe6e:8710/64 scope link
valid_lft forever preferred_lft forever
7: veth9527916b@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 46:62:b6:b8:b9:ac brd ff:ff:ff:ff:ff:ff link-netnsid 1
inet6 fe80::4462:b6ff:feb8:b9ac/64 scope link
valid_lft forever preferred_lft forever
8: veth6e6f08f5@if3: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue master cni0 state UP group default
link/ether 3e:a5:4b:8d:11:ce brd ff:ff:ff:ff:ff:ff link-netnsid 2
inet6 fe80::3ca5:4bff:fe8d:11ce/64 scope link
valid_lft forever preferred_lft forever
I can ping the IP:
[root@K8sMaster01 ~]# ping 10.96.0.1
PING 10.96.0.1 (10.96.0.1) 56(84) bytes of data.
64 bytes from 10.96.0.1: icmp_seq=1 ttl=64 time=0.052 ms
64 bytes from 10.96.0.1: icmp_seq=2 ttl=64 time=0.032 ms
64 bytes from 10.96.0.1: icmp_seq=3 ttl=64 time=0.042 ms
and telnet to the port:
[root@K8sMaster01 ~]# telnet 10.96.0.1 443
Trying 10.96.0.1...
Connected to 10.96.0.1.
Escape character is '^]'.
Someone PLEASE save my bank holiday weekend and tell me what is going wrong!
As requested, here is my get services output:
[root@K8sMaster01 ~]# kubectl get services --all-namespaces
NAMESPACE NAME TYPE CLUSTER-IP EXTERNAL-IP PORT(S) AGE
default blackbox-database ClusterIP 10.110.56.121 <none> 3306/TCP 5h
default kube-ops-view ClusterIP 10.105.35.23 <none> 82/TCP 1d
default kube-ops-view-redis ClusterIP 10.107.254.193 <none> 6379/TCP 1d
default kubernetes ClusterIP 10.96.0.1 <none> 443/TCP 1d
kube-system heapster ClusterIP 10.103.5.79 <none> 80/TCP 1d
kube-system kube-dns ClusterIP 10.96.0.10 <none> 53/UDP,53/TCP 1d
kube-system kubernetes-dashboard ClusterIP 10.96.220.152 <none> 443/TCP 1d
kube-system traefik-ingress-service ClusterIP 10.102.84.167 <none> 80/TCP,8080/TCP 1d
liab-live-bb blackbox-application ClusterIP 10.98.40.25 <none> 8080/TCP 5h
liab-live-bb blackbox-database ClusterIP 10.108.43.196 <none> 3306/TCP 5h
Telnet to port 46690:
[root@K8sMaster01 ~]# telnet 10.96.0.7 46690
Trying 10.96.0.7...
(no response)
Today I tried deploying two of my applications to the cluster, as can be seen in the get services output. The "app" is unable to connect to the "db": it cannot resolve the DB service name. I believe I have a networking issue, but I am not sure whether it is at the host level or within the Kubernetes layer. I did notice my resolv.conf files were not pointing to localhost, and I found some changes to make to the CoreDNS config. When I looked at its configuration it was trying to point to an IPv6 address, so I changed it to this:
apiVersion: v1
data:
  Corefile: |
    .:53 {
        errors
        health
        kubernetes cluster.local 10.96.0.0/12 {
           pods insecure
        }
        prometheus :9153
        proxy 10.4.4.28
        cache 30
        reload
    }
kind: ConfigMap
metadata:
  creationTimestamp: 2018-08-27T12:28:57Z
  name: coredns
  namespace: kube-system
  resourceVersion: "174571"
  selfLink: /api/v1/namespaces/kube-system/configmaps/coredns
  uid: c5016361-a9f4-11e8-b0b4-0050569afad9

Where is the Flanneld configuration that Kubernetes (installed by kubeadm) uses?

Question
Flanneld on a Kubernetes worker node has the configuration file /etc/sysconfig/flanneld, which points to etcd on the worker node's localhost, whereas I would expect it to point to the master node's etcd URL.
Does this mean the pod network has not been configured properly, or does Flannel under Kubernetes use different configuration files? If so, which configuration does flanneld use?
Also, if there are good references/resources on how Kubernetes interacts with CNI, please suggest them.
On the worker node, the configuration points to itself instead of to the master IP.
$ cat /etc/sysconfig/flanneld
# Flanneld configuration options
# etcd url location. Point this to the server where etcd runs
FLANNEL_ETCD_ENDPOINTS="http://127.0.0.1:2379"
# etcd config key. This is the configuration key that flannel queries
# For address range assignment
FLANNEL_ETCD_PREFIX="/atomic.io/network"
# Any additional options that you want to pass
#FLANNEL_OPTIONS=""
Worker nodes successfully joined.
$ kubectl get nodes
NAME STATUS ROLES AGE VERSION
master Ready master 25m v1.8.5
node01 Ready <none> 25m v1.8.5
node02 Ready <none> 25m v1.8.5
The flannel.1 interface on the worker node is configured with the same CIDR as the master, although the configuration does not point to the master where Flannel was configured.
$ ip addr
...
3: enp0s8: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 1000
link/ether 08:00:27:0d:f8:34 brd ff:ff:ff:ff:ff:ff
inet 192.168.99.12/24 brd 192.168.99.255 scope global enp0s8
valid_lft forever preferred_lft forever
inet6 fe80::6839:cd66:9352:2280/64 scope link
valid_lft forever preferred_lft forever
4: virbr0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN qlen 1000
link/ether 52:54:00:2c:56:b8 brd ff:ff:ff:ff:ff:ff
inet 192.168.122.1/24 brd 192.168.122.255 scope global virbr0
valid_lft forever preferred_lft forever
5: virbr0-nic: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast master virbr0 state DOWN qlen 1000
link/ether 52:54:00:2c:56:b8 brd ff:ff:ff:ff:ff:ff
6: docker0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc noqueue state DOWN
link/ether 02:42:67:48:ae:ef brd ff:ff:ff:ff:ff:ff
inet 172.17.0.1/16 scope global docker0
valid_lft forever preferred_lft forever
7: flannel.1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1450 qdisc noqueue state UNKNOWN
link/ether 56:20:a1:4d:f0:d2 brd ff:ff:ff:ff:ff:ff
inet 10.244.1.0/32 scope global flannel.1
valid_lft forever preferred_lft forever
inet6 fe80::5420:a1ff:fe4d:f0d2/64 scope link
valid_lft forever preferred_lft forever
The step executed on the worker (besides sudo yum install kubelet kubeadm flanneld) is kubeadm join, which appears to have succeeded (despite a few error messages).
changed: [192.168.99.12] => {...
"[kubeadm] WARNING: kubeadm is in beta, please do not use it for production clusters.",
"[preflight] Running pre-flight checks",
"[preflight] Starting the kubelet service",
"[discovery] Trying to connect to API Server \"192.168.99.10:6443\"",
"[discovery] Created cluster-info discovery client, requesting info from \"https://192.168.99.10:6443\"",
"[discovery] Failed to connect to API Server \"192.168.99.10:6443\": there is no JWS signed token in the cluster-info ConfigMap. This token id \"7ae0ed\" is invalid for this cluster, can't connect",
"[discovery] Trying to connect to API Server \"192.168.99.10:6443\"",
"[discovery] Created cluster-info discovery client, requesting info from \"https://192.168.99.10:6443\"",
"[discovery] Failed to connect to API Server \"192.168.99.10:6443\": there is no JWS signed token in the cluster-info ConfigMap. This token id \"7ae0ed\" is invalid for this cluster, can't connect",
"[discovery] Trying to connect to API Server \"192.168.99.10:6443\"",
"[discovery] Created cluster-info discovery client, requesting info from \"https://192.168.99.10:6443\"",
"[discovery] Requesting info from \"https://192.168.99.10:6443\" again to validate TLS against the pinned public key",
"[discovery] Cluster info signature and contents are valid and TLS certificate validates against pinned roots, will use API Server \"192.168.99.10:6443\"",
"[discovery] Successfully established connection with API Server \"192.168.99.10:6443\"",
"[bootstrap] Detected server version: v1.8.5",
"[bootstrap] The server supports the Certificates API (certificates.k8s.io/v1beta1)",
"",
"Node join complete:",
"* Certificate signing request sent to master and response",
" received.",
"* Kubelet informed of new secure connection details.",
"",
"Run 'kubectl get nodes' on the master to see this machine join."
Background
Installed Kubernetes 1.8.5 by following Using kubeadm to Create a Cluster in CentOS 7 on VirtualBox.
Related
kubeadm init --token=xyz or kubeadm init --token xyz?
Flannel configuration is stored in etcd. The FLANNEL_ETCD_ENDPOINTS="http://127.0.0.1:2379" parameter defines where etcd is located, and FLANNEL_ETCD_PREFIX="/atomic.io/network" defines where the data is stored in etcd.
So, to get the flannel configuration for your exact case, we need to read that info from etcd:
etcdctl --endpoint=127.0.0.1:2379 get /atomic.io/network/config
{"Network":"10.2.0.0/16","Backend":{"Type":"vxlan"}}
Also, we can find how many subnets we use in our cluster:
etcdctl --endpoint=127.0.0.1:2379 ls /atomic.io/network/subnets
/atomic.io/network/subnets/10.2.41.0-24
/atomic.io/network/subnets/10.2.86.0-24
And check information about any of them:
etcdctl --endpoint=127.0.0.1:2379 get /atomic.io/network/subnets/10.2.41.0-24
{"PublicIP":"10.0.0.16","BackendType":"vxlan","BackendData":{"VtepMAC":"45:e7:76:d5:1c:49"}}