Kubernetes: Calico not working on remote worker, local OK

I set up a Kubernetes cluster with Calico.
The setup is "simple":
1x master (local network, ok)
1x node (local network, ok)
1x node (cloud server, not ok)
All nodes run Debian Buster with Docker 19.03.
On the cloud server the Calico pods do not come up:
calico-kube-controllers-token-x:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SandboxChanged 47m (x50 over 72m) kubelet Pod sandbox changed, it will be killed and re-created.
Warning FailedMount 43m kubelet MountVolume.SetUp failed for volume "calico-kube-controllers-token-x" : failed to sync secret cache: timed out waiting for the condition
Normal SandboxChanged 3m41s (x78 over 43m) kubelet Pod sandbox changed, it will be killed and re-created.
calico-node-x:
Warning Unhealthy 43m (x5 over 43m) kubelet Liveness probe failed: calico/node is not ready: Felix is not live: Get "http://localhost:9099/liveness": dial tcp [::1]:9099: connect: connection refused
Warning Unhealthy 14m (x77 over 43m) kubelet Readiness probe failed: calico/node is not ready: BIRD is not ready: Error querying BIRD: unable to connect to BIRDv4 socket: dial unix /var/run/bird/bird.ctl: connect: no such file or directory
Warning BackOff 4m26s (x115 over 39m) kubelet Back-off restarting failed container
My guess is that something is wrong with the IP/network config, but I have not figured out what.
The required ports (Kubernetes and BGP) are forwarded from the router; I also tried connecting the master directly to the internet.
--control-plane-endpoint is a hostname that resolves publicly.
Calico uses BGP peering (with the public IP as peer).
This entry worries me the most:
kubectl get --raw /api displays the local IP.
I tried to find a way to change this to the public IP of the master, without success.
Does anyone have a clue what to try next?

After spending more time on analysis, the problem turned out to be that the API server address distributed to the nodes was the local IP, not the DNS name.
I created a VPN with WireGuard from the cloud node to the local master, so the master's local IP is reachable from the cloud node.
I don't know if that is the cleanest solution, but it works.

Run this command to check whether the IP_AUTODETECTION_METHOD environment variable has been set in the calico-node DaemonSet:
kubectl get daemonset/calico-node -n kube-system --output json | jq '.spec.template.spec.containers[].env[] | select(.name | startswith("IP"))'
Run this command on each of your k8s nodes to find the valid network interface:
ifconfig
Explicitly set the IP_AUTODETECTION_METHOD environment variable so that calico-node uses the correct network interface of the K8s node:
kubectl set env daemonset/calico-node -n kube-system 'IP_AUTODETECTION_METHOD=interface=en.*'
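Before applying the regex, it can help to preview which interface names on a node it would select. The sketch below is a hypothetical POSIX sh helper (preview_iface_match is not part of Calico). It assumes Calico matches the pattern against interface names without anchoring, the way grep -E does, which may not hold for every pattern. It also makes visible that en.* does not match an interface called eth0.

```sh
#!/bin/sh
# Hypothetical helper (not part of Calico): read interface names, one per line,
# on stdin and report whether each one matches the extended regex in $1.
# Assumes unanchored matching, as grep -E does; Calico's own matching may differ.
preview_iface_match() {
  regex=$1
  while IFS= read -r name; do
    if printf '%s\n' "$name" | grep -Eq -- "$regex"; then
      printf '%s\tmatch\n' "$name"
    else
      printf '%s\tno match\n' "$name"
    fi
  done
}
# Example on a Linux node:
#   ls /sys/class/net | preview_iface_match 'en.*'
```

If the interface that carries the node's reachable address shows "no match" on one of the nodes (for example the cloud server), the regex needs adjusting before it is set on the DaemonSet.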

Related

Ingress-nginx is in CrashLoopBackOff after K8s upgrade

After upgrading Kubernetes node pool from 1.21 to 1.22, ingress-nginx-controller pods started crashing. The same deployment has been working fine in EKS. I'm just having this issue in GKE. Does anyone have any ideas about the root cause?
$ kubectl logs ingress-nginx-controller-5744fc449d-8t2rq -c controller
-------------------------------------------------------------------------------
NGINX Ingress controller
Release: v1.3.1
Build: 92534fa2ae799b502882c8684db13a25cde68155
Repository: https://github.com/kubernetes/ingress-nginx
nginx version: nginx/1.19.10
-------------------------------------------------------------------------------
W0219 21:23:08.194770 8 client_config.go:617] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0219 21:23:08.194995 8 main.go:209] "Creating API client" host="https://10.1.48.1:443"
Ingress pod events:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 27m default-scheduler Successfully assigned infra/ingress-nginx-controller-5744fc449d-8t2rq to gke-infra-nodep-ffe54a41-s7qx
Normal Pulling 27m kubelet Pulling image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974"
Normal Started 27m kubelet Started container controller
Normal Pulled 27m kubelet Successfully pulled image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" in 6.443361484s
Warning Unhealthy 26m (x6 over 26m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 502
Normal Killing 26m kubelet Container controller failed liveness probe, will be restarted
Normal Created 26m (x2 over 27m) kubelet Created container controller
Warning FailedPreStopHook 26m kubelet Exec lifecycle hook ([/wait-shutdown]) for Container "controller" in Pod "ingress-nginx-controller-5744fc449d-8t2rq_infra(c4c166ff-1d86-4385-a22c-227084d569d6)" failed - error: command '/wait-shutdown' exited with 137: , message: ""
Normal Pulled 26m kubelet Container image "registry.k8s.io/ingress-nginx/controller:v1.3.1@sha256:54f7fe2c6c5a9db9a0ebf1131797109bb7a4d91f56b9b362bde2abd237dd1974" already present on machine
Warning BackOff 7m7s (x52 over 21m) kubelet Back-off restarting failed container
Warning Unhealthy 2m9s (x55 over 26m) kubelet Liveness probe failed: HTTP probe failed with statuscode: 502
The Beta API versions (extensions/v1beta1 and networking.k8s.io/v1beta1) of Ingress are no longer served (removed) for GKE clusters created on versions 1.22 and later. Please refer to the official GKE ingress documentation for changes in the GA API version.
Also refer to the official Kubernetes documentation on API removals in Kubernetes v1.22 for more information.
Before upgrading your Ingress API as a client, make sure that every ingress controller that you use is compatible with the v1 Ingress API. See Ingress Prerequisites for more context about Ingress and ingress controllers.
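To find manifests that still use one of the removed beta API versions, a grep-based scan is usually enough. This is a hypothetical POSIX sh helper (find_beta_ingress is an illustrative name). It only inspects the files passed to it, and it flags any object under extensions/v1beta1 or networking.k8s.io/v1beta1, not just Ingress.

```sh
#!/bin/sh
# Hypothetical helper: print each manifest file (given as arguments) that
# declares apiVersion extensions/v1beta1 or networking.k8s.io/v1beta1.
# Matches any kind in those groups, not only Ingress.
find_beta_ingress() {
  for f in "$@"; do
    # [[:space:]]* tolerates indentation and spacing; "? allows a quoted value
    if grep -Eq '^[[:space:]]*apiVersion:[[:space:]]*"?(extensions|networking\.k8s\.io)/v1beta1' "$f"; then
      printf '%s\n' "$f"
    fi
  done
}
# Example (k8s/ stands for wherever your manifests live):
#   find_beta_ingress k8s/*.yaml
```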
Also check the following possible causes of CrashLoopBackOff:
Increasing the initialDelaySeconds value of the livenessProbe may help, because it gives the container more time to start up and do its initial work before the liveness probe starts checking its health.
Check the container restart policy: the Pod spec has a restartPolicy field with possible values Always, OnFailure, and Never. The default value is Always.
Out of memory or resources: try increasing the VM size. Containers may crash because of memory limits; while new ones spin up, the health check fails and the Ingress serves 502s.
Check whether externalTrafficPolicy=Local is set on the NodePort service; it prevents nodes from forwarding traffic to other nodes.
Refer to the GitHub issue "Document how to avoid 502s" (#34) for more information.
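For the initialDelaySeconds suggestion, one way to change the value without hand-editing YAML is a JSON Patch applied with kubectl patch --type=json. The sketch below only builds the patch text. liveness_delay_patch is a hypothetical helper, and the container index (0), the deployment name ingress-nginx-controller and the namespace infra in the example are assumptions inferred from the pod name and events above.

```sh
#!/bin/sh
# Hypothetical helper: print a JSON Patch (RFC 6902) document that sets
# livenessProbe.initialDelaySeconds of container number $1 (0-based) to $2.
# "add" also replaces an existing value, so it works whether or not
# initialDelaySeconds is already set (the livenessProbe itself must exist).
liveness_delay_patch() {
  printf '[{"op":"add","path":"/spec/template/spec/containers/%s/livenessProbe/initialDelaySeconds","value":%s}]\n' "$1" "$2"
}
# Example (assumed names):
#   kubectl -n infra patch deployment ingress-nginx-controller --type=json -p "$(liveness_delay_patch 0 30)"
```

If the controller was installed with Helm, a later helm upgrade may overwrite a manual patch, so the same value would also have to go into the chart values.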

Kube-state-metrics error: Failed to create client: ... i/o timeout

I'm running Kubernetes in virtual machines and going through the basic tutorials, currently "Add logging and metrics to the PHP / Redis Guestbook example". I'm trying to install kube-state-metrics:
git clone https://github.com/kubernetes/kube-state-metrics.git kube-state-metrics
kubectl create -f kube-state-metrics/kubernetes
but it fails.
kubectl describe pod --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7
...
Warning Unhealthy 28m (x8 over 30m) kubelet, kubernetes-node1 Readiness probe failed: Get http://192.168.129.102:8080/healthz: dial tcp 192.168.129.102:8080: connect: connection refused
kubectl logs --namespace kube-system kube-state-metrics-7d84474f4d-d5dg7 -c kube-state-metrics
I0514 17:29:26.980707 1 main.go:85] Using default collectors
I0514 17:29:26.980774 1 main.go:93] Using all namespace
I0514 17:29:26.980780 1 main.go:129] metric white-blacklisting: blacklisting the following items:
W0514 17:29:26.980800 1 client_config.go:549] Neither --kubeconfig nor --master was specified. Using the inClusterConfig. This might not work.
I0514 17:29:26.983504 1 main.go:169] Testing communication with server
F0514 17:29:56.984025 1 main.go:137] Failed to create client: ERROR communicating with apiserver: Get https://10.96.0.1:443/version?timeout=32s: dial tcp 10.96.0.1:443: i/o timeout
I'm unsure if this 10.96.0.1 IP is correct. My virtual machines are in a bridged network 10.10.10.0/24 and a host-only network 192.168.59.0/24. When initializing Kubernetes I used the argument --pod-network-cidr=192.168.0.0/16 so that's one more IP range that I'd expect. But 10.96.0.1 looks unfamiliar.
I'm new to Kubernetes, just doing the basic tutorials, so I don't know what to do now. How to fix it or investigate further?
EDIT - additional info:
kubectl get nodes -o wide
NAME                STATUS   ROLES    AGE   VERSION   INTERNAL-IP   EXTERNAL-IP   OS-IMAGE             KERNEL-VERSION      CONTAINER-RUNTIME
kubernetes-master   Ready    master   15d   v1.14.1   10.10.10.11   <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
kubernetes-node1    Ready    <none>   15d   v1.14.1   10.10.10.5    <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
kubernetes-node2    Ready    <none>   15d   v1.14.1   10.10.10.98   <none>        Ubuntu 18.04.2 LTS   4.15.0-48-generic   docker://18.9.2
The command I used to initialize the cluster:
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=192.168.0.0/16
The reason for this is probably that the Pod network overlaps the node network: you set the Pod network CIDR to 192.168.0.0/16, which includes your host-only network 192.168.59.0/24. (The 10.96.0.1 address itself is expected: it is the ClusterIP of the kubernetes Service, the first usable address of kubeadm's default service CIDR 10.96.0.0/12.)
To solve this, you can change the pod network CIDR to 192.168.0.0/24, but this is not recommended because it gives you only 256 addresses for pod networking.
You can also use a different range for Calico. If you want to do it on a running cluster, here are instructions.
Another way I tried: edit the Calico manifest to use a different range (for example 10.0.0.0/8), initialize the cluster with
sudo kubeadm init --apiserver-advertise-address=192.168.59.20 --pod-network-cidr=10.0.0.0/8
and apply the manifest after the init.
Another option is a different CNI such as Flannel (which uses 10.244.0.0/16).
You can find more information about the ranges of CNI plugins here.
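To check a candidate pod CIDR against the networks the nodes already use, the overlap test from the answer can be done with plain shell arithmetic. This is a hypothetical POSIX sh sketch (ip_to_int and cidr_overlap are illustrative names, IPv4 only). Two CIDR blocks are either nested or disjoint, so it is enough to compare both network addresses under the shorter of the two prefixes.

```sh
#!/bin/sh
# Hypothetical helpers, IPv4 only.

# Convert a dotted-quad address such as 192.168.59.0 to a 32-bit integer.
ip_to_int() {
  o1=${1%%.*}; rest=${1#*.}
  o2=${rest%%.*}; rest=${rest#*.}
  o3=${rest%%.*}; o4=${rest#*.}
  echo $(( (o1 << 24) + (o2 << 16) + (o3 << 8) + o4 ))
}

# Succeed (exit status 0) if the two CIDR ranges share at least one address.
cidr_overlap() {
  a_net=${1%/*}; a_len=${1#*/}
  b_net=${2%/*}; b_len=${2#*/}
  # CIDR blocks are nested or disjoint: compare both under the shorter prefix.
  len=$a_len
  if [ "$b_len" -lt "$len" ]; then len=$b_len; fi
  # Netmask with the top $len of 32 bits set (shift right, then back left).
  mask=$(( (0xFFFFFFFF >> (32 - len)) << (32 - len) ))
  a=$(ip_to_int "$a_net")
  b=$(ip_to_int "$b_net")
  [ $(( a & mask )) -eq $(( b & mask )) ]
}
# Example:
#   cidr_overlap 192.168.0.0/16 192.168.59.0/24 && echo "ranges overlap"
```

Run on the ranges from this thread, it reports that 192.168.0.0/16 overlaps 192.168.59.0/24 and that 10.244.0.0/16 and 192.168.0.0/24 do not. It also reports an overlap between 10.0.0.0/8 and both the 10.10.10.0/24 bridged network and kubeadm's default service range 10.96.0.0/12, so a narrower range than 10.0.0.0/8 is probably safer for this setup.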

Failed to create pod sandbox kubernetes error

I have an Ubuntu 16.04 machine acting as the Kubernetes master. I have installed Kubernetes v1.13.1 and use Weave for networking. I have 2 Raspberry Pi devices running the same version of Kubernetes. I created a cluster and joined the Raspberry Pis to the Ubuntu master. I started a deployment and everything looked to be working fine.
When I checked the logs of the container, I found that it was not able to connect to the internet. I tried pinging but got no response. When I ran the command to describe the pod, I got the following:
Warning FailedCreatePodSandBox 42m (x3 over 42m) kubelet, node02 (combined from similar events): Failed create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9" network for pod "deployment-cnfc5": NetworkPlugin cni failed to set up pod "deployment-cnfc5_matrix-device" network: unable to allocate IP address: Post http://127.0.0.1:6784/ip/dea99f80488031b84b7b1f934343e54d877adf931071401651628505d52f55f9: dial tcp 127.0.0.1:6784: connect: connection refused
I have checked the directory /etc/cni/net.d and it contains 10-weave.conflist on both the master and the worker node. I have also checked the directory /opt/cni/bin and found the following on the master node:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.1
dhcp host-local loopback portmap sample vlan weave-net
and on the worker:
bridge flannel ipvlan macvlan ptp tuning weave-ipam weave-plugin-2.5.0
dhcp host-local loopback portmap sample vlan weave-net weave-plugin-2.5.1
Can anyone let me know what I can do to resolve this issue? Thanks.
I initialized the master with the following command:
sudo kubeadm init --token-ttl=0 --apiserver-advertise-address=192.168.0.142
and installed weave using:
kubectl apply -f "https://cloud.weave.works/k8s/net?k8s-version=$(kubectl version | base64 | tr -d '\n')"
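The two /opt/cni/bin listings above differ only slightly, which is easy to miss by eye. A hypothetical POSIX sh helper (only_in_second is an illustrative name) can print what the worker has that the master does not; on the listings from this question it prints weave-plugin-2.5.0, an older Weave plugin version that only the worker carries. It is a starting point for comparing the nodes, not a diagnosis.

```sh
#!/bin/sh
# Hypothetical helper: print the entries of the second whitespace-separated
# listing that do not appear in the first one.
# Assumes file names contain no glob characters such as * or ?.
only_in_second() {
  # Word-split $1 and rejoin with single spaces, so newlines and repeated
  # blanks in a pasted listing do not matter.
  first=" $(printf '%s ' $1)"
  for item in $2; do
    case "$first" in
      *" $item "*) ;;               # present in both listings
      *) printf '%s\n' "$item" ;;   # only in the second listing
    esac
  done
}
# Example (assumes SSH access to the worker node02):
#   only_in_second "$(ls /opt/cni/bin)" "$(ssh node02 ls /opt/cni/bin)"
```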

Unable to mount PVC created by OpenEBS on pods on Kubernetes bare-metal deployment

I am facing issues mounting PVCs in pods with OpenEBS installed on a bare-metal Kubernetes cluster created with RKE.
Expected Behavior
PVCs should be mounted in pods without issues.
Current Behavior
Pods are unable to mount PVCs:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 2m9s (x23 over 2m45s) default-scheduler pod has unbound PersistentVolumeClaims (repeated 4 times)
Normal Scheduled 2m8s default-scheduler Successfully assigned default/minio-deployment-64d7c79464-966jr to 192.168.1.21
Normal SuccessfulAttachVolume 2m8s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff"
Warning FailedMount 84s (x4 over 102s) kubelet, 192.168.1.21 MountVolume.WaitForAttach failed for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to sendtargets to portal 10.43.227.122:3260 output: iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: cannot make connection to 10.43.227.122: Connection refused
iscsiadm: connection login retries (reopen_max) 5 exceeded
iscsiadm: No portals found
, err exit status 21
Warning FailedMount 24s (x4 over 80s) kubelet, 192.168.1.21 MountVolume.WaitForAttach failed for volume "pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff" : failed to get any path for iscsi disk, last err seen:
iscsi: failed to attach disk: Error: iscsiadm: Could not login to [iface: default, target: iqn.2016-09.com.openebs.jiva:pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff, portal: 10.43.227.122,3260].
iscsiadm: initiator reported error (12 - iSCSI driver not found. Please make sure it is loaded, and retry the operation)
iscsiadm: Could not log into all portals
Logging in to [iface: default, target: iqn.2016-09.com.openebs.jiva:pvc-63cf6c92-ec99-11e8-85c9-b06ebfd124ff, portal: 10.43.227.122,3260] (multiple)
(exit status 12)
Warning FailedMount 2s kubelet, 192.168.1.21 Unable to mount volumes for pod "minio-deployment-64d7c79464-966jr_default(640263d0-ec99-11e8-85c9-b06ebfd124ff)": timeout expired waiting for volumes to attach or mount for pod "default"/"minio-deployment-64d7c79464-966jr". list of unmounted volumes=[storage]. list of unattached volumes=[storage default-token-9n8pn]
Steps to Reproduce
Install OpenEBS with Helm.
Create a PVC with the storage class openebs-standalone.
Create a pod and try to mount the PVC.
kubectl get pvc:
root@an4:/home/rke-k8s# kubectl get pvc
NAME                                      STATUS   VOLUME                                     CAPACITY   ACCESS MODES   STORAGECLASS         AGE
docker-private-registry-docker-registry   Bound    pvc-58cf63c1-ec95-11e8-9b5d-2cfda16d3cfd   10Gi       RWO            openebs-standalone   22m
Update
When I tried the sample minio deployment, here's what I have observed:
PVC creation took around 1-2 minutes.
Mounting PVC to the pod took around 1 hour.
Storage class used for this was openebs-standard.
Is there any reason for this? It is an on-prem cluster deployment.
Well, this issue is documented in the troubleshooting guide: https://docs.openebs.io/docs/next/tsgiscsi.html
This is a known issue with OpenEBS and has already been reported to the team. The fix is still pending; you can track the issue here:
https://github.com/openebs/openebs/issues/1688
There are step-by-step instructions for debugging the issue. Hope this helps.
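The initiator error 12 in the events ("iSCSI driver not found. Please make sure it is loaded") points at the iSCSI transport kernel module on the node. A quick check, sketched as a hypothetical POSIX sh helper (has_module is an illustrative name), looks for iscsi_tcp in lsmod output. It reads the lsmod text on stdin so it can be tried without root; loading the module with modprobe may resolve this particular error, though it is an assumption that nothing else is missing on the node.

```sh
#!/bin/sh
# Hypothetical helper: read lsmod output on stdin and succeed if a module
# named exactly $1 is loaded (the module name is the first column).
has_module() {
  awk -v m="$1" '$1 == m { found = 1 } END { exit !found }'
}
# Example on each node (loading the module needs root):
#   lsmod | has_module iscsi_tcp || sudo modprobe iscsi_tcp
```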

Pods are not created on new nodes

When I create a sample nginx pod with some replicas to test my Kubernetes cluster, I get strange output. The pods are created on the first node, but on the other 2 nodes they are stuck in status "ContainerCreating".
When I describe the pods (only the ones on the other nodes), they give this error message:
Warning FailedCreatePodSandBox 1m kubelet, xploregroup Failed create pod sandbox.
Normal SandboxChanged 1m kubelet, xploregroup Pod sandbox changed, it will be killed and re-created.
The strange part is that all nodes have exactly the same configuration (I cloned the image from the master) and I joined them all exactly the same way.
The pods get distributed normally, but only the pods on node1 are running.
Can someone point me in the right direction? :(
[EDIT]
journalctl -u kubelet gives this error:
Mar 12 13:42:45 kubeMaster kubelet[16379]: W0312 13:42:45.824314 16379 cni.go:171] Unable to update cni config: No networks found in /etc/cni/net.d
Mar 12 13:42:45 kubeMaster kubelet[16379]: E0312 13:42:45.824816 16379 kubelet.go:2104] Container runtime network not ready: NetworkReady=false reason:NetworkPluginNotReady message:docker: network plugin is not ready: cni config uninitialized
The problem seems to be with my network plugin. The flags for the network plugin are present in my /etc/systemd/system/kubelet.service.d/10.kubeadm.conf:
environment= kubelet_network_args --cni-bin-dir=/etc/cni/net.d
--network-plugin=cni
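The kubelet message "No networks found in /etc/cni/net.d" means that node has no CNI configuration file yet, which often happens when the network add-on's pod has not run on that node. A hypothetical POSIX sh check (cni_conf_present is an illustrative name) can be run on each node. It looks for .conf, .conflist and .json files, which as far as I know are the extensions the kubelet's CNI code loads.

```sh
#!/bin/sh
# Hypothetical helper: succeed if directory $1 (default /etc/cni/net.d)
# contains at least one CNI config file (.conf, .conflist or .json).
cni_conf_present() {
  dir=${1:-/etc/cni/net.d}
  for f in "$dir"/*.conf "$dir"/*.conflist "$dir"/*.json; do
    # An unmatched glob is left as literal text, so check the file exists.
    if [ -f "$f" ]; then
      return 0
    fi
  done
  return 1
}
# Example on each node:
#   cni_conf_present || echo "no CNI config on $(hostname)"
```

If the check fails only on the two nodes where pods are stuck, the next thing to look at is whether the network add-on's pods are running on those nodes.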