Using Microk8s and OpenEBS cStor leads to an error when creating pool claims. Anybody know why this is occurring, and how to fix it?

I am using Microk8s (1.19, on Ubuntu 20.04.1 LTS) and am trying to use OpenEBS (cStor engine) for storage.
Since I'm running this locally before pushing to prod, I created virtual block devices with:
blockdevicedisk='/k8storage/diskimage'
blockdevicesize=10000
sudo dd if=/dev/zero of=$blockdevicedisk bs=1M count=$blockdevicesize
sudo mkfs.ext4 $blockdevicedisk
sudo losetup /dev/loop0 /k8storage/diskimage
$ lsblk
NAME MAJ:MIN RM SIZE RO TYPE MOUNTPOINT
loop0 7:0 0 9.8G 0 loop
loop1 7:1 0 7.8G 0 loop
sda 8:0 0 256G 0 disk
sdb 8:16 0 256G 0 disk /
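For reference, the backing file and loop device can also be created with a sparse file, letting losetup pick the next free device itself (the second image path here is just a placeholder):
sudo truncate -s 10G /k8storage/diskimage2
sudo losetup --find --show /k8storage/diskimage2   # prints the loop device it attached, e.g. /dev/loop2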
I installed OpenEBS with Helm, then removed 'loop' from openebs-ndm-config -> filterconfigs -> path-filter -> exclude so that NDM would report these as block devices.
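The relevant part of the openebs-ndm-config ConfigMap ends up looking roughly like this after the edit (the exact default exclude list differs between OpenEBS releases, so treat this as a sketch):
filterconfigs:
  - key: path-filter
    name: path filter
    state: true
    include: ""
    exclude: "/dev/fd0,/dev/sr0,/dev/ram,/dev/dm-,/dev/md"   # 'loop' removed from this list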
$ kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Unclaimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Unclaimed Active 19h
^ the last 2 are the ones I made.
I then wrote the config to create a cStor StoragePoolClaim
apiVersion: openebs.io/v1alpha1
kind: StoragePoolClaim
metadata:
  name: cstor-pool-claim
spec:
  name: cstor-pool-claim
  type: disk
  poolSpec:
    poolType: striped
  blockDevices:
    blockDeviceList:
    - blockdevice-0a6c8d26081660a37f0a87dbb316c7ae
    - blockdevice-cd43d37664edd1c880e11f5b8e9cbe60
When I apply it, both block devices are claimed
$ kubectl get blockdevices -n openebs
NAME NODENAME SIZE CLAIMSTATE STATUS AGE
blockdevice-87ca7d6819eab3ea3af2884f2f6e9f8e v 274877906944 Unclaimed Active 19h
blockdevice-0a6c8d26081660a37f0a87dbb316c7ae v 10485760000 Claimed Active 19h
blockdevice-cd43d37664edd1c880e11f5b8e9cbe60 v 8388608000 Claimed Active 19h
which is expected.
$ kubectl get spc
NAME AGE
cstor-pool-claim 18h
However, there is a problem!
$ kubectl get csp
NAME ALLOCATED FREE CAPACITY STATUS READONLY TYPE AGE
cstor-pool-claim-nf0g Init false striped 19h
It never moves past the Init status. A pod is also created; describing it with
$ kubectl describe pod cstor-pool-claim-nf0g-6cb75f8f49-sw6q2 -n openebs
spews a lot of text that I can share if it helps. The key part is this error message:
Error: failed to create containerd task: OCI runtime create failed:
container_linux.go:370: starting container process caused:
process_linux.go:459: container init caused: rootfs_linux.go:59:
mounting
"/var/snap/microk8s/common/run/containerd/io.containerd.grpc.v1.cri/sandboxes/a9b84df9076c91b83982f157e9bacdc5a10f80846d32034dd15cdae1c1d4c4c1/shm"
to rootfs at "/dev/shm" caused: secure join: too many levels of
symbolic links: unknown
I've tried resetting my setup and re-entering the commands one by one to ensure I was following the documentation and other examples correctly, but I keep encountering this error.
Is this a limitation of microk8s? A fault of OpenEBS? Something weird about my setup? Or did I do something wrong?
More importantly: Is there a way to get this to work correctly?

Related

kubernetes - is age of pod always since last restart

My understanding is that the AGE shown for a pod when using kubectl get pod shows the time that the pod has been running since the last restart. So, for the pod shown below, my understanding is that it initially restarted 14 times, but hasn't restarted in the last 17 hours. Is this correct, and where is a Kubernetes reference that explains this?
Hope you're enjoying your Kubernetes journey!
In fact, the AGE header of kubectl get pod shows you how long ago the pod was created, i.e. for how long it has existed. But do not confuse the pod with the container(s) inside it:
The RESTARTS header is actually backed by the '.status.containerStatuses[0].restartCount' field of the pod manifest. That means this header counts the restarts not of the pod, but of the container inside the pod.
Here is an example:
I just deployed a new pod:
NAME READY STATUS RESTARTS AGE
test-bg-7d57d546f4-f4cql 2/2 Running 0 9m38s
If I check the YAML configuration of this pod, we can see the aforementioned "restartCount" field in the "status" section:
❯ k get po test-bg-7d57d546f4-f4cql -o yaml
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  ...
status:
  ...
  containerStatuses:
  ...
  - containerID: docker://3f53f140f775416644ea598d554e9b8185e7dd005d6da1940d448b547d912798
    ...
    name: test-bg
    ready: true
    restartCount: 0
  ...
So, to demonstrate what I'm saying, I'm going to exec into my pod and kill the main process it is running:
❯ k exec -it test-bg-7d57d546f4-f4cql -- bash
I have no name!@test-bg-7d57d546f4-f4cql:/tmp$ ps aux
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
1000 1 0.0 0.0 5724 3256 ? Ss 03:20 0:00 bash -c source /tmp/entrypoint.bash
1000 22 1.5 0.1 2966140 114672 ? Sl 03:20 0:05 java -jar test-java-bg.jar
1000 41 3.3 0.0 5988 3592 pts/0 Ss 03:26 0:00 bash
1000 48 0.0 0.0 8588 3260 pts/0 R+ 03:26 0:00 ps aux
I have no name!@test-bg-7d57d546f4-f4cql:/tmp$ kill 22
I have no name!@test-bg-7d57d546f4-f4cql:/tmp$ command terminated with exit code 137
and after this, if I re-execute the kubectl get pod command, I get this:
NAME READY STATUS RESTARTS AGE
test-bg-7d57d546f4-f4cql 2/2 Running 1 11m
Then, if I go back to my YAML config, we can see that the restartCount field is indeed tied to my container and not to my pod.
❯ k get po test-bg-7d57d546f4-f4cql -o yaml
apiVersion: v1
kind: Pod
metadata:
  ...
spec:
  ...
status:
  ...
  containerStatuses:
  ...
  - containerID: docker://3f53f140f775416644ea598d554e9b8185e7dd005d6da1940d448b547d912798
    ...
    name: test-bg
    ready: true
    restartCount: 1
  ...
So, to conclude: the RESTARTS header gives you the restartCount of the container, not of the pod, while the AGE header gives you the age of the pod.
This time, if I delete the pod:
❯ k delete pod test-bg-7d57d546f4-f4cql
pod "test-bg-7d57d546f4-f4cql" deleted
we can see that the restartCount is back to 0, since it's a brand new pod with a brand new age:
NAME READY STATUS RESTARTS AGE
test-bg-7d57d546f4-bnvxx 2/2 Running 0 23s
test-bg-7d57d546f4-f4cql 2/2 Terminating 2 25m
For your example, it means that the container restarted 14 times, but the pod was deployed 17 hours ago.
I can't find the exact documentation for this, but (as explained here: https://kubernetes.io/docs/concepts/workloads/_print/#working-with-pods):
"Note: Restarting a container in a Pod should not be confused with restarting a Pod. A Pod is not a process, but an environment for running container(s). A Pod persists until it is deleted."
Hope this has helped you better understand.
Here is a little tip from https://kubernetes.io/docs/reference/kubectl/cheatsheet/:
kubectl get pods --sort-by='.status.containerStatuses[0].restartCount'
(to sort your pods by their restartCount number :p)
Bye
OMG, they added a new feature in Kubernetes (I don't know since when), but look:
NAME READY STATUS RESTARTS AGE
pod/nginx-deployment-74d589986c-ngvgc 1/1 Running 0 21h
pod/postgres 0/1 CrashLoopBackOff 7 (3m16s ago) 14m
now you can see how long ago the container last restarted! (3m16s ago in my example)
Here is my Kubernetes version:
❯ kubectl version --short
Client Version: v1.22.5
Server Version: v1.23.5

Enabling NodeLocalDNS fails

We have 2 clusters on GKE: dev and production. I tried to run this command on dev cluster:
gcloud beta container clusters update "dev" --update-addons=NodeLocalDNS=ENABLED
And everything went great: the node-local-dns pods are running and all works. The next morning I decided to run the same command on the production cluster, but there node-local-dns fails to run. I noticed that neither PILLAR__LOCAL__DNS nor PILLAR__DNS__SERVER in the YAML is changed to a proper IP. I tried to change those variables in the config YAML myself, but GKE keeps overwriting them back to the PILLAR__ placeholders...
The only difference between clusters is that dev runs on 1.15.9-gke.24 and production 1.15.11-gke.1.
Apparently version 1.15.11-gke.1 has a bug.
I reproduced it on 1.15.11-gke.1 and can confirm that the node-local-dns Pods fall into a CrashLoopBackOff state:
node-local-dns-28xxt 0/1 CrashLoopBackOff 5 5m9s
node-local-dns-msn9s 0/1 CrashLoopBackOff 6 8m17s
node-local-dns-z2jlz 0/1 CrashLoopBackOff 6 10m
When I checked the logs:
$ kubectl logs -n kube-system node-local-dns-msn9s
2020/04/07 21:01:52 [FATAL] Error parsing flags - Invalid localip specified - "__PILLAR__LOCAL__DNS__", Exiting
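Side note: the value that is supposed to replace PILLAR__DNS__SERVER is the kube-dns ClusterIP, which you can look up with something like:
kubectl get svc kube-dns -n kube-system -o jsonpath='{.spec.clusterIP}'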
Solution:
Upgrading to 1.15.11-gke.3 helped: first you need to upgrade your master, and then your node pool (a sketch of the gcloud commands is below the output). On this version everything runs nice and smoothly:
$ kubectl get daemonsets -n kube-system node-local-dns
NAME DESIRED CURRENT READY UP-TO-DATE AVAILABLE NODE SELECTOR AGE
node-local-dns 3 3 3 3 3 addon.gke.io/node-local-dns-ds-ready=true 44m
$ kubectl get pods -n kube-system -l k8s-app=node-local-dns
NAME READY STATUS RESTARTS AGE
node-local-dns-8pjr5 1/1 Running 0 11m
node-local-dns-tmx75 1/1 Running 0 19m
node-local-dns-zcjzt 1/1 Running 0 19m
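The upgrade itself can be done with gcloud along these lines (cluster and node-pool names are placeholders; you may also need --zone or --region):
gcloud container clusters upgrade production --master --cluster-version 1.15.11-gke.3
gcloud container clusters upgrade production --node-pool default-pool --cluster-version 1.15.11-gke.3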
As for manually fixing this particular DaemonSet YAML file, I wouldn't recommend it, as GKE's auto-repair and auto-upgrade features will overwrite it sooner or later anyway.
I hope it was helpful.

CockroachDB on Single Cluster Kube PODs fail with CrashLoopBackOff

Using VirtualBox and 4 x CentOS 7 OS installs.
Following a basic single-cluster Kubernetes install:
https://kubernetes.io/docs/setup/independent/install-kubeadm/
https://kubernetes.io/docs/setup/independent/create-cluster-kubeadm/
[root@k8s-master cockroach]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready master 41m v1.13.2
k8s-slave1 Ready <none> 39m v1.13.2
k8s-slave2 Ready <none> 39m v1.13.2
k8s-slave3 Ready <none> 39m v1.13.2
I have created 3 x NFS PVs on the master for my slaves to pick up as part of the cockroachdb-statefulset.yaml, as described here:
https://www.cockroachlabs.com/blog/running-cockroachdb-on-kubernetes/
However, my cockroach pods just continually fail to communicate with each other.
[root@k8s-slave1 kubernetes]# kubectl get pods
NAME READY STATUS RESTARTS AGE
cockroachdb-0 0/1 CrashLoopBackOff 6 8m47s
cockroachdb-1 0/1 CrashLoopBackOff 6 8m47s
cockroachdb-2 0/1 CrashLoopBackOff 6 8m47s
[root@k8s-slave1 kubernetes]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
datadir-cockroachdb-0 Bound cockroachdbpv0 10Gi RWO 17m
datadir-cockroachdb-1 Bound cockroachdbpv2 10Gi RWO 17m
datadir-cockroachdb-2 Bound cockroachdbpv1 10Gi RWO 17m
...the cockroach pod logs do not really tell me why...
[root@k8s-slave1 kubernetes]# kubectl logs cockroachdb-0
++ hostname -f
+ exec /cockroach/cockroach start --logtostderr --insecure --advertise-host cockroachdb-0.cockroachdb.default.svc.cluster.local --http-host 0.0.0.0 --join cockroachdb-0.cockroachdb,cockroachdb-1.cockroachdb,cockroachdb-2.cockroachdb --cache 25% --max-sql-memory 25%
W190113 17:00:46.589470 1 cli/start.go:1055 RUNNING IN INSECURE MODE!
- Your cluster is open for any client that can access <all your IP addresses>.
- Any user, even root, can log in without providing a password.
- Any user, connecting as root, can read or write any data in your cluster.
- There is no network encryption nor authentication, and thus no confidentiality.
Check out how to secure your cluster: https://www.cockroachlabs.com/docs/v2.1/secure-a-cluster.html
I190113 17:00:46.595544 1 server/status/recorder.go:609 available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.600386 1 cli/start.go:1069 CockroachDB CCL v2.1.3 (x86_64-unknown-linux-gnu, built 2018/12/17 19:15:31, go1.10.3)
I190113 17:00:46.759727 1 server/status/recorder.go:609 available memory from cgroups (8.0 EiB) exceeds system memory 3.7 GiB, using system memory
I190113 17:00:46.759809 1 server/config.go:386 system total memory: 3.7 GiB
I190113 17:00:46.759872 1 server/config.go:388 server configuration:
max offset 500000000
cache size 947 MiB
SQL memory pool size 947 MiB
scan interval 10m0s
scan min idle time 10ms
scan max idle time 1s
event log enabled true
I190113 17:00:46.759896 1 cli/start.go:913 using local environment variables: COCKROACH_CHANNEL=kubernetes-insecure
I190113 17:00:46.759909 1 cli/start.go:920 process identity: uid 0 euid 0 gid 0 egid 0
I190113 17:00:46.759919 1 cli/start.go:545 starting cockroach node
I190113 17:00:46.762262 22 storage/engine/rocksdb.go:574 opening rocksdb instance at "/cockroach/cockroach-data/cockroach-temp632709623"
I190113 17:00:46.803749 22 server/server.go:851 [n?] monitoring forward clock jumps based on server.clock.forward_jump_check_enabled
I190113 17:00:46.804168 22 storage/engine/rocksdb.go:574 opening rocksdb instance at "/cockroach/cockroach-data"
I190113 17:00:46.828487 22 server/config.go:494 [n?] 1 storage engine initialized
I190113 17:00:46.828526 22 server/config.go:497 [n?] RocksDB cache size: 947 MiB
I190113 17:00:46.828536 22 server/config.go:497 [n?] store 0: RocksDB, max size 0 B, max open file limit 60536
W190113 17:00:46.838175 22 gossip/gossip.go:1499 [n?] no incoming or outgoing connections
I190113 17:00:46.838260 22 cli/start.go:505 initial startup completed, will now wait for `cockroach init`
or a join to a running cluster to start accepting clients.
Check the log file(s) for progress.
I190113 17:00:46.841243 22 server/server.go:1402 [n?] no stores bootstrapped and --join flag specified, awaiting init command.
W190113 17:01:16.841095 89 cli/start.go:535 The server appears to be unable to contact the other nodes in the cluster. Please try:
- starting the other nodes, if you haven't already;
- double-checking that the '--join' and '--listen'/'--advertise' flags are set up correctly;
- running the 'cockroach init' command if you are trying to initialize a new cluster.
If problems persist, please see https://www.cockroachlabs.com/docs/v2.1/cluster-setup-troubleshooting.html.
I190113 17:01:31.357765 1 cli/start.go:756 received signal 'terminated'
I190113 17:01:31.359529 1 cli/start.go:821 initiating graceful shutdown of server
initiating graceful shutdown of server
I190113 17:01:31.361064 1 cli/start.go:872 too early to drain; used hard shutdown instead
too early to drain; used hard shutdown instead
...any ideas how to debug this further?
I have gone through the *.yaml file at https://github.com/cockroachdb/cockroach/blob/master/cloud/kubernetes/cockroachdb-statefulset.yaml
I noticed that towards the bottom there is no storageClassName mentioned, which means that during the volume claim process the pods are going to look for the standard (default) storage class.
I am not sure whether you used the annotation below while provisioning the 3 NFS volumes:
storageclass.kubernetes.io/is-default-class=true
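If it is missing, marking an existing StorageClass as the default can be done with a patch along these lines (the class name is a placeholder):
kubectl patch storageclass <storage-class-name> -p '{"metadata": {"annotations": {"storageclass.kubernetes.io/is-default-class": "true"}}}'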
You should be able to check this using:
kubectl get storageclass
If the output does not show a standard (default) storage class, then I would suggest either readjusting the persistent volume definitions by adding that annotation, or adding an empty string as the storageClassName towards the end of the cockroachdb-statefulset.yaml file.
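A rough sketch of the second option, using the standard StatefulSet volumeClaimTemplates fields (claim name and size are only illustrative):
volumeClaimTemplates:
- metadata:
    name: datadir
  spec:
    accessModes: ["ReadWriteOnce"]
    storageClassName: ""   # empty string: bind to the pre-created PVs instead of a default class
    resources:
      requests:
        storage: 10Gi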
More details can be viewed using:
kubectl describe statefulset cockroachdb
OK, it came down to the fact that I had NAT as my VirtualBox external-facing network adapter. I changed it to Bridged and it all started working perfectly. If anyone can tell me why, that would be awesome :)
In my case, I used the Helm chart, like below:
$ helm install stable/cockroachdb \
-n cockroachdb \
--namespace cockroach \
--set Storage=10Gi \
--set NetworkPolicy.Enabled=true \
--set Secure.Enabled=true
Then wait for the CSRs for cockroach to be created:
$ watch kubectl get csr
Several CSRs are pending:
$ kubectl get csr
NAME AGE REQUESTOR CONDITION
cockroachdb.client.root 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-0 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-1 129m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
cockroachdb.node.cockroachdb-cockroachdb-2 130m system:serviceaccount:cockroachdb:cockroachdb-cockroachdb Pending
To approve them, run the following command:
$ kubectl get csr -o json | \
jq -r '.items[] | select(.metadata.name | contains("cockroach.")) | .metadata.name' | \
xargs -n 1 kubectl certificate approve

Gitlab CI on Kubernetes Cluster (Openstack)

I am trying to follow this short doc about how to use GitLab CI with a Kubernetes cluster that I am creating with OpenStack: https://docs.gitlab.com/runner/install/kubernetes.html
I manage to create the cluster, but any time I create the ConfigMap and Deployment as specified in the link above, the pods it creates are stuck in CrashLoopBackOff like this:
NAMESPACE NAME READY STATUS RESTARTS AGE
gitlab gitlab-runner-3998042981-f8dlh 0/1 CrashLoopBackOff 36 2h
gitlab gitlab-runner-3998042981-g9m5g 0/1 CrashLoopBackOff 36 2h
gitlab gitlab-runner-3998042981-q0bth 0/1 CrashLoopBackOff 36 2h
gitlab gitlab-runner-3998042981-rjztk 0/1 CrashLoopBackOff 36 2h
kube-system coredns-1977636023-1q47s 1/1 Running 0 21h
kube-system grafana-1173934969-vw49f 1/1 Running 0 21h
kube-system node-exporter-gitlab-ci-hc6k3ffax54o-minion-0 1/1 Running 0 21h
kube-system node-exporter-gitlab-ci-hc6k3ffax54o-minion-1 1/1 Running 0 21h
kube-system prometheus-873144915-s9m6j 1/1 Running 0 21h
My problem is that I am not able to tell why this happens, since pod logs are not available when the containers never start properly.
Apart from that, I do not know what to do with the specified volumes, although I suspect they are related to the crash loops.
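For what it's worth, a pod in CrashLoopBackOff usually did start its container at least once, so commands like these can still surface something (pod name taken from the listing above):
kubectl -n gitlab describe pod gitlab-runner-3998042981-f8dlh
kubectl -n gitlab logs gitlab-runner-3998042981-f8dlh --previous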
Deployment specifies:
- configMap:
    name: gitlab-runner
  name: config
- hostPath:
    path: /usr/share/ca-certificates/mozilla
  name: cacerts
I have found that:
A hostPath volume mounts a file or directory from the host node’s
filesystem into your pod
After running the pods without the cacerts volume, everything is created, but afterwards no job gets executed.
Log from any pod:
Starting multi-runner from /etc/gitlab-runner/config.toml ... builds=0
Running in system-mode.
Configuration loaded builds=0
Metrics server disabled
ERROR: Checking for jobs... forbidden runner=<PARTOFTHETOKEN>
ERROR: Checking for jobs... forbidden runner=<PARTOFTHETOKEN>
ERROR: Checking for jobs... forbidden runner=<PARTOFTHETOKEN>
ERROR: Runner https://URL/ci<TOKEN> is not healthy and will be disabled!
The current docs about running GitLab CI on a Kubernetes cluster are not clear enough.
You need to run gitlab-runner register somewhere with the token you get from the Runners admin page of your GitLab instance, grab another token from the resulting config (cat /etc/gitlab-runner/config.toml | grep token), and paste it into your deployment config so it can receive jobs from CI.
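A sketch of that registration step (URL and registration token are placeholders; flags as accepted by the gitlab-runner CLI):
gitlab-runner register \
  --non-interactive \
  --url https://gitlab.example.com/ \
  --registration-token <REGISTRATION_TOKEN> \
  --executor kubernetes
# the per-runner token that the Deployment's config needs is then in the generated config
grep token /etc/gitlab-runner/config.toml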
UPDATE 2019: gitlab.com docs now make it clear:
https://docs.gitlab.com/runner/register/#gnulinux

How to fix weave-net CrashLoopBackOff for the second node?

I have got 2 VM nodes. Both see each other either by hostname (through /etc/hosts) or by IP address. One has been provisioned with kubeadm as a master, the other as a worker node. Following the instructions (http://kubernetes.io/docs/getting-started-guides/kubeadm/) I have added weave-net. The list of pods looks like the following:
vagrant#vm-master:~$ kubectl get pods --all-namespaces
NAMESPACE NAME READY STATUS RESTARTS AGE
kube-system etcd-vm-master 1/1 Running 0 3m
kube-system kube-apiserver-vm-master 1/1 Running 0 5m
kube-system kube-controller-manager-vm-master 1/1 Running 0 4m
kube-system kube-discovery-982812725-x2j8y 1/1 Running 0 4m
kube-system kube-dns-2247936740-5pu0l 3/3 Running 0 4m
kube-system kube-proxy-amd64-ail86 1/1 Running 0 4m
kube-system kube-proxy-amd64-oxxnc 1/1 Running 0 2m
kube-system kube-scheduler-vm-master 1/1 Running 0 4m
kube-system kubernetes-dashboard-1655269645-0swts 1/1 Running 0 4m
kube-system weave-net-7euqt 2/2 Running 0 4m
kube-system weave-net-baao6 1/2 CrashLoopBackOff 2 2m
CrashLoopBackOff appears for each worker node connected. I have spent several hours playing with network interfaces, but it seems the network is fine. I have found a similar question, where the answer advised looking into the logs, with no follow-up. So, here are the logs:
vagrant#vm-master:~$ kubectl logs weave-net-baao6 -c weave --namespace=kube-system
2016-10-05 10:48:01.350290 I | error contacting APIServer: Get https://100.64.0.1:443/api/v1/nodes: dial tcp 100.64.0.1:443: getsockopt: connection refused; trying with blank env vars
2016-10-05 10:48:01.351122 I | error contacting APIServer: Get http://localhost:8080/api: dial tcp [::1]:8080: getsockopt: connection refused
Failed to get peers
What am I doing wrong? Where do I go from here?
I ran into the same issue too. It seems Weave wants to connect to the Kubernetes cluster IP address, which is virtual. Just run kubectl get svc to find the cluster IP. It should give you something like this:
$ kubectl get svc
NAME CLUSTER-IP EXTERNAL-IP PORT(S) AGE
kubernetes 100.64.0.1 <none> 443/TCP 2d
Weave picks up this IP and tries to connect to it, but the worker nodes do not know anything about it. A simple route will solve this issue. On all your worker nodes, execute:
route add 100.64.0.1 gw <your real master IP>
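or, with iproute2, something along the lines of:
ip route add 100.64.0.1/32 via <your real master IP>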
This happens with a single-node setup, too. I tried several things like reapplying the configuration and recreating the cluster, but the most stable way at the moment is to perform a full teardown (as described in the docs) and bring the cluster up again.
I use these scripts for relaunching the cluster:
down.sh
#!/bin/bash
systemctl stop kubelet;
docker rm -f -v $(docker ps -q);
find /var/lib/kubelet | xargs -n 1 findmnt -n -t tmpfs -o TARGET -T | uniq | xargs -r umount -v;
rm -r -f /etc/kubernetes /var/lib/kubelet /var/lib/etcd;
up.sh
#!/bin/bash
systemctl start kubelet
kubeadm init
# kubectl taint nodes --all dedicated- # single node!
kubectl create -f https://git.io/weave-kube
Edit: I would also give other pod networks a try, like Calico, if this turns out to be a Weave-related issue.
The most common causes for this may be:
- presence of a firewall (e.g. firewalld on CentOS)
- network configuration (e.g. default NAT interface on VirtualBox)
Currently kubeadm is still alpha, and this is one of the issues that has already been reported by many of the alpha testers. We are looking into fixing this by documenting the most common problems; such documentation is going to be ready closer to the beta version.
Right now there exists a VirtualBox + Vagrant + Ansible reference implementation for Ubuntu and CentOS that provides solutions for the firewall, SELinux and VirtualBox NAT issues.
/usr/local/bin/weave reset
was the fix for me - hope it's useful. And yes, make sure SELinux is set to disabled and firewalld is not running (on Red Hat / CentOS releases):
kube-system weave-net-2vlvj 2/2 Running 3 11d
kube-system weave-net-42k6p 1/2 Running 3 11d
kube-system weave-net-wvsk5 2/2 Running 3 11d