Weavescope on MicroK8s doesn't recognize containers

I'm running a single-node MicroK8s cluster and just installed Weavescope, but it doesn't recognize any running containers. I can see my pods and services fine, but each pod simply shows "0 containers" underneath.
The logs from the Weavescope agent and app pods indicate that something is wrong, but I'm not adept enough with Kubernetes to know how to deal with the errors.
Logs from Weavescope agent:
microk8s.kubectl logs -n weave weave-scope-cluster-agent-7944c858c9-bszjw
time="2020-05-23T14:56:10Z" level=info msg="publishing to: weave-scope-app.weave.svc.cluster.local:80"
<probe> INFO: 2020/05/23 14:56:10.378586 Basic authentication disabled
<probe> INFO: 2020/05/23 14:56:10.439179 command line args: --mode=probe --probe-only=true --probe.http.listen=:4041 --probe.kubernetes.role=cluster --probe.publish.interval=4.5s --probe.spy.interval=2s weave-scope-app.weave.svc.cluster.local:80
<probe> INFO: 2020/05/23 14:56:10.439215 probe starting, version 1.13.1, ID 6336ff46bcd86913
<probe> ERRO: 2020/05/23 14:56:10.439261 Error getting docker bridge ip: route ip+net: no such network interface
<probe> INFO: 2020/05/23 14:56:10.439487 kubernetes: targeting api server https://10.152.183.1:443
<probe> ERRO: 2020/05/23 14:56:10.440206 plugins: problem loading: no such file or directory
<probe> INFO: 2020/05/23 14:56:10.444345 Profiling data being exported to :4041
<probe> INFO: 2020/05/23 14:56:10.444355 go tool pprof http://:4041/debug/pprof/{profile,heap,block}
<probe> WARN: 2020/05/23 14:56:10.444505 Error collecting weave status, backing off 10s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> INFO: 2020/05/23 14:56:10.506596 volumesnapshotdatas are not supported by this Kubernetes version
<probe> INFO: 2020/05/23 14:56:10.506950 volumesnapshots are not supported by this Kubernetes version
<probe> INFO: 2020/05/23 14:56:11.559811 Control connection to weave-scope-app.weave.svc.cluster.local starting
<probe> INFO: 2020/05/23 14:56:14.948382 Publish loop for weave-scope-app.weave.svc.cluster.local starting
<probe> WARN: 2020/05/23 14:56:20.447578 Error collecting weave status, backing off 20s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> WARN: 2020/05/23 14:56:40.451421 Error collecting weave status, backing off 40s: Get http://127.0.0.1:6784/report: dial tcp 127.0.0.1:6784: connect: connection refused. If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<probe> INFO: 2020/05/23 15:19:12.825869 Pipe pipe-7287306037502507515 connection to weave-scope-app.weave.svc.cluster.local starting
<probe> INFO: 2020/05/23 15:19:16.509232 Pipe pipe-7287306037502507515 connection to weave-scope-app.weave.svc.cluster.local exiting
Logs from the Weavescope app:
microk8s.kubectl logs -n weave weave-scope-app-bc7444d59-csxjd
<app> INFO: 2020/05/23 14:56:11.221084 app starting, version 1.13.1, ID 5e3953d1209f7147
<app> INFO: 2020/05/23 14:56:11.221114 command line args: --mode=app
<app> INFO: 2020/05/23 14:56:11.275231 Basic authentication disabled
<app> INFO: 2020/05/23 14:56:11.290717 listening on :4040
<app> WARN: 2020/05/23 14:56:11.340182 Error updating weaveDNS, backing off 20s: Error running weave ps: exit status 1: "Link not found\n". If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<app> WARN: 2020/05/23 14:56:31.457702 Error updating weaveDNS, backing off 40s: Error running weave ps: exit status 1: "Link not found\n". If you are not running Weave Net, you may wish to suppress this warning by launching scope with the `--weave=false` option.
<app> ERRO: 2020/05/23 15:19:16.504169 Error copying to pipe pipe-7287306037502507515 (1) websocket: io: read/write on closed pipe
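For what it's worth, two things stand out in the agent log: the probe cannot find a Docker bridge ("Error getting docker bridge ip"), and MicroK8s runs its containers under containerd rather than Docker, which would explain why Scope can list pods but reports "0 containers". A couple of hedged first steps (the deployment name weave-scope-app is inferred from the pod names above):
# Confirm which container runtime the node is using (see the CONTAINER-RUNTIME column)
microk8s.kubectl get nodes -o wide
# Optionally silence the Weave Net warnings, as the log itself suggests,
# by adding --weave=false to the Scope command-line args
microk8s.kubectl -n weave edit deployment/weave-scope-app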

Related

Facing a problem while running the kubeadm init command

[WARNING Firewalld]: firewalld is active, please ensure ports [6443 10250] are open or your cluster may not function correctly
error execution phase preflight: [preflight] Some fatal errors occurred:
[ERROR CRI]: container runtime is not running: output: time="2023-02-19T02:25:52Z" level=fatal msg="validate service connection: CRI v1 runtime API is not implemented for endpoint "unix:///var/run/containerd/containerd.sock": rpc error: code = Unimplemented desc = unknown service runtime.v1.RuntimeService"
, error: exit status 1
[preflight] If you know what you are doing, you can make a check non-fatal with --ignore-preflight-errors=...
To see the stack trace of this error execute with --v=5 or higher
I enabled ports 6443 and 10250 in the security group.
I also tried these commands:
rm /etc/containerd/config.toml
systemctl restart containerd
kubeadm init
There is no containerd config in the /etc directory.
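The "CRI v1 runtime API is not implemented" error usually means containerd is running with its CRI plugin disabled (the stock config.toml shipped with Docker's containerd.io package lists "cri" under disabled_plugins). A hedged sketch of regenerating a default config with CRI enabled before retrying:
# Recreate /etc/containerd with a default config (CRI plugin enabled)
sudo mkdir -p /etc/containerd
containerd config default | sudo tee /etc/containerd/config.toml
sudo systemctl restart containerd
# Verify the CRI endpoint answers before re-running kubeadm init
# (crictl ships with the cri-tools package)
sudo crictl --runtime-endpoint unix:///var/run/containerd/containerd.sock version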

Unable to resolve address for Kubernetes service

I have installed a single-node Kafka using Confluent. There is an error in the Kafka pod:
[WARN] 2022-04-26 14:29:47,008 [main-SendThread(zookeeper.confluent.svc.cluster.local:2181)] org.apache.zookeeper.ClientCnxn run - Session 0x0 for sever zookeeper.confluent.svc.cluster.local:2181, Closing socket connection. Attempting reconnect except it is a SessionExpiredException.
java.lang.IllegalArgumentException: Unable to canonicalize address zookeeper.confluent.svc.cluster.local:2181 because it's not resolvable
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:78)
at org.apache.zookeeper.SaslServerPrincipal.getServerPrincipal(SaslServerPrincipal.java:41)
at org.apache.zookeeper.ClientCnxn$SendThread.startConnect(ClientCnxn.java:1161)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1210)
[INFO] 2022-04-26 14:29:47,273 [main] kafka.zookeeper.ZooKeeperClient info - [ZooKeeperClient Kafka server] Closing.
[ERROR] 2022-04-26 14:29:48,112 [main-SendThread(zookeeper.confluent.svc.cluster.local:2181)] org.apache.zookeeper.client.StaticHostProvider resolve - Unable to resolve address: zookeeper.confluent.svc.cluster.local:2181
java.net.UnknownHostException: zookeeper.confluent.svc.cluster.local
Error messages:
Unable to canonicalize address zookeeper.confluent.svc.cluster.local:2181 because it's not resolvable
Unable to resolve address: zookeeper.confluent.svc.cluster.local:2181
I checked my ZooKeeper and it works without a problem. I also checked DNS using dnsutils:
$ kubectl -n default exec -it dnsutils -- nslookup zookeeper.confluent.svc.cluster.local
Server: 10.96.0.10
Address: 10.96.0.10#53
Name: zookeeper.confluent.svc.cluster.local
Address: 192.168.0.111
What can I do? Is this a k8s-related problem?
This happened to me too, but on a docker-compose project.
Finally, I found that running out of disk space on the server caused this issue.
I cleaned up some space and it worked.
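For reference, a quick way to verify the disk-space diagnosis and reclaim some room (a sketch; the prune command applies to the docker-compose case and removes unused containers and dangling images):
df -h                                # check free space on all mounts
sudo journalctl --vacuum-size=200M   # trim systemd journal logs
docker system prune -f               # remove unused containers, networks, and dangling images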

kubelet won't start after kubernetes/manifest update

This is sort of strange behavior in our K8s cluster.
When we try to deploy a new version of our applications we get:
Failed to create pod sandbox: rpc error: code = Unknown desc = failed to set up sandbox container "<container-id>" network for pod "application-6647b7cbdb-4tp2v": networkPlugin cni failed to set up pod "application-6647b7cbdb-4tp2v_default" network: Get "https://[10.233.0.1]:443/api/v1/namespaces/default": dial tcp 10.233.0.1:443: connect: connection refused
I ran kubectl get cs and found the controller-manager and scheduler in an Unhealthy state.
As described here, I updated /etc/kubernetes/manifests/kube-scheduler.yaml and
/etc/kubernetes/manifests/kube-controller-manager.yaml by commenting out --port=0.
When I checked systemctl status kubelet, it was running.
Active: active (running) since Mon 2020-10-26 13:18:46 +0530; 1 years 0 months ago
I restarted the kubelet service, and the controller-manager and scheduler were then shown as healthy.
But systemctl status kubelet now shows (right after the restart it briefly showed a running state):
Active: activating (auto-restart) (Result: exit-code) since Thu 2021-11-11 10:50:49 +0530; 3s ago
Docs: https://github.com/GoogleCloudPlatform/kubernetes
Process: 21234 ExecStart=/usr/bin/kubelet $KUBELET_KUBECONFIG_ARGS $KUBELET_CONFIG_ARGS $KUBELET_KUBEADM_ARGS $KUBELET
I tried adding Environment="KUBELET_SYSTEM_PODS_ARGS=--pod-manifest-path=/etc/kubernetes/manifests --allow-privileged=true --fail-swap-on=false" to /etc/systemd/system/kubelet.service.d/10-kubeadm.conf as described here, but it's still not working properly.
I also removed the --port=0 comment in the above-mentioned manifests and tried restarting; still the same result.
Edit: This issue was due to an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the /var/lib/kubelet/pki/kubelet-client-current.pem certificate and key values are base64-encoded when placed in /etc/kubernetes/kubelet.conf.
Many others suggested running kubeadm init again, but this cluster was created using Kubespray, with no manually added nodes.
We have bare-metal K8s running on Ubuntu 18.04.
K8s: v1.18.8
We would appreciate any debugging and fixing suggestions.
PS:
When we try to telnet 10.233.0.1 443 from any node, the first attempt fails and the second succeeds.
Edit: I found this in the kubelet service logs:
Nov 10 17:35:05 node1 kubelet[1951]: W1110 17:35:05.380982 1951 docker_sandbox.go:402] failed to read pod IP from plugin/docker: networkPlugin cni failed on the status hook for pod "app-7b54557dd4-bzjd9_default": unexpected command output nsenter: cannot open /proc/12311/ns/net: No such file or directory
Posting the comment as a community wiki answer for better visibility:
This issue was due to an expired kubelet certificate and was fixed by following these steps. If someone faces this issue, make sure the /var/lib/kubelet/pki/kubelet-client-current.pem certificate and key values are base64-encoded when placed in /etc/kubernetes/kubelet.conf.
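As an illustration of those steps, a sketch for checking whether the kubelet client certificate has expired and for producing the base64-encoded value to embed in /etc/kubernetes/kubelet.conf (client-certificate-data and client-key-data are the standard kubeconfig field names):
# Check the expiry date of the kubelet client certificate
sudo openssl x509 -in /var/lib/kubelet/pki/kubelet-client-current.pem -noout -enddate
# Base64-encode the PEM for inline use in kubelet.conf
# (paste into the client-certificate-data / client-key-data fields)
sudo base64 -w0 /var/lib/kubelet/pki/kubelet-client-current.pem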

OC cluster never comes up: Error: timed out waiting for the condition

Whenever I try to get the cluster up using "oc cluster up", I get the error below. Kindly help with how to fix this.
[mano@mano ~]$ oc cluster up
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.11 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.11 ...
I0923 13:40:32.364326 15396 config.go:40] Running "create-master-config"
I0923 13:40:59.938492 15396 config.go:46] Running "create-node-config"
I0923 13:41:10.721711 15396 flags.go:30] Running "create-kubelet-flags"
I0923 13:41:18.241285 15396 run_kubelet.go:49] Running "start-kubelet"
I0923 13:41:23.016238 15396 run_self_hosted.go:181] Waiting for the kube-apiserver to be ready ...
E0923 13:46:23.023479 15396 run_self_hosted.go:571] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: connect: connection refused ()
Error: timed out waiting for the condition
OC version
[mano@mano ~]$ oc version
oc v3.11.0+0cbc58b
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
I followed the article https://github.com/openshift/origin/blob/release-3.11/docs/cluster_up_down.md, yet no luck.
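One way to dig further (a sketch; the container name pattern is a guess, since oc cluster up container names vary by version) is to inspect the Docker containers it started and read the kube-apiserver logs to see why nothing is answering on 127.0.0.1:8443:
docker ps -a | grep -E 'origin|apiserver'    # find the control-plane containers
docker logs <container-id> 2>&1 | tail -50   # see why the kube-apiserver exited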

OpenShift Origin oc cluster up fails

Using the latest version of OpenShift Origin, v3.10.0, I run the following command on a CentOS VM:
oc cluster up --public-hostname=192.168.56.15 --http-proxy=http://proxy.ip:port --https-proxy=https://proxy.ip:port --no-proxy=[192.168.56.0/24,172.0.0.0/8,192.168.56.15,192.168.56.15,localhost]
As a result I get:
Getting a Docker client ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Checking type of volume mount ...
Determining server IP ...
Checking if OpenShift is already running ...
Checking for supported Docker version (=>1.22) ...
Checking if insecured registry is configured properly in Docker ...
Checking if required ports are available ...
Checking if OpenShift client is configured properly ...
Checking if image openshift/origin-control-plane:v3.10 is available ...
Starting OpenShift using openshift/origin-control-plane:v3.10 ...
I1003 10:58:00.643521 3446 flags.go:30] Running "create-kubelet-flags"
I1003 10:58:01.314805 3446 run_kubelet.go:48] Running "start-kubelet"
I1003 10:58:01.549316 3446 run_self_hosted.go:172] Waiting for the kube-apiserver to be ready ...
E1003 11:03:01.559324 3446 run_self_hosted.go:542] API server error: Get https://127.0.0.1:8443/healthz?timeout=32s: dial tcp 127.0.0.1:8443: getsockopt: connection refused ()
Error: timed out waiting for the condition
And while following the Docker logs I noticed the following error:
E1003 github.com/openshift/origin/vendor/k8s.io/kubernetes/pkg/kubelet/kubelet.go:452: Failed to list *v1.Service: Get https://localhost:8443/api/v1/services?limit=500&resourceVersion=0: dial tcp [::1]:8443: getsockopt: connection refused
This is expected behavior, since netstat shows only one port open:
tcp6 0 0 :::10250 :::* LISTEN 3894/hyperkube
PS:
As you can see, I use a proxy.
I tried local name resolution, using DNS names instead of IP addresses; since I don't have a DNS server I used /etc/hosts. Same problem.
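Since a proxy is configured, one hedged thing to check: the failing health check targets https://127.0.0.1:8443, and 127.0.0.1 is not in the --no-proxy list above, so that request may be routed through the proxy. A sketch of the command with the loopback address excluded (the other values are copied from the question):
oc cluster up --public-hostname=192.168.56.15 \
  --http-proxy=http://proxy.ip:port --https-proxy=https://proxy.ip:port \
  --no-proxy=localhost,127.0.0.1,192.168.56.0/24,172.0.0.0/8,192.168.56.15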