Ceph Rook deployment issue found when mounting a PVC - Kubernetes

I am Warren, trying to set up Ceph via Rook in my k8s environment. I followed the official document
https://rook.io/docs/rook/v1.4/ceph-quickstart.html. Almost everything looked fine during the Ceph setup. I also verified it with
ceph status
  cluster:
    id:     356efdf1-a1a7-4365-9ee6-b65ecf8481f9
    health: HEALTH_OK
But I failed at the example https://rook.io/docs/rook/v1.4/ceph-block.html, trying to use block storage in the k8s environment. My k8s env is v1.18.2.
After deploying MySQL and WordPress, I found the error below on the pod. I also checked the PV and PVC: all of them were created successfully and are bound, so I think something is wrong with mount compatibility. Please help.
-----------------------------------------------------
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "wordpress-mysql-764fc64f97-qwtjd": pod has unbound immediate PersistentVolumeClaims
Warning FailedScheduling <unknown> default-scheduler running "VolumeBinding" filter plugin for pod "wordpress-mysql-764fc64f97-qwtjd": pod has unbound immediate PersistentVolumeClaims
Normal Scheduled <unknown> default-scheduler Successfully assigned default/wordpress-mysql-764fc64f97-qwtjd to master1
Normal SuccessfulAttachVolume 7m14s attachdetach-controller AttachVolume.Attach succeeded for volume "pvc-dc8567bb-c2e3-44a4-a56a-c74616059db4"
Warning FailedMount 5m11s kubelet, master1 Unable to attach or mount volumes: unmounted volumes=[mysql-persistent-storage], unattached volumes=[default-token-czg9j mysql-persistent-storage]: timed out waiting for the condition
Warning FailedMount 40s (x2 over 2m54s) kubelet, master1 Unable to attach or mount volumes: unmounted volumes=[mysql-persistent-storage], unattached volumes=[mysql-persistent-storage default-token-czg9j]: timed out waiting for the condition
Warning FailedMount 6s (x4 over 6m6s) kubelet, master1 MountVolume.MountDevice failed for volume "pvc-dc8567bb-c2e3-44a4-a56a-c74616059db4" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 110) occurred while running rbd args: [--id csi-rbd-node -m 10.109.63.94:6789,10.96.135.241:6789,10.110.131.193:6789 --keyfile=***stripped*** map replicapool/csi-vol-5ccc546b-0914-11eb-9135-62dece6c0d98 --device-type krbd], rbd error output: rbd: sysfs write failed
-------------------------------------------------
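Exit status 110 is ETIMEDOUT, so the kernel RBD client is timing out rather than being rejected outright. A first-pass diagnostic on the node where the mount failed might look like the sketch below (hedged: it assumes the Rook toolbox pod is deployed, and it reuses the monitor address and volume name from the event above):

# 1. Can the node reach a Ceph monitor at all? (address from the event above)
nc -zv -w 5 10.109.63.94 6789
# 2. krbd logs the real reason behind "rbd: sysfs write failed" in the kernel log:
dmesg | tail -n 20
# 3. If dmesg complains about unsupported image features, inspect them from the Rook toolbox:
TOOLS=$(kubectl -n rook-ceph get pod -l app=rook-ceph-tools -o name)
kubectl -n rook-ceph exec -it "$TOOLS" -- rbd info replicapool/csi-vol-5ccc546b-0914-11eb-9135-62dece6c0d98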

Related

CSI CephFS cannot mount successfully on K8s

Could anyone help me? I cannot mount successfully.
I am using this CSI plugin to mount CephFS into a pod: https://github.com/ceph/ceph-csi
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedMount 11m (x26 over 169m) kubelet, sprl-pbkh-kubenode03 Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[default-token-bms74 cephfs-pvc]: timed out waiting for the condition
Warning FailedMount 6m53s (x47 over 163m) kubelet, sprl-pbkh-kubenode03 Unable to attach or mount volumes: unmounted volumes=[cephfs-pvc], unattached volumes=[cephfs-pvc default-token-bms74]: timed out waiting for the condition
Warning FailedMount 58s (x92 over 172m) kubelet, sprl-pbkh-kubenode03 MountVolume.MountDevice failed for volume "pvc-c266c4e3-9ea2-4b26-9759-b73a5ba3516a" : rpc error: code = Internal desc = an error (exit status 1) occurred while running nsenter args: [--net=/ -- ceph-fuse /var/lib/kubelet/plugins/kubernetes.io/csi/pv/pvc-c266c4e3-9ea2-4b26-9759-b73a5ba3516a/globalmount -m 172.18.4.26,172.18.4.31,172.18.4.32 -c /etc/ceph/ceph.conf -n client.admin --keyfile=***stripped*** -r /volumes/csi/csi-vol-83e27006-59a6-11ed-97f7-7e2180fc1e5e/66900fdf-648b-49ba-ac19-cf3f32cb874e -o nonempty --client_mds_namespace=cephfs] stderr: nsenter: reassociate to namespace 'ns/net' failed: Invalid argument
I have used this: https://github.com/ceph/ceph-csi
I created the PVC and StorageClass, then used a pod to mount the PVC, but the mount does not succeed.
I have confirmed that I can mount successfully from my local machine using ceph-fuse.
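Since the failure happens while entering the network namespace for ceph-fuse, one workaround worth trying is ceph-csi's kernel mounter: the CephFS StorageClass accepts a mounter parameter, and the in-kernel client avoids the nsenter/FUSE path entirely. A minimal sketch, assuming the standard ceph-csi secrets are already in place; the StorageClass name, secret name, and cluster ID below are illustrative placeholders:

cat <<'EOF' | kubectl apply -f -
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: csi-cephfs-kernel                    # hypothetical name
provisioner: cephfs.csi.ceph.com
parameters:
  clusterID: <your-ceph-cluster-id>          # as registered in the ceph-csi ConfigMap
  fsName: cephfs                             # matches --client_mds_namespace in the error above
  mounter: kernel                            # in-kernel CephFS client instead of ceph-fuse
  csi.storage.k8s.io/provisioner-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/provisioner-secret-namespace: default
  csi.storage.k8s.io/node-stage-secret-name: csi-cephfs-secret
  csi.storage.k8s.io/node-stage-secret-namespace: default
reclaimPolicy: Delete
EOF

Recreate the PVC against this StorageClass and re-run the pod; since the local ceph-fuse mount works, credentials and connectivity are presumably already fine.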

CentOS 8 microk8s: Readiness probe failed: HTTP probe failed with statuscode: 503

I have installed microk8s on my CentOS 8 operating system.
kube-system   coredns-7f9c69c78c-lxm7c                  0/1   Running            1    18m
kube-system   calico-node-thhp8                         1/1   Running            1    68m
kube-system   calico-kube-controllers-f7868dd95-dpsnl   0/1   CrashLoopBackOff   23   68m
When I run microk8s enable dns, coredns and calico-kube-controllers fail to start, as shown above.
Describing the coredns pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 14m default-scheduler Successfully assigned kube-system/coredns-7f9c69c78c-lxm7c to localhost.localdomain
Normal Pulled 14m kubelet Container image "coredns/coredns:1.8.0" already present on machine
Normal Created 14m kubelet Created container coredns
Normal Started 14m kubelet Started container coredns
Warning Unhealthy 11m (x22 over 14m) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Normal SandboxChanged 2m8s kubelet Pod sandbox changed, it will be killed and re-created.
Normal Pulled 2m7s kubelet Container image "coredns/coredns:1.8.0" already present on machine
Normal Created 2m7s kubelet Created container coredns
Normal Started 2m6s kubelet Started container coredns
Warning Unhealthy 2m6s kubelet Readiness probe failed: Get "http://10.1.102.132:8181/ready": dial tcp 10.1.102.132:8181: connect: connection refused
Warning Unhealthy 9s (x12 over 119s) kubelet Readiness probe failed: HTTP probe failed with statuscode: 503
Describing the calico-kube-controllers pod:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 73m default-scheduler no nodes available to schedule pods
Warning FailedScheduling 73m (x1 over 73m) default-scheduler no nodes available to schedule pods
Warning FailedScheduling 72m (x1 over 72m) default-scheduler 0/1 nodes are available: 1 node(s) had taint {node.kubernetes.io/not-ready: }, that the pod didn't tolerate.
Normal Scheduled 72m default-scheduler Successfully assigned kube-system/calico-kube-controllers-f7868dd95-dpsnl to localhost.localdomain
Warning FailedCreatePodSandBox 72m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "f3ea36b003b0c9142ae63fee31531f9102e40ab837f4d795d1efb5c85af223ec": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "a1c405cdcebe79c586badcc8da47700247751a50ef9a1403e95fc4995485fba0": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "4adb07610eef0d7a618105abf72a114e486c373a02d5d1b204da2bd35268dd1b": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "96aac009175973ac4c20034824db3443b3ab184cfcd1ed23786e539fb6147796": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 71m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "79639a18edcffddbdb93492157af43bb6c1f1a9ac2af1b3fbbac58335737d5dc": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 70m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "3264f006447297583a37d8cc87ffe01311deaf2a31bf25867b3b18c83db2167d": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Warning FailedCreatePodSandBox 70m kubelet Failed to create pod sandbox: rpc error: code = Unknown desc = failed to setup network for sandbox "5c5cf6509bfcf515ad12bc51451e4c385e5242c4f7bb593779d207abf9c906a4": error getting ClusterInformation: resource does not exist: ClusterInformation(default) with error: clusterinformations.crd.projectcalico.org "default" not found
Normal Pulling 70m kubelet Pulling image "calico/kube-controllers:v3.13.2"
Normal Pulled 69m kubelet Successfully pulled image "calico/kube-controllers:v3.13.2" in 50.744281789s
Normal Created 69m kubelet Created container calico-kube-controllers
Normal Started 69m kubelet Started container calico-kube-controllers
Warning Unhealthy 69m (x2 over 69m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Warning MissingClusterDNS 37m (x185 over 72m) kubelet pod: "calico-kube-controllers-f7868dd95-dpsnl_kube-system(d8c3ee40-7d3b-4a84-9398-19ec8a6d9082)". kubelet does not have ClusterDNS IP configured and cannot create Pod using "ClusterFirst" policy. Falling back to "Default" policy.
Warning Unhealthy 31m (x6 over 32m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 30m (x4 over 32m) kubelet Container image "calico/kube-controllers:v3.13.2" already present on machine
Normal Created 30m (x4 over 32m) kubelet Created container calico-kube-controllers
Normal Started 30m (x4 over 32m) kubelet Started container calico-kube-controllers
Warning BackOff 22m (x42 over 32m) kubelet Back-off restarting failed container
Normal SandboxChanged 10m kubelet Pod sandbox changed, it will be killed and re-created.
Warning Unhealthy 9m36s (x6 over 10m) kubelet Readiness probe failed: Failed to read status file status.json: open status.json: no such file or directory
Normal Pulled 8m51s (x4 over 10m) kubelet Container image "calico/kube-controllers:v3.13.2" already present on machine
Normal Created 8m51s (x4 over 10m) kubelet Created container calico-kube-controllers
Normal Started 8m51s (x4 over 10m) kubelet Started container calico-kube-controllers
Warning BackOff 42s (x42 over 10m) kubelet Back-off restarting failed container
I cannot start my microk8s services. I don't encounter these errors on my Ubuntu server. What can I do about these errors on my CentOS 8 server?
Have you tried updating the microk8s version?
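Beyond upgrading, two CentOS-specific culprits are worth ruling out, since neither exists on a default Ubuntu server: firewalld dropping pod-to-pod traffic, and bridge netfilter not being enabled. A hedged sketch of the checks:

sudo systemctl status firewalld              # if active, it may drop the calico/coredns traffic that Ubuntu allows
sudo firewall-cmd --permanent --add-masquerade && sudo firewall-cmd --reload
sudo modprobe br_netfilter                   # bridged pod traffic must traverse iptables
echo 1 | sudo tee /proc/sys/net/bridge/bridge-nf-call-iptables
microk8s inspect                             # built-in diagnostic; it flags known host-configuration problems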

Kubernetes - how to list conditions that are not met

I upgraded the k8s version on GCP to 1.21.6-gke.1500. Some of my pods are stuck in the "ContainerCreating" status. When I describe them, I see these errors:
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 12m default-scheduler Successfully assigned gamma/xxxx-d58747f46-j7fzs to gke-us-east4-gke-us-east4-xxxx--6c23c312-p5q2
Warning FailedMount 10m kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-data], unattached volumes=[my-license kube-api-access-b32js nfs-data]: timed out waiting for the condition
Warning FailedMount 3m56s (x2 over 6m13s) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-data], unattached volumes=[nfs-data my-license kube-api-access-b32js]: timed out waiting for the condition
Warning FailedMount 100s (x2 over 8m31s) kubelet Unable to attach or mount volumes: unmounted volumes=[nfs-data], unattached volumes=[kube-api-access-b32js nfs-data my-license]: timed out waiting for the condition
How can I list the conditions that are not met and that the pods are waiting for?
Try the following commands:
kubectl describe pod <name>     # full event list and unmet conditions for the pod
kubectl get nodes -o wide       # node status, versions, and addresses
kubectl get volumeattachments   # whether the volume was actually attached to a node
kubectl get componentstatus     # health of the core control-plane components
You can also check your GKE logs.
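If you want the unmet conditions themselves rather than the whole describe output, they live on the pod object and can be extracted directly. A sketch (the jsonpath filter keeps only conditions whose status is False):

kubectl get pod <name> -o jsonpath='{range .status.conditions[?(@.status=="False")]}{.type}{": "}{.message}{"\n"}{end}'
kubectl get events --field-selector type=Warning --sort-by=.lastTimestamp   # recent Warning events across the namespace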

ICP 3.1.1 error: Grafana and Prometheus Kubernetes pods always in 'Init' status

I completed installing ICP with VA, using 1 master, 1 proxy, 1 management, 1 VA, and 3 workers, with GlusterFS inside.
This is the list of Kubernetes pods that are not running:
Storage - PersistentVolume GlusterFS on ICP
These are the error events from describing the Kubernetes pods:
custom-metrics-adapter
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 17m default-scheduler Successfully assigned kube-system/custom-metrics-adapter-5d5b694df7-cggz8 to 192.168.10.126
Normal Pulled 17m kubelet, 192.168.10.126 Container image "swgcluster.icp:8500/ibmcom/curl:4.0.0" already present on machine
Normal Created 17m kubelet, 192.168.10.126 Created container
Normal Started 17m kubelet, 192.168.10.126 Started container
monitoring-grafana
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 18m default-scheduler Successfully assigned kube-system/monitoring-grafana-799d7fcf97-sj64j to 192.168.10.126
Warning FailedMount 1m (x8 over 16m) kubelet, 192.168.10.126 (combined from similar events): MountVolume.SetUp failed for volume "pvc-251f69e3-fd60-11e8-9779-000c2914ff99" : mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/e2c85434-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-251f69e3-fd60-11e8-9779-000c2914ff99 --scope -- mount -t glusterfs -o log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pvc-251f69e3-fd60-11e8-9779-000c2914ff99/monitoring-grafana-799d7fcf97-sj64j-glusterfs.log,backup-volfile-servers=192.168.10.115:192.168.10.116:192.168.10.119,auto_unmount,log-level=ERROR 192.168.10.115:vol_946f98c8a92ce2930acd3181d803943c /var/lib/kubelet/pods/e2c85434-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-251f69e3-fd60-11e8-9779-000c2914ff99
Output: Running scope as unit run-r6ba2425d0e7f437d922dbe0830cd5a97.scope.
mount: unknown filesystem type 'glusterfs'
the following error information was pulled from the glusterfs log to help diagnose this issue: could not open log file for pod monitoring-grafana-799d7fcf97-sj64j
Warning FailedMount 50s (x8 over 16m) kubelet, 192.168.10.126 Unable to mount volumes for pod "monitoring-grafana-799d7fcf97-sj64j_kube-system(e2c85434-fd67-11e8-822b-000c2914ff99)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"monitoring-grafana-799d7fcf97-sj64j". list of unmounted volumes=[grafana-storage]. list of unattached volumes=[grafana-storage config-volume dashboard-volume dashboard-config ds-job-config router-config monitoring-ca-certs monitoring-certs router-entry default-token-f6d9q]
monitoring-prometheus
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 19m default-scheduler Successfully assigned kube-system/monitoring-prometheus-85546d8575-jr89h to 192.168.10.126
Warning FailedMount 4m (x6 over 17m) kubelet, 192.168.10.126 Unable to mount volumes for pod "monitoring-prometheus-85546d8575-jr89h_kube-system(e2ca91a8-fd67-11e8-822b-000c2914ff99)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"monitoring-prometheus-85546d8575-jr89h". list of unmounted volumes=[storage-volume]. list of unattached volumes=[config-volume rules-volume etcd-certs storage-volume router-config monitoring-ca-certs monitoring-certs monitoring-client-certs router-entry lua-scripts-config-config default-token-f6d9q]
Warning FailedMount 55s (x11 over 17m) kubelet, 192.168.10.126 (combined from similar events): MountVolume.SetUp failed for volume "pvc-252001ed-fd60-11e8-9779-000c2914ff99" : mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/e2ca91a8-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-252001ed-fd60-11e8-9779-000c2914ff99 --scope -- mount -t glusterfs -o auto_unmount,log-level=ERROR,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pvc-252001ed-fd60-11e8-9779-000c2914ff99/monitoring-prometheus-85546d8575-jr89h-glusterfs.log,backup-volfile-servers=192.168.10.115:192.168.10.116:192.168.10.119 192.168.10.115:vol_f101b55d8b1dc3021ec7689713a74e8c /var/lib/kubelet/pods/e2ca91a8-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-252001ed-fd60-11e8-9779-000c2914ff99
Output: Running scope as unit run-r638272b55bca4869b271e8e4b1ef45cf.scope.
mount: unknown filesystem type 'glusterfs'
the following error information was pulled from the glusterfs log to help diagnose this issue: could not open log file for pod monitoring-prometheus-85546d8575-jr89h
monitoring-prometheus-alertmanager
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal Scheduled 20m default-scheduler Successfully assigned kube-system/monitoring-prometheus-alertmanager-65445b66bd-6bfpn to 192.168.10.126
Warning FailedMount 1m (x9 over 18m) kubelet, 192.168.10.126 (combined from similar events): MountVolume.SetUp failed for volume "pvc-251ed00f-fd60-11e8-9779-000c2914ff99" : mount failed: mount failed: exit status 32
Mounting command: systemd-run
Mounting arguments: --description=Kubernetes transient mount for /var/lib/kubelet/pods/e2cbe5e7-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-251ed00f-fd60-11e8-9779-000c2914ff99 --scope -- mount -t glusterfs -o backup-volfile-servers=192.168.10.115:192.168.10.116:192.168.10.119,auto_unmount,log-level=ERROR,log-file=/var/lib/kubelet/plugins/kubernetes.io/glusterfs/pvc-251ed00f-fd60-11e8-9779-000c2914ff99/monitoring-prometheus-alertmanager-65445b66bd-6bfpn-glusterfs.log 192.168.10.115:vol_7766e36a77cbd2c0afe3bd18626bd2c4 /var/lib/kubelet/pods/e2cbe5e7-fd67-11e8-822b-000c2914ff99/volumes/kubernetes.io~glusterfs/pvc-251ed00f-fd60-11e8-9779-000c2914ff99
Output: Running scope as unit run-r35994e15064e48e2a36f69a88009aa5d.scope.
mount: unknown filesystem type 'glusterfs'
the following error information was pulled from the glusterfs log to help diagnose this issue: could not open log file for pod monitoring-prometheus-alertmanager-65445b66bd-6bfpn
Warning FailedMount 23s (x9 over 18m) kubelet, 192.168.10.126 Unable to mount volumes for pod "monitoring-prometheus-alertmanager-65445b66bd-6bfpn_kube-system(e2cbe5e7-fd67-11e8-822b-000c2914ff99)": timeout expired waiting for volumes to attach or mount for pod "kube-system"/"monitoring-prometheus-alertmanager-65445b66bd-6bfpn". list of unmounted volumes=[storage-volume]. list of unattached volumes=[config-volume storage-volume router-config monitoring-ca-certs monitoring-certs router-entry default-token-f6d9q]
I just resolved this issue after reinstalling ICP (IBM Cloud Private).
While checking a few possible causes, I found that some nodes did not have the GlusterFS client completely installed.
I ran this command to install the GlusterFS client on ALL nodes (using Ubuntu for the OS):
sudo apt-get install glusterfs-client -y
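This matches the "mount: unknown filesystem type 'glusterfs'" lines in the events: the kubelet shells out to mount, which needs the mount.glusterfs helper present on every node. A quick check after installing (the helper's path may vary by distribution):

which mount.glusterfs          # should resolve, e.g. to /sbin/mount.glusterfs
glusterfs --version            # confirms the client package is actually installed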

Kubernetes pod deployment error: FailedSync | Error syncing pod

Env:
VBox on a Windows 10 desktop machine
Two Ubuntu VMs: one VM is the master and the other is a k8s (1.7) worker.
Both nodes show "Ready" when I get nodes. But even when I deploy a very simple nginx pod, I get these error messages from the pod describe output:
"Normal | SandboxChanged | Pod sandbox changed, it will be killed and re-created." and "Warning | FailedSync | Error syncing pod".
If I run the Docker container directly on the worker, however, the container comes up and runs fine. Does anyone have a suggestion for what I could check?
k8s-master#k8smaster-VirtualBox:~$ kubectl get pods
NAME                            READY   STATUS             RESTARTS   AGE
movie-server-1517284798-lbb01   0/1     CrashLoopBackOff   6          16m
k8s-master#k8smaster-VirtualBox:~$ kubectl describe pod movie-server-1517284798-lbb01
--- clip ---
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 3f59947dbd404ecf2f6dd0b65dd9dad8b25bf0c418aceb8cf666ad0761402b53"}
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  BackOff         Back-off restarting failed container
kubelet, master-virtualbox                                 Normal   SandboxChanged  Pod sandbox changed, it will be killed and re-created.
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Pulled          Container image "nancyfeng/movie-server:0.1.0" already present on machine
kubelet, master-virtualbox  spec.containers{movie-server}  Normal   Created         Created container
kubelet, master-virtualbox                                 Warning  FailedSync      Error syncing pod
kubelet, master-virtualbox  spec.containers{movie-server}  Warning  Failed          Error: failed to start container "movie-server": Error response from daemon: {"message":"cannot join network of a non running container: 72ba77b25b6a3969e8921214f0ca73ffaab4c82d8a2852e3d1b1f3ac5dde6ce1"}
--- clip ---
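"cannot join network of a non running container" usually means the pod's pause (sandbox) container died, so the application container has no network namespace left to join. A hedged sketch of what to check on the worker, assuming Docker as the runtime (which fits k8s 1.7):

docker ps -a | grep pause                                     # is the sandbox container itself exiting?
kubectl logs movie-server-1517284798-lbb01 --previous         # application output from the last failed attempt
sudo journalctl -u kubelet --since "10 minutes ago" | tail -n 50   # kubelet/CNI errors around the sandbox restarts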