Why are Velero and Minio backup and restore not working for a PGO cluster? - kubernetes

I have set up Minio and Velero backups for my k8s cluster. Everything works fine: I can take backups and I can see them in Minio. I have a PGO (Postgres Operator) cluster hippo running with a LoadBalancer service. When I restore a backup via Velero, everything seems okay; it creates the namespaces and all the deployments, and the pods are in the Running state.
However, I am not able to connect to my database via pgAdmin. When I delete the pod, it is not recreated; instead it shows an unbound PVC error.
This is the output:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning FailedScheduling 16m default-scheduler 0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
Warning FailedScheduling 16m default-scheduler 0/3 nodes are available: pod has unbound immediate PersistentVolumeClaims. preemption: 0/3 nodes are available: 3 No preemption victims found for incoming pod..
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get PV
error: the server doesn't have a resource type "PV"
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-1ca9e092-4e84-4ca4-88e3-0050890ef101 5Gi RWO Delete Bound postgres-operator/hippo-s3-instance2-4bhf-pgdata openebs-hostpath 16m
pvc-2dd12937-a70e-40b4-b1ad-be1c9f7b39ec 5G RWO Delete Bound default/local-hostpath-pvc openebs-hostpath 6d9h
pvc-30af7f3b-7ce5-4e2a-8c68-5c701881293b 5Gi RWO Delete Bound postgres-operator/hippo-s3-instance2-xvhq-pgdata openebs-hostpath 16m
pvc-531c9ac7-938c-46b1-b4fa-3a7599f40038 5Gi RWO Delete Bound postgres-operator/hippo-instance2-p4ct-pgdata openebs-hostpath 7m32s
pvc-968d9794-e4ba-479c-9138-8fbd85422920 5Gi RWO Delete Bound postgres-operator/hippo-instance2-s6fs-pgdata openebs-hostpath 7m33s
pvc-987c1bd1-bf41-4180-91de-15bb5ead38ad 5Gi RWO Delete Bound postgres-operator/hippo-s3-instance2-c4rt-pgdata openebs-hostpath 16m
pvc-d4629dba-b172-47ea-ab01-12a9039be571 5Gi RWO Delete Bound postgres-operator/hippo-instance2-29gh-pgdata openebs-hostpath 7m32s
pvc-e79d68c3-4e2f-4314-b83f-f96c306a9b38 5Gi RWO Delete Bound postgres-operator/hippo-repo2 openebs-hostpath 7m30s
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get pvc -n postgres-operator
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
hippo-instance2-29gh-pgdata Bound pvc-d4629dba-b172-47ea-ab01-12a9039be571 5Gi RWO openebs-hostpath 7m51s
hippo-instance2-p4ct-pgdata Bound pvc-531c9ac7-938c-46b1-b4fa-3a7599f40038 5Gi RWO openebs-hostpath 7m51s
hippo-instance2-s6fs-pgdata Bound pvc-968d9794-e4ba-479c-9138-8fbd85422920 5Gi RWO openebs-hostpath 7m51s
hippo-repo2 Bound pvc-e79d68c3-4e2f-4314-b83f-f96c306a9b38 5Gi RWO openebs-hostpath 7m51s
hippo-s3-instance2-4bhf-pgdata Bound pvc-1ca9e092-4e84-4ca4-88e3-0050890ef101 5Gi RWO openebs-hostpath 16m
hippo-s3-instance2-c4rt-pgdata Bound pvc-987c1bd1-bf41-4180-91de-15bb5ead38ad 5Gi RWO openebs-hostpath 16m
hippo-s3-instance2-xvhq-pgdata Bound pvc-30af7f3b-7ce5-4e2a-8c68-5c701881293b 5Gi RWO openebs-hostpath 16m
hippo-s3-repo1 Pending pgo 16m
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get pods -n postgres-operator
NAME READY STATUS RESTARTS AGE
hippo-backup-txk9-rrk4m 0/1 Completed 0 7m43s
hippo-instance2-29gh-0 4/4 Running 0 8m5s
hippo-instance2-p4ct-0 4/4 Running 0 8m5s
hippo-instance2-s6fs-0 4/4 Running 0 8m5s
hippo-repo-host-0 2/2 Running 0 8m5s
hippo-s3-instance2-c4rt-0 3/4 Running 0 16m
hippo-s3-repo-host-0 0/2 Pending 0 16m
pgo-7c867985c-kph6l 1/1 Running 0 16m
pgo-upgrade-69b5dfdc45-6qrs8 1/1 Running 0 16m
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl delete pods hippo-s3-repo-host-0 -n postgres-operator
pod "hippo-s3-repo-host-0" deleted
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get pods -n postgres-operator
NAME READY STATUS RESTARTS AGE
hippo-backup-txk9-rrk4m 0/1 Completed 0 7m57s
hippo-instance2-29gh-0 4/4 Running 0 8m19s
hippo-instance2-p4ct-0 4/4 Running 0 8m19s
hippo-instance2-s6fs-0 4/4 Running 0 8m19s
hippo-repo-host-0 2/2 Running 0 8m19s
hippo-s3-instance2-c4rt-0 3/4 Running 0 17m
hippo-s3-repo-host-0 0/2 Pending 0 2s
pgo-7c867985c-kph6l 1/1 Running 0 17m
pgo-upgrade-69b5dfdc45-6qrs8 1/1 Running 0 17m
master#masterk8s-virtual-machine:~/postgres-operator-examples-main$ kubectl get pvc -n postgres-operator
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
hippo-instance2-29gh-pgdata Bound pvc-d4629dba-b172-47ea-ab01-12a9039be571 5Gi RWO openebs-hostpath 8m45s
hippo-instance2-p4ct-pgdata Bound pvc-531c9ac7-938c-46b1-b4fa-3a7599f40038 5Gi RWO openebs-hostpath 8m45s
hippo-instance2-s6fs-pgdata Bound pvc-968d9794-e4ba-479c-9138-8fbd85422920 5Gi RWO openebs-hostpath 8m45s
hippo-repo2 Bound pvc-e79d68c3-4e2f-4314-b83f-f96c306a9b38 5Gi RWO openebs-hostpath 8m45s
hippo-s3-instance2-4bhf-pgdata Bound pvc-1ca9e092-4e84-4ca4-88e3-0050890ef101 5Gi RWO openebs-hostpath 17m
hippo-s3-instance2-c4rt-pgdata Bound pvc-987c1bd1-bf41-4180-91de-15bb5ead38ad 5Gi RWO openebs-hostpath 17m
hippo-s3-instance2-xvhq-pgdata Bound pvc-30af7f3b-7ce5-4e2a-8c68-5c701881293b 5Gi RWO openebs-hostpath 17m
hippo-s3-repo1 Pending pgo 17m
What do I want?
I want Velero to restore the full backup so that I can access my databases just as I could before the restore. It seems like Velero is not able to perform full backups.
Any suggestions will be appreciated.

Velero is a backup and restore solution for Kubernetes clusters and their associated persistent volumes. While Velero does not currently support full backup and restore of databases (refer to these limitations), it does support snapshotting and restoring persistent volumes. This means that, while you may not be able to directly restore a full database, you can restore the persistent volumes associated with the database and then use the appropriate tools to restore the data from them. Additionally, Velero's plugin architecture allows you to extend its capabilities with custom plugins that add backup and restore functionality.
Refer to this DigitalOcean blog by Hanif Jetha and Jamon Camisso for more information on backup and restore.
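For example, here is a rough sketch (not the exact commands from this cluster) of enabling Velero's restic-based file-system backup, which copies the PV data itself to MinIO rather than only the PVC/PV objects; this matters for openebs-hostpath volumes, which have no cloud snapshot API. The bucket name, MinIO URL, and plugin version below are placeholders, and on Velero 1.10+ the flags are --use-node-agent and --default-volumes-to-fs-backup instead:
# install (or re-install) Velero with restic enabled; bucket/URL/version are placeholders
velero install \
  --provider aws \
  --plugins velero/velero-plugin-for-aws:v1.5.0 \
  --bucket velero-backups \
  --secret-file ./credentials-velero \
  --backup-location-config region=minio,s3ForcePathStyle="true",s3Url=http://minio.velero.svc:9000 \
  --use-restic

# back up the namespace including the volume data, not just the PVC/PV objects
velero backup create hippo-full --include-namespaces postgres-operator --default-volumes-to-restic

# restore into the target cluster
velero restore create --from-backup hippo-full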

Your setup is missing the PV or PVC, based on the error you have shared.
Velero can back up PVCs and PVs, generally as snapshots when using the AWS or GCP plugin, and when you restore it creates the PVCs and PVs for you as well.
I have migrated an Elasticsearch database with Velero along with its PVCs and it worked well in my case. However, are you using the same cloud provider or storage class in both clusters? Why is the PVC for hippo-s3-repo1 pending? Did you find the reason for that? (A couple of commands to check this are sketched below.)
Here is my article, although I was using the plugin and a bucket as storage: https://faun.pub/clone-migrate-data-between-kubernetes-clusters-with-velero-e298196ec3d8
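As a first step, the following commands (names taken from your output) usually show why a claim stays Pending and whether the expected StorageClass exists in the target cluster:
kubectl describe pvc hippo-s3-repo1 -n postgres-operator      # the Events section names the missing class or provisioner
kubectl get storageclass                                      # compare with the classes available in the source cluster
kubectl get events -n postgres-operator --sort-by=.lastTimestamp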

Related

Default Grafana K8s app PV issue: FailedBinding persistentvolume-controller no persistent volumes available for this claim and no storage class is set

I am simply trying to deploy this Grafana app as-is, no changes to the YAML have been made: https://grafana.com/docs/grafana/latest/setup-grafana/installation/kubernetes/
The VMs are Ubuntu 20.04 LTS. The Kubernetes cluster is made up of the control-plane/master node & 3x worker nodes:
root#k8s-master:~# kubectl get nodes
NAME STATUS ROLES AGE VERSION
k8s-master Ready control-plane 35d v1.24.2
k8s-worker1 Ready worker 4h24m v1.24.2
k8s-worker2 Ready worker 4h24m v1.24.2
k8s-worker3 Ready worker 4h24m v1.24.2
Other K8s Pods such as NGINX run without issue.
However, the Grafana pod cannot start and is stuck in a Pending state:
root#k8s-master:~# kubectl create -f grafana.yaml
persistentvolumeclaim/grafana-pvc created
deployment.apps/grafana created
service/grafana created
# time passed here...
root#k8s-master:~# kubectl get pods
NAME READY STATUS RESTARTS AGE
grafana-9bd5bbd6b-k7ljz 0/1 Pending 0 3h39m
Troubleshooting this, I found there is an issue with the storage PersistentVolumeClaim (the pvc):
root#k8s-master:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Pending 2m22s
root#k8s-master:~#
root#k8s-master:~# kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: <none>
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: grafana-9bd5bbd6b-k7ljz
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 6s (x11 over 2m30s) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
UPDATE:
I created a StorageClass and set it as default:
root#k8s-master:~# kubectl get sc
NAME PROVISIONER RECLAIMPOLICY VOLUMEBINDINGMODE ALLOWVOLUMEEXPANSION AGE
generic (default) no-provisioner Delete Immediate false 19m
I also created a PersistentVolume:
root#k8s-master:~# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
task-pv-volume 10Gi RWO Retain Released default/task-pv-claim manual 12m
However, now when I try to deploy the Grafana PVC it is still stuck - why?
root#k8s-master:~# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
grafana-pvc Pending generic 4m16s
root#k8s-master:~# kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass: generic
Status: Pending
Volume:
Labels: <none>
Annotations: volume.beta.kubernetes.io/storage-provisioner: no-provisioner
volume.kubernetes.io/storage-provisioner: no-provisioner
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: grafana-9bd5bbd6b-mmqs6
grafana-9bd5bbd6b-pvhtm
grafana-9bd5bbd6b-rtwgj
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ExternalProvisioning 12s (x19 over 4m27s) persistentvolume-controller waiting for a volume to be created, either by external provisioner "no-provisioner" or manually created by system administrator
I tried creating the Grafana configuration file from the documentation and was able to create it successfully: the pod reaches the Running state, and the PVC (PersistentVolumeClaim) shows the StorageClass as standard.
Below is the output of the PVC:
$ kubectl describe pvc grafana-pvc
Name: grafana-pvc
Namespace: default
StorageClass: standard
Status: Bound
Volume: pvc-ee20cc5d-6ca5-4075-b5f3-d1a6323a5241
Labels: <none>
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: pd.csi.storage.gke.io
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 1Gi
Access Modes: RWO
VolumeMode: Filesystem
Used By: grafana-75789d79d4-wbgtv
Events: <none>
In your case, however, the StorageClass field is empty. So try deleting the existing resources and recreating them from the Grafana configuration file. If you still face the same error message, "no persistent volumes available for this claim and no storage class is set", then you will have to create a PV (PersistentVolume).
The error means your PVC hasn't found a matching PV and you also haven't specified a storageClassName. After you create the PersistentVolumeClaim, the Kubernetes control plane looks for a PersistentVolume that satisfies the claim's requirements. If the control plane finds a suitable PersistentVolume with the same StorageClass, it binds the claim to the volume.
To resolve your issue, create a StorageClass with no provisioner, then create a PV (PersistentVolume) that references that storageClassName, and finally create the PVC and the Pod/Deployment, as sketched below.
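For example, a minimal sketch, assuming a manually provisioned hostPath volume is acceptable; the PV name, host path, and size are illustrative. Note that the provisioner of a manual class is kubernetes.io/no-provisioner (you will likely need to delete and recreate your existing generic class, since the provisioner field cannot be changed), and the storageClassName must match across the StorageClass, the PV, and the claim in grafana.yaml:
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: generic
provisioner: kubernetes.io/no-provisioner   # manual provisioning, PVs are created by hand
volumeBindingMode: Immediate
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: grafana-pv                          # illustrative name
spec:
  storageClassName: generic
  capacity:
    storage: 1Gi                            # matches the 1Gi the Grafana PVC requests
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  hostPath:
    path: /mnt/data/grafana                 # illustrative path on the node
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: grafana-pvc                         # this claim already exists in grafana.yaml; shown only to illustrate the matching storageClassName
spec:
  storageClassName: generic
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1Gi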
Refer to stackpost1 and stackpost2 for more information.

Kubernetes OpenSearch Deployment | "no persistent volumes available for this claim and no storage class is set" error

We deployed OpenSearch on a 3-node Kubernetes cluster following the documentation instructions (https://opensearch.org/docs/latest/opensearch/install/helm/). After deployment the pods are in the Pending state, and when checking them we see the following message:
"persistentvolume-controller no persistent volumes available for this claim and no storage class is set"
Can you please advise what could be wrong in our OpenSearch/Kubernetes deployment, or what might be missing from a configuration perspective?
Sharing some info:
Cluster nodes:
[root#I***-M1 ~]# kubectl get nodes
NAME STATUS ROLES AGE VERSION
ir***-m1 Ready control-plane,master 4h34m v1.23.4
ir***-w1 Ready 3h41m v1.23.4
ir***-w2 Ready 3h19m v1.23.4
Pods State:
[root#I****1 ~]# kubectl get pods
NAME READY STATUS RESTARTS AGE
opensearch-cluster-master-0 0/1 Pending 0 80m
opensearch-cluster-master-1 0/1 Pending 0 80m
opensearch-cluster-master-2 0/1 Pending 0 80m
[root#I****M1 ~]# kubectl describe pvc
Name: opensearch-cluster-master-opensearch-cluster-master-0
Namespace: default
StorageClass:
Status: Pending
Volume:
Labels: app.kubernetes.io/instance=my-deployment
app.kubernetes.io/name=opensearch
Annotations: <none>
Finalizers: [kubernetes.io/pvc-protection]
Capacity:
Access Modes:
VolumeMode: Filesystem
Used By: opensearch-cluster-master-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal FailedBinding 2m24s (x18125 over 3d3h) persistentvolume-controller no persistent volumes available for this claim and no storage class is set
.....
[root#IR****M1 ~]# kubectl get pv
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
opensearch-cluster-master-opensearch-cluster-master-0 30Gi RWO Retain Available manual 6h24m
opensearch-cluster-master-opensearch-cluster-master-1 30Gi RWO Retain Available manual 6h22m
opensearch-cluster-master-opensearch-cluster-master-2 30Gi RWO Retain Available manual 6h23m
task-pv-volume 60Gi RWO Retain Available manual 7h48m
[root#I****M1 ~]# kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
opensearch-cluster-master-opensearch-cluster-master-0 Pending 3d3h
opensearch-cluster-master-opensearch-cluster-master-1 Pending 3d3h
opensearch-cluster-master-opensearch-cluster-master-2 Pending 3d3h
...no storage class is set...
Try upgrading your deployment with a storage class, assuming you run on AWS EKS:
helm upgrade my-deployment opensearch/opensearch --set persistence.storageClass=gp2
If you are running on GKE, change gp2 to standard. On AKS, change it to default.
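If you are not sure which storage classes your cluster offers, you can list them and, if needed, mark one as the default so that claims without an explicit class can still bind (gp2 below is just an example name):
kubectl get storageclass
kubectl patch storageclass gp2 -p '{"metadata": {"annotations":{"storageclass.kubernetes.io/is-default-class":"true"}}}'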

EKS - Pod has unbound immediate PersistentVolumeClaims on t2.large instances (t2.large, Bottlerocket OS)

I've looked through several solutions but couldn't find an answer. I'm trying to run a StatefulSet on the cluster, but the pod fails to run because of an unbound claim. I'm running t2.large machines with Bottlerocket host types.
kubectl get events
28m Warning FailedScheduling pod/carabbitmq-0 pod has unbound immediate PersistentVolumeClaims (repeated 3 times)
28m Normal Scheduled pod/carabbitmq-0 Successfully assigned default/carabbitmq-0 to ip-x.compute.internal
28m Normal SuccessfulAttachVolume pod/carabbitmq-0 AttachVolume.Attach succeeded for volume "pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7"
28m Normal Pulled pod/carabbitmq-0 Container image "busybox:1.30.1" already present on machine
kubectl get pv,pvc + describe
NAME STATUS VOLUME CAPACITY ACCESS MODES STORAGECLASS AGE
persistentvolumeclaim/data-carabbitmq-0 Bound pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 30Gi RWO gp2 12m
NAME CAPACITY ACCESS MODES RECLAIM POLICY STATUS CLAIM STORAGECLASS REASON AGE
persistentvolume/pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 30Gi RWO Retain Bound rabbitmq/data-carabbitmq-0 gp2 12m
describe pv:
Name: pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7
Labels: failure-domain.beta.kubernetes.io/region=eu-west-1
failure-domain.beta.kubernetes.io/zone=eu-west-1b
Annotations: kubernetes.io/createdby: aws-ebs-dynamic-provisioner
pv.kubernetes.io/bound-by-controller: yes
pv.kubernetes.io/provisioned-by: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pv-protection]
StorageClass: gp2
Status: Bound
Claim: rabbitmq/data-carabbitmq-0
Reclaim Policy: Retain
Access Modes: RWO
VolumeMode: Filesystem
Capacity: 30Gi
Node Affinity:
Required Terms:
Term 0: failure-domain.beta.kubernetes.io/zone in [eu-west-1b]
failure-domain.beta.kubernetes.io/region in [eu-west-1]
Message:
Source:
Type: AWSElasticBlockStore (a Persistent Disk resource in AWS)
VolumeID: aws://eu-west-1b/vol-xx
FSType: ext4
Partition: 0
ReadOnly: false
Events: <none>
describe pvc:
Name: data-carabbitmq-0
Namespace: rabbitmq
StorageClass: gp2
Status: Bound
Volume: pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7
Labels: app=rabbitmq-ha
release=rabbit-mq
Annotations: pv.kubernetes.io/bind-completed: yes
pv.kubernetes.io/bound-by-controller: yes
volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/aws-ebs
Finalizers: [kubernetes.io/pvc-protection]
Capacity: 30Gi
Access Modes: RWO
VolumeMode: Filesystem
Mounted By: carabbitmq-0
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ProvisioningSucceeded 36m persistentvolume-controller Successfully provisioned volume pvc-f6e8ec20-4bc1-4539-8d11-2dd1b3dbd4d7 using kubernetes.io/aws-ebs
The storage type is gp2.
Name: gp2
IsDefaultClass: Yes
Annotations: storageclass.kubernetes.io/is-default-class=true
Provisioner: kubernetes.io/aws-ebs
Parameters: encrypted=true,type=gp2
AllowVolumeExpansion: <unset>
MountOptions:
debug
ReclaimPolicy: Retain
VolumeBindingMode: Immediate
Events: <none>
I'm not sure what I'm missing; the same configuration used to work until I switched to the "t" type of EC2 instances.
So, it was weird, but I had a readiness probe that was failing its health checks; I thought it was because the volume was not mounted properly.
The health check basically made a request to localhost, which it had issues with (not sure why). Changing it to 127.0.0.1 made the check pass, and then the volume error disappeared.
So if you hit this odd issue (volumes are mounted, but you still get the error), check the pod's probes. A hedged example is sketched below.
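For illustration only, a hypothetical probe of the kind described; the tool, port, and endpoint are placeholders, and the only point being made is the switch from localhost to 127.0.0.1:
readinessProbe:
  exec:
    command:
      - sh
      - -c
      # endpoint and port are placeholders; 127.0.0.1 instead of localhost is the actual fix
      - curl -sf http://127.0.0.1:15672/api/healthchecks/node
  initialDelaySeconds: 20
  periodSeconds: 10
  timeoutSeconds: 5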

PVC status is stuck on pending and PV status is available

I was trying to increase the PVC size from 10G to 20G; since we are running on 1.9.3, resizing it online is not available. So I deleted the PVC and recreated it with the new value of 20G as storage.
pvc-b196868cd-bc75-12e8-ad32-075738325c 100Gi RWO Retain Released myapp/myapp-backup-pv-claim persistent 4m
After I deleted it, the PV status turned to "Released", and when I tried to recreate the PVC it got created but with the status "Lost":
myapp-myapp-backup-pv-claim Lost pvc-03b34iknca1-6fr3-19ea-af3b-0073yh2u97f 0 ntfts19-k8s-0101 13m
We are using vSphere volumes. I tried the solution of kubectl patch pv pv-for-rabbitmq -p '{"spec":{"claimRef": null}}'; this helped bring the PV back to the "Available" status, but now the PVC is stuck in the "Pending" state:
pvc-b196868cd-bc75-12e8-ad32-075738325c 100Gi RWO Retain Available myapp/myapp-backup-pv-claim persistent 2m
myapp-myapp-backup-pv-claim Pending pvc-03b34iknca1-6fr3-19ea-af3b-0073yh2u97f 0 ntfts19-k8s-0101 28m
PVC Describe:
Name: myapp-myapp-backup-pv-claim
Namespace: myapp
StorageClass: ntfts19-k8s-0101
Status: Pending
Volume: pvc-03b34iknca1-6fr3-19ea-af3b-0073yh2u97f
Labels: app=my-app
Annotations: <none>
Finalizers: []
Capacity: 0
Access Modes:
Events: <none>
PV Describe:
Name: pvc-b196868cd-bc75-12e8-ad32-075738325c
Labels: <none>
Annotations: <none>
StorageClass: persistent
Status: Available
Claim: myapp/myapp-backup-pv-claim
Reclaim Policy: Retain
Access Modes: RWO
Capacity: 100Gi
Message:
Source:
Type: vSphereVolume (a Persistent Disk resource in vSphere)
VolumePath: StoragePolicyName: %v
FSType: [dsNTFTS19_0101] kubevols/kubernetes-dynamic-pvc-b196868cd-bc75-12e8-ad32-075738325c.vmdk
%!(EXTRA string=ext4, string=)
Events: <none>
The problem was the missing annotation: since this is vSphere storage, the annotation volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume is mandatory.
The storage class of the PV and the PVC must also be the same. The control plane can only bind a PVC to a PV if it can find a PV with the same storage class.
Your PVC has storageClass ntfts19-k8s-0101 while your PV has storageClass persistent, so the control plane couldn't find a matching PV for the claim.
Delete and recreate the PVC so that it matches the storage class of the PV, as sketched below.
Please refer to the official documentation.
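For illustration, a rough sketch of the recreated claim, using the names and values from the output above; the storageClassName must match the PV's class (persistent), and the PV must already be back in the Available state, as it is after the claimRef patch shown in the question:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: myapp-backup-pv-claim
  namespace: myapp
  annotations:
    volume.beta.kubernetes.io/storage-provisioner: kubernetes.io/vsphere-volume
spec:
  accessModes:
    - ReadWriteOnce
  storageClassName: persistent   # must match the PV's storage class
  resources:
    requests:
      storage: 20Gi              # the size the claim was meant to grow to; must not exceed the PV's 100Gi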

Rook: Timeout expired waiting for volumes to attach/mount for pod

k8s version: v1.9
env: VirtualBox
os: CoreOS
It is a 1-node Kubernetes cluster.
I followed the steps below:
Followed https://rook.io/docs/rook/v0.5/k8s-pre-reqs.html and updated the kubelet with:
Environment="RKT_OPTS=--volume modprobe,kind=host,source=/usr/sbin/modprobe \
--mount volume=modprobe,target=/usr/sbin/modprobe \
--volume lib-modules,kind=host,source=/lib/modules \
--mount volume=lib-modules,target=/lib/modules \
--uuid-file-save=/var/run/kubelet-pod.uuid"
Installed ceph utility
rbd -v
ceph version 10.2.2 (45107e21c568dd033c2f0a3107dec8f0b0e58374)
All Rook pods are working, but the MySQL pod fails with the error 'timeout expired waiting for volumes to attach/mount for pod'.
➜ kubectl get pod -n rook-system
NAME READY STATUS RESTARTS AGE
rook-agent-rqw6j 1/1 Running 0 21m
rook-operator-5457d48c94-bhh2z 1/1 Running 0 22m
➜ kubectl get pod -n rook
NAME READY STATUS RESTARTS AGE
rook-api-848df956bf-fhmg2 1/1 Running 0 20m
rook-ceph-mgr0-cfccfd6b8-8brxz 1/1 Running 0 20m
rook-ceph-mon0-xdd77 1/1 Running 0 21m
rook-ceph-mon1-gntgh 1/1 Running 0 20m
rook-ceph-mon2-srmg8 1/1 Running 0 20m
rook-ceph-osd-84wmn 1/1 Running 0 20m
➜ kubectl get pv
NAME CAPACITY ACCESSMODES RECLAIMPOLICY STATUS CLAIM STORAGECLASS REASON AGE
pvc-6a4c5c2a-127d-11e8-a846-080027b424ef 20Gi RWO Delete Bound default/mysql-pv-claim rook-block 15m
➜ kubectl get pvc
NAME STATUS VOLUME CAPACITY ACCESSMODES STORAGECLASS AGE
mysql-pv-claim Bound pvc-6a4c5c2a-127d-11e8-a846-080027b424ef 20Gi RWO rook-block 15m
kubectl get pods
NAME READY STATUS RESTARTS AGE
wordpress-mysql-557ffc4f69-8zxsq 0/1 ContainerCreating 0 16m
Error when I describe pod : FailedMount Unable to mount volumes for pod "wordpress-mysql-557ffc4f69-8zxsq_default(6a932df1-127d-11e8-a846-080027b424ef)": timeout expired waiting for volumes to attach/mount for pod "default"/"wordpress-mysql-557ffc4f69-8zxsq". list of unattached/unmounted volumes=[mysql-persistent-storage]
Also added the following option to rook-operator.yaml
- name: FLEXVOLUME_DIR_PATH
value: "/var/lib/kubelet/volumeplugins"
Could you please help with this? Please let me know if you need further details. I checked similar issues, but their solutions are not working for me.
Are you using CephFS or RBD volumes as the backend for Ceph? Here are some things to check (a few diagnostic commands are sketched after this list):
Please confirm that your pod can communicate with the Ceph cluster; this looks like an issue with communication to the Ceph volumes you're trying to use.
Check that your Ceph volume plugins are set up correctly.
What's the state of kubectl get pv?
Take a look at your persistent volumes and claims.
You could also try the Rook.io tool; it has good integration with Ceph object storage.
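A few diagnostic commands for the checks above, using the pod names from the question's output (adjust them to your own cluster):
kubectl describe pod wordpress-mysql-557ffc4f69-8zxsq          # shows the full FailedMount events
kubectl -n rook-system logs rook-agent-rqw6j                   # the rook-agent performs the flexvolume attach/mount
kubectl -n rook-system logs rook-operator-5457d48c94-bhh2z
kubectl get pv,pvc                                             # confirm mysql-pv-claim is still Bound
ls /var/lib/kubelet/volumeplugins                              # FLEXVOLUME_DIR_PATH from rook-operator.yaml must match the kubelet's --volume-plugin-dir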