Adding a volume to a Kubernetes StatefulSet using kubectl patch

Problem summary:
I am following the Kubernetes guide to set up a sample Cassandra cluster. The cluster is up and running, and I would like to add a second volume to each node in order to enable Cassandra backups that would be stored on a separate volume.
My attempted solution:
I tried editing my cassandra-statefulset.yaml file by adding new volumeMounts and volumeClaimTemplates entries and reapplying it, but got the following error message:
$ kubectl apply -f cassandra-statefulset.yaml
storageclass.storage.k8s.io/fast unchanged
The StatefulSet "cassandra" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy' are forbidden
I then tried to enable rolling updates and patch my configuration following the documentation here:
https://kubernetes.io/docs/tasks/run-application/update-api-object-kubectl-patch/
$ kubectl patch statefulset cassandra -p '{"spec":{"updateStrategy":{"type":"RollingUpdate"}}}'
statefulset.apps/cassandra patched (no change)
My cassandra-backup-patch.yaml:
spec:
  template:
    spec:
      containers:
        volumeMounts:
        - name: cassandra-backup
          mountPath: /cassandra_backup
  volumeClaimTemplates:
  - metadata:
      name: cassandra-backup
    spec:
      accessModes: [ "ReadWriteOnce" ]
      storageClassName: fast
      resources:
        requests:
          storage: 1Gi
However this resulted in the following error:
$ kubectl patch statefulset cassandra --patch "$(cat cassandra-backup-patch.yaml)"
The request is invalid: patch: Invalid value: "map[spec:map[template:map[spec:map[containers:map[volumeMounts:[map[mountPath:/cassandra_backup name:cassandra-backup]]]]] volumeClaimTemplates:[map[metadata:map[name:cassandra-backup] spec:map[accessModes:[ReadWriteOnce] resources:map[requests:map[storage:1Gi]] storageClassName:fast]]]]]": cannot restore slice from map
Could anyone please point me to the correct way of adding an additional volume for each node or explain why the patch does not work? This is my first time using Kubernetes so my approach may be completely wrong. Any comment or help is very welcome, thanks in advance.

The answer is in your first log:
The StatefulSet "cassandra" is invalid: spec: Forbidden: updates to statefulset spec for fields other than 'replicas', 'template', and 'updateStrategy'
Apart from replicas, template, and updateStrategy, the StatefulSet spec is immutable after creation. You will likely need to delete and recreate the StatefulSet to add a new volumeClaimTemplate.
edit:
It can many times be useful to leave your pods running even when you delete the statefulset. To accomplish this use the --cascade=false flag on the delete operation.
kubectl delete statefulset <name> --cascade=false
Then your workload will stay running while you recreate your StatefulSet with the updated volumeClaimTemplates.
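For example, a minimal sketch of the whole flow, assuming the StatefulSet is named cassandra and the updated manifest is cassandra-statefulset.yaml (on newer kubectl versions --cascade=orphan replaces --cascade=false):
# delete only the StatefulSet object; its pods and existing PVCs are left in place
kubectl delete statefulset cassandra --cascade=orphan
# recreate it from the manifest that now contains the extra volumeClaimTemplate
kubectl apply -f cassandra-statefulset.yaml
# if the pods are not rolled automatically, force a restart so each one picks up the new backup volume
kubectl rollout restart statefulset cassandra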

As mentioned by switchboard.op, deleting is the answer.
but
Watch out for deleting these objects:
PersistentVolumeClaim (kubectl get pvc)
PersistentVolume (kubectl get pv)
If, for example, you run helm uninstall instead of kubectl delete statefulset/<item>, those objects will be deleted as well. Unless something else still references the volumes, or you have backups of the previously applied YAMLs containing their IDs (i.e. the manifests rendered by the orchestrator, not just the Helm templates), you might have a painful day ahead of you.
PVCs and PVs hold the IDs and other reference properties of the underlying, usually vendor-specific, volumes, e.g. the object or file storage implementation used in the background as a volume in a Pod or other resources.
Deleting or otherwise modifying a StatefulSet doesn't affect mounting of the correct resource, as long as you preserve the PVC name within the spec.
If in doubt, always copy the whole volume locally before doing anything destructive to PVCs and PVs you may still need, or before running commands whose underlying behavior you don't know, e.g.:
kubectl cp <some-namespace>/<some-pod>:/var/lib/something /tmp/backup-something
and then just load it back by reversing the arguments.
Also, for Helm usage: delete the StatefulSet, then issue a helm upgrade and it will recreate the missing StatefulSet without touching PVCs and PVs.
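A rough sketch of that Helm flow (the release and chart names here are placeholders):
# remove only the StatefulSet object, keeping pods, PVCs and PVs
kubectl delete statefulset <name> --cascade=orphan
# re-render the chart; Helm recreates the missing StatefulSet with the new spec
helm upgrade <release-name> <chart>
# sanity check that the claims and volumes are still Bound
kubectl get pvc,pv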

Related

Change Kubernetes Persistent Volume's Storage Class and keep the data

I have Elasticsearch data pods that are currently running on AKS and are connected to Persistent Volumes using a Premium SSD Managed Disk Storage Class, and I want to downgrade them to Standard SSD Managed Disk without losing the data on the currently used Persistent Volumes.
I've created a new Storage Class defined with Standard SSD Managed Disk, but if I create a new PV from it, it obviously doesn't keep the old data and I need to copy it somehow, so I was wondering what the best practice is for switching a PV's Storage Class.
Unfortunately, once a PVC is created and a PV is provisioned for it, the only thing you can change without creating a new one is the volume's size.
The only straightforward way I could think of, without leveraging CSI snapshots/clones (which you might not have access to, depending on how you created your PVCs/PVs AFAIK), would be to create a new PVC and mount both volumes on a Deployment whose Pod has root access and the rsync command.
Running rsync -a /old/volume/mount/path /new/volume/mount/path on such a Pod should get you what you want.
However, make sure you do so BEFORE deleting PVCs or any other resource using your PVs. By default, most of the default storage classes create volumes with reclaim policies that delete the PV as soon as all resources using it are gone, so there is a small risk of data loss.
It is not possible in Kubernetes to change the storage class of a PVC, for the reason described above. Unfortunately, you have to create a new PVC with the desired storage class.
One way to go about this is to create a snapshot and then a new volume from that snapshot: https://kubernetes.io/docs/concepts/storage/volume-snapshots/
Another is to leverage CSI cloning: https://kubernetes.io/docs/concepts/storage/volume-pvc-datasource/
Both effectively facilitate the creation of a fresh PV with a copy of the existing data.
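For illustration, a minimal sketch of CSI cloning via a PVC dataSource; the PVC names and the standard-ssd storage class are made up, and whether a clone can land in a different storage class depends on your CSI driver:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: elasticsearch-data-clone
  namespace: my-namespace
spec:
  storageClassName: standard-ssd       # the new, cheaper storage class
  dataSource:
    kind: PersistentVolumeClaim        # clone an existing PVC in the same namespace
    name: elasticsearch-data-old
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi                    # must be at least the size of the source PVC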
Extending LeoD's answer, here is the complete step-by-step guide of what I eventually did.
1. Scale the Stateful Set to 0.
2. Find the old PVs' Disk resources in Azure:
   - kubectl describe pv to check which PVs are associated with the PVCs I want to replace.
   - Follow the resource ID specified in the DiskURI field to get to the Disks in Azure.
3. Create an Azure Snapshot resource from each PVC's Disk.
4. Create a new Disk from each Snapshot, with at least the same allocated GiB as the current PVC. The SKU still doesn't matter at this point, but I preferred using the new SKU I want to implement. (A CLI sketch for steps 1-4 follows after this list.)
5. Create a K8S Persistent Volume from each new Disk:
apiVersion: v1
kind: PersistentVolume
metadata:
  name: pv-from-azure-disk-0
  namespace: my-namespace
  annotations:
    pv.kubernetes.io/provisioned-by: disk.csi.azure.com
spec:
  capacity:
    storage: 1064
  accessModes:
    - ReadWriteOnce
  persistentVolumeReclaimPolicy: Retain
  storageClassName: managed
  claimRef:
    name: pvc-from-azure-0
    namespace: my-namespace
  csi:
    driver: disk.csi.azure.com
    volumeHandle: /subscriptions/<Your subscription ID>/resourcegroups/<Name of RG where the new Disk is stored at>/providers/microsoft.compute/disks/new-disk-from-pvc-0
    fsType: ext4
Make sure to edit the following fields:
metadata.namespace: To the namespace where you have your old resources.
spec.capacity.storage: To the allocated GiB you chose for the new Disk.
spec.storageClassName: Name of Storage Class that suits the SKU you chose for the Disk.
spec.claimRef.name: Here we specify a name for a PVC we haven't created yet that will be used later, just make sure the name you choose here is the name you're going to use when creating the PVC in the next step.
spec.claimRef.namespace: To the namespace where you have your old resources.
csi.volumeHandle: Insert Resource ID of your new Disk.
6. Create a K8S Persistent Volume Claim from each new Persistent Volume you've just created:
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-from-azure-0
  namespace: my-namespace
spec:
  storageClassName: managed
  volumeName: pv-from-azure-disk-0
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 1064
Make sure to edit the following fields:
metadata.name: Use the exact same name you used in the PV's spec.claimRef.name field.
metadata.namespace: To the namespace where you have your old resources.
spec.storageClassName: To the name of Storage Class that you specified for the PV in the spec.storageClassName field.
spec.volumeName: To the name of the PV you've created in the former step.
spec.resources.requests.storage: To the allocated GiB you specified in the PV's spec.capacity.storage field.
7. Perform steps 2-6 for each existing PVC you want to replace; if your Stateful Set has 3 Pods, for example, you should do this 3 times, once for each Pod's PVC.
8. Create an Ubuntu Pod for each new PVC and mount the PVC into the Pod (if you have 3 PVCs, for example, you should have 3 Ubuntu Pods, one for each):
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-0
  namespace: my-namespace
spec:
  containers:
  - name: ubuntu-0
    image: ubuntu
    command:
    - "sleep"
    - "604800"
    volumeMounts:
    - mountPath: /mnt/snapshot-data-0
      name: snapshot-data-0
  volumes:
  - name: snapshot-data-0
    persistentVolumeClaim:
      claimName: pvc-from-azure-0
Make sure to edit the following fields:
metadata.namespace: To the namespace where you have your old resources.
spec.volumes.persistentVolumeClaim.claimName: To the name of the new PVC you've created.
9. Exec into each Ubuntu Pod and validate that all your data is there.
10. If all the data is there, you can delete the Ubuntu Pods.
11. Now comes the crucial part: you need to delete the old original PVCs and the Stateful Set. Before you do, make sure you have the original YAML file of the Stateful Set (or whatever else you used to create it) and that you know how to recreate it; only then continue and delete the old original PVCs and the Stateful Set.
12. After the Stateful Set and old PVCs have been deleted, replace the value of storageClassName inside the volumeClaimTemplates block of your Stateful Set YAML or configuration with the new Storage Class you want to use:
...
volumeClaimTemplates:
- spec:
    accessModes: [ "ReadWriteOnce" ]
    storageClassName: "managed" # <--- Edit this field
    resources:
      requests:
        storage: "1064"
      limits:
        storage: "1064"
...
13. Recreate the Stateful Set with the new configuration. This recreates the Stateful Set and creates new, empty PVCs that use the new SKU specified by the Storage Class.
14. After the PVCs have been created, scale the new Stateful Set to 0.
15. Recreate the Ubuntu Pod, but this time mount both the PVC from the Azure Disk you created in the first steps and the new, empty PVC of the new Stateful Set:
apiVersion: v1
kind: Pod
metadata:
  name: ubuntu-0
  namespace: my-namespace
spec:
  containers:
  - name: ubuntu-0
    image: ubuntu
    command:
    - "sleep"
    - "604800"
    volumeMounts:
    - mountPath: /mnt/snapshot-data-0
      name: snapshot-data-0
    - mountPath: /mnt/new-data-0
      name: new-data-0
  volumes:
  - name: snapshot-data-0
    persistentVolumeClaim:
      claimName: pvc-from-azure-0
  - name: new-data-0
    persistentVolumeClaim:
      claimName: my-stateful-set-data-0
Make sure to edit the following fields:
metadata.namespace: To the namespace where you have your resources.
spec.volumes.persistentVolumeClaim: There are two fields of this, one for snapshot-data-0, and one for new-data-0. As their names suggest, the snapshot-data-0 should have the old data from the Azure Disk we created in the first steps, and the new-data-0 should be empty since it should be the new PVC that was created from the recreation of the Stateful Set.
So, the value for spec.volumes.persistentVolumeClaim of snapshot-data-0 should be the name of the new PVC you've created from the Azure Disk. And the value for spec.volumes.persistentVolumeClaim of new-data-0 should be the name of the new PVC that was created following the recreation of the Stateful Set.
16. Exec into the Ubuntu Pod and run the following commands:
apt update && apt install rsync -y
nohup rsync -avz /mnt/snapshot-data-0/ /mnt/new-data-0 --log-file=$HOME/.rsyncd.log &
The first command installs rsync; the second copies all the old data into the new, empty PVC and writes a log of the run to $HOME/.rsyncd.log (/root/.rsyncd.log inside the Ubuntu container) so you can check progress. Note that depending on the amount of data you have stored, this command can take some time; for me it took about 1 hour to complete.
17. Perform steps 15-16 for each backed-up/new PVC pair.
18. Once it finishes for all PVC pairs, delete the Ubuntu Pods.
19. Scale the new Stateful Set back up.
20. Profit! You now have the same Stateful Set with the same data, but with a different Storage Class.
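As referenced in steps 1-4 above, a hedged sketch of the scaling and Azure CLI commands; the resource group, Stateful Set, snapshot and disk names are placeholders:
# step 1: stop the pods so the disks are no longer being written to
kubectl scale statefulset my-stateful-set --replicas=0
# step 3: snapshot the existing Disk (use the resource ID from the PV's DiskURI)
az snapshot create --resource-group my-rg --name data-0-snapshot --source "<disk resource ID>"
# step 4: create a new Disk from the snapshot, already on the target SKU
az disk create --resource-group my-rg --name new-disk-from-pvc-0 --source data-0-snapshot --sku StandardSSD_LRS --size-gb 1064
# step 19: once the data has been rsynced, scale back up
kubectl scale statefulset my-stateful-set --replicas=3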

Moving Pod to another node automatically

Is it possible for a pod/deployment/statefulset to be moved to another node or be recreated on another node automatically if the first node fails? The pod in question is set to 1 replica. So is it possible to configure some sort of failover for Kubernetes pods? I've tried out pod affinity settings, but nothing is moved automatically; it has been around 10 minutes.
The YAML for the said pod is below:
kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: ceph-rbd-sc-pvc
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 2Gi
  storageClassName: ceph-rbd-sc
---
apiVersion: v1
kind: Pod
metadata:
  name: ceph-rbd-pod-pvc-sc
  labels:
    app: ceph-rbd-pod-pvc-sc
spec:
  containers:
  - name: ceph-rbd-pod-pvc-sc
    image: busybox
    command: ["sleep", "infinity"]
    volumeMounts:
    - mountPath: /mnt/ceph_rbd
      name: volume
  nodeSelector:
    etiket: worker
  volumes:
  - name: volume
    persistentVolumeClaim:
      claimName: ceph-rbd-sc-pvc
  affinity:
    podAntiAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
      - labelSelector:
          matchLabels:
            name: ceph-rbd-pod-pvc-sc
        topologyKey: "kubernetes.io/hostname"
Edit:
I managed to get it to work. But now I have another problem: the newly created pod on the other node is stuck in "ContainerCreating" and the old pod is stuck in "Terminating". I also get a Multi-Attach error for the volume stating that the PV is still in use by the old pod. The situation is the same for any deployment/statefulset with a PV attached, and the problem is resolved only when the failed node comes back online. Is there a solution for this?
The answer from coderanger remains valid regarding Pods. Answering your last edit:
Your issue is with CSI.
When your Pod uses a PersistentVolume whose accessModes is RWO, and the Node hosting your Pod becomes unreachable, prompting the Kubernetes scheduler to terminate the current Pod and create a new one on another Node, your PersistentVolume cannot be attached to the new Node.
The reason for this is that CSI introduced some kind of "lease", marking a volume as bound.
With previous CSI spec & implementations, this lock was not visible, in terms of Kubernetes API. If your ceph-csi deployment is recent enough, you should find a corresponding "VolumeAttachment" object that could be deleted, to fix your issue:
# kubectl get volumeattachments -n ci
NAME                                                                    ATTACHER           PV                                         NODE                ATTACHED   AGE
csi-194d3cfefe24d5f22616fabd3d2fb2ce5f79b16bdca75088476c2902e7751794    rbd.csi.ceph.com   pvc-902c3925-11e2-4f7f-aac0-59b1edc5acf4   melpomene.xxx.com   true       14d
csi-24847171efa99218448afac58918b6e0bb7b111d4d4497166ff2c4e37f18f047    rbd.csi.ceph.com   pvc-b37722f7-0176-412f-b6dc-08900e4b210d   clio.xxx.com        true       90d
....
kubectl delete -n ci volumeattachment csi-xxxyyyzzz
Those VolumeAttachments are created by your CSI provisioner before the device mapper attaches a volume.
They are deleted only once the corresponding PV has been released from a given Node, according to its device mapper - which needs to be running, with kubelet up and the Node marked Ready according to the API. Until then, other Nodes can't map it. There is no timeout: should a Node become unreachable due to network issues or an abrupt shutdown/force-off/reset, its RWO PVs are stuck.
See: https://github.com/ceph/ceph-csi/issues/740
One workaround for this would be not to use CSI and rather stick with legacy StorageClasses, in your case installing rbd on your nodes.
Though last I checked (k8s 1.19.x) I couldn't manage to get that working, and I can't recall what was wrong... CSI tends to be "the way" to do it nowadays, despite, sadly, not being suitable for production use, unless you are running in an IaaS with auto-scale groups deleting Nodes from the Kubernetes API (eventually evicting the corresponding VolumeAttachments), or using some kind of MachineHealthChecks like OpenShift 4 implements.
A bare Pod is a single immutable object. It doesn't have any of these nice things. Related: never ever use bare Pods for anything. If you try this with a Deployment you should see it spawn a new one to get back to the requested number of replicas. If the new Pod is Unschedulable you should see events emitted explaining why. For example if only node 1 matches the nodeSelector you specified, or if another Pod is already running on the other node which triggers the anti-affinity.
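To illustrate coderanger's point, a minimal sketch of the same container wrapped in a Deployment (reusing the volume and node selector from the question); the controller then recreates the Pod on another matching node if the current one fails, though the RWO/VolumeAttachment caveat above still applies:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: ceph-rbd-deploy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: ceph-rbd-pod-pvc-sc
  template:
    metadata:
      labels:
        app: ceph-rbd-pod-pvc-sc
    spec:
      nodeSelector:
        etiket: worker
      containers:
      - name: ceph-rbd-pod-pvc-sc
        image: busybox
        command: ["sleep", "infinity"]
        volumeMounts:
        - mountPath: /mnt/ceph_rbd
          name: volume
      volumes:
      - name: volume
        persistentVolumeClaim:
          claimName: ceph-rbd-sc-pvc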

Cannot update Kubernetes pod from yaml generated from kubectl get pod pod_name -o yaml

I have a pod in my Kubernetes cluster which needed an update to add a securityContext. So I generated a YAML file using:
kubectl get pod pod_name -o yaml > mypod.yaml
After updating the required securityContext and executing the command:
kubectl apply -f mypod.yaml
no changes are observed in the pod.
Whereas a freshly created YAML file works perfectly fine.
New YAML file:
apiVersion: v1
kind: Pod
metadata:
  name: mypod
  namespace: default
spec:
  securityContext:
    runAsUser: 1010
  containers:
  - command:
    - sleep
    - "4800"
    image: ubuntu
    name: myubuntuimage
Immutable fields
In Kubernetes you can find information about Immutable fields.
A lot of fields in APIs tend to be immutable, they can't be changed after creation. This is true for example for many of the fields in pods. There is currently no way to declaratively specify that fields are immutable, and one has to rely on either built-in validation for core types, or have to build a validating webhooks for CRDs.
Why ?
There are resources in Kubernetes which have immutable fields by design, i.e. after creation of an object, those fields cannot be mutated anymore. E.g. a pod's specification is mostly unchangeable once it is created. To change the pod, it must be deleted, recreated and rescheduled.
Editing existing pod configuration
If you try to apply the new config with the security context using kubectl apply, you will get an error like the one below:
The Pod "mypod" is invalid: spec: Forbidden: pod updates may not change fields other than `spec.containers[*].image`, `spec.initContainers[*].image`, `spec.activeDeadlineSeconds` or `spec.tolerations` (only additions to existing tolerations)
You will get the same output if you use kubectl patch:
kubectl patch pod mypod -p '{"spec":{"securityContext":{"runAsUser":1010}}}'
kubectl edit will not change this specific configuration either:
$ kubectl edit pod
Edit cancelled, no changes made.
Solution
If you need only one pod, you must delete it and create a new one with the requested configuration.
A better solution is to use a resource that reconciles towards a desired state for you, like a Deployment. After you change the current configuration, the Deployment creates a new ReplicaSet, which creates new pods with the new configuration:
by updating the PodTemplateSpec of the Deployment. A new ReplicaSet is created and the Deployment manages moving the Pods from the old ReplicaSet to the new one at a controlled rate. Each new ReplicaSet updates the revision of the Deployment.
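For example, a minimal sketch of the same container as a Deployment with the desired securityContext (the Deployment name and label are made up):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: myubuntu
  namespace: default
spec:
  replicas: 1
  selector:
    matchLabels:
      app: myubuntu
  template:
    metadata:
      labels:
        app: myubuntu
    spec:
      securityContext:
        runAsUser: 1010     # edit this and re-apply; a new ReplicaSet rolls the pod out
      containers:
      - name: myubuntuimage
        image: ubuntu
        command:
        - sleep
        - "4800"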

Kubernetes multiple pvc with statefulset for each pod vs single pvc for all pods?

I have deployed a Kubernetes cluster with stateful pods for MySQL. Each pod has a different PVC.
For example: 3 pods means 3 x 5GB EBS PVCs.
So which way is better: using one PVC for all pods, or a different PVC for each pod?
A StatefulSet must use volumeClaimTemplates if you want to have dedicated storage for each pod of a set. Based on that template, a PersistentVolumeClaim is created for each pod and the volume is configured to be bound to that claim. The generated PersistentVolumeClaim names consist of the volumeClaimTemplate name + pod name + ordinal number.
So if you add a volumeClaimTemplates part to your StatefulSet YAML (and delete the specific persistentVolumeClaim references), something like this:
volumeClaimTemplates:
- metadata:
    name: mysql-data
  spec:
    resources:
      requests:
        storage: 10Gi
    accessModes:
      - ReadWriteOnce
Then create your StatefulSet, and if you examine one of its pods afterwards (kubectl get pod pod-name-0 -o yaml) you'll see something like this (the volumes part of the output):
volumes:
- name: mysql-data
  persistentVolumeClaim:
    claimName: mysql-data-pod-name-0   # dynamically created claim based on the template
So by using volumeClaimTemplates you don't need to create separate PVCs yourself and then reference each of them from the StatefulSet to be mounted in your container at a specific mountPath (remember that each pod of a set must reference a different PVC, 1 PVC - 1 PV):
Part of the "containers" definition of your StatefulSet YAML:
volumeMounts:
- name: mysql-data   # references the volumeClaimTemplate by name (not the generated PVC name)
  mountPath: /var/lib/mysql
So aiming for each pod of the set to have dedicated storage without using volumeClaimTemplates leads to a lot of problems and overcomplication in managing and scaling it.
A PVC gets bound to a specific PV. For a StatefulSet, in most imaginable cases, you want a PV that can be accessed only by a certain pod, so that data is not corrupted by a write attempt from a parallel process/pod (RWO rather than RWX mode).
With that in mind, you need a PVC per replica in the StatefulSet. Creating PVCs for replicas would get problematic very quickly if done manually; this is why the right way to do it is to use volumeClaimTemplates, which dynamically create PVCs for you as you scale your set.
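For context, a minimal sketch of how those two snippets fit together in a StatefulSet; the names, image and sizes here are illustrative only:
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: mysql
spec:
  serviceName: mysql
  replicas: 3
  selector:
    matchLabels:
      app: mysql
  template:
    metadata:
      labels:
        app: mysql
    spec:
      containers:
      - name: mysql
        image: mysql:8.0
        env:
        - name: MYSQL_ROOT_PASSWORD
          value: "changeme"            # illustrative only
        volumeMounts:
        - name: mysql-data             # matches the volumeClaimTemplate name below
          mountPath: /var/lib/mysql
  volumeClaimTemplates:                # yields one PVC per pod: mysql-data-mysql-0, -1, -2
  - metadata:
      name: mysql-data
    spec:
      accessModes:
        - ReadWriteOnce
      resources:
        requests:
          storage: 10Gi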

Cancel or undo deletion of Persistent Volumes in kubernetes cluster

Accidentally tried to delete all PVs in the cluster, but thankfully they still have PVCs bound to them, so all PVs are stuck in Status: Terminating.
How can I get the PV's out of the "terminating" status and back to a healthy state where it is "bound" to the pvc and is fully working?
The key here is that I don't want to lose any data and I want to make sure the volumes are functional and not at risk of being terminated if claim goes away.
Here are some details from a kubectl describe on the PV.
$ kubectl describe pv persistent-vol-1
Finalizers: [kubernetes.io/pv-protection foregroundDeletion]
Status: Terminating (lasts 1h)
Claim: ns/application
Reclaim Policy: Delete
Here is the describe on the claim.
$ kubectl describe pvc application
Name: application
Namespace: ns
StorageClass: standard
Status: Bound
Volume: persistent-vol-1
It is, in fact, possible to save data from a PersistentVolume with Status: Terminating and the reclaim policy set to the default (Delete). We have done so on GKE; I'm not sure about AWS or Azure, but I guess they are similar.
We had the same problem, and I will post our solution here in case somebody else has an issue like this.
Your PersistentVolumes will not be terminated while there is a pod, deployment or, to be more specific, a PersistentVolumeClaim using them.
The steps we took to remedy our broken state:
Once you are in the situation like the OP, the first thing you want to do is create a snapshot of your PersistentVolumes.
In the GKE console, go to Compute Engine -> Disks and find your volume there (use kubectl get pv | grep pvc-name), then create a snapshot of it.
Use the snapshot to create a disk: gcloud compute disks create name-of-disk --size=10 --source-snapshot=name-of-snapshot --type=pd-standard --zone=your-zone
At this point, stop the services using the volume and delete the volume and volume claim.
Recreate the volume manually with the data from the disk:
---
apiVersion: v1
kind: PersistentVolume
metadata:
  name: name-of-pv
spec:
  accessModes:
    - ReadWriteOnce
  capacity:
    storage: 10Gi
  gcePersistentDisk:
    fsType: ext4
    pdName: name-of-disk
  persistentVolumeReclaimPolicy: Retain
Now just update your volume claim to target a specific volume, the last line of the yaml file:
---
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: my-pvc
  namespace: my-namespace
  labels:
    app: my-app
spec:
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi
  volumeName: name-of-pv
Edit: This only applies if you deleted the PVC and not the PV. Do not follow these instructions if you deleted the PV itself or the disk may be deleted!
I found myself in this same situation due to a careless mistake. It was with a statefulset on Google Cloud/GKE. My PVC said Terminating because the pod referencing it was still running and the PV was configured with a reclaim policy of Delete. I ended up finding a simpler method to get everything straightened out that also preserved all of the extra Google/Kubernetes metadata and names.
First, I would make a snapshot of your disk as suggested by another answer. You won't need it, but if something goes wrong, the other answer here can then be used to re-create a disk from it.
The short version is that you just need to reconfigure the PV to "Retain", allow the PVC to get deleted, then remove the previous claim from the PV. A new PVC can then be bound to it and all is well.
Details:
Find the full name of the PV:
kubectl get pv
Reconfigure your PV to set the reclaim policy to "Retain": (I'm doing this on Windows so you may need to handle the quotes differently depending on OS)
kubectl patch pv <your-pv-name-goes-here> -p "{\"spec\":{\"persistentVolumeReclaimPolicy\":\"Retain\"}}"
Verify that the reclaim policy of the PV is now Retain.
Shutdown your pod/statefulset (and don't allow it to restart). Once that's finished, your PVC will get removed and the PV (and the disk it references!) will be left intact.
Edit the PV:
kubectl edit pv <your-pv-name-goes-here>
In the editor, remove the entire "claimRef" section. Remove all of the lines from (and including) "claimRef:" until the next tag with the same indentation level. The lines to remove should look more or less like this:
  claimRef:
    apiVersion: v1
    kind: PersistentVolumeClaim
    name: my-app-pvc-my-app-0
    namespace: default
    resourceVersion: "1234567"
    uid: 12345678-1234-1234-1234-1234567890ab
Save the changes and close the editor. Check the status of the PV and it should now show "Available".
Now you can re-create your PVC exactly as you originally did. That should then find the now "Available" PV and bind itself to it. In my case, I have the PVC defined with my statefulset as a volumeClaimTemplate so all I had to do was "kubectl apply" my statefulset.
You can check out this tool, it will update the Terminating PV's status in etcd back to Bound.
The way it works has been mentioned by Anirudh Ramanathan in his answer.
Be sure to back up your PV first.
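A quick way to back up the PV objects themselves (their API definitions, not the underlying data) before trying anything, using the PV name from the question; the output file name is arbitrary:
kubectl get pv persistent-vol-1 -o yaml > persistent-vol-1-backup.yaml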
Do not attempt this if you don't know what you're doing
There is another fairly hacky way of undeleting PVs. Directly editing the objects in etcd. Note that the following steps work only if you have control over etcd - this may not be true on certain cloud providers or managed offerings. Also note that you can screw things up much worse easily; since objects in etcd were never meant to be edited directly - so please approach this with caution.
We had a situation wherein our PVs had a policy of delete and I accidentally ran a command deleting a majority of them, on k8s 1.11. Thanks to storage-object-in-use protection, they did not immediately disappear, but they hung around in a dangerous state. Any deletion or restarts of the pods that were binding the PVCs would have caused the kubernetes.io/pvc-protection finalizer to get removed and thereby deletion of the underlying volume (in our case, EBS). New finalizers also cannot be added when the resource is in terminating state - From a k8s design standpoint, this is necessary in order to prevent race conditions.
Below are the steps I followed:
Back up the storage volumes you care about. This is just to cover yourself against possible deletion - AWS, GCP, Azure all provide mechanisms to do this and create a new snapshot.
Access etcd directly - if it's running as a static pod, you can ssh into it and check the http serving port. By default, this is 4001. If you're running multiple etcd nodes, use any one.
Port-forward 4001 to your machine from the pod.
kubectl -n=kube-system port-forward etcd-server-ip-x.y.z.w-compute.internal 4001:4001
Use the REST API, or a tool like etcdkeeper to connect to the cluster.
Navigate to /registry/persistentvolumes/ and find the corresponding PVs. The deletion of resources by controllers in k8s is done by setting the .spec.deletionTimeStamp field in the controller spec. Delete this field in order to have the controllers stop trying to delete the PV. This will revert them to the Bound state, which is probably where they were before you ran a delete.
You can also carefully edit the reclaimPolicy to Retain and then save the objects back to etcd. The controllers will re-read the state soon and you should see it reflected in kubectl get pv output as well shortly.
Your PVs should go back to the old undeleted state:
$ kubectl get pv
NAME           CAPACITY   ACCESS MODES   RECLAIM POLICY   STATUS   CLAIM                        STORAGECLASS   REASON   AGE
pvc-b5adexxx   5Gi        RWO            Retain           Bound    zookeeper/datadir-zoo-0      gp2                     287d
pvc-b5ae9xxx   5Gi        RWO            Retain           Bound    zookeeper/datalogdir-zoo-0   gp2                     287d
As a general best practice, it is best to use RBAC and the right persistent volume reclaim policy to prevent accidental deletion of PVs or the underlying storage.
Unfortunately, you can't save your PV's and data in this case.
All you may do is recreate PV with Reclaim Policy: Retain - this will prevent data loss in the future.
You can read more about reclaim Policies here and here.
What happens if I delete a PersistentVolumeClaim (PVC)? If the volume was dynamically provisioned, then the default reclaim policy is set to "delete". This means that, by default, when the PVC is deleted, the underlying PV and storage asset will also be deleted. If you want to retain the data stored on the volume, then you must change the reclaim policy from "delete" to "retain" after the PV is provisioned.