Recreate Pod managed by a StatefulSet with a fresh PersistentVolume - kubernetes

Occasionally I need to perform a rolling replacement of all Pods in my StatefulSet such that all PVs are also recreated from scratch. The reason is to get rid of all underlying hard drives that use old versions of the encryption key. This operation should not be confused with regular rolling upgrades, for which I still want volumes to survive Pod terminations. The best routine I have figured out so far is the following:
1. Delete the PV.
2. Delete the PVC.
3. Delete the Pod.
4. Wait until all deletions complete.
5. Manually recreate the PVC deleted in step 2.
6. Wait for the new Pod to finish streaming data from other Pods in the StatefulSet.
7. Repeat from step 1 for the next Pod.
I'm not happy about step 5. I wish the StatefulSet recreated the PVC for me, but unfortunately it does not. I have to do it myself, otherwise Pod creation fails with the following error:
Warning FailedScheduling 3s (x15 over 15m) default-scheduler persistentvolumeclaim "foo-bar-0" not found
Is there a better way to do that?

I just recently had to do this. The following worked for me:
# Delete the PVC
$ kubectl delete pvc <pvc_name>
# Delete the underlying statefulset WITHOUT deleting the pods
# (in kubectl 1.20 and later, --cascade=false is deprecated in favor of --cascade=orphan)
$ kubectl delete statefulset <statefulset_name> --cascade=false
# Delete the pod with the PVC you don't want
$ kubectl delete pod <pod_name>
# Apply the statefulset manifest to re-create the StatefulSet,
# which will also recreate the deleted pod with a new PVC
$ kubectl apply -f <statefulset_yaml>
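
In case it helps, here is a minimal sketch of the same sequence as a dry-run helper that only prints the commands for one replica. It relies on the standard naming convention for StatefulSet volumes: the PVC is called `<volumeClaimTemplate name>-<StatefulSet name>-<ordinal>`, which is why the error in the question mentions "foo-bar-0". The function names `pvc_name` and `print_replace_plan` are my own, hypothetical ones. Using `--wait=false` and `--cascade=orphan` is my assumption about what you would want on a recent kubectl, not part of the original commands.

```shell
#!/bin/sh
# Sketch: print (not run) the commands that replace one StatefulSet replica
# together with its volume. All names are placeholders supplied by the caller.

# A PVC created from a volumeClaimTemplate is named
# <template>-<statefulset>-<ordinal>.
pvc_name() {
    printf '%s-%s-%s\n' "$1" "$2" "$3"
}

# Usage: print_replace_plan <template> <statefulset> <ordinal> <manifest>
print_replace_plan() {
    template=$1 sts=$2 ordinal=$3 manifest=$4
    pvc=$(pvc_name "$template" "$sts" "$ordinal")
    pod="$sts-$ordinal"
    # --wait=false (assumption): the PVC stays in Terminating while the Pod
    # still uses it, so do not block on the deletion here.
    printf 'kubectl delete pvc %s --wait=false\n' "$pvc"
    # --cascade=orphan is the newer spelling of --cascade=false.
    printf 'kubectl delete statefulset %s --cascade=orphan\n' "$sts"
    printf 'kubectl delete pod %s\n' "$pod"
    printf 'kubectl apply -f %s\n' "$manifest"
}
```

For example, `print_replace_plan foo bar 0 statefulset.yaml` prints the four commands for Pod `bar-0` and PVC `foo-bar-0`. Review them, run them, wait for the new Pod to finish streaming its data, and then move on to the next ordinal.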

This is described in https://github.com/kubernetes/kubernetes/issues/89910. The workaround proposed there works: delete the new Pod that is stuck in Pending, and the second time it gets replaced a new PVC is created. It was marked as a duplicate of https://github.com/kubernetes/kubernetes/issues/74374, and reported as potentially fixed in 1.20.

It seems like you're using a "Persistent" volume in the wrong way. It's designed to keep data between rollouts, not to delete it. There are other ways to renew the keys. You can use a k8s Secret and ConfigMap to mount the key into the Pod. Then you just need to recreate the Secret during a rolling update.

Related

Kubernetes Persistent Volume Claim FileSystemResizePending

I have a persistent volume claim for a Kubernetes pod which shows the message "Waiting for user to (re-)start a pod to finish file system resize of volume on node." when I check it with 'kubectl describe pvc ...'.
The resizing itself, which was done with Terraform in our deployments, worked, but this message still shows up and I'm not sure how to fix it. The pod has already been restarted several times: I tried kubectl delete pod and scaling it down with kubectl scale deployment.
Does anyone have an idea how to get rid of this message?
There are a few things to consider:
Instead of using Terraform, try resizing the PVC by editing it manually. After that, wait for the underlying volume to be expanded by the storage provider and verify whether the FileSystemResizePending condition is present by executing kubectl get pvc <pvc_name> -o yaml. Then make sure that all the associated pods are restarted so the whole process can complete. Once the file system resize is done, the PVC will automatically be updated to reflect the new size.
Make sure that your volume type is supported for expansion. You can expand the following types of volumes:
gcePersistentDisk
awsElasticBlockStore
Cinder
glusterfs
rbd
Azure File
Azure Disk
Portworx
FlexVolumes
CSI
Check if in your StorageClass the allowVolumeExpansion field is set to true.
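
The two checks above can be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes kubectl is on your PATH and pointed at the right cluster. The jsonpath expressions read the PVC's `.status.conditions[*].type` and the StorageClass's `.allowVolumeExpansion` field.

```shell
#!/bin/sh
# Sketch: helpers for checking volume-expansion state. Function names are
# illustrative, not part of kubectl.

# has_condition "<space-separated condition types>" <wanted type>
has_condition() {
    case " $1 " in
        *" $2 "*) return 0 ;;
        *) return 1 ;;
    esac
}

# pvc_resize_pending <pvc> [namespace]
# Succeeds while the PVC still reports FileSystemResizePending.
pvc_resize_pending() {
    conditions=$(kubectl get pvc "$1" -n "${2:-default}" \
        -o jsonpath='{.status.conditions[*].type}')
    has_condition "$conditions" FileSystemResizePending
}

# storageclass_expandable <storageclass>
# Succeeds when allowVolumeExpansion is set to true.
storageclass_expandable() {
    [ "$(kubectl get storageclass "$1" \
        -o jsonpath='{.allowVolumeExpansion}')" = "true" ]
}
```

For example, `pvc_resize_pending my-claim my-namespace && echo "restart the pod that mounts it"`, where both names are placeholders.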

Understanding deleting stateful sets

I'm new to k8s. I want to understand what kubectl delete sts --cascade=false does.
If I remove the cascade flag, it deletes the StatefulSet's pods.
It is clearly explained in the documentation under Deleting the Statefulset:
Deleting a StatefulSet through kubectl will scale it down to 0,
thereby deleting all pods that are a part of it. If you want to delete
just the StatefulSet and not the pods, use --cascade=false.
So when you pass this flag to kubectl delete, the Pods managed by the StatefulSet keep running even though the StatefulSet object itself is deleted.
As described by the fine manual, it deletes the StatefulSet oversight mechanism without actually deleting the underlying Pods. Removing the oversight mechanism means that if a Pod dies, or you wish to make some kind of change, kubernetes will no longer take responsibility for ensuring the Pods are in the desired configuration.

How to delete persistent volumes in Kubernetes

I am trying to delete persistent volumes on a Kubernetes cluster. I ran the following command:
kubectl delete pv pvc-08e65270-b7ce-11e9-ba0b-0a1e280502e2 pvc-08e87826-b7ce-11e9-ba0b-0a1e280502e2 pvc-08ea5f97-b7ce-11e9-ba0b-0a1e280502e2 pvc-08ec1cac-b7ce-11e9-ba0b-0a1e280502e2
However it showed:
persistentvolume "pvc-08e65270-b7ce-11e9-ba0b-0a1e280502e2" deleted
persistentvolume "pvc-08e87826-b7ce-11e9-ba0b-0a1e280502e2" deleted
persistentvolume "pvc-08ea5f97-b7ce-11e9-ba0b-0a1e280502e2" deleted
persistentvolume "pvc-08ec1cac-b7ce-11e9-ba0b-0a1e280502e2" deleted
But the command did not exit, so I pressed Ctrl+C to force it to exit. After a few minutes, I ran:
kubectl get pv
And the status is Terminating, but the volumes don't appear to be deleting.
How can I delete these persistent volumes?
Deleting a PV directly is not recommended; it should be handled by the cloud provisioner. If you need to remove a PV, just delete the pod bound to the claim and then the PVC. After that, the cloud provisioner should remove the PV as well (this assumes the PV's reclaim policy is Delete).
kubectl delete pvc --all
It sometimes could take some time so be patient.
Delete all the pods that are using the PVC you want to delete, then delete the PVC (PersistentVolumeClaim) and the PV (PersistentVolume), in that order.
Something like the following, in sequence (pass --all instead of a name to delete every object of that kind):
kubectl delete pod <pod-name>
kubectl delete pvc <pvc-name>
kubectl delete pv <pv-name>
The kubectl commands are mentioned in other answers in this thread, and the same should work:
kubectl delete sts sts-name
kubectl delete pvc pvc-name
kubectl delete pv pv-name
Some more useful info
If you see something stuck in the Terminating state, it's because of guardrails put in place by k8s. These are referred to as 'Finalizers'.
If your PV is stuck in the Terminating state after deletion, it is likely because you deleted the PV before deleting the PVC.
If your PVC is stuck in the Terminating state after deletion, it is likely because your pods are still running (simply delete the pods/StatefulSet in such cases).
If you wish to delete a resource stuck in the Terminating state, use the commands below to bypass the PVC and PV protection finalizers.
kubectl patch pvc pvc_name -p '{"metadata":{"finalizers":null}}'
kubectl patch pv pv_name -p '{"metadata":{"finalizers":null}}'
Here is the documentation on PVC retention policy.
Here is the documentation on PV reclaim policy.
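
Since patching finalizers to null removes every finalizer on the object, it may be worth looking at what is there first. The sketch below uses hypothetical helper names and assumes kubectl is configured for the right cluster and namespace. It prints an object's finalizers and checks whether they are only the built-in `kubernetes.io/pv-protection` and `kubernetes.io/pvc-protection` ones. Anything else, for example a finalizer added by a CSI driver, would be dropped by the patch as well.

```shell
#!/bin/sh
# Sketch: inspect finalizers before clearing them. Function names are
# illustrative, not part of kubectl.

# finalizers_of <kind> <name>
# Prints the object's finalizers as a space-separated list.
finalizers_of() {
    kubectl get "$1" "$2" -o jsonpath='{.metadata.finalizers[*]}'
}

# only_protection_finalizers "<space-separated finalizers>"
# Succeeds when nothing but the built-in protection finalizers is present.
only_protection_finalizers() {
    for f in $1; do
        case "$f" in
            kubernetes.io/pv-protection|kubernetes.io/pvc-protection) ;;
            *) return 1 ;;
        esac
    done
    return 0
}
```

For example, `only_protection_finalizers "$(finalizers_of pv pv_name)" || echo "other finalizers present, check before patching"`.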
PVs are cluster resources provisioned by an administrator, whereas PVCs are a user's request for storage and resources. I guess the corresponding PVC is still deployed.
Delete the deployment. E.g.:
kubectl delete deployment mongo-db
List the Persistent Volume Claim. E.g.:
kubectl get pvc
Delete the corresponding PVC. E.g.:
kubectl delete pvc mongo-db

What is the recommended way to move lone pods to different node before draining? Such that kubectl drain node1 --force does not delete the pod

Cannot find how to do so in the docs. After draining the node with --ignore-daemonsets --force pods not managed by ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet are lost. How should I move such pods prior to issuing the drain command? I want to preserve the local data on these pods.
A good practice is to always start a Pod as a Deployment with spec.replicas: 1. It's very easy, as the Deployment's spec.template literally takes in your Pod spec, and quite convenient, as the Deployment will make sure your Pod is always running.
Then, assuming you'll only have 1 replica of your Pod, you can simply use a PersistentVolumeClaim and attach it to the pod as a volume; you do not need a StatefulSet in that case. Your data will be stored in the PVC, and whenever your Pod is moved to another node for whatever reason, it will reattach the volume automatically without losing any data.
Now, if it's too late for you and your Pod doesn't have a volume pointing to a PVC, you can still prepare for that change by implementing the Deployment/PVC approach and manually copying the data out of your current pod:
kubectl cp theNamespace/thePod:/the/path /somewhere/on/your/local/computer
Before copying it back to the new pod:
kubectl cp /somewhere/on/your/local/computer theNamespace/theNewPod:/the/path
This time, just make sure /the/path (to reuse the example above) is actually a Volume mapped to a PVC so you won't have to do that manually again!

Pod gets recreated after deletion

I'm unable to delete a Kubernetes pod; it keeps being recreated.
There's no service or deployment associated with the pod. There's a label on the pod, though. Is that the root cause?
If I edit the label out with kubectl edit pod podname, it removes the label from the pod, but a new pod with the same label is created at the same time.
Pods can be created by ReplicationControllers or ReplicaSets. The latter might in turn be created by a Deployment. The described behavior strongly indicates that the Pod is managed by one of these two.
You can check for these with the following commands:
kubectl get rs
kubectl get rc
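
Another way to find out what manages a particular Pod is to read its ownerReferences directly. A minimal sketch, where `pod_owners` is a hypothetical helper name and kubectl is assumed to be configured for the Pod's namespace:

```shell
#!/bin/sh
# Sketch: print the controllers that own a Pod, one "Kind/name" per line.
# pod_owners is an illustrative name, not a kubectl subcommand.
pod_owners() {
    kubectl get pod "$1" \
        -o jsonpath='{range .metadata.ownerReferences[*]}{.kind}{"/"}{.name}{"\n"}{end}'
}
```

Calling `pod_owners podname` prints nothing for a standalone Pod. A line starting with `ReplicaSet/` or `ReplicationController/` names the object to delete or scale down instead of the Pod.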