Understanding deleting StatefulSets - Kubernetes

New to k8s. I want to understand what kubectl delete sts --cascade=false does.
If I omit the --cascade flag, the delete also removes the StatefulSet's Pods.

It is clearly explained in the documentation under "Deleting a StatefulSet":
Deleting a StatefulSet through kubectl will scale it down to 0,
thereby deleting all pods that are a part of it. If you want to delete
just the StatefulSet and not the pods, use --cascade=false.
So by passing this flag to kubectl delete, the Pods managed by the StatefulSet keep running even though the StatefulSet object itself is deleted.

As described by the fine manual, it deletes the StatefulSet oversight mechanism without actually deleting the underlying Pods. Removing the oversight mechanism means that if a Pod dies, or you wish to make some kind of change, Kubernetes will no longer take responsibility for keeping the Pods in the desired configuration.
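For example (the StatefulSet name web and the label app=web below are hypothetical; on newer kubectl versions the same behaviour is spelled --cascade=orphan, to which the deprecated --cascade=false maps):
# Delete only the StatefulSet object; its Pods are orphaned but keep running
$ kubectl delete statefulset web --cascade=orphan   # older kubectl: --cascade=false
# The Pods are still there, now without a controller watching over them
$ kubectl get pods -l app=web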

Related

Deployment vs POD - change

I was wondering what would happen in this scenario, or if it's even possible: in a Kubernetes cluster, if the Deployment's template specifies a restartPolicy of Always, but at the Pod level you specify a restartPolicy of Never, which one will Kubernetes apply?
As @Turing85 commented, in the normal use case a Deployment and its Pods cannot have different restartPolicies, as the Deployment creates the Pods. If you try to alter a Pod's restartPolicy manually after it is created (e.g. with kubectl edit pod <pod-name>) you will get an error, as this property cannot be changed after creation. However, you can trick a Deployment, or more specifically the underlying ReplicaSet, into accepting a manually created Pod. ReplicaSets in Kubernetes know which Pods are theirs through the use of labels. If you inspect the ReplicaSet belonging to your Deployment, you will see a label selector that shows which labels need to be present for the ReplicaSet to consider a Pod part of it.
So if you want to manually create a Pod that is later managed by the ReplicaSet, you first create a Pod with the desired restartPolicy. After this Pod has started and is ready, you delete an existing Pod of the ReplicaSet and update your Pod's labels to match the selector. Now there is a Pod in the ReplicaSet with a different restartPolicy.
This is really hacky and actually depends on the timing of deletion and update of the labels, because as soon as you delete a Pod in the ReplicaSet it will try to create a new one. You essentially have to be faster with the label change than the ReplicaSet is with the creation of a new Pod.
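A rough sketch of that sequence, assuming a ReplicaSet that selects Pods via the label app=web (all names, images and labels here are hypothetical, and the timing caveat above still applies):
# Check which labels the ReplicaSet selects on
$ kubectl get rs -l app=web -o jsonpath='{.items[0].spec.selector.matchLabels}'
# Create a standalone Pod with the other restartPolicy, but without the
# app=web label yet, so the ReplicaSet ignores it for now
$ kubectl run web-manual --image=nginx --restart=Never --labels="stage=manual"
# Once web-manual is Ready: delete one of the ReplicaSet's Pods and quickly
# re-label the manual Pod so the ReplicaSet adopts it instead of creating a new one
$ kubectl delete pod <replicaset-pod-name> --wait=false
$ kubectl label pod web-manual app=web stage-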

Recreate Pod managed by a StatefulSet with a fresh PersistentVolume

On an occasional basis I need to perform a rolling replace of all Pods in my StatefulSet such that all PVs are also recreated from scratch. The reason is to get rid of all underlying hard drives that use old versions of the encryption key. This operation should not be confused with regular rolling upgrades, for which I still want volumes to survive Pod terminations. The best routine I have figured out so far is the following (sketched as kubectl commands below the list):
1. Delete the PV.
2. Delete the PVC.
3. Delete the Pod.
4. Wait until all deletions complete.
5. Manually recreate the PVC deleted in step 2.
6. Wait for the new Pod to finish streaming data from other Pods in the StatefulSet.
7. Repeat from step 1 for the next Pod.
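A per-replica sketch of that routine, assuming a StatefulSet Pod foo-bar-0 whose PVC data-foo-bar-0 is bound to the PV pv-0, with a saved copy of the PVC manifest at hand (all names are hypothetical):
# 1.-3. Delete the PV, the PVC and the Pod of one replica
$ kubectl delete pv pv-0 --wait=false
$ kubectl delete pvc data-foo-bar-0 --wait=false
$ kubectl delete pod foo-bar-0
# 4. Wait until the PVC is really gone
$ kubectl wait --for=delete pvc/data-foo-bar-0 --timeout=5m
# 5. Recreate the PVC from the saved manifest so the replacement Pod can schedule
$ kubectl apply -f data-foo-bar-0-pvc.yaml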
I'm not happy about step 5. I wish the StatefulSet recreated the PVC for me, but unfortunately it does not. I have to do it myself; otherwise Pod creation fails with the following error:
Warning FailedScheduling 3s (x15 over 15m) default-scheduler persistentvolumeclaim "foo-bar-0" not found
Is there a better way to do that?
I just recently had to do this. The following worked for me:
# Delete the PVC
$ kubectl delete pvc <pvc_name>
# Delete the underlying statefulset WITHOUT deleting the pods
$ kubectl delete statefulset <statefulset_name> --cascade=false
# Delete the pod with the PVC you don't want
$ kubectl delete pod <pod_name>
# Apply the statefulset manifest to re-create the StatefulSet,
# which will also recreate the deleted pod with a new PVC
$ kubectl apply -f <statefulset_yaml>
This is described in https://github.com/kubernetes/kubernetes/issues/89910. The workaround proposed there, of deleting the new Pod which is stuck pending, works and the second time it gets replaced a new PVC is created. It was marked as a duplicate of https://github.com/kubernetes/kubernetes/issues/74374, and reported as potentially fixed in 1.20.
It seems like you're using the PersistentVolume in the wrong way. It's designed to keep data between roll-outs, not to be deleted. There are other ways to renew the keys: you can use a Kubernetes Secret or ConfigMap to mount the key into the Pod, and then you only need to recreate the Secret during a rolling update.
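As a rough illustration of that approach (the Secret name, container image and mount path below are made up), the key could be mounted into the Pod template like this:
# Hypothetical excerpt of a Pod template mounting an encryption key from a Secret
spec:
  containers:
  - name: app
    image: my-app:latest
    volumeMounts:
    - name: encryption-key
      mountPath: /etc/keys
      readOnly: true
  volumes:
  - name: encryption-key
    secret:
      secretName: data-encryption-key
After updating the Secret, something like kubectl rollout restart statefulset <name> would roll the new key out without touching the volumes.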

What happens to persistent volume if the StatefulSet got deleted and re-created?

I set up Kafka and ZooKeeper as StatefulSets and exposed Kafka outside of the cluster. However, whenever I delete the Kafka StatefulSet and re-create it, the data seems to be gone (when I try to consume all the messages using kafkacat, the old messages are no longer there), even though it is using the same PVC and PV. I am currently using EBS as my persistent volume.
Can someone explain to me what happens to the PV when I delete the StatefulSet? Please help me.
I would probably look at how the PersistentVolume was created.
If you run the command
kubectl get pv
you can see the reclaim policy. If it is set to Retain, your volume will survive even when the StatefulSet is deleted.
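For example, to check the policy and, if needed, switch a volume to Retain (the PV name below is hypothetical):
# List PersistentVolumes with their reclaim policy and status
$ kubectl get pv
# Switch one volume to Retain so it survives PVC/StatefulSet deletion
$ kubectl patch pv pvc-0a1b2c3d -p '{"spec":{"persistentVolumeReclaimPolicy":"Retain"}}'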
This is the expected behaviour, because the new StatefulSet will create a new set of PVs and start over (if there is no other choice it can randomly land on old PVs as well, for example with local volumes).
A StatefulSet doesn't mean that Kubernetes will remember what you were doing in some other, old StatefulSet that you have deleted.
StatefulSet means that if a Pod is restarted or re-created for some reason, the same volume will be assigned to it. It does not mean that the volume will be carried over across StatefulSets.
I assume your scenario is that you have a StatefulSet which has a PersistentVolumeClaim definition in it - or which just references an existing volume - and you try to delete it.
In this case the PersistentVolume will stay there, and the PVC won't disappear either.
This is so that you can, if you want to, remount the same volume to a different StatefulSet Pod - or to an updated version of the previous one.
If you want to delete the PersistentVolume, delete the PVC; the bound PV will then go away as well (provided its reclaim policy is Delete).
By default, Kubernetes does not delete PersistentVolumeClaims or their bound PersistentVolume objects when you scale a StatefulSet down or delete it.
Retaining PersistentVolumeClaims is the default behavior, but you can configure the StatefulSet to delete them via the persistentVolumeClaimRetentionPolicy field.
This example shows part of a StatefulSet manifest where the retention policy deletes the PersistentVolumeClaims when the StatefulSet is scaled down, but retains them when the StatefulSet is deleted.
apiVersion: apps/v1
kind: StatefulSet
metadata:
  name: quiz
spec:
  persistentVolumeClaimRetentionPolicy:
    whenScaled: Delete
    whenDeleted: Retain
Make sure your StatefulSet manifest and Kafka cluster are properly configured.
NOTE
If you want to delete a StatefulSet but keep the Pods and the
PersistentVolumeClaims, you can use the --cascade=orphan option. In
this case, the PersistentVolumeClaims will be preserved even if the
retention policy is set to Delete.
Marko Lukša "Kubernetes in Action, Second Edition"
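With that policy, the difference shows up in what scaling versus deleting does to the claims (quiz is the StatefulSet from the manifest above; these are alternative operations, not a sequence):
# whenScaled: Delete - scaling down also deletes the PVCs of the removed replicas
$ kubectl scale statefulset quiz --replicas=1
# whenDeleted: Retain - deleting the StatefulSet keeps its PVCs around
$ kubectl delete statefulset quiz
$ kubectl get pvc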

What is the recommended way to move lone Pods to a different node before draining, such that kubectl drain node1 --force does not delete them?

I cannot find how to do this in the docs. After draining the node with --ignore-daemonsets --force, Pods not managed by a ReplicationController, ReplicaSet, Job, DaemonSet or StatefulSet are lost. How should I move such Pods prior to issuing the drain command? I want to preserve the local data on these Pods.
A good practice is to always run a Pod as a Deployment with spec.replicas: 1. It's very easy, as the Deployment's spec.template literally takes your Pod spec, and quite convenient, as the Deployment will make sure your Pod is always running.
Then, assuming you'll only have one replica of your Pod, you can simply use a PersistentVolumeClaim and attach it to the Pod as a volume; you do not need a StatefulSet in that case. Your data will be stored in the PVC, and whenever your Pod is moved to another node for whatever reason, the volume will be reattached automatically without losing any data.
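A minimal sketch of that setup, assuming a PVC named app-data already exists (the names, image and mount path are placeholders):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: lone-app
spec:
  replicas: 1
  selector:
    matchLabels:
      app: lone-app
  template:
    metadata:
      labels:
        app: lone-app
    spec:
      containers:
      - name: app
        image: my-app:latest
        volumeMounts:
        - name: data
          mountPath: /var/lib/app
      volumes:
      - name: data
        persistentVolumeClaim:
          claimName: app-data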
Now, if it's too late for you, and your Pod hasn't got a volume pointing to a PVC, you can still get ready to change that by implementing the Deployment/PVC approach, and manually copy data out of your current pod:
kubectl cp theNamespace/thePod:/the/path /somewhere/on/your/local/computer
Before copying it back to the new pod:
kubectl cp /somewhere/on/your/local/computer theNamespace/theNewPod:/the/path
This time, just make sure /the/path (to reuse the example above) is actually a Volume mapped to a PVC so you won't have to do that manually again!

Kubernetes deleting StatefulSet deletes Pod in spite of --cascade=false

I am upgrading from one version of a Helm chart to a newer version, which fails due to changes in one of the StatefulSets. My solution was to delete the StatefulSet and then upgrade. I wanted to delete the StatefulSet without deleting the Pods, so I used the --cascade=false argument, but regardless the Pod is immediately terminated upon the StatefulSet's deletion.
kc delete --cascade=false statefulsets/trendy-grasshopper-sts
Am I doing something wrong? How can I diagnose?