Can I execute a prestop hook only before deletion? - kubernetes

My Go-based custom resource operator needs some cleanup operations before it is deleted. It has to delete a specific znode from ZooKeeper.
These operations must not be executed when the resource is merely being regenerated; they have to run only for the user's deletion command. Thus, I can't use an ordinary preStop hook.
Can I execute a prestop hook only before deletion? Or is there any other way for the operator to execute cleanup logic before the resource is deleted?

Can I execute a prestop hook only before deletion?
This is the whole purpose of the preStop hook. A pre-stop hook is executed immediately before the container is terminated. Once the termination signal arrives from the API server, the kubelet runs the pre-stop hook and afterwards sends the SIGTERM signal to the process.
It was designed to perform arbitrary operations before shutdown without having to implement those operations in the application itself. This is especially useful if you run some 3rd-party app whose code you can't modify.
Note that the decision to terminate the pod and invoke the hook can be due to an API request, failed probes, resource contention and other reasons.
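For illustration, a minimal sketch of where such a hook is declared in a pod spec; the image name, script path and grace period below are placeholders, not taken from the question:

apiVersion: v1
kind: Pod
metadata:
  name: my-operator
spec:
  # give the hook enough time to finish before SIGKILL
  terminationGracePeriodSeconds: 60
  containers:
  - name: operator
    image: example.com/my-operator:latest
    lifecycle:
      preStop:
        exec:
          # hypothetical cleanup script, e.g. deleting the znode from ZooKeeper
          command: ["/bin/sh", "-c", "/scripts/cleanup-znode.sh"]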
For more reading please visit: Container Lifecycle Hooks

Related

How do I create a shutdown sequence for containers in a pod?

When a SIGTERM is received from k8s, I want my sidecar to die only after the main container has finished. How do I create a dependency chain for the shutdown routine?
I can use a preStop hook and wait for, say, 120 secs, but then it is going to take a constant 120 secs for the pods to be cleaned up.
I am wondering if there is a way for my sidecar to check whether my main container was killed. Or is it possible for the main container to signal my sidecar when it has finished its cleanup (through k8s instead of a code change in my main container)?
In your preStop hook, check if your main container is still running (instead of just waiting a fixed amount of time).
If the main container has an existing health probe (for example an HTTP endpoint), you can call that from within the preStop hook.
Another option is to edit the entrypoint script in your main container to include a trap command that creates a file on shutdown. The preStop hook of your sidecar can then wait for this file to be present.
trap "touch /lifecycle/main-pod-terminated" SIGTERM
If you use the preStop hook, just keep the grace-period countdown in mind and increase the default (30 seconds) if required by setting an appropriate value for terminationGracePeriodSeconds:
"Pod's termination grace period countdown begins before the preStop hook is executed, so regardless of the outcome of the handler, the container will eventually terminate within the Pod's termination grace period."
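Putting this together, the sidecar side could look roughly like the sketch below; the images, the shared /lifecycle volume and the marker file name are assumptions carried over from the trap example above, not a definitive setup:

apiVersion: v1
kind: Pod
metadata:
  name: app-with-sidecar
spec:
  # raise the 30s default so the sidecar's wait loop has time to finish
  terminationGracePeriodSeconds: 180
  volumes:
  - name: lifecycle
    emptyDir: {}
  containers:
  - name: main
    # entrypoint is assumed to contain the trap command shown above
    image: example.com/main-app:latest
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
  - name: sidecar
    image: example.com/sidecar:latest
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
    lifecycle:
      preStop:
        exec:
          # block until the main container reports it has terminated
          command: ["/bin/sh", "-c", "while [ ! -f /lifecycle/main-pod-terminated ]; do sleep 1; done"]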

Will mongock work correctly with kubernetes replicas?

Mongock looks very promising. We want to use it inside a kubernetes service that has multiple replicas that run in parallel.
We are hoping that when our service is deployed, the first replica will acquire the mongockLock and all of its ChangeLogs/ChangeSets will be completed before the other replicas attempt to run them.
We have a single instance of mongodb running in our kubernetes environment, and we want the mongock ChangeLogs/ChangeSets to execute only once.
Will the mongockLock guarantee that only one replica will run the ChangeLogs/ChangeSets to completion?
Or do I need to enable transactions (or some other configuration)?
I am going to provide the short answer first and then the long one. I suggest you read the long one too in order to understand it properly.
Short answer
By default, Mongock guarantees that the ChangeLogs/changeSets will be run only by one pod at a time. The one owning the lock.
Long answer
What really happens behind the scenes (if it's not configured otherwise) is that when one pod takes the lock, the others will try to acquire it too, but they can't, so they are forced to wait for a while (configurable, but 4 minutes by default), as many times as the lock retry is configured (3 times by default). After this, if a pod is still not able to acquire the lock and there are still pending changes to apply, Mongock will throw a MongockException, which should mean the JVM startup fails (which is what happens by default in Spring).
This is fine in Kubernetes, because it ensures the pods will be restarted.
So now, assuming the pods start again and the changeLogs/changeSets are already applied, the pods start successfully because they don't even need to acquire the lock, as there are no pending changes to apply.
Potential problem with MongoDB without transaction support and Frameworks like Spring
Now, assuming the lock and the mutual exclusion are clear, I'd like to point out a potential issue that needs to be mitigated by the changeLog/changeSet design.
This issue applies if you are in an environment such as Kubernetes, which has a pod initialisation time, your migration takes longer than that initialisation time, and the Mongock process is executed before the pod becomes ready/healthy (and finishing it is a condition for readiness). This last condition is highly desired, as it ensures the application runs with the right version of the data.
In this situation, imagine the Pod starts the Mongock process. After the Kubernetes initialisation time, the process is still not finished, but Kubernetes stops the JVM abruptly. This means that some changeSets were successfully executed, some others not even started (no problem, they will be processed in the next attempt), but one changeSet was partially executed and marked as not done. This is the potential issue. The next time Mongock runs, it will see that changeSet as pending and will execute it from the beginning. If you haven't designed your changeLogs/changeSets accordingly, you may experience unexpected results, because some part of the data process covered by that changeSet has already taken place and will happen again.
This somehow needs to be mitigated: either with the help of mechanisms like transactions, with a changeLog/changeSet design that takes this into account, or both.
Mongock currently provides transactions with “all or nothing”, but it doesn’t really help much as it will retry every time from scratch and will probably end up in an infinite loop. The next version 5 will provide transactions per ChangeLogs and changeSets, which together with good organisation, is the right solution for this.
Meanwhile this issue can be addressed by following these design suggestions.
Just to follow up... Mongock's locking mechanism works fine with replicas. To solve the "long-running script" problem, we will run our Mongock scripts from a Kubernetes initContainer. K8s will wait for the initContainers to finish before it starts the pod's main service containers.
For transactions, we will follow the advice above of making our scripts idempotent.
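For reference, a rough sketch of that initContainer arrangement; the images, the migration command and all names below are illustrative assumptions, not the actual setup:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: my-service
spec:
  replicas: 3
  selector:
    matchLabels:
      app: my-service
  template:
    metadata:
      labels:
        app: my-service
    spec:
      initContainers:
      # runs to completion before the main container starts;
      # the pod is not Ready (and gets no traffic) until it succeeds
      - name: mongock-migrations
        image: example.com/my-service-migrations:latest
        command: ["java", "-jar", "/app/migrations.jar"]
      containers:
      - name: my-service
        image: example.com/my-service:latest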

What does shutdown look like for a kubernetes cron job pod when it's being terminated by "replace" concurrency policy?

I couldn't find anything in the official kubernetes docs about this. What's the actual low-level process for replacing a long-running cron job? I'd like to understand this so my application can handle it properly.
Is it a clean SIGHUP/SIGTERM signal that gets sent to the app that's running?
Is there a waiting period after that signal gets sent, so the app has time to cleanup/shutdown before it potentially gets killed? If so, what's that timeout in seconds? Or does it wait forever?
For reference, here's the Replace policy explanation in the docs:
https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
Concurrency Policy
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
Underneath, a CronJob is just running another Pod.
When a CronJob with a concurrency policy of "Replace" is still active, the Job will be deleted, which also deletes its Pod.
When a Pod is deleted, the Linux container(s) will be sent a SIGTERM, and then a SIGKILL after a grace period that defaults to 30 seconds. The terminationGracePeriodSeconds property in a PodSpec can be set to override that default.
Due to the flag added to the DeleteJob call, it sounds like this delete only removes values from the kube key/value store, which would mean the new Job/Pod could be created while the current Job/Pod is still terminating. You could confirm this with a Job that doesn't respect a SIGTERM and has terminationGracePeriodSeconds set to a few times your cluster's scheduling speed.
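For illustration, a sketch of a CronJob using the Replace policy; the schedule, image and grace period are placeholder values:

apiVersion: batch/v1
kind: CronJob
metadata:
  name: long-running-task
spec:
  schedule: "*/10 * * * *"
  # delete the still-running Job (and its Pod) before creating the new one
  concurrencyPolicy: Replace
  jobTemplate:
    spec:
      template:
        spec:
          # time between SIGTERM and SIGKILL for the old Pod
          terminationGracePeriodSeconds: 120
          restartPolicy: Never
          containers:
          - name: task
            image: example.com/task:latest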

How to implement a post process container in one Kubernetes' pod

I have a golang server which is used to deploy AI model trainings into Kubernetes; every training runs its job in a pod. After the job is completed, my server needs to upload the model output to HDFS/S3.
So I need a post-process container that handles the upload task, the way an init container handles init tasks in a k8s pod.
For now, I use a tricky workaround: I add the model job container to init-containers and run the upload task container in containers. This works if no error is thrown in the init-containers. However, if there are errors in the init-container, the pod status is Init:ContainerCannotRun, whereas it should normally be Failed.
I know I can attach a preStop command to container lifecycle events if the image contains the upload-hdfs/s3 command tool. However, I do not want the model training images to include these commands, so this is not the answer for me.
So my question is: how do I implement a post-process container so that I can run the upload task after the job is completed?
I also found a related issue on GitHub; I would try it if there's no other choice.
There are two ways in Kubernetes to cope with this: a preStop hook, or code that listens for the SIGTERM signal.
Also, you can increase the termination grace period to have more time before the pod is terminated.
You can find more information in Attach Handlers to Container Lifecycle Events and Kubernetes best practices: terminating with grace.
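If you did decide to bake an upload tool into the image after all, the preStop route would look roughly like this sketch; the image, script and bucket are purely hypothetical:

apiVersion: v1
kind: Pod
metadata:
  name: training-job
spec:
  # leave enough time for the upload before SIGKILL
  terminationGracePeriodSeconds: 300
  containers:
  - name: training
    image: example.com/model-training:latest
    lifecycle:
      preStop:
        exec:
          # hypothetical upload command shipped inside the image
          command: ["/bin/sh", "-c", "/tools/upload-output.sh s3://my-bucket/model-output"]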

Gracefully draining sessions from Dropwizard application using Kubernetes

I have a Dropwizard application that holds short-lived sessions (phone calls) in memory. There are good reasons for this, and we are not changing this model in the near term.
We will be setting up Kubernetes soon and I'm wondering what the best approach would be for handling shutdowns / rolling updates. The process will need to look like this:
1. Remove the DW node from the load balancer so no new sessions can be started on this node.
2. Wait for all remaining sessions to be completed. This should take a few minutes.
3. Terminate the process / container.
It looks like kubernetes can handle this if I make step 2 a "preStop hook"
http://kubernetes.io/docs/user-guide/pods/#termination-of-pods
My question is, what will the preStop hook actually look like? Should I set up a DW "task" (http://www.dropwizard.io/0.9.2/docs/manual/core.html#tasks) that just waits until all sessions are completed, and curl it from Kubernetes? Should I put a bash script in the Docker container with the DW app that polls some sessionCount service until none are left, and execute that?
Assume you don't use the preStop hook, and a pod deletion request has been issued:
1. The API server processes the deletion request and modifies the pod object.
2. The endpoint controller observes the change and removes the pod from the list of endpoints.
3. On the node, a SIGTERM signal is sent to your container/process.
4. Your process should trap the signal and drain all existing requests. Note that this step should not take longer than the terminationGracePeriodSeconds defined in your pod spec.
Alternatively, you can use the preStop hook, which blocks until all the requests are drained. Most likely, you'll need a script to accomplish this.
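One possible shape for that, as a sketch only: poll a hypothetical Dropwizard admin task until no sessions remain. The endpoints, admin port and timings below are assumptions, not part of your app:

apiVersion: v1
kind: Pod
metadata:
  name: dropwizard-app
spec:
  # must exceed the longest expected drain time
  terminationGracePeriodSeconds: 600
  containers:
  - name: dropwizard
    image: example.com/dropwizard-app:latest
    lifecycle:
      preStop:
        exec:
          command:
          - /bin/sh
          - -c
          - |
            # hypothetical DW task that stops new sessions from being accepted
            curl -s -X POST http://localhost:8081/tasks/stop-accepting-sessions
            # hypothetical DW task that prints the current session count
            while [ "$(curl -s -X POST http://localhost:8081/tasks/session-count)" != "0" ]; do
              sleep 5
            done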