Basically, what I am trying to do is play around with the pod lifecycle and check whether we can do some cleanup/backup, such as copying logs, before the pod terminates.
What I need:
Copy logs/heapdumps from container to a hostPath/S3 before terminating
What I tried:
I used a preStop hook with a bash command to echo a message (just to see if it works!). I then combined terminationGracePeriodSeconds with a delay in the preStop command and toggled the two to see how the process behaves. For example, keep terminationGracePeriodSeconds at 30 seconds (the default) and make the preStop command sleep for 50 seconds: the message should not be generated, since the container will have been terminated by then. This works as expected.
My questions:
What kind of processes are allowed (or recommended) for a preStop hook? Copying logs/heapdumps of 15 gigs or more will take a lot of time, and that time will then have to be used to define terminationGracePeriodSeconds.
What happens when preStop takes more time than the set grace period (in case logs are huge, say 10 gigs)?
What happens if I do not have any hooks but still set terminationGracePeriodSeconds? Will the container remain up until the grace period ends?
I found this issue, which is closely related, but could not follow it through: https://github.com/kubernetes/kubernetes/issues/24695
All inputs appreciated !!
what kind of processes are allowed(recommended) for a preStop hook? As copying logs/heapdumps of 15 gigs or more will take a lot of time. This time will then be used to define terminationGracePeriodSeconds
Anything goes here; it is more a matter of opinion and of how long you are willing to let your pods linger. Another option is to let your pods terminate and store your data somewhere that persists past the pod lifecycle (e.g. AWS S3, EBS), then use something like a Job to clean up the data.
what happens when preStop takes more time than the set gracePeriod? (in case logs are huge say 10 gigs)
Your preStop hook will not complete: the container is killed once the grace period expires, which may mean incomplete or corrupted data.
what happens if I do not have any hooks but still set terminationGracePeriodSeconds ? will the container remain up until that grace time ?
This would be the sequence:
A SIGTERM signal is sent to the main process in each container, and a “grace period” countdown starts.
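If a container doesn’t terminate within the grace period, a SIGKILL signal will be sent and the container is forcibly stopped.
So no, the container does not stay up for the whole grace period; it stops as soon as its main process exits after SIGTERM.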
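The sequence can be modelled with a small Python sketch. This is a simplified illustration, not kubelet code: the function name and its parameters are hypothetical, and it assumes the preStop hook and the SIGTERM shutdown share one grace-period budget (it ignores any small extension the kubelet may grant).

```python
def termination_outcome(pre_stop_seconds, shutdown_seconds, grace_seconds=30):
    """Model what happens to one container during pod termination.

    Assumption: the preStop hook (if any) runs first, SIGTERM is sent once
    it finishes, and both draw on the same grace-period countdown.
    """
    if pre_stop_seconds > grace_seconds:
        # The hook is still running when the countdown ends.
        return "killed during preStop"
    total = pre_stop_seconds + shutdown_seconds
    if total > grace_seconds:
        # The hook finished, but the process did not exit in time.
        return "killed after SIGTERM"
    # The main process exited early; the container does not wait out the rest.
    return f"exited cleanly after {total}s"


if __name__ == "__main__":
    print(termination_outcome(50, 0))   # the sleep-50 experiment from the question
    print(termination_outcome(0, 2))    # no hook, fast shutdown
```

With the numbers from the question (a 50-second preStop against a 30-second grace period), the model reports that the hook never completes, which matches the observed behaviour.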
I have 1 Celery broker and several Celery workers, all communicating with RabbitMQ. In my setup, I send several tasks to my Celery workers, they process all the tasks (it takes ~1 hour), and then I manually terminate my Celery workers.
I want to move towards a system where, if a Celery worker is 'idle' (which I define as: has 0 active tasks for a time period of timeout_seconds, which I will define beforehand), the worker will be terminated programmatically. All workers will have approximately the same number of tasks to run, and will all go 'idle' around the same time.
I have code set up that lets me terminate workers, but I am not sure how to detect that a worker is 'idle' and ready for termination. I think I want to use a signal, but it doesn't look like there is one that fits my requirement.
Here where I work we have a task that is doing basically what you want - automatically scales up/down the cluster depending on the "situation". The key in this process is the Celery inspect/control API, so I suggest you get familiar with it. This is an area that is not well-documented so start with the following:
insp = celery_app.control.inspect()
active_queues = insp.active_queues()
# Note: between these two calls some nodes may shut down and disappear
# from the dictionary so may need to deal with this...
active_stats = insp.active()
You can do this in a separate IPython session while your Celery cluster runs tasks, and look at what is there...
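As a rough sketch of how the result of insp.active() could drive the 'idle' check from the question: the helper below is hypothetical (its name and the last_busy bookkeeping are assumptions, not part of Celery). It only relies on active() returning a dictionary that maps worker names to lists of running tasks, or None when no worker replies.

```python
import time


def update_idle_workers(active, last_busy, timeout_seconds, now=None):
    """Return the workers that have had no active tasks for timeout_seconds.

    active:    dict shaped like the result of inspect().active(), i.e.
               {worker_name: [task, ...]}; None if no worker replied.
    last_busy: dict {worker_name: timestamp}, updated in place on each poll.
    """
    now = time.time() if now is None else now
    idle = []
    for worker, tasks in (active or {}).items():
        if tasks or worker not in last_busy:
            # Busy, or seen for the first time: (re)start its idle clock.
            last_busy[worker] = now
        elif now - last_busy[worker] >= timeout_seconds:
            idle.append(worker)
    return idle


if __name__ == "__main__":
    seen = {}
    print(update_idle_workers({"w1": [{"id": "a"}], "w2": []}, seen, 60, now=0))
    print(update_idle_workers({"w1": [{"id": "a"}], "w2": []}, seen, 60, now=61))
```

Calling this periodically with a fresh insp.active() result, and passing the returned names to whatever termination code you already have, would be one way to wire it up. Workers that disappear between polls simply stop showing up in the dictionary.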
In Kubernetes cronjobs, It is stated in the limitations section that
Jobs may fail to run if the CronJob controller is not running or broken for a span of time from before the start time of the CronJob to start time plus startingDeadlineSeconds, or if the span covers multiple start times and concurrencyPolicy does not allow concurrency.
What I understand from this is that if startingDeadlineSeconds is set to 10 and the cronjob couldn't start for some reason at its scheduled time, then it can still be attempted again as long as those 10 seconds haven't passed; after the 10 seconds, however, it definitely won't be started. Is this correct?
Also, if I have concurrencyPolicy set to Forbid, does K8s count it as a failure if a cronjob tries to be scheduled when there is one already running?
After investigating the code base of the Kubernetes repo, this is how the CronJob controller works:
The CronJob controller checks the list of CronJobs every 10 seconds.
For every CronJob, it checks how many schedules it missed in the duration from the lastScheduleTime till now. If there are more than 100 missed schedules, then it doesn't start the job and records the event:
"FailedNeedsStart", "Cannot determine if job needs to be started. Too many missed start time (> 100). Set or decrease .spec.startingDeadlineSeconds or check clock skew."
It is important to note that if the field startingDeadlineSeconds is set (not nil), the controller counts only the schedules missed within the last startingDeadlineSeconds seconds. For example, if startingDeadlineSeconds = 200, it counts how many schedules were missed in the last 200 seconds. The exact implementation of this count is in the CronJob controller source.
If there are no more than 100 missed schedules from the previous step, the CronJob controller checks that the current time is not after scheduledTime + startingDeadlineSeconds, i.e. that it is not too late to start the job (the deadline has not passed). If it is not too late, the CronJob controller keeps attempting to start the job. However, if it is already too late, it doesn't start the job and records the event:
"Missed starting window for {cronjob name}. Missed scheduled time to start a job {scheduledTime}"
It is also important to note that if the field startingDeadlineSeconds is not set, there is no deadline at all. The CronJob controller will then attempt to start the job without checking whether it is too late.
Therefore to answer the questions above:
1. If the startingDeadlineSeconds is set to 10 and the cronjob couldn't start for some reason at its scheduled time, then it can still be attempted to start again as long as those 10 seconds haven't passed, however, after the 10 seconds, it for sure won't be started, is this correct?
The CronJob controller will attempt to start the job, and it will be successfully scheduled if the 10 seconds after its scheduled time haven't passed yet. However, if the deadline has passed, it won't be started for this run, and it will be counted as a missed schedule in later executions.
2. If I have concurrencyPolicy set to Forbid, does K8s count it as a fail if a cronjob tries to be scheduled, when there is one already running?
Yes, it will be counted as a missed schedule, since missed schedules are calculated as described in the second step above.
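The decision steps described above can be condensed into a short Python sketch. It is only an illustration of the logic as summarized in this answer, not the controller's actual code; the function name, its parameters and the returned labels are hypothetical.

```python
from datetime import datetime, timedelta

MAX_MISSED = 100  # threshold quoted in the FailedNeedsStart event message


def cronjob_decision(now, scheduled_time, missed_count,
                     starting_deadline_seconds=None):
    """Decide whether a scheduled run should still be started.

    missed_count is assumed to be computed beforehand, over the window
    described above (since lastScheduleTime, or the last
    starting_deadline_seconds seconds when that field is set).
    """
    if missed_count > MAX_MISSED:
        return "FailedNeedsStart"
    if starting_deadline_seconds is not None:
        deadline = scheduled_time + timedelta(seconds=starting_deadline_seconds)
        if now > deadline:
            return "missed starting window"
    # No deadline set, or still inside the window: keep trying to start it.
    return "start"


if __name__ == "__main__":
    t0 = datetime(2024, 1, 1, 12, 0, 0)
    print(cronjob_decision(t0 + timedelta(seconds=5), t0, 0, 10))
    print(cronjob_decision(t0 + timedelta(seconds=11), t0, 1, 10))
```

With startingDeadlineSeconds = 10, a run that is 5 seconds late is still started, while one that is 11 seconds late is skipped for this cycle.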
We'd like to scale in some of the running instances on which many Kubernetes pods are running. So, we are going to gracefully stop the pods by using a grace period, according to the official document termination-of-pods. I have read many blog posts and the official documentation; they all explain how to gracefully terminate a pod with a grace period, but they do not say how to determine how long the grace period should be.
Let's say, for example, that a container in a pod serves thousands of requests in one time period and needs more than 30s to complete them all. I think in this case it would be a bad idea to set the grace period to 30s, because some of the requests would be lost. However, when the user load is down and the same container in the same pod serves only dozens of requests in another time period, it needs only 5s to complete them all; in this case a 30s grace period would be too long.
That's my consideration. So, my question is as follows.
1. Is there any best practice for determining how long the grace period should be?
2. Is there any approach to check if the processing request is completed in a container and then gracefully terminate pod?
3. Can I extend the initial graceful period after sending the termination command to a pod?
Thanks in advance.
The best way to determine the ideal grace period is through observability. Put your service under a realistic production load and measure. This is highly project specific!
If the process with PID 1 exits before the grace period ends, your container will be marked as Terminated before the end of the grace period, so it's worth setting a value slightly higher than what you would expect under normal circumstances.
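Because the container stops as soon as PID 1 exits, one way to approach question 2 is to have the process itself track in-flight requests and exit once they have drained after SIGTERM. The following is a minimal Python sketch under that assumption; the InFlightTracker class and install_sigterm_handler function are hypothetical names, and how requests get wrapped depends on your server framework.

```python
import signal
import threading


class InFlightTracker:
    """Count requests in progress and let shutdown wait until they finish."""

    def __init__(self):
        self._count = 0
        self._cond = threading.Condition()
        self.shutting_down = threading.Event()

    def __enter__(self):
        # Wrap each request in "with tracker:" to register it.
        with self._cond:
            self._count += 1
        return self

    def __exit__(self, *exc_info):
        with self._cond:
            self._count -= 1
            self._cond.notify_all()
        return False

    def wait_drained(self, timeout=None):
        """Return True once no request is in flight, False on timeout."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


def install_sigterm_handler(tracker):
    """On SIGTERM, only flag the shutdown; the main loop does the waiting."""
    def handle_sigterm(signum, frame):
        tracker.shutting_down.set()
    signal.signal(signal.SIGTERM, handle_sigterm)
```

The main loop would stop accepting new requests once shutting_down is set, call wait_drained() with a timeout a little below the grace period, and then exit. Under light load the pod then terminates in seconds, while under heavy load it uses as much of the grace period as it needs.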
You might be interested in letting your containers write arbitrary information when they terminate. Kubernetes has a feature called Termination messages you might want to look into.
I was wondering what would be the potential problem if I reduce the --update-period (whose default value is 1m0s) to about 5s (or even 1s)? I've watched a few video clips, and it seems the presenters implied that it's a bad idea to have a short period but did not explain why.
The reason why I want to make it shorter is that we sometimes prefer a fast and slightly risky transition to a safe and steady one. As far as I know, what rolling-update does is:
while the goal has not been achieved {
    scale-up the new version
    sleep as specified by --update-period
    scale-down the old one
    check deadline
}
From the above flow, I don't see any problem of not sleeping for a long time. Deadline checking is based on the timeout configuration, and so, it seems the only outcome of changing the --update-period would be iterating the loop more frequently.
One thing I have not fully understood is how scaling down is performed, but I assume that it still does graceful termination, such as sending SIGTERM and waiting for 30s until finally sending SIGKILL to the processes in the pod.
FYI, I'm using the Google Container Engine.
It does not have to be long; it is just a precaution in case a pod transitions to the Running state but crashes a couple of seconds later. If your update period is short, you may keep rolling out pods that turn out to be unstable, without giving the whole process enough time to notice.
If you're willing to take the risk it's totally fine to have a short update period.
Also, if you want true fast and reliable deployments you should check the Deployment API. The rolling update logic happens server side which increases the reliability and speed.
My problem description is as follows:
I have n state-based database crawlers that run indefinitely.
How it currently works:
We are using a single machine for crawling.
We have three levels of priority queue: High, Medium and Low.
At the start, all database jobs are put into the lowest-level queue.
A worker reads a job from the queue and performs the operation.
After finishing the job, it reschedules it with a delay of 5 minutes.
Solution I found
For the priority queue I can use:
http://zookeeper.apache.org/doc/r3.2.2/recipes.html#sc_recipes_priorityQueues
Problems I am still looking for a solution to:
1. How to reschedule a job in the queue with a future schedule time. Is there a way to do that in ZooKeeper?
2. Canceling an already started job. Suppose a user changes their database authentication details. I want to stop the already running job for that database and restart it with the new details. What I thought is that, when starting a job, the worker will subscribe to changes on that job's znode, and if something happens, it will stop the job and reschedule it.
3. Infinite queue. What I thought is that, after finishing a job, the worker will remove it from the queue and re-add it with a future schedule time. (The implementation depends on point 1.)
Is this the correct way of running such an infinite task?