I need a sidecar in a Kubernetes cluster which runs an initialization script and then terminates or sleeps forever. I can't do this in postStart, as postStart does not guarantee that the CMD/ENTRYPOINT in the main container has started. Any advice and insight is appreciated.
I use a Job to accomplish the task.
apiVersion: batch/v1
kind: Job
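To make that concrete, here is a minimal sketch of such a Job; the job name, the image, and the /init.sh script path are placeholders you would replace with your own:
apiVersion: batch/v1
kind: Job
metadata:
  name: init-job                 # illustrative name
spec:
  backoffLimit: 3                # retry the init a few times on failure
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: init
        image: your-registry/init-image:latest   # assumed image containing the script
        command: ["/bin/sh", "-c", "/init.sh"]   # assumed script path
The pod runs the script once and completes; if you need the "sleep forever" variant instead, you would append something like "sleep infinity" to the command, at the cost of the Job never reporting completion.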
I need to run this Kubernetes CronJob inside a container which is already running in my existing pod.
But this CronJob always creates a Pod and terminates it based on the schedule.
Is it possible to run the Kubernetes cron inside the existing pod's container?
Or, in an existing pod, can we run this Kubernetes cron as a container?
You can trigger a Kubernetes CronJob from within a pod, but it defeats the purpose of the CronJob resource.
There are alternatives for scheduling stuff. Airflow, as m_vemuri mentions, is one option. We've got a few cases here where we have actually set up a pod that runs crond, because we have a job that runs every minute, and the time it takes to scale up the pod, pull the image, run the job and then terminate the pod is often more than the one minute between runs.
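As a rough sketch of that crond approach (the image name is a placeholder and is assumed to ship crond plus a baked-in crontab):
apiVersion: apps/v1
kind: Deployment
metadata:
  name: crond-runner             # illustrative name
spec:
  replicas: 1
  selector:
    matchLabels:
      app: crond-runner
  template:
    metadata:
      labels:
        app: crond-runner
    spec:
      containers:
      - name: crond
        image: your-registry/worker-with-crond:latest   # assumed image with a baked-in crontab
        command: ["crond", "-f"]                        # run cron in the foreground so the container stays up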
Before I start all the services I need a pod that would do some initialization. But I don't want this pod to keep running after the init is done, and all the other pods/services should only start after this init is done. I am aware of init containers in a pod, but I don't think that would solve this issue, as I want the pod to exit after initialization.
It is recommended to let Kubernetes handle pods automatically instead of managing them manually yourself, when you can. Consider a Job for run-once tasks like this:
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name:
        image:
$ kubectl apply -f job.yml
Kubernetes will create a pod to run that job. The pod will complete as soon as the job exits. Note that the completed job and its pod stop consuming resources but are not removed completely, to give you a chance to examine their status and logs. Deleting the job will remove the pod completely.
Jobs can do advanced things like retrying on failure with exponential backoff, running tasks in parallel, and limiting how long they run.
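For reference, a sketch of where those knobs live on the Job spec (the values and the worker image are illustrative):
apiVersion: batch/v1
kind: Job
metadata:
  name: myjob
spec:
  backoffLimit: 4               # retries with exponential backoff before the Job is marked failed
  parallelism: 3                # run up to 3 pods at the same time
  completions: 3                # the Job is done after 3 successful completions
  activeDeadlineSeconds: 600    # kill the Job if it runs longer than 10 minutes
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker            # illustrative
        image: busybox
        command: ["sh", "-c", "echo working"]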
It depends on the init task, but an init container is the best option you have (https://kubernetes.io/docs/concepts/workloads/pods/init-containers/).
Kubernetes will run the init container before the other containers, and when it has accomplished its task, it will exit.
Make the init container exit gracefully (with code 0) so that k8s understands that the container completed the task it was meant to do and does not try to restart it.
Your init task will be done, and the container will no longer be running.
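A minimal sketch of what that looks like with the initContainers field (the image names and the echo command are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: app-with-init            # illustrative name
spec:
  initContainers:
  - name: init
    image: busybox
    command: ["sh", "-c", "echo running init task && exit 0"]  # must exit 0 for the main container to start
  containers:
  - name: app
    image: nginx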
You can try attaching handlers to Container lifecycle events. Kubernetes supports the postStart and preStop events. Kubernetes sends the postStart event immediately after a Container is started, and it sends the preStop event immediately before the Container is terminated.
https://kubernetes.io/docs/tasks/configure-pod-container/attach-handler-lifecycle-event/
https://kubernetes.io/docs/concepts/containers/container-lifecycle-hooks/
I need to scale a set of pods that run queue-based workers. Jobs for workers can run for a long time (hours) and should not get interrupted. The number of pods is based on the length of the worker queue. Scaling would be either using the horizontal autoscaler using custom metrics, or a simple controller that changes the number of replicas.
The problem with either solution is that, when scaling down, there is no control over which pod(s) get terminated. At any given time, most workers are likely working on short-running jobs, idle, or (more rarely) processing a long-running job. I'd like to avoid killing the long-running job workers; idle or short-running job workers can be terminated without issue.
What would be a way to do this with low complexity? One thing I can think of is to do this based on CPU usage of the pods. Not ideal, but it could be good enough. Another method could be that workers somehow expose a priority indicating whether they are the preferred pod to be deleted. This priority could change every time a worker picks up a new job though.
Eventually all jobs will be short running and this problem will go away, but that is a longer term goal for now.
Since version 1.22 there is a beta feature that helps you do this. You can add the annotation controller.kubernetes.io/pod-deletion-cost with a value in the range [-2147483647, 2147483647], and pods with a lower value will be killed first. The default is 0, so setting a negative value on one pod makes it the preferred victim during downscaling, e.g.
kubectl annotate pods my-pod-12345678-abcde controller.kubernetes.io/pod-deletion-cost=-1000
Link to discussion about the implementation of this feature: Scale down a deployment by removing specific pods (PodDeletionCost) #2255
Link to the documentation: ReplicaSet / Pod deletion cost
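If you prefer to see it declaratively, a sketch of the annotation in pod metadata (the pod name and image are illustrative):
apiVersion: v1
kind: Pod
metadata:
  name: my-worker                # illustrative name
  annotations:
    controller.kubernetes.io/pod-deletion-cost: "-1000"   # lower cost = preferred victim on scale-down
spec:
  containers:
  - name: worker
    image: your-registry/queue-worker:latest   # assumed image
In practice the annotation is usually patched per pod at runtime (as with the kubectl command above), since pods created from the same Deployment template would otherwise all share the same value.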
During the process of termination of a pod, Kubernetes sends a SIGTERM signal to the container of your pod. You can use that signal to gracefully shutdown your app. The problem is that Kubernetes does not wait forever for your application to finish and in your case your app may take a long time to exit.
In this case I recommend you use a preStop hook, which is completed before Kubernetes sends the KILL signal to the container. There is an example here on how to use handlers:
apiVersion: v1
kind: Pod
metadata:
  name: lifecycle-demo
spec:
  containers:
  - name: lifecycle-demo-container
    image: nginx
    lifecycle:
      postStart:
        exec:
          command: ["/bin/sh", "-c", "echo Hello from the postStart handler > /usr/share/message"]
      preStop:
        exec:
          command: ["/bin/sh","-c","nginx -s quit; while killall -0 nginx; do sleep 1; done"]
There is a kind of workaround that can give some control over the pod termination.
Not quite sure if it is the best practice, but at least you can try it and test whether it suits your app.
Increase the Deployment grace period with terminationGracePeriodSeconds: 3600, where 3600 is the time in seconds of the longest possible task in the app. This makes sure that the pods will not be forcibly terminated before the longest task has had a chance to finish. Read the docs about the pod termination process in detail.
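A sketch of where that field sits in the pod spec (the value and image are illustrative):
spec:
  terminationGracePeriodSeconds: 3600   # give the longest task up to an hour to finish
  containers:
  - name: worker
    image: your-registry/worker:latest  # assumed image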
Define a preStop handler. More details about lifecycle hooks can be found in docs as well as in the example. In my case, I've used the script below to create the file which will later be used as a trigger to terminate the pod (probably there are more elegant solutions).
lifecycle:
  preStop:
    exec:
      command: ["/bin/sh", "-c", "touch /home/node/app/preStop"]
Stop your app as soon as the condition is met. When the app exits, the pod terminates as well. It is not possible to end the PID 1 process from the preStop shell script, so you need to add some logic to the app to terminate itself. In my case, it is a NodeJS app with a scheduler that runs every 30 seconds and checks whether two conditions are met: !isNodeBusy identifies whether it is allowed to finish the app, and fs.existsSync('/home/node/app/preStop') whether the preStop hook was triggered. It might be different logic for your app, but you get the basic idea.
schedule.scheduleJob('*/30 * * * * *', () => {
  if (!isNodeBusy && fs.existsSync('/home/node/app/preStop')) {
    process.exit();
  }
});
Keep in mind that this workaround works only with voluntary disruptions and obviously not helpful with involuntary disruptions. More info in docs.
I think running this type of workload using a Deployment or similar, and using a HorizontalPodAutoscaler for scaling, is the wrong way to go. One way you could go about this is to:
Define a controller (this could perhaps be a Deployment) whose task is to periodically create a Kubernetes Job object.
The spec of the Job should contain a value for .spec.parallelism equal to the maximum number of concurrent executions you will accept.
The Pods spawned by the Job then run your processing logic. They should each pull a message from the queue, process it, and then delete it from the queue (in the case of success).
The Pods spawned by the Job must exit with the correct status (success or failure). This ensures that the Job recognises when the processing has completed, and so will not spin up additional Pods.
Using this method, .spec.parallelism controls the autoscaling based on how much work there is to be done, and scale-down is an automatic benefit of using a Job.
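A sketch of what such a Job might look like (the parallelism value and worker image are placeholders):
apiVersion: batch/v1
kind: Job
metadata:
  name: queue-drain              # illustrative name
spec:
  parallelism: 5                 # maximum number of concurrent worker pods
  template:
    spec:
      restartPolicy: OnFailure
      containers:
      - name: worker
        image: your-registry/queue-worker:latest   # assumed worker that pulls, processes and deletes queue messages
Leaving .spec.completions unset gives the work-queue pattern: the Job is considered complete once any pod exits successfully and the others have finished.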
You are looking for Pod Priority and Preemption. By configuring a high priority PriorityClass for your pods you can ensure that they won't be removed to make space for other pods with a lower priority.
Create a new PriorityClass
apiVersion: scheduling.k8s.io/v1
kind: PriorityClass
metadata:
  name: high-priority
value: 1000000
globalDefault: false
description: "This priority class will not cause other pods to be preempted."
Set your new PriorityClass in your pods
priorityClassName: high-priority
The value: 1000000 in the PriorityClass configures the scheduling priority of the pod. The higher the value the more important the pod is.
For those who land on this page facing the issue of Pods getting killed while a node scales down:
This is expected behaviour of the Cluster Autoscaler, as the CA tries to pack pods so that it can run the cluster at its minimum size.
However, you can protect your Job pods from eviction (getting killed) by creating a PodDisruptionBudget with maxUnavailable=0 for them.
apiVersion: policy/v1
kind: PodDisruptionBudget
metadata:
  name: sample-pdb
spec:
  maxUnavailable: 0
  selector:
    matchLabels:
      app: <your_app_name>
I have a container that fetches some data from a MySQL database and simply displays the result with console.log(), and I want to run this as a cron job in GKE. So far I have the container working on my local machine, and have successfully deployed it to GKE (in terms of there being no errors thrown, so far as I can see).
However, the pods that were created were just left as Running instead of stopping after completion of the task. Are the pods supposed to stop automatically after executing all the code, or do they require an explicit instruction to stop, and if so, what is the command to terminate a pod after creation (by the CronJob)?
I'm reading that there is supposedly some kind of termination grace period of ~30s by default, but after running a minutely-executed cronjob for ~20 minutes, all the pods were still running. Not sure if there's a way to terminate the pods from inside the code, otherwise it would be a little silly to have a cronjob generating lots of pods left running idly. My cronjob.yaml below:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: test
spec:
  schedule: "5 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: test
            image: gcr.io/project/test:v1
            # env:
            # - name: "DELAY"
            #   value: 15
          restartPolicy: OnFailure
A CronJob is essentially a cookie cutter for jobs. That is, it knows how to create jobs and execute them at a certain time. Now, that being said, when looking at garbage collection and clean up behaviour of a CronJob, we can simply look at what the Kubernetes docs have to say about this topic in the context of jobs:
When a Job completes, no more Pods are created, but the Pods are not deleted either. Keeping them around allows you to still view the logs of completed pods to check for errors, warnings, or other diagnostic output. The job object also remains after it is completed so that you can view its status. It is up to the user to delete old jobs after noting their status. Delete the job with kubectl (e.g. kubectl delete jobs/pi or kubectl delete -f ./job.yaml).
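If you would rather have Kubernetes clean up finished Jobs automatically, a sketch of the relevant CronJob fields (the values are illustrative, and this assumes a cluster recent enough to support batch/v1 CronJobs and the TTL controller):
apiVersion: batch/v1
kind: CronJob
metadata:
  name: test
spec:
  schedule: "5 * * * *"
  successfulJobsHistoryLimit: 3       # keep at most 3 completed Jobs around
  failedJobsHistoryLimit: 1           # keep at most 1 failed Job around
  jobTemplate:
    spec:
      ttlSecondsAfterFinished: 600    # delete finished Jobs (and their pods) after 10 minutes
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: test
            image: gcr.io/project/test:v1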
Adding a process.kill(); line in the code to explicitly end the process after the work has finished executing allowed the pod to stop automatically after execution.
A job in Kubernetes is intended to run a single instance of a pod and ensure it runs to completion. As another answer specifies, a CronJob is a factory for Jobs which knows how and when to spawn a job according to the specified schedule.
Accordingly, and unlike a service which is intended to run forever, the container(s) in the pod created by the job must exit upon completion of the job. There is a notable problem with the sidecar pattern, which often requires manual pod lifecycle handling: if your main container requires additional containers to provide logging or database access, you must arrange for these to exit upon completion of the main container, otherwise they will remain running and k8s will not consider the job complete. In such circumstances, the pod associated with the job will never terminate.
The termination grace period is not applicable here: this timer applies after Kubernetes has requested that your pod terminate (e.g. if you delete it). It specifies the maximum time the pod is afforded to shutdown gracefully before the kubelet will summarily terminate it. If Kubernetes never considers your job to be complete, this phase of the pod lifecycle will not be entered.
Furthermore, old pods are kept around after completion for some time to allow perusal of logs and such. You may see pods listed which are not actively running and so not consuming compute resources on your worker nodes.
If your pods are not completing, please provide more information regarding the code they are running so we can assist in determining why the process never exits.
I'm running Kubernetes in a GKE cluster and need to run a DB migration script on every deploy. For staging this is easy: we have a permanent, separate MySQL service with its own volume. For production, however, we make use of Cloud SQL, resulting in the job having two containers - one for the migration, and the other for the Cloud SQL Proxy.
Because of this extra container, the job always shows as 1 active when running kubectl describe jobs/migration, and I'm at a complete loss. I have tried re-ordering the containers to see if it checks one by default, but that made no difference, and I cannot see a way to either a) kill a container or b) check the status of just one container inside the Job.
Any ideas?
I know it's a year too late, but best practice would be to run a single cloudsql proxy service for all the app's purposes, and then configure DB access in the app's image to use this service as the DB hostname.
This way you will not need to put a cloudsql proxy container into every pod which uses the DB.
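A rough sketch of that setup; the image tag and the instance connection name are placeholders you would fill in for your project:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsql-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsql-proxy
  template:
    metadata:
      labels:
        app: cloudsql-proxy
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.33.2   # assumed proxy image/tag
        command: ["/cloud_sql_proxy",
                  "-instances=<PROJECT>:<REGION>:<INSTANCE>=tcp:0.0.0.0:3306"]  # placeholder connection name
---
apiVersion: v1
kind: Service
metadata:
  name: cloudsql-proxy           # apps then use this name as the DB hostname
spec:
  selector:
    app: cloudsql-proxy
  ports:
  - port: 3306
    targetPort: 3306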
Each Pod can be configured with an init container, which seems to be a good fit for your issue. So instead of having a Pod with two containers which have to run permanently, you could rather define an init container to do your migration upfront, e.g. like this:
apiVersion: v1
kind: Pod
metadata:
  name: init-container
  annotations:
    pod.beta.kubernetes.io/init-containers: '[
      {
        "name": "migrate",
        "image": "application:version",
        "command": ["migrate up"]
      }
    ]'
spec:
  containers:
  - name: application
    image: application:version
    ports:
    - containerPort: 80
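On newer clusters the annotation syntax above has been replaced by the first-class initContainers field; a sketch of the equivalent, assuming the migration command splits into "migrate" and "up":
apiVersion: v1
kind: Pod
metadata:
  name: init-container
spec:
  initContainers:
  - name: migrate
    image: application:version
    command: ["migrate", "up"]   # assumed split of the original "migrate up"
  containers:
  - name: application
    image: application:version
    ports:
    - containerPort: 80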
You haven't posted enough details about your specific problem. But I'm taking a guess based on experience.
TL;DR: Move your containers into separate jobs if they are independent.
--
Kubernetes jobs keep restarting till the job succeeds.
A kubernetes job will succeed only if every container within succeeds.
This means that your containers should return in a restart-proof way. Once a container successfully runs, it should return success even if it runs again. Otherwise, say container1 succeeds and container2 fails. The Job restarts. Then container1 fails (because its work has already been done). Hence, the Job keeps restarting.
The reason is the container/process never terminates.
One possible workaround is to move the cloud-sql-proxy to its own deployment - and add a service in front of it. Then your job won't be responsible for running the long-running cloud-sql-proxy, and hence will terminate/complete.