Kubernetes CronJobs - not start at the top of the minute

I have a couple (at some point many) k8s cron jobs, each with schedules like
*/5 * * * * # Every five minutes
*/1 * * * * # Every minute
*/1 * * * *
*/2 * * * * # Every two minutes
...
My problem is that k8s seems to start them all at the top of the minute, so there might be a large number of jobs running at the same time. Each job only takes <10 seconds to run, so ideally, I would like to be able to distribute them over the span of a minute, i.e. delay their start.
Any ideas how to do that, given that k8s does not support second-based schedule expressions?

You can't specify seconds when starting a job.
What you can do is delay each job with sleep.
The pods of the CronJobs would still start together, but you can distribute the load of the jobs this way.
apiVersion: batch/v1
kind: CronJob
metadata:
  name: my-minutely-cronjob
spec:
  schedule: "1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: mycronjob
            image: myjobImage
            imagePullPolicy: IfNotPresent
            command: ["/bin/sh", "-c"]
            args:
            - sleep 10;
              mycronjobStartCommand;
          restartPolicy: OnFailure
Hope this can help you.

Related

how to run a cronjob every 10 seconds in kubernetes?

"I just want to run a cronjob in Kubernetes in every 10 seconds. what would be the imperative command for that?"
You can't use the Kubernetes CronJob object for schedules shorter than 1 minute. You might be using the wrong tool for a process that has to run so often. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Create an infinite loop on a Deployment (daemonize it)
You'll need to use a bash script (or whatever programming language you like best: Go, Java, Python or Ruby) to create an infinite loop that sleeps 10 seconds between executions, running inside a Deployment. Here is an example with bash/sh:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cronjob-deployment
  labels:
    app: cronjob
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cronjob
  template:
    metadata:
      labels:
        app: cronjob
    spec:
      containers:
      - name: cronjob
        image: busybox
        args:
        - /bin/sh
        - -c
        - while true; do echo call ./script.sh here; sleep 10; done
Create 1 CronJob with several containers
If you still want to use CronJobs you can do it with 6 containers inside the definition. One without delay, and the others with 10, 20, 30, 40 and 50 seconds of delay.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: no-delay
            image: busybox
            args:
            - /bin/sh
            - -c
            - echo call ./script.sh here
          - name: delay-10s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 10; echo call ./script.sh here
          - name: delay-20s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 20; echo call ./script.sh here
          - name: delay-30s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 30; echo call ./script.sh here
          - name: delay-40s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 40; echo call ./script.sh here
          - name: delay-50s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 50; echo call ./script.sh here
          restartPolicy: OnFailure
Of course, one of the problems you might encounter is that your processes might overlap (run concurrently). This will depend on how many seconds your process needs to run, and on the time Kubernetes needs to schedule and create a container.
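If overlap must be avoided entirely, one option is the CronJob's concurrencyPolicy field. A minimal sketch (the name and the script call are placeholders) that skips a new run while the previous Job is still active:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid   # skip the next run if the previous Job is still running
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args: ["/bin/sh", "-c", "echo call ./script.sh here"]
          restartPolicy: OnFailure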
If your task needs to run that frequently, cron is the wrong tool.
Aside from the fact that it simply won't launch jobs that frequently, you also risk some serious problems if the job takes longer to run than the interval between launches. Rewrite your task to daemonize and run persistently, then launch it from cron if necessary (while making sure that it won't relaunch if it's already running).
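For the "don't relaunch if it's already running" part, a minimal sketch, assuming the util-linux flock utility is available in the image (the lock file path and the mytask command are placeholders):
#!/bin/sh
# flock -n fails immediately if another instance already holds the lock,
# so the task is never started twice.
exec flock -n /tmp/mytask.lock ./mytask --run-forever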
You can write a script that executes 6 times with an interval of 10 seconds,
and set the Kubernetes cron job to run every minute.
That way, every minute your script starts running, which in turn executes the task every 10 seconds.
Here is a script that runs the logic every 10 seconds, 6 times, each time the cron job fires.
It will print "hello world" every 10 seconds, 6 times:
#!/bin/bash -x
a=0
until [ $a -gt 5 ]
do
  echo "hello world"
  a=$(expr $a + 1)
  sleep 10
done
CronJob sample:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image:
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - ./sample.sh
          restartPolicy: OnFailure
So in that way your cron job executes every minute, which in turn starts your script, which runs every 10 seconds and executes the business logic 6 times.
This is the idea you can follow to make a cron job effectively work at second granularity, as Kubernetes does not support scheduling below 1 minute.
With this approach you also need a strategy for not overlapping the next execution of the cron job.
For example, suppose your business logic takes 15 seconds to execute and you are running it every 10 seconds, 6 times a minute.
Since the business logic takes 15 seconds, it should ideally run only 4 times per minute rather than 6. Accordingly, you need to tweak the number of repetitions inside the script.
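One possible tweak, as a sketch, is to bound the loop by elapsed time instead of a fixed count, so the script stops starting new iterations shortly before the next cron job fires (the 55-second budget and the echo are placeholders):
#!/bin/bash
start=$(date +%s)
# keep running the task until ~55 seconds have elapsed,
# leaving headroom before the next scheduled run
while [ $(( $(date +%s) - start )) -lt 55 ]
do
  echo "hello world"   # business logic goes here
  sleep 10
done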

Handling cronjobs in a Pod with multiple containers

I have a requirement in which I need to create a cronjob in Kubernetes, but the pod has multiple containers (with a single container it works fine).
Is it possible?
The requirement is something like this:
1. First container: Run the shell script to do a job.
2. Second container: run fluentbit conf to parse the log and send it.
Previously I had a deployment in place and that was working fine, but since that deployment was only used for 10-minute jobs I thought to make it a cron job.
Any help is really appreciated.
Also, regarding the cronjob, I am not sure whether a pod can support multiple containers to do the same.
Thank you,
Sunny
Yes, you can create a CronJob with multiple containers. A CronJob is an abstraction on top of a Pod, so in the pod spec you can have multiple containers just like in a normal Pod. As an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          - name: app
            image: alpine
            command:
            - echo
            - Hello World!
          restartPolicy: OnFailure
I agree with the answer provided by @Arghya Sadhu. It shows how you can run a multi-container Pod with a CronJob. Before getting to the answer, I would like to give more attention to the comment provided by @Chris Stryczynski:
It's not clear whether the containers are run in parallel or sequentially
It is not entirely clear if the workload that you are trying to run:
The requirement is something like this:
First container: Run the shell script to do a job.
Second container: run fluentbit conf to parse the log and send it.
can run in parallel (both running at the same time) or requires a sequential approach (after X completes successfully, run Y).
If the workloads can run in parallel, the answer provided by @Arghya Sadhu is correct; however, if one workload depends on another, I reckon you should be using initContainers instead of a multi-container Pod.
An example of a CronJob that uses an initContainer could be the following:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: ubuntu
            image: ubuntu
            command: [/bin/bash]
            args: ["-c","cat /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: /data
          initContainers:
          - name: echo
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "echo 'General Kenobi!' > /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: "/data"
          volumes:
          - name: data-dir
            emptyDir: {}
This CronJob writes specific text to a file in an initContainer, and then the "main" container displays the result. It's worth mentioning that the main container will not start if the initContainer does not complete its operations successfully.
$ kubectl logs hello-1234567890-abcde
General Kenobi!
Additional resources:
Linchpiner.github.io: K8S multi container pods
What about a sidecar container for logging as the second container, one that keeps running and never exits? Even though the job itself may run fine, the Job's state can then still end up as failed.
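One common workaround for that, as a sketch (it assumes the two containers can share an emptyDir volume; /shared/done and ./script.sh are placeholders), is to have the main container drop a sentinel file when it finishes and let the logging sidecar exit once it sees it, so the Job can complete:
containers:
- name: main
  image: busybox
  command: ["/bin/sh", "-c"]
  args:
  - ./script.sh; touch /shared/done   # signal the sidecar that the work is finished
  volumeMounts:
  - name: shared
    mountPath: /shared
- name: log-sidecar
  image: busybox
  command: ["/bin/sh", "-c"]
  args:
  - while [ ! -f /shared/done ]; do sleep 1; done   # ship logs here, then exit once signalled
  volumeMounts:
  - name: shared
    mountPath: /shared
volumes:
- name: shared
  emptyDir: {}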

K8S - Use "Last Schedule" of a CronJob as a parameters in container command

I created a Cronjob that is scheduled for every morning, from Monday to Friday.
schedule: "30 7 * * 1-5"
The job synchronizes a database with a data warehouse. As I do not want to sync the whole database every day, I want to be able to pass two parameters in the container's command section of my CronJob: the last time the job started, and now:
containers:
- image: xxx
name: dwh
command: ["/bin/bash", "-c", "rake etl:sync LAST_TIME_SCHEDULED NOW"]
resources: {}
How can I get the "Last Schedule" timestamp of the CronJob in this situation, and the timestamp of now ?
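One possible sketch: the "now" timestamp can be produced with shell command substitution when the container starts (the next question below covers why a shell is needed), and the "last run" timestamp can be persisted between runs, for example on a small PersistentVolume. The /state path and PVC name below are placeholders, and the rake invocation just mirrors the shape from the question:
containers:
- image: xxx
  name: dwh
  command: ["/bin/bash", "-c"]
  args:
  - |
    NOW=$(date -u +"%Y-%m-%dT%H:%M:%SZ")
    LAST=$(cat /state/last_run 2>/dev/null || echo "1970-01-01T00:00:00Z")
    rake etl:sync "$LAST" "$NOW" && echo "$NOW" > /state/last_run
  volumeMounts:
  - name: state
    mountPath: /state
volumes:
- name: state
  persistentVolumeClaim:
    claimName: dwh-sync-state   # placeholder PVC holding only the last_run timestamp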

Pass in Dynamic Formatted Datetime to K8s Container Config

I have a CronJob that runs a process in a container in Kubernetes.
This process takes in a time window that is defined by a --since and --until flag. This time window needs to be defined at container start time (when the cron is triggered) and is a function of the current time. An example running this process would be:
$ my-process --since=$(date -v -1H +"%Y-%m-%dT%H:%M:%SZ") --until=$(date -v +1H +"%Y-%m-%dT%H:%M:%SZ")
So for the example above, I would like the time window to be from 1 hour ago to 1 hour in the future. Is there a way in Kubernetes to pass in a formatted datetime as a command argument to a process?
An example of what I am trying to do would be the following config:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: my-process
spec:
schedule: "*/2 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: my-process
image: my-image
args:
- my-process
- --since=$(date -v -1H +"%Y-%m-%dT%H:%M:%SZ")
- --until=$(date -v +1H +"%Y-%m-%dT%H:%M:%SZ")
When doing this, the literal string "$(date -v -1H +"%Y-%m-%dT%H:%M:%SZ")" would be passed in as the --since flag.
Is something like this possible? If so, how would I do it?
Note that in your CronJob you don't run bash or any other shell, and command substitution is a shell feature that will not work without one. In your example only one command, my-process, is started in the container, and as it is not a shell, it cannot perform command substitution.
This one:
$ my-process --since=$(date -v -1H +"%Y-%m-%dT%H:%M:%SZ") --until=$(date -v +1H +"%Y-%m-%dT%H:%M:%SZ")
will work properly because it is started in a shell, so it can take advantage of shell features such as the aforementioned command substitution.
One thing: date -v -1H +"%Y-%m-%dT%H:%M:%SZ" doesn't expand properly in a bash shell with the default GNU/Linux date implementation. Among other things, the -v option is not recognized, so I guess you're using it on macOS or some kind of BSD system. In my examples below I will use a date invocation that works on Debian.
So for testing it on GNU/Linux it will be something like this:
date --date='-1 hour' +"%Y-%m-%dT%H:%M:%SZ"
For testing purposes I tried it with the simple CronJob from this example, with some modifications:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: debian
            env:
            - name: FROM
              value: $(date --date="-1 hour" +"%Y-%m-%dT%H:%M:%SZ")
            - name: TILL
              value: $(date --date="+1 hour" +"%Y-%m-%dT%H:%M:%SZ")
            args:
            - /bin/sh
            - -c
            - date; echo from $(FROM) till $(TILL)
          restartPolicy: OnFailure
It works properly. Below you can see the result of CronJob execution:
$ kubectl logs hello-1569947100-xmglq
Tue Oct 1 16:25:11 UTC 2019
from 2019-10-01T15:25:11Z till 2019-10-01T17:25:11Z
Apart from the example using environment variables, I tested it with the following code:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: debian
            args:
            - /bin/sh
            - -c
            - date; echo from $(date --date="-1 hour" +"%Y-%m-%dT%H:%M:%SZ") till $(date --date="+1 hour" +"%Y-%m-%dT%H:%M:%SZ")
          restartPolicy: OnFailure
and as you can see here command substitution also works properly:
$ kubectl logs hello-1569949680-fk782
Tue Oct 1 17:08:09 UTC 2019
from 2019-10-01T16:08:09Z till 2019-10-01T18:08:09Z
It works properly because in both examples we first spawn a shell in our container, and it subsequently runs the other commands, such as the simple echo, provided as its argument. You can use your my-process command instead of echo; you'll just need to provide it on one line with all its arguments, like this:
args:
- /bin/sh
- -c
- my-process --since=$(date -v -1H +"%Y-%m-%dT%H:%M:%SZ") --until=$(date -v +1H +"%Y-%m-%dT%H:%M:%SZ")
This example will not work, as there is no shell involved. The echo command, not being a shell, is unable to perform command substitution, which is a shell feature:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: debian
            args:
            - /bin/echo
            - from $(date --date="-1 hour" +"%Y-%m-%dT%H:%M:%SZ") till $(date --date="+1 hour" +"%Y-%m-%dT%H:%M:%SZ")
          restartPolicy: OnFailure
and the results will be a literal string:
$ kubectl logs hello-1569951180-fvghz
from $(date --date="-1 hour" +"%Y-%m-%dT%H:%M:%SZ") till $(date --date="+1 hour" +"%Y-%m-%dT%H:%M:%SZ")
which is similar to your case: your command, like echo, isn't a shell and cannot perform command substitution.
To sum up: the solution is to wrap your command as a shell argument. In the first two examples the echo command is passed, along with the other commands, as a shell argument.
Maybe this is more clearly visible in the following example, which uses slightly different syntax:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: debian
            command: ["/bin/sh","-c"]
            args: ["FROM=$(date --date='-1 hour' +'%Y-%m-%dT%H:%M:%SZ'); TILL=$(date --date='+1 hour' +'%Y-%m-%dT%H:%M:%SZ') ;echo from $FROM till $TILL"]
          restartPolicy: OnFailure
man bash says:
-c If the -c option is present, then commands are read from the first non-option argument command_string.
so command: ["/bin/sh","-c"] basically means: run a shell and execute the commands that we then pass to it using args. In bash, commands should be separated with a semicolon ; so that they run independently (a subsequent command is executed no matter what the result of the previous command was).
In the following fragment:
args: ["FROM=$(date --date='-1 hour' +'%Y-%m-%dT%H:%M:%SZ'); TILL=$(date --date='+1 hour' +'%Y-%m-%dT%H:%M:%SZ') ;echo from $FROM till $TILL"]
we provide to /bin/sh -c three separate commands:
FROM=$(date --date='-1 hour' +'%Y-%m-%dT%H:%M:%SZ')
which sets the FROM variable to the result of executing the date --date='-1 hour' +'%Y-%m-%dT%H:%M:%SZ' command,
TILL=$(date --date='+1 hour' +'%Y-%m-%dT%H:%M:%SZ')
which sets the TILL variable to the result of executing the date --date='+1 hour' +'%Y-%m-%dT%H:%M:%SZ' command
and finally we run
echo from $FROM till $TILL
which uses both variables.
Exactly the same can be done with any other command.
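Applied to the original my-process command (keeping the asker's BSD-style date flags), the same pattern would look roughly like this sketch:
command: ["/bin/sh", "-c"]
args: ["SINCE=$(date -v -1H +'%Y-%m-%dT%H:%M:%SZ'); UNTIL=$(date -v +1H +'%Y-%m-%dT%H:%M:%SZ'); my-process --since=$SINCE --until=$UNTIL"]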

Kubernetes Cronjob Only Runs Half the Time

I want a job to trigger every 15 minutes but it is consistently triggering every 30 minutes.
UPDATE:
I've simplified the problem by just running:
kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"
As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
and yet the job still refuses to run on time.
$ kubectl get cronjobs
NAME     SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello    */1 * * * *   False     1        5m              30m
hello2   */1 * * * *   False     1        5m              12m
It took 25 minutes for the cronjob created from the command line to run and 7 minutes for the cronjob created from YAML. They were both finally scheduled at the same time, so it's almost like etcd finally woke up and did something?
ORIGINAL ISSUE:
When I drill into an active job I see Status: Terminated: Completed but
Age: 25 minutes, or something greater than 15.
In the logs I see that the Python script meant to run has completed its final print statement. The script takes about ~2 min to complete, based on its output file in S3. Then no new job is scheduled for 28 more minutes.
I have tried with different configurations:
Schedule: */15 * * * * AND Schedule: 0,15,30,45 * * * *
As well as
Concurrency Policy: Forbid AND Concurrency Policy: Replace
What else could be going wrong here?
Full config with identifying lines modified:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    type: f-c
  name: f-c-p
  namespace: extract
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            type: f-c
        spec:
          containers:
          - args:
            - /f_c.sh
            image: identifier.amazonaws.com/extract_transform:latest
            imagePullPolicy: Always
            env:
            - name: ENV
              value: prod
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  key: slack_token
                  name: api-tokens
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: aws_access_key_id
                  name: api-tokens
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: aws_secret_access_key
                  name: api-tokens
            - name: F_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  key: f_access_token
                  name: api-tokens
            name: s-f-c
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 1
  suspend: false
status: {}
After running these jobs in a test cluster I discovered that external circumstances prevented them from running as intended.
On the original cluster there were ~20k scheduled jobs. The built-in scheduler for Kubernetes is not yet capable of handling this volume consistently.
The maximum number of jobs that can be reliably run (within a minute of the time intended) may depend on the size of your master nodes.
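If you stay on such a cluster, two CronJob fields worth checking are concurrencyPolicy and startingDeadlineSeconds; the latter controls how long after the scheduled time a missed run may still be started. A sketch with placeholder values, reusing the names from the config above:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: f-c-p
spec:
  schedule: '*/15 * * * *'
  concurrencyPolicy: Forbid
  startingDeadlineSeconds: 300   # a run the controller missed may still start up to 5 minutes late
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: s-f-c
            image: identifier.amazonaws.com/extract_transform:latest
            args: ["/f_c.sh"]
          restartPolicy: Never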
Isn't that by design?
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Ref. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations