How to use concurrencyPolicy for a GKE cron job correctly?

I set concurrencyPolicy to Allow; here is my cronjob.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: gke-cron-job
spec:
  schedule: '*/1 * * * *'
  startingDeadlineSeconds: 10
  concurrencyPolicy: Allow
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            run: gke-cron-job
        spec:
          restartPolicy: Never
          containers:
          - name: gke-cron-job-solution-2
            image: docker.io/novaline/gke-cron-job-solution-2:1.3
            env:
            - name: NODE_ENV
              value: 'production'
            - name: EMAIL_TO
              value: 'novaline.dulin@gmail.com'
            - name: K8S_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            ports:
            - containerPort: 8080
              protocol: TCP
After reading the docs: https://cloud.google.com/kubernetes-engine/docs/how-to/cronjobs
I still don't understand how to use concurrencyPolicy.
How can I run my cron jobs concurrently?
Here are the logs of the cron job:
☁ nodejs-gcp [master] ⚡ kubectl logs -l run=gke-cron-job
> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js

config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660540-gmwvc',
  VERSION: '1.0.2' }
[2019-01-28T07:29:10.593Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }

> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js

config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660600-wbl5g',
  VERSION: '1.0.2' }
[2019-01-28T07:30:11.405Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }

> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js

config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660660-8mn4r',
  VERSION: '1.0.2' }
[2019-01-28T07:31:11.099Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }
As you can see from the timestamps, the jobs are not running concurrently.

It's because you're reading the wrong documentation. CronJobs aren't a GKE-specific feature. For the full documentation on the CronJob API, refer to the Kubernetes documentation: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy (quoted below).
Concurrency policy decides whether a new Job can be started while a previous Job created by the same CronJob is still running. If you have a CronJob that runs every 5 minutes, and the Job sometimes takes 8 minutes, then you may run into a case where multiple Jobs are running at a time. This policy decides what to do in that case. A short example that actually produces overlap follows the quoted policy list.
Concurrency Policy
The .spec.concurrencyPolicy field is also optional. It specifies how to treat concurrent executions of a job that is created by this cron job. The spec may specify only one of the following concurrency policies:
Allow (default): The cron job allows concurrently running jobs
Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
Note that concurrency policy only applies to the jobs created by the same cron job. If there are multiple cron jobs, their respective jobs are always allowed to run concurrently.
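In your case each run finishes within a few seconds, so a one-minute schedule never overlaps and Allow has nothing visible to do. Below is a minimal sketch (hypothetical name, busybox image, not from your setup) where each run deliberately outlasts the schedule interval, so concurrent Jobs actually appear:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: overlap-demo            # hypothetical name
spec:
  schedule: '*/1 * * * *'
  concurrencyPolicy: Allow      # the default, shown here for clarity
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: long-task
            image: busybox
            args:
            - /bin/sh
            - -c
            - echo started; sleep 90; echo finished   # runs longer than the 1-minute schedule
After a couple of minutes, kubectl get jobs should list more than one Job active at the same time; switching the policy to Forbid or Replace changes that behaviour as described above.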


Ensure pod deletion when one of the containers terminates [duplicate]

I have a Kubernetes JOB that does database migrations on a CloudSQL database.
One way to access the CloudSQL database from GKE is to use the CloudSQL-proxy container and then connect via localhost. Great - that's working so far. But because I'm doing this inside a K8s JOB the job is not marked as successfully finished because the proxy keeps on running.
$ kubectl get po
NAME                   READY   STATUS      RESTARTS   AGE
db-migrations-c1a547   1/2     Completed   0          1m
Even though the output says 'Completed', one of the initially two containers is still running: the proxy.
How can I make the proxy exit on completing the migrations inside container 1?
The best way I have found is to share the process namespace between containers and use the SYS_PTRACE securityContext capability to allow you to kill the sidecar.
apiVersion: batch/v1
kind: Job
metadata:
  name: my-db-job
spec:
  template:
    spec:
      restartPolicy: OnFailure
      shareProcessNamespace: true
      containers:
      - name: my-db-job-migrations
        command: ["/bin/sh", "-c"]
        args:
        - |
          <your migration commands>;
          sql_proxy_pid=$(pgrep cloud_sql_proxy) && kill -INT $sql_proxy_pid;
        securityContext:
          capabilities:
            add:
            - SYS_PTRACE
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command:
        - "/cloud_sql_proxy"
        args:
        - "-instances=$(DB_CONNECTION_NAME)=tcp:5432"
One possible solution would be a separate cloudsql-proxy deployment with a matching service. You would then only need your migration container inside the job that connects to your proxy service.
This comes with some downsides:
higher network latency, no pod-local MySQL communication
possible security issue if you expose the SQL port to your whole Kubernetes cluster
If you want to open cloudsql-proxy to the whole cluster, you have to replace tcp:3306 with tcp:0.0.0.0:3306 in the -instances parameter on the cloudsql-proxy.
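A rough sketch of that approach, reusing the gce-proxy image and credentials Secret shown elsewhere in this thread (the names and the instance connection string are illustrative, not a definitive implementation), with the 0.0.0.0 bind so other pods can reach it:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cloudsql-proxy
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cloudsql-proxy
  template:
    metadata:
      labels:
        app: cloudsql-proxy
    spec:
      containers:
      - name: cloudsql-proxy
        image: gcr.io/cloudsql-docker/gce-proxy:1.17
        command: ["/cloud_sql_proxy"]
        args:
        - "-instances=<INSTANCE_CONNECTION_NAME>=tcp:0.0.0.0:3306"   # placeholder connection name
        - "-credential_file=/secrets/cloudsql/credentials.json"
        volumeMounts:
        - name: cloudsql-instance-credentials
          mountPath: /secrets/cloudsql
          readOnly: true
      volumes:
      - name: cloudsql-instance-credentials
        secret:
          secretName: cloudsql-instance-credentials
---
apiVersion: v1
kind: Service
metadata:
  name: cloudsql-proxy
spec:
  selector:
    app: cloudsql-proxy
  ports:
  - port: 3306
    targetPort: 3306
The migration Job then points its database host at cloudsql-proxy:3306 (the Service name) and, since it has no sidecar, it is marked finished as soon as the migration container exits.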
There are 3 ways of doing this.
1- Use private IP to connect your K8s job to Cloud SQL, as described by @newoxo in one of the answers. To do that, your cluster needs to be a VPC-native cluster. Mine wasn't and I was not willing to move all my stuff to a new cluster. So I wasn't able to do this.
2- Put the Cloud SQL Proxy container in a separate deployment with a service, as described by @Christian Kohler. This looks like a good approach, but it is not recommended by Google Cloud Support.
I was about to head in this direction (solution #2) but I decided to try something else.
And here is the solution that worked for me:
3- You can communicate between different containers in the same Pod/Job using the file system. The idea is to tell the Cloud SQL Proxy container when the main job is done, and then kill the cloud sql proxy. Here is how to do it:
In the yaml file (my-job.yaml)
apiVersion: v1
kind: Pod
metadata:
  name: my-job-pod
  labels:
    app: my-job-app
spec:
  restartPolicy: OnFailure
  containers:
  - name: my-job-app-container
    image: my-job-image:0.1
    command: ["/bin/bash", "-c"]
    args:
    - |
      trap "touch /lifecycle/main-terminated" EXIT
      { your job commands here }
    volumeMounts:
    - name: lifecycle
      mountPath: /lifecycle
  - name: cloudsql-proxy-container
    image: gcr.io/cloudsql-docker/gce-proxy:1.11
    command: ["/bin/sh", "-c"]
    args:
    - |
      /cloud_sql_proxy -instances={ your instance name }=tcp:3306 -credential_file=/secrets/cloudsql/credentials.json &
      PID=$!
      while true
      do
        if [ -f "/lifecycle/main-terminated" ]
        then
          kill $PID
          exit 0
        fi
        sleep 1
      done
    securityContext:
      runAsUser: 2  # non-root user
      allowPrivilegeEscalation: false
    volumeMounts:
    - name: cloudsql-instance-credentials
      mountPath: /secrets/cloudsql
      readOnly: true
    - name: lifecycle
      mountPath: /lifecycle
  volumes:
  - name: cloudsql-instance-credentials
    secret:
      secretName: cloudsql-instance-credentials
  - name: lifecycle
    emptyDir: {}
Basically, when your main job is done, it will create a file in /lifecycle that will be identified by the watcher added to the cloud-sql-proxy container, which will kill the proxy and terminate the container.
I hope it helps! Let me know if you have any questions.
Based on: https://stackoverflow.com/a/52156131/7747292
It doesn't look like Kubernetes can do this alone; you would need to manually kill the proxy once the migration exits. A similar question was asked here: Sidecar containers in Kubernetes Jobs?
Google Cloud SQL has recently launched private IP address connectivity for Cloud SQL. If the Cloud SQL instance and the Kubernetes cluster are in the same region, you can connect to Cloud SQL without using the Cloud SQL Proxy.
https://cloud.google.com/sql/docs/mysql/connect-kubernetes-engine#private-ip
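In that setup the Job no longer needs a proxy sidecar at all; a rough sketch under that assumption (the image name and address below are placeholders, not from the question):
apiVersion: batch/v1
kind: Job
metadata:
  name: db-migrations
spec:
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: migrations
        image: my-migrations-image:latest   # placeholder image
        env:
        - name: DB_HOST
          value: "10.0.0.5"                 # the Cloud SQL instance's private IP (placeholder)
        - name: DB_PORT
          value: "3306"
With no sidecar, the Job is marked complete as soon as the migration container exits.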
A possible solution would be to set concurrencyPolicy: Replace in the CronJob spec ... this will replace the currently running pod with a new instance whenever it needs to run again. But you have to make sure that the subsequent cron runs are spaced far enough apart.
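Note that concurrencyPolicy is a field of the CronJob spec rather than the Job itself; a trimmed, purely illustrative example (names are placeholders):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: db-migrations-cron            # illustrative name
spec:
  schedule: "0 3 * * *"
  concurrencyPolicy: Replace          # a still-running job is replaced by the new run
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: migrations
            image: my-migrations-image:latest   # placeholder image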
Unfortunately the other answers weren't working for me because CloudSQLProxy runs in a distroless environment where there is no shell.
I managed to get around this by bundling a CloudSQLProxy binary with my deployment and running a bash script to start up CloudSQLProxy followed by my app.
Dockerfile:
FROM golang:1.19.4
RUN apt update
COPY . /etc/mycode/
WORKDIR /etc/mycode
RUN chmod u+x ./scripts/run_migrations.sh
RUN chmod u+x ./bin/cloud_sql_proxy.linux-amd64
RUN go install
ENTRYPOINT ["./scripts/run_migrations.sh"]
Shell Script (run_migrations.sh):
#!/bin/sh
# This script is run from the parent directory
dbConnectionString=$1
cloudSQLProxyPort=$2
echo "Starting Cloud SQL Proxy"
./bin/cloud_sql_proxy.linux-amd64 -instances=${dbConnectionString}=tcp:5432 -enable_iam_login -structured_logs &
CHILD_PID=$!
echo "CloudSQLProxy PID: $CHILD_PID"
echo "Migrating DB..."
go run ./db/migrations/main.go
MAIN_EXIT_CODE=$?
kill $CHILD_PID;
echo "Migrations complete.";
exit $MAIN_EXIT_CODE
K8s (via Pulumi):
import * as pulumi from '@pulumi/pulumi'
import * as k8s from '@pulumi/kubernetes'

const jobDBMigrations = new k8s.batch.v1.Job("job-db-migrations", {
  metadata: {
    namespace: namespaceName,
    labels: appLabels,
  },
  spec: {
    backoffLimit: 4,
    template: {
      spec: {
        containers: [
          {
            image: pulumi.interpolate`gcr.io/${gcpProject}/${migrationsId}:${migrationsVersion}`,
            name: "server-db-migration",
            args: [
              dbConnectionString,
            ],
          },
        ],
        restartPolicy: "Never",
        serviceAccount: k8sSAMigration.metadata.name,
      },
    },
  },
},
{
  provider: clusterProvider,
});

The active users count doesn't match on execution through 2 containers in Kubernetes

We are running JMX through Taurus using 2 containers in Kubernetes.
We are seeing only 50 users in the results instead of 100 (50 * 2 containers).
Can anyone please throw some light on whether we are missing something here?
We get two JTL files, and checking them individually or combined, the total users are the same 50 only. Is it related to the same thread names being generated and logged in the JTL files, or something else?
Here is the yml details:
apiVersion: v1
kind: ConfigMap
metadata:
  name: joba
  namespace: AAA
data:
  protocol: "https"
  serverUrl: "testurl"
  users: "50"
  duration: "1m"
  nodeName: "Nodename"
---
apiVersion: batch/v1
kind: Job
metadata:
  name: perftest
  namespace: dev
spec:
  template:
    spec:
      containers:
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "00"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-00
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
      - args: ["split -l ${users} --numeric-suffixes Test.csv Test-; /bin/bash ./Shellscripttoread_assignvariables.sh;"]
        command: ["/bin/bash", "-c"]
        env:
        - name: JobNumber
          value: "01"
        envFrom:
        - configMapRef:
            name: job-multi
        image: imagepath
        name: ubuntu-01
        resources:
          limits:
            memory: "8000Mi"
            cpu: "2880m"
Your YAML is very nice but it doesn't tell anything about how you launch JMeter or what the shell scripts you invoke are doing.
If you just kick off 2 separate JMeter instances by means of k8s, JMeter will look at the number of active threads from the .jtl file, and given that the Sampler/Transaction names are the same, JMeter "thinks" that the tests were executed on one engine.
The workaround is to add e.g. the __machineName() or __machineIP() function to the sampler/transaction labels; this way JMeter will distinguish the results coming from different instances and you will see the real number of active threads.
The solution would be running your JMeter test in Distributed Mode, so the master runs in one pod, the slaves in their own pods, and the master is responsible for transferring the .jmx script to the slaves and collecting the results from them.

how to run a cronjob every 10 seconds in kubernetes?

I just want to run a cronjob in Kubernetes every 10 seconds. What would be the imperative command for that?
You can't use the CronJob Kubernetes object to run more often than once per minute. You might be using the wrong tool for a process that has to run so often. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Create an infinite loop on a Deployment (daemonize it)
You'll need to use a bash loop (or whatever programming language you like best: Go, Java, Python or Ruby) to make an infinite loop that sleeps 10 seconds between executions, inside a Deployment. Here is an example with bash/sh:
apiVersion: apps/v1
kind: Deployment
metadata:
  name: cronjob-deployment
  labels:
    app: cronjob
spec:
  replicas: 1
  selector:
    matchLabels:
      app: cronjob
  template:
    metadata:
      labels:
        app: cronjob
    spec:
      containers:
      - name: cronjob
        image: busybox
        args:
        - /bin/sh
        - -c
        - while true; do echo call ./script.sh here; sleep 10; done
Create 1 CronJob with several containers
If you still want to use CronJobs you can do it with 6 containers inside the definition. One without delay, and the others with 10, 20, 30, 40 and 50 seconds of delay.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          # note: container names must be valid DNS-1123 labels (lowercase, hyphens, no underscores)
          containers:
          - name: no-delay
            image: busybox
            args:
            - /bin/sh
            - -c
            - echo call ./script.sh here
          - name: delay-10s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 10; echo call ./script.sh here
          - name: delay-20s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 20; echo call ./script.sh here
          - name: delay-30s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 30; echo call ./script.sh here
          - name: delay-40s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 40; echo call ./script.sh here
          - name: delay-50s
            image: busybox
            args:
            - /bin/sh
            - -c
            - sleep 50; echo call ./script.sh here
          restartPolicy: OnFailure
Of course, one of the problems you might encounter is that your processes might overlap (run concurrently at the same time). This will depend on the number of seconds your process needs to run and on the time Kubernetes needs to schedule and create a container.
If your task needs to run that frequently, cron is the wrong tool.
Aside from the fact that it simply won't launch jobs that frequently, you also risk some serious problems if the job takes longer to run than the interval between launches. Rewrite your task to daemonize and run persistently, then launch it from cron if necessary (while making sure that it won't relaunch if it's already running).
You can write a script that executes 6 times with an interval of 10 seconds, and set the Kubernetes cron job to run every minute.
That way, every minute your script starts running, which in turn executes the task every 10 seconds.
Here is a script that runs the logic every 10 seconds, 6 times, each time the cron job fires. It will print "hello world" every 10 seconds, 6 times:
#!/bin/bash -x
a=0
until [ $a -gt 5 ]
do
  echo "hello world"
  a=$(expr $a + 1)
  sleep 10
done
CronJob sample:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image:
            imagePullPolicy: IfNotPresent
            command:
            - /bin/sh
            - -c
            - ./sample.sh
          restartPolicy: OnFailure
So in that way your cron job executes every minute, which in turn starts your script, which runs the business logic every 10 seconds, 6 times.
This is the idea you can follow to make a cron job work in seconds, as Kubernetes does not support schedules of less than 1 minute.
With this approach you still need a strategy to keep the next execution of the cron job from overlapping the current one.
For example, if your business logic takes 15 seconds to execute and you run it every 10 seconds, 6 times per minute, you will overrun the minute. As the business logic takes 15 seconds, ideally it should run 4 times rather than 6 times in a minute. Accordingly, you need to tweak the repetition inside the script (see the sketch below).
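One way to handle that, sketched below with busybox and a placeholder ./sample.sh (these names are illustrative, not from the original answer): bound the inner loop by elapsed time instead of a fixed count, so a slower task simply fits fewer iterations into the minute.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: sub-minute-task            # hypothetical name
spec:
  schedule: "*/1 * * * *"
  concurrencyPolicy: Forbid        # don't overlap with the next minute's run
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: OnFailure
          containers:
          - name: task
            image: busybox
            command:
            - /bin/sh
            - -c
            - |
              start=$(date +%s)
              # keep starting a new iteration only while there is time left in the minute
              while [ $(( $(date +%s) - start )) -le 50 ]; do
                iter_start=$(date +%s)
                ./sample.sh                        # placeholder for the business logic
                # sleep only for whatever is left of the 10-second slot, if anything
                elapsed=$(( $(date +%s) - iter_start ))
                [ "$elapsed" -lt 10 ] && sleep $(( 10 - elapsed ))
              done
With a 15-second task this yields roughly 4 iterations per minute; with a 2-second task it stays close to the original 6 runs, 10 seconds apart.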

Handling cronjobs in a Pod with multiple containers

I have a requirement in which I need to create a cronjob in Kubernetes, but the pod has multiple containers (with a single container it's working fine).
Is it possible?
The requirement is something like this:
1. First container: run the shell script to do a job.
2. Second container: run the fluentbit conf to parse the log and send it.
Previously I had a deployment in place and that was working fine, but since that deployment was used just for 10-minute jobs I thought to make it a cron job.
Any help is really appreciated.
Also, about the cronjob, I am not sure if a pod can support multiple containers to do the same.
Thank you,
Sunny
Yes, you can create a CronJob with multiple containers. A CronJob is an abstraction on top of a Pod, so in the pod spec you can have multiple containers just like in a normal Pod. As an example:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
  namespace: default
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          containers:
          - name: hello
            image: busybox
            args:
            - /bin/sh
            - -c
            - date; echo Hello from the Kubernetes cluster
          - name: app
            image: alpine
            command:
            - echo
            - Hello World!
          restartPolicy: OnFailure
I need to agree with the answer provided by @Arghya Sadhu. It shows how you can run a multi-container Pod with a CronJob. Before the answer, I would like to give more attention to the comment provided by @Chris Stryczynski:
It's not clear whether the containers are run in parallel or sequentially
It is not entirely clear if the workload that you are trying to run:
The requirement is something like this:
First container: Run the shell script to do a job.
Second container: run fluentbit conf to parse the log and send it.
could be run in parallel (both running at the same time) or requires a sequential approach (after X completes successfully, run Y).
If the workload can run in parallel, the answer provided by @Arghya Sadhu is correct; however, if one workload depends on another, I'd reckon you should be using initContainers instead of a multi-container Pod.
An example of a CronJob that uses an initContainer could be the following:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: hello
spec:
  schedule: "*/1 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: ubuntu
            image: ubuntu
            command: [/bin/bash]
            args: ["-c", "cat /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: /data
          initContainers:
          - name: echo
            image: busybox
            command: ["/bin/sh"]
            args: ["-c", "echo 'General Kenobi!' > /data/hello_there.txt"]
            volumeMounts:
            - name: data-dir
              mountPath: "/data"
          volumes:
          - name: data-dir
            emptyDir: {}
This CronJob writes a specific text to a file with an initContainer, and then a "main" container displays the result. It's worth mentioning that the main container will not start if the initContainer doesn't succeed.
$ kubectl logs hello-1234567890-abcde
General Kenobi!
Additional resources:
Linchpiner.github.io: K8S multi container pods
What about a sidecar container for logging as the second container, one that keeps running and never exits? Even if the job itself runs fine, the state of the Job can still end up as failed or never completed.
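If the sidecar is a log shipper that never exits on its own, one option is to borrow the /lifecycle emptyDir trick from the Cloud SQL answer above. A rough sketch under that assumption; the image names and start commands below are placeholders, not from the question:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: job-with-log-sidecar       # illustrative name
spec:
  schedule: "*/10 * * * *"
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: main-job
            image: my-job-image:0.1            # placeholder image
            command: ["/bin/sh", "-c"]
            args:
            - |
              trap "touch /lifecycle/main-terminated" EXIT
              ./run-job.sh                     # placeholder for the shell script
            volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
          - name: log-shipper
            image: my-log-shipper:0.1          # placeholder for the fluentbit image
            command: ["/bin/sh", "-c"]
            args:
            - |
              ./start-log-shipper.sh &         # placeholder for the log shipper start command
              PID=$!
              while [ ! -f /lifecycle/main-terminated ]; do sleep 1; done
              kill $PID
            volumeMounts:
            - name: lifecycle
              mountPath: /lifecycle
          volumes:
          - name: lifecycle
            emptyDir: {}
Once both containers exit cleanly, the Job is marked complete instead of staying active because of the sidecar.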

Kubernetes Cronjob Only Runs Half the Time

I want a job to trigger every 15 minutes but it is consistently triggering every 30 minutes.
UPDATE:
I've simplified the problem by just running:
kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"
As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
and yet the job still refuses to run on time.
$ kubectl get cronjobs
NAME     SCHEDULE      SUSPEND   ACTIVE   LAST SCHEDULE   AGE
hello    */1 * * * *   False     1        5m              30m
hello2   */1 * * * *   False     1        5m              12m
It took 25 minutes for the command line created cronjob to run and 7 minutes for the cronjob created from yaml. They were both finally scheduled at the same time so it's almost like etcd finally woke up and did something?
ORIGINAL ISSUE:
When I drill into an active job I see Status: Terminated: Completed but
Age: 25 minutes or something greater than 15.
In the logs I see that the Python script meant to run has completed its final print statement. The script takes about ~2 min to complete, based on its output file in S3. Then no new job is scheduled for 28 more minutes.
I have tried with different configurations:
Schedule: */15 * * * * AND Schedule: 0,15,30,45 * * * *
As well as
Concurrency Policy: Forbid AND Concurrency Policy: Replace
What else could be going wrong here?
Full config with identifying lines modified:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  labels:
    type: f-c
  name: f-c-p
  namespace: extract
spec:
  concurrencyPolicy: Forbid
  failedJobsHistoryLimit: 1
  jobTemplate:
    metadata:
      creationTimestamp: null
    spec:
      template:
        metadata:
          creationTimestamp: null
          labels:
            type: f-c
        spec:
          containers:
          - args:
            - /f_c.sh
            image: identifier.amazonaws.com/extract_transform:latest
            imagePullPolicy: Always
            env:
            - name: ENV
              value: prod
            - name: SLACK_TOKEN
              valueFrom:
                secretKeyRef:
                  key: slack_token
                  name: api-tokens
            - name: AWS_ACCESS_KEY_ID
              valueFrom:
                secretKeyRef:
                  key: aws_access_key_id
                  name: api-tokens
            - name: AWS_SECRET_ACCESS_KEY
              valueFrom:
                secretKeyRef:
                  key: aws_secret_access_key
                  name: api-tokens
            - name: F_ACCESS_TOKEN
              valueFrom:
                secretKeyRef:
                  key: f_access_token
                  name: api-tokens
            name: s-f-c
            resources: {}
            terminationMessagePath: /dev/termination-log
            terminationMessagePolicy: File
          dnsPolicy: ClusterFirst
          restartPolicy: Never
          schedulerName: default-scheduler
          securityContext: {}
          terminationGracePeriodSeconds: 30
  schedule: '*/15 * * * *'
  successfulJobsHistoryLimit: 1
  suspend: false
status: {}
After running these jobs in a test cluster I discovered that external circumstances prevented them from running as intended.
On the original cluster there were ~20k scheduled jobs. The built-in scheduler for Kubernetes is not yet capable of handling this volume consistently.
The maximum number of jobs that can be reliably run (within a minute of the time intended) may depend on the size of your master nodes.
Isn't that by design?
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Ref. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations