I am trying to schedule a workflow C when both workflows A and B complete their daily run.
Is there any way to do this without using the workflow of workflows pattern?
I'm assuming A and B run in parallel.
If you could incorporate workflows A and B into a single workflow AB with parallel steps A and B, then you could use a mutex to delay the execution of C until after AB finishes.
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
name: AB
spec:
schedule: "* * * * *"
workflowSpec:
entrypoint: AB
synchronization:
mutex:
name: AB-C-mutex
templates:
- name: AB
steps:
- - name: A
templateRef:
name: A
template: A
- name: B
templateRef:
name: B
template: B
---
apiVersion: argoproj.io/v1alpha1
kind: CronWorkflow
metadata:
name: C
spec:
schedule: "* * * * *"
workflowSpec:
entrypoint: C
synchronization:
mutex:
name: AB-C-mutex
templates:
# The rest of the workflow.
You would need to make sure C starts long enough after AB for AB to claim the mutex.
You could avoid combining A and B into one workflow if Argo supported using more than one mutex/semaphore per workflow. Currently, there are no plans to support that.
Another option would be to combine all three workflows into a single workflow (with A and B running in parallel before C).
Related
I'm looking into customizing workflow names. I see that argo submit --generate-name can override the .metadata.generateName property, but does anyone know if this is possible with a Sensor that triggers a Workflow?
I'm using a GitHub event to trigger these workflows, but it would be nice to pull the repository name out of the event and set it as the generateName on the Workflow.
Here's an example of what I was hoping would work, but doesn't seem to as far as I can tell. Maybe I've got the syntax wrong? Does anyone know if something like this is possible?
(Note, I've removed a large part of this sensor so that just the important parts are shown. Basically, I want to parse a GitHub event payload for the repository name. Set it on the workflow arguments. Then use those to override the workflow's generateName property.)
apiVersion: argoproj.io/v1alpha1
kind: Sensor
metadata:
name: github-sensor
spec:
dependencies:
- name: github-webhook-sensor
eventSourceName: github-events
eventName: github
triggers:
- template:
name: github
k8s:
group: argoproj.io
version: v1alpha1
resource: workflows
operation: create
source:
resource:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: {{ workflow.parameters.name }}
spec:
arguments:
parameters:
- name: "git-repository-name"
parameters:
# Parameter: git-repository-name
- src:
dependencyName: github-webhook-sensor
dataKey: body.repository.name
dest: spec.arguments.parameters.0.value
I think you can do this to prepend the generated name with the repo name (and a hypen):
...
source:
resource:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: "-"
spec:
arguments:
parameters:
- name: "git-repository-name"
parameters:
- src:
dependencyName: github-webhook-sensor
dataKey: body.repository.name
dest: metadata.generateName
operation: prepend
# Parameter: git-repository-name
- src:
dependencyName: github-webhook-sensor
dataKey: body.repository.name
dest: spec.arguments.parameters.0.value
You can also use name instead of generateName, but I'm not sure how that will behave if multiple triggers come.
I'm exploring an easy way to read K8S resources in the Argo workflow. The current documentation is focusing mainly on create/patch with conditions (https://argoproj.github.io/argo/examples/#kubernetes-resources), while I'm curious if it's possible to perform "action: get", extra part of the resource state (or full resource) and pass it downstream as artifact or result output. Any ideas?
action: get is not a feature available from Argo.
However, it's easy to use kubectl from within a Pod and then send the JSON output to an output parameter. This uses a BASH script to send the JSON to the result output parameter, but an explicit output parameter or an output artifact are also viable options.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: kubectl-bash-
spec:
entrypoint: kubectl-example
templates:
- name: kubectl-example
steps:
- - name: generate
template: get-workflows
- - name: print
template: print-message
arguments:
parameters:
- name: message
value: "{{steps.generate.outputs.result}}"
- name: get-workflows
script:
image: bitnami/kubectl:latest
command: [bash]
source: |
some_workflow=$(kubectl get workflows -n argo | sed -n 2p | awk '{print $1;}')
kubectl get workflow "$some_workflow" -ojson
- name: print-message
inputs:
parameters:
- name: message
container:
image: alpine:latest
command: [sh, -c]
args: ["echo result was: '{{inputs.parameters.message}}'"]
Keep in mind that kubectl will run with the permissions of the Workflow's ServiceAccount. Be sure to submit the Workflow using a ServiceAccount which has access to the resource you want to get.
"I just want to run a cronjob in Kubernetes in every 10 seconds. what would be the imperative command for that?"
You can’t use CronJob kubernetes object for running less than 1 minute. You might be using the wrong tool for a process that has to run so often. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/
Create an infinite loop on a Deployment (daemonize it)
You’ll need to use a bash formula (or whatever programming language you like best, Go, Java, Python or Ruby) to make an infinite loop and sleep 10 seconds per each execution inside a Deployment. Here an example with bash/sh:
apiVersion: apps/v1
kind: Deployment
metadata:
name: cronjob-deployment
labels:
app: cronjob
spec:
replicas: 1
selector:
matchLabels:
app: cronjob
template:
metadata:
labels:
app: cronjob
spec:
containers:
- name: cronjob
image: busybox
args:
- /bin/sh
- -c
- while true; do echo call ./script.sh here; sleep 10; done
Create 1 CronJob with several containers
If you still want to use CronJobs you can do it with 6 containers inside the definition. One without delay, and the others with 10, 20, 30, 40 and 50 seconds of delay.
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: hello
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: no_delay
image: busybox
args:
- /bin/sh
- -c
- echo call ./script.sh here
- name: 10_seconds
image: busybox
args:
- /bin/sh
- -c
- sleep 10; echo call ./script.sh here
- name: 20_seconds
image: busybox
args:
- /bin/sh
- -c
- sleep 20; echo call ./script.sh here
- name: 30_seconds
image: busybox
args:
- /bin/sh
- -c
- sleep 30; echo call ./script.sh here
- name: 40_seconds
image: busybox
args:
- /bin/sh
- -c
- sleep 40; echo call ./script.sh here
- name: 50_seconds
image: busybox
args:
- /bin/sh
- -c
- sleep 50; echo call ./script.sh here
restartPolicy: OnFailure
Of course, one of the problems you might encounter is that your process might be overlapped (runned concurrently at the same time). This will depend on the amount of seconds your process needs to run, and the time kubernetes needs to schedule and create a container.
If your task needs to run that frequently, cron is the wrong tool.
Aside from the fact that it simply won't launch jobs that frequently, you also risk some serious problems if the job takes longer to run than the interval between launches. Rewrite your task to daemonize and run persistently, then launch it from cron if necessary (while making sure that it won't relaunch if it's already running).
You can write a script that executes for 6 times with an interval of 10 seconds.
and set Kubernetes cron job to run every minute.
in that manner, in every minute your script starts running which in turn execute the task in every 10 seconds.
script to run logic in every 10 seconds for 6 times when cron job executes after one minute.
This will print hello world in every 10 seconds for 6 times :
#!/bin/bash -x
a=0
until [ $a -gt 5 ]
do
echo "hello world"
a=expr $a + 1
sleep 10
done
cronjob sample :
apiVersion: batch/v1beta1
kind: CronJob
metadata:
name: hello
spec:
schedule: "*/1 * * * *"
jobTemplate:
spec:
template:
spec:
containers:
- name: hello
image:
imagePullPolicy: IfNotPresent
command:
- /bin/sh
- -c
- ./sample.sh
restartPolicy: OnFailure
~
So in that way your cron job executes in every one minute .which in turns starts your srcipt which runs in every 10 seconds and execute buisness logic for 6 minutes.
This is the idea which you can follow to make cron job work in seconds as Kubernetes does not provide value for scheduling lower than 1 minute.
Although in this approach you need to set the strategy of not overlapping next execution of cron job.
for example, if your business logic takes 15 seconds to executes and you are running business logic every 10 seconds 6 times in a minute.
As business logic takes 15seconds so ideally, it should run for 4 times rather 6 times in a minute. Accordingly, you need to tweak the repetition inside the script.
I'm trying out Argo workflow and would like to understand how to freeze a step. Let's say that I have 3 step workflow and a workflow failed at step 2. So I'd like to resubmit the workflow from step 2 using successful step 1's artifact. How can I achieve this? I couldn't find the guidance anywhere on the document.
I think you should consider using Conditions and Artifact passing in your steps.
Conditionals provide a way to affect the control flow of a
workflow at runtime, depending on parameters. In this example
the 'print-hello' template may or may not be executed depending
on the input parameter, 'should-print'. When submitted with
$ argo submit examples/conditionals.yaml
the step will be skipped since 'should-print' will evaluate false.
When submitted with:
$ argo submit examples/conditionals.yaml -p should-print=true
the step will be executed since 'should-print' will evaluate true.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
generateName: conditional-
spec:
entrypoint: conditional-example
arguments:
parameters:
- name: should-print
value: "false"
templates:
- name: conditional-example
inputs:
parameters:
- name: should-print
steps:
- - name: print-hello
template: whalesay
when: "{{inputs.parameters.should-print}} == true"
- name: whalesay
container:
image: docker/whalesay:latest
command: [sh, -c]
args: ["cowsay hello"]
If you use conditions in each step you will be able to start from a step you like with appropriate condition.
Also have a loot at this article Argo: Workflow Engine for Kubernetes as author explains the use of conditions on coinflip example.
You can see many examples on their GitHub page.
I want a job to trigger every 15 minutes but it is consistently triggering every 30 minutes.
UPDATE:
I've simplified the problem by just running:
kubectl run hello --schedule="*/1 * * * *" --restart=OnFailure --image=busybox -- /bin/sh -c "date; echo Hello from the Kubernetes cluster"
As specified in the docs here: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/
and yet the job still refuses to run on time.
$ kubectl get cronjobs
NAME SCHEDULE SUSPEND ACTIVE LAST SCHEDULE AGE
hello */1 * * * * False 1 5m 30m
hello2 */1 * * * * False 1 5m 12m
It took 25 minutes for the command line created cronjob to run and 7 minutes for the cronjob created from yaml. They were both finally scheduled at the same time so it's almost like etcd finally woke up and did something?
ORIGINAL ISSUE:
When I drill into an active job I see Status: Terminated: Completed but
Age: 25 minutes or something greater than 15.
In the logs I see that the python script meant to run has completed it's final print statement. The script takes about ~2min to complete based on it's output file in s3. Then no new job is scheduled for 28 more minutes.
I have tried with different configurations:
Schedule: */15 * * * * AND Schedule: 0,15,30,45 * * * *
As well as
Concurrency Policy: Forbid AND Concurrency Policy: Replace
What else could be going wrong here?
Full config with identifying lines modified:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
labels:
type: f-c
name: f-c-p
namespace: extract
spec:
concurrencyPolicy: Forbid
failedJobsHistoryLimit: 1
jobTemplate:
metadata:
creationTimestamp: null
spec:
template:
metadata:
creationTimestamp: null
labels:
type: f-c
spec:
containers:
- args:
- /f_c.sh
image: identifier.amazonaws.com/extract_transform:latest
imagePullPolicy: Always
env:
- name: ENV
value: prod
- name: SLACK_TOKEN
valueFrom:
secretKeyRef:
key: slack_token
name: api-tokens
- name: AWS_ACCESS_KEY_ID
valueFrom:
secretKeyRef:
key: aws_access_key_id
name: api-tokens
- name: AWS_SECRET_ACCESS_KEY
valueFrom:
secretKeyRef:
key: aws_secret_access_key
name: api-tokens
- name: F_ACCESS_TOKEN
valueFrom:
secretKeyRef:
key: f_access_token
name: api-tokens
name: s-f-c
resources: {}
terminationMessagePath: /dev/termination-log
terminationMessagePolicy: File
dnsPolicy: ClusterFirst
restartPolicy: Never
schedulerName: default-scheduler
securityContext: {}
terminationGracePeriodSeconds: 30
schedule: '*/15 * * * *'
successfulJobsHistoryLimit: 1
suspend: false
status: {}
After running these jobs in a test cluster I discovered that external circumstances prevented them from running as intended.
On the original cluster there were ~20k scheduled jobs. The built-in scheduler for Kubernetes is not yet capable of handling this volume consistently.
The maximum number of jobs that can be reliably run (within a minute of the time intended) may depend on the size of your master nodes.
Isn't that by design?
A cron job creates a job object about once per execution time of its schedule. We say “about” because there are certain circumstances where two jobs might be created, or no job might be created. We attempt to make these rare, but do not completely prevent them. Therefore, jobs should be idempotent.
Ref. https://kubernetes.io/docs/concepts/workloads/controllers/cron-jobs/#cron-job-limitations