How do you manually release a mutex for an Argo Workflow?

I have an Argo Workflow with a mutex, e.g.:
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: synchronization-wf-level-
spec:
  entrypoint: whalesay
  synchronization:
    mutex:
      name: test
  templates:
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [cowsay]
      args: ["hello world"]
However, I deleted a workflow while it was trying to run. Now Argo is in a deadlock where no new workflows using the same mutex can run.
Where does Argo store the mutex information, and how can I manually remove it to get out of the deadlock?

Workflow deletion should release the acquired lock. All locks are stored in controller memory. One workaround is to restart the controller, which will clear all locks and repopulate them. Please create an issue in https://github.com/argoproj/argo with a sample workflow and controller logs.
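If the lock stays stuck, one way to apply that workaround, assuming the controller runs as a Deployment named workflow-controller in the argo namespace (adjust both names to your install), is:
kubectl -n argo rollout restart deployment workflow-controller
On startup the controller repopulates its lock state from the workflows that still exist, so the lock held by the deleted workflow disappears.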

Related

How to restart Tekton PipelineRun, having a pipeline-run.yml defined in git (e.g. using Cloud Native Buildpacks)?

We want to use the official Tekton buildpacks task from Tekton Hub to run our builds using Cloud Native Buildpacks. The buildpacks documentation for Tekton tells us to install the buildpacks and git-clone Tasks from Tekton Hub, and to create a Secret, a ServiceAccount, a PersistentVolumeClaim and a Tekton Pipeline.
As the configuration is parameterized, we don't want to start our Tekton pipelines using a huge kubectl command but instead configure the PipelineRun using a separate pipeline-run.yml YAML file (as also stated in the docs) containing the references to the ServiceAccount, workspaces, image name and so on:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  name: buildpacks-test-pipeline-run
spec:
  serviceAccountName: buildpacks-service-account # Only needed if you set up authorization
  pipelineRef:
    name: buildpacks-test-pipeline
  workspaces:
  - name: source-workspace
    subPath: source
    persistentVolumeClaim:
      claimName: buildpacks-source-pvc
  - name: cache-workspace
    subPath: cache
    persistentVolumeClaim:
      claimName: buildpacks-source-pvc
  params:
  - name: image
    value: <REGISTRY/IMAGE NAME, eg gcr.io/test/image> # This defines the name of the output image
Now running the Tekton pipeline once is no problem using kubectl apply -f pipeline-run.yml. But how can we restart or reuse this YAML-based configuration for all the other pipeline runs?
There are some discussions about that topic in the Tekton GitHub project - see tektoncd/pipeline/issues/664 and tektoncd/pipeline/issues/685. Since Tekton is heavily based on Kubernetes, all Tekton objects are Kubernetes CRDs - which are in fact immutable. So it is intentional that you cannot re-run an already-run PipelineRun.
But as also discussed in tektoncd/pipeline/issues/685, we can simply use the generateName field of the metadata like this:
apiVersion: tekton.dev/v1beta1
kind: PipelineRun
metadata:
  generateName: buildpacks-test-pipeline-run-
spec:
  serviceAccountName: buildpacks-service-account # Only needed if you set up authorization
  pipelineRef:
    name: buildpacks-test-pipeline
  workspaces:
  - name: source-workspace
    subPath: source
    persistentVolumeClaim:
      claimName: buildpacks-source-pvc
  - name: cache-workspace
    subPath: cache
    persistentVolumeClaim:
      claimName: buildpacks-source-pvc
  params:
  - name: image
    value: <REGISTRY/IMAGE NAME, eg gcr.io/test/image> # This defines the name of the output image
Running kubectl create -f pipeline-run.yml will now work multiple times and kind of "restart" our Pipeline, creating a new PipelineRun object like buildpacks-test-pipeline-run-dxcq6 every time the command is issued.
Keep in mind to delete old PipelineRun objects once in a while, though.
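For example (tekton.dev/pipeline is the label Tekton sets on PipelineRuns; verify it on your objects before relying on it):
# create a fresh PipelineRun; generateName appends a random suffix
kubectl create -f pipeline-run.yml
# list the runs of this pipeline, oldest first, to pick candidates for deletion
kubectl get pipelinerun -l tekton.dev/pipeline=buildpacks-test-pipeline --sort-by=.metadata.creationTimestamp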
The tkn CLI has the flag --use-pipelinerun for the command tkn pipeline start. What this does is reuse the params/workspaces from that PipelineRun and create a new one, effectively "restarting" it.
So to "restart" the PipelineRun pr1, which belongs to the pipeline p1, you would do:
tkn pipeline start p1 --use-pipelinerun pr1
Maybe we should have a more intuitively named command; I kicked off a discussion about this some time ago, feel free to contribute feedback:
https://github.com/tektoncd/cli/issues/1091
You cannot restart a PipelineRun.
In Tekton, a PipelineRun is a one-time execution of a Pipeline (which is treated as a template), so it is not meant to be restarted; another kubectl apply for a PipelineRun is another execution...

kubernetes with multiple jobs counter

New to Kubernetes, I'm trying to migrate a pipeline we currently run with a queueing system, without k8s.
I have a Perl script that generates a list of batch jobs (yml files), one for each of the samples I have to process.
Then I run kubectl apply --recursive -f 16S_jobscripts/
Each sample needs to be treated sequentially and go through different processing steps.
Example:
SampleA -> clean -> quality -> some_calculation
SampleB -> clean -> quality -> some_calculation
and so on for 300 samples.
So the idea is to prepare all the yml files and run them sequentially. This is working.
BUT, with this approach I need to wait until all samples are processed (let's say that all the clean jobs need to be completed before I run the next quality jobs).
What would be the best approach in such a case: run each sample independently? How?
The yml below describes one sample for one job. You can see that I'm using a counter (mergereads-1 for sample 1 (A)):
apiVersion: batch/v1
kind: Job
metadata:
  name: merge-reads-1
  namespace: namespace-id-16s
  labels:
    jobgroup: mergereads
spec:
  template:
    metadata:
      name: mergereads-1
      labels:
        jobgroup: mergereads
    spec:
      containers:
      - name: mergereads-$idx
        image: .../bbmap:latest
        command: ['sh', '-c']
        args: ['
          cd workdir &&
          bbmerge.sh -Xmx1200m in1=files/trimmed/1.R1.trimmed.fq.gz in2=files/trimmed/1.R2.trimmed.fq.gz out=files/mergedpairs/1.merged.fq.gz merge=t mininsert=300 qtrim2=t minq=27 ratiomode=t &&
          ls files/mergedpairs/
        ']
        resources:
          limits:
            cpu: 1
            memory: 2000Mi
          requests:
            cpu: 0.8
            memory: 1500Mi
        volumeMounts:
        - mountPath: '/workdir'
          name: db
      volumes:
      - name: db
        persistentVolumeClaim:
          claimName: workdir
      restartPolicy: Never
If I understand you correctly, you can use parallel jobs following the Job patterns described in the Kubernetes documentation. They support parallel processing of a set of independent but related work items, as the sketch below illustrates.
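A minimal sketch of that pattern (the image and counts are placeholders, not from the question):
apiVersion: batch/v1
kind: Job
metadata:
  name: sample-processing
spec:
  completions: 300  # one completion per sample
  parallelism: 10   # run up to 10 pods at once
  template:
    spec:
      restartPolicy: Never
      containers:
      - name: worker
        image: busybox  # placeholder; use your processing image
        command: ['sh', '-c', 'echo processing one work item']
Each pod has to pick up its own work item (e.g. from a queue), which is the work-queue variant of the Job patterns.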
You can also consider using Argo.
https://github.com/argoproj/argo
Argo Workflows is an open source container-native workflow engine for orchestrating parallel jobs on Kubernetes. Argo Workflows is implemented as a Kubernetes CRD (Custom Resource Definition).
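A rough sketch of how your per-sample chain could look in Argo, where each sample runs its steps sequentially while all samples fan out in parallel (step names, sample ids and the busybox image are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: samples-16s-
spec:
  entrypoint: main
  templates:
  - name: main
    steps:
    - - name: per-sample           # one parallel branch per item
        template: process-sample
        arguments:
          parameters:
          - name: sample
            value: "{{item}}"
        withItems: ["1", "2", "3"] # placeholder sample ids
  - name: process-sample
    inputs:
      parameters:
      - name: sample
    steps:                         # these steps run sequentially
    - - name: clean
        template: run
        arguments:
          parameters:
          - {name: cmd, value: "echo clean sample {{inputs.parameters.sample}}"}
    - - name: quality
        template: run
        arguments:
          parameters:
          - {name: cmd, value: "echo quality sample {{inputs.parameters.sample}}"}
  - name: run
    inputs:
      parameters:
      - name: cmd
    container:
      image: busybox               # placeholder; use your bbmap image
      command: [sh, -c]
      args: ["{{inputs.parameters.cmd}}"]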
Please let me know if that helps.

How to skip a step for Argo workflow

I'm trying out Argo Workflows and would like to understand how to freeze a step. Let's say that I have a 3-step workflow and the workflow failed at step 2. I'd like to resubmit the workflow from step 2, using the successful step 1's artifact. How can I achieve this? I couldn't find guidance anywhere in the documentation.
I think you should consider using conditionals and artifact passing in your steps.
Conditionals provide a way to affect the control flow of a workflow at runtime, depending on parameters. In this example the 'print-hello' template may or may not be executed depending on the input parameter, 'should-print'. When submitted with
$ argo submit examples/conditionals.yaml
the step will be skipped, since 'should-print' will evaluate to false. When submitted with
$ argo submit examples/conditionals.yaml -p should-print=true
the step will be executed, since 'should-print' will evaluate to true.
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: conditional-
spec:
  entrypoint: conditional-example
  arguments:
    parameters:
    - name: should-print
      value: "false"
  templates:
  - name: conditional-example
    inputs:
      parameters:
      - name: should-print
    steps:
    - - name: print-hello
        template: whalesay
        when: "{{inputs.parameters.should-print}} == true"
  - name: whalesay
    container:
      image: docker/whalesay:latest
      command: [sh, -c]
      args: ["cowsay hello"]
If you use conditionals in each step, you will be able to start from whichever step you like with the appropriate condition.
Also have a look at the article Argo: Workflow Engine for Kubernetes, where the author explains the use of conditionals on the coinflip example.
You can find many more examples on their GitHub page.

How to use concurrencyPolicy for GKE cron job correctly?

I set concurrencyPolicy to Allow; here is my cronjob.yaml:
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: gke-cron-job
spec:
  schedule: '*/1 * * * *'
  startingDeadlineSeconds: 10
  concurrencyPolicy: Allow
  successfulJobsHistoryLimit: 3
  failedJobsHistoryLimit: 1
  jobTemplate:
    spec:
      template:
        metadata:
          labels:
            run: gke-cron-job
        spec:
          restartPolicy: Never
          containers:
          - name: gke-cron-job-solution-2
            image: docker.io/novaline/gke-cron-job-solution-2:1.3
            env:
            - name: NODE_ENV
              value: 'production'
            - name: EMAIL_TO
              value: 'novaline.dulin@gmail.com'
            - name: K8S_POD_NAME
              valueFrom:
                fieldRef:
                  fieldPath: metadata.name
            ports:
            - containerPort: 8080
              protocol: TCP
After reading docs: https://cloud.google.com/kubernetes-engine/docs/how-to/cronjobs
I still don't understand how to use concurrencyPolicy.
How can I run my cron job concurrently?
Here are the logs of the cron job:
☁ nodejs-gcp [master] ⚡ kubectl logs -l run=gke-cron-job
> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js
config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660540-gmwvc',
  VERSION: '1.0.2' }
[2019-01-28T07:29:10.593Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }
> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js
config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660600-wbl5g',
  VERSION: '1.0.2' }
[2019-01-28T07:30:11.405Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }
> gke-cron-job-solution-2@1.0.2 start /app
> node ./src/index.js
config: { ENV: 'production',
  EMAIL_TO: 'novaline.dulin@gmail.com',
  K8S_POD_NAME: 'gke-cron-job-1548660660-8mn4r',
  VERSION: '1.0.2' }
[2019-01-28T07:31:11.099Z] Start daily report
send email: { to: 'novaline.dulin@gmail.com', text: { test: 'test data' } }
As you can see from the timestamps, the jobs are not running concurrently.
It's because you're reading the wrong documentation. CronJobs aren't a GKE-specific feature. For the full documentation on CronJob API, refer to the Kubernetes documentation: https://kubernetes.io/docs/tasks/job/automated-tasks-with-cron-jobs/#concurrency-policy (quoted below).
Concurrency policy decides whether a new container can be started while the previous CronJob is still running. If you have a CronJob that runs every 5 minutes, and sometimes the Job takes 8 minutes, then you may run into a case where multiple jobs are running at a time. This policy decides what to do in that case.
Concurrency Policy
The .spec.concurrencyPolicy field is also optional. It specifies how to treat concurrent executions of a job that is created by this cron job. The spec may specify only one of the following concurrency policies:
Allow (default): The cron job allows concurrently running jobs
Forbid: The cron job does not allow concurrent runs; if it is time for a new job run and the previous job run hasn’t finished yet, the cron job skips the new job run
Replace: If it is time for a new job run and the previous job run hasn’t finished yet, the cron job replaces the currently running job run with a new job run
Note that concurrency policy only applies to the jobs created by the same cron job. If there are multiple cron jobs, their respective jobs are always allowed to run concurrently.
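Note also that in your logs each run appears to finish well within the one-minute schedule, so there is never a previous job still running and therefore nothing to overlap. To actually see Allow in action, the job has to outlive its schedule interval; a minimal sketch (the name, busybox image and sleep duration are placeholders):
apiVersion: batch/v1beta1
kind: CronJob
metadata:
  name: overlap-demo
spec:
  schedule: '*/1 * * * *'   # fires every minute
  concurrencyPolicy: Allow  # lets the runs overlap
  jobTemplate:
    spec:
      template:
        spec:
          restartPolicy: Never
          containers:
          - name: slow-task
            image: busybox
            command: ['sh', '-c', 'sleep 90']  # outlives the 60s interval
With this spec two jobs are running during the overlap window; with concurrencyPolicy: Forbid the second run would be skipped instead.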

Kubernetes: can analytical jobs be chained together in a workflow?

Reading the Kubernetes "Run to Completion" documentation, it says that jobs can be run in parallel, but is it possible to chain together a series of jobs that should be run in sequential order (parallel and/or non-parallel).
https://kubernetes.io/docs/concepts/workloads/controllers/jobs-run-to-completion/
Or is it up to the user to keep track of which jobs have finished and trigger the next job using a PubSub messaging service?
I have used initContainers under the PodSpec in the past to solve problems like this: https://kubernetes.io/docs/concepts/workloads/pods/init-containers/
apiVersion: v1
kind: Pod
metadata:
  name: myapp-pod
  labels:
    app: myapp
spec:
  containers:
  - name: myapp-container
    image: busybox
    command: ['sh', '-c', 'echo The app is running! && sleep 3600']
  initContainers:
  - name: init-myservice
    image: busybox
    command: ['sh', '-c', 'until nslookup myservice; do echo waiting for myservice; sleep 2; done;']
  - name: init-mydb
    image: busybox
    command: ['sh', '-c', 'until nslookup mydb; do echo waiting for mydb; sleep 2; done;']
Chaining containers using a "depends" keyword has also been discussed as an option; take a look here:
https://github.com/kubernetes/kubernetes/issues/1996
Overall, no. Check out things like Airflow for this. Job objects give you a pretty simple way to run a container until it completes, that's about it. The parallelism is in that you can run multiple copies, it's not a full workflow management system :)
It is not possible to manage job workflows with Kubernetes core API objects.
Argo Workflow looks like an interesting tool to manage workflow inside Kubernetes: https://argoproj.github.io/projects/argo.
It looks like it can handle workflows of Kubernetes Jobs: https://argoproj.github.io/argo/examples/#kubernetes-resources
It is in the CNCF incubator: https://www.cncf.io/blog/2020/04/07/toc-welcomes-argo-into-the-cncf-incubator/
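A hedged sketch of what chaining two Kubernetes Jobs could look like with Argo's resource templates, following the examples link above (all names are placeholders):
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: job-chain-
spec:
  entrypoint: chain
  templates:
  - name: chain
    steps:
    - - name: job-a
        template: run-job
        arguments:
          parameters:
          - {name: job-name, value: job-a}
    - - name: job-b               # starts only after job-a succeeds
        template: run-job
        arguments:
          parameters:
          - {name: job-name, value: job-b}
  - name: run-job
    inputs:
      parameters:
      - name: job-name
    resource:                     # create a Job and wait on its status
      action: create
      successCondition: status.succeeded > 0
      failureCondition: status.failed > 0
      manifest: |
        apiVersion: batch/v1
        kind: Job
        metadata:
          generateName: {{inputs.parameters.job-name}}-
        spec:
          template:
            spec:
              restartPolicy: Never
              containers:
              - name: main
                image: busybox
                command: ['sh', '-c', 'echo running {{inputs.parameters.job-name}}']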
Other alternatives include:
https://www.nextflow.io/
https://www.pachyderm.com/
Airflow: https://airflow.apache.org/docs/apache-airflow/stable/kubernetes.html
This document might also help: https://www.preprints.org/manuscript/202001.0378/v1/download
Nearly 3 years later, I'll throw another answer into the mix.
Kubeflow Pipelines
https://www.kubeflow.org/docs/components/pipelines/overview/pipelines-overview/
Kubeflow Pipelines actually uses Argo under the hood.