Argo Workflow - DAG Task level retry - argo-workflows

I have a DAG workflow as below
taskA -> after taskA completes, taskB and taskC run in parallel -> once taskB and taskC complete, taskD starts. Suppose taskC fails due to an external issue that needs manual intervention to correct. After the correction, can we manually restart the workflow (from the UI or CLI) so that it resumes directly from the failed taskC, proceeds to taskD, and completes the workflow?

Yes, clicking 'Retry' on the workflow in the UI will do just that.
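For reference, a minimal sketch of the DAG described above (all names and images are hypothetical), using Argo's `depends` syntax so taskD waits on both taskB and taskC:

```yaml
# Hypothetical sketch of the DAG: A -> (B, C in parallel) -> D
apiVersion: argoproj.io/v1alpha1
kind: Workflow
metadata:
  generateName: retry-example-
spec:
  entrypoint: main
  templates:
    - name: main
      dag:
        tasks:
          - name: taskA
            template: step
          - name: taskB
            depends: taskA
            template: step
          - name: taskC
            depends: taskA
            template: step
          - name: taskD
            depends: taskB && taskC   # starts only after both complete
            template: step
    - name: step
      container:
        image: alpine:3
        command: [sh, -c, "echo running"]
```

From the CLI, `argo retry <workflow-name>` re-runs only the failed steps (taskC here); tasks that already succeeded keep their results, and taskD runs once taskC succeeds.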

Related

Cancelling all notStarted or inProgress tasks if job is cancelled

I have a pipeline in Azure DevOps. It installs several NPM dependencies and afterwards I use npm run <script_name>.
However, if I cancel the job it still spawns webdrivers, and the counter shows that the job is still running.
Is there a way to cancel the tasks which are inProgress/notStarted if I cancel an ongoing job?
Thank you
This needs to be explained on a case-by-case basis.
One case is that the task has not started yet when you cancel the job. In this case, the task will not start.
Another case is that the task is inProgress when you cancel the job. What happens then depends on how your task runs.
If the task runs its work itself, it can be canceled.
But if the task does its work by invoking another program from the command line, like using a command-line task to invoke MSBuild.exe to build the project, there is no way to cancel that work after the command is issued. Even if you cancel the job, the work keeps executing in the background until the job is completely closed.

How can I kill (not cancel) an errant Azure Pipeline run, stage, job, or task?

I want to know how to kill an Azure Pipeline task (or any level of execution - run, stage, job, etc.), so that I am not blocked waiting for an errant pipeline to finish executing or timeout.
For example, canceling the pipeline does not stop it immediately if a condition is configured incorrectly: if the condition resolves to true, the task will execute even though the pipeline was cancelled.
This is really painful if your org/project only has 1 agent. :(
For a hosted agent, we cannot kill the pipeline directly, since we cannot access the machine it runs on.
As a workaround, we can reduce the time that tasks continue to run after the job is cancelled by setting a shorter Build job cancel timeout in minutes.
For example, I created a pipeline with a task that would keep running for 60 minutes after the job was cancelled. After I set Build job cancel timeout in minutes to 2, the pipeline was cancelled completely within 2 minutes.
For a private agent, we can run services.msc, look for "VSTS Agent (name of your agent)", right-click the entry, and choose Restart.
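In YAML-defined pipelines, the same setting is available as the job-level `cancelTimeoutInMinutes` property. A minimal sketch (job and script names are hypothetical):

```yaml
# Hypothetical YAML equivalent of the classic editor's
# "Build job cancel timeout in minutes" setting.
jobs:
  - job: build
    timeoutInMinutes: 60
    cancelTimeoutInMinutes: 2   # tasks get at most 2 minutes after Cancel
    steps:
      - script: npm ci && npm run e2e
```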

How to launch scheduled spark jobs even if previous jobs are still executing on rundeck?

Why is Rundeck not launching scheduled Spark jobs while the previous run is still executing?
Rundeck skips jobs that are scheduled to launch during the execution of the previous run, and only launches a new run, per the schedule, after that execution completes.
But I want a scheduled job to launch even if the previous run is still executing.
Check your workflow strategy; here is an explanation of how it works:
https://www.rundeck.com/blog/howto-controlling-how-and-where-rundeck-jobs-execute
You can design a workflow based on the "Parallel" strategy to launch the jobs simultaneously on your node.
Example using the parallel strategy with a parent job.
Example jobs: Job one, Job two, and a Parent Job (using the parallel strategy).
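A sketch of such a parent job in Rundeck's YAML job-definition format (job names are placeholders from the example above):

```yaml
# Hypothetical parent job that launches two child jobs at the same time
# via the "parallel" workflow strategy.
- name: Parent Job
  sequence:
    keepgoing: false
    strategy: parallel
    commands:
      - jobref:
          name: Job one
      - jobref:
          name: Job two
```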

how to operate in airflow so that the task rerun and continue downstream tasks

I set up a workflow in Airflow, and one of the jobs failed. After I fixed the problem, I want to rerun the failed task and continue the workflow. The details are below:
As above, I prepared to rerun the task "20_validation" and pressed the 'Run' button, like below:
But the problem is that when the task '20_validation' finished, the downstream tasks did not start. What should I do?
Use the Clear button directly below the Run button you drew the box around in your screenshot.
This will clear the failed task's state and that of all tasks after it (since 'Downstream' is selected to the right), causing them all to be run/rerun.
Since the upstream task failed, the downstream tasks shouldn't have run anyway, so this shouldn't cause successful tasks to rerun.
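The same clear-with-downstream action is available from the Airflow 2.x CLI; a sketch, assuming a hypothetical DAG id `my_dag` and execution date:

```shell
# Clear the failed task's state and everything downstream of it,
# so the scheduler reruns the task and then continues the DAG.
airflow tasks clear my_dag \
  --task-regex '^20_validation$' \
  --downstream \
  --start-date 2021-01-01 \
  --end-date 2021-01-01
```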

Does rundeck support jobs dependencies?

I've been searching for days for how to lay out a Rundeck workflow with job dependencies. What I need is three jobs: job-1 and job-2 are scheduled to run in parallel, while job-3 is triggered only after both job-1 and job-2 complete, assuming that job-1 and job-2 have different execution times.
I tried using Job State Conditionals to do that, but it seems that if the condition is not met the step will only halt or fail. My idea is to halt execution until all the parent jobs complete and then resume the workflow.
You can achieve this by creating a master job that includes 2 steps:
step 1: a sub-job that includes both job-1 and job-2 (they run in parallel if node-oriented execution is selected)
step 2: job-3
But not all 3 in the same flow.
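A sketch of that layout in Rundeck's YAML job-definition format (the sub-job name `parallel-pair` is hypothetical):

```yaml
# Hypothetical master job: step 1 runs a sub-job whose own strategy is
# parallel (job-1 and job-2 together); step 2 runs job-3 only after
# step 1 has finished.
- name: master
  sequence:
    keepgoing: false
    strategy: sequential
    commands:
      - jobref:
          name: parallel-pair
      - jobref:
          name: job-3
- name: parallel-pair
  sequence:
    keepgoing: false
    strategy: parallel
    commands:
      - jobref:
          name: job-1
      - jobref:
          name: job-2
```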
Right now you can use Job State Conditional feature for that: https://docs.rundeck.com/2.9.4/plugins-user-guide/bundled-plugins.html#job-state-plugin
Rundeck cannot do this for you automatically. You can schedule job-3 to run after the latest expected completion time of job-1 or job-2, and enable "retry" for job-3 in case the dependencies fail.