Argo stop workflow early, mark complete - kubernetes

Imagine I have a workflow with 5 steps.
Step 2 may or may not create a file as its output (which is then used as input to subsequent steps).
If the file is created, I want to run the subsequent steps.
If no file gets created in step 2, I want to mark the workflow as completed and not execute steps 3 through to 5.
I'm sure there must be a simple way to do this yet I'm failing to figure out how.
I tried making step 2 return a non-zero exit code when no file is created and then using
when: "{{steps.step2.outputs.exitCode}} == 0" on step 3, but that still executes steps 4 and 5 (not to mention that it marks step 2 as "failed").
So I'm out of ideas, any suggestions are greatly appreciated.

By default, a step that exits with a non-zero exit code fails the workflow.
I would suggest writing an output parameter to determine whether the workflow should continue.
- name: yourstep
  container:
    command: [sh, -c]
    args: ["yourcommand; if [ -f /tmp/yourfile ]; then echo continue > /tmp/continue; fi"]
  outputs:
    parameters:
      - name: continue
        valueFrom:
          default: "stop"
          path: /tmp/continue
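Then gate each of the remaining steps on that parameter. The when condition has to be repeated on steps 3, 4 and 5: a skipped step counts as successful, so later steps still run unless they carry their own condition. A minimal sketch, with placeholder step and template names:
- - name: step3
    template: step3-template
    when: "{{steps.yourstep.outputs.parameters.continue}} == continue"
- - name: step4
    template: step4-template
    when: "{{steps.yourstep.outputs.parameters.continue}} == continue"
When all remaining steps are skipped this way, the workflow finishes as Succeeded, which is the behavior you're after.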
Alternatively, you can override the fail-on-nonzero-exitcode behavior with continueOn.
continueOn:
  failed: true
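In a steps list, continueOn sits on the step entry itself, next to the exit-code check you already tried. A sketch with placeholder step and template names:
- - name: step2
    template: step2-template
    continueOn:
      failed: true
- - name: step3
    template: step3-template
    when: "{{steps.step2.outputs.exitCode}} == 0"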
I'd caution against continueOn.failed: true. If your command throws a non-zero exit code for an unexpected reason, the workflow won't fail like it should, and the bug might go unnoticed.

Related

Where actually is the syntax error in my github actions yml file

I am implementing CI/CD for my application and want to start the application automatically using pm2, but I am getting a syntax error on line 22.
This is my yml file
This is the error I am getting on github
The problem in the syntax here is related to how you used the - symbol.
With GitHub Actions, every step inside your job needs at least a run or uses field at the same level as the name field (which is not mandatory); otherwise the workflow parser will return an error.
Here, from line 22, you used something like this:
- name: ...
- run: ...
- run: ...
- run: ...
So there are two problems:
First, because each - starts a new list item, the name field and the run fields end up in separate steps rather than in the same one.
Second, the step that only has a name field has no run or uses field associated with it (you need at least one of them).
The correct syntax should be:
- name: ...
  run: ...
- run: ...
- run: ...
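Applied to a pm2 setup like yours, a valid step list might look like the sketch below; the commands and the process name (my-app) are placeholders for illustration:
- name: Install dependencies
  run: npm ci
- name: Restart the application with pm2
  run: pm2 restart my-app
- run: pm2 status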
Reference about workflow syntax

Luigi does not send error codes to concourse ci

I have a test pipeline on concourse with one job that runs a set of luigi tasks. My problem is: failures in the luigi tasks do not rise up to the concourse job. In other words, if a luigi task fails, concourse will not register that failure and states that the concourse job completed successfully. I will first post the code I am running, then the solutions I have tried.
luigi-tasks.py
class Pipeline1(luigi.WrapperTask):
    def requires(self):
        yield Task1()
        yield Task2()
        yield Task3()
tasks.py
class Task1(luigi.Task):
    def requires(self):
        return None

    def output(self):
        return luigi.LocalTarget('stuff/task1.csv')

    def run(self):
        # uncomment line below to generate task failure
        # assert(True==False)
        print('task 1 complete...')
        t = pd.DataFrame()
        with self.output().open('w') as outtie:
            outtie.write('complete')

# Tasks 2 and 3 are duplicates of this, but with 1s replaced with 2s or 3s.
config file
[retcode]
# codes are in increasing level of severity (for most applications)
already_running=10
missing_data=20
not_run=25
task_failed=30
scheduling_error=35
unhandled_exception=40
begin.sh
#!/bin/sh
set -e
export PYTHONPATH='.'
luigi --module luigi-tasks Pipeline1 --local-scheduler
echo $?
pipeline.yml
# <resources, resource types, and docker image build job defined here>
# job of interest
- name: run-docker-image
  plan:
    - get: timer
      trigger: true
    - get: docker-image-ecr
      passed: [build-docker-image]
    - get: run-git
    - task: run-script
      image: docker-image-ecr
      config:
        inputs:
          - name: run-git
        platform: linux
        run:
          dir: ./run-git
          path: /bin/bash
          args: ["begin.sh"]
I've introduced errors in a few ways: assertions/raising an exception (ValueError) within an individual task's run() method and within the wrapper, and sys.exit(luigi.retcodes.retcode().unhandled_exception). I also tried failing all tasks. I did this in case the error needed to be generated in a specific manner/location. Though they all produced a failed task, none of them produced an error in the concourse server.
At first, I thought concourse just gives a success if it can run the file or command tasked to it. I'm not sure it's that simple, though. Interestingly, when I run the pipeline on my local computer (luigi --modules luigi-tasks Pipeline1 --local-scheduler) I get an appropriate return code (e.g. 30), but when I run the pipeline within the concourse server, I get a return code of 0 after the luigi tasks complete (from echo $? in the bash script).
Would appreciate any insight into this problem.
My suspicion is that luigi doesn't see your config file with return codes. Its default behavior is to return 0, whether tasks fail or succeed.
This experiment should help to debug that:
Force a failed job: add an exit 1 at the end of begin.sh
Hijack the job: fly -t <target> i -j <pipeline>/<job> -> select run-script
cd ./run-git; /bin/bash begin.sh
Ensure the luigi config is present and named appropriately, e.g. luigi.cfg
Re-run the command: LUIGI_CONFIG_PATH=luigi.cfg bash ./begin.sh
Check output: echo $?
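If the config file does turn out to be invisible to luigi inside the container, one fix is to point luigi at it explicitly through the task's environment. A sketch of the task definition, assuming luigi.cfg is committed at the root of the run-git input (the rest of the job stays unchanged):
    - task: run-script
      image: docker-image-ecr
      config:
        inputs:
          - name: run-git
        platform: linux
        params:
          LUIGI_CONFIG_PATH: luigi.cfg
        run:
          dir: ./run-git
          path: /bin/bash
          args: ["begin.sh"]
Concourse exposes params entries as environment variables, and luigi resolves LUIGI_CONFIG_PATH relative to the working directory (./run-git here), so the [retcode] section should be picked up and a failing task should exit begin.sh with a non-zero code (set -e stops the script at that point).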

Task Control Option - Custom Condition - run task when previous failed or timed out

Is there an option to set the custom condition that will test if the previous task has failed OR timed out?
Currently, I'm using the Only when a previous task has failed option, which works when the task fails. If the task times out, it is not considered an error and the task is skipped.
I need a custom condition then, something like or(failed(), timedout()). Is it possible?
Context
We have this intermittent problem with the npm install task that we can't find a reason for, but it is resolved on the next job run, so we were searching for retry functionality. A partial solution was to duplicate npm install and use the Control Option, but it wasn't working for all "failure" cases. The solution given by @Levi Lu-MSFT seems to cover all our needs (it does retry), but sadly it doesn't solve the problem: the repeated task fails as well.
Sample errors:
20741 error stack: 'Error: EPERM: operation not permitted, unlink \'C:\\agent2\\_work\\4\\s\\node_modules\\.staging\\typescript-4440ace9\\lib\\tsc.js\'',
20741 error errno: -4048,
20741 error code: 'EPERM',
20741 error syscall: 'unlink',
20741 error path: 'C:\\agent2\\_work\\4\\s\\node_modules\\.staging\\typescript-4440ace9\\lib\\tsc.js',
20741 error parent: 's' }
20742 error The operation was rejected by your operating system.
20742 error It's possible that the file was already in use (by a text editor or antivirus),
20742 error or that you lack permissions to access it.
or
21518 verbose stack SyntaxError: Unexpected end of JSON input while parsing near '...ter/doc/TypeScript%20'
21518 verbose stack at JSON.parse (<anonymous>)
21518 verbose stack at parseJson (C:\agent2\_work\_tool\node\8.17.0\x64\node_modules\npm\node_modules\json-parse-better-errors\index.js:7:17)
21518 verbose stack at consumeBody.call.then.buffer (C:\agent2\_work\_tool\node\8.17.0\x64\node_modules\npm\node_modules\node-fetch-npm\src\body.js:96:50)
21518 verbose stack at <anonymous>
21518 verbose stack at process._tickCallback (internal/process/next_tick.js:189:7)
21519 verbose cwd C:\agent2\_work\7\s
21520 verbose Windows_NT 10.0.14393
21521 verbose argv "C:\\agent2\\_work\\_tool\\node\\8.17.0\\x64\\node.exe" "C:\\agent2\\_work\\_tool\\node\\8.17.0\\x64\\node_modules\\npm\\bin\\npm-cli.js" "install"
21522 verbose node v8.17.0
21523 verbose npm v6.13.4
21524 error Unexpected end of JSON input while parsing near '...ter/doc/TypeScript%20'
21525 verbose exit [ 1, true ]
Sometimes it also times out.
It is possible to add a custom condition. If you want the task to be executed when the previous task failed or was skipped, you can use the custom condition not(succeeded()).
However, there is a problem with the above custom condition: it does not work in a multiple-task scenario.
For example, suppose there are three tasks A, B, C. The expected behavior is that Task C runs only when Task B failed, but the actual behavior is that Task C also runs when Task A failed, even if Task B succeeded.
The workaround for this problem is to add a script task that calls the Azure DevOps REST API to get the status of Task B and writes it into a variable with the ##vso[task.setvariable] logging command.
In the example below, add a PowerShell task before Task C (set its control option to "Even if a previous task has failed, even if the build was canceled" so that it always runs) and have it run the following inline script:
$url = "$(System.TeamFoundationCollectionUri)$(System.TeamProject)/_apis/build/builds/$(Build.BuildId)/timeline?api-version=5.1"
$result = Invoke-RestMethod -Uri $url -Headers @{authorization = "Bearer $env:SYSTEM_ACCESSTOKEN"} -ContentType "application/json" -Method get
#Get the task B's task result
$taskResult = $result.records | where {$_.name -eq "B"} | select result
#set the Task B's taskResult to variable taskStatus
echo "##vso[task.setvariable variable=taskStatus]$($taskResult.result)"
So that the script above can access the OAuth token, you also need to click the agent job and check the option Allow scripts to access the OAuth token.
Finally, use the custom condition and(not(canceled()), ne(variables.taskStatus, 'succeeded')) on Task C. Task C will then be executed only when Task B did not succeed.
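If you are on YAML pipelines rather than the classic editor, the same idea translates roughly to the sketch below; the Task B reference, script, and api-version come from the example above, while the exact step layout and display names are assumptions:
- powershell: |
    $url = "$(System.TeamFoundationCollectionUri)$(System.TeamProject)/_apis/build/builds/$(Build.BuildId)/timeline?api-version=5.1"
    $result = Invoke-RestMethod -Uri $url -Headers @{authorization = "Bearer $env:SYSTEM_ACCESSTOKEN"} -ContentType "application/json" -Method get
    $taskResult = $result.records | where {$_.name -eq "B"} | select result
    echo "##vso[task.setvariable variable=taskStatus]$($taskResult.result)"
  displayName: 'Read result of Task B'
  condition: always()
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)
- script: echo Task B failed or was skipped
  displayName: 'C'
  condition: and(not(canceled()), ne(variables.taskStatus, 'succeeded'))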
Although I failed to find a built-in function to detect if a build step is timed out, you can try to emulate this with the help of variables.
Consider the following piece of a YAML pipeline declaration:
steps:
- script: |
    echo Hello from the first task!
    sleep 90
    echo "##vso[task.setvariable variable=timedOut]false"
  timeoutInMinutes: 1
  displayName: 'A'
  continueOnError: true
- script: echo Previous task has failed or timed out!
  displayName: 'B'
  condition: or(failed(), ne(variables.timedOut, 'false'))
The first task (A) is set to time out after 1 minute, but the script inside emulates the long-running task (sleep 90) for 1.5 minutes. As a result, the task times out and the timedOut variable is NOT set to false. Hence, the condition of the task B evaluates to true and it executes. The same happens if you replace sleep 90 with exit 1 to emulate the task A failure.
On the other hand, if task A succeeds, neither of the condition parts of task B evaluates to true, and the whole task B is skipped.
This is a very simplified example, but it demonstrates the idea which you can tweak further to satisfy the needs of your pipeline.

How to manipulate the status of current job-execution from inside of an inline script?

The following code returns an error to rundeck.
#!/bin/bash
exit -1
And rundeck decides how to deal with it by running the next step or changing the execution "status" to "failed".
I would like to modify the status directly by inline script to support more than 2 states. I need "succeeded", "failed" and "nodata" to express that the data are missing.
Is there a way to express this?
There is none. Just as a bash script can only exit with zero or non-zero, a job execution can only end as succeeded or failed.
One possible alternative is to raise an exception with the message nodata and exit with a non-zero code. Rundeck will mark the job as failed with a NonZeroResultCode error, and you should be able to retrieve your nodata message with ${result.message}.

Buildbot: how to skip a step if a file doesn't exist?

I need to skip a build step when building some branches.
More exactly, I want to execute a ShellCommand step only if the script to be ran is present on the source tree.
I tried:
ShellCommand(command=["myscript"],
             workdir="path/to",
             doStepIf=(lambda step: os.path.isfile("path/to/myscript")))
but the step is never executed.
def doesMyCriticalFileExist(step):
    if step.getProperty("myCriticalFileExists"):
        return True
    return False

<factory>.addStep(SetProperty(command='[ -f /path/to/myscript ] && ls -1 /path/to/myscript || exit 0', property='myCriticalFileExists'))
<factory>.addStep(ShellCommand(command=["myscript"], workdir="path/to", doStepIf=doesMyCriticalFileExist))
The better thing to do is to set a property in a previous step and then check that property in your doStepIf callable, as in the snippet above. The os.path.isfile in your version runs at configure time (Buildbot startup), not at run time.
I ended up solving this by letting the step run unconditionally but setting haltOnFailure=False. This way if the required file doesn't exist it fails but doesn't kill the rest of the build.
Just use a simple Python if statement before adding the step:
if condition:
    <factory>.addStep(ShellCommand(command=["myscript"], workdir="path/to"))