how to kill a process on devops (ideally with a timeout) - azure-devops

I have a complex devops build script in yaml. Is there some way that if a given step takes too much time the process is killed (or some task is executes which kills certain processed).
This is would be useful in our case where we have large tests suites in several DLLs. I am seeing often that some tests fail and after devops hangs. I would like to kill the testrunner and other processes which may be hanging with (and also without) a timeout.
Is this possible on devops?

You can specify timeoutInMinutes and cancelTimeoutInMinutes for the job:
jobs:
- job: Test
timeoutInMinutes: 10 # how long to run the job before automatically cancelling
cancelTimeoutInMinutes: 2 # how much time to give 'run always even if cancelled tasks' before stopping them
More information: https://learn.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=yaml#timeouts

Related

Is there a way to configure retries for Azure DevOps pipeline tasks or jobs?

Currently I have a OneBranch DevOps pipeline that fails every now and then while restoring packages. Usually it fails because of some transient error like a socket exception or timeout. Re-trying the job usually fixes the issue.
Is there a way to configure a job or task to retry?
Azure Devops now supports the retryCountOnTaskFailure setting on a task to do just this.
See this page for further information:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2021/pipelines/sprint-195-update
Update:
Automatic retries for a task was added and when you read this it should be available for usage.
It can be used as follow:
- task: <name of task>
retryCountOnTaskFailure: <max number of retries>
...
Here are a few things to note when using retries:
The failing task is retried immediately.
There is no assumption about the idempotency of the task. If the task has side-effects (for instance, if it created an external resource partially), then it may fail the second time it is run.
There is no information about the retry count made available to the task.
A warning is added to the task logs indicating that it has failed before it is retried.
All of the attempts to retry a task are shown in the UI as part of the same task node.
Original answer:
There is no way of doing that with native tasks. However, if you can script then you can put such logic inside.
You could do this for instance in this way:
n=0
until [ "$n" -ge 5 ]
do
command && break # substitute your command here
n=$((n+1))
sleep 15
done
However there is no native way of doing this for regular tasks.
Automatically retry a task in on roadmap so it could change in near future.

AzDO ManualValidation step failing in YAML pipeline with no explanation of why

I'm converting to a full YAML AzDO pipeline and need to wait for manual validation for certain stages of my pipeline. Added the new ManualValidation task into a serverless job, however it fails immediately with no details about why. I did add a Delay task in there as well (just as a sanity check to make sure my serverless job was actually running successfully), and it runs fine.
- job: waitForValidation
displayName: Wait for external validation
pool: Server
timeoutInMinutes: 4320 # job times out in 3 days
steps:
- task: Delay#1
inputs:
delayForMinutes: '1'
- task: ManualValidation#0
timeoutInMinutes: 1440 # task times out in 1 day
inputs:
notifyUsers: |
me#email.com
you#email.com
instructions: 'Please validate deployment can continue and resume'
onTimeout: 'reject'
These are the docs I'm using:
https://learn.microsoft.com/en-us/azure/devops/pipelines/tasks/utility/manual-validation?view=azure-devops&tabs=yaml
I also dropped into the GitHub project just to make sure the task is still version 0 (it is).
Suggestions on why this might be failing and/or ways I can get some more details in the pipeline about WHY it failed?
Turns out we are actually using AzDO Server, not AzDO Services (thanks, Microsoft for naming them so similarly) and this task is not yet available in the Server version :(
For anyone also frustrated by this lack of functionality on-prem, here’s the documentation on using Deployment Jobs and some about Environments
We are able to get most of the functionality we were looking for this way, thou it does require setting up environments.

How can I kill (not cancel) an errant Azure Pipeline run, stage, job, or task?

I want to know how to kill an Azure Pipeline task (or any level of execution - run, stage, job, etc.), so that I am not blocked waiting for an errant pipeline to finish executing or timeout.
For example, canceling the pipeline does not stop it immediately if a condition is configured incorrectly. If the condition resolves to true the task will execute even if the pipeline is cancelled.
This is really painful if your org/project only has 1 agent. :(
How can I kill (not cancel) an errant Azure Pipeline run, stage, job, or task?
For the hosted agent, we could not kill that azure pipeline directly, since we cannot directly access the running machine.
As workaround, we could reduce the time that the task continues to run after the job is cancelled by setting a shorter Build job cancel timeout in minutes:
For example, I create a pipeline with task, which will still run for 60 minutes after the job is cancelled. But if I set the value of Build job cancel timeout in minutes to 2 mins, the azure pipeline will be cancelled completely.
For the private agent, we could run services.msc, and look for "VSTS Agent (name of your agent)". Right-click the entry and then choose restart.

azure devops build pipeline reduce the timeout to 30 minutes

Is there a way to change the timeout for build pipeline, currently the pipeline time's out after 60 mintues. I want to reduce it to 30 minutes.
I looked at all the organization settings and project settings, but not able to find anything on the UI
Or else can it be set from YAML?
For a YAML pipeline the documentation says you can write
jobs:
- job: Test
timeoutInMinutes: 10 # how long to run the job before automatically cancelling
cancelTimeoutInMinutes: 2 # how much time to give 'run always even if cancelled tasks' before stopping them
timeoutInMinutes: 0 should also work for individual tasks, and 0 means max value (infinite for self-hosted agents).
azure devops build pipeline reduce the timeout to 30 minutes
Edit the pipeline you want to modify. On the Options tab, there is an option Build job timeout in minutes, which you can set the Build job timeout, the default value is 60 minutes.
This timeout are including all tasks in your build pipeline rather than a particular job, if one of your build step out of time. Then the whole build definition will be canceled by the server. Certainly, the whole build fails and all subsequent steps are aborted.
As per documentation ,
On the Options tab you can specify default values for all jobs in the
pipeline. If you specify a non-zero value for the job timeout, then it
overrides any value that is specified in the pipeline options. If you
specify a zero value, then the timeout value from the pipeline options
is used. If the pipeline value is also set to zero, then there is no
timeout.
more on,
https://learn.microsoft.com/en-us/azure/devops/pipelines/process/phases?view=azure-devops&tabs=classic&viewFallbackFrom=vsts#timeouts

Does rundeck support jobs dependencies?

I've been searching for days on how to layout a rundeck workflow with job dependencies. what I need to do is to have 3 jobs: job-1 and job-2 are scheduled to run in parallel while job-3 will only be triggered after the completion of both job-1, and job-2. assuming that job-1 and job-2 have different execution times.
I tried using job state conditionals to do that but it seems that the condition if not met will halt or fail only. My idea is to halt the execution until all the parent jobs completes and then resume the workflow.
You can achieve this by compiling a master job which includes 2 steps:
step: job-1 and job-2 as a sub-job which includes both (run in parallel if node oriented execution is selected)
step: job-3
But not all 3 in in the same flow.
Right now you can use Job State Conditional feature for that: https://docs.rundeck.com/2.9.4/plugins-user-guide/bundled-plugins.html#job-state-plugin
Rundeck cannot do this for you automatically. You can set a scheduler for job-3 to run after the max timestamp of job1 or job2. Enable "retry" for job3 incase the dependencies would be fail.