Is it possible to rerun a failed pipeline on a different agent? - azure-devops

Should be a straightforward question, we had a pipeline run that failed on a hosted agent and we have been experimenting with switching to self-hosted as a test. Is there any way to re-run the failed job and get it to use a different agent pool? I suspect the answer is no but wanted to double check
Cheers

As far as I know not. Why because each run creates kind of snapshot of variables/parameters. So even if you parameterize pool it will be bound to value from the snapshot.

Related

Is it possible in Azure DevOps Pipelines to wait for enough agents to run the job

We are running into the following issue:
We have a job in our pipeline that runs tests. The number of tests need to be distributed over 4 agents to run optimal. It can happen that only one agent is available and the job will start to run all the load on that specific agent, which can then time-out because it takes too long for other agents to become available in time to share in the load.
In essence, if we run with 4 agents, the job will run with optimal efficiency.
My question: is it possible to let a job wait for a specific number of agents to become available before starting the tasks in the job?
That`s not possible through out-of-box features.... But you may create a simple PowerShell script that will query your agents statuses: https://learn.microsoft.com/en-us/rest/api/azure/devops/distributedtask/agents/list?view=azure-devops-rest-7.1
and use includeAssignedRequest
GET https://dev.azure.com/{organization}/_apis/distributedtask/pools/{poolId}/agents?includeAssignedRequest={includeAssignedRequest}&api-version=7.1-preview.1
if you see assignedRequest, your build agent is busy...

Waiting on job "x" to finish in pipeline "1" before running job "x" in pipeline "2" (Azure DevOps)

We're using SonarQube for tests, and there's one token it uses, as long as one pipeline is running, it goes fine, but if I run different pipelines (all of them have E2E tests as final jobs), they all fail, because they keep calling a token that expires as soon as its used by one pipeline (job). Would it be possible to have -all- pipelines pause at job "x" if they detect some pipeline running job "x" already? The jobs have same names across all pipelines. Yes, I know this is solved by just running one pipeline at a time, but that's not what my devs wanna do.
The best way to make jobs run one by one is set demands for your agent job to run on a specific self-hosted agent. Just as below, set a user-defined capabilities for the self-hosted agent and then require run on the agent by setting demands in agent job.
In this way, agent jobs will only run on this agent. Build will run one by one until the previous one complete.
Besides, you could control if a job should run by defining approvals and checks. By using Invoke REST API check to make a call to a REST API such as Gets a list of builds, and define the success criteria as build count is zero, then, next build starts.

Azure DevOps Agent - Custom Setup/Teardown Operations

We have a cloud full of self-hosted Azure Agents running on custom AMIs. In some cases, I have some cleanup operations which I'd really like to do either before or after a job runs on the machine, but I don't want the developer waiting for the job to wait either at the beginning or the end of the job (which holds up other stages).
What I'd really like is to have the Azure Agent itself say "after this job finishes, I will run a set of custom scripts that will prepare for the next job, and I won't accept more work until that set of scripts is done".
In a pinch, maybe just a "cooldown" setting would work -- wait 30 seconds before accepting another job. (Then at least a job could trigger some background work before finishing.)
Has anyone had experience with this, or knows of a workable solution?
I suggest three solutions
Create another pipeline to run the clean up tasks on agents - you can also add demand for specific agents (See here https://learn.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml) by Agent.Name -equals [Your Agent Name]. You can set frequency to minutes, hours, as you like using cron pattern. As while this pipeline will be running and taking up the agent, the agent being cleaned will not be available for other jobs. Do note that you can trigger this pipeline from another pipeline, but if both are using the same agents - they can just get deadlocked.
Create a template containing scripts tasks having all clean up logic and use it at the end of every job (which you have discounted).
Rather than using static VM's for agent hosting, use Azure scaleset for Self hosted agents - everytime agents are scaled down they are gone and when scaled up they start fresh. This saves a lot of money in terms of sitting agents not doing anything when no one is working. We use this option and went away from static agents. We have also used packer to create the VM image/vhd overnight to update it with patches, softwares required, and docker images cached.
ref: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops
For those discovering this question, there's a much better way: run your self-hosted agent with the --once flag, documented here.
You'll need to wrap it in your own bash script, but something like this works:
while :
do
echo "Performing pre-job setup..."
echo "Waiting for job..."
./run.sh --once
echo "Cleaning up..."
sleep 2
done
Another option would be to use a ScaleSet VM setup which preps a new agent for each run and discards the VM when the run is done. It can prepare new VMs in the background while the job is running.
And I suspect you could implement your own IMaintenanceProvirer..
https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/Maintenance/MaintenanceJobExtension.cs#L53

Azure Data Factory: what happens if Self-Hosted IR is down

Let's say we need to maintain and reboot a Self-Hosted Integration Runtime machine. We only have one node. At the same, some pipelines may be running. What will happen with activities that are normally scheduled on this SHIR. Will they fail immediately once it's not available, or will they remain in the "waiting" state up to their maximum Timeout value, until a runtime comes back up?
I'd assume it's the latter but wanted to confirm.
I did a quick try out by stopping the Self-hosted IR service.
In ADF, the test connection from linked services return error:
Copy activity that involves the self-hosted IR failed immediately:

Azure worker - Complete the release pipeline

I have a release pipeline composed of several stages
The tfs server have only one worker.
When i start a release, the worker run stages randomly.
The problem is : if someone else start another pipeline, sometimes the worker take the other pipeline before getting back to the next stage.
Is there a way to lock the worker on the entire release pipeline ?
When running a pipeline, a job is the unit of scale. This means each job can potentially run on a different agent.
Each job runs on an agent. A job represents an execution boundary of a set of steps. All of the steps run together on the same agent.
Next to that,
A stage is a logical boundary in the pipeline. It can be used to mark separation of concerns (for example, Build, QA, and production).
More information: Azure Pipelines - Key concepts.
You should also have a look at the Pipeline run sequence. It clearly explains how the entire pipeline-process works.
Whenever Azure Pipelines needs to run a job, it will ask the pool for an agent.
If the availability of the agent is the issue, you might want to add more agents to the pool. If needed, you can even run multiple agents on the same machine.
Although multiple agents can be installed per machine, we strongly suggest to only install one agent per machine. Installing two or more agents may adversely affect performance and the result of your pipelines.