Why is my Azure DevOps Migration timing out after several hours?

I have a long-running Migration (don't ask) being run by an Azure DevOps Release pipeline.
Specifically, it's an "Azure SQL Database deployment" task, running a "SQL Script File" Deployment Type.
Despite setting every timeout in the Invoke-Sql Additional Parameters settings to its maximum, my migration still times out.
The error I get is:
We stopped hearing from agent Hosted Agent. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error.
So far it's timed out after:
6:13:15
6:13:18
6:14:41
6:10:19
So "after 6 and a bit hours". It's ~22,400 seconds, which doesn't seem like any obvious kind of number either :)
Why? And how do I fix it?

It turns out that Azure DevOps uses hosted agents to execute each task in a pipeline, and those agents have their own lifetime limits, independent of whatever task they're running.
https://learn.microsoft.com/en-us/azure/devops/pipelines/troubleshooting/troubleshooting?view=azure-devops#job-time-out
A pipeline may run for a long time and then fail due to job time-out. Job timeout closely depends on the agent being used. Free Microsoft hosted agents have a max timeout of 60 minutes per job for a private repository and 360 minutes for a public repository. To increase the max timeout for a job, you can opt for any of the following.
Buy a Microsoft hosted agent which will give you 360 minutes for all jobs, irrespective of the repository used
Use a self-hosted agent to rule out any timeout issues due to the agent
Learn more about job timeout.
So I'm hitting the "360 minute" limit (presumably they give you a little extra on top, so that no-one complains?).
The solution is to use a self-hosted agent (or, of course, make my Migration run in under 6 hours).
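For anyone going the self-hosted route, here is a minimal sketch of registering a self-hosted Linux agent, assuming the agent package has already been downloaded and extracted from your organization's Agent pools page; the organization URL, PAT, pool, and agent name below are placeholders:

# Register this machine as a self-hosted agent (all values below are placeholders).
./config.sh --unattended \
  --url https://dev.azure.com/{your-organization} \
  --auth pat --token {your-personal-access-token} \
  --pool {your-agent-pool} \
  --agent {your-agent-name}

# Run interactively, or install it as a service so it survives reboots.
./run.sh
# sudo ./svc.sh install && sudo ./svc.sh start

Once the release pipeline's agent job targets that pool, the job is no longer subject to the hosted agents' 360-minute cap (though any timeout you have configured on the job itself still applies).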

Related

Is it possible in Azure DevOps Pipelines to wait for enough agents to run the job

We are running into the following issue:
We have a job in our pipeline that runs tests. The tests need to be distributed over 4 agents to run optimally. It can happen that only one agent is available, so the job starts running the whole load on that one agent, which can then time out because the other agents don't become available in time to share the load.
In essence, if we run with 4 agents, the job will run with optimal efficiency.
My question: is it possible to let a job wait for a specific number of agents to become available before starting the tasks in the job?
That's not possible with out-of-the-box features, but you could write a simple PowerShell script that queries your agents' statuses: https://learn.microsoft.com/en-us/rest/api/azure/devops/distributedtask/agents/list?view=azure-devops-rest-7.1
and use includeAssignedRequest:
GET https://dev.azure.com/{organization}/_apis/distributedtask/pools/{poolId}/agents?includeAssignedRequest={includeAssignedRequest}&api-version=7.1-preview.1
If an agent has an assignedRequest, that agent is busy.
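As a rough sketch of that idea (using curl and jq here rather than PowerShell; the organization, pool ID, PAT, and required agent count are placeholders you would substitute), a script at the start of the job could poll the endpoint above and wait until enough agents are idle:

#!/bin/bash
# Placeholders: set ORG, POOL_ID, PAT, and REQUIRED for your environment.
ORG="your-organization"
POOL_ID=1
PAT="your-personal-access-token"
REQUIRED=4

while true; do
  # Agents that are online and have no assignedRequest are idle.
  idle=$(curl -s -u ":${PAT}" \
    "https://dev.azure.com/${ORG}/_apis/distributedtask/pools/${POOL_ID}/agents?includeAssignedRequest=true&api-version=7.1-preview.1" \
    | jq '[.value[] | select(.status == "online" and .assignedRequest == null)] | length')

  if [ "${idle}" -ge "${REQUIRED}" ]; then
    echo "Found ${idle} idle agents; continuing."
    break
  fi
  echo "Only ${idle} idle agents; waiting..."
  sleep 30
done

Keep in mind that if this runs as the first step of the job itself, the agent executing it already counts as busy, so you may want to compare against REQUIRED minus one.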

Azure DevOps Agent - Custom Setup/Teardown Operations

We have a cloud full of self-hosted Azure agents running on custom AMIs. In some cases, I have cleanup operations which I'd really like to do either before or after a job runs on a machine, but I don't want the developer who is waiting on the job to have to wait at the beginning or the end of it (which holds up other stages).
What I'd really like is to have the Azure Agent itself say "after this job finishes, I will run a set of custom scripts that will prepare for the next job, and I won't accept more work until that set of scripts is done".
In a pinch, maybe just a "cooldown" setting would work -- wait 30 seconds before accepting another job. (Then at least a job could trigger some background work before finishing.)
Has anyone had experience with this, or knows of a workable solution?
I suggest three solutions:
Create another pipeline to run the clean-up tasks on the agents. You can add a demand for specific agents (see https://learn.microsoft.com/en-us/azure/devops/pipelines/process/demands?view=azure-devops&tabs=yaml) with Agent.Name -equals [Your Agent Name], and schedule it every few minutes or hours, as you like, using a cron pattern. While this pipeline is running and occupying an agent, the agent being cleaned is not available for other jobs. Note that you can trigger this pipeline from another pipeline, but if both use the same agents they can deadlock. (A sketch of the kind of clean-up script such a pipeline might run follows after this list.)
Create a template containing script tasks with all the clean-up logic, and use it at the end of every job (which you have already discounted).
Rather than using static VMs for agent hosting, use an Azure scale set for self-hosted agents: every time agents are scaled down they are gone, and when they are scaled up they start fresh. This also saves a lot of money, since agents aren't sitting idle when no one is working. We use this option and moved away from static agents. We also use Packer to rebuild the VM image/VHD overnight with the latest patches, required software, and cached Docker images.
ref: https://learn.microsoft.com/en-us/azure/devops/pipelines/agents/scale-set-agents?view=azure-devops
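To make option 1 concrete, here is a hypothetical clean-up script that such a scheduled maintenance pipeline might run on each agent; the specific operations (Docker pruning, removing stale pipeline workspaces) and the work-folder path are assumptions, not something prescribed by Azure DevOps:

#!/bin/bash
# Hypothetical maintenance script run by a scheduled clean-up pipeline on a self-hosted agent.
set -euo pipefail

echo "Pruning unused Docker containers, images, and build cache..."
docker system prune -af || true

echo "Removing pipeline workspaces untouched for 7+ days..."
# The agent keeps numbered per-pipeline workspaces under its _work directory;
# AGENT_WORKFOLDER is set by the agent during a run, with a placeholder fallback here.
WORK_DIR="${AGENT_WORKFOLDER:-/home/agent/myagent/_work}"
find "${WORK_DIR}" -maxdepth 1 -type d -name '[0-9]*' -mtime +7 -exec rm -rf {} +

echo "Clean-up complete."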
For those discovering this question, there's a much better way: run your self-hosted agent with the --once flag, which makes it process a single job and then exit.
You'll need to wrap it in your own bash script, but something like this works:
#!/bin/bash
# Loop forever: do pre-job setup, wait for and run exactly one job, then clean up.
while :
do
    echo "Performing pre-job setup..."
    echo "Waiting for job..."
    ./run.sh --once
    echo "Cleaning up..."
    sleep 2
done
Another option would be to use a ScaleSet VM setup which preps a new agent for each run and discards the VM when the run is done. It can prepare new VMs in the background while the job is running.
And I suspect you could implement your own IMaintenanceProvider...
https://github.com/microsoft/azure-pipelines-agent/blob/master/src/Agent.Worker/Maintenance/MaintenanceJobExtension.cs#L53

Azure Data Factory: what happens if Self-Hosted IR is down

Let's say we need to maintain and reboot a Self-Hosted Integration Runtime machine, and we only have one node. At the same time, some pipelines may be running. What will happen to activities that are normally scheduled on this SHIR? Will they fail immediately once it's not available, or will they remain in the "Waiting" state, up to their maximum Timeout value, until the runtime comes back up?
I'd assume it's the latter but wanted to confirm.
I did a quick test by stopping the Self-hosted IR service.
In ADF, the test connection from the linked service returned an error, and a copy activity that involves the self-hosted IR failed immediately (screenshots omitted).

Timeout for a whole pipeline w/manual steps in Azure DevOps

I have created a release pipeline in Azure DevOps with several stages that deploy to each environment. On some of the environments (Test and Production), I have manual approval tasks (not set in YAML, but on the environment). If the approval task is not performed within a set time, I want the whole pipeline to cancel.
I have set a timeoutInMinutes on the stage itself, however, the timeout never starts, as the stage is waiting for the approval before it can start at all.
I haven't found a way to set a timeout on the approval/review activity, nor have I found a way to have a different stage/job independent of the others sit and wait for a timeout and cancel the job with a logging command ##vso[task.complete result=Canceled;]DONE
The pipeline just sits and waits forever. Any ideas?
Yes, you are right; I could reproduce this issue on my side.
As we know, the purpose of the timeout is:
To avoid taking up resources when your job is hung or waiting too long, it's a good idea to set a limit on how long your job is allowed to run.
When we set checks on the stage, our job is in the Pending state, not Running, so the timeout we set has not yet started; it only starts counting once the job is running. So we need a timeout for the checks themselves, just like the timeout for pre-deployment approvals.
I could not find any solution or workaround after spending a long time on this, but after confirming with the Azure DevOps team, I found that this is a highly prioritized feature request that they are tracking.
The status of this feature request, a timeout for the checks, is currently In Progress, targeted at sprint 158.
I believe this feature will reach us soon; please keep an eye on the Azure DevOps release notes. Thank you for helping us build a better Azure DevOps.
Hope this helps.

Can one private agent from VSTS be installed on multiple VM's?

I have installed an agent on a VM and configured a CI build pipeline. The pipeline is triggered and works perfectly fine.
Now I want to use the same build pipeline and the same agent, but a different VM. Is this possible?
How will the builds execute, and onto which VM will the source be copied?
Thank you.
Like the others, I'm not sure what you're trying to do, but I also think that using the same agent across multiple machines is not possible.
However, if you need to alternate or choose easily between VMs, you could set up an individual agent queue with one agent for each of the VMs used for this scenario. That way you can choose the agent pool at queue time via the agent queue dropdown. But that only works if you're queuing manually, not in a typical CI scenario; there you would have to edit the definition to pin a particular VM each time you want to swap VMs.
No. Private agents are supposed to have a unique name and are assigned to an Agent Pool/Queue. They poll the VSTS/Azure DevOps server to see whether they have a job to do, and then they execute it. If you clone a machine that has the same private build agent on it, then in theory whichever agent picks up the job will execute it, but that is only theory; I really don't know how the agent queues would handle it.
It depends on what you want to do.
If you want to spread the workload, e.g. 2 build servers with builds going to whichever server isn't busy, then you would create one Agent Pool/Queue. Create a private agent on one server and register it to that pool; then, on the second server, un-register its agent and re-register it, adding it to the SAME pool. (A sketch of that registration flow follows after this answer.)
If you want to do work on 2 servers at exactly the same time, like deploying to both servers simultaneously, then you would create a 'Deployment Group' and add both servers to it. Unregister both agents from the Agent Pool/Queue, then copy the PowerShell registration snippet from your 'Deployment Group' and run it on each machine. You can then use the deployment group in your Release Pipeline to deploy in parallel, which takes less time.
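To illustrate the shared-pool option above, a rough sketch of moving the second server's agent into the same pool, assuming the agent software is already installed on that server; the PAT, organization URL, pool, and agent name are placeholders:

# On the second server, from the agent's installation directory:
# un-register the agent from whatever pool it is currently in...
./config.sh remove --auth pat --token {your-personal-access-token}

# ...then register it into the SAME pool as the first server, with a unique agent name.
./config.sh --unattended \
  --url https://dev.azure.com/{your-organization} \
  --auth pat --token {your-personal-access-token} \
  --pool {shared-pool-name} \
  --agent {unique-name-for-this-vm}

./run.sh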
You could set up a variable in the pipeline so you can specify the name of the VM at build-time.
Also, once you have one or more agents, you would add them to an agent pool. When a build runs, it will choose one agent from the pool and use that.