Deployment started failing - decryption operation failed - azure-devops

The deployment task of a pipeline had been working fine until yesterday; there are no changes that I'm aware of. This is for a build that uses a deployment agent on an on-premises target. I'm not sure where to look, other than possibly reinstalling the build agent via the script?
2021-08-11T18:36:20.7233450Z ##[warning]Failed to download task 'DownloadBuildArtifacts'. Error The decryption operation failed, see inner exception.
2021-08-11T18:36:20.7243097Z ##[warning]Inner Exception: {ex.InnerException.Message}
2021-08-11T18:36:20.7247393Z ##[warning]Back off 28.375 seconds before retry.
2021-08-11T18:36:51.6834124Z ##[error]The decryption operation failed, see inner exception.

Please check the announcement here.
The updated implementation of BuildArtifact tasks requires an agent upgrade, which should happen automatically unless automatic upgrades have been specifically disabled or the firewalls are incorrectly configured.
If your agents are running in firewalled environments that did not follow the linked instructions, they may see failures when updating the agent or in the PublishBuildArtifacts or DownloadBuildArtifacts tasks until the firewall configuration is corrected.
A common symptom of this problem is sudden errors relating to SSL handshakes or artifact download failures, generally on deployment pools targeted by Release Management definitions. Alternatively, if agent upgrades have been blocked, you might observe that releases wait for an agent in the pool that never arrives, or that agents go offline halfway through their update (the latter is related to environments that erroneously block the agent CDN).
To fix this, update your self-hosted agents.
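If automatic upgrades have been disabled, one way to force an update is to queue a job that demands a newer agent version. A minimal sketch, assuming a self-hosted pool named Default and that the -gtVersion demand operator is accepted for Agent.Version (the pool name and version number are placeholders, not values from this thread):

# Sketch: demanding a minimum agent version prompts Azure DevOps to upgrade
# any older agent in the pool before running the job.
pool:
  name: Default                        # assumed self-hosted pool name
  demands:
  - Agent.Version -gtVersion 2.190.0   # assumed minimum version; adjust to the announcement

steps:
- script: echo "agent is up to date"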

Related

Azure Data Factory: what happens if Self-Hosted IR is down

Let's say we need to maintain and reboot a Self-Hosted Integration Runtime machine, and we only have one node. At the same time, some pipelines may be running. What will happen to activities that are normally scheduled on this SHIR? Will they fail immediately once it's not available, or will they remain in the "waiting" state up to their maximum timeout value, until a runtime comes back up?
I'd assume it's the latter but wanted to confirm.
I did a quick test by stopping the Self-hosted IR service.
In ADF, the test connection from linked services returns an error.
A copy activity that involves the self-hosted IR failed immediately.

Why is my Azure DevOps Migration timing out after several hours?

I have a long-running Migration (don't ask) being run by an Azure DevOps Release pipeline.
Specifically, it's an "Azure SQL Database deployment" activity, running a "SQL Script File" Deployment Type.
Despite having configured maximums for all the timeouts in the Invoke-Sql Additional Parameters settings, my migration is still timing out.
Specifically, I get:
We stopped hearing from agent Hosted Agent. Verify the agent machine is running and has a healthy network connection. Anything that terminates an agent process, starves it for CPU, or blocks its network access can cause this error.
So far it's timed out after:
6:13:15
6:13:18
6:14:41
6:10:19
So "after 6 and a bit hours". It's ~22,400 seconds, which doesn't seem like any obvious kind of number either :)
Why? And how do I fix it?
It turns out that Azure DevOps uses hosted agents to execute each task in a pipeline, and those agents have innate lifetimes, independent of whatever task they're running.
https://learn.microsoft.com/en-us/azure/devops/pipelines/troubleshooting/troubleshooting?view=azure-devops#job-time-out
A pipeline may run for a long time and then fail due to job time-out. Job timeout closely depends on the agent being used. Free Microsoft hosted agents have a max timeout of 60 minutes per job for a private repository and 360 minutes for a public repository. To increase the max timeout for a job, you can opt for any of the following.
Buy a Microsoft hosted agent which will give you 360 minutes for all jobs, irrespective of the repository used
Use a self-hosted agent to rule out any timeout issues due to the agent
Learn more about job timeout.
So I'm hitting the "360 minute" limit (presumably they give you a little extra on top, so that no one complains?).
The solution is to use a self-hosted agent (or make my Migration run in under 6 hours, of course).
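For reference, this is roughly what the self-hosted route looks like in a YAML pipeline; Default is a placeholder pool name, and timeoutInMinutes: 0 lifts the per-job limit on self-hosted agents (hosted agents keep their cap regardless):

# Sketch: run the long migration on a self-hosted agent with no job timeout.
jobs:
- job: RunMigration
  pool:
    name: Default          # assumed self-hosted pool
  timeoutInMinutes: 0      # 0 = unlimited on self-hosted agents
  steps:
  - script: echo "long-running migration goes here"   # placeholder for the SQL deployment task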

Running issue when build definition executes using Microsoft-hosted agent?

When I execute the build definition using a Microsoft-hosted agent, the error below appears:
Error: The agent request is not running because all potential agents are running other requests. The current position in the queue: 1
The build has been queued for days and is stuck. Can someone please help me get past this issue?
Scenarios I tried:
reinstalling the self-hosted agent and reconfiguring it again;
now trying with the Microsoft-hosted agent pool "Azure Pipelines".
If your pipeline queues but never gets an agent, check the following items:
Parallel job limits - no available agents or you have hit your free limits
You don't have enough concurrency
Your job may be waiting for approval
All available agents are in use
Demands that don't match the capabilities of an agent (see the sketch at the end of this answer)
Check Azure DevOps status for a service degradation
Please check this document for some more details.
Note: please also check whether the Microsoft-hosted pool "Azure Pipelines" is stuck on builds that you cancelled and deleted.
If the above does not help, please share more info about your definition so we can find the reason for this issue:
Agent pool info:
Execution plan info:
Parallel jobs info (and click the View in-progress jobs link):
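On the demands point, a quick sanity check, sketched under the assumption that the definition targets the Microsoft-hosted "Azure Pipelines" pool; MyCustomCapability is a hypothetical capability name, not something from this thread:

# Sketch: a demand mismatch leaves a request queued with no agent to take it.
pool:
  name: 'Azure Pipelines'
  vmImage: 'windows-latest'   # must be an image the hosted pool actually provides
# demands:
# - MyCustomCapability        # custom demands only match self-hosted agents that advertise them

Microsoft-hosted agents don't support custom demands, so a definition like the commented-out part would wait in the queue forever.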

How the isUpgrade setting affects the deployment process in the Service Fabric Application Deployment task in Azure DevOps

Azure DevOps has a standard task for deploying apps to Service Fabric. The task is named Service Fabric Application Deployment and is documented here. Among other settings, it contains an optional boolean isUpgrade setting (default value 'true'). I tried setting it explicitly to true and false, but I did not find any difference in the behavior of the task. In both cases the deployment was successful, all previously deployed packages were still provisioned, and the Azure Pipelines logs were the same. The deployment time was the same, too.
My question is: what does this setting affect? Maybe somebody has used it in their CI pipelines.
There are 2 types of deployment in Service Fabric. The isUpgrade flag controls which type of deployment you are executing.
Regular
Basically, this removes the old application and deploys the new version. So if you have stateful services, this will remove all state. You will have downtime when you do a regular deployment.
Upgrade
An upgrade will do a lot of things: it will keep the state, do health checking, and make sure the services stay available. It does a rollback when the health check fails, ...
If your application or services didn't change, nothing changes in your cluster.
Typically an upgrade will take more time (this is highly dependent on your health check rules). See the application upgrade flowchart.
More info about the 2 types
If you look at the code of the task, you see that isUpgrade only takes effect if overridePublishProfileSettings is true. Otherwise the PublishProfile.xml is used.
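Putting the two settings together, a minimal sketch of the task in YAML (input names as exposed by the ServiceFabricDeploy task; the connection name and package path are placeholders):

# Sketch: isUpgrade is only honored when overridePublishProfileSettings is true;
# with the default (false), the upgrade settings come from PublishProfile.xml.
- task: ServiceFabricDeploy@1
  inputs:
    applicationPackagePath: '$(Pipeline.Workspace)/drop/MyApp.sfpkg'   # placeholder path
    serviceConnectionName: 'my-sf-cluster'                             # placeholder connection
    overridePublishProfileSettings: true
    isUpgrade: true          # rolling upgrade that preserves state; false removes and redeploys
    upgradeMode: Monitored   # health-checked upgrade with rollback on failure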

How to auto retry deployments to agents when they come online again (after having been offline)

When using Azure Pipelines and deployment groups, it is possible to re-deploy the "last successful" release to new agents with given "tags" using the instructions found here:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2018/jul-10-vsts#automatically-deploy-to-new-targets-in-a-deployment-group
My issue arises when releasing to a deployment group consisting of 3 machines: 2 are online and 1 is periodically offline. In this situation my release fails while the 1 machine is offline. This would be OK by me if Azure Pipelines retried the deployment once the offline machine comes back online. I thought this would work in the same way as "new targets", but I still haven't figured out how.
This is just a small test. When going to production my deployment group will consist of hundreds of machines, and not all of them will be online at the same time.
So - Is it possible to automate the process to ensure all machines eventually will be up to date when all of them have been online?
Octopus Deploy seems to have this feature:
https://help.octopusdeploy.com/discussions/questions/9351-possibility-to-deploy-when-agent-become-online
https://octopus.com/docs/deployment-patterns/elastic-and-transient-environments/deploying-to-transient-targets
Status after a failed deployment (with the target online again):
Well, in general the queued deployments will be automatically triggered once the agent is online. But failed deployments have to be re-deployed manually; there is no way to retry them automatically when the agent comes back online...
Based on my test, to redeploy to all not-yet-updated agents, you have to remove the target machines that passed in the previous deployment from the deployment group...
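If manual redeployment becomes tedious at scale, one workaround, sketched here rather than a supported feature, is to script the same status change that the Redeploy button makes, using the Release REST API from a scheduled pipeline. The organization, project, IDs, and api-version below are placeholders/assumptions to verify against the Update Release Environment documentation:

# Hypothetical sketch: re-trigger a failed release environment via REST.
# Discovering which releases/environments failed is omitted; IDs are placeholders.
steps:
- pwsh: |
    $uri = "https://vsrm.dev.azure.com/myorg/myproject/_apis/release/releases/123/environments/456?api-version=7.1-preview.7"
    $body = '{ "status": "inProgress" }'   # the same status change the Redeploy button makes
    Invoke-RestMethod -Uri $uri -Method Patch -Body $body -ContentType 'application/json' `
      -Headers @{ Authorization = "Bearer $env:SYSTEM_ACCESSTOKEN" }
  env:
    SYSTEM_ACCESSTOKEN: $(System.AccessToken)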