Azure Data Factory Pipeline stuck in "PendingUpdate" provisioning status on deployment - azure-data-factory

I have been having a problem for the last couple of hours: upon deployment of a pipeline, it goes into the provisioning state and then gets stuck there.
It then eventually fails with one of these two errors: "Failed to reach service" or "Internal Server Error."
The state of the pipeline is stuck at "Pending Update".
I have even created a Data Factory from scratch, created new linked services and datasets, and created the pipeline within that data factory, and the same thing happens.
What could be causing this issue?
I should add that I had been using Data Factory and deploying pipelines without any problems; this issue appeared suddenly.

We have had occasional service incidents in the past that have transiently impacted the provisioning of ADF entities. If you are still experiencing this issue, please open a support ticket with the specific info for your data factory. We appreciate your feedback!

Related

Container App Environment creation timing out

The company I work for has just started migrating to the cloud. We've successfully deployed a number of resources into Azure using Terraform and Pipelines.
Where we are running into issues is deploying a Container App Environment. We have code that was working in a less locked-down environment (set up for a proof of concept), but we are now having issues using that code in our go-forward environment.
When deploying, the Container App Environment spends 30 minutes attempting to create before it returns a context deadline exceeded error. Looking in the Azure Portal, I can see the resource in the "Waiting" provisioning state, and I can also see the MC_ and AKS resources that get generated. It then fails around 4 hours later.
Any advice?
I suspect it's related to security on the Virtual Network that the subnets sit on, but I'm not seeing any logs on the deployment to confirm it. The original subnets had a Network Security Group (NSG) assigned, and I configured the rules that Microsoft provides; I then added a couple of subnets without an NSG assigned, but had no luck.
My next step is to try provisioning it via the GUI and see if that works.
I managed to break our build in the "anything goes" environment.
The root cause is an incomplete configuration of the Virtual Network, which has custom DNS entries. This has now been passed to our network architects to resolve. If I can get more details on the fix they apply, I'll include them here for anyone else who runs into the issue.
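For anyone hitting the same wall, a quick way to inspect whether the VNet is using custom DNS servers (and which subnets carry an NSG) is the Azure Python management SDK. This is only a minimal sketch; the subscription, resource group, and VNet names are placeholders for your own values:

```python
# Minimal sketch: inspect a VNet's DNS configuration and per-subnet NSGs.
# Subscription, resource group, and VNet names below are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.network import NetworkManagementClient

SUBSCRIPTION_ID = "<subscription-id>"
client = NetworkManagementClient(DefaultAzureCredential(), SUBSCRIPTION_ID)

vnet = client.virtual_networks.get("my-resource-group", "my-vnet")

# Custom DNS servers show up here; empty/None means Azure-provided DNS.
dns = vnet.dhcp_options.dns_servers if vnet.dhcp_options else None
print("Custom DNS servers:", dns or "none (Azure default)")

for subnet in vnet.subnets:
    nsg = subnet.network_security_group.id if subnet.network_security_group else None
    print(f"Subnet {subnet.name}: NSG = {nsg or 'none'}")
```

If custom DNS servers are listed, that at least narrows the investigation to name resolution inside the VNet rather than the NSG rules.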

CloudFormation: stack is stuck, CloudTrail shows repeating DeleteNetworkInterface events

I am deploying a stack with CDK. It gets stuck in CREATE_IN_PROGRESS. CloudTrail shows the following events repeating in the logs:
DeleteNetworkInterface
CreateLogStream
What should I look at next to continue debugging? Is there a known reason for this to happen?
I saw the exact same issue with the deployment of a CDK-based ECS/Fargate stack.
In my case, I was able to diagnose the issue by following the AWS support article https://aws.amazon.com/premiumsupport/knowledge-center/cloudformation-stack-stuck-progress/
What specifically diagnosed and then resolved it for me:
I updated my ECS service to set its desired task count to 0. At that point the CloudFormation stack completed successfully.
From that, it became obvious that the actual issue was related to the creation of the initial task for my ECS service. I was able to diagnose that by reviewing the output in the Deployments and Events tabs of the ECS service in the AWS Management Console. In my case, task creation was failing because of an issue with accessing the associated ECR repository. There could obviously be other reasons, but they should show up there.
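If you prefer to apply the same workaround from a script rather than the console, a minimal boto3 sketch might look like the following; the cluster and service names are placeholders:

```python
# Minimal sketch: scale an ECS service to zero so the CloudFormation stack can
# finish, then read the service events to see why task creation was failing.
# Cluster and service names are placeholders -- substitute your own.
import boto3

ecs = boto3.client("ecs")

# Stop the service from trying (and failing) to start tasks.
ecs.update_service(cluster="my-cluster", service="my-service", desiredCount=0)

# Recent service events usually explain the failure (e.g. ECR pull errors).
response = ecs.describe_services(cluster="my-cluster", services=["my-service"])
for event in response["services"][0]["events"][:10]:
    print(event["createdAt"], event["message"])
```

The events output is the scripted equivalent of the Events tab mentioned above, so it should surface the same ECR or networking errors.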

Azure DevOps Server to Services Migration Time Longer for ProductionRun?

A few days ago I successfully completed a DryRun for our organization's migration from Azure DevOps Server to Azure DevOps Services and it took ~12 hours. Last night I kicked off the ProductionRun migration and it's currently still on Step 1 of 7 after 14 hours. Does anyone know if this is normal? Does the ProductionRun typically take longer? I was hoping it would be ready by this morning.
There is no fixed duration for a ProductionRun or DryRun migration. The time taken is affected by many factors, such as the collection database size, server queue wait, service events, etc.
Thus we cannot guarantee that the ProductionRun migration will also complete within 12 hours like the DryRun you mentioned.
In addition, you can monitor the Azure DevOps Status history page for the latest service reports. If there is currently a service event in the region of your target organization, the migration will be delayed; the same applies if there is a service deployment in the same region.
BTW, if you encounter an unexpected migration failure, you can follow this doc: Migrate data from Azure DevOps Server to Azure DevOps Services, and migrate to Azure DevOps again. You can also file a new ticket here and provide the failed import identifier; the support engineer will contact the product group to check it in the backend and investigate the detailed failure logs.
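If you would rather check service health programmatically while the migration runs, a small sketch against the public status endpoint could look like this. The URL, api-version, and response shape are assumptions based on the documented Azure DevOps Status REST API, so verify them before relying on the output:

```python
# Minimal sketch: query the Azure DevOps status health endpoint to see whether
# there is an active service event. URL, api-version, and response fields are
# assumptions -- check the Status REST API docs before depending on this.
import requests

STATUS_URL = "https://status.dev.azure.com/_apis/status/health"

resp = requests.get(STATUS_URL, params={"api-version": "7.1-preview.1"}, timeout=30)
resp.raise_for_status()
health = resp.json()

# Overall health rollup; anything other than "healthy" may explain delays.
print("Overall status:", health.get("status", {}).get("health"))
```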

How to auto retry deployments to agents when they come online again (after having been offline)

When using Azure Pipelines and deployment groups, it is possible to re-deploy the "last successful" release to new agents with given "tags" using the instructions found here:
https://learn.microsoft.com/en-us/azure/devops/release-notes/2018/jul-10-vsts#automatically-deploy-to-new-targets-in-a-deployment-group
My issue arises when releasing to a deployment group consisting of 3 machines: 2 are online and 1 is periodically offline. In this situation my release fails while that machine is offline. This would be fine by me if Azure Pipelines retried the deployment when the offline machine comes back online. I thought this would work in the same way as "new targets", but I still haven't figured out how.
This is just a small test. When we go to production my deployment group will consist of hundreds of machines, and not all of them will be online at the same time.
So - is it possible to automate the process so that all machines are eventually up to date once they have all been online?
Octopus Deploy seems to have this feature:
https://help.octopusdeploy.com/discussions/questions/9351-possibility-to-deploy-when-agent-become-online
https://octopus.com/docs/deployment-patterns/elastic-and-transient-environments/deploying-to-transient-targets
[Screenshot: status after a failed deployment, with the target online again]
Well, in general queued deployments will be triggered automatically once the agent is online. But failed deployments have to be re-deployed manually; there is no way to retry them automatically when the agent comes back online...
Based on my test, to redeploy to all not-yet-updated agents, you have to remove the target machines that passed in the previous deployment from the deployment group...
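If you want to script around the lack of a built-in retry, one hedged approach is to poll the deployment group with the Azure DevOps REST API and trigger a redeploy once the missing agents are back online. The sketch below covers only the polling half; the organization, project, deployment group ID, PAT, and api-version are placeholders, and triggering the actual redeploy (for example through the Release REST API or the Azure DevOps CLI) is left out:

```python
# Minimal sketch: poll an Azure DevOps deployment group and report which
# targets are currently offline. Organization, project, deployment group ID,
# PAT, and api-version are placeholders -- adjust for your own setup.
import base64
import requests

ORG = "my-org"                 # placeholder
PROJECT = "my-project"         # placeholder
DEPLOYMENT_GROUP_ID = 1        # placeholder
PAT = "<personal-access-token>"

auth = base64.b64encode(f":{PAT}".encode()).decode()
url = (
    f"https://dev.azure.com/{ORG}/{PROJECT}/_apis/distributedtask/"
    f"deploymentgroups/{DEPLOYMENT_GROUP_ID}/targets"
)

resp = requests.get(
    url,
    params={"api-version": "6.0"},
    headers={"Authorization": f"Basic {auth}"},
    timeout=30,
)
resp.raise_for_status()

offline = [t["agent"]["name"] for t in resp.json()["value"]
           if t["agent"]["status"] != "online"]
print("Offline targets:", offline or "none")
# Once this list is empty, you could trigger a redeploy of the last release
# (e.g. via the Release REST API or the Azure DevOps CLI).
```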

Cannot create Service Fabric Cluster in Azure Portal

I can't start creating a Service Fabric cluster.
When starting the creation, the portal always shows the "Rainy Cloud" error and nothing can be entered.
Thanks for reporting this; we found a problem in the portal that may be causing it. We'll be rolling out a fix in the next few days.
BTW, we have a repo on GitHub that you can use to report issues like this for a faster response: https://github.com/Azure/service-fabric-issues