How to fix the Data Factory v2 adf_publish branch being out of sync with the master branch in Azure DevOps

Recently I ran into an issue with not being able to publish in Azure Data Factory integrated with Azure DevOps/Git. This happened because we tried using PowerShell to automatically create pipelines based on a JSON template. When this is done in the data factory using Set-AzDataFactoryV2Pipeline, you bypass the Azure DevOps integration and the pipeline gets published right away without any commits or pull requests. Below is the error message:
Publishing Error
The publish branch is out of sync with the collaboration branch. This is likely due to publishing outside of Git mode. To recover from this state, please refer to our Git troubleshooting guide
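For context, the kind of call that bypasses the Git integration and publishes straight to the live service looks something like this (resource group, factory, pipeline, and file names are placeholders):

```powershell
# Hypothetical names - this writes the pipeline directly to the live Data Factory
# service, bypassing the collaboration branch and adf_publish entirely
Set-AzDataFactoryV2Pipeline -ResourceGroupName "my-rg" `
    -DataFactoryName "my-adf" `
    -Name "MyPipeline" `
    -DefinitionFile ".\pipeline-template.json"
```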

The MS Git troubleshooting guide suggests some hardcore measures to resolve this out-of-sync issue (by deleting and re-creating the repo, I believe). In this case, there's an easier and less hardcore way of solving it.
You simply need to:
Create a new branch from your master branch in data factory
Create the same pipeline you created via Set-AzDataFactoryV2Pipeline
Create a pull request and merge it into master
Voila, you'll hopefully be able to publish again, as the branches will now be considered in sync.

Microsoft now provides guidance on resolving this issue:
From: https://learn.microsoft.com/en-us/azure/data-factory/source-control#stale-publish-branch
Stale publish branch
If the publish branch is out of sync with the master branch and contains out-of-date resources despite a recent publish, try following these steps:
Remove your current Git repository
Reconfigure Git with the same settings, but make sure Import existing Data Factory resources to repository is selected and choose New branch
Create a pull request to merge the changes to the collaboration branch

Remove your Git repo from Data Factory and create a new one with the exact same settings.
Go to Azure DevOps and create a new pull request to merge the new branch into master.
Link: https://www.datastackpros.com/2020/05/how-to-fix-data-factory-adfpublish.html

Under Manage -> Git configuration -> Overwrite live mode: use this option to reset the data factory with the live code.

Related

How can I check if all changes are merged with the Azure DevOps API?

Using the Azure DevOps API, I need to check whether, for a given pull request, all changes have already been merged into the target branch.
I can retrieve this info via the browser:
Please take note that the conflict info may be outdated due to Microsoft's approach to PRs.
I can't find an answer to this question within the JSON data from the DevOps API.
Does anyone have a clue?
If a PR has completed merging changes into the target branch, there is normally a new commit generated on the target branch, with a commit message in the format "Merged PR {PR Number}: {PR Title}" by default. You can find it in the commit history of the target branch.
You can then use the Azure DevOps REST API "Commits - Get Changes" to get all the changes for that new commit on the target branch. These are also all the changes merged from the PR.
Apparently this should be done via a request to the PullRequestCommits API. If the response is empty, then there are no changes to merge in this particular pull request.
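As a rough sketch, fetching the changes of such a merge commit via the "Commits - Get Changes" endpoint might look like this (the organization, project, repository, commit SHA, and the AZDO_PAT environment variable are all placeholders):

```powershell
# Hypothetical identifiers - substitute your own organization, project, repo,
# and the merge commit SHA taken from the target branch history
$org     = "myorg"
$project = "myproject"
$repo    = "myrepo"
$commit  = "<merge-commit-sha>"
$pat     = $env:AZDO_PAT   # personal access token, assumed to be set beforehand

$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }
$url = "https://dev.azure.com/$org/$project/_apis/git/repositories/$repo/commits/$commit/changes?api-version=7.1"

# Each entry in 'changes' is a file the merged PR added, edited, or deleted
$response = Invoke-RestMethod -Uri $url -Headers $headers -Method Get
$response.changes | ForEach-Object { "{0}`t{1}" -f $_.changeType, $_.item.path }
```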

Azure Data Factory deployment changes not reflecting after integrating with a Git repository

I have two instances of azure data factory. One is PROD and another is DEV.
I have my DEV ADF integrated with a Git repository and will be doing all development in this ADF instance.
Once the code is ready for production deployment, I follow CI/CD steps to deploy the DEV ADF into PROD.
This functionality is working fine.
Recently I made a few changes in my PROD ADF instance by upgrading ADLS Gen1 to Gen2, along with a few alterations to pipelines. These changes were made directly in the PROD instance of ADF.
Now I have to deploy these changes to the DEV instance in order to bring both instances in sync before proceeding with further development.
To achieve this, I followed the steps below:
Remove the Git integration of the DEV ADF instance.
Integrate PROD ADF into a new Git repository and do a publish.
Build and release pipelines were executed and deployed PROD into DEV.
I could see that the changes in both PROD and DEV were in sync.
Now I want to re-integrate the DEV ADF in order to proceed with further development.
When I re-integrate the DEV ADF into the collaboration branch (master) of the existing dev instance repository, I can see discrepancies in the pipeline count and linked service count.
The pipelines and linked services which were deleted from PROD are still there in the DEV ADF master branch.
When I remove the Git integration of the DEV ADF, both DEV and PROD ADF are in sync.
I tried to integrate the DEV ADF into a new branch of the same dev repository,
but I could still see the pipelines and linked services which were deleted from production available in the DEV ADF.
It seems like the pipelines and linked services which were changed are getting updated, but the deleted items are not removed from the dev master repository.
Is there any way to clean up the master branch and import only the existing resources at the time of Git re-integration?
The only possible way I could find is to create a new repository instead of re-integrating the existing one, but it seems difficult to keep changing repositories, and the branches and changes already created in the existing repository will be lost.
Is there any way to make ADF, when I re-integrate the repository, take only the existing resources into the master branch of the repository, rather than merging with the existing code in master?
These things happen. ADF Git integrations are a bit different, so there's a learning curve to getting a hold of them. I've been there. Never fear. There is a solution.
There are two things to address here:
Fixing your process so this doesn't happen again.
Fixing the current problem.
The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.
The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.
Ensure that you do not repeat these mistakes, and you will prevent complicating things like this going forward.
In order to fix the current issues it's worth pointing out that with ADF Git integrations, you don't have to use the ADF editor for everything. You are totally able to manipulate Git repos cloned to your local file system with standard Git tools, and this is going to be the key to digging yourself out. (It's also what you should have done in the first place to retrofit PRD changes back into DEV.)
Basically, if your PRD master contains the objects as you want them, then first clone that branch to your local file system. Elsewhere on your drive, clone a feature branch of your DEV repo to the file system. In order to bring these in sync, you just copy the PRD master contents and paste them into the DEV feature branch directory and push changes. Now, this DEV feature branch matches PRD master. A merge and pull request from this DEV feature branch to DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
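As a rough sketch of the Git side of this (repo URLs and branch names are placeholders, and it assumes Git and repository credentials are already set up locally):

```powershell
# Hypothetical repo URLs and branch names - substitute your own
git clone https://dev.azure.com/myorg/myproject/_git/adf-prd prd-master
git clone https://dev.azure.com/myorg/myproject/_git/adf-dev dev-repo

Set-Location dev-repo
git checkout -b feature/sync-from-prd

# Optionally clear the existing ADF object folders first so that objects
# deleted in PRD are also removed from DEV, then copy PRD master over,
# skipping the .git folder
Get-ChildItem ..\prd-master -Exclude .git |
    Copy-Item -Destination . -Recurse -Force

git add -A
git commit -m "Sync ADF objects from PRD master"
git push -u origin feature/sync-from-prd
# Then open a pull request from feature/sync-from-prd into DEV master
```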
Even when not having to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have specific control over things. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle the ADF editor as such.

Azure Data Factory deployment automation from multiple branches

I want to create an automated deployment pipeline for Azure Data Factory.
For one stream of development, we can configure it using this doc:
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
But when it comes to deploying to two different test data factories for parallel feature development (in two different branches), it does not work, because the adf_publish branch that gets generated is specific to only one data factory.
Currently we are doing deployment using PowerShell scripts and passing the list of objects which need to be deployed.
Our repo is in Azure DevOps.
I tried:
Linking the repo to multiple data factories, but that causes issues, perhaps when finding the deltas to publish.
Creating forks of the repo instead of branches so that adf_publish can be separate for every data factory - but this approach will not work when there is a conflict which needs a manual merge, as the testing will then be required again instead of moving to prod.
adf_publish gets generated whenever you publish. Publishing takes whatever you have in your repo and updates the data factory with it.
To develop multiple features in parallel, you just need to use "Save". Save commits your changes to the branch you are actually working on; other branches do the same. Whenever you want to publish, you first need to make a pull request from your branch to master, then publish. Any merge conflicts should be resolved when merging everything into the master branch. Then just publish and there shouldn't be any conflicts, and adf_publish will be generated after that.
Hope this helped!
A GitHub repository can be associated with only one data factory, and you are only allowed to publish to the Data Factory service from your collaboration branch. Check this.
It seems there is not a direct and easy way to accomplish this. If you fork the repo as a workaround, you may have to resolve the conflicts before merging, as @Martin suggested.

VS Team Services trigger option on build pipeline for external git repository

I have an external Git repository that is unauthenticated (in VSTS) and builds on a local agent pool (which is authenticated) that is also external. Our build pipeline is connected with our certificates, which is what makes this solution work.
The problem is that there is no way to trigger the pipeline automatically when someone pushes changes to the master branch of this external, unauthenticated (in VSTS) Git repository.
For the trigger option, there is only this configuration:
But after this, if someone pushes a commit, nothing happens.
Is there a limitation? Is there any configuration I need to get this working? For now, the build pipeline is always started manually.
For continuous integration with an external Git repository, VSTS connects to your external repository and checks for new changes at the polling interval you set. So you must configure authentication in VSTS so that VSTS can access your Git repository to query for new changes. When you configure the external Git repository, it should ask you to provide a user and password/token; enter the auth information there.

Integrate git merge to master as final step in AWS Codepipeline

We are using GitHub as our source repository, AWS CodeBuild to compile the code from GitHub, Elastic Beanstalk to host environments and CodePipeline to trigger a build on commit and to deploy the code to different environments, with production being the final environment.
What I would like to add as a final step in CodePipeline is a merge back to master after a build has been deployed to production. I did a brief search on Google but could not find any good references for how to initiate a Git merge.
Does anybody have any experience with triggering a merge from CodePipeline?
Currently there isn't built-in support for merging.
Today most users run their pipeline on master, and merge into that before the code enters their pipeline. One advantage of this approach is that it ensures your pipeline is run on the exact merged version on mainline, rather than a pre-merge version.
However, we're aware that some workflows like a pull-request based workflow would benefit from being able to merge at the end of a pipeline.
The best workaround today is to use a Lambda function, custom action, or CodeBuild step to perform the merge.
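As one illustration, here is a minimal sketch of what such a CodeBuild step (or a script invoked from Lambda) might run, assuming Git and credentials for the repository are available in the build environment; the branch names are placeholders:

```powershell
# Hypothetical branch names - adjust to your workflow
git fetch origin
git checkout master
git merge --no-ff origin/release -m "Merge release into master after production deploy"
git push origin master
```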