Azure Data Factory deployment automation from multiple branches - azure-devops

I want to create an automated deployment pipeline for Azure Data Factory.
For a single stream of development we can configure it using this doc:
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
But when it comes to deploying to two different test data factories for parallel feature development (in two different branches), it does not work, because the adf_publish branch that gets generated is specific to only one data factory.
Currently we are doing the deployment using PowerShell scripts, passing a list of the objects that need to be deployed.
Our repo is in Azure DevOps.
I tried:
Linking the repo to multiple data factories - but that causes issues, perhaps when finding the deltas to publish.
Creating forks of the repo instead of branches, so that adf_publish can be separate for every data factory - but this approach will not work when there is a conflict that needs a manual merge, because testing would then be required again instead of moving to prod.

adf_publish gets generated whenever you publish. Publishing takes whatever you have in your repo and updates the data factory with it.
To develop multiple features in parallel, you just need to use "Save". Save commits your changes to the branch you are actually working on; other branches do the same. Whenever you want to publish, first make a pull request from your branch to master, then publish. Any merge conflict should be resolved when merging everything into the master branch. Then just publish - there shouldn't be any conflicts, and adf_publish will get generated after that.
Hope this helped!
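As a rough sketch of that flow in plain Git commands (a local throwaway repo stands in for the real Azure DevOps remote; the file and branch names here are invented), the feature branch is merged into master via a pull request before anything is published:

```shell
# Simulate the branch-per-feature flow locally; in ADF, "Save" performs the
# feature-branch commit and "Publish" runs only after the merge to master.
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git checkout -qb master
git config user.email dev@example.com
git config user.name dev

# Initial pipeline definition on the collaboration branch (master).
echo '{"name":"pl_ingest"}' > pipeline.json
git add pipeline.json
git commit -qm "initial pipeline"

# "Save" in ADF commits in-progress work to the current feature branch.
git checkout -qb feature/better-ingest
echo '{"name":"pl_ingest","annotations":["v2"]}' > pipeline.json
git commit -qam "feature work (ADF Save)"

# A pull request is a reviewed merge into master; conflicts are resolved
# here, and only then is Publish run (which regenerates adf_publish).
git checkout -q master
git merge -q --no-ff feature/better-ingest -m "PR: merge feature into master"
git log --oneline -1
```

Each parallel feature lives on its own branch and only ever reaches adf_publish through this single merge point on master.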

A GitHub repository can be associated with only one data factory, and you are only allowed to publish to the Data Factory service from your collaboration branch. Check this.
It seems there is no direct and easy way to accomplish this. If forking the repo as a workaround, you may have to resolve the conflicts before merging, as @Martin suggested.

Related

Azure Data Factory Deployment changes not reflecting after integrate with git repository

I have two instances of Azure Data Factory: one is PROD and the other is DEV.
My DEV ADF is integrated with a Git repository, and all development will be done in this ADF instance.
Once the code is ready for production deployment, CI/CD steps deploy the DEV ADF into PROD.
This functionality is working fine.
Recently I made a few changes in my PROD ADF instance by upgrading ADLS Gen1 to Gen2, along with a few alterations to pipelines. These changes were made directly in the PROD instance of ADF.
Now I have to deploy these changes to the DEV instance to bring both instances into sync before proceeding with further development.
To achieve this I followed the steps below:
Remove the Git integration of the DEV ADF instance.
Integrate PROD ADF into a new Git repository and do a publish.
Build pipelines and release pipelines were executed and deployed PROD into DEV.
I could see that the changes in both PROD and DEV were in sync.
Now I want to re-integrate the DEV ADF in order to proceed with further development.
When I re-integrate the DEV ADF into the collaboration branch (master) of the existing dev instance repository as shown below, I see discrepancies in the pipeline count and linked service count.
The pipelines and linked services that were deleted from PROD are still there in the DEV ADF master branch.
When I remove the Git integration of the DEV ADF, both DEV and PROD ADF are in sync again.
I tried to integrate the DEV ADF into a new branch of the same dev repository as shown below, but I could still see the pipelines and linked services that were deleted from production in the dev ADF.
It seems the pipelines and linked services that changed are getting updated, but the deleted items are not removed from the dev master repository.
Is there any way to clean up the master branch and import only the existing resources at the time of Git re-integration?
The only possible way I could find is to create a new repository instead of re-integrating the existing one, but it seems difficult to keep changing repositories, and the branches and changes already created in the existing repository would be lost.
Is there any way to make ADF, when I re-integrate the repository, take only the existing resources into the master branch of the repository instead of merging with the existing code in master?
These things happen. ADF Git integrations are a bit different, so there's a learning curve to getting a hold of them. I've been there. Never fear. There is a solution.
There are two things to address here:
Fixing your process so this doesn't happen again.
Fixing the current problem.
The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.
The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.
Ensure that you do not repeat these mistakes, and you will prevent complicating things like this going forward.
In order to fix the current issues it's worth pointing out that with ADF Git integrations, you don't have to use the ADF editor for everything. You are totally able to manipulate Git repos cloned to your local file system with standard Git tools, and this is going to be the key to digging yourself out. (It's also what you should have done in the first place to retrofit PRD changes back into DEV.)
Basically, if your PRD master contains the objects as you want them, then first clone that branch to your local file system. Elsewhere on your drive, clone a feature branch of your DEV repo to the file system. In order to bring these in sync, you just copy the PRD master contents and paste them into the DEV feature branch directory and push changes. Now, this DEV feature branch matches PRD master. A merge and pull request from this DEV feature branch to DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
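A minimal local simulation of those steps (stand-in directories replace the real PRD and DEV repos, which you would instead `git clone` from Azure DevOps; all names and file contents are illustrative):

```shell
set -e
work=$(mktemp -d)
cd "$work"

# Stand-in for the PRD repo's master branch (the desired state).
git init -q prd
(
  cd prd
  git checkout -qb master
  git config user.email dev@example.com
  git config user.name dev
  mkdir pipeline
  echo '{"name":"pl_keep"}' > pipeline/pl_keep.json
  git add -A
  git commit -qm "PRD state"
)

# Stand-in for the DEV repo, which still carries a pipeline deleted from PRD.
git init -q dev
cd dev
git checkout -qb master
git config user.email dev@example.com
git config user.name dev
mkdir pipeline
echo '{"name":"pl_keep"}' > pipeline/pl_keep.json
echo '{"name":"pl_deleted"}' > pipeline/pl_deleted.json
git add -A
git commit -qm "stale DEV state"

# On a DEV feature branch, replace the content wholesale with PRD's content.
git checkout -qb feature/sync-from-prd
git rm -rq .                  # drop everything tracked, so deletions carry over
cp -R ../prd/pipeline .       # paste PRD master's contents
git add -A
git commit -qm "sync DEV with PRD master"

# A pull request from this branch to DEV master completes the sync.
git checkout -q master
git merge -q --no-ff feature/sync-from-prd -m "PR: sync DEV with PRD"
git ls-files
```

The `git rm` before the copy is what makes deletions propagate: objects removed from PRD disappear from DEV as well, rather than surviving the merge.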
Even when not having to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have specific control over things. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle the ADF editor as such.

Azure DevOps build from dynamic repo name

Anybody know if it is possible to pass in a repo name / base the build on a dynamic repo name? This would allow us to share the same build definition across different branches, cutting down on definitions when creating a feature branch, etc.
When using a TFVC repo we would store the different releases in the same repo but different paths. We could reuse the same build definition across different releases/FB's by altering the source path such as $/product/$(release)/......
It appears Git likes to have the repo hard-coded into the build (hence the dropdown - there is no way to plug in a variable).
While the question is targeted to On-prem Azure DevOps, if it is possible in the hosted environment it would be helpful to know.
I recommend using YAML build templates. By default these check out "self" and are stored in the repo. That way they work on forks, branches etc. Each branch can contain tweaks to the build process as well.
With the 'old' UI based builds this isn't possible.
What you are looking for is actually two things:
templates - these allow you to reuse a definition across different pipelines
triggers - these allow you to trigger the pipeline when a commit happens on different branches
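As a hedged sketch of how the two combine (all file names, branch names, and parameters here are invented for illustration), a YAML pipeline checked in to the repo can declare its triggers and delegate the build steps to a template:

```yaml
# azure-pipelines.yml - lives in the repo, so every branch and fork
# carries its own copy and checks out "self" by default.
trigger:
  branches:
    include:
      - master
      - feature/*

jobs:
  - template: templates/build-job.yml   # reused definition
    parameters:
      buildConfiguration: Release
```

```yaml
# templates/build-job.yml - the shared build definition.
parameters:
  - name: buildConfiguration
    default: Debug

jobs:
  - job: Build
    steps:
      - checkout: self
      - script: echo Building ${{ parameters.buildConfiguration }}
```

Because the template is versioned with the code, each branch can tweak its build process without touching any other branch's pipeline.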
Looks like Task Groups solved the need (mostly). I was hoping to have one build definition that could be shared across multiple branches; while this appears to be possible on the hosted model, on prem is different.
I am able to clone a build (or use templates) to have an entry point into the repo/branch to get the sources, then pass off the work to a common task group. If I need to modify the build process for multiple branches, just modify the task group.

Avoid rebuilding artifacts in Jenkins multibranch pipelines

TL;DR
How do I avoid rebuilding artifacts on master when a feature is merged without creating multiple pipelines per project? Where do I access the information about which branch was merged?
More Info
I run Jenkins to build many projects stored in two different VCSs (Gitlab, Bitbucket). Auto-discovery for both VCSs work and create multi-branch pipelines for every project/branch/PR containing a Jenkinsfile (Gitlab Branch Source Plugin, Bitbucket Branch Source Plugin).
Build artifacts get produced and stored on every build (e.g. docker images pushed to registry).
As I follow a feature branch workflow, these features eventually get merged into master; master is then deployed at irregular intervals.
When the merge happens, an artifact has already been built and stored for this code (see appendix 1). It was built for the feature branch the code originated from (e.g. the container mysuperapp:feat-add-better-things-3). I would like to take this artifact and promote it as the new master artifact (e.g. mysuperapp:master), avoiding a rebuild (and re-running unit and integration tests).
But merging a feature branch just kicks off a new build pipeline on branch master without any information about the merged branch (see appendix 2). This is correct behavior with respect to master (new commits were pushed), but it prevents me from reacting to the merged branch (e.g. the aforementioned promotion, or even just deleting unused artifacts). Is there any way to get the information about which branch was merged?
I am aware, that I can create a new pipeline listening for PR webhooks from my VCSs, running a pipeline to do the promotion and ignore builds on master completely. But this moves visibility of this process to a different pipeline and requires additional pipelines for projects, e.g. reducing the advantage of auto-discovery to 50% (have to create these merge pipelines for each project).
How can I keep the advantages of auto-discovery and visibility of executed steps while also executing something on a merge?
Ideas: Tag artifacts differently - but how (it needs to support correct cleanup)? Parameterize pipelines and set up a single merge pipeline that re-triggers the 'push on master' pipeline with parameters of the merged branch - but can this be done without having to set up the webhooks for every project? Ask the VCSs via REST which branch a commit belonged to?
Greets, and thanks for the help! This may be a complicated one, but it would be great to get this working. It's the last barrier to enabling continuous delivery for a lot of projects!
Appendix:
1: I am also aware that, to have consistent builds, I have to enforce --ff-only merges. This question is not about the pitfalls of Git but rather about the way to go with Jenkins.
2: Git provides me with the parent commits, so I can easily find out which commit was merged. But "Delete branch after merge" in particular leaves me without the branch ref in Git. Tagging my Docker images with commits instead of branches leaves me backtracking the last commit of each build to delete the old, obsolete build.
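The parent-commit idea from appendix 2 can be sketched with plain Git (a local throwaway repo; the image names are invented). With a true merge commit, `HEAD^2` is the tip of the merged branch, so the artifact built for that commit could be promoted instead of rebuilt; under an enforced --ff-only policy there is no second parent, but master's new HEAD is then itself the feature tip:

```shell
set -e
work=$(mktemp -d)
cd "$work"
git init -q repo
cd repo
git checkout -qb master
git config user.email dev@example.com
git config user.name dev
echo base > app.txt
git add app.txt
git commit -qm "base"

# Feature branch with its own commit; assume CI already built and pushed
# an artifact tagged with this commit, e.g. mysuperapp:sha-<feature_tip>.
git checkout -qb feat-add-better-things-3
echo feature >> app.txt
git commit -qam "feature"
feature_tip=$(git rev-parse HEAD)

# Force a merge commit here so a second parent exists to inspect.
git checkout -q master
git merge -q --no-ff feat-add-better-things-3 -m "merge feature"

# HEAD^2 identifies the merged branch's tip commit, even after the
# branch ref itself has been deleted.
merged_sha=$(git rev-parse HEAD^2)
echo "merged commit: $merged_sha"

# A promotion step would then retag the existing artifact instead of
# rebuilding it, along the lines of:
#   docker pull mysuperapp:sha-$merged_sha
#   docker tag  mysuperapp:sha-$merged_sha mysuperapp:master
#   docker push mysuperapp:master
```

Tagging images by commit SHA rather than branch name is what makes this lookup survive "Delete branch after merge".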

How to fix the data factory v2 adf_publish branch being out of sync with the master branch in azure devops

Recently I ran into an issue with not being able to publish in Azure Data Factory integrated with Azure DevOps/Git. This happened because we tried using PowerShell to automatically create pipelines based on a JSON template. When this is done with Set-AzDataFactoryV2Pipeline, you bypass the Azure DevOps integration and the pipeline gets published right away without any commits or pull requests. Below is the error message:
Publishing Error
The publish branch is out of sync with the collaboration branch. This is likely due to publishing outside of Git mode. To recover from this state, please refer to our Git troubleshooting guide
The MS Git troubleshooting guide suggests some drastic measures to resolve this out-of-sync issue (by deleting and re-creating the repo, I believe). In this case, there's an easier and less drastic way of solving it.
You simply need to:
Create a new branch from your master branch in Data Factory.
Recreate the same pipeline you created via Set-AzDataFactoryV2Pipeline.
Create a pull request and merge it into master.
Voila - you'll hopefully be able to publish again, as the branches will now be considered in sync.
Microsoft now provides guidance on resolving this issue:
From: https://learn.microsoft.com/en-us/azure/data-factory/source-control#stale-publish-branch
Stale publish branch
If the publish branch is out of sync with the master branch and contains out-of-date resources despite a recent publish, try the following steps:
Remove your current Git repository.
Reconfigure Git with the same settings, but make sure "Import existing Data Factory resources to repository" is selected, and choose "New branch".
Create a pull request to merge the changes to the collaboration branch.
Remove your Git repo from Data Factory and create a new one with the exact same settings.
Then go to Azure DevOps and create a new pull request to merge the new branch into master.
Link: https://www.datastackpros.com/2020/05/how-to-fix-data-factory-adfpublish.html
Under Manage -> Git configuration there is an "Overwrite live mode" option. Use this option; it will reset the data factory with the live code.

VSTS Filter by repository folder?

I'm using Visual Studio Team Services to build my project which is stored in GitHub (here). The master branch contains multiple projects which make up the solution. Amongst those are a WebAPI project and a Cordova project. I need to build those using two separate build definitions in VSTS.
Previously I had set-up my build definition and used the branch filters to filter on what had been pushed to the repo. For instance:
master/src/API
This worked, but it doesn't anymore. It seems as if the underlying code has changed. A filter of 'master' still works, and I understand that this feature is probably meant to filter specifically on branches, not on folders within a branch.
It's not a huge problem, but right now all of my builds trigger with every check-in, even if nothing changed in that source code in the meantime. So I'm now wondering what a good solution for this issue would be:
Put every project in its own branch. Seems like a workaround.
Some other filter option, or maybe another syntax?
Leave it as is and don't worry about the extra builds (but that itches, you know...)
Anyone running a similar set-up?
Path filters are not supported for VSTS GitHub CI builds; they are available for Git CI builds on VSTS. You can vote for this UserVoice suggestion: https://visualstudio.uservoice.com/forums/330519-team-services/suggestions/15140571-enable-continuous-integration-path-filters-for-git
The workaround is, as you said, to put every project in its own branch.
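For Git repos hosted in VSTS itself, where path filters are supported, a YAML CI trigger scoped to one project folder might look like the following sketch (the paths and branch names are illustrative, and this assumes a YAML-based pipeline rather than the classic UI editor):

```yaml
trigger:
  branches:
    include:
      - master
  paths:
    include:
      - src/API
```

With this filter in place, check-ins that touch only other folders no longer queue this build.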