Is there any guideline on how to track changes to ADF pipeline source code in a Git repository? - azure-data-factory

I am developing a few relatively complex ADF pipelines and I would like to track my changes.
Traditionally with programming languages I keep my code in a git repository and track changes using branches.
How can I do the same for ADF pipelines? What is the recommended directory structure for ADF code?

Source control for ADF should be what you are looking for. It tracks the changes to the underlying JSON using Git.
It also provides some useful features when you integrate with a GitHub / Azure DevOps repo:
- Auto save/commit: every time you save your work, the change is recorded as an incremental Git commit, which you can easily revert. This way you do not have to publish your changes to persist them (in some cases, such as developing a new feature on a feature branch, publishing may not be an option).
- You can leverage branch protection in GitHub / Azure DevOps Repos to perform code review, code merges, etc. before publishing the ADF code.
But to be honest, from my personal experience, it is not very useful for diffs, as the JSON has no line breaks and it is hard to spot the differences between versions.
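One workaround for the diff problem, hedged as a sketch of my own rather than anything built into ADF, is a Git textconv filter that pretty-prints the single-line JSON before diffing. The repo layout and file names below are invented, and python3 is assumed to be on PATH:

```shell
#!/bin/sh
# Sketch: make `git diff` readable for ADF's single-line JSON by routing
# *.json files through a pretty-printing textconv filter.
# Assumes python3 is on PATH; repo and file names are hypothetical.
set -e
repo=$(mktemp -d)
cd "$repo"
git init -q .
git config user.email you@example.com
git config user.name you
# declare a diff driver for JSON files and back it with a pretty-printer
echo '*.json diff=adfjson' > .gitattributes
git config diff.adfjson.textconv 'python3 -m json.tool'
printf '{"name":"pl_copy","properties":{"activities":[]}}' > pipeline.json
git add . && git commit -qm 'initial pipeline'
# simulate an ADF save that changes the single-line JSON
printf '{"name":"pl_copy","properties":{"activities":[{"name":"Copy1"}]}}' > pipeline.json
git diff -- pipeline.json    # shows an indented, per-line JSON diff
```

With the filter in place, every `git diff` against a `*.json` file is rendered from the pretty-printed form, so reviewing or reverting a single property change becomes practical.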

Related

Delete TFVC Azure Devops Branch API

I've been looking and can't find any TFVC APIs; they're all for Git. I need to delete a branch and then create a new branch from an existing branch in my TFVC repo. Therefore, I'm looking for an API or PowerShell command that will accomplish this. I want an API so that I can add this to my pipeline and perform it automatically. PLEASE HELP
Thank you,
There are limited TFVC REST APIs. Branch management is not available via REST.
TFVC is considered feature complete and is not receiving additional investment from Microsoft.
You can use the tf CLI to manage branches, or use the old .NET SOAP client libraries.

Is there a way to manage development of Azure Data Factory using GIT Flow?

Is there a way to manage an Azure Data Factory dev environment, with Azure DevOps Git integration, using the GitFlow branching model (https://nvie.com/posts/a-successful-git-branching-model/)?
In particular, how do you deal with hotfixes?
If you're writing explicitly versioned software or need to support many versions of your product in the field, git-flow is the way to go.
An Azure Repos Git organization can have numerous repositories, but each repository can be associated with only one data factory; hence, you cannot maintain multiple versions of ADF in a single repository or branch.
If your intention is to maintain multiple versions of ADF on a single repo/master branch using GitFlow, that is not yet possible. See Source control - Azure Data Factory | Microsoft Docs.

Azure Data Factory Deployment changes not reflecting after integrate with git repository

I have two instances of Azure Data Factory: one is PROD and the other is DEV.
My DEV ADF is integrated with a Git repository, and all development is done in this ADF instance.
Once the code is ready for production deployment, I follow CI/CD steps to deploy from DEV ADF into PROD.
This functionality is working fine.
Recently I made a few changes in my PROD ADF instance, upgrading ADLS Gen1 to Gen2, plus a few alterations to pipelines. These changes were applied directly to the PROD instance of ADF.
Now I have to deploy these changes to the DEV instance to bring both instances into sync before proceeding with further development.
To achieve this I followed the steps below:
- Remove the Git integration of the DEV ADF instance.
- Integrate PROD ADF into a new Git repository and publish.
- Run build and release pipelines to deploy PROD into DEV.
After that, the changes in PROD and DEV were in sync.
Now I want to re-integrate the DEV ADF in order to proceed with further development.
When I re-integrate the DEV ADF into the collaboration branch (master) of the existing dev instance repository, I see discrepancies in the pipeline count and linked service count.
The pipelines and linked services that were deleted from PROD are still there in the DEV ADF master branch.
When I remove the Git integration of the DEV ADF, both DEV and PROD ADF are in sync again.
I also tried integrating the DEV ADF into a new branch of the same dev repository, but the pipelines and linked services deleted from production are still present in the dev ADF.
It seems that pipelines and linked services that were changed get updated, but deleted items are not removed from the dev master repository.
Is there any way to clean up the master branch and import only the existing resources at the time of Git re-integration?
The only way I could find is to create a new repository instead of re-integrating with the existing one, but it seems impractical to keep changing repositories, and the branches and changes already in the existing repository would be lost.
Is there any way, when I re-integrate the repository with ADF, to take only the existing resources into the master branch of the repository rather than merging with the existing code in master?
These things happen. ADF Git integration is a bit different, so there's a learning curve to getting the hang of it. I've been there. Never fear: there is a solution.
There are two things to address here:
1. Fixing your process so this doesn't happen again.
2. Fixing the current problem.
The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.
The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.
Ensure that you do not repeat these mistakes, and you will prevent complicating things like this going forward.
To fix the current issue, it's worth pointing out that with ADF Git integration you don't have to use the ADF editor for everything. You can manipulate Git repos cloned to your local file system with standard Git tools, and this is the key to digging yourself out. (It's also what you should have done in the first place to retrofit the PRD changes back into DEV.)
Basically, if your PRD master contains the objects as you want them, first clone that branch to your local file system. Elsewhere on your drive, clone a feature branch of your DEV repo. To bring them into sync, copy the contents of PRD master into the DEV feature branch directory and push the changes. Now this DEV feature branch matches PRD master. A pull request and merge from the DEV feature branch into DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
Even when not having to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have specific control over things. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle the ADF editor as such.
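The local-clone sync described above can be sketched roughly as follows, with local bare repos standing in for the PRD and DEV Azure Repos, and all object names invented:

```shell
#!/bin/sh
# Sketch of the local-clone sync: replace a DEV feature branch's tree with
# PRD master's contents, so deleted objects disappear too.
# Local bare repos stand in for the remotes; all names are hypothetical.
set -e
work=$(mktemp -d); cd "$work"
git init -q --bare prd.git
git init -q --bare dev.git
# seed PRD master with the "good" factory JSON
git clone -q prd.git prd && cd prd
git config user.email you@example.com
git config user.name you
mkdir pipeline
echo '{"name":"pl_keep"}' > pipeline/pl_keep.json
git add . && git commit -qm 'PRD state'
git push -q origin HEAD:master
cd ..
# seed DEV master with the same pipeline plus a stale one
git clone -q dev.git dev && cd dev
git config user.email you@example.com
git config user.name you
mkdir pipeline
echo '{"name":"pl_keep"}' > pipeline/pl_keep.json
echo '{"name":"pl_stale"}' > pipeline/pl_stale.json
git add . && git commit -qm 'DEV state'
git push -q origin HEAD:master
# sync: a feature branch whose tree is replaced by PRD master's contents
git checkout -qb sync-from-prd
git rm -r -q pipeline
cp -r ../prd/pipeline pipeline
git add -A && git commit -qm 'align DEV with PRD master'
git push -q origin sync-from-prd
```

From here, a pull request from `sync-from-prd` into DEV master completes the alignment, exactly as described above.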

Azure DevOps & copying code base from one project to another or finding a better way of doing this

I'm seeking advice on the following:
In my development shop we support a SaaS solution for our customers. We currently have 10 sites that we develop and provide technical support for. We're a small team, just two of us. We're using Azure DevOps Services to host and manage our code; right now we're just using it as a code repo. Within our organization we have multiple projects, each representing a site. Each site uses the same code base except for the web.config, which is used to change the UI/theme for each customer. When we get a request to create a new site, we first create a new project and then copy our code base from the "golden copy" project.
We use the "golden copy" code base to make feature changes and bug fixes. Once we develop a new feature (or fix an issue) in the golden copy and push it to test, QA begins testing. If testing is successful, the development team copies the entire "golden copy" code into each site project, then builds and deploys to test so QA can ensure each site works with the new changes. This is time consuming and prone to errors.
I would like to know the following:
- Is there a way in Azure DevOps to merge/copy from our golden copy to our other site projects' repos?
- Can you offer a better way of organizing our organization/project setup based on our current workflow?
Thank you,
As Shayki mentioned, you can consider adopting Git branching strategy. Distributed version control systems like Git give you flexibility in how you use version control to share and manage code.
Keep your branch strategy simple. Build your strategy from these three concepts:
Use feature branches for all new features and bug fixes.
Merge feature branches into the master branch using pull requests.
Keep a high quality, up-to-date master branch.
A strategy that extends these concepts and avoids contradictions will result in a version control workflow for your team that is consistent and easy to follow. For details, please refer to this official document.
Is there a way in Azure DevOps to merge/copy from our golden copy to our other site projects' repos?
For this issue, do you mean synchronizing changes from the golden copy to the other projects' repos? If so, it can only be done manually (copying the entire "golden copy" code to each site project), or by cloning the entire repo into the other projects through the following steps.
In the other projects, select the Import repository option.
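As an alternative to copying by hand, and purely as a sketch of my own (not something the Import option does for you), each site repo could track the golden copy as a second Git remote and merge from it. The paths and file contents below are invented:

```shell
#!/bin/sh
# Sketch: track the "golden copy" repo as a second remote in a site repo,
# so feature changes are merged instead of copied by hand.
# Local repos stand in for the Azure DevOps ones; names are hypothetical.
set -e
base=$(mktemp -d); cd "$base"
# the golden-copy repo with the shared code base
git init -q golden && cd golden
git config user.email you@example.com
git config user.name you
echo 'shared feature code' > app.txt
git add . && git commit -qm 'golden: new feature'
cd ..
# one site repo: only its web.config differs from the golden copy
git init -q site1 && cd site1
git config user.email you@example.com
git config user.name you
echo '<configuration><theme>blue</theme></configuration>' > web.config
git add . && git commit -qm 'site1 theme'
# pull the golden copy in as a merge, not a manual file copy
git remote add golden ../golden
git fetch -q golden HEAD
git merge -q --allow-unrelated-histories -m 'sync from golden' FETCH_HEAD
```

After the first `--allow-unrelated-histories` merge, later syncs are just `git fetch golden HEAD` followed by `git merge FETCH_HEAD`, and each site's own `web.config` stays untouched as long as the golden copy doesn't carry one.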

Azure datafactory deployment automation from multiple branches

I want to create an automated deployment pipeline for Azure Data Factory.
For one stream of development we can configure it using this doc:
https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment
But when it comes to deploying to two different test data factories for parallel feature development (in two different branches), it does not work, because the adf_publish branch that gets generated is specific to one data factory.
Currently we are doing deployment using PowerShell scripts, passing an object list of what needs to be deployed.
Our repo is in Azure DevOps.
I tried:
- Linking the repo to multiple data factories, but that causes issues, perhaps when finding the deltas to publish.
- Creating forks of the repo instead of branches so that adf_publish can be separate for every data factory. But this approach does not work when there is a conflict that needs a manual merge, because testing would then be required again instead of moving to prod.
adf_publish gets generated whenever you publish. Publishing takes whatever you have in your repo and updates the data factory with it.
To develop multiple features in parallel, you just need to use "Save". Save commits your changes to the branch you are actually working on; other branches do the same. Whenever you want to publish, first make a pull request from your branch to master, then publish. Any merge conflicts should be resolved when merging everything into the master branch. Then just publish; there shouldn't be any conflicts, and adf_publish will be generated after that.
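The save/merge/publish flow just described can be sketched with plain git, using invented branch and file names and assuming the conflict-free case:

```shell
#!/bin/sh
# Sketch: two feature branches commit independently ("Save"), then each
# merges into the collaboration branch (master) before any publish.
# Names are hypothetical; the conflict-free case is shown.
set -e
flowrepo=$(mktemp -d); cd "$flowrepo"
git init -q .
git config user.email you@example.com
git config user.name you
echo '{}' > factory.json
git add . && git commit -qm 'baseline'
git checkout -q -B master       # normalize the collaboration branch name
git branch -q feature-a
git branch -q feature-b
# "Save" in the ADF editor is a commit on your own working branch
git checkout -q feature-a
echo '{"name":"pl_a"}' > pl_a.json && git add . && git commit -qm 'add pl_a'
git checkout -q feature-b
echo '{"name":"pl_b"}' > pl_b.json && git add . && git commit -qm 'add pl_b'
# the pull requests: merge each feature into master before publishing
git checkout -q master
git merge -q --no-edit feature-a
git merge -q --no-edit feature-b
```

In ADF terms, the two feature branches are where "Save" commits land, and only the final state of master is what "Publish" turns into adf_publish.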
Hope this helped!
Since a Git repository can be associated with only one data factory, and you are only allowed to publish to the Data Factory service from your collaboration branch (check this), it seems there is no direct and easy way to accomplish this. If you fork the repo as a workaround, you may have to resolve the conflicts before merging, as #Martin suggested.