Merge Live view Data Factory into Git repo - azure-data-factory

I have a PowerShell script that, when run, creates a few resources (linked services) in my Data Factory. The issue I am facing is that in the portal I can see the created resources under the "Live" view, but not when I select the connected Git repository.
Hence the question: can I deploy a batch of resources (linked services, datasets, pipelines) to a Data Factory by running a PowerShell script, and have them pushed to a Git repository (feature branch)? Imagine a scenario where I am given JSON files as sources by different team members ... and just want to create them....
I suspect I could place all the JSON sources into the right directory structure on my local machine, push them to the Git repo, and then connect that branch to ADF. But I wanted to make sure first whether the PowerShell script approach works.
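Concretely, something like the sketch below is what I have in mind. The repo URL and file names are made up, and I'm assuming ADF's usual Git layout of one folder per resource type (linkedService, dataset, pipeline) with one JSON file per resource:

```powershell
# Sketch only - URL, branch and file names are hypothetical.
git clone https://dev.azure.com/myorg/myproject/_git/adf-repo
Set-Location ./adf-repo
git checkout -b feature/new-linked-services

# Drop each team member's JSON into the matching resource folder.
New-Item -ItemType Directory -Force -Path linkedService, dataset, pipeline | Out-Null
Copy-Item C:\drops\MyStorageLS.json linkedService\
Copy-Item C:\drops\MyDataset.json   dataset\
Copy-Item C:\drops\MyPipeline.json  pipeline\

git add .
git commit -m "Add linked services, datasets and pipelines from team drops"
git push -u origin feature/new-linked-services
# Then open this branch in the ADF UI and merge via pull request.
```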

Related

Running a Databricks notebook connected to Git via ADF, independent of the Git username

In our company, to orchestrate runs of Databricks notebooks, we experimentally learned to connect our notebooks (affiliated with a Git repository) to ADF pipelines; however, there is an issue.
As you can see in the photo attached to this question, the path to the notebook depends on the employee username, which is not a stable solution in production.
What is/are the solution(s) to this?
Update: the main issue is keeping the employee username out of production to avoid any future failure, whether in the ADF path or in a secondary storage location that a Lookup could read but that would still sit on the production side.
Path selection in ADF:
If you want to avoid having the username in the path, you can just create a folder inside Repos and do the checkout there (full instructions follow):
In Repos, at the top level, click the ᐯ next to the "Repos" header, select "Create", then select "Folder". Give it a name, such as "Staging":
Create a repository inside that folder:
Click the ᐯ next to the "Staging" folder, then click "Create" and select "Repo":
After that, you can navigate to that repository in the ADF UI.
It's also recommended to set permissions on the folder so that only specific people can update the projects inside it.
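If you would rather script this than click through the UI, the databricks CLI exposes the same Repos operations. A rough sketch, where the Git URL and workspace path are placeholders:

```powershell
# Assumes the databricks CLI is installed and configured (databricks configure).
# /Repos/Staging/etl is a shared, user-independent checkout path.
databricks repos create `
  --url https://dev.azure.com/myorg/myproject/_git/notebooks `
  --provider azureDevOpsServices `
  --path /Repos/Staging/etl
```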
You can use Azure DevOps source control to manage the developer and production Databricks notebooks, or other related code/scripts/documents, in Git. Learn more here.
Keep your notebooks in logically distributed repositories in GitHub and use the same path in the Notebook activity of your Azure Data Factory.
If you want to pass a dynamic path to the Notebook activity, you should keep a placeholder list of the notebook file paths, for example a text/CSV file or a SQL table where all the notebook paths are available.
Then use a Lookup activity in ADF to get the list of those paths, pass the Lookup output to a ForEach activity, and place a Notebook activity inside the ForEach, passing the path (for each iteration) to its parameters. This way you avoid a hardcoded file path in the pipeline; a trimmed sketch of the pattern follows.
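This is only an illustrative fragment, not a full pipeline: the dataset, linked service, and table names are all hypothetical, and it assumes the path list lives in a SQL table with a NotebookPath column.

```json
{
  "name": "RunNotebooks",
  "properties": {
    "activities": [
      {
        "name": "LookupNotebookPaths",
        "type": "Lookup",
        "typeProperties": {
          "source": { "type": "AzureSqlSource", "sqlReaderQuery": "SELECT NotebookPath FROM dbo.NotebookPaths" },
          "dataset": { "referenceName": "NotebookPathsDataset", "type": "DatasetReference" },
          "firstRowOnly": false
        }
      },
      {
        "name": "ForEachNotebook",
        "type": "ForEach",
        "dependsOn": [ { "activity": "LookupNotebookPaths", "dependencyConditions": [ "Succeeded" ] } ],
        "typeProperties": {
          "items": { "value": "@activity('LookupNotebookPaths').output.value", "type": "Expression" },
          "activities": [
            {
              "name": "RunNotebook",
              "type": "DatabricksNotebook",
              "linkedServiceName": { "referenceName": "AzureDatabricksLS", "type": "LinkedServiceReference" },
              "typeProperties": {
                "notebookPath": { "value": "@item().NotebookPath", "type": "Expression" }
              }
            }
          ]
        }
      }
    ]
  }
}
```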

How to back up the data on Azure DevOps?

I would like to schedule (for my company) a backup of our most important data in Azure DevOps, for several reasons: security, urgent recovery needs, viruses, migration, etc.
I can back up the repositories and the wikis (they are under Git, so they are easy to download), but how can I back up the "Boards" section (backlogs, work items, etc.) and the build pipeline definitions?
How to back up the data on Azure DevOps?
In current Azure DevOps, there is no out-of-the-box solution for this. You can manually save the project data in the following ways:
Source code and custom build templates: You can download your files as a zip file. Open the "..." (Repository actions) menu for the repository, file, or folder and choose Download as Zip. You can also choose Download from the right side of the screen to download either all of the files in the currently selected folder, or the currently selected file. This process doesn't save any change history or links to other artifacts. If you use Git, clone your repositories to retain the full project history and all the branches.
Build data: To save logs and data in your drop build folders, see View build results.
Work item tracking data: Create a work item query and open it using Excel, then save the Excel spreadsheet. This process doesn't save any attachments, change history, or links to other artifacts.
Build/release definitions: you can export their JSON files and import them again when restoring.
There has been a related user voice suggestion that you can monitor and vote up: https://developercommunity.visualstudio.com/content/idea/365441/provide-a-backup-service-for-visual-studio-team-se.html.
Here are some tickets (ticket1, ticket2) with the same issue that you can refer to.
If you want to create scheduled tasks, you can write a script using the Azure CLI with the Azure DevOps extension; a sketch follows.
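A hedged sketch of such a script: the organization, project, and output paths are placeholders, and it assumes you are already authenticated (az devops login, or a PAT in the AZURE_DEVOPS_EXT_PAT environment variable).

```powershell
# Back up every Git repo and every build definition in one project.
$org     = "https://dev.azure.com/myorg"
$project = "MyProject"
$outDir  = "C:\backups\$(Get-Date -Format yyyy-MM-dd)"
New-Item -ItemType Directory -Force -Path $outDir | Out-Null

# Mirror-clone each repository: full history, all branches and tags.
$repos = az repos list --organization $org --project $project | ConvertFrom-Json
foreach ($repo in $repos) {
    git clone --mirror $repo.remoteUrl (Join-Path $outDir "$($repo.name).git")
}

# Export each build definition as JSON for later re-import.
$defs = az pipelines build definition list --organization $org --project $project | ConvertFrom-Json
foreach ($def in $defs) {
    az pipelines build definition show --organization $org --project $project --id $def.id |
        Out-File (Join-Path $outDir "$($def.name).pipeline.json")
}
```

Run from a Windows scheduled task or cron job, this covers the Git and pipeline-definition parts; Boards data would still need the query-to-Excel route or the REST API.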
As you said, for the repositories it's quite easy, as they are Git repositories.
I wrote such a script, and we could improve it to also back up the work items, backlog, etc.
It's open source; let me know what you would like to back up first and I'll improve it.
GitHub: azure-devops-repository-backup

Azure Data Factory deployment changes not reflected after integrating with a Git repository

I have two instances of Azure Data Factory: one is PROD and the other is DEV.
My DEV ADF is integrated with a Git repository, and all development is done in this instance.
Once the code is ready for production deployment, we follow CI/CD steps to deploy the DEV ADF into PROD.
This functionality is working fine.
Recently I made a few changes in my PROD ADF instance by upgrading ADLS Gen1 to Gen2, along with a few alterations to pipelines. These changes were made directly in the PROD instance of ADF.
Now I have to deploy these changes to the DEV instance to bring both instances in sync before proceeding with further development.
To achieve this, I followed the steps below:
Removed the Git integration of the DEV ADF instance.
Integrated PROD ADF into a new Git repository and published.
Executed build and release pipelines and deployed PROD into DEV.
I could see that the changes in PROD and DEV were in sync.
Now I want to re-integrate the DEV ADF in order to proceed with further development.
When I re-integrate the DEV ADF into the collaboration branch (master) of the existing DEV instance repository, as shown below, I see discrepancies in the pipeline count and linked service count.
The pipelines and linked services that were deleted from PROD are still there in the DEV ADF master branch.
When I remove the Git integration of the DEV ADF, both DEV and PROD ADF are in sync again.
I tried to integrate the DEV ADF into a new branch of the same DEV repository, as shown below,
but I could still see that the pipelines and linked services deleted from production are available in the DEV ADF.
It seems that the pipelines and linked services that were changed get updated, but the deleted items are not removed from the DEV master repository.
Is there any way to clean up the master branch and import only the existing resources at the time of Git re-integration?
The only possible way I could find is to create a new repository instead of re-integrating with the existing one, but it seems impractical to keep changing repositories, and the branches and changes already created in the existing repository would be lost.
Is there any way, when I re-integrate the repository with ADF, for it to take only the existing resources into the master branch of the repository, rather than merging with the existing code in master?
These things happen. ADF Git integrations are a bit different, so there's a learning curve to getting a hold of them. I've been there. Never fear. There is a solution.
There are two things to address here:
Fixing your process so this doesn't happen again.
Fixing the current problem.
The first place you went wrong was making changes directly in PRD. You should have made these changes in DEV and promoted according to standard process.
The next places you went wrong were removing DEV from Git and then adding PRD to Git. PRD should not be connected to Git at any point, and you shouldn't be juggling Git integrations. It's dangerous and can lead to lost work.
Ensure that you do not repeat these mistakes, and you will avoid complications like this going forward.
To fix the current issue, it's worth pointing out that with ADF Git integrations you don't have to use the ADF editor for everything. You are totally able to manipulate Git repos cloned to your local file system with standard Git tools, and this is going to be the key to digging yourself out. (It's also what you should have done in the first place to retrofit the PRD changes back into DEV.)
Basically, if your PRD master contains the objects as you want them, first clone that branch to your local file system. Elsewhere on your drive, clone a feature branch of your DEV repo. To bring these in sync, copy the PRD master contents into the DEV feature branch directory and push the changes. This DEV feature branch now matches PRD master. A pull request and merge from this DEV feature branch to DEV master will then bring DEV master in sync with PRD master (assuming the merge is done correctly).
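A minimal sketch of that sequence; the repo URLs and branch names are hypothetical:

```powershell
# Clone PRD master and a DEV feature branch side by side.
git clone -b master https://dev.azure.com/myorg/proj/_git/adf-prd C:\sync\prd
git clone https://dev.azure.com/myorg/proj/_git/adf-dev C:\sync\dev
Set-Location C:\sync\dev
git checkout -b feature/sync-from-prd

# Replace the DEV working tree with PRD's contents (keep DEV's .git folder).
Get-ChildItem C:\sync\dev -Exclude .git | Remove-Item -Recurse -Force
Copy-Item C:\sync\prd\* C:\sync\dev -Recurse -Exclude .git

git add -A    # -A stages deletions too, which is what removes the orphaned objects
git commit -m "Sync DEV with PRD master"
git push -u origin feature/sync-from-prd
# Finish with a pull request from feature/sync-from-prd into DEV master.
```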
Even when you don't have to do things like this, it can be helpful to have your ADF Git repo cloned locally so you have precise control over it. There are times when ADF orphans objects, and you can clean them up via the file system and Git tools without having to wrestle with the ADF editor.

Sending a file to multiple servers

I'm working on a web project (built with the .NET Framework) on a remote Windows server, and this project is connected to a database via SQL Server Management Studio. The same web project, linked to the same database, also exists on multiple other remote Windows servers. When I change a page's code in my project, or add/remove a table or stored procedure in my database, is there a way (or existing software) that will allow me to deploy the changes I made to all the others (or to choose specific servers, if I don't want to deploy the changes to all of them)?
If it were me, I would stand up a Git server somewhere (cloud or local VM), make a branch called something like Prod or Stable, and create a script (PowerShell if the servers are Windows, bash on anything else) run as a nightly or hourly job to pull from that branch. Only push to that branch after testing thoroughly. If your code requires compilation, you can either compile once before committing (in which case you're probably going to commit to a releases branch), or compile on each endpoint after the pull. I would have the script that does the pull also compile and restart the service, but only if the pull brought in something new; a sketch follows.
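One possible shape for that job on a Windows endpoint; the site path, branch, and app pool names are placeholders:

```powershell
# Hourly/nightly pull-and-restart job for one IIS-hosted site.
$siteDir = "C:\inetpub\mysite"
Set-Location $siteDir

git fetch origin
$local  = git rev-parse HEAD
$remote = git rev-parse origin/Prod

if ($local -ne $remote) {
    git pull origin Prod
    # Compile here if the project needs it, e.g.:
    # msbuild .\MySite.sln /p:Configuration=Release
    Import-Module WebAdministration
    Restart-WebAppPool -Name "MySiteAppPool"   # pick up the new bits
}
```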
You can probably achieve this with two things:
Create a separate publishing profile for each server.
Use git/VSTS branches to keep the code separate (as suggested by @memtha).
Let's say you have six servers in total and two branches, A and B. You'll have to create six publishing profiles. Then you can choose which branch to deploy where; e.g., you can deploy branch B to servers 1, 3, and 4.
For the codebase you could use Git Hooks.
https://gist.github.com/noelboss/3fe13927025b89757f8fb12e9066f2fa
And for the database, maybe you could use migrations or something similar. You would need to provide more info about your database, e.g. whether you store it across multiple servers.
If the same web project connects to the same database and the database changes, I suspect you would need to update all the web apps, both to ensure the database changes don't break any of the apps and to keep all the apps updated so none are left behind.
You should look at using Azure DevOps to build and deploy your apps and update the database.
If you use Entity Framework, you can run the migrations on startup and have the application update the database when deployed, either manually or automatically using DevOps.
To keep the software updated on multiple servers you could use Git with hooks; a post-receive hook is what you need.
The idea is to use one server as your remote repository and configure the post-receive hook there to update the codebase on that same server and on the others; a sketch follows.
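Hooks are plain executables inside the bare repo's hooks/ folder. On a Windows Git server, one hedged arrangement (all paths and server names are hypothetical) is a tiny post-receive stub that hands off to a PowerShell script:

```powershell
# hooks/post-receive stays a two-line shell stub that delegates:
#   #!/bin/sh
#   exec powershell.exe -File C:/git/hooks/deploy.ps1
#
# deploy.ps1 - check out the pushed code, then fan it out to the peers.
$bareRepo = "C:\git\mysite.git"
$workTree = "C:\inetpub\mysite"
git --git-dir=$bareRepo --work-tree=$workTree checkout -f Prod

foreach ($server in "web02", "web03") {
    # /MIR mirrors the tree; /XD .git keeps Git metadata off the targets.
    robocopy $workTree "\\$server\c$\inetpub\mysite" /MIR /XD .git
}
```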

How do I move an Azure DevOps project to a different organization?

I have a project in an old org (from VSTS) that I want to move to my new one.
I can't see any option in Azure DevOps for migrating projects, or any information on the interwebs.
Anyone know how to do it?
If you just need to move repos, you can use the built-in clone functions (a command-line equivalent follows the steps):
1. Go to the Azure DevOps source repo -> Files
2. Click "Clone"
3. Choose "Generate Git Credentials"
4. Create the target repo in the target Azure DevOps
5. Choose "Import a repository"
6. Use the URL and credentials from Step 3
7. Done
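The same move can also be done entirely from the command line, which carries over all branches and tags in one shot. A sketch with placeholder URLs:

```powershell
# One-time mirror move of a single repo between organizations.
git clone --mirror https://dev.azure.com/old-org/old-project/_git/myrepo
Set-Location myrepo.git
# The target repo (step 4 above) must already exist and be empty.
git push --mirror https://dev.azure.com/new-org/new-project/_git/myrepo
```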
This is not supported today, but this feature was planned: make it possible to move a Team Project between Team Project Collections.
If your Azure DevOps project only tracks code versions using a single Git repo, hence no boards, user stories, tasks, pipelines, etc., then you can do the following:
Clone your project repo,
for example with Visual Studio.
You don't need to clone if you already have a local repo.
Destroy the association with the remote.
Typically, you open a command-line prompt in the folder that contains the .git database folder (most likely the Visual Studio solution folder) and type git remote rm origin.
Here is an example using git bash showing the content of the solution folder, including the .git database and the *.sln Visual Studio solution file:
Open the solution with Visual Studio if not already done.
It should now show that you have many commits waiting to be pushed to a remote. For illustration purposes, my toy project only has 8 commits in total.
Click the up arrow and choose your new remote, say a brand-new Azure DevOps project in the organization of your choice, then push.
You are now done cloning the project into another organization. If needed, destroy the project in the old organization to complete the "move" operation.
There are three projects that I know of to achieve this:
A paid-for option by OpsHub -
OpsHub Visual Studio Migration Utility
An open-source tool that requires making changes to the work item process template - Azure DevOps Migration Tools
And lastly, an unofficial (but still written by Microsoft) tool to create Azure DevOps project templates - Azure DevOps Demo Generator & extractor tool
With the last one (the Demo Generator) you extract the project as a template, then apply it to the new organisation. As it is a tool for demos, there is no support provided, and in my experience it works for simple projects but falls over on anything complex.
Expanding on others' answers, this post regards Pipelines.
Azure DevOps API
Migrating nearly all aspects of a project across organizations is doable, but it is a lot of manual work using the Azure DevOps API. The link below shows you all the endpoints, variables, etc. From there you'll probably want to write a PowerShell script and do a couple of test runs against a dummy organization; a starting-point sketch follows the link.
https://learn.microsoft.com/en-us/rest/api/azure/devops/?view=azure-devops-rest-6.1&viewFallbackFrom=azure-devops-rest-6.0
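A hedged starting point, just to show the mechanics: it lists the source organization's projects and dumps each project's build definitions to disk. The organization name is a placeholder, and it assumes a Personal Access Token with read scope in the AZDO_PAT environment variable.

```powershell
$org = "old-org"
$pat = $env:AZDO_PAT
$headers = @{ Authorization = "Basic " + [Convert]::ToBase64String([Text.Encoding]::ASCII.GetBytes(":$pat")) }

$projects = Invoke-RestMethod -Headers $headers `
    -Uri "https://dev.azure.com/$org/_apis/projects?api-version=6.0"

foreach ($p in $projects.value) {
    # Pull every build definition in the project, with full properties.
    $defs = Invoke-RestMethod -Headers $headers `
        -Uri "https://dev.azure.com/$org/$($p.name)/_apis/build/definitions?includeAllProperties=true&api-version=6.0"
    $defs.value | ConvertTo-Json -Depth 100 |
        Out-File "$($p.name)-build-definitions.json"
}
```

Re-creating the definitions in the target organization is then a matter of POSTing the cleaned-up JSON to the same endpoint there.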
In-app options
If you avoid the API, there is no way to migrate pipelines that preserves build or release history, but you can preserve your configurations and processes by going into your Pipelines and selecting View YAML. From there you can either take this away as notes to recreate the GUI steps in your new org/project location, or actually adopt the YAML standard in your Git repository.
I do not believe there is a way to migrate pipeline variables outside of the API. However, you can move the variables to Azure Key Vault and change your pipeline settings (YAML) to reference values from Key Vault. This is not a large amount of effort and is a nice process improvement.
Lastly, if you have any locally installed pipeline agents for releases, you will need to run the PowerShell registration script for your new organization on those boxes. It's a very simple five-minute step, but right now Agent Pools are not shareable across organizations.
As @Frederic mentioned in his answer, we can actually do it easily with Visual Studio. I have also done this without Visual Studio. The steps involved are below; a condensed command sketch follows them.
Add a user to both organizations
Configure an SSH key
Update the SSH key in the source DevOps and clone the repository
Check out all the branches and tags
Update the SSH key in the destination DevOps
Remove the old origin and add the new one
Push all the branches
The commands and detailed explanations can be found here.
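A condensed sketch of the Git side of those steps; the SSH URLs are placeholders, and the link above has the full walkthrough:

```powershell
git clone git@ssh.dev.azure.com:v3/old-org/old-project/myrepo
Set-Location myrepo

# Create local tracking branches for every remote branch before switching remotes.
git branch -r | Where-Object { $_ -notmatch '->' } | ForEach-Object {
    $branch = ($_ -replace 'origin/', '').Trim()
    git branch --track $branch "origin/$branch" 2>$null
}
git fetch --tags

git remote rm origin
git remote add origin git@ssh.dev.azure.com:v3/new-org/new-project/myrepo
git push --all origin
git push --tags origin
```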
BTW, if you need to change the entire DevOps organization tied to your personal tenant (e.g. a VS Enterprise subscription) and move it to a new tenant, you can change the AAD and point it to the new one, e.g. your EA tenant on the Azure commercial cloud.
Before you switch your organization directory, make sure the following statements are true:
You're in the Project Collection Administrators group for the organization.
You're a member or a guest in the source Azure AD and a member in the destination Azure AD.
You have 100 or fewer users in your source organization; otherwise you will have to open a support ticket.
You may have to add the users back in the destination org if they do not exist, because they will lose access the moment you switch the AAD.
You could also just download the repo as a zip file and then upload it to the destination repo.