Azure Synapse notebook folder structure not saved in GitHub

My Synapse workspace is configured with GitHub. The code is organized in folders under "Notebook". Example: under Notebook, the Dev1 folder contains notebook1 and notebook2, and the Dev2 folder contains notebook3 and notebook4.
When Synapse publishes, the GitHub repo does not maintain the folder structure. All 4 files end up directly under repo_name/notebook/: notebook1, notebook2, notebook3, notebook4.
How do I configure the Synapse GitHub integration to keep the folder structure?

The short answer is you don't. When you save a notebook (or SQL script, or anything else, for that matter), you are actually saving a JSON representation of the asset. The "folders" in Synapse are not actually folders, but rather properties in the JSON.
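For example, here is a rough sketch of the Git-tracked JSON for notebook1; most fields other than "folder" are trimmed, and the exact layout may vary by workspace:

```json
{
  "name": "notebook1",
  "properties": {
    "folder": {
      "name": "Dev1"
    },
    "metadata": {},
    "cells": []
  }
}
```

The Dev1 "folder" exists only as that property value, so Git just sees a flat list of notebook files under notebook/.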
This is also why file names need to be globally unique: you can't have two "Notebook1" files in different folders. Again, the same applies to SQL scripts, datasets, pipelines, etc.

Related

How to copy many csv files using Synapse Pipelines from an online source with the date in the file name?

There is a Git repository that is publicly available and refreshed daily. It contains several CSV files named like "DA-01-12-2022", "DA-02-12-2022", "DA-03-12-2022", and so on; the date is in the file name. The date is also in the GitHub link, so I can copy one file without a problem, but since there are many CSV files in the Git folder, how can I use Synapse pipelines to copy all the files in the repository to a storage account in Azure? I feel like I have to use loops, but how can I tell the pipeline to use the date?
Thanks and best regards!
You can use a Copy activity to load all the CSV files and store them in one Parquet file.
You can also use a ForEach (or Until) activity to loop over the dates, for example:
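A minimal sketch of that loop, assuming the last 30 days are wanted; the dataset name, the fileName parameter, the .csv extension, and the sink/source details are all assumptions and are omitted or made up here:

```json
{
  "name": "CopyDailyFiles",
  "type": "ForEach",
  "typeProperties": {
    "items": { "value": "@range(0, 30)", "type": "Expression" },
    "activities": [
      {
        "name": "CopyOneFile",
        "type": "Copy",
        "inputs": [
          {
            "referenceName": "HttpCsvSource",
            "type": "DatasetReference",
            "parameters": {
              "fileName": "@concat('DA-', formatDateTime(addDays(utcNow(), mul(-1, item())), 'dd-MM-yyyy'), '.csv')"
            }
          }
        ]
      }
    ]
  }
}
```

Each iteration builds one dated file name from the loop index and copies that single file to the Azure storage sink.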

Organize Azure Data Factory Objects in Git folders

Azure Data Factory supports organizing its objects (pipelines, datasets, etc.) into folders within ADF Studio.
When ADF is connected to a Git repository, all the ADF objects are stored in Git based on the ADF component type.
Below is how the pipelines are organized within ADF Studio.
After saving the changes, this is how the files are structured in Git.
If I organize P1-Pipeline.json and P2-Pipeline.json into separate folders within Git, then ADF Studio doesn't recognize them.
It is really important for us to organize the pipelines (and other ADF components) into folders within Git, as each folder represents a separate team.
Is there a possibility to organize the ADF components into Git folders? If not, what is the recommended approach for selective deployment of the ADF components?
If I organize P1-Pipeline.json and P2-Pipeline.json into separate folders within Git, then ADF Studio doesn't recognize them.
No, you shouldn't change the location of the pipeline JSON file to another folder in the Git repo. Create the pipeline wherever you like, and if you want to move it to another folder, move it in ADF Studio only.
Click the three dots next to the pipeline name, click the Move item option, and then choose the folder into which you want to move the pipeline.
In the example below I have moved pipeline1 from the project1 folder to the project2 folder.
And I can still see the JSON files of both pipelines in the Azure repo.
If you want to keep each pipeline in a separate folder, it's better to create an ARM template for each pipeline by clicking the Save as template option; refer to the image below.
Each ARM template will be stored as JSON in a separate sub-folder, named after the pipeline, under the main templates folder, as shown in the image below.

How to exclude files present in the mapped directory when Publishing the artifacts in Azure CI/CD?

I am new to Azure CI/CD pipelines. I am trying to export CRM solutions using a build pipeline in Azure DevOps with the Power Platform task. There is a requirement to keep the exported solution from the build pipeline in Azure Repos (which I am doing from the command line using tf vc).
I am able to export the solution successfully, but the issue is that when I publish the artifacts, every file present in the mapped folder gets published. (I mapped a directory in Azure Repos where all the solution backups are kept.)
I see that the Azure agent copies all the files present in the mapped directory and stores them in the agent directory. The problem is that the mapped directory contains all the backup files of the CRM solutions. I found some articles suggesting I cloak the directory so that its files are not included on the agent, but if I cloak the directory then I am not able to check in the exported solution from the command line.
So I was wondering if there is any way to exclude all the files present in the mapped directory and still be able to check in the exported file to that directory using the command line.
You can use a .artifactignore file to filter out the paths of files that you don't wish to be published as part of the process. Documentation can be found here.
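As a rough sketch, a .artifactignore placed in the directory being published uses .gitignore-style patterns; the folder name ExportedSolution below is just a placeholder for wherever the freshly exported solution lands:

```
# Ignore everything in the mapped directory...
**/*
# ...except the folder holding the newly exported solution (placeholder name)
!ExportedSolution/**
```

With something like this in place, the publish step picks up only the exported solution while the older backups stay untouched in the repo.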

Running a databricks notebook connected to git via ADF independent from git username

In our company, for orchestrating the running of Databricks notebooks, we learned experimentally to connect our notebooks (affiliated with a Git repository) to ADF pipelines; however, there is an issue.
As you can see in the photo attached to this question, the path to the notebook depends on the employee username, which is not a stable solution in production.
What is/are the solution(s) to solve it?
Update: the main issue is keeping the employee username out of production to avoid any future failure, either in the ADF path or in a secondary storage place which can be read by a Lookup but still sits on the production side.
Path selection in ADF:
If you want to avoid having the username in the path, then you can just create a folder inside Repos and do the checkout there (here is the full instruction):
In Repos, at the top level, click on the ᐯ near the "Repos" header, select "Create", and select "Folder". Give it some name, like "Staging":
Create a repository inside that folder.
Click on the ᐯ near the "Staging" folder, click "Create", and select "Repo":
After that you can navigate to that repository in the ADF UI.
It's also recommended to set permissions on the folder, so only specific people can update projects inside it.
You can use Azure DevOps source control to manage the development and production Databricks notebooks, and other related code/scripts/documents, in Git. Learn more here.
Keep your notebooks in logically distributed repositories in GitHub, and use the same path in the Notebook activity in your Azure Data Factory.
If you want to pass a dynamic path to the Notebook activity, keep a placeholder list of the notebook file paths, something like a text/CSV file or a SQL table where all the notebook paths are available.
Then use a Lookup activity in ADF to get the list of those paths, pass the Lookup output to a ForEach activity, and have a Notebook activity inside the ForEach that receives the path (for each iteration) as a parameter. This way you can avoid a hard-coded notebook path in the pipeline.
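A minimal sketch of that pattern; the activity names and the NotebookPath column are assumptions, and the Databricks linked service reference and Lookup definition are omitted:

```json
{
  "name": "RunAllNotebooks",
  "type": "ForEach",
  "dependsOn": [ { "activity": "LookupNotebookPaths", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "items": {
      "value": "@activity('LookupNotebookPaths').output.value",
      "type": "Expression"
    },
    "activities": [
      {
        "name": "RunNotebook",
        "type": "DatabricksNotebook",
        "typeProperties": {
          "notebookPath": {
            "value": "@item().NotebookPath",
            "type": "Expression"
          }
        }
      }
    ]
  }
}
```

Because the paths come from the Lookup source (the CSV file or SQL table), moving a notebook only requires updating that source, not the pipeline.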

How can I copy just new and changed files with an Azure Devops pipeline?

I have a large (lots of dependencies, thousands of files) Node.js app that I am deploying with an Azure DevOps YAML build and an Azure DevOps "classic editor" release pipeline.
Is there some way to copy JUST new and changed files during a file copy, not every single file? My goal is to reduce the time it takes to complete the copy files step of the deploy, as I deploy frequently, but usually with just changes to one or a few files.
Regarding copying only the changed files into artifacts for a release: if the changed files are in a specific folder, you can copy just that folder by specifying the SourceFolder and Contents arguments of the Copy Files task, for example:
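A minimal YAML sketch of that task; the folder name changed-files is just a placeholder:

```yaml
# Copy only the contents of one known folder into the artifact staging directory
- task: CopyFiles@2
  inputs:
    SourceFolder: '$(Build.SourcesDirectory)/changed-files'   # placeholder folder
    Contents: '**'
    TargetFolder: '$(Build.ArtifactStagingDirectory)'
```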
If the changed files are distributed across different folders, I am afraid there is no out-of-the-box way to pick only the changed files with the Copy Files or Publish Build Artifacts task.
As a workaround, you could add a PowerShell task that deletes (recursively) all files whose timestamp is older than (now - x minutes); this way the artifact directory contains only the changed files. For more detail, please refer to this similar case.
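A minimal PowerShell sketch of that clean-up, assuming it runs right after the build step and that 10 minutes is a reasonable cutoff:

```powershell
# Remove every file in the staging folder that was not touched in the last 10 minutes,
# leaving only the files the current build actually changed.
$cutoff = (Get-Date).AddMinutes(-10)
Get-ChildItem -Path "$env:BUILD_ARTIFACTSTAGINGDIRECTORY" -Recurse -File |
    Where-Object { $_.LastWriteTime -lt $cutoff } |
    Remove-Item -Force
```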
Alternatively, you can call the Commits - Get Changes REST API through a script in a PowerShell task, retrieve the changed files from the response, and then copy them to a specific target folder.
GET https://dev.azure.com/{organization}/{project}/_apis/git/repositories/{repositoryId}/commits/{commitId}/changes?api-version=5.0
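A rough PowerShell sketch of that approach; it assumes the script runs in a pipeline job where System.AccessToken is mapped into the task environment, and the organization, project, and repository values are placeholders:

```powershell
# Assumes the pipeline maps System.AccessToken into SYSTEM_ACCESSTOKEN for this task.
$org     = "your-org"        # placeholder
$project = "your-project"    # placeholder
$repoId  = "your-repo"       # placeholder (repository name or GUID)
$commit  = $env:BUILD_SOURCEVERSION

$url = "https://dev.azure.com/$org/$project/_apis/git/repositories/$repoId/commits/$commit/changes?api-version=5.0"
$headers = @{ Authorization = "Bearer $($env:SYSTEM_ACCESSTOKEN)" }

$response = Invoke-RestMethod -Uri $url -Headers $headers -Method Get

# Copy each changed (non-deleted) file into the artifact staging directory, keeping relative paths.
foreach ($change in $response.changes) {
    if ($change.item.gitObjectType -eq "blob" -and $change.changeType -ne "delete") {
        $relative = $change.item.path.TrimStart("/")
        $source   = Join-Path $env:BUILD_SOURCESDIRECTORY $relative
        $target   = Join-Path $env:BUILD_ARTIFACTSTAGINGDIRECTORY $relative
        New-Item -ItemType Directory -Path (Split-Path $target) -Force | Out-Null
        Copy-Item -Path $source -Destination $target -Force
    }
}
```

Publishing $(Build.ArtifactStagingDirectory) after this step then ships only the files touched by the triggering commit.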