Log the status of the ADF (Azure Data Factory) pipeline run

I have an ADF Pipeline with a trigger that runs the pipeline once every day. I would like to find out the status of the pipeline run (latest pipeline run) and log it somewhere (maybe log analytics). How do I do that?

If you want a log pertaining only to the most recent run, you would have to write custom logic within your pipeline (for example, a Script activity at the end that writes the pipeline's status to a database).
sample reference:
https://datasharkx.wordpress.com/2021/08/19/error-logging-and-the-art-of-avoiding-redundant-activities-in-azure-data-factory/
If you are comfortable querying a list of logs and filtering out the latest one, you can enable diagnostic settings and send the logs to Log Analytics or a storage blob.
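If PowerShell is an option for the logging step, a minimal sketch along these lines pulls the latest run and its status (the resource group, factory and pipeline names are placeholders; it assumes the Az.DataFactory module is installed):

Connect-AzAccount

# Look at runs updated in the last two days and keep the newest one for the pipeline
$runs = Get-AzDataFactoryV2PipelineRun `
    -ResourceGroupName "myRG" `
    -DataFactoryName "myDataFactory" `
    -LastUpdatedAfter (Get-Date).AddDays(-2) `
    -LastUpdatedBefore (Get-Date)

$latest = $runs |
    Where-Object { $_.PipelineName -eq "DailyPipeline" } |
    Sort-Object RunStart -Descending |
    Select-Object -First 1

Write-Output "$($latest.PipelineName) run $($latest.RunId) finished with status: $($latest.Status)"

From there you can write the status to Log Analytics, a database table, or wherever you keep your logs.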

Related

Is there any alternative way for trigger in Azure Data Factory?

I am trying to run a pipeline in Azure Data Factory without using a trigger.
Is there any service we can use instead of a trigger in Azure?
In ADF, every pipeline is executed either manually or by a trigger.
The alternatives to ADF triggers are the following:
PowerShell
Logic Apps
REST API
By PowerShell:
Use Invoke-AzureRmDataFactoryV2Pipeline (legacy AzureRM module) or Invoke-AzDataFactoryV2Pipeline (Az module) in PowerShell.
Command:
Invoke-AzureRmDataFactoryV2Pipeline -ResourceGroupName "RG name" -DataFactoryName "rakeshdfactory" -PipelineName "MyADFpipeline"
After executing, you will get the pipeline run ID in PowerShell, and you can see the run under Monitor -> Trigger runs in ADF.
Schedule this PowerShell script using Azure Automation to execute the pipeline daily.
Reference:
Third-party tutorial on PowerShell automation from SharePoint Diary by Salaudeen Rajack
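If you go the Azure Automation route, a runbook for the daily schedule could look roughly like the sketch below (names are placeholders; it assumes the Automation account has a system-assigned managed identity with rights on the data factory and that the Az.DataFactory module is imported into the account):

# Sign in with the Automation account's managed identity
Connect-AzAccount -Identity

# Kick off the pipeline; the returned value is the pipeline run id
$runId = Invoke-AzDataFactoryV2Pipeline `
    -ResourceGroupName "RG name" `
    -DataFactoryName "rakeshdfactory" `
    -PipelineName "MyADFpipeline"

Write-Output "Started pipeline run: $runId"

Link the runbook to a daily schedule in the Automation account and it behaves like a scheduled trigger.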
Using Logic apps:
ADF event triggers only support storage accounts, but Logic Apps offer other options, such as SQL tables, as well.
Here I have created a blob trigger for the logic app. You can create a Recurrence trigger for the logic app instead of a blob trigger, which lets you schedule the logic app to invoke the ADF pipeline.
Reference:
Article by MITCHELL PEARSON
Assuming you mean without using the Trigger definitions inside Data Factory, then yes, there are other options. The two that I use almost exclusively are 1) Logic Apps and 2) the Azure Data Factory SDK (this document may also be useful). I use a Function App to house my SDK code, and in most cases I have Logic Apps call the Function App instead of executing the pipeline directly.
NOTE: purely for pedantic purposes, every pipeline run has a Trigger. When you execute by using the Trigger Now feature inside the workspace or execute a pipeline externally using one of the above methods, the Trigger Type will show as "Manual" in the Monitor section.
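For the REST API option mentioned above, the call boils down to a POST against the factory's createRun endpoint. A hedged sketch (substitute your own subscription, resource group, factory and pipeline names):

Connect-AzAccount

# Invoke-AzRestMethod takes care of the ARM token for you
$path = "/subscriptions/<subscription-id>/resourceGroups/<resource-group>" +
        "/providers/Microsoft.DataFactory/factories/<factory-name>" +
        "/pipelines/<pipeline-name>/createRun?api-version=2018-06-01"

$response = Invoke-AzRestMethod -Method POST -Path $path
($response.Content | ConvertFrom-Json).runId   # this run shows up in Monitor with Trigger Type "Manual"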

how to add failure control of each stage in ADF

If any of my stages fails, I need to pick up from the same failed stage in my next run instead of starting from the first stage. How do I achieve this in ADF? And how do I send an email whenever a stage fails so that all the users are notified?
If any of my stages fails, I need to pick up from the same failed stage in my next run instead of starting from the first stage. How do I achieve this in ADF?
Failed activity
To rerun this pipeline from the Copy activity, click on the Rerun from this activity symbol.
Click on OK.
Output
It skipped the first Wait activity, which had already run successfully, and started from the Copy activity that failed.
How do I send an email whenever a stage fails so that all the users are notified?
For a failure alert on each activity, an alert can be created in the Azure Data Factory (ADF) Monitor section under the Alerts and Actions option.
For more details, you can refer to this SO thread by @UtkarshPal-MT.
Alternatively,
you can create a logic app and configure it with Azure Data Factory; through that you can send a mail when an activity fails.
For more information, you can refer to this article by @Jeroen Smans.
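Purely as an illustration of the Logic App route (the callback URL and body shape below are hypothetical): if the Logic App starts with a "When a HTTP request is received" trigger followed by a send-email action, the failure path of the pipeline (for example a Web activity attached to the failing activity's failure output) can POST the details to it. From PowerShell, an equivalent test call would look like:

# Hypothetical Logic App HTTP trigger URL; copy the real one from the Logic App designer
$logicAppUrl = "https://prod-00.westeurope.logic.azure.com/workflows/<workflow-id>/triggers/manual/paths/invoke?<sas-query-string>"

$body = @{
    pipelineName = "MyADFpipeline"
    activityName = "Copy data1"
    errorMessage = "Copy activity failed"
} | ConvertTo-Json

Invoke-RestMethod -Method Post -Uri $logicAppUrl -Body $body -ContentType "application/json"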

Azure Devops - Manage, Run and Track one-time Sql Scripts

We have a database project that uses a dacpac to deploy schema changes and also allows a pre-deployment and post-deployment script.
However, we frequently have to run one-off scripts, and security would prefer that developers not have write access in prod (we do not have a DBA role at this time). I'm trying to find a solution that works with Azure DevOps to store one-time-run scripts in git, run a script if it has not been run before, and not run it the next time the pipeline runs. We'd like this done through DevOps so that the service principal, not the developer, has access to run the queries, anything flowing through the pipe has been through our peer review process, and we have a record of what was executed.
I'm looking for suggestions from anyone who has done this or is aware of any product which can do this.
Use Liquibase. Though I would have it as part of my code base, you can also use it from the CLI and run your scripts with that tool.
Liquibase keeps track of which SQL files you have published across deployments, so you can have multiple stages (say DIT, UAT, STAGING, PROD) and it can apply the remaining one-off SQL changes over time.
Generally, unless you really need support, I doubt you'd need the commercial version. The open-source version is more than sufficient for my system's needs, and I already have a relatively complex system.
The main reason I like Liquibase over other technologies is that it allows SQL-based changesets, so the learning curve is a lot lower.
Two tips:
Don't rely on the automatic computation of the logicalFilePath; set it explicitly even if it means repeating yourself. This allows you to refactor your scripts later, so instead of lumping everything into a single folder you can group them.
Name your scripts with the date first. That way you can leverage the natural sort order.
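As a rough sketch of what the Azure DevOps step could run (the changelog file name, JDBC URL and environment variable names are assumptions; the run-once behaviour comes from Liquibase recording applied changesets in its DATABASECHANGELOG table):

# Credentials are expected to be mapped in from pipeline secrets / Key Vault
$jdbcUrl = "jdbc:sqlserver://<server>.database.windows.net:1433;database=<database>"

liquibase `
    --changelog-file=changelog.xml `
    --url=$jdbcUrl `
    --username=$env:SQL_USER `
    --password=$env:SQL_PASSWORD `
    update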
I've faced a similar problem in the past:
Option 1
If you can afford an additional table in your database to keep track of what has been executed, your problem can be easily solved; there is a tool that helps with this: https://github.com/DbUp/DbUp
Then you would have a new repository, let's call it OneOffSqlScriptsRepository, and your pipeline would consume it:
resources:
  repositories:
    - repository: OneOffSqlScriptsRepository
      endpoint: OneOffSqlScriptsEndpoint
      type: git
Thus you'd create a pipeline to run this DbUp application, consuming the scripts from the OneOffSqlScripts repository; DbUp (via a journal table in the database) would take care of executing each script only once (it's configurable).
The username/password for the database can be stored safely in the Library, combined with Azure Key Vault, so only people with the right access rights can access them (apart from the pipeline).
Option 2
This option assumes that you want to do everything using only the native resources that Azure Pipelines provides.
Create a OneOffSqlScripts repository as in Option 1.
Create a ScriptsRunner repository
In the ScriptsRunner repository, you'd create a folder containing a .json file with the name of each script and the number of times (or a boolean indicating whether) it has been run.
eg.:
[{
  "id": 1,
  "scriptName": "myscript1.sql",
  "runs": 0  // or hasRun: false
}]
Then write a Python script that reads and writes the JSON file, updating the number of runs; this means you'd need to update your repository after each pipeline run, i.e. the pipeline performs a git commit/push after each run in case there are new scripts to run.
That's the algorithm in outline; the implementation can be tuned.
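The answer suggests Python for that step; purely as an equivalent sketch in PowerShell (file, property and environment variable names follow the example above and are otherwise assumptions, and Invoke-Sqlcmd stands in for however you actually execute the scripts):

$trackingFile = "scripts.json"
$entries = Get-Content $trackingFile -Raw | ConvertFrom-Json

foreach ($entry in $entries) {
    if ($entry.runs -eq 0) {
        # Run the one-off script against the target database, then mark it as executed
        Invoke-Sqlcmd -ServerInstance $env:SQL_SERVER -Database $env:SQL_DB -InputFile $entry.scriptName
        $entry.runs = 1
    }
}

$entries | ConvertTo-Json -Depth 5 | Set-Content $trackingFile
# ...followed by git add / commit / push so the next pipeline run sees the updated counts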

Run Powershell script every hour on Azure

I have found this great script which backs up a SQL Azure database to blob storage.
I want to run many different variations of this script - e.g. DB1 goes to Customer1Blob, DB2 goes to Customer2Blob.
I have looked at Scheduler Job Collections. However I can only see options (Action settings) for HTTP(S)/ Storage Queue / Service Bus.
Is it possible to run a specific .ps1 script (with commands) on a schedule?
You can definitely run a PowerShell script as a WebJob. If you want to run a script on a schedule, you can add a settings.job file containing a cron expression with your WebJob. The docs for doing so are here.
For this type of automation task, I prefer the Azure Automation service. You can create runbooks using PowerShell and then schedule them with the Azure scheduler. You can have them run "on Azure", so you do not pay for dedicated compute (rather, you pay by the minute the job runs), or you can configure them to run with a hybrid worker.
For more information, please see the documentation
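A minimal sketch of that setup with the Az.Automation cmdlets, assuming the backup script has already been imported and published as a runbook (the account, runbook and schedule names are placeholders):

$rg = "myRG"
$account = "myAutomationAccount"

# Create an hourly schedule starting shortly from now
New-AzAutomationSchedule `
    -ResourceGroupName $rg -AutomationAccountName $account `
    -Name "HourlyBackup" -StartTime (Get-Date).AddMinutes(10) -HourInterval 1

# Link the runbook to the schedule
Register-AzAutomationScheduledRunbook `
    -ResourceGroupName $rg -AutomationAccountName $account `
    -RunbookName "BackupSqlAzureToBlob" -ScheduleName "HourlyBackup"

Repeat with different runbook parameters for each database/blob combination (e.g. DB1 to Customer1Blob, DB2 to Customer2Blob).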
When exporting from SQL DB or from SQL Server, make sure you are exporting from a quiescent database. Exporting from a database with active transactions can result in data integrity issues - data being added to various tables while they are also being exported.

How to trigger a build within a build chain after x days?

I am currently using TeamCity to deploy a web application to Azure Cloud Services. We typically deploy using PowerShell scripts to the Staging slot and thereafter do a manual swap (Staging to Production) on the Azure Portal.
After the swap, we typically leave the Staging slot active with the old production deployment for a few days (in the event we need to revert/backout of the deployment) and thereafter delete it - this is a manual process.
I am looking to automate this process using TeamCity. My intended solution is to have a TeamCity build kick off x days after the deployment build has succeeded (the details of the build steps are irrelevant, since I'd probably use PowerShell again to delete the staging slot).
This plan has pointed me to look into TeamCity build chains, snapshot dependencies, etc.
What I have done so far is
correctly created the build chain by creating a snapshot dependency on the deployment build configuration and
created a Finish Build Trigger
At the moment, the current approach kicks off the dependent build 'Delete Azure Staging Web' (B) immediately after the deployment build has succeeded. However, I would like this to be a delayed build, after x days.
Looking at the above build chain, I would like the build B to run on 13-Aug-2016 at 7.31am (if x=3)
I have looked into the Schedule Trigger option as well, but am slightly lost as to how I can use it to achieve this. As far as I understand, using a cron expression will result in the build running continuously, which is not what I want; I would like build B to execute only once.
Yes, this can be done by making use of the REST API.
I've made a small sample which should convey the fundamental steps. This is a PowerShell script that will clear the triggers on another build configuration (determined by a parameter value in the script) and add a scheduled trigger with a start time X days on from the current time (also determined by a parameter value in the script).
1) Add a PowerShell step at the end of the main build and run add-scheduled-trigger as source code
2) Update the parameter values in the script
$BuildTypeId - This is the id of the configuration you want to add the trigger to
$NumberOfDays - This is the number of days ahead that you want to schedule the trigger for
There is admin / admin embedded in the script, i.e. username/password authentication for the REST API.
Once this is done you should see a scheduled trigger created/updated each time you build the first configuration.
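For illustration, the core of add-scheduled-trigger could look like the sketch below. The TeamCity URL and build configuration id are placeholders, and the schedule-trigger property names are assumptions about how TeamCity serialises a cron schedule trigger; the safest way to confirm them is to create one trigger by hand and GET the same REST URL to copy the exact names.

$teamcityUrl = "http://teamcity.local:8111"
$buildTypeId = "DeleteAzureStagingWeb"        # $BuildTypeId
$runAt       = (Get-Date).AddDays(3)          # $NumberOfDays = 3

# admin / admin basic authentication for the REST API
$pass = ConvertTo-SecureString "admin" -AsPlainText -Force
$cred = New-Object System.Management.Automation.PSCredential("admin", $pass)
$triggersUrl = "$teamcityUrl/httpAuth/app/rest/buildTypes/id:$buildTypeId/triggers"

# 1) Clear any existing triggers on the target configuration
$existing = Invoke-RestMethod -Uri $triggersUrl -Credential $cred
foreach ($t in $existing.triggers.trigger) {
    Invoke-RestMethod -Uri "$triggersUrl/$($t.id)" -Method Delete -Credential $cred
}

# 2) Add a schedule trigger that fires at the calculated date and time
$body = @"
<trigger type="schedulingTrigger">
  <properties>
    <property name="schedulingPolicy" value="cron"/>
    <property name="cronExpression_min" value="$($runAt.Minute)"/>
    <property name="cronExpression_hour" value="$($runAt.Hour)"/>
    <property name="cronExpression_dm" value="$($runAt.Day)"/>
    <property name="cronExpression_month" value="$($runAt.Month)"/>
    <property name="cronExpression_dw" value="?"/>
  </properties>
</trigger>
"@
Invoke-RestMethod -Uri $triggersUrl -Method Post -Body $body -ContentType "application/xml" -Credential $cred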
Hope this helps