Run an Azure Data Factory Pipeline Continuously - azure-data-factory

I have a requirement to incrementally copy data from one SQL table to another SQL table. The watermark or key column is an Identity column. My boss wants me to restart the load as soon as it's done...and as you know, the completion time may vary. In Azure Data Factory, the trigger options are Scheduled, Tumbling Window and Custom Event. Does anyone know which option would allow me to achieve this continuous running of the pipeline and how to configure it?

Create a new pipeline. Call it "run forever". Add an "until" activity with a never-true condition e.g. #equals(1, 2).
Inside the until execute the pipeline which copies between tables. Ensure "wait for completion" is ticked.
If the table copy fails then "run forever" will fail and will have to be manually re-started. You likely do not want "run forever" to be scheduled as the scheduled invocations will queue should it, in fact, not fail.

Pipeline in ADF are batch-based. You can set "micro batches" of 1 min with schedule trigger or 5 mins with tumbling window.

ADF will run in a batch. If you want to continuously load the data then you can go for Stream Analytics / Event Hub which would load real-time data

Related

Event based trigger for a sequential run of the same data factory pipeline

I would like to use an event based trigger to run a data factory pipeline.
The trigger will check a folder in a data lake for any new file and start a pipeline once a new CSV file is copied.
The pipeline will then copy the data to an intermediate table to check its consistency (multiple checks using different data flow activities) and if everything's correct, copies it into a stage table.
It is thus very important that the intermediate table will contain the data from only one single CSV file before it is checked.
I have read though that the event based trigger will start in parallel as many pipelines as the (simultaneously) downloaded CSV files.
Is this right? in this case how can I force each Pipeline to wait until the previous one is done?
Thank you for your help.
There is a flag on the pipeline properties (accssible in the top-right of the editor pane) called concurrency. Set this to 1 and only one copy will run and any other invocations will be queued until that one finishes.

Automation for ADF V2 Pipeline

I need help with implementation for below requirement:
There is one ADF pipeline that runs every two hours (with Tumbling window trigger), now i need to create one more pipeline that will be used for performing maintenance job . This pipeline is scheduled to run once a month (with schedule trigger). Here is the requirement that i'm trying to implement:
Now before running the second pipeline i need to make sure the first pipeline is not running (basically get the status and if its running wait for its completion) and then disable the trigger associated with it.
Run the second pipeline and after its completion , enable the trigger that is associated with first pipeline
Please let me know if this can be achieved within ADF or some kind of custom scripting needed to achieve the result.
First, your idea is achievable.
Second, if you want to use built-in feature in Azure Datafactory, then there is no way.
Basically, you need to use azure function(simple httptrigger, dont give any input, then you can hit and execute it directly.) to achieve your requirement that ADF can't do. From your description, the executing of these two pipelines are mutually exclusive, so you can use sdk to check to status of another pipeline in azure function. If another pipeline is running, then wait a few seconds then re-check the status of another pipeline.(In short, put the main logic and code in the azure function.)
Simple azure function:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=csharp
Use SDK to monitor:
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#net
(The link I give is C#, you can choose other supported language.)

Execute a pipeline after a completion pipeline

I have a first pipeline that ingest data for multiple country from BigQuery to Azure, it's an operation that copy bigquery transformed data into azure.
On Data Factory, i create multiple folders for each country that will have multiple pipeline, for example, a specific machine learning model only for 1 or 2 countries, a data prepration pipeline for an application for only 5 countries etc.
I think i need this folder construction for each market to keep it clear for anybody that needs to implement a pipeline and avoid errors.
My main problem by doing that is how i can call, for example, a machine learning pipeline in my folder UK that can only start after the first pipeline, the bigquery copy data to azure, completed ?
I can't call the Execution Pipeline activity because my first pipeline bigquerytoazure is executed by himself, it's the very important step that needs to be executed before any other pipeline can be executed.
Is there any way to call completed pipeline without the Execution Pipeline activated ?
I thought about creating a dummy blob storage in the first pipeline that can work as a trigger for all pipeline after this first one ?
Thanks by advance, hope i was clear.
Data Factory event trigger based on the blob storage. I think that's the best way.
Another way you can think about using Logic App, add a trigger to listen the BigQuery table in SQL database, if the BigQuery table modified, then execute a data factory pipeline. Create a work flow for the pipelines run.
Work flow:
SQL Server Trigger: when an item is modified.
Add a parallel branch
Data Factory Action: Get a pipeline run
Reference: Automate workflows for SQL Server or Azure SQL Database by using
Azure Logic Apps
Hope this helps.

Run lot of piplines in the same time

There is a solution if i want to run those piplines in the same time instead of doing it for each pipline
Just add a trigger at same time for all of your pipelines.
In the ADF portal:
Set the same time for trigger configuration:
If you want to execute them in the queue,you could use execute pipeline activity which allows you to invoke another pipeline.
You could also leverage the lookup activity to lookup the pipeline using meta data or a pipeline parameter table and then use the set the for each loop to parallel processing so that it will process upto 50 pipelines at once.
See ForEach Article for more info: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity#parallel-execution

How to pause data factory tumbling window trigger if error occurs

I have a tumbling window trigger for a pipeline. If a window fails I do not want any additional windows ran until the failed window is addressed. What is the best way to handle this?
There is no native way to achieve this in tumbling window trigger, but ADF provides various control and transform activities for you to come up with a combination method:
Involve a flag stored in a table/file to indicate the pipeline running status in a window, after each window execution, update this flag, then check this flag before execute another window run. You may need a Lookup activity to fetch the value of this flag, an IF activity to check the flag then a Custom/Stored Procedure activity to update the value. HTH.