Run many pipelines at the same time - azure-data-factory

Is there a way to run these pipelines at the same time, instead of starting each pipeline individually?

Just add a trigger with the same start time to all of your pipelines.
In the ADF portal, set the same time in the trigger configuration of each pipeline.
If you want to execute them one after another in a queue, you could use the Execute Pipeline activity, which allows one pipeline to invoke another.
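As a rough illustration of the shared trigger suggested above, the sketch below builds one schedule trigger definition that references several pipelines. The trigger name, pipeline names, and start time are hypothetical, and the daily recurrence is an assumption; the resulting JSON would still have to be deployed through the portal, an ARM template, or an SDK.

```python
import json


def build_shared_schedule_trigger(name, pipeline_names, start_time):
    """Return a schedule trigger definition that starts every listed pipeline."""
    return {
        "name": name,
        "properties": {
            "type": "ScheduleTrigger",
            "typeProperties": {
                # Assumed recurrence: once a day, in UTC.
                "recurrence": {
                    "frequency": "Day",
                    "interval": 1,
                    "startTime": start_time,
                    "timeZone": "UTC",
                }
            },
            # One reference per pipeline; all of them fire on the same schedule.
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
                for pipeline_name in pipeline_names
            ],
        },
    }


# Hypothetical names, for illustration only.
trigger = build_shared_schedule_trigger(
    "DailyAt2am", ["PipelineA", "PipelineB", "PipelineC"], "2024-01-01T02:00:00Z"
)
print(json.dumps(trigger, indent=2))
```

Attaching all pipelines to one trigger keeps the schedule in a single place, which is usually easier to maintain than several triggers with identical times.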

You could also use a Lookup activity to read the list of pipelines from metadata or a pipeline parameter table, and then set the ForEach loop to parallel processing so that it processes up to 50 pipelines at once.
See ForEach Article for more info: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity#parallel-execution
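A minimal sketch of the ForEach configuration described above, expressed as a Python dict that mirrors the activity JSON. The activity names, the child pipeline name, and the `tableName` column and parameter are hypothetical. The sketch assumes a fixed child pipeline that receives each lookup row as a parameter, because as far as I know the Execute Pipeline activity needs a static pipeline reference.

```python
import json

MAX_BATCH_COUNT = 50  # upper limit for parallel ForEach iterations


def build_parallel_foreach(lookup_activity, child_pipeline, batch_count=MAX_BATCH_COUNT):
    """Return a ForEach activity that runs the child pipeline once per lookup row."""
    if not 1 <= batch_count <= MAX_BATCH_COUNT:
        raise ValueError(f"batch_count must be between 1 and {MAX_BATCH_COUNT}")
    return {
        "name": "ForEachRow",
        "type": "ForEach",
        "dependsOn": [
            {"activity": lookup_activity, "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {
            "isSequential": False,      # run iterations in parallel
            "batchCount": batch_count,  # how many iterations run at once
            "items": {
                "value": f"@activity('{lookup_activity}').output.value",
                "type": "Expression",
            },
            "activities": [
                {
                    "name": "RunChild",
                    "type": "ExecutePipeline",
                    "typeProperties": {
                        "pipeline": {
                            "referenceName": child_pipeline,
                            "type": "PipelineReference",
                        },
                        "waitOnCompletion": True,
                        # Hypothetical parameter fed from the current lookup row.
                        "parameters": {
                            "tableName": {
                                "value": "@item().tableName",
                                "type": "Expression",
                            }
                        },
                    },
                }
            ],
        },
    }


foreach = build_parallel_foreach("LookupTables", "ProcessOneTable")
print(json.dumps(foreach, indent=2))
```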

Related

Trigger Date for reruns

My pipeline activities need the date of the run as a parameter. Currently I get the current date in the pipeline from the utcnow() function. Ideally this would be something I could enter dynamically in the trigger, so that I could rerun a failed day with the parameter set correctly. As it stands, a rerun executes my pipeline with today's date, not the date of the failed run.
I am used to Airflow, where such things are pretty easy to do, including scheduling reruns. I am probably thinking too much in terms of Airflow, but I can't wrap my head around a better solution.
ADF does not directly support passing the trigger date of a failed run back to the trigger.
You can get the trigger time using @pipeline().TriggerTime.
This system variable gives the time at which the trigger started the pipeline run.
You can store this value for every pipeline run and pass it as a parameter when you rerun the pipeline that failed.
Reference: Microsoft documentation on system variables in ADF
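The idea of storing the trigger time and reusing it on a rerun can be sketched in plain Python. This is only an illustration of the logic, not ADF code: the function names, the in-memory log, and the run identifiers are hypothetical, and the trigger time is assumed to be an ISO 8601 UTC timestamp.

```python
from typing import Dict, Optional

# Hypothetical store of run id -> run date; in practice this could be a table.
run_dates: Dict[str, str] = {}


def resolve_run_date(trigger_time: str, override: Optional[str] = None) -> str:
    """Use the explicit date on a rerun, otherwise the date part of the trigger time."""
    if override:
        return override
    # An ISO 8601 timestamp starts with yyyy-MM-dd, so the first 10 characters are the date.
    return trigger_time[:10]


def record_run(run_id: str, trigger_time: str) -> str:
    """Remember the date a run was triggered for, so a failed day can be replayed."""
    run_date = resolve_run_date(trigger_time)
    run_dates[run_id] = run_date
    return run_date


# First run, triggered on 5 March.
record_run("run-001", "2024-03-05T02:00:00.1234567Z")

# Rerun several days later: pass the stored date instead of using the new trigger time.
rerun_date = resolve_run_date("2024-03-09T08:30:00Z", override=run_dates["run-001"])
print(rerun_date)
```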
To resolve my problem I had to create a nested structure of pipelines: the top pipeline sets a variable for the date and then calls the other pipelines, passing that variable.
With this I still can't rerun the top pipeline, but rerunning Execute Pipeline1/2/3 reruns them with the right variable set. It is still not perfect, since the top pipeline run stays in an error state and it is difficult to keep track of what needs to be rerun, but it is a partial solution.

ADF pipeline should be triggered after Form Recognizer analysis is completed

I am calling Azure Form Recognizer from my Azure Data Factory v2 pipeline and sending a 200-page document.
Form Recognizer takes around 5 minutes to complete the analysis, and until then the status is "Running".
So I have added a Wait activity to wait for 5 minutes, and then I get the analysis results by calling the Form Recognizer API.
Question
Is there any way to trigger an ADF pipeline once Form Recognizer completes its analysis?
You can use the Execute Pipeline activity for this. Keep the Form Recognizer activity in a separate pipeline and call that pipeline from the main pipeline.
For this sample, I have used a Set Variable activity in the child pipeline.
You can put your Form Recognizer activity there instead.
Check "Wait on completion" so that the Execute Pipeline activity waits for the child pipeline to complete.
After the Execute Pipeline activity, add your next activities, which will run after the Form Recognizer activity has completed. For this demo I have used another Set Variable activity.
Alternatively, you can add another Execute Pipeline activity after it if you want to run those steps in a separate pipeline.
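The arrangement described above can be sketched as the two activities of the main pipeline, written here as Python dicts that mirror the activity JSON. The activity names, the child pipeline name, and the variable name are hypothetical.

```python
import json


def build_main_pipeline_activities(child_pipeline):
    """Return an Execute Pipeline activity plus a follow-up that depends on it."""
    run_analysis = {
        "name": "RunFormRecognizer",
        "type": "ExecutePipeline",
        "typeProperties": {
            "pipeline": {
                "referenceName": child_pipeline,
                "type": "PipelineReference",
            },
            # Block until the child pipeline has finished.
            "waitOnCompletion": True,
        },
    }
    next_step = {
        "name": "AfterAnalysis",
        "type": "SetVariable",
        # Runs only after the child pipeline has succeeded.
        "dependsOn": [
            {"activity": run_analysis["name"], "dependencyConditions": ["Succeeded"]}
        ],
        "typeProperties": {"variableName": "status", "value": "analysis finished"},
    }
    return [run_analysis, next_step]


activities = build_main_pipeline_activities("FormRecognizerPipeline")
print(json.dumps(activities, indent=2))
```

Note that "Wait on completion" only waits for the child pipeline's own activities. If the child pipeline merely submits the document, it would still need to check the analysis status itself before it finishes.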

Can we call the pipeline in parallel and have multiple instances running?

I have a scenario where I have to execute one pipeline from several different pipelines to log validations.
So I'm planning to have a single pipeline for all the validations (such as counts, duplicates, count drops, etc.), and this pipeline should be triggered when the execution for a particular table completes.
For example: there are two pipelines, P1 and P2, which both invoke this validation pipeline upon completion, so there is a chance that the validation pipeline is triggered twice at the same time.
Can we run a pipeline like this? Is any lock applied automatically?
You can reuse a generic pipeline from other pipelines and call it in parallel; no lock is applied.
Just make sure that the generic pipeline's concurrency setting allows parallel executions; otherwise the additional runs will be queued.
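As a small illustration of the concurrency setting mentioned above, the sketch below sets the `concurrency` property on a pipeline definition held as a Python dict. The pipeline name and the limit of 4 are hypothetical, and the empty activity list is a placeholder.

```python
import copy


def with_concurrency(pipeline_definition, max_parallel_runs):
    """Return a copy of the pipeline definition that allows the given number of parallel runs."""
    if max_parallel_runs < 1:
        raise ValueError("max_parallel_runs must be at least 1")
    updated = copy.deepcopy(pipeline_definition)  # leave the input untouched
    updated["properties"]["concurrency"] = max_parallel_runs
    return updated


# Hypothetical shared validation pipeline called by P1 and P2.
validation_pipeline = {
    "name": "ValidationPipeline",
    "properties": {"activities": [], "parameters": {"tableName": {"type": "String"}}},
}

parallel_validation = with_concurrency(validation_pipeline, 4)
print(parallel_validation["properties"]["concurrency"])
```

With a limit of 1, a second call from P2 would wait until the run started by P1 has finished.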

Automation for ADF V2 Pipeline

I need help implementing the requirement below:
There is one ADF pipeline that runs every two hours (with a tumbling window trigger). Now I need to create one more pipeline that will be used for a maintenance job. This pipeline is scheduled to run once a month (with a schedule trigger). Here is the requirement that I'm trying to implement:
Before running the second pipeline, I need to make sure the first pipeline is not running (that is, get its status and, if it is running, wait for its completion) and then disable the trigger associated with it.
Then run the second pipeline and, after its completion, enable the trigger that is associated with the first pipeline.
Please let me know whether this can be achieved within ADF or whether some kind of custom scripting is needed.
First, your idea is achievable.
Second, if you want to use only the built-in features of Azure Data Factory, there is no way to do it.
Basically, you need an Azure Function (a simple HTTP trigger that takes no input, so you can call it and execute it directly) to do what ADF can't. From your description, the executions of these two pipelines are mutually exclusive, so you can use the SDK inside the Azure Function to check the status of the other pipeline. If the other pipeline is running, wait a few seconds and then re-check its status. In short, put the main logic and code in the Azure Function.
Simple Azure Function:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=csharp
Use the SDK to monitor:
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#net
(The link I gave is for C#; you can choose any other supported language.)
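The control flow that such a function would carry out can be sketched as follows. This is a minimal sketch with the Data Factory calls injected as plain callables, so none of the names below are SDK functions; in a real function they would be backed by whichever SDK or REST calls you choose. The set of "still running" statuses is based on the run states ADF reports (Queued, InProgress, Canceling).

```python
import time

# Run states that mean the first pipeline has not finished yet.
ACTIVE_STATUSES = {"Queued", "InProgress", "Canceling"}


def wait_until_idle(list_run_statuses, poll_seconds, sleep):
    """Poll until no run of the first pipeline is active."""
    while any(status in ACTIVE_STATUSES for status in list_run_statuses()):
        sleep(poll_seconds)


def run_maintenance_window(list_run_statuses, stop_trigger, start_trigger,
                           run_maintenance, poll_seconds=30.0, sleep=time.sleep):
    """Wait for the first pipeline, disable its trigger, run maintenance, re-enable."""
    wait_until_idle(list_run_statuses, poll_seconds, sleep)
    stop_trigger()
    try:
        # Re-check in case a run started just before the trigger was stopped.
        wait_until_idle(list_run_statuses, poll_seconds, sleep)
        run_maintenance()
    finally:
        # Always re-enable the trigger, even if maintenance fails.
        start_trigger()


# Simulated usage with fake callables instead of real Data Factory calls.
events = []
fake_statuses = iter([["InProgress"], ["Succeeded"], ["Succeeded"]])
run_maintenance_window(
    list_run_statuses=lambda: next(fake_statuses),
    stop_trigger=lambda: events.append("stop trigger"),
    start_trigger=lambda: events.append("start trigger"),
    run_maintenance=lambda: events.append("run maintenance"),
    poll_seconds=1,
    sleep=lambda seconds: events.append("wait"),
)
print(events)
```

The `finally` block is a deliberate choice: if the maintenance pipeline fails, the two-hourly trigger is still switched back on.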

Execute a pipeline after another pipeline completes

I have a first pipeline that ingests data for multiple countries from BigQuery to Azure; it copies transformed BigQuery data into Azure.
In Data Factory, I created a folder for each country, and each folder will hold multiple pipelines: for example, a specific machine learning model for only 1 or 2 countries, a data preparation pipeline for an application used in only 5 countries, etc.
I think I need this folder structure for each market to keep things clear for anybody who needs to implement a pipeline, and to avoid errors.
My main problem with this setup is how to call, for example, a machine learning pipeline in my UK folder that can only start after the first pipeline (the BigQuery copy to Azure) has completed.
I can't use the Execute Pipeline activity because my first pipeline, bigquerytoazure, is executed on its own; it's the essential step that has to run before any other pipeline can be executed.
Is there any way to run a pipeline after another one completes without using the Execute Pipeline activity?
I thought about creating a dummy blob in blob storage at the end of the first pipeline, which could act as a trigger for all the pipelines that come after it.
Thanks in advance; I hope that was clear.
Use a Data Factory event trigger based on blob storage. I think that's the best way.
Another option is a Logic App: add a trigger that listens to the BigQuery table in the SQL database, and if the table is modified, execute a Data Factory pipeline. Create a workflow for the pipeline runs.
Workflow:
SQL Server trigger: When an item is modified.
Add a parallel branch.
Data Factory action: Get a pipeline run.
Reference: Automate workflows for SQL Server or Azure SQL Database by using Azure Logic Apps
Hope this helps.
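A blob-based event trigger like the one suggested above might look roughly like the definition built below. The trigger name, container, folder, and pipeline names are hypothetical, and the storage account resource ID is a placeholder to fill in. The sketch assumes the first pipeline writes a marker blob into a dedicated folder when it finishes.

```python
import json


def build_blob_event_trigger(name, storage_account_id, container, marker_folder,
                             pipeline_names):
    """Return a storage event trigger that starts the pipelines when a marker blob appears."""
    return {
        "name": name,
        "properties": {
            "type": "BlobEventsTrigger",
            "typeProperties": {
                "scope": storage_account_id,
                "events": ["Microsoft.Storage.BlobCreated"],
                # Only blobs created under this folder fire the trigger.
                "blobPathBeginsWith": f"/{container}/blobs/{marker_folder}/",
                # A dummy marker blob may be empty, so do not ignore empty blobs.
                "ignoreEmptyBlobs": False,
            },
            "pipelines": [
                {
                    "pipelineReference": {
                        "type": "PipelineReference",
                        "referenceName": pipeline_name,
                    }
                }
                for pipeline_name in pipeline_names
            ],
        },
    }


# Placeholder resource ID and hypothetical names, for illustration only.
event_trigger = build_blob_event_trigger(
    name="AfterBigQueryCopy",
    storage_account_id=(
        "/subscriptions/<subscription-id>/resourceGroups/<resource-group>"
        "/providers/Microsoft.Storage/storageAccounts/<account-name>"
    ),
    container="markers",
    marker_folder="bigquerytoazure-done",
    pipeline_names=["UK_MachineLearning", "UK_DataPreparation"],
)
print(json.dumps(event_trigger, indent=2))
```

Each country folder can then attach its own pipelines to the same trigger without the first pipeline having to know about them.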