Trigger Date for reruns - azure-data-factory

My pipeline's activities need the date of the run as a parameter. Currently I get that date inside the pipeline from the utcnow() function. Ideally this would be something I could set dynamically on the trigger, so that rerunning a failed day would set the parameter correctly; as it stands, a rerun executes the pipeline again, but with today's date rather than the date of the failed run.
I am used to Airflow, where such things are easy to do, including scheduling reruns. I probably think too much in Airflow terms, but I can't wrap my head around a better solution.

ADF does not directly support passing the date of a failed run back into the trigger.
You can get the trigger time using @pipeline().TriggerTime.
This system variable gives the time at which the trigger started the pipeline run.
You can store this trigger value for every pipeline run and use it as the parameter when rerunning the pipeline that failed.
Reference: Microsoft documentation on system variables in ADF.
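If you go the "store the date and rerun with it" route, here is a minimal sketch of the rerun side using the Azure Python SDK. It assumes the pipeline exposes a parameter (named RunDate here, a placeholder) that it uses instead of calling utcnow(); all resource names are placeholders too.

```python
# A minimal sketch, assuming the azure-mgmt-datafactory SDK and a
# hypothetical "RunDate" pipeline parameter used instead of utcnow().
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

credential = DefaultAzureCredential()
client = DataFactoryManagementClient(credential, "<subscription-id>")

# Kick off a run for the day that failed, passing the date explicitly
# instead of letting the pipeline compute the current date at run time.
run = client.pipelines.create_run(
    resource_group_name="<resource-group>",
    factory_name="<factory-name>",
    pipeline_name="<pipeline-name>",
    parameters={"RunDate": "2021-06-01"},  # the failed run's date
)
print(f"Started rerun {run.run_id} for 2021-06-01")
```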

To resolve my problem I created a nested structure of pipelines: the top pipeline sets a variable for the date and then calls the other pipelines, passing that variable along.
With this I still can't rerun the top pipeline, but rerunning Execute Pipeline1/2/3 reruns them with the right variable set. It is still not perfect, since the top pipeline run stays in an error state and it is difficult to keep track of what needs to be rerun, but it is a partial solution.

Related

Debug and triggered pipeline executing twice

I've created a simple pipeline with a number of copy activities. When I attempt to debug or "trigger now", the pipeline starts executing; then, after about a minute, a message flashes and another instance of the pipeline starts.
Anybody experience this or know how to prevent it?
Please check the trigger definition; there you can see the trigger schedule.
In your case, I suspect the recurrence is set to every 1 minute, which would start a second run a minute after the first.
You can change it according to your use case.
Thanks!
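If you'd rather verify this programmatically, here is a minimal sketch using the azure-mgmt-datafactory Python SDK to list every trigger in the factory and print its recurrence; all resource names are placeholders.

```python
# A minimal sketch for spotting a stray "every 1 minute" schedule
# trigger with the azure-mgmt-datafactory SDK; names are placeholders.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

for trigger in client.triggers.list_by_factory("<resource-group>", "<factory-name>"):
    props = trigger.properties
    recurrence = getattr(props, "recurrence", None)  # schedule triggers only
    if recurrence is not None:
        print(trigger.name, recurrence.frequency, recurrence.interval)
    else:
        print(trigger.name, type(props).__name__)
```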
What happens when you just do a debug? I think you have two triggers associated with the pipeline, so when you do "trigger now" both of them fire. Checking the triggers attached to the pipeline should confirm that.
As a supplement, you could go to Monitor --> Pipeline runs --> Triggered/Debug to check the pipeline run history.
There you can see how each run was started: by a manual debug run or by a trigger.
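The same run history can be checked programmatically; here is a minimal sketch with the azure-mgmt-datafactory Python SDK, where each run's invoked_by field shows whether a manual run or a named trigger started it. Resource names are placeholders.

```python
# A minimal sketch: list the last day's pipeline runs and what
# invoked each one (manual/debug or a trigger). Names are placeholders.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

now = datetime.now(timezone.utc)
runs = client.pipeline_runs.query_by_factory(
    "<resource-group>",
    "<factory-name>",
    RunFilterParameters(last_updated_after=now - timedelta(days=1),
                        last_updated_before=now),
)
for run in runs.value:
    # invoked_by.name is the trigger name, or indicates a manual run
    print(run.pipeline_name, run.status, run.invoked_by.name)
```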
Then remove the related trigger, as @HimanshuSinha-msft said.
Also please make sure you are the only one operating on the pipeline at the moment, and that the pipeline is not triggered or called by other services (such as a Logic App) or executed by another pipeline's Execute Pipeline activity.
HTH.

Automation for ADF V2 Pipeline

I need help implementing the following requirement:
There is one ADF pipeline that runs every two hours (with a tumbling window trigger). Now I need to create one more pipeline that will be used for performing a maintenance job. This pipeline is scheduled to run once a month (with a schedule trigger). Here is the requirement I'm trying to implement:
Before running the second pipeline I need to make sure the first pipeline is not running (basically, get its status and, if it is running, wait for its completion), and then disable the trigger associated with it.
Run the second pipeline and, after its completion, enable the trigger associated with the first pipeline.
Please let me know if this can be achieved within ADF or whether some kind of custom scripting is needed.
First, your idea is achievable.
Second, there is no built-in feature in Azure Data Factory that does this.
Basically, you need to use an Azure Function (a simple HTTP trigger with no input, so you can hit and execute it directly) to cover what ADF can't do. From your description, the executions of these two pipelines are mutually exclusive, so in the Azure Function you can use the SDK to check the status of the first pipeline; if it is running, wait a few seconds and re-check its status. (In short, put the main logic and code in the Azure Function.)
A simple Azure Function:
https://learn.microsoft.com/en-us/azure/azure-functions/functions-bindings-http-webhook-trigger?tabs=csharp
Use SDK to monitor:
https://learn.microsoft.com/en-us/azure/data-factory/monitor-programmatically#net
(These links are for C#; you can choose another supported language.)
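Here is a minimal sketch of the function's core logic in Python rather than C#, using the azure-mgmt-datafactory SDK. All resource, pipeline, and trigger names are placeholders, and the polling intervals and look-back window are arbitrary choices.

```python
# A minimal sketch: wait for pipeline1 to go idle, stop its trigger,
# run the maintenance pipeline, then restart the trigger.
import time
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import (RunFilterParameters,
                                           RunQueryFilter,
                                           RunQueryFilterOperand,
                                           RunQueryFilterOperator)

RG, DF = "<resource-group>", "<factory-name>"
client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

def first_pipeline_running() -> bool:
    """Return True if any recent run of pipeline1 is still in progress."""
    now = datetime.now(timezone.utc)
    runs = client.pipeline_runs.query_by_factory(
        RG, DF,
        RunFilterParameters(
            last_updated_after=now - timedelta(hours=4),
            last_updated_before=now,
            filters=[RunQueryFilter(
                operand=RunQueryFilterOperand.PIPELINE_NAME,
                operator=RunQueryFilterOperator.EQUALS,
                values=["<pipeline1>"])],
        ),
    )
    return any(r.status in ("Queued", "InProgress") for r in runs.value)

# 1. Wait until pipeline1 is idle, then disable its tumbling window trigger.
while first_pipeline_running():
    time.sleep(30)
client.triggers.begin_stop(RG, DF, "<pipeline1-trigger>").result()

# 2. Run the maintenance pipeline and poll until it completes.
run = client.pipelines.create_run(RG, DF, "<pipeline2>")
while client.pipeline_runs.get(RG, DF, run.run_id).status in ("Queued", "InProgress"):
    time.sleep(30)

# 3. Re-enable the first pipeline's trigger.
client.triggers.begin_start(RG, DF, "<pipeline1-trigger>").result()
```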

Do pipeline variables persist between runs?

I'm building a simple data flow pipeline between two Cosmos DBs. The pipeline starts with the data flow, which grabs the pipeline variable "LastPipelineStartTime" and passes it as a parameter to the data flow, whose query uses it to get all new data where c._ts >= "LastPipelineStartTime". Then, on data flow success, a Set Variable activity updates the variable to @pipeline().TriggerTime, essentially so I'm always grabbing only the data that is new since the previous pipeline run.
My question: it looks like during each debug run the variable reverts to its default value of 0, so the pipeline grabs everything each time. Am I misunderstanding or misusing pipeline variables? Thanks!
As far as I know, a variable set with the Set Variable activity has its own life cycle: the current execution of the pipeline. Changes to a variable do not persist into the next execution.
To implement your needs, please refer to the workarounds below:
1. If you execute the ADF pipeline on a schedule, you can simply pass the schedule time into it as a parameter to make sure you grab new data.
2. If the frequency is random, persist the trigger time somewhere else (e.g. a simple file in blob storage) and, before the data flow activity, use a Lookup activity to read that time from the blob storage file, as sketched after this list.
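To make workaround 2 concrete, here is a minimal sketch of the watermark pattern using the azure-storage-blob Python SDK. Inside the pipeline itself the read would be a Lookup activity and the write a Copy or Web activity; the container, blob, and connection string below are placeholders.

```python
# A minimal sketch of the watermark-file pattern: read the previous
# run's start time, query only newer rows, then write this run's time.
from datetime import datetime, timezone

from azure.core.exceptions import ResourceNotFoundError
from azure.storage.blob import BlobClient

blob = BlobClient.from_connection_string(
    "<storage-connection-string>", "watermarks", "last_pipeline_start.txt")

# Read the previous run's start time (fall back to epoch on first run).
try:
    last_start = blob.download_blob().readall().decode()
except ResourceNotFoundError:
    last_start = "1970-01-01T00:00:00Z"

print(f"Query new rows where c._ts >= '{last_start}'")
# ... run the incremental query / data flow here ...

# Persist this run's start time for the next execution.
blob.upload_blob(datetime.now(timezone.utc).isoformat(), overwrite=True)
```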

Run a lot of pipelines at the same time

Is there a solution if I want to run those pipelines at the same time, instead of doing it for each pipeline individually?
Just add triggers with the same schedule for all of your pipelines: in the ADF portal, set the same time in each trigger's configuration.
If you want to execute them in a queue (one after another) instead, you can use the Execute Pipeline activity, which allows a pipeline to invoke another pipeline.
You could also use a Lookup activity to look up the pipelines from metadata or a pipeline parameter table, and then use a ForEach loop with parallel execution, which can process up to 50 pipelines at once.
See ForEach Article for more info: https://learn.microsoft.com/en-us/azure/data-factory/control-flow-for-each-activity#parallel-execution
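For completeness, here is a minimal sketch of kicking off several pipelines at (nearly) the same time from outside ADF with the azure-mgmt-datafactory Python SDK; all names are placeholders.

```python
# A minimal sketch: create_run returns immediately without waiting for
# completion, so the pipelines end up running concurrently.
from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

for name in ["pipeline_a", "pipeline_b", "pipeline_c"]:  # placeholder names
    run = client.pipelines.create_run("<resource-group>", "<factory-name>", name)
    print(f"{name}: started run {run.run_id}")
```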

Getting actual trigger run start time for Tumbling Window trigger

I am interested in getting the actual run start time for a tumbling window trigger. I don't want a schedule trigger; my scenario specifically demands a tumbling window trigger, but some of my logic also requires knowing exactly when a triggered run started. As per the documentation I tried using @pipeline().TriggerTime, passing it as a value to one of the pipeline parameters, but it was not converted into a value; then I realized the scope of this expression is the pipeline, so I can't use it in a trigger. @trigger().outputs.windowStartTime can be used in a trigger, but it doesn't serve my purpose: I am not looking for the window start time, which is fixed no matter when the trigger actually executes. I want the actual run start time of the tumbling window trigger. Is there any solution to this?
One solution I found is to create an Append Variable activity and call @pipeline().TriggerTime in its value section. Since this is part of the pipeline, it gets evaluated into a value there.
Another solution is to simply call utcnow() in the Append Variable activity.
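If you need the actual fire time outside the pipeline, trigger run history can also be queried programmatically; here is a minimal sketch with the azure-mgmt-datafactory Python SDK, where trigger_run_timestamp is the time the trigger actually fired (unlike the fixed windowStartTime). Names are placeholders.

```python
# A minimal sketch: query the last day's trigger runs and print the
# actual timestamp at which each one fired.
from datetime import datetime, timedelta, timezone

from azure.identity import DefaultAzureCredential
from azure.mgmt.datafactory import DataFactoryManagementClient
from azure.mgmt.datafactory.models import RunFilterParameters

client = DataFactoryManagementClient(DefaultAzureCredential(), "<subscription-id>")

now = datetime.now(timezone.utc)
runs = client.trigger_runs.query_by_factory(
    "<resource-group>", "<factory-name>",
    RunFilterParameters(last_updated_after=now - timedelta(days=1),
                        last_updated_before=now),
)
for tr in runs.value:
    print(tr.trigger_name, tr.status, tr.trigger_run_timestamp)
```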