I'm having some trouble with the execution order of scheduled pipelines in Data Factory.
My pipeline is as follows:
{
    "name": "Copy_Stage_Upsert",
    "properties": {
        "description": "",
        "activities": [
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "BlobSource"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "00:10:00"
                    }
                },
                "inputs": [
                    {
                        "name": "csv_extracted_file"
                    }
                ],
                "outputs": [
                    {
                        "name": "stage_table"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "retry": 2
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Copy to stage table"
            },
            {
                "type": "Copy",
                "typeProperties": {
                    "source": {
                        "type": "SqlDWSource",
                        "sqlReaderQuery": "SELECT * from table WHERE id NOT IN (SELECT id from stage_table) UNION ALL SELECT * from stage_table"
                    },
                    "sink": {
                        "type": "SqlDWSink",
                        "writeBatchSize": 10000,
                        "writeBatchTimeout": "00:10:00"
                    }
                },
                "inputs": [
                    {
                        "name": "stage_table"
                    }
                ],
                "outputs": [
                    {
                        "name": "upsert_table"
                    }
                ],
                "policy": {
                    "timeout": "01:00:00",
                    "retry": 2
                },
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Copy"
            },
            {
                "type": "SqlServerStoredProcedure",
                "typeProperties": {
                    "storedProcedureName": "sp_rename_tables"
                },
                "inputs": [
                    {
                        "name": "upsert_table"
                    }
                ],
                "outputs": [
                    {
                        "name": "table"
                    }
                ],
                "scheduler": {
                    "frequency": "Hour",
                    "interval": 1
                },
                "name": "Rename tables"
            }
        ],
        "start": "2017-02-09T18:00:00Z",
        "end": "9999-02-06T15:00:00Z",
        "isPaused": false,
        "hubName": "",
        "pipelineMode": "Scheduled"
    }
}
For simplicity, imagine that I have one pipeline called A with three simple tasks:
Task 1, Task 2, and finally Task 3.
Scenario A
One execution of Pipeline A scheduled.
It runs as:
Task 1 -> Task 2 -> Task 3
Scenario B
Two or more executions of Pipeline A are scheduled.
It runs as:
First Scheduled Pipeline Task 1 -> Second Scheduled Pipeline Task 1 -> First Scheduled Pipeline Task 2 -> Second Scheduled Pipeline Task 2 -> First Scheduled Pipeline Task 3 -> Second Scheduled Pipeline Task 3.
Is it possible to run the second scenario as:
First Scheduled Pipeline Task 1 -> First Scheduled Pipeline Task 2 -> First Scheduled Pipeline Task 3, then Second Scheduled Pipeline Task 1 -> Second Scheduled Pipeline Task 2 -> Second Scheduled Pipeline Task 3
In other words, I need to finish the first scheduled pipeline before the second pipeline starts.
Thank you in advance!
It's possible. However, it will require some fake input and output datasets to enforce the dependency behaviour and create the chain you describe. So possible, but a dirty hack!
This isn't a great solution, and it will become complicated if the outputs/downstream datasets in your second pipeline have different time slice intervals from the first. I recommend testing and understanding this behaviour first.
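For illustration, here is a minimal sketch of the kind of dummy dataset that could act as the glue (the dataset name, linked service name, and blob path are all hypothetical placeholders). The last activity of the first pipeline would declare it as an output, and the first activity of the second pipeline would declare it as an input, so ADF's slice dependency tracking forces the second to wait:
{
    "name": "dummy_chain_dataset",
    "properties": {
        "type": "AzureBlob",
        "linkedServiceName": "blob_storage_linked_service",
        "typeProperties": {
            "folderPath": "control/dummy-chain/"
        },
        "availability": {
            "frequency": "Hour",
            "interval": 1
        }
    }
}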
Really, ADF isn't designed to do what you want. It's not a tool for sequencing things like steps in a SQL Agent job; ADF is built for scale and parallel work streams.
From whispers I've heard from Microsoft folks, more event-driven scheduling may be coming to ADF soon. But I don't know for sure.
Hope this helps.
Related
I have a scenario where I have to check two conditions and, if both are true, execute a set of activities in ADF.
I tried an If Condition activity inside an If Condition, but ADF does not allow it.
So basically my design is: two Lookups to read data, then an If Condition to check condition 1; if that is true, then inside it again two Lookups to read data and an If Condition to check condition 2. But it is not working.
Is there any other workaround for this?
I tried an AND condition inside the If Condition activity but it is not working. Please suggest.
Since we cannot use an If activity within an If activity, we can instead evaluate multiple conditions inside a single If activity via an expression, using functions such as if, and, or, etc.
The JSON pipeline below is an example:
{
    "name": "pipeline7",
    "properties": {
        "activities": [
            {
                "name": "If Condition1",
                "type": "IfCondition",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "expression": {
                        "value": "@and(greater(pipeline().parameters.Test2, pipeline().parameters.Test1), greater(pipeline().parameters.Test4, pipeline().parameters.Test3))",
                        "type": "Expression"
                    },
                    "ifFalseActivities": [
                        {
                            "name": "Wait2",
                            "type": "Wait",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "waitTimeInSeconds": 1
                            }
                        }
                    ],
                    "ifTrueActivities": [
                        {
                            "name": "Wait1",
                            "type": "Wait",
                            "dependsOn": [],
                            "userProperties": [],
                            "typeProperties": {
                                "waitTimeInSeconds": 1
                            }
                        }
                    ]
                }
            }
        ],
        "parameters": {
            "Test1": {
                "type": "int",
                "defaultValue": 1
            },
            "Test2": {
                "type": "int",
                "defaultValue": 2
            },
            "Test3": {
                "type": "int",
                "defaultValue": 3
            },
            "Test4": {
                "type": "int",
                "defaultValue": 4
            }
        },
        "annotations": []
    }
}
Your nesting approach would only be needed if you don't want to create another pipeline and invoke it via an Execute Pipeline activity inside the If activity for the second comparison.
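One caveat: and() takes exactly two arguments, so combining three or more conditions means nesting the calls, for example (Test5 here is a hypothetical extra parameter):
@and(and(greater(pipeline().parameters.Test2, pipeline().parameters.Test1), greater(pipeline().parameters.Test4, pipeline().parameters.Test3)), equals(pipeline().parameters.Test5, 5))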
I'm trying to create a tumbling window trigger that runs every 1 hour, with a 10-minute delay before the pipeline starts executing.
I created a test trigger with a time interval of 5 minutes and a delay of 10 minutes.
I expected the pipeline to run every 15 minutes (5-minute interval + 10-minute delay).
What I actually see in the Monitor section, under Pipeline Runs and Trigger Runs, is that it runs every 5 minutes.
Shouldn't the delay postpone the pipeline execution?
Am I doing something wrong here?
Updated
Here's my trigger template:
{
    "name": "[concat(parameters('factoryName'), '/trigger_test')]",
    "type": "Microsoft.DataFactory/factories/triggers",
    "apiVersion": "2018-06-01",
    "properties": {
        "annotations": [],
        "runtimeState": "Started",
        "pipeline": {
            "pipelineReference": {
                "referenceName": "exportData",
                "type": "PipelineReference"
            },
            "parameters": {}
        },
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Minute",
            "interval": 5,
            "startTime": "2021-07-25T07:46:00Z",
            "delay": "00:10:00",
            "maxConcurrency": 50,
            "retryPolicy": {
                "intervalInSeconds": 30
            },
            "dependsOn": []
        }
    },
    "dependsOn": [
        "[concat(variables('factoryId'), '/pipelines/exportData')]"
    ]
}
I haven't found a concrete example, and the docs are not very clear in terms of terminology.
From what I understand, the delay postpones the start of each window's run, but it does not change the window boundaries themselves: windows are still created back-to-back at the trigger interval, so runs keep firing at that cadence regardless of the delay.
According to the docs, "the delay doesn't alter the window startTime", which I take to mean exactly that.
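A worked example of this reading, using the 5-minute interval and 10-minute delay from the question (times are illustrative):
window [08:00, 08:05) -> ready at 08:05 -> run starts at 08:15
window [08:05, 08:10) -> ready at 08:10 -> run starts at 08:20
window [08:10, 08:15) -> ready at 08:15 -> run starts at 08:25
Every run is shifted by the same 10 minutes, so runs still appear every 5 minutes in the Monitor; the delay changes when each run starts, not how often windows are produced.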
I'm doing some initial ADF deployment from my adf-dev to adf-staging environment. In the MS docs it says:
Deployment can fail if you try to update active triggers. To update active triggers, you need to manually stop them and then restart them after the deployment.
Does this mean I need to turn off my dev or staging triggers pre/post deployment?
Second issue: I need to schedule the same set of triggers to run on different days in dev (Saturday) vs staging (Sunday). Do I need to make a separate set of triggers for each environment, or can I rewrite the trigger schedules of the existing triggers during deployment?
You will need your staging triggers stopped before you start the deployment, and restarted after the deployment is complete.
This page has a PowerShell script for stopping triggers: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#updating-active-triggers
Also, you could use the custom parameters configuration file to update your trigger settings: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#triggers
To parametrise the trigger deployment in the ARM template, first, here is a sample weekly trigger that runs on a specific day:
{
    "name": "OnceAWeekTrigger",
    "properties": {
        "annotations": [],
        "runtimeState": "Stopped",
        "pipelines": [],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2021-05-25T22:59:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "weekDays": [
                        "Sunday"
                    ]
                }
            }
        }
    }
}
Create an arm-template-parameters-definition.json file as follows:
{
    "Microsoft.DataFactory/factories/triggers": {
        "properties": {
            "typeProperties": {
                "recurrence": {
                    "schedule": {
                        "weekDays": "=:-weekDays:array"
                    }
                }
            }
        }
    }
}
This file specifies that you want to parametrise the schedule's weekDays property.
After running the ADFUtilities export function:
npm run build export c:\git\adf /subscriptions/<subscriptionid>/resourceGroups/datafactorydev/providers/Microsoft.DataFactory/factories/<datafactory_name> "ArmTemplate"
you now get an ARM template with the trigger properties parametrised as follows:
... {
    "name": "[concat(parameters('factoryName'), '/OnceAWeekTrigger')]",
    "type": "Microsoft.DataFactory/factories/triggers",
    "apiVersion": "2018-06-01",
    "properties": {
        "annotations": [],
        "runtimeState": "Stopped",
        "pipelines": [],
        "type": "ScheduleTrigger",
        "typeProperties": {
            "recurrence": {
                "frequency": "Week",
                "interval": 1,
                "startTime": "2021-05-25T22:59:00Z",
                "timeZone": "UTC",
                "schedule": {
                    "weekDays": "[parameters('OnceAWeekTrigger_weekDays')]"
                }
            }
        }
    }
}, ...
and the parameters file ArmTemplate\ARMTemplateParametersForFactory.json looks as follows:
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "factory_name"
        },
        "OnceAWeekTrigger_weekDays": {
            "value": [
                "Sunday"
            ]
        }
    }
}
You could then create different parameter files for dev and staging with different days of the week by modifying the array value for OnceAWeekTrigger_weekDays.
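For example, a hypothetical dev-parameters.json (the file name is arbitrary; the factory names here follow the adf-dev/adf-staging naming from the question):
{
    "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
    "contentVersion": "1.0.0.0",
    "parameters": {
        "factoryName": {
            "value": "adf-dev"
        },
        "OnceAWeekTrigger_weekDays": {
            "value": [
                "Saturday"
            ]
        }
    }
}
and a staging-parameters.json that is identical except that factoryName is "adf-staging" and weekDays is ["Sunday"]. Passing the matching file to each environment's deployment gives dev its Saturday schedule and staging its Sunday schedule without maintaining two sets of triggers.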
I have a very simple pipeline that I have set up to test tumbling window trigger dependency. The pipeline has a single Wait activity. Here is the pipeline code:
{
    "name": "pl-something",
    "properties": {
        "activities": [
            {
                "name": "Wait1",
                "type": "Wait",
                "dependsOn": [],
                "userProperties": [],
                "typeProperties": {
                    "waitTimeInSeconds": 25
                }
            }
        ],
        "parameters": {
            "date_id": {
                "type": "string"
            }
        },
        "annotations": []
    },
    "type": "Microsoft.DataFactory/factories/pipelines"
}
I have created the following hourly trigger on it, which just executes the pipeline at hourly intervals:
{
    "name": "trg-hourly",
    "properties": {
        "annotations": [],
        "runtimeState": "Started",
        "pipeline": {
            "pipelineReference": {
                "referenceName": "pl-something",
                "type": "PipelineReference"
            },
            "parameters": {
                "date_id": "@formatDateTime(triggerOutputs().windowStartTime, 'yyyyMMddHH')"
            }
        },
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 1,
            "startTime": "2019-11-01T00:00:00.000Z",
            "delay": "00:00:00",
            "maxConcurrency": 1,
            "retryPolicy": {
                "intervalInSeconds": 30
            },
            "dependsOn": []
        }
    }
}
The date_id parameter exists so I know exactly which hourly window a trigger instance is running for. This executes fine. My goal is to create another trigger on the same pipeline, but one that executes daily and depends on the hourly trigger, so that the daily trigger does not run until all 24 hours of a day have been processed. In the screenshot below you can see how I am trying to set up this new trigger as dependent on the hourly trigger (trg-hourly), but the 'OK' button is not activated whenever I try to specify a 24-hour window, and you can see the error that the window size is not valid. There is no JSON to show, since it's not even allowing me to create the trigger. What's the issue here?
Maybe it is expecting 1.00:00:00 instead of 0.24:00:00: in the d.hh:mm:ss timespan format the hours component only goes up to 23, so a full day has to be written in the days position.
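For illustration, a minimal sketch of what the resulting daily trigger could look like, assuming the day-based window size is accepted. The trigger name trg-daily is made up; a daily tumbling window is expressed as frequency Hour with interval 24, and the dependsOn entry follows the documented TumblingWindowTriggerDependencyReference shape:
{
    "name": "trg-daily",
    "properties": {
        "runtimeState": "Started",
        "pipeline": {
            "pipelineReference": {
                "referenceName": "pl-something",
                "type": "PipelineReference"
            },
            "parameters": {
                "date_id": "@formatDateTime(triggerOutputs().windowStartTime, 'yyyyMMddHH')"
            }
        },
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2019-11-01T00:00:00.000Z",
            "maxConcurrency": 1,
            "dependsOn": [
                {
                    "type": "TumblingWindowTriggerDependencyReference",
                    "referenceTrigger": {
                        "referenceName": "trg-hourly",
                        "type": "TriggerReference"
                    },
                    "offset": "00:00:00",
                    "size": "1.00:00:00"
                }
            ]
        }
    }
}
With a 24-hour window and a dependency size of 1.00:00:00, each daily window should only become ready once all 24 hourly windows covering it have succeeded.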
I'm working on an Azure Data Factory V2 pipeline, but I'm having a problem when I try to execute a Custom activity inside an If Condition activity.
If I try to test my pipeline with the "Test Run" button in ADF's web interface, this error appears:
{"code":"BadRequest","message":"Activity PPL_ANYFBRF01 failed: Invalid linked service reference. Name: LNK_BATCH_AZURE","target"...}
I'm sure that there is no error in the linked service reference's name. If I create a Custom activity directly in my pipeline, it works.
I think it could be a syntax error in my activity, but I can't find it.
Here is my If Condition activity's JSON template (the expression "@equal(0,0)" is just for testing purposes):
{
    "name": "IfPointComptageNotExist",
    "type": "IfCondition",
    "dependsOn": [
        {
            "activity": "PointComptage",
            "dependencyConditions": [
                "Succeeded"
            ]
        },
        {
            "activity": "SousPointComptage",
            "dependencyConditions": [
                "Succeeded"
            ]
        }
    ],
    "typeProperties": {
        "expression": {
            "value": "@equal(0,0)",
            "type": "Expression"
        },
        "ifTrueActivities": [
            {
                "type": "Custom",
                "name": "CustomActivityTest",
                "linkedServiceName": {
                    "referenceName": "LNK_BATCH_AZURE",
                    "type": "LinkedServiceReference"
                },
                "typeProperties": {
                    "command": "Batch.exe",
                    "resourceLinkedService": {
                        "referenceName": "LNK_BLOB_STORAGE",
                        "type": "LinkedServiceReference"
                    },
                    "folderPath": "/test/app/"
                }
            }
        ]
    }
},
Thank you in advance for your help.
The problem is now solved. I have recreated the pipeline and it's working now.
Regards,
Julien.