I'm trying to create a tumbling window trigger that runs every hour, with a 10-minute delay before the pipeline starts executing.
I created a test trigger with a time interval of 5 minutes and a delay of 10 minutes.
I expected the pipeline to run every 15 minutes (5-minute interval + 10-minute delay).
What I actually see in the Monitor section, under Pipeline runs and Trigger runs, is that it runs every 5 minutes.
Shouldn't the delay postpone the pipeline execution?
Am I doing something wrong here?
Update:
Here's my trigger template:
{
"name": "[concat(parameters('factoryName'), '/trigger_test')]",
"type": "Microsoft.DataFactory/factories/triggers",
"apiVersion": "2018-06-01",
"properties": {
"annotations": [],
"runtimeState": "Started",
"pipeline": {
"pipelineReference": {
"referenceName": "exportData",
"type": "PipelineReference"
},
"parameters": {}
},
"type": "TumblingWindowTrigger",
"typeProperties": {
"frequency": "Minute",
"interval": 5,
"startTime": "2021-07-25T07:46:00Z",
"delay": "00:10:00",
"maxConcurrency": 50,
"retryPolicy": {
"intervalInSeconds": 30
},
"dependsOn": []
}
},
"dependsOn": [
"[concat(variables('factoryId'), '/pipelines/exportData')]"
]
}
I haven't found a concrete example, and the docs are not very clear in terms of terminology.
From what I understand, when one trigger window finishes running, the next trigger window starts running regardless of the delay specified.
According to the docs, "the delay doesn't alter the window startTime", which I assume means what I mentioned above: the delay pushes back each window's execution but not the window boundaries themselves.
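For example, with the 5-minute interval and 00:10:00 delay above, my reading of the semantics is:

window [07:45, 07:50] -> due at 07:50, pipeline run starts at 08:00, windowStartTime stays 07:45
window [07:50, 07:55] -> due at 07:55, pipeline run starts at 08:05, windowStartTime stays 07:50

So runs still fire every 5 minutes; each one just starts 10 minutes after its window closes, which would match what I see in Monitor.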
Related
I'm doing an initial ADF deployment from my adf-dev to my adf-staging environment. The MS docs say:
Deployment can fail if you try to update active triggers. To update active triggers, you need to manually stop them and then restart them after the deployment.
Does this mean I need to turn off my dev or staging triggers pre/post deployment?
Second issue: I need to schedule the same set of triggers to run on different days in dev (Sat) vs. staging (Sun). Do I need to make a separate set of triggers for each environment, or can I rewrite the trigger schedules of the existing triggers during deployment?
You will need your staging triggers stopped before you start the deployment, and restarted after deployment is complete.
This page has a PowerShell script for stopping triggers: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#updating-active-triggers
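As a minimal sketch of what that script does, assuming the Az.DataFactory PowerShell module (the resource group and factory names here are hypothetical):

# Collect the triggers that are currently running in the target factory
$rg = "datafactory-staging"
$df = "adf-staging"
$startedTriggers = Get-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $df |
    Where-Object { $_.RuntimeState -eq "Started" }

# Stop them before deploying
$startedTriggers | ForEach-Object {
    Stop-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $df -Name $_.Name -Force
}

# ... run the ARM template deployment here ...

# Restart the same triggers once the deployment completes
$startedTriggers | ForEach-Object {
    Start-AzDataFactoryV2Trigger -ResourceGroupName $rg -DataFactoryName $df -Name $_.Name -Force
}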
Also, you could use the custom parameters configuration file to update your trigger settings: https://learn.microsoft.com/en-us/azure/data-factory/continuous-integration-deployment#triggers
To parametrise the trigger deployment in the ARM template, first, here is a sample weekly trigger that runs on a specific day:
{
"name": "OnceAWeekTrigger",
"properties": {
"annotations": [],
"runtimeState": "Stopped",
"pipelines": [],
"type": "ScheduleTrigger",
"typeProperties": {
"recurrence": {
"frequency": "Week",
"interval": 1,
"startTime": "2021-05-25T22:59:00Z",
"timeZone": "UTC",
"schedule": {
"weekDays": [
"Sunday"
]
}
}
}
}
}
Create an arm-template-parameters-definition.json file as follows:
{
"Microsoft.DataFactory/factories/triggers": {
"properties": {
"typeProperties": {
"recurrence": {
"schedule": {
"weekDays": "=:-weekDays:array"
}
}
}
}
}
}
This file specifies that you want to parametrise the weekDays property of each trigger's schedule.
After running the ADFUtilities export function:
npm run build export c:\git\adf /subscriptions/<subscriptionid>/resourceGroups/datafactorydev/providers/Microsoft.DataFactory/factories/<datafactory_name> "ArmTemplate"
you now get an ARM template with the trigger properties parametrised as follows:
... {
"name": "[concat(parameters('factoryName'), '/OnceAWeekTrigger')]",
"type": "Microsoft.DataFactory/factories/triggers",
"apiVersion": "2018-06-01",
"properties": {
"annotations": [],
"runtimeState": "Stopped",
"pipelines": [],
"type": "ScheduleTrigger",
"typeProperties": {
"recurrence": {
"frequency": "Week",
"interval": 1,
"startTime": "2021-05-25T22:59:00Z",
"timeZone": "UTC",
"schedule": {
"weekDays": "[parameters('OnceAWeekTrigger_weekDays')]"
}
}
}
}, ...
and the parameters file ArmTemplate\ARMTemplateParametersForFactory.json looks as follows:
{
"$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"factoryName": {
"value": "factory_name"
},
"OnceAWeekTrigger_weekDays": {
"value": [
"Sunday"
]
}
}
}
You could then create different parameter files for dev and staging with different days of the week by modifying the array value for OnceAWeekTrigger_weekDays.
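For example, a staging deployment picking up a staging-specific parameter file might look like this (the resource group name and the staging parameter file name are hypothetical):

New-AzResourceGroupDeployment `
    -ResourceGroupName "datafactory-staging" `
    -TemplateFile "ArmTemplate\ARMTemplateForFactory.json" `
    -TemplateParameterFile "ArmTemplate\ARMTemplateParametersForFactory.staging.json"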
I have a very simple pipeline that I have set up to test tumbling window trigger dependency. The pipeline has a single Wait activity. Here is the pipeline code:
{
"name": "pl-something",
"properties": {
"activities": [
{
"name": "Wait1",
"type": "Wait",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"waitTimeInSeconds": 25
}
}
],
"parameters": {
"date_id": {
"type": "string"
}
},
"annotations": []
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
I have created the following hourly trigger on it, which just executes the pipeline at hourly intervals:
{
"name": "trg-hourly",
"properties": {
"annotations": [],
"runtimeState": "Started",
"pipeline": {
"pipelineReference": {
"referenceName": "pl-something",
"type": "PipelineReference"
},
"parameters": {
"date_id": "#formatDateTime(triggerOutputs().windowStartTime, 'yyyyMMddHH')"
}
},
"type": "TumblingWindowTrigger",
"typeProperties": {
"frequency": "Hour",
"interval": 1,
"startTime": "2019-11-01T00:00:00.000Z",
"delay": "00:00:00",
"maxConcurrency": 1,
"retryPolicy": {
"intervalInSeconds": 30
},
"dependsOn": []
}
}
}
The parameter date_id exists so that I know exactly which hourly window a trigger instance is running for. This executes fine. My goal is to create another trigger on the same pipeline that runs daily and depends on the hourly trigger, so that the daily trigger does not run until all 24 hourly windows of a day have been processed. In the screenshot below you can see how I am trying to set up this new trigger as dependent on the hourly trigger (trg-hourly), but the 'OK' button is never enabled when I specify a 24-hour window size, and you can see the error saying the window size is not valid. There is no JSON to show, since it's not even letting me create the trigger. What's the issue here?
Maybe it is expecting 1.00:00:00 instead of 0.24:00:00, because 24 hours roll over into exactly one day in the d.hh:mm:ss timespan format.
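If so, a daily trigger depending on trg-hourly would look something like this sketch (trg-daily is a made-up name; tumbling window triggers express a day as a 24-hour interval):

{
    "name": "trg-daily",
    "properties": {
        "annotations": [],
        "runtimeState": "Started",
        "pipeline": {
            "pipelineReference": {
                "referenceName": "pl-something",
                "type": "PipelineReference"
            },
            "parameters": {
                "date_id": "#formatDateTime(triggerOutputs().windowStartTime, 'yyyyMMddHH')"
            }
        },
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2019-11-01T00:00:00.000Z",
            "maxConcurrency": 1,
            "dependsOn": [
                {
                    "type": "TumblingWindowTriggerDependencyReference",
                    "size": "1.00:00:00",
                    "offset": "0.00:00:00",
                    "referenceTrigger": {
                        "referenceName": "trg-hourly",
                        "type": "TriggerReference"
                    }
                }
            ]
        }
    }
}

With size set to 1.00:00:00, each daily window waits for all 24 hourly windows it covers to succeed before it fires.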
I have created this subscription:
curl localhost:1026/v2/subscriptions -s -S -H 'Accept: application/json' | python -mjson.tool
[
{
"description": "Update room temperature",
"expires": "2020-04-05T14:00:00.00Z",
"id": "5b104ace028f2284c5517f51",
"notification": {
"attrs": [
"temperature"
],
"attrsFormat": "normalized",
"http": {
"url": "http://MyUrl/getSub"
},
"lastNotification": "2018-05-31T19:19:42.00Z",
"metadata": [
"5b019ae132232812eccb6d50",
"device",
"16",
"Auto",
"30",
"greater"
],
"timesSent": 1
},
"status": "active",
"subject": {
"condition": {
"attrs": [
"temperature"
]
},
"entities": [
{
"id": "5aff0eef23102126a4aeeea2",
"type": "room"
}
]
},
"throttling": 60
}
]
and even though I have set the throttling to 60 (1 minute, if I understand it right), when I change the value of the temperature, Orion sends me a notification even if the change happened before the one-minute mark (for example, when I change the temperature value every 10 seconds). Shouldn't a notification be sent only if a change occurred after 60 seconds, or am I misunderstanding something?
What you describe as the expected behaviour seems right to me. I mean, if the subscription has a throttling of 60 seconds, you shouldn't receive any new notification until 60 seconds have passed since the previous one.
Possible causes:
You have another subscription in place that is being triggered. But I understand this is not the case, as such a subscription would show up in the GET /v2/subscriptions result.
There is a bug in Orion that causes throttling to be ignored. In that case, it would be interesting to run the same test with a subscription created using NGSIv1 (via POST /v1/subscribeContext) in order to gauge the reach of the bug.
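For reference, a sketch of an equivalent NGSIv1 subscription, reusing the entity and notification URL from the question (field values such as the duration are illustrative):

curl localhost:1026/v1/subscribeContext -s -S -H 'Content-Type: application/json' -H 'Accept: application/json' -d '{
  "entities": [
    {
      "type": "room",
      "isPattern": "false",
      "id": "5aff0eef23102126a4aeeea2"
    }
  ],
  "attributes": [ "temperature" ],
  "reference": "http://MyUrl/getSub",
  "duration": "P1M",
  "notifyConditions": [
    {
      "type": "ONCHANGE",
      "condValues": [ "temperature" ]
    }
  ],
  "throttling": "PT60S"
}'

If throttling is honoured there but not in NGSIv2, that narrows the bug down.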
I'm having some trouble with the execution order of scheduled pipelines in Data Factory.
My pipeline is as follows:
{
"name": "Copy_Stage_Upsert",
"properties": {
"description": "",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "BlobSource"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 10000,
"writeBatchTimeout": "00:10:00"
}
},
"inputs": [
{
"name": "csv_extracted_file"
}
],
"outputs": [
{
"name": "stage_table"
}
],
"policy": {
"timeout": "01:00:00",
"retry": 2
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "Copy to stage table"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "SELECT * from table WHERE id NOT IN (SELECT id from stage_table) UNION ALL SELECT * from stage_table"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 10000,
"writeBatchTimeout": "00:10:00"
}
},
"inputs": [
{
"name": "stage_table"
}
],
"outputs": [
{
"name": "upsert_table"
}
],
"policy": {
"timeout": "01:00:00",
"retry": 2
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "Copy"
},
{
"type": "SqlServerStoredProcedure",
"typeProperties": {
"storedProcedureName": "sp_rename_tables"
},
"inputs": [
{
"name": "upsert_table"
}
],
"outputs": [
{
"name": "table"
}
],
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "Rename tables"
}
],
"start": "2017-02-09T18:00:00Z",
"end": "9999-02-06T15:00:00Z",
"isPaused": false,
"hubName": "",
"pipelineMode": "Scheduled"
}
}
For simplicity, imagine that I have one pipeline called A with three simple tasks:
Task 1, Task 2, and finally Task 3.
Scenario A
One execution of Pipeline A scheduled.
It runs as:
Task 1 -> Task 2 -> Task 3
Scenario B
Two or more executions of Pipeline A scheduled to be executed.
It runs as:
First Scheduled Pipeline Task 1 -> Second Scheduled Pipeline Task 1 -> First Scheduled Pipeline Task 2 -> Second Scheduled Pipeline Task 2 -> First Scheduled Pipeline Task 3 -> Second Scheduled Pipeline Task 3.
Is it possible to run the second scenario as:
First Scheduled Pipeline Task 1 -> First Scheduled Pipeline Task 2 -> First Scheduled Pipeline Task 3, then Second Scheduled Pipeline Task 1 -> Second Scheduled Pipeline Task 2 -> Second Scheduled Pipeline Task 3?
In other words, I need the first scheduled pipeline to finish before the second one starts.
Thank you in advance!
It's possible. However, it will require some fake input and output datasets to enforce the dependency behaviour and create the chain you describe. So it's possible, but it's a dirty hack!
This isn't a great solution, and it will become complicated if the outputs/downstream datasets in your second pipeline have different time slice intervals from the first. I recommend testing and understanding this.
Really, ADF isn't designed to do what you want. It's not a tool for sequencing things like steps in a SQL Agent job; ADF is built for scale and parallel work streams.
From whispers I've heard from Microsoft folks, more event-driven scheduling may be coming to ADF soon, but I don't know for sure.
Hope this helps.
I have a pipeline that needs to run daily... but the data only arrives at around 2pm on that day (for the previous day)... so when midnight ticks over, the data isn't available, and therefore everything falls over ;)
I have tried this:
"start": "2016-02-10T15:00:00Z",
"end": "2016-05-31T00:00:00Z",
but it still kicks off at midnight, I assume because my scheduler is as follows:
"scheduler": {
"frequency": "Day",
"interval": 1
},
I think I need to use either anchorDateTime or offset, but I'm not sure which?
The following in the pipeline definition just means that you want the pipeline enabled during that period.
"start": "2016-02-10T15:00:00Z",
"end": "2016-05-31T00:00:00Z"
The following in the activity definition means that you want the activity to run at the end of the day, thus at midnight UTC.
"scheduler": {
"frequency": "Day",
"interval": 1
},
If you want the activity to kick off at 2pm, use the following:
"scheduler": {
"frequency": "Hour",
"interval": 24,
"anchorDateTime": "2016-02-10T14:00:00"
},
You should make it consistent in the target dataset definition as well:
"availability": {
"frequency": "Hour",
"interval": 24,
"anchorDateTime": "2016-02-10T14:00:00"
}
Hope that helps!!