Unable to create TWO Data Factory Pipeline activities pointing to same Target Dataset - azure-data-factory

I am looking for a solution to load data from the SQL DW DMVs of 2 different databases into a single table in one SQL DW.
I went with an ADF pipeline, which loads the data every 15 minutes. But I am seeing an issue when I create two activities in one pipeline that have 2 different sources (input datasets) but both load the data into the same target (output dataset).
I also want to build a dependency between the activities so that they won't run at the same time: Activity 2 should start only after Activity 1 has completed/is not running.
My ADF code is below:
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "OD_Exec_Requests",
"name": "ITG_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}
When I try to deploy this, it gives the error message below:
Error Activities 'PRD_DMV_Load' and 'ITG_DMV_Load' have the same
output Dataset 'OD_Exec_Requests'. Two activities cannot output the
same Dataset over the same active period.
How can I resolve this? Can I specify that ITG_DMV_Load should run only after PRD_DMV_Load has completed?

You have two issues here.
You cannot produce the same dataset slice from two different activities/pipelines. To work around this, you can create another dataset that points to the same table; from ADF's perspective it will be a different sink. You also need to move your second activity into a separate pipeline configuration (so you end up with one activity per pipeline).
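For example, a second output dataset could look roughly like this; the dataset name, linked service name, and table name here are placeholders for whatever your existing OD_Exec_Requests dataset uses:
{
    "name": "OD_Exec_Requests_2",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": "AzureSqlDWLinkedService",
        "typeProperties": {
            "tableName": "Exec_Requests_Hist"
        },
        "availability": {
            "frequency": "Minute",
            "interval": 15
        }
    }
}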
You need to somehow order your pipelines. I see two possible ways:
You can try using the scheduler configuration options, e.g. the offset property (or style) to schedule one pipeline in the middle of the interval.
For example, if the first pipeline is configured like this:
"scheduler": {
"frequency": "Minute",
"interval": 15
},
Configure the second like this:
"scheduler": {
"frequency": "Minute",
"interval": 15,
"offset" : 5
},
This approach may require some tuning depending on how long your pipeline takes to complete.
Another approach is to specify the output of the first pipeline as an input of the second. In this case the second activity won't start until the first one has completed. The schedules of the activities must match in this case (i.e. both should have the same scheduler.frequency and scheduler.interval).

As @arghtype says, you cannot use the same ADF dataset in two active pipelines or activities. You will need to create a second, identical output dataset for ITG_DMV_Load, but you do not have to split the pipeline. You can ensure the second activity does not run until the first has completed by making the output of the first a secondary input to the second. I would suggest something like this...
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_PRD"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ITG_Exec_Requests",
"name": "OD_Exec_Requests_PRD"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_ITG"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}
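For this to deploy, OD_Exec_Requests_PRD and OD_Exec_Requests_ITG must exist as two separate dataset definitions that both point at the same SQL DW table. A minimal sketch of one of them (the linked service name and table name are placeholders):
{
    "name": "OD_Exec_Requests_PRD",
    "properties": {
        "type": "AzureSqlDWTable",
        "linkedServiceName": "AzureSqlDWLinkedService",
        "typeProperties": {
            "tableName": "Exec_Requests_Hist"
        },
        "availability": {
            "frequency": "Minute",
            "interval": 15
        }
    }
}
OD_Exec_Requests_ITG would be identical apart from its name.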

Related

Copy CSV file data from blob storage to Azure SQL database based on the number of columns in the file

I have data files landing in a single Azure blob storage container every other day or so. These files have either 8, 16, 24, or 32 columns of data. Each column has a unique name within a file, and the names are consistent across files, i.e. the column names in the 8-column file will always be the first 8 column names of the 16-, 24-, and 32-column files. I have the appropriate 4 tables in an Azure SQL database set up to receive the files. I need to create a pipeline(s) in Azure Data Factory that will
trigger upon the landing of a new file in the blob storage container
check the # of columns in that file
use the number of columns to copy the file from the blob into the appropriate Azure SQL database table. Meaning the 8 column blob file copies to the 8 column SQL table and so on.
delete the file
I've researched the various pieces to complete this but cannot seem to put them together. The schema drift solution got me close, but parameterization of the file names lost me. Multiple pipelines to achieve this are okay, as long as the single storage container is maintained. Thanks
Use a storage event (blob) trigger to fire upon the landing of a new file in the blob storage container.
Use a Get Metadata activity to get the column count of that file.
Use a Switch activity based on the number of columns, and within each switch case have a Copy activity and a Delete activity to copy the data to the matching table and then delete the file (a sketch of the Switch activity follows below).
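A rough sketch of that Switch activity, assuming the Get Metadata activity is named Get Metadata1; each case's activities array would hold the Copy and Delete activities for that column count:
{
    "name": "Switch on column count",
    "type": "Switch",
    "dependsOn": [
        { "activity": "Get Metadata1", "dependencyConditions": [ "Succeeded" ] }
    ],
    "typeProperties": {
        "on": {
            "value": "@string(activity('Get Metadata1').output.columnCount)",
            "type": "Expression"
        },
        "cases": [
            { "value": "8", "activities": [] },
            { "value": "16", "activities": [] },
            { "value": "24", "activities": [] },
            { "value": "32", "activities": [] }
        ]
    }
}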
I agree with @Nandan's approach. Alternatively, you can try the approach below using Lookup and Filter activities if you want to avoid creating Switch cases.
For this approach, your target database should not contain any tables other than the four above.
First create a pipeline parameter and a storage event trigger, and map the trigger value @triggerBody().fileName to the pipeline parameter.
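In the trigger definition, that mapping is a fragment roughly like this (pipeline and parameter names taken from the pipeline JSON further down):
"pipelines": [
    {
        "pipelineReference": {
            "referenceName": "pipeline5_copy1",
            "type": "PipelineReference"
        },
        "parameters": {
            "tfilename": "@triggerBody().fileName"
        }
    }
]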
Now, use a Lookup activity with the query below to get the table schema, table name and column count as an array of objects.
SELECT TABLE_SCHEMA
, TABLE_NAME
, number = COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME!='database_firewall_rules'
GROUP BY TABLE_SCHEMA, TABLE_NAME;
This will give a JSON array like this.
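With hypothetical table names, @activity('Lookup1').output.value would then be an array shaped like this:
[
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_8cols", "number": 8 },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_16cols", "number": 16 },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_24cols", "number": 24 },
    { "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_32cols", "number": 32 }
]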
Next, use a Get Metadata activity, passing the triggered file name via dataset parameters, and get the column count from it.
Now, use a Filter activity to pick the SQL table whose column count matches the triggered file.
items: @activity('Lookup1').output.value
Condition: @equals(activity('Get Metadata1').output.columnCount, item().number)
Filter output:
Now, use a Copy activity with dataset parameters.
Source with dataset parameters:
Sink:
Then use a Delete activity to remove the triggered file.
My pipeline JSON:
{
"name": "pipeline5_copy1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "SELECT TABLE_SCHEMA\n , TABLE_NAME\n , number = COUNT(*) \nFROM INFORMATION_SCHEMA.COLUMNS \nwhere TABLE_NAME!='database_firewall_rules'\nGROUP BY TABLE_SCHEMA, TABLE_NAME;",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"dataset": {
"referenceName": "Dataset_for_column_count",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Filter1",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"condition": {
"value": "#equals(activity('Get Metadata1').output.columnCount, item().number)",
"type": "Expression"
}
}
},
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
},
"fieldList": [
"columnCount"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Filter1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference",
"parameters": {
"schema": {
"value": "#activity('Filter1').output.Value[0].TABLE_SCHEMA",
"type": "Expression"
},
"table_name": {
"value": "#activity('Filter1').output.Value[0].TABLE_NAME",
"type": "Expression"
}
}
}
]
},
{
"name": "Delete1",
"type": "Delete",
"dependsOn": [
{
"activity": "Copy data1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
},
"enableLogging": false,
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
}
}
}
],
"parameters": {
"tfilename": {
"type": "string"
}
},
"variables": {
"sample": {
"type": "String"
}
},
"annotations": []
}
}
Result:

azure-data-factory waiting for source

I'm trying to copy sample data from one SQL Server DB to another.
For some reason the pipeline keeps waiting for the source data.
When I look at the source dataset, no slices have been created.
The following are my JSONS:
Destination table:
{
"name": "DestTable1",
"properties": {
"structure": [
{
"name": "C1",
"type": "Int16"
},
{
"name": "C2",
"type": "Int16"
},
{
"name": "C3",
"type": "String"
},
{
"name": "C4",
"type": "String"
}
],
"published": false,
"type": "SqlServerTable",
"linkedServiceName": "SqlServer2",
"typeProperties": {
"tableName": "OferTarget1"
},
"availability": {
"frequency": "Hour",
"interval": 1
}
}
}
Source Table:
{
"name": "SourceTable1",
"properties": {
"structure": [
{
"name": "C1",
"type": "Int16"
},
{
"name": "C2",
"type": "Int16"
},
{
"name": "C3",
"type": "String"
},
{
"name": "C4",
"type": "String"
}
],
"published": false,
"type": "SqlServerTable",
"linkedServiceName": "SqlServer",
"typeProperties": {
"tableName": "OferSource1"
},
"availability": {
"frequency": "Hour",
"interval": 1
},
"external": true,
"policy": { }
}
}
Pipeline:
{
"name": "CopyTablePipeline",
"properties": {
"description": "Copy data from source table to target table",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "select c1,c2,c3,c4 from OferSource1"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 1000,
"writeBatchTimeout": "60.00:00:00"
}
},
"inputs": [
{
"name": "SourceTable1"
}
],
"outputs": [
{
"name": "DestTable1"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1
},
"scheduler": {
"frequency": "Hour",
"interval": 1
},
"name": "CopySqlToSql",
"description": "Demo Copy"
}
],
"start": "2017-10-22T09:55:00Z",
"end": "2017-10-22T13:55:00Z",
"isPaused": true,
"hubName": "wer-dev-datafactoryv1_hub",
"pipelineMode": "Scheduled"
}
}
I can see the process in the monitor view, but the pipeline is stuck and waiting for the source data to arrive.
What am I doing wrong?
Scheduling can be a bit tricky initially. There are a few reasons why a time slice might be stuck waiting:
Activity Level
Source Properties
If the source is of type SqlServerTable then the property external should be true. I have personally fallen into the trap of copy-pasting the JSON files, and it took me a while to understand this. More literature is available here: https://github.com/aelij/azure-content/blob/master/articles/data-factory/data-factory-sqlserver-connector.md
Setting "external": "true" and specifying the externalData policy informs the Azure Data Factory service that this table is external to the data factory and is not produced by an activity in the data factory.
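For example, the relevant fragment inside the dataset's properties would look something like this (the retry values are just illustrative):
"external": true,
"policy": {
    "externalData": {
        "retryInterval": "00:01:00",
        "retryTimeout": "00:10:00",
        "maximumRetry": 3
    }
}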
Concurrency (not likely in your case): an activity can also be held up if multiple slices of the activity are valid in the specific time window. For example, if your start/end dates are 01-01-2014 to 01-01-2015 for a monthly activity and concurrency is set to 4, then 4 months will run in parallel while the rest are stuck with the message "Waiting on Concurrency".
Pipeline Level
Ensure that the current time (DateTime.Now) lies between the start and end, accounting for any delay. More on how the scheduling of activities works is explained in this article: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/
Paused: a pipeline can be paused, in which case the time slice will appear in the monitor with the message "Waiting for the pipeline to Resume". You can either edit the pipeline JSON and set "isPaused": false (note that your pipeline JSON above has "isPaused": true), or resume the pipeline by right-clicking it and hitting Resume.
A good way to check when your next iteration is scheduled is by using the Monitor option.

How to map a stored procedure's resulting data to the output dataset

Is it possible to map a stored procedure's result set to the output dataset using Data Factory?
I have the following:
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "GetViewsByDateRange",
"properties": {
"description": "Some description here",
"activities": [
{
"name": "Stored Procedure Activity Template",
"type": "SqlServerStoredProcedure",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"storedProcedureName": "GetViewsByDateRange",
"storedProcedureParameters": {
"startdateid": "20170421",
"enddateid": "20170514"
},
"translator": {
"type": "TabularTranslator",
"ColumnMappings": "Id: Test_Id, ViewCount: TestViews"
}
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": "15"
}
}
],
"start": "2017-05-05T00:00:00Z",
"end": "2017-05-05T00:00:00Z"
}
}
But it returns this error:
15/05/2017 15:57:09- Failed to Create Pipeline GetViewsByDateRange
test-guid-test "message":"Input is malformed. Reason:
batchPipelineRequest.typeProperties.translator : Could not find member
'translator' on object of type 'SprocActivityParameters'. Path
'typeProperties.translator'.. ","code":"InputIsMalformedDetailed"
I'm afraid this is not supported. You will have to dump the result to storage for further processing.
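If the result set just needs to land in another table, one workaround is a Copy activity whose SqlSource calls the stored procedure; the Copy activity does support a translator. A rough sketch of the typeProperties, reusing the names from the question:
"source": {
    "type": "SqlSource",
    "sqlReaderStoredProcedureName": "GetViewsByDateRange",
    "storedProcedureParameters": {
        "startdateid": { "value": "20170421", "type": "Int" },
        "enddateid": { "value": "20170514", "type": "Int" }
    }
},
"sink": {
    "type": "SqlSink",
    "writeBatchSize": 0,
    "writeBatchTimeout": "00:00:00"
},
"translator": {
    "type": "TabularTranslator",
    "columnMappings": "Id: Test_Id, ViewCount: TestViews"
}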

How to use stored procedure as input dataset in ADF (How to assign database it uses)

I want to run a stored procedure against a linked service (an Azure SQL database) and output the result of that stored procedure to a dataset (an Azure SQL database).
Is this possible?
I currently have ended up with this:
Pipeline: it should use a stored procedure found on a database defined as a linked service and copy the result over to the output dataset (an Azure SQL database).
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "CopyGetViewsByDateRange",
"properties": {
"description": "<Enter the Pipeline description here>",
"activities": [
{
"name": "CopyActivityTemplate",
"type": "Copy",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "Analytics_GetViewsByDateRange2",
"storedProcedureParameters": {
"clientid": { "value": "12345", "type": "Int" },
"startdateid": { "value": "20170421", "type": "Int" },
"enddateid": { "value": "20170514", "type": "Int" }
}
},
"sink": {
"type": "SqlSink"
}
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": "15"
}
}
],
"start": "2017-05-15T00:00:00Z",
"end": "2017-05-17T00:00:00Z"
}
}
Input dataset (Note the comments):
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "InputDataset",
"properties": {
"type": "AzureSqlTable", // This surely needs to be a stored procedure type
"linkedServiceName": "AnalyticsAMECDevDB",
"structure": [
{
"name": "Client_Id",
"type": "Int64"
},
{
"name": "DimDate_Id",
"type": "Int64"
},
{
"name": "TotalContentViews",
"type": "Int64"
} // The structure represents what the stored procedure is outputting
],
"typeProperties": {
"tableName": "Analytics.FactPageViews" // This is obviously not right
},
"availability": {
"frequency": "Minute",
"interval": "15"
},
"external": true
}
}
My stored procedure looks like this:
SELECT
#clientid as Client_Id,
[DimDateId] as DimDate_Id,
count(1) as TotalContentViews
FROM
[Analytics].[FactPageViews] as pageviews
inner join Analytics.DimPages as pages
on pageviews.dimpageid = pages.id
where
DimDateId between @startdateid and @enddateid
group by
dimdateid
order by
dimdateid
EDIT (got something to work at least)
I am currently managing it by defining a query and running the command there:
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "$$Text.Format('EXEC [dbo].[GetViewsByDateRange] 2, 20170421, 20170514', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "Client_Id:Client_Id,DimDate_Id:DimDate_Id,TotalContentViews:TotalContentViews"
}
},
"inputs": [
{
"name": "InputDataset-0af"
}
],
"outputs": [
{
"name": "OutputDataset-0af"
}
],
I think you've got everything right. To answer your question: simply, you don't need to have an input dataset defined in your pipeline/activity. So yes, it's certainly possible.
Just have the output dataset defined as the result of the stored proc.
Hope this helps
I'm not sure whether this will help solve your problem, but try changing your input and output datasets as below.
Input dataset
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "ofcspdataset",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"typeProperties": {
"tableName": "dbo.emp" ==> >>need to refer any table be in the source database.
},
"external": true,
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
Output Dataset:
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "OfficeTestOuputTable",
"properties": {
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"structure": [
{ "name": "Id" },
{ "name": "GroupId" }
],
"typeProperties": {
"tableName": "dbo.testAdf_temp"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
And I'm sure your pipeline is good. Just try changing the input and output datasets.
For me it works.

data factory multiple activities scheduled daily once not working

After doing multiple tests with changes to the pipeline below, I am posting this in this forum to seek help from the experts out there. The basic idea of the pipeline below is that Activity 1 does some computation by calling a U-SQL script that outputs its result to a Data Lake Store. Activity 2 then takes the data produced by Activity 1 and copies it to Azure SQL. Both activities are scheduled to run once daily. However, I never see the pipeline get triggered. If it is scheduled to run every 15 minutes it works fine. What am I doing wrong?
{
"name": "IncrementalLoad_Pipeline",
"properties": {
"description": "This is a pipeline to to pick files from Data Lake as per the slice start date time.",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "andeblobcontainer\\script.usql",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 3,
"priority": 100,
"parameters": {
"in": "$$Text.Format('/Input/SyncToCentralDataLog_{0:dd_MM_yyyy}.txt', Date.AddDays(SliceStart,-7))",
"out": "$$Text.Format('/Output/incremental_load/StcAnalytics_{0:dd_MM_yyyy}.tsv', Date.AddDays(SliceStart,-7))"
}
},
"inputs": [
{
"name": "IncrementalLoad_Input"
}
],
"outputs": [
{
"name": "IncrementalLoad_Output"
}
],
"scheduler": {
"frequency": "Day",
"interval": 1
},
"name": "IncrementalLoad",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "AzureDataLakeStoreSource",
"recursive": false
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
}
},
"inputs": [
{
"name": "IncrementalLoad_Input2"
},
{
"name": "IncrementalLoad_Output"
}
],
"outputs": [
{
"name": "AzureSQLDatasetOutput"
}
],
"scheduler": {
"frequency": "Day",
"interval": 1
},
"name": "CopyToAzureSql"
}
],
"start": "2016-09-12T23:45:00Z",
"end": "2016-09-13T01:00:00Z",
"isPaused": false,
"hubName": "vijaytest-datafactory_hub",
"pipelineMode": "Scheduled"
}
}
With the JSON you've provided above, the start and end period is not big enough. ADF can't provision a set of daily time slices for less than a day's time frame.
Try increasing the start and end period to cover 1 week, e.g.:
"start": "2016-09-12",
"end": "2016-09-18",
You should be able to extend the end date without dropping the pipeline.
Hope this helps.