How to map a stored procedure's resulting data to the output dataset in Azure Data Factory

Is it possible to map a stored procedure's resulting data to the output dataset using Data Factory?
I have the following:
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "GetViewsByDateRange",
"properties": {
"description": "Some description here",
"activities": [
{
"name": "Stored Procedure Activity Template",
"type": "SqlServerStoredProcedure",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"storedProcedureName": "GetViewsByDateRange",
"storedProcedureParameters": {
"startdateid": "20170421",
"enddateid": "20170514"
},
"translator": {
"type": "TabularTranslator",
"ColumnMappings": "Id: Test_Id, ViewCount: TestViews"
}
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": "15"
}
}
],
"start": "2017-05-05T00:00:00Z",
"end": "2017-05-05T00:00:00Z"
}
}
But it returns this error:
15/05/2017 15:57:09- Failed to Create Pipeline GetViewsByDateRange
test-guid-test "message":"Input is malformed. Reason:
batchPipelineRequest.typeProperties.translator : Could not find member
'translator' on object of type 'SprocActivityParameters'. Path
'typeProperties.translator'.. ","code":"InputIsMalformedDetailed"

I'm afraid this is not supported. You will have to dump the result to storage for further processing.
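One possible workaround (a minimal sketch, not taken from the original answer) is to call the procedure from a Copy activity's SqlSource instead of a SqlServerStoredProcedure activity, similar to the copy-based approach in the last question on this page. The activity name below is illustrative; the datasets and parameters are copied from the question:
{
"name": "CopyFromStoredProc",
"type": "Copy",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "GetViewsByDateRange",
"storedProcedureParameters": {
"startdateid": { "value": "20170421", "type": "Int" },
"enddateid": { "value": "20170514", "type": "Int" }
}
},
"sink": {
"type": "SqlSink"
}
}
}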

Related

Get the list of names via Get Metadata activity

The output of my Get Metadata activity contains name and type values for the child items.
Is it possible to get just the name values and store them in an array variable without using any iteration?
Desired output = [csv1.csv, csv2.csv, csv3.csv, csv4.csv]
This was achieved via ForEach and Append variable, but we don't want to use iterations.
APPROACH 1:
Using a ForEach would be the easier way to complete the job. However, you can use string manipulation in the following way to get the desired result.
Store the output of the Get Metadata childItems in a string variable (here named tp):
@string(activity('Get Metadata1').output.childItems)
Now replace all the unnecessary data with an empty string '' and store the result in another string variable (here named ans), using the following dynamic content:
@replace(replace(replace(replace(replace(replace(replace(replace(variables('tp'),'[{',''),'}]',''),'{',''),'}',''),'"type":"File"',''),'"',''),'name:',''),',,',',')
Now, drop the trailing comma and split the above string with , as the delimiter:
@split(substring(variables('ans'),0,sub(length(variables('ans')),1)),',')
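To illustrate with a hypothetical two-file listing: if the stored string in variables('tp') were the first line below, the replace chain would produce the variables('ans') value on the second line, and the final split would give the array on the third:
[{"name":"csv1.csv","type":"File"},{"name":"csv2.csv","type":"File"}]
csv1.csv,csv2.csv,
["csv1.csv","csv2.csv"]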
APPROACH 2:
Let's say your source has a combination of folders and files, and you want only the names of the objects whose type is File in an array. Then you can use the following approach. There is no need for a ForEach here, but you will have to use a Copy data activity and a data flow.
Create a Copy data activity whose source is a sample file with a single placeholder (demo) column.
Now create an additional column my_json on the source, with its value set to the following dynamic content:
@replace(string(activity('Get Metadata1').output.childItems),'"',pipeline().parameters.single_quote)
For the sink, I used a delimited text dataset (the configuration is in the pipeline JSON below).
In the mapping, select only the newly created my_json column and remove the remaining (demo) column.
Once this Copy data activity executes, the generated file contains the childItems JSON written with single quotes.
In the data flow, use this file as a JSON source with documentForm set to 'arrayOfDocuments' and singleQuoted enabled (see the source settings in the data flow script below).
The data is then read as rows of name and type values.
Now, use an aggregate transformation to group by the type column and apply collect() on the name column; this gives one array of names per type.
Then use a conditional split to separate the File type rows from the Folder type rows with the condition type == 'File'.
Write the fileType stream to a cache sink (sink1).
Back in the pipeline, use the following dynamic content to get the required array:
@activity('Data flow1').output.runStatus.output.sink1.value[0].array_of_types
Pipeline JSON for reference:
{
"name": "pipeline3",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "source1",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"additionalColumns": [
{
"name": "my_json",
"value": {
"value": "#replace(string(activity('Get Metadata1').output.childItems),'\"',pipeline().parameters.single_quote)",
"type": "Expression"
}
}
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "my_json",
"type": "String"
},
"sink": {
"type": "String",
"physicalType": "String",
"ordinal": 1
}
}
],
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "csv1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "sink1",
"type": "DatasetReference"
}
]
},
{
"name": "Data flow1",
"type": "ExecuteDataFlow",
"dependsOn": [
{
"activity": "Copy data1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataflow": {
"referenceName": "dataflow2",
"type": "DataFlowReference"
},
"compute": {
"coreCount": 8,
"computeType": "General"
},
"traceLevel": "None"
}
},
{
"name": "Set variable2",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Data flow1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "req",
"value": {
"value": "#activity('Data flow1').output.runStatus.output.sink1.value[0].array_of_types",
"type": "Expression"
}
}
}
],
"parameters": {
"single_quote": {
"type": "string",
"defaultValue": "'"
}
},
"variables": {
"req": {
"type": "Array"
},
"tp": {
"type": "String"
},
"ans": {
"type": "String"
},
"req_array": {
"type": "Array"
}
},
"annotations": [],
"lastPublishTime": "2023-02-03T06:09:07Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Dataflow JSON for reference:
{
"name": "dataflow2",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "Json3",
"type": "DatasetReference"
},
"name": "source1"
}
],
"sinks": [
{
"name": "sink1"
}
],
"transformations": [
{
"name": "aggregate1"
},
{
"name": "split1"
}
],
"scriptLines": [
"source(output(",
" name as string,",
" type as string",
" ),",
" allowSchemaDrift: true,",
" validateSchema: false,",
" ignoreNoFilesFound: false,",
" documentForm: 'arrayOfDocuments',",
" singleQuoted: true) ~> source1",
"source1 aggregate(groupBy(type),",
" array_of_types = collect(name)) ~> aggregate1",
"aggregate1 split(type == 'File',",
" disjoint: false) ~> split1#(fileType, folderType)",
"split1#fileType sink(validateSchema: false,",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true,",
" store: 'cache',",
" format: 'inline',",
" output: true,",
" saveOrder: 1) ~> sink1"
]
}
}
}

ADF Column Name Validation and Data Validation

I am trying to add some validation to my ADF pipeline. Is there a way to achieve the following validation in ADF?
Validate the column headers and return an error message. There is a list of required column names that I need to check against the raw Excel file. For example, the raw file might have columns A, B, C, D, but the required columns are A, B, E. So is there a way to validate and return an error message that column E is missing from the raw file?
Validate the data types in a mapping data flow, e.g. if column A should be a numeric field but some of the cells have text in them, or column B should be a datetime type but has a number in it. Is there a way to check the values in each row and return an error message if the data validation fails for that row?
Adding to @Nandan's answer, you can use the Get Metadata activity's structure field as shown below.
This is my repro for your reference:
First, I have used two parameters for the column names and data types.
Get Metadata activity:
Get Metadata activity output array:
Then I have created two array variables and filled them with the names and types from that output using a ForEach.
Then I have used two Filter activities to filter the parameter arrays against those variables.
Then I used an If Condition activity to compare the parameter array lengths with the Filter activities' output counts.
If it's true, inside the True activities you can use your Copy activity or data flow as per your requirement. Inside the False activities, use a Fail activity.
My pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Excel1",
"type": "DatasetReference"
},
"fieldList": [
"structure"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
}
}
},
{
"name": "Filtering names",
"type": "Filter",
"dependsOn": [
{
"activity": "Getting names and columns as list",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#pipeline().parameters.names",
"type": "Expression"
},
"condition": {
"value": "#contains(variables('namesvararray'),item())",
"type": "Expression"
}
}
},
{
"name": "Filtering types",
"type": "Filter",
"dependsOn": [
{
"activity": "Filtering names",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('typevararray')",
"type": "Expression"
},
"condition": {
"value": "#contains(variables('typevararray'), item())",
"type": "Expression"
}
}
},
{
"name": "Getting names and columns as list",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Get Metadata1').output.structure",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append names",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "namesvararray",
"value": {
"value": "#item().name",
"type": "Expression"
}
}
},
{
"name": "Append types",
"type": "AppendVariable",
"dependsOn": [
{
"activity": "Append names",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "typevararray",
"value": {
"value": "#item().type",
"type": "Expression"
}
}
}
]
}
},
{
"name": "If Condition1",
"type": "IfCondition",
"dependsOn": [
{
"activity": "Filtering types",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"expression": {
"value": "#and(equals(length(pipeline().parameters.names),activity('Filtering names').output.FilteredItemsCount),equals(length(pipeline().parameters.columns),activity('Filtering types').output.FilteredItemsCount))",
"type": "Expression"
},
"ifFalseActivities": [
{
"name": "Fail1",
"type": "Fail",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"message": "Some of the headers or types are not as required",
"errorCode": "240"
}
}
],
"ifTrueActivities": [
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "sample",
"value": "All good"
}
}
]
}
}
],
"parameters": {
"names": {
"type": "array",
"defaultValue": [
"A",
"B",
"C"
]
},
"columns": {
"type": "array",
"defaultValue": [
"String",
"String",
"String"
]
}
},
"variables": {
"namesvararray": {
"type": "Array"
},
"typevararray": {
"type": "Array"
},
"sample": {
"type": "String"
}
},
"annotations": []
}
}
When the check does not pass, my pipeline fails with the configured error message.
You can use a Lookup activity on the dataset and return the first row (with the dataset's header property disabled). This gives you the list of columns present in the Excel file, which you can then compare against the expected values; if the values/sequence match you can proceed further, else you can throw an error.
Note: you can also use the Get Metadata activity to get the column details.
For data types, you can use column patterns in data flows:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-column-pattern
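As a rough sketch of the Lookup idea (assuming a Lookup activity named 'Lookup1' over the Excel dataset, returning the first row only with the header property disabled, and the required columns A, B and E from the question), an If Condition expression could check that each required header appears in the returned first row:
@and(and(contains(string(activity('Lookup1').output.firstRow),'A'),contains(string(activity('Lookup1').output.firstRow),'B')),contains(string(activity('Lookup1').output.firstRow),'E'))
A substring match like this is only approximate (a header named 'AB' would also satisfy 'A'); the Get Metadata structure comparison shown above is the stricter check.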
@rakeshGovindula: any more thoughts?

Azure Data Factory Dataset with # in column name

I have a dataset coming from a REST web service that has a # in the column names:
Like:
{
"data": [{
"#id": 1,
"#value": "a"
}, {
"#id": 2,
"#value": "b"
}
]
}
I want to use it in a ForEach and access that specific column.
In the ForEach, I get the output via @activity('Lookup').output.value.
Inside the ForEach there is a stored procedure.
As the parameter input I tried to get the column with @item().#value, but got the error "the string character '#' at position 'xx' is not expected".
Is there a way to escape the # in the column name? Or can I rename the column?
Thank you very much.
Edit:
Here is the JSON from the ADF pipeline:
{
"name": "pipeline3",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "HttpSource",
"httpRequestTimeout": "00:01:40"
},
"dataset": {
"referenceName": "HttpFile1",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "ForEach2",
"type": "ForEach",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"activities": [
{
"name": "Stored Procedure12",
"type": "SqlServerStoredProcedure",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"storedProcedureName": "[dbo].[testnv]",
"storedProcedureParameters": {
"d": {
"value": {
"value": "#item().#accno",
"type": "Expression"
},
"type": "String"
}
}
},
"linkedServiceName": {
"referenceName": "AzureSqlDatabase1",
"type": "LinkedServiceReference"
}
}
]
}
}
]
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Please try "@item()['#accno']" instead of "@item().#accno".
Also replied on MSDN.
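For reference, with that change the stored procedure parameter block from the pipeline above becomes:
"storedProcedureParameters": {
"d": {
"value": {
"value": "@item()['#accno']",
"type": "Expression"
},
"type": "String"
}
}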

Unable to create TWO Data Factory Pipeline activities pointing to same Target Dataset

I am looking for a solution to load the data from SQL DW DMVs in two different databases into a single table in one SQL DW.
I went with an ADF pipeline activity, which loads the data every 15 minutes. But I am seeing an issue when I create two activities in one pipeline that have two different sources (input datasets) but both load the data into the same target (output dataset).
I also want to build a dependency between the activities so that they won't run at the same time: Activity 2 should start only after Activity 1 has completed and is not running.
My ADF Code is as below:
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "OD_Exec_Requests",
"name": "ITG_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}
When I try to deploy this, it gives the below error message:
Error Activities 'PRD_DMV_Load' and 'ITG_DMV_Load' have the same
output Dataset 'OD_Exec_Requests'. Two activities cannot output the
same Dataset over the same active period.
How can I resolve this? Can I specify that ITG_DMV_Load runs only after PRD_DMV_Load has completed?
You have two issues here.
You cannot produce the same dataset slice from two different activities/pipelines. To work around this, you can create another dataset which points to the same table but, from the ADF perspective, is a different sink. You also need to move your second activity into a separate pipeline configuration (so you end up with one activity per pipeline).
You need to somehow order your pipelines. I see two possible ways:
You can try using scheduler configuration options, e.g. the offset property (or style), to schedule one pipeline in the middle of the other's interval.
For example, if the first pipeline is configured like this:
"scheduler": {
"frequency": "Minute",
"interval": 15
},
Configure the second like this:
"scheduler": {
"frequency": "Minute",
"interval": 15,
"offset" : 5
},
This approach may require some tuning depending on how long your pipeline takes to complete.
Another approach is to specify the output of the first pipeline as an input of the second. In this case the second activity won't start until the first one has completed. The schedules of the activities must match in this case (i.e. both should have the same scheduler.frequency and scheduler.interval).
As @arghtype says, you cannot use the same ADF dataset in two active pipelines or activities. You will need to create a second, identical output dataset for ITG_DMV_Load, but you do not have to split the pipeline. You can ensure the second activity does not run until the first has completed by making the output of the first a secondary input to the second. I would suggest something like this...
{
"name": "Execution_Requests_Hist",
"properties": {
"description": "Execution Requests history data",
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ID_Exec_Requests"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_PRD"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "PRD_DMV_Load"
},
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlDWSource",
"sqlReaderQuery": "select * from dm_pdw_exec_requests_hist_view"
},
"sink": {
"type": "SqlDWSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "request_id:request_id,session_id:session_id,status:status,submit_time:submit_time,start_time:start_time,end_compile_time:end_compile_time,total_elapsed_time:total_elapsed_time,end_time:end_time,label:label,error_id:error_id,command:command,resource_class:resource_class,database_id:database_id,login_name:login_name,app_name:app_name,client_id:client_id,DMV_Source:DMV_Source,source:source,type:type,create_time:create_time,details:details"
},
"enableSkipIncompatibleRow": true
},
"inputs": [
{
"name": "ITG_Exec_Requests",
"name": "OD_Exec_Requests_PRD"
}
],
"outputs": [
{
"name": "OD_Exec_Requests_ITG"
}
],
"policy": {
"timeout": "1.00:00:00",
"concurrency": 1,
"executionPriorityOrder": "NewestFirst",
"style": "StartOfInterval",
"retry": 3,
"longRetry": 0,
"longRetryInterval": "00:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
},
"name": "ITG_DMV_Load"
}
],
"start": "2017-08-20T04:22:00Z",
"end": "2018-08-20T04:22:00Z",
"isPaused": false,
"hubName": "xyz-adf_hub",
"pipelineMode": "Scheduled"
}
}

How to use a stored procedure as an input dataset in ADF (how to assign the database it uses)

I want to run a stored procedure against a linked service (an Azure SQL database) and output the result of that stored procedure to a dataset (an Azure SQL database).
Is this possible?
I currently have ended up with this:
Pipeline: it should use a stored procedure that is found on a database defined as a linked service and copy the result over to the output dataset (an Azure SQL database).
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "CopyGetViewsByDateRange",
"properties": {
"description": "<Enter the Pipeline description here>",
"activities": [
{
"name": "CopyActivityTemplate",
"type": "Copy",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "Analytics_GetViewsByDateRange2",
"storedProcedureParameters": {
"clientid": { "value": "12345", "type": "Int" },
"startdateid": { "value": "20170421", "type": "Int" },
"enddateid": { "value": "20170514", "type": "Int" }
}
},
"sink": {
"type": "SqlSink"
}
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": "15"
}
}
],
"start": "2017-05-15T00:00:00Z",
"end": "2017-05-17T00:00:00Z"
}
}
Input dataset (Note the comments):
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "InputDataset",
"properties": {
"type": "AzureSqlTable", // This surely needs to be a stored procedure type
"linkedServiceName": "AnalyticsAMECDevDB",
"structure": [
{
"name": "Client_Id",
"type": "Int64"
},
{
"name": "DimDate_Id",
"type": "Int64"
},
{
"name": "TotalContentViews",
"type": "Int64"
} // The structure represents what the stored procedure is outputting
],
"typeProperties": {
"tableName": "Analytics.FactPageViews" // This is obviously not right
},
"availability": {
"frequency": "Minute",
"interval": "15"
},
"external": true
}
}
My stored procedure looks like this:
SELECT
@clientid as Client_Id,
[DimDateId] as DimDate_Id,
count(1) as TotalContentViews
FROM
[Analytics].[FactPageViews] as pageviews
inner join Analytics.DimPages as pages
on pageviews.dimpageid = pages.id
where
DimDateId between @startdateid and @enddateid
group by
dimdateid
order by
dimdateid
EDIT (got something to work at least)
I am currently managing it by defining a query and running the command there:
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "$$Text.Format('EXEC [dbo].[GetViewsByDateRange] 2, 20170421, 20170514', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "Client_Id:Client_Id,DimDate_Id:DimDate_Id,TotalContentViews:TotalContentViews"
}
},
"inputs": [
{
"name": "InputDataset-0af"
}
],
"outputs": [
{
"name": "OutputDataset-0af"
}
],
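Note that the format string above does not actually use the WindowStart/WindowEnd arguments that are passed to Text.Format. If the slice window is meant to drive the date parameters, placeholders would be needed; a hedged sketch, assuming the date ids follow the yyyyMMdd pattern:
"sqlReaderQuery": "$$Text.Format('EXEC [dbo].[GetViewsByDateRange] 2, {0:yyyyMMdd}, {1:yyyyMMdd}', WindowStart, WindowEnd)"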
I think you've got everything right. To answer your question: simply put, you don't need to have an input dataset defined in your pipeline/activity. So yes, it's certainly possible.
Just have the output dataset defined as the result of the stored proc.
Hope this helps
I'm not sure whether this will help you solve your problem, but try changing your input and output datasets as below.
Input dataset
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "ofcspdataset",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"typeProperties": {
"tableName": "dbo.emp" ==> >>need to refer any table be in the source database.
},
"external": true,
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
Output Dataset:
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "OfficeTestOuputTable",
"properties": {
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"structure": [
{ "name": "Id" },
{ "name": "GroupId" }
],
"typeProperties": {
"tableName": "dbo.testAdf_temp"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
And I'm sure your pipeline is good. Just try changing the input and output datasets.
For me it works.