Azure Data Factory V2 Dataset Dynamic Folder - azure-data-factory

In Azure Data Factory (V1) I was able to create a slide and store the output to a specific folder (i.e. {Year}/{Month}/{Day}. See code below.
How do you create the same type of slice in Azure Data Factory V2? I did find that you have to create a paramater. Yes, I was unable to figure out how to pass the parameter.
"folderPath": "#{dataset().path}",
"parameters": {
"path": {
"type": "String"
Here is original ADF V1 code.
{
"name": "EMS_EMSActivations_L1_Snapshot",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "SalesIntelligence_ADLS_LS",
"typeProperties": {
"fileName": "EMS.FACT_EMSActivations_WTA.tsv",
"folderPath": "/Snapshots/EMS/FACT_EMSActivations_WTA/{Year}/{Month}/{Day}",
"format": {
"type": "TextFormat",
"rowDelimiter": "␀",
"columnDelimiter": "\t",
"nullValue": "#NULL#",
"quoteChar": "\""
},
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
},
{
"name": "Minute",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "mm"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}

Here is how you create a dynamic folder path when importing data from SQL into ADL. Look at folderPath line.
{
"name": "EBC_BriefingActivitySummary_L1_Snapshot",
"properties": {
"linkedServiceName": {
"referenceName": "SIAzureDataLakeStore",
"type": "LinkedServiceReference"
},
"type": "AzureDataLakeStoreFile",
"typeProperties": {
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"rowDelimiter": "",
"nullValue": "\\N",
"treatEmptyAsNull": false,
"firstRowAsHeader": false
},
"fileName": {
"value": "EBC.rpt_BriefingActivitySummary.tsv",
"type": "Expression"
},
"folderPath": {
"value": "#concat('/Snapshots/EBC/rpt_BriefingActivitySummary/', formatDateTime(pipeline().parameters.scheduledRunTime, 'yyyy'), '/', formatDateTime(pipeline().parameters.scheduledRunTime, 'MM'), '/', formatDateTime(pipeline().parameters.scheduledRunTime, 'dd'), '/')",
"type": "Expression"
}
}
}
}

Step 1:
Use WindowStartTime / WindowEndTime in folderpath
"folderPath": {
"value": "<<path>>/#{formatDateTime(pipeline().parameters.windowStart,'yyyy')}-#{formatDateTime(pipeline().parameters.windowStart,'MM')}-#{formatDateTime(pipeline().parameters.windowStart,'dd')}/#{formatDateTime(pipeline().parameters.windowStart,'HH')}/",
"type": "Expression"
}
Step2 : Add in Pipeline JSON
"parameters": {
"windowStart": {
"type": "String"
},
"windowEnd": {
"type": "String"
}
}
Step3 : Add Run Parameter in TumblingWindow Trigger
( This is referred in Step 2 )
"parameters": {
"windowStart": {
"type": "Expression",
"value": "#trigger().outputs.windowStartTime"
},
"windowEnd": {
"type": "Expression",
"value": "#trigger().outputs.windowEndTime"
}
}
For more details to understand , Refer
Refer this link.
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/data-factory/how-to-create-tumbling-window-trigger.md

Related

NULL being passed from pipeline to linked service baseURL

I have a pipeline that will iterate over files and copy them to a storage location.
The baseURL and relativeURL are stored in a json file.
I can read in this file and it is valid.
I have parameterized the linked service baseURL and this works when testing from the linked service, and from the dataset.
When I try to debug the pipeline however, I get an error:
"code":"BadRequest"
"message":null
"target":"pipeline//runid/310b8ac1-2ce6-4c7c-a1ad-433ee9019e9b"
"details":null
"error":null
It appears that from the activity in the pipeline, a null value is being passed instead of the baseURL.
I have iterated over the values from my config file and it is being read and the values are correct. It really seems like the pipeline is not passing in the correct value for baseURL.
Do I have to modify the json code behind the pipeline to get this to work?
If it helps, the json for the linked service, data set and pipeline are below:
--Linked Service:
{
"name": "ls_http_opendata_ecdc_europe_eu",
"properties": {
"parameters": {
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "HttpServer",
"typeProperties": {
"url": "#linkedService().baseURL",
"enableServerCertificateValidation": true,
"authenticationType": "Anonymous"
}
}
}
--dataset
{
"name": "ds_ecdc_raw_csv_http",
"properties": {
"linkedServiceName": {
"referenceName": "ls_http_opendata_ecdc_europe_eu",
"type": "LinkedServiceReference"
},
"parameters": {
"relativeURL": {
"type": "string"
},
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "HttpServerLocation",
"relativeUrl": {
"value": "#dataset().relativeURL",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
}
}
--pipeline
{
"name": "pl_ingest_ecdc_data",
"properties": {
"activities": [
{
"name": "lookup ecdc filelist",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "JsonSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "JsonReadSettings"
}
},
"dataset": {
"referenceName": "ds_ecdc_file_list",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "execute copy for every record",
"type": "ForEach",
"dependsOn": [
{
"activity": "lookup ecdc filelist",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('lookup ecdc filelist').output.value",
"type": "Expression"
},
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "HttpReadSettings",
"requestMethod": "GET"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "DelimitedText1",
"type": "DatasetReference",
"parameters": {
"sourceBaseURL": {
"value": "#item().sourceBaseURL",
"type": "Expression"
},
"sourceRelativeURL": {
"value": "#item().sourceRelativeURL",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "ds_ecdc_raw_csv_dl",
"type": "DatasetReference",
"parameters": {
"fileName": {
"value": "#item().sinkFileName",
"type": "Expression"
}
}
}
]
}
]
}
}
],
"concurrency": 1,
"annotations": []
}
}
I reproduced your error.
{"code":"BadRequest","message":null,"target":"pipeline//runid/abd35329-3625-490b-85cf-f6d0de3dac86","details":null,"error":null}
It is because you didn't pass your baseURL to link service in Source Dataset. Please do this:
And Dataset JSON code should be like this:
{
"name": "ds_ecdc_raw_csv_http",
"properties": {
"linkedServiceName": {
"referenceName": "ls_http_opendata_ecdc_europe_eu",
"type": "LinkedServiceReference",
"parameters": {
"baseURL": {
"value": "#dataset().baseURL",
"type": "Expression"
}
}
},
"parameters": {
"relativeURL": {
"type": "string"
},
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "HttpServerLocation",
"relativeUrl": {
"value": "#dataset().relativeURL",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
}
}

Getting error on null and empty string while copying a csv file from blob container to Azure SQL DB

I tried all combination on the datatype of my data but each time my data factory pipeline is giving me this error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Empty or Null string found in Column Name 2. Please make sure column name not null and try again.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "xxx",
"details": []
}
My Copy data source code is something like this:{
"name": "xxx",
"description": "uuu",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFileName": "*"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "populationId",
"type": "Guid"
},
"sink": {
"name": "PopulationID",
"type": "String"
}
},
{
"source": {
"name": "inputTime",
"type": "DateTime"
},
"sink": {
"name": "inputTime",
"type": "DateTime"
}
},
{
"source": {
"name": "inputCount",
"type": "Decimal"
},
"sink": {
"name": "inputCount",
"type": "Decimal"
}
},
{
"source": {
"name": "inputBiomass",
"type": "Decimal"
},
"sink": {
"name": "inputBiomass",
"type": "Decimal"
}
},
{
"source": {
"name": "inputNumber",
"type": "Decimal"
},
"sink": {
"name": "inputNumber",
"type": "Decimal"
}
},
{
"source": {
"name": "utcOffset",
"type": "String"
},
"sink": {
"name": "utcOffset",
"type": "Int32"
}
},
{
"source": {
"name": "fishGroupName",
"type": "String"
},
"sink": {
"name": "fishgroupname",
"type": "String"
}
},
{
"source": {
"name": "yearClass",
"type": "String"
},
"sink": {
"name": "yearclass",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "DelimitedTextFTDimensions",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
]
}
Can anyone please help me understand the issue. I see in some blogs they ask me use treatnullasempty but I am not allowed to modify the JSON. is there a way to do that??
I suggest to using Data Flow DerivedColumn, DerivedColumn can help you build expression to replace the null column.
For example:
Derived Column, if Column_2 is null =true, return 'dd' :
iifNull(Column_2,'dd')
Mapping the column
Reference: Data transformation expressions in mapping data flow
Hope this helps.
fixed it.it was a easy fix as one of my column in destination was marked as not null, i changed it as null and it worked.

azure data factory start pipeline different from starting job

I am getting crazy on this issue, I am running an Azure data factory V1, I need to schedule a copy job every week from 01/03/2009 through 01/31/2009, so I defined this schedule on the pipeline:
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-31T00:00:00Z",
"isPaused": false,
monitoring the pipeline, the data factory schedule on these date:
12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009
instead of this wanted schedule:
01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009
why the starting date defined on the pipeline doesn't correspond to the schedule date on the monitor?
Many thanks!
Here is the JSON Pipeline:
{
"name": "CopyPipeline-blob2datalake",
"properties": {
"description": "copy from blob storage to datalake directory structure",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "script/dat230.usql",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {
"salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
"lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
}
},
"inputs": [
{
"name": "InputDataset-dat230"
}
],
"outputs": [
{
"name": "OutputDataset-dat230"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"retry": 1
},
"scheduler": {
"frequency": "Day",
"interval": 7
},
"name": "DataLakeAnalyticsUSqlActivityTemplate",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-11T00:00:00Z",
"isPaused": false,
"hubName": "edxlearningdf_hub",
"pipelineMode": "Scheduled"
}
}
and here the datasets:
{
"name": "InputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureBlob",
"linkedServiceName": "Source-BlobStorage-dat230",
"typeProperties": {
"fileName": "*.txt.gz",
"folderPath": "dat230/{year}/{month}/{day}/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"firstRowAsHeader": true
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
],
"compression": {
"type": "GZip"
}
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": true,
"policy": {}
}
}
{
"name": "OutputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "Destination-DataLakeStore-dat230",
"typeProperties": {
"fileName": "txt.gz",
"folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t"
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": false,
"policy": {}
}
}
You need to look at the time slices for the datasets and there activity.
The pipeline schedule (badly named) only defines the start and end period in which any activities can use to provision and run there time slices.
ADFv1 doesn't use a recursive schedule like the SQL Server Agent. Each execution has to be provisioned at an interval on the time line (the schedule) you create.
For example, if you pipeline start and end is for 1 year. But your dataset and activity has a frequency of monthly and interval of 1 month you will only get 12 executions of the whatever is happening.
Apologies, but the concept of time slices is a little difficult to explain if you aren't already familiar. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/
Hope this helps.
Would you share with us the json for the datasets and the pipeline? It would be easier to help you having that.
In the meanwhile, check if you are using "style": "StartOfInterval" at the scheduler property of the activity, and also check if you are using an offset.
Cheers!

How to use stored procedure as input dataset in ADF (How to assign database it uses)

I want to run a stored procedure against a linkedservice (azure sql database) and output the result of that stored procedure to a dataset (azure sql database).
Is this possible?
I currently have ended up with this:
Pipeline: It should use a stored procedure that is found on a database defined as a linkedservice and copy that over to the output dataset (an azure sql database)
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "CopyGetViewsByDateRange",
"properties": {
"description": "<Enter the Pipeline description here>",
"activities": [
{
"name": "CopyActivityTemplate",
"type": "Copy",
"inputs": [
{
"name": "InputDataset"
}
],
"outputs": [
{
"name": "OutputDataset"
}
],
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderStoredProcedureName": "Analytics_GetViewsByDateRange2",
"storedProcedureParameters": {
"clientid": { "value": "12345", "type": "Int" },
"startdateid": { "value": "20170421", "type": "Int" },
"enddateid": { "value": "20170514", "type": "Int" }
}
},
"sink": {
"type": "SqlSink"
}
},
"policy": {
"concurrency": 1,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": "15"
}
}
],
"start": "2017-05-15T00:00:00Z",
"end": "2017-05-17T00:00:00Z"
}
}
Input dataset (Note the comments):
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "InputDataset",
"properties": {
"type": "AzureSqlTable", // This surely needs to be a stored procedure type
"linkedServiceName": "AnalyticsAMECDevDB",
"structure": [
{
"name": "Client_Id",
"type": "Int64"
},
{
"name": "DimDate_Id",
"type": "Int64"
},
{
"name": "TotalContentViews",
"type": "Int64"
} // The structure represents what the stored procedure is outputting
],
"typeProperties": {
"tableName": "Analytics.FactPageViews" // This is obviously not right
},
"availability": {
"frequency": "Minute",
"interval": "15"
},
"external": true
}
}
My stored procedure looks like this:
SELECT
#clientid as Client_Id,
[DimDateId] as DimDate_Id,
count(1) as TotalContentViews
FROM
[Analytics].[FactPageViews] as pageviews
inner join Analytics.DimPages as pages
on pageviews.dimpageid = pages.id
where
DimDateId between #startdateid and #enddateid
group by
dimdateid
order by
dimdateid
EDIT (got something to work atleast)
I am currently managing it by defining a query and running the command there:
"activities": [
{
"type": "Copy",
"typeProperties": {
"source": {
"type": "SqlSource",
"sqlReaderQuery": "$$Text.Format('EXEC [dbo].[GetViewsByDateRange] 2, 20170421, 20170514', WindowStart, WindowEnd)"
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "Client_Id:Client_Id,DimDate_Id:DimDate_Id,TotalContentViews:TotalContentViews"
}
},
"inputs": [
{
"name": "InputDataset-0af"
}
],
"outputs": [
{
"name": "OutputDataset-0af"
}
],
I think you've got everything right. To answer your question. Simply, you don't need to have an input dataset defined in your pipeline/activity. So yes, certainly possible.
Just have the output dataset defined as the result of the stored proc.
Hope this helps
I'm not sure this may help you to solve your problem,
Change your input and output dataset as below.
Input dataset
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "ofcspdataset",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"typeProperties": {
"tableName": "dbo.emp" ==> >>need to refer any table be in the source database.
},
"external": true,
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
Output Dataset:
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "OfficeTestOuputTable",
"properties": {
"published": false,
"type": "AzureSqlTable",
"linkedServiceName": "sproctestout",
"structure": [
{ "name": "Id" },
{ "name": "GroupId" }
],
"typeProperties": {
"tableName": "dbo.testAdf_temp"
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
And I'm sure your pipeline is good. Just try to change the input and output dataset.
For me its works.

Azure Data Factory - how to structure slices for erratically arriving blobs

I have blobs that do not arrive on a fixed schedule but the contents need to be loaded into an Azure SQL DB as timely as possible, and there is some lag on when they can arrive.
The blobs are now named with the following convention logs/{year}/{month}/{day}/{hour}/{minute}/{second}
How should a data factory be coded to load these files as soon as possible, and ideally not generate failures if a file is missing?
What I have so far
Input data
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "blobs",
"properties": {
"availability": {
"frequency": "Minute",
"interval": 15
},
"external": true,
"linkedServiceName": "blob",
"policy": { "externalData": { "dataDelay": "1:00:00" } },
"structure": [
{
"name": "Column0",
"type": "Int64"
}
],
"type": "AzureBlob",
"typeProperties": {
"folderPath": "myblobs/{Year}/{Month}/{Day}/{Hour}/{Minute}",
"format": {
"type": "TextFormat",
"rowDelimiter": "\n",
"columnDelimiter": "\t"
},
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "%M"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "%d"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "%H"
}
},
{
"name": "Minute",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "%m"
}
}
]
}
}
}
Pipeline
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Pipeline.json",
"name": "insert",
"properties": {
"description": "Insert data from blobs to sql db",
"activities": [
{
"name": "copyblobtosql",
"type": "Copy",
"inputs": [
{
"name": "blobs"
}
],
"outputs": [
{
"name": "tbl"
}
],
"typeProperties": {
"source": {
"type": "BlobSource",
"recursive": false
},
"sink": {
"type": "SqlSink",
"writeBatchSize": 0,
"writeBatchTimeout": "00:00:00"
},
"translator": {
"type": "TabularTranslator",
"columnMappings": "Column0:id"
}
},
"policy": {
"concurrency": 10,
"executionPriorityOrder": "OldestFirst",
"retry": 3,
"timeout": "01:00:00"
},
"scheduler": {
"frequency": "Minute",
"interval": 15
}
}
],
"start": "2016-01-01T00:00:00Z",
"end": "2099-05-05T00:00:00Z"
}
}
Output data
{
"$schema": "http://datafactories.schema.management.azure.com/schemas/2015-09-01/Microsoft.DataFactory.Table.json",
"name": "tbl",
"properties": {
"type": "AzureSqlTable",
"linkedServiceName": "db",
"structure": [
{"name": "id","type": "Int32"}
],
"typeProperties": {
"tableName": "tbl"
},
"availability": {
"frequency": "Minute",
"interval": 15
}
}
}