Getting error on null and empty string while copying a csv file from blob container to Azure SQL DB - azure-data-factory

I tried every combination of data types for my data, but each time my Data Factory pipeline gives me this error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Empty or Null string found in Column Name 2. Please make sure column name not null and try again.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "xxx",
"details": []
}
My Copy Data activity code is something like this:
{
"name": "xxx",
"description": "uuu",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFileName": "*"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "populationId",
"type": "Guid"
},
"sink": {
"name": "PopulationID",
"type": "String"
}
},
{
"source": {
"name": "inputTime",
"type": "DateTime"
},
"sink": {
"name": "inputTime",
"type": "DateTime"
}
},
{
"source": {
"name": "inputCount",
"type": "Decimal"
},
"sink": {
"name": "inputCount",
"type": "Decimal"
}
},
{
"source": {
"name": "inputBiomass",
"type": "Decimal"
},
"sink": {
"name": "inputBiomass",
"type": "Decimal"
}
},
{
"source": {
"name": "inputNumber",
"type": "Decimal"
},
"sink": {
"name": "inputNumber",
"type": "Decimal"
}
},
{
"source": {
"name": "utcOffset",
"type": "String"
},
"sink": {
"name": "utcOffset",
"type": "Int32"
}
},
{
"source": {
"name": "fishGroupName",
"type": "String"
},
"sink": {
"name": "fishgroupname",
"type": "String"
}
},
{
"source": {
"name": "yearClass",
"type": "String"
},
"sink": {
"name": "yearclass",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "DelimitedTextFTDimensions",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
]
}
Can anyone please help me understand the issue? Some blogs suggest using treatnullasempty, but I am not allowed to modify the JSON. Is there another way to do that?

I suggest using a Data Flow Derived Column transformation; a Derived Column can help you build an expression to replace the null column.
For example:
In the Derived Column transformation, if Column_2 is null, return 'dd':
iifNull(Column_2,'dd')
Then map the column in the sink.
Reference: Data transformation expressions in mapping data flow
Hope this helps.

Fixed it. It was an easy fix: one of the columns in my destination table was marked as NOT NULL. I changed it to allow NULL and it worked.

Related

NULL being passed from pipeline to linked service baseURL

I have a pipeline that will iterate over files and copy them to a storage location.
The baseURL and relativeURL are stored in a json file.
I can read in this file and it is valid.
I have parameterized the linked service baseURL and this works when testing from the linked service, and from the dataset.
When I try to debug the pipeline, however, I get an error:
"code":"BadRequest"
"message":null
"target":"pipeline//runid/310b8ac1-2ce6-4c7c-a1ad-433ee9019e9b"
"details":null
"error":null
It appears that, from the activity in the pipeline, a null value is being passed instead of the baseURL.
I have iterated over the values from my config file; it is being read and the values are correct. It really seems like the pipeline is not passing in the correct value for baseURL.
Do I have to modify the JSON code behind the pipeline to get this to work?
If it helps, the json for the linked service, data set and pipeline are below:
--Linked Service:
{
"name": "ls_http_opendata_ecdc_europe_eu",
"properties": {
"parameters": {
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "HttpServer",
"typeProperties": {
"url": "#linkedService().baseURL",
"enableServerCertificateValidation": true,
"authenticationType": "Anonymous"
}
}
}
--dataset
{
"name": "ds_ecdc_raw_csv_http",
"properties": {
"linkedServiceName": {
"referenceName": "ls_http_opendata_ecdc_europe_eu",
"type": "LinkedServiceReference"
},
"parameters": {
"relativeURL": {
"type": "string"
},
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "HttpServerLocation",
"relativeUrl": {
"value": "#dataset().relativeURL",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
}
}
--pipeline
{
"name": "pl_ingest_ecdc_data",
"properties": {
"activities": [
{
"name": "lookup ecdc filelist",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "JsonSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "JsonReadSettings"
}
},
"dataset": {
"referenceName": "ds_ecdc_file_list",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "execute copy for every record",
"type": "ForEach",
"dependsOn": [
{
"activity": "lookup ecdc filelist",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('lookup ecdc filelist').output.value",
"type": "Expression"
},
"activities": [
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "HttpReadSettings",
"requestMethod": "GET"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "DelimitedText1",
"type": "DatasetReference",
"parameters": {
"sourceBaseURL": {
"value": "#item().sourceBaseURL",
"type": "Expression"
},
"sourceRelativeURL": {
"value": "#item().sourceRelativeURL",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "ds_ecdc_raw_csv_dl",
"type": "DatasetReference",
"parameters": {
"fileName": {
"value": "#item().sinkFileName",
"type": "Expression"
}
}
}
]
}
]
}
}
],
"concurrency": 1,
"annotations": []
}
}
I reproduced your error:
{"code":"BadRequest","message":null,"target":"pipeline//runid/abd35329-3625-490b-85cf-f6d0de3dac86","details":null,"error":null}
It is because you didn't pass your baseURL to the linked service in the source dataset. Pass the dataset's baseURL parameter through to the linked service reference, and the dataset JSON should look like this:
{
"name": "ds_ecdc_raw_csv_http",
"properties": {
"linkedServiceName": {
"referenceName": "ls_http_opendata_ecdc_europe_eu",
"type": "LinkedServiceReference",
"parameters": {
"baseURL": {
"value": "#dataset().baseURL",
"type": "Expression"
}
}
},
"parameters": {
"relativeURL": {
"type": "string"
},
"baseURL": {
"type": "string"
}
},
"annotations": [],
"type": "DelimitedText",
"typeProperties": {
"location": {
"type": "HttpServerLocation",
"relativeUrl": {
"value": "#dataset().relativeURL",
"type": "Expression"
}
},
"columnDelimiter": ",",
"escapeChar": "\\",
"firstRowAsHeader": true,
"quoteChar": "\""
},
"schema": []
}
}
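To complete the chain, whatever activity uses ds_ecdc_raw_csv_http also has to supply the dataset's baseURL and relativeURL parameters. Here is a minimal sketch of the Copy activity's inputs section, assuming the ForEach item exposes sourceBaseURL and sourceRelativeURL as in your pipeline:
"inputs": [
    {
        "referenceName": "ds_ecdc_raw_csv_http",
        "type": "DatasetReference",
        "parameters": {
            "baseURL": {
                "value": "@item().sourceBaseURL",
                "type": "Expression"
            },
            "relativeURL": {
                "value": "@item().sourceRelativeURL",
                "type": "Expression"
            }
        }
    }
]
With that in place the value flows from the pipeline (ForEach item) to the dataset parameter and on to the linked service parameter, so the linked service no longer receives a null baseURL.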

Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Unknown union branch

I am trying to send a JSON message to a Kafka topic using the Kafka-rest service to serialize the JSON as an Avro object, but the message is rejected by Kafka-rest with the following error:
Conversion of JSON to Avro failed: Failed to convert JSON to Avro: Unknown union branch postId
I suspect there is an issue with the Avro schema I am using, as it is a nested record type with nullable fields.
Avro schema:
{
"type": "record",
"name": "ExportRequest",
"namespace": "com.example.avro.model",
"fields": [
{
"name": "context",
"type": {
"type": "map",
"values": {
"type": "string",
"avro.java.string": "String"
},
"avro.java.string": "String"
}
},
{
"name": "exportInfo",
"type": {
"type": "record",
"name": "ExportInfo",
"fields": [
{
"name": "exportId",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exportType",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exportQuery",
"type": {
"type": "record",
"name": "ExportQuery",
"fields": [
{
"name": "postExport",
"type": [
"null",
{
"type": "record",
"name": "PostExport",
"fields": [
{
"name": "postId",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "isCommentIncluded",
"type": "boolean"
}
]
}
],
"default": null
},
{
"name": "feedExport",
"type": [
"null",
{
"type": "record",
"name": "FeedExport",
"fields": [
{
"name": "accounts",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "recordTypes",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "actions",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "contentTypes",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "startDate",
"type": "long"
},
{
"name": "endDate",
"type": "long"
},
{
"name": "advancedSearch",
"type": [
"null",
{
"type": "record",
"name": "AdvancedSearchExport",
"fields": [
{
"name": "allOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "anyOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "noneOfTheWords",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "hashtags",
"type": {
"type": "array",
"items": {
"type": "string",
"avro.java.string": "String"
}
}
},
{
"name": "keyword",
"type": {
"type": "string",
"avro.java.string": "String"
}
},
{
"name": "exactPhrase",
"type": {
"type": "string",
"avro.java.string": "String"
}
}
]
}
],
"default": null
}
]
}
],
"default": null
}
]
}
}
]
}
}
]
}
Json message:
{
"context": {
"user_id": "1",
"group_id": "1",
"organization_id": "1"
},
"exportInfo": {
"exportId": "93874dd7-35d7-4f1f-8cf8-051c606d920b",
"exportType": "json",
"exportQuery": {
"postExport": {
"postId": "dd",
"isCommentIncluded": false
},
"feedExport": {
"accounts": [
"1677143852565319"
],
"recordTypes": [],
"actions": [],
"contentTypes": [],
"startDate": 0,
"endDate": 0,
"advancedSearch": {
"allOfTheWords": [
"string"
],
"anyOfTheWords": [
"string"
],
"noneOfTheWords": [
"string"
],
"hashtags": [
"string"
],
"keyword": "string",
"exactPhrase": "string"
}
}
}
}
}
I would appreciate it if someone could help me to understand what the issue is.
Both your JSON and your Avro schema look good.
You are facing the issue because your JSON doesn't conform to Avro's JSON encoding spec.
So, if you convert your JSON accordingly, it will look something like this:
{
"context": {
"user_id": "1",
"group_id": "1",
"organization_id": "1"
},
"exportInfo": {
"exportId": "93874dd7-35d7-4f1f-8cf8-051c606d920b",
"exportType": "json",
"exportQuery": {
"postExport": {
"com.example.avro.model.PostExport": {
"postId": "dd",
"isCommentIncluded": false
}
},
"feedExport": {
"com.example.avro.model.FeedExport": {
"accounts": [
"1677143852565319"
],
"recordTypes": [],
"actions": [],
"contentTypes": [],
"startDate": 0,
"endDate": 0,
"advancedSearch": {
"com.example.avro.model.AdvancedSearchExport": {
"allOfTheWords": [
"string"
],
"anyOfTheWords": [
"string"
],
"noneOfTheWords": [
"string"
],
"hashtags": [
"string"
],
"keyword": "string",
"exactPhrase": "string"
}
}
}
}
}
}
}
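The underlying rule from the Avro specification: in the JSON encoding, the null branch of a union is written as a bare null, while any other branch is wrapped in a single-key object whose key is the branch's type name (the fully qualified name for records). As a minimal illustration with a hypothetical field declared as ["null", "string"]:
{
    "optionalField": null
}
or, when a value is present:
{
    "optionalField": {
        "string": "some value"
    }
}
That is why postExport, feedExport and advancedSearch in the converted message above are wrapped with their record names, such as com.example.avro.model.PostExport.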

avro.io.AvroTypeException: The datum [object] is not an example of the schema

I have been struggling with this issue for quite some time. I am working on AvroProducer (confluent-kafka) and getting an error related to the schema I defined.
Here is the complete stack trace of the issue I am getting:
raise AvroTypeException(self.writer_schema, datum)
avro.io.AvroTypeException: The datum {'totalDifficulty': 2726165051, 'stateRoot': '0xf09bd6730b3ae7f5728836564837d7f776a8f7333628c8b84cb57d7c6d48ebba', 'sha3Uncles': '0x1dcc4de8dec75d7aab85b567b6ccd41ad312451b948a7413f0a142fd40d49347', 'size': 538, 'logs': [], 'gasLimit': 8000000, 'mixHash': '0x410b2b19519be16496727c93515f399072ffecf06defe4913d00eb4d10bb7351', 'logsBloom': '0x00000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000', 'nonce': '0x18dc6c0d30839c91', 'proofOfAuthorityData': '0xd883010817846765746888676f312e31302e34856c696e7578', 'number': 5414, 'timestamp': 1552577641, 'difficulty': 589091, 'gasUsed': 0, 'miner': '0x48FA5EBc2f0D82B5D52faAe624Fa2426998ab492', 'hash': '0x71259991acb407a85befa8b3c5df26a94a11a6c08f92f3e3b7c9c0e8e1f5916d', 'transactionsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'receiptsRoot': '0x56e81f171bcc55a6ff8345e692c0f86e5b48e01b996cadc001622fb5e363b421', 'transactions': [], 'parentHash': '0x9f0c25eeab86fc144296cb034c94857beed331936016d60c0986a35ac07d9c68', 'uncles': []} is not an example of the schema {
"type": "record",
"name": "value",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "int",
"name": "difficulty"
},
{
"type": "string",
"name": "proofOfAuthorityData"
},
{
"type": "int",
"name": "gasLimit"
},
{
"type": "int",
"name": "gasUsed"
},
{
"type": "string",
"name": "hash"
},
{
"type": "string",
"name": "logsBloom"
},
{
"type": "int",
"name": "size"
},
{
"type": "string",
"name": "miner"
},
{
"type": "string",
"name": "mixHash"
},
{
"type": "string",
"name": "nonce"
},
{
"type": "int",
"name": "number"
},
{
"type": "string",
"name": "parentHash"
},
{
"type": "string",
"name": "receiptsRoot"
},
{
"type": "string",
"name": "sha3Uncles"
},
{
"type": "string",
"name": "stateRoot"
},
{
"type": "int",
"name": "timestamp"
},
{
"type": "int",
"name": "totalDifficulty"
},
{
"type": "string",
"name": "transactionsRoot"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "transactions"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "uncles"
},
{
"type": {
"type": "array",
"items": {
"type": "record",
"name": "Child",
"namespace": "exporter.value.opsnetBlock",
"fields": [
{
"type": "string",
"name": "address"
},
{
"type": "string",
"name": "blockHash"
},
{
"type": "int",
"name": "blockNumber"
},
{
"type": "string",
"name": "data"
},
{
"type": "int",
"name": "logIndex"
},
{
"type": "boolean",
"name": "removed"
},
{
"type": {
"type": "array",
"items": "string"
},
"name": "topics"
},
{
"type": "string",
"name": "transactionHash"
},
{
"type": "int",
"name": "transactionIndex"
}
]
}
},
"name": "logs"
}
]
}
Can anybody please tell me where I am going wrong?
Thanks in advance.

Azure Data Factory V2 Dataset Dynamic Folder

In Azure Data Factory (V1) I was able to create a slice and store the output to a specific folder (i.e. {Year}/{Month}/{Day}). See the code below.
How do you create the same type of slice in Azure Data Factory V2? I did find that you have to create a parameter, but I was unable to figure out how to pass the parameter.
"folderPath": "@{dataset().path}",
"parameters": {
"path": {
"type": "String"
}
}
Here is the original ADF V1 code:
{
"name": "EMS_EMSActivations_L1_Snapshot",
"properties": {
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "SalesIntelligence_ADLS_LS",
"typeProperties": {
"fileName": "EMS.FACT_EMSActivations_WTA.tsv",
"folderPath": "/Snapshots/EMS/FACT_EMSActivations_WTA/{Year}/{Month}/{Day}",
"format": {
"type": "TextFormat",
"rowDelimiter": "␀",
"columnDelimiter": "\t",
"nullValue": "#NULL#",
"quoteChar": "\""
},
"partitionedBy": [
{
"name": "Year",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "yyyy"
}
},
{
"name": "Month",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "MM"
}
},
{
"name": "Day",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "dd"
}
},
{
"name": "Hour",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "HH"
}
},
{
"name": "Minute",
"value": {
"type": "DateTime",
"date": "SliceStart",
"format": "mm"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 1
}
}
}
Here is how you create a dynamic folder path when importing data from SQL into ADL. Look at the folderPath property:
{
"name": "EBC_BriefingActivitySummary_L1_Snapshot",
"properties": {
"linkedServiceName": {
"referenceName": "SIAzureDataLakeStore",
"type": "LinkedServiceReference"
},
"type": "AzureDataLakeStoreFile",
"typeProperties": {
"format": {
"type": "TextFormat",
"columnDelimiter": ",",
"rowDelimiter": "",
"nullValue": "\\N",
"treatEmptyAsNull": false,
"firstRowAsHeader": false
},
"fileName": {
"value": "EBC.rpt_BriefingActivitySummary.tsv",
"type": "Expression"
},
"folderPath": {
"value": "#concat('/Snapshots/EBC/rpt_BriefingActivitySummary/', formatDateTime(pipeline().parameters.scheduledRunTime, 'yyyy'), '/', formatDateTime(pipeline().parameters.scheduledRunTime, 'MM'), '/', formatDateTime(pipeline().parameters.scheduledRunTime, 'dd'), '/')",
"type": "Expression"
}
}
}
}
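Note that the folderPath expression above references pipeline().parameters.scheduledRunTime, so the pipeline itself needs to declare that parameter. A minimal sketch (the parameter name is taken from the snippet above; adjust it to whatever your pipeline actually uses):
"parameters": {
    "scheduledRunTime": {
        "type": "String"
    }
}
The value can then be supplied by the trigger or entered at debug/run time.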
Step 1: Use WindowStartTime / WindowEndTime in the folderPath
"folderPath": {
"value": "<<path>>/#{formatDateTime(pipeline().parameters.windowStart,'yyyy')}-#{formatDateTime(pipeline().parameters.windowStart,'MM')}-#{formatDateTime(pipeline().parameters.windowStart,'dd')}/#{formatDateTime(pipeline().parameters.windowStart,'HH')}/",
"type": "Expression"
}
Step 2: Add the parameters in the pipeline JSON
"parameters": {
"windowStart": {
"type": "String"
},
"windowEnd": {
"type": "String"
}
}
Step 3: Add the run parameters in the tumbling window trigger (these are the parameters referenced in Step 2)
"parameters": {
"windowStart": {
"type": "Expression",
"value": "#trigger().outputs.windowStartTime"
},
"windowEnd": {
"type": "Expression",
"value": "#trigger().outputs.windowEndTime"
}
}
For more details, refer to this link:
https://github.com/MicrosoftDocs/azure-docs/blob/master/articles/data-factory/how-to-create-tumbling-window-trigger.md
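Putting Step 2 and Step 3 together, the tumbling window trigger definition would look roughly like the sketch below; the trigger name, pipeline name, start time and 24-hour window are placeholders to adapt:
{
    "name": "TumblingWindowTrigger1",
    "properties": {
        "type": "TumblingWindowTrigger",
        "typeProperties": {
            "frequency": "Hour",
            "interval": 24,
            "startTime": "2019-01-01T00:00:00Z",
            "delay": "00:00:00",
            "maxConcurrency": 1
        },
        "pipeline": {
            "pipelineReference": {
                "type": "PipelineReference",
                "referenceName": "YourPipelineName"
            },
            "parameters": {
                "windowStart": {
                    "type": "Expression",
                    "value": "@trigger().outputs.windowStartTime"
                },
                "windowEnd": {
                    "type": "Expression",
                    "value": "@trigger().outputs.windowEndTime"
                }
            }
        }
    }
}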

azure data factory start pipeline different from starting job

This issue is driving me crazy. I am running Azure Data Factory V1, and I need to schedule a copy job every week from 01/03/2009 through 01/31/2009, so I defined this schedule on the pipeline:
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-31T00:00:00Z",
"isPaused": false,
Monitoring the pipeline, the data factory schedules on these dates:
12/29/2008
01/05/2009
01/12/2009
01/19/2009
01/26/2009
instead of the schedule I wanted:
01/03/2009
01/10/2009
01/17/2009
01/24/2009
01/31/2009
Why doesn't the start date defined on the pipeline correspond to the schedule dates shown in the monitor?
Many thanks!
Here is the JSON Pipeline:
{
"name": "CopyPipeline-blob2datalake",
"properties": {
"description": "copy from blob storage to datalake directory structure",
"activities": [
{
"type": "DataLakeAnalyticsU-SQL",
"typeProperties": {
"scriptPath": "script/dat230.usql",
"scriptLinkedService": "AzureStorageLinkedService",
"degreeOfParallelism": 5,
"priority": 100,
"parameters": {
"salesfile": "$$Text.Format('/DAT230/{0:yyyy}/{0:MM}/{0:dd}.txt', Date.StartOfDay (SliceStart))",
"lineitemsfile": "$$Text.Format('/dat230/dataloads/{0:yyyy}/{0:MM}/{0:dd}/factinventory/fact.csv', Date.StartOfDay (SliceStart))"
}
},
"inputs": [
{
"name": "InputDataset-dat230"
}
],
"outputs": [
{
"name": "OutputDataset-dat230"
}
],
"policy": {
"timeout": "01:00:00",
"concurrency": 1,
"retry": 1
},
"scheduler": {
"frequency": "Day",
"interval": 7
},
"name": "DataLakeAnalyticsUSqlActivityTemplate",
"linkedServiceName": "AzureDataLakeAnalyticsLinkedService"
}
],
"start": "2009-01-03T00:00:00Z",
"end": "2009-01-11T00:00:00Z",
"isPaused": false,
"hubName": "edxlearningdf_hub",
"pipelineMode": "Scheduled"
}
}
and here are the datasets:
{
"name": "InputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureBlob",
"linkedServiceName": "Source-BlobStorage-dat230",
"typeProperties": {
"fileName": "*.txt.gz",
"folderPath": "dat230/{year}/{month}/{day}/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t",
"firstRowAsHeader": true
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
],
"compression": {
"type": "GZip"
}
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": true,
"policy": {}
}
}
{
"name": "OutputDataset-dat230",
"properties": {
"structure": [
{
"name": "Date",
"type": "Datetime"
},
{
"name": "StoreID",
"type": "Int64"
},
{
"name": "StoreName",
"type": "String"
},
{
"name": "ProductID",
"type": "Int64"
},
{
"name": "ProductName",
"type": "String"
},
{
"name": "Color",
"type": "String"
},
{
"name": "Size",
"type": "String"
},
{
"name": "Manufacturer",
"type": "String"
},
{
"name": "OnHandQuantity",
"type": "Int64"
},
{
"name": "OnOrderQuantity",
"type": "Int64"
},
{
"name": "SafetyStockQuantity",
"type": "Int64"
},
{
"name": "UnitCost",
"type": "Double"
},
{
"name": "DaysInStock",
"type": "Int64"
},
{
"name": "MinDayInStock",
"type": "Int64"
},
{
"name": "MaxDayInStock",
"type": "Int64"
}
],
"published": false,
"type": "AzureDataLakeStore",
"linkedServiceName": "Destination-DataLakeStore-dat230",
"typeProperties": {
"fileName": "txt.gz",
"folderPath": "dat230/dataloads/{year}/{month}/{day}/factinventory/",
"format": {
"type": "TextFormat",
"columnDelimiter": "\t"
},
"partitionedBy": [
{
"name": "year",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "yyyy"
}
},
{
"name": "month",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "MM"
}
},
{
"name": "day",
"value": {
"type": "DateTime",
"date": "WindowStart",
"format": "dd"
}
}
]
},
"availability": {
"frequency": "Day",
"interval": 7
},
"external": false,
"policy": {}
}
}
You need to look at the time slices for the datasets and their activity.
The pipeline schedule (badly named) only defines the start and end of the period within which activities can provision and run their time slices.
ADF v1 doesn't use a recurring schedule like the SQL Server Agent. Each execution has to be provisioned at an interval on the timeline (the schedule) you create.
For example, if your pipeline start and end span 1 year, but your dataset and activity have a frequency of Month and an interval of 1, you will only get 12 executions of whatever is happening.
Apologies, but the concept of time slices is a little difficult to explain if you aren't already familiar with it. Maybe read this post: https://blogs.msdn.microsoft.com/ukdataplatform/2016/05/03/demystifying-activity-scheduling-with-azure-data-factory/
Hope this helps.
Would you share the JSON for the datasets and the pipeline with us? It would be easier to help you with that information.
In the meantime, check whether you are using "style": "StartOfInterval" in the scheduler property of the activity, and also check whether you are using an offset.
Cheers!
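For reference, a hedged sketch of what such a scheduler block could look like on the activity (the datasets' availability sections would need matching settings). The 5-day offset here is only an illustration of shifting a weekly slice from the default boundary (12/29) to Saturday 01/03, and should be verified against your data factory:
"scheduler": {
    "frequency": "Day",
    "interval": 7,
    "style": "StartOfInterval",
    "offset": "5.00:00:00"
}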