ADF Column Name Validation and Data Validation - azure-data-factory

I am trying to add some validation to my ADF pipeline. Is there a way to achieve the following validation in ADF?
Validate the column headers and return an error message. There is a list of required column names that I need to check against the raw Excel file. For example, the raw file might have columns A, B, C, D, but the required columns are A, B, E. Is there a way to validate this and return an error message saying that column E is missing from the raw file?
Validate the data types in a mapping data flow, e.g. column A should be numeric but some cells contain text, or column B should be a datetime but contains a number. Is there a way to check the values in each row and return an error message if the validation fails for that row?

Adding to @Nandan's answer, you can use the Get Metadata activity's structure field as below.
This is my repro for your reference:
First, I have used 2 parameters, one for the required column names and one for the required data types.
Get Metadata activity:
Get Metadata activity output structure array:
Then I used a ForEach activity to append the above names and types into two array variables.
Then I used two Filter activities to filter the parameter arrays against those array variables.
Then I used an If Condition activity to compare the parameter array lengths with the Filter activity output counts.
If it is true, inside the True activities you can use your Copy activity or Data flow as per your requirement. Inside the False activities, use a Fail activity.
My pipeline JSON:
{
"name": "pipeline1",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Excel1",
"type": "DatasetReference"
},
"fieldList": [
"structure"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
}
}
},
{
"name": "Filtering names",
"type": "Filter",
"dependsOn": [
{
"activity": "Getting names and columns as list",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#pipeline().parameters.names",
"type": "Expression"
},
"condition": {
"value": "#contains(variables('namesvararray'),item())",
"type": "Expression"
}
}
},
{
"name": "Filtering types",
"type": "Filter",
"dependsOn": [
{
"activity": "Filtering names",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#variables('typevararray')",
"type": "Expression"
},
"condition": {
"value": "#contains(variables('typevararray'), item())",
"type": "Expression"
}
}
},
{
"name": "Getting names and columns as list",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Get Metadata1').output.structure",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append names",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "namesvararray",
"value": {
"value": "#item().name",
"type": "Expression"
}
}
},
{
"name": "Append types",
"type": "AppendVariable",
"dependsOn": [
{
"activity": "Append names",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "typevararray",
"value": {
"value": "#item().type",
"type": "Expression"
}
}
}
]
}
},
{
"name": "If Condition1",
"type": "IfCondition",
"dependsOn": [
{
"activity": "Filtering types",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"expression": {
"value": "#and(equals(length(pipeline().parameters.names),activity('Filtering names').output.FilteredItemsCount),equals(length(pipeline().parameters.columns),activity('Filtering types').output.FilteredItemsCount))",
"type": "Expression"
},
"ifFalseActivities": [
{
"name": "Fail1",
"type": "Fail",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"message": "Some of the headers or types are not as required",
"errorCode": "240"
}
}
],
"ifTrueActivities": [
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "sample",
"value": "All good"
}
}
]
}
}
],
"parameters": {
"names": {
"type": "array",
"defaultValue": [
"A",
"B",
"C"
]
},
"columns": {
"type": "array",
"defaultValue": [
"String",
"String",
"String"
]
}
},
"variables": {
"namesvararray": {
"type": "Array"
},
"typevararray": {
"type": "Array"
},
"sample": {
"type": "String"
}
},
"annotations": []
}
}
When the headers or types do not match, my pipeline fails with the error from the Fail activity:
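With the Fail activity configured as above, the run fails with an error along these lines (a sketch; the exact wrapper fields depend on where you view the run):
{
"errorCode": "240",
"message": "Some of the headers or types are not as required",
"failureType": "UserError",
"target": "Fail1",
"details": []
}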

You can use a Lookup activity on the dataset and return the first row (with the dataset's header property disabled). This would give you the list of columns present in the Excel file, which you can then compare against the expected values; if the values/sequence match, you can proceed further, else you can throw an error.
Note: you can also use the Get Metadata activity to get the column details.
For data type, you can use column patterns in dataflows:
https://learn.microsoft.com/en-us/azure/data-factory/concepts-data-flow-column-pattern
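As a rough sketch of the data flow side (this is not from the original answer), you could add a Derived Column with per-column type checks and then route bad rows with a Conditional Split. The script below assumes the columns from the question, A (expected numeric) and B (expected datetime), an assumed timestamp format of 'yyyy-MM-dd HH:mm:ss', and illustrative stream names:
source1 derive(A_isNumeric = regexMatch(toString(byName('A')), '^-?[0-9]+([.][0-9]+)?$'),
     B_isDatetime = !isNull(toTimestamp(toString(byName('B')), 'yyyy-MM-dd HH:mm:ss'))) ~> typeChecks
typeChecks split(A_isNumeric && B_isDatetime,
     disjoint: false) ~> validation@(validRows, invalidRows)
The invalidRows stream can then be written to an error sink, or its row count can be used to fail the pipeline with a message identifying the bad rows.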
@rakeshGovindula: any more thoughts?

Related

Copy CSV file data from blob storage to Azure SQL database based on the number of columns in the file

I have data files landing in a single Azure blob storage container every other day or so. These files have either 8, 16, 24, or 32 columns of data. Each column has a unique name within a file, and the names are consistent across files, i.e., the column names in the 8-column file will always be the first 8 column names of the 16-, 24-, and 32-column files. I have the appropriate 4 tables in an Azure SQL database set up to receive the files. I need to create a pipeline (or pipelines) in Azure Data Factory that will:
trigger upon the landing of a new file in the blob storage container
check the # of columns in that file
use the number of columns to copy the file from the blob into the appropriate Azure SQL database table. Meaning the 8 column blob file copies to the 8 column SQL table and so on.
delete the file
I've researched the various pieces to complete this but cannot seem to put them together. Schema drift solution got me close but parameterization of the file names lost me. Multiple pipelines to achieve this is okay, as long as the single storage container is maintained. Thanks
Use a Blob (storage event) trigger to fire upon the landing of a new file in the blob storage container
Use a Get Metadata activity to get the column count of that file
Use a Switch activity based on the number of columns, and within each Switch case have a Copy activity and a Delete activity to copy the data and then delete the file (a sketch follows)
I agree with @Nandan's approach. Also, you can try the below alternative using Lookup and Filter activities if you want to avoid creating Switch cases.
For this approach, you should not have any tables in your target database other than the four above.
First create a pipeline parameter and a Storage event trigger. Pass the trigger expression @triggerBody().fileName to the pipeline parameter, as sketched below.
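A sketch of the relevant fragment of the trigger definition, assuming the pipeline below (pipeline5_copy1) with its tfilename parameter:
"pipelines": [
{
"pipelineReference": {
"referenceName": "pipeline5_copy1",
"type": "PipelineReference"
},
"parameters": {
"tfilename": "@triggerBody().fileName"
}
}
]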
Now, use lookup activity query to get the list of table schema, table name and column count as an array of objects.
SELECT TABLE_SCHEMA
, TABLE_NAME
, number = COUNT(*)
FROM INFORMATION_SCHEMA.COLUMNS
where TABLE_NAME!='database_firewall_rules'
GROUP BY TABLE_SCHEMA, TABLE_NAME;
This will give the JSON array like this.
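The Lookup output value is an array of objects shaped like the sketch below (the table names here are only placeholders for your four tables):
[
{ "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_8cols", "number": 8 },
{ "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_16cols", "number": 16 },
{ "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_24cols", "number": 24 },
{ "TABLE_SCHEMA": "dbo", "TABLE_NAME": "table_32cols", "number": 32 }
]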
Next, use a Get Metadata activity, passing the triggered file name through dataset parameters, and get the columnCount from it.
Now, use a Filter activity to pick the SQL table that has the same column count as the triggered file.
items: @activity('Lookup1').output.value
Condition: @equals(activity('Get Metadata1').output.columnCount, item().number)
Filter output:
Now, use copy activity with dataset parameters.
Source with dataset parameters:
Sink:
Then use delete activity for the triggered file.
My pipeline JSON:
{
"name": "pipeline5_copy1",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "AzureSqlSource",
"sqlReaderQuery": "SELECT TABLE_SCHEMA\n , TABLE_NAME\n , number = COUNT(*) \nFROM INFORMATION_SCHEMA.COLUMNS \nwhere TABLE_NAME!='database_firewall_rules'\nGROUP BY TABLE_SCHEMA, TABLE_NAME;",
"queryTimeout": "02:00:00",
"partitionOption": "None"
},
"dataset": {
"referenceName": "Dataset_for_column_count",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "Filter1",
"type": "Filter",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"condition": {
"value": "#equals(activity('Get Metadata1').output.columnCount, item().number)",
"type": "Expression"
}
}
},
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
},
"fieldList": [
"columnCount"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Filter1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink",
"writeBehavior": "insert",
"sqlWriterUseTableLock": false
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference",
"parameters": {
"schema": {
"value": "#activity('Filter1').output.Value[0].TABLE_SCHEMA",
"type": "Expression"
},
"table_name": {
"value": "#activity('Filter1').output.Value[0].TABLE_NAME",
"type": "Expression"
}
}
}
]
},
{
"name": "Delete1",
"type": "Delete",
"dependsOn": [
{
"activity": "Copy data1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Sourcefile",
"type": "DatasetReference",
"parameters": {
"sourcefilename": {
"value": "#pipeline().parameters.tfilename",
"type": "Expression"
}
}
},
"enableLogging": false,
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
}
}
}
],
"parameters": {
"tfilename": {
"type": "string"
}
},
"variables": {
"sample": {
"type": "String"
}
},
"annotations": []
}
}
Result:

Copy Files Names from a Folder as single File Name

I have a requirement where an ADLS container contains a folder with multiple files.
Assume:
Product.csv
Market.csv
Sales.csv
I need to take the file names Product, Market, Sales and form Product_Market_Sales.csv in a destination path.
I tried multiple ways to achieve this.
Can anyone help me?
Thanks
Shanu
If all of your files have the same schema, then the below approach will work for you.
These are my variables in the pipeline.
First, use a Get Metadata activity to get the childItems list from the folder.
It will give an array like the one below, in alphabetical order of the file names.
Then, give this array to a ForEach activity. Inside the ForEach, split each file name on '.' and append the first element of the split to the names array variable using an Append Variable activity with the dynamic content below.
@split(item().name,'.')[0]
Now, join this array with '_' and store the result in the res_filename string variable.
@concat(join(variables('names'),'_'),'.csv')
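As a small illustration, if the childItems come back as Product.csv, Market.csv, Sales.csv (the actual order depends on how Get Metadata lists the folder), the variables end up as:
names = ["Product", "Market", "Sales"]
res_filename = "Product_Market_Sales.csv"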
After this, use a Copy activity and give the wildcard path *.csv in the source. Use dataset parameters for the sink dataset, give the above variable as the file name, and set the Copy behavior to Merge files.
This is my Pipeline JSON
{
"name": "filenames",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "Sourcefiles",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "ForEach1",
"type": "ForEach",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"items": {
"value": "#activity('Get Metadata1').output.childItems",
"type": "Expression"
},
"isSequential": true,
"activities": [
{
"name": "Append variable1",
"type": "AppendVariable",
"dependsOn": [],
"userProperties": [],
"typeProperties": {
"variableName": "names",
"value": {
"value": "#split(item().name,'.')[0]",
"type": "Expression"
}
}
}
]
}
},
{
"name": "Set variable1",
"type": "SetVariable",
"dependsOn": [
{
"activity": "ForEach1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "res_filename",
"value": {
"value": "#concat(join(variables('names'),'_'),'.csv')",
"type": "Expression"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Set variable1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"wildcardFileName": "*.csv",
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings",
"copyBehavior": "MergeFiles"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "Sourcefiles",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "target",
"type": "DatasetReference",
"parameters": {
"targetfilename": {
"value": "#variables('res_filename')",
"type": "Expression"
}
}
}
]
}
],
"variables": {
"names": {
"type": "Array"
},
"res_filename": {
"type": "String"
}
},
"annotations": []
}
}
Result file:
If you want to merge files with different source schemas, then as per my knowledge it is better to do that with code, as suggested in the comments, using Databricks or Azure Functions.

Get the list of names via Get Metadata activity

Below is the output of Get Metadata activity which contains name and type values for child items:
Is it possible to just get the name values and store them in an array variable without using any iteration?
Output = [csv1.csv,csv2.csv,csv3.csv,csv4.csv]
This was achieved via ForEach and Append Variable, but we don't want to use iterations.
APPROACH 1 :
Using for each would be easier to complete the job. However, you can use string manipulation in the following way to get the desired result.
Store the output of the Get Metadata childItems in a string variable:
@string(activity('Get Metadata1').output.childItems)
Now replace all the unnecessary data with an empty string '' using the following dynamic content:
@replace(replace(replace(replace(replace(replace(replace(replace(variables('tp'),'[{',''),'}]',''),'{',''),'}',''),'"type":"File"',''),'"',''),'name:',''),',,',',')
Now, ignore the last comma and split the above string with ',' as the delimiter.
@split(substring(variables('ans'),0,sub(length(variables('ans')),1)),',')
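To make the string manipulation concrete, here is roughly how the values evolve for the four files above (tp and ans are the string variables from the pipeline JSON; the childItems string itself is illustrative):
tp  = [{"name":"csv1.csv","type":"File"},{"name":"csv2.csv","type":"File"},{"name":"csv3.csv","type":"File"},{"name":"csv4.csv","type":"File"}]
ans = csv1.csv,csv2.csv,csv3.csv,csv4.csv,
final array = [csv1.csv, csv2.csv, csv3.csv, csv4.csv]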
APPROACH 2 :
Let's say your source has a combination of folders and files, and you want only the names of the objects whose type is File in an array; then you can use the following approach. Here there is no need for a ForEach, but you will have to use a Copy data activity and a data flow.
Create a copy data activity with a sample file with data like below:
Now create an additional column my_json with value as the following dynamic content:
@replace(string(activity('Get Metadata1').output.childItems),'"',pipeline().parameters.single_quote)
The following is the sink dataset configuration that I have taken:
In the mapping, just select this newly created column and remove the rest (demo) column.
Once this copy data executes, the file generated will be as shown below:
In dataflow, with the above file as source with settings as shown in the below image:
The data would be read as shown below:
Now, use aggregate transformation to group by the type column and collect() on name column.
The result would be as shown below:
Now, use conditional split to separate the file type data and folder type data with the condition type == 'File'
Now write the fileType data to sink cache. The data would look like this:
Back in the pipeline, use the following dynamic content to get the required array:
@activity('Data flow1').output.runStatus.output.sink1.value[0].array_of_types
Pipeline JSON for reference:
{
"name": "pipeline3",
"properties": {
"activities": [
{
"name": "Get Metadata1",
"type": "GetMetadata",
"dependsOn": [],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataset": {
"referenceName": "source1",
"type": "DatasetReference"
},
"fieldList": [
"childItems"
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
}
},
{
"name": "Copy data1",
"type": "Copy",
"dependsOn": [
{
"activity": "Get Metadata1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"additionalColumns": [
{
"name": "my_json",
"value": {
"value": "#replace(string(activity('Get Metadata1').output.childItems),'\"',pipeline().parameters.single_quote)",
"type": "Expression"
}
}
],
"storeSettings": {
"type": "AzureBlobFSReadSettings",
"recursive": true,
"enablePartitionDiscovery": false
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "DelimitedTextSink",
"storeSettings": {
"type": "AzureBlobFSWriteSettings"
},
"formatSettings": {
"type": "DelimitedTextWriteSettings",
"quoteAllText": true,
"fileExtension": ".txt"
}
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "my_json",
"type": "String"
},
"sink": {
"type": "String",
"physicalType": "String",
"ordinal": 1
}
}
],
"typeConversion": true,
"typeConversionSettings": {
"allowDataTruncation": true,
"treatBooleanAsNumber": false
}
}
},
"inputs": [
{
"referenceName": "csv1",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "sink1",
"type": "DatasetReference"
}
]
},
{
"name": "Data flow1",
"type": "ExecuteDataFlow",
"dependsOn": [
{
"activity": "Copy data1",
"dependencyConditions": [
"Succeeded"
]
}
],
"policy": {
"timeout": "0.12:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"dataflow": {
"referenceName": "dataflow2",
"type": "DataFlowReference"
},
"compute": {
"coreCount": 8,
"computeType": "General"
},
"traceLevel": "None"
}
},
{
"name": "Set variable2",
"type": "SetVariable",
"dependsOn": [
{
"activity": "Data flow1",
"dependencyConditions": [
"Succeeded"
]
}
],
"userProperties": [],
"typeProperties": {
"variableName": "req",
"value": {
"value": "#activity('Data flow1').output.runStatus.output.sink1.value[0].array_of_types",
"type": "Expression"
}
}
}
],
"parameters": {
"single_quote": {
"type": "string",
"defaultValue": "'"
}
},
"variables": {
"req": {
"type": "Array"
},
"tp": {
"type": "String"
},
"ans": {
"type": "String"
},
"req_array": {
"type": "Array"
}
},
"annotations": [],
"lastPublishTime": "2023-02-03T06:09:07Z"
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Dataflow JSON for reference:
{
"name": "dataflow2",
"properties": {
"type": "MappingDataFlow",
"typeProperties": {
"sources": [
{
"dataset": {
"referenceName": "Json3",
"type": "DatasetReference"
},
"name": "source1"
}
],
"sinks": [
{
"name": "sink1"
}
],
"transformations": [
{
"name": "aggregate1"
},
{
"name": "split1"
}
],
"scriptLines": [
"source(output(",
" name as string,",
" type as string",
" ),",
" allowSchemaDrift: true,",
" validateSchema: false,",
" ignoreNoFilesFound: false,",
" documentForm: 'arrayOfDocuments',",
" singleQuoted: true) ~> source1",
"source1 aggregate(groupBy(type),",
" array_of_types = collect(name)) ~> aggregate1",
"aggregate1 split(type == 'File',",
" disjoint: false) ~> split1#(fileType, folderType)",
"split1#fileType sink(validateSchema: false,",
" skipDuplicateMapInputs: true,",
" skipDuplicateMapOutputs: true,",
" store: 'cache',",
" format: 'inline',",
" output: true,",
" saveOrder: 1) ~> sink1"
]
}
}
}

Getting error on null and empty string while copying a csv file from blob container to Azure SQL DB

I tried every combination of data types for my data, but each time my Data Factory pipeline gives me this error:
{
"errorCode": "2200",
"message": "ErrorCode=UserErrorColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=Empty or Null string found in Column Name 2. Please make sure column name not null and try again.,Source=Microsoft.DataTransfer.Common,'",
"failureType": "UserError",
"target": "xxx",
"details": []
}
My Copy data source code is something like this:
{
"name": "xxx",
"description": "uuu",
"type": "Copy",
"dependsOn": [],
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"userProperties": [],
"typeProperties": {
"source": {
"type": "DelimitedTextSource",
"storeSettings": {
"type": "AzureBlobStorageReadSettings",
"recursive": true,
"wildcardFileName": "*"
},
"formatSettings": {
"type": "DelimitedTextReadSettings"
}
},
"sink": {
"type": "AzureSqlSink"
},
"enableStaging": false,
"translator": {
"type": "TabularTranslator",
"mappings": [
{
"source": {
"name": "populationId",
"type": "Guid"
},
"sink": {
"name": "PopulationID",
"type": "String"
}
},
{
"source": {
"name": "inputTime",
"type": "DateTime"
},
"sink": {
"name": "inputTime",
"type": "DateTime"
}
},
{
"source": {
"name": "inputCount",
"type": "Decimal"
},
"sink": {
"name": "inputCount",
"type": "Decimal"
}
},
{
"source": {
"name": "inputBiomass",
"type": "Decimal"
},
"sink": {
"name": "inputBiomass",
"type": "Decimal"
}
},
{
"source": {
"name": "inputNumber",
"type": "Decimal"
},
"sink": {
"name": "inputNumber",
"type": "Decimal"
}
},
{
"source": {
"name": "utcOffset",
"type": "String"
},
"sink": {
"name": "utcOffset",
"type": "Int32"
}
},
{
"source": {
"name": "fishGroupName",
"type": "String"
},
"sink": {
"name": "fishgroupname",
"type": "String"
}
},
{
"source": {
"name": "yearClass",
"type": "String"
},
"sink": {
"name": "yearclass",
"type": "String"
}
}
]
}
},
"inputs": [
{
"referenceName": "DelimitedTextFTDimensions",
"type": "DatasetReference"
}
],
"outputs": [
{
"referenceName": "AzureSqlTable1",
"type": "DatasetReference"
}
]
}
Can anyone please help me understand the issue? I see in some blogs they suggest using treatnullasempty, but I am not allowed to modify the JSON. Is there a way to do that?
I suggest using a Data Flow Derived Column transformation; it can help you build an expression to replace the null column.
For example:
Derived Column: if Column_2 is null, return 'dd':
iifNull(Column_2,'dd')
Mapping the column
Reference: Data transformation expressions in mapping data flow
Hope this helps.
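A minimal sketch of the corresponding data flow script, using the column name from the example above (the stream names source1 and ReplaceNulls are illustrative):
source1 derive(Column_2 = iifNull(Column_2, 'dd')) ~> ReplaceNulls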
Fixed it. It was an easy fix: one of my columns in the destination was marked as NOT NULL; I changed it to allow NULL and it worked.

Azure Data Factory Dataset with # in column name

I have a dataset coming from a REST web service that has a '#' in the column names:
Like:
{
"data": [{
"#id": 1,
"#value": "a"
}, {
"#id": 2,
"#value": "b"
}]
}
I want to use it in a ForEach and access a specific column:
In the ForEach, I pass the output as @activity('Lookup').output.value.
Inside the ForEach there is a stored procedure.
As the parameter input I tried to get the column: I tried @item().#value but got the error "the string character '#' at position 'xx' is not expected".
Is there a way to escape the # in the column name? Or can I rename the column?
Thank you very much
edit:
here is the JSON from the ADF pipeline:
{
"name": "pipeline3",
"properties": {
"activities": [
{
"name": "Lookup1",
"type": "Lookup",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"source": {
"type": "HttpSource",
"httpRequestTimeout": "00:01:40"
},
"dataset": {
"referenceName": "HttpFile1",
"type": "DatasetReference"
},
"firstRowOnly": false
}
},
{
"name": "ForEach2",
"type": "ForEach",
"dependsOn": [
{
"activity": "Lookup1",
"dependencyConditions": [
"Succeeded"
]
}
],
"typeProperties": {
"items": {
"value": "#activity('Lookup1').output.value",
"type": "Expression"
},
"activities": [
{
"name": "Stored Procedure12",
"type": "SqlServerStoredProcedure",
"policy": {
"timeout": "7.00:00:00",
"retry": 0,
"retryIntervalInSeconds": 30,
"secureOutput": false,
"secureInput": false
},
"typeProperties": {
"storedProcedureName": "[dbo].[testnv]",
"storedProcedureParameters": {
"d": {
"value": {
"value": "#item().#accno",
"type": "Expression"
},
"type": "String"
}
}
},
"linkedServiceName": {
"referenceName": "AzureSqlDatabase1",
"type": "LinkedServiceReference"
}
}
]
}
}
]
},
"type": "Microsoft.DataFactory/factories/pipelines"
}
Please try "@item()['#accno']" in place of @item().#accno.
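Applied to the pipeline JSON above, the stored procedure parameter would then look like this sketch:
"storedProcedureParameters": {
"d": {
"value": {
"value": "@item()['#accno']",
"type": "Expression"
},
"type": "String"
}
}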
Also replied in MSDN.