Can't convert table value to string in ADF

I'm trying to loop over data in an SQL table, but when I try to use the value inside a ForEach loop activity using @item(), I get the error:
"Failed to convert the value in 'table' property to 'System.String' type. Please make sure the payload structure and value are correct."
So the row value can't be converted to a string.
Could that be my problem? And if so, what can I do about it?
Here is the pipeline:

I reproduced the above scenario with an SQL table containing table names in the Lookup activity and CSV files from ADLS Gen2 in the Copy activity, and got the same error.
The above error arises when we pass the lookup output array items directly to a string parameter inside the ForEach.
If we look at the below lookup output,
The above value array is not a normal array; it is an array of objects. So, @item() in the 1st iteration of the ForEach is one object, { "tablename": "sample1.csv" }. But our parameter expects a string value, and that's why it gives the above error.
To resolve this, use @item().tablename, which gives the table name in every iteration inside the ForEach.
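For reference, a minimal sketch of the shapes involved (the activity name Lookup1 is illustrative; the tablename property comes from the lookup output above). The lookup returns an array of objects, which the ForEach unwraps one object per iteration:

{
  "count": 2,
  "value": [
    { "tablename": "sample1.csv" },
    { "tablename": "sample2.csv" }
  ]
}

ForEach items: @activity('Lookup1').output.value
String parameter inside the ForEach: @item().tablename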
My repro for your reference:
I have given the same in the sink as well, and this is my output.
Pipeline Execution
Copied data in target

Related

How to add null value in Azure Data Factory Derived columns expression builder

I am currently using Azure Data Factory, in which I am creating a derived column. Since the field will always be blank, I want the value to be NULL.
Currently in the derived column I am adding an expression, e.g. toString("null") and toString(null()), but the value appears as a string. I only want null to appear, without quotes, in the JSON document.
I have reproduced the above and got below results.
I tried assigning null() to a column and it gave the error below.
So, in an ADF data flow, null() must be wrapped in a conversion function like toInteger() or toString().
When I give toString(null()) where id is 4 in the derived column of the data flow and the sink is JSON, it gives me the output below.
You can see the row with id == 4 skips the null-valued key in the JSON. If you give toString(null()) in every row, the same key will be skipped in every row.
You can go through this link by @ShaikMaheer-MSFT to understand more about this.
AFAIK, the workaround for this is to store the null as a 'null' string so that the key still appears in the JSON, and later handle it as per your requirement.
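As a rough sketch of the two behaviours (the column name custName is hypothetical), the derived column expression for the workaround could be:

iif(isNull(custName), 'null', toString(custName))

With toString(null()) the JSON sink drops the key for that row, e.g. {"id":4}; with the 'null' string the key survives, e.g. {"id":4,"custName":"null"}, and you can post-process it later.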

I don't have access to foreach item from expression builder under data source in ADF

I'm trying to save some values in a SQL table and then loop over the values to use each value as a path in an OData source.
First I have defined an array variable in which to save the values:
then the variable is set to @activity('Lookup1').output.value
Now the data is accessed from the ForEach.
Inside the ForEach loop I have a copy activity where the OData source should be set to the item value.
But I don't have access to the item. Why is that?
The above approach will work for you when you debug the pipeline, even though it gives a warning in the dataset.
The dataset's dynamic content doesn't know about the ForEach @item() at authoring time because @item() belongs to the pipeline's dynamic content. That's why it gives a warning in the dataset.
But at debug time, it resolves the @item() value.
Please go through the below 2 scenarios to understand it better.
Here I am using ADLS as both source and target, with an array of sample file names passed to the ForEach.
These are my source files.
I have created an array variable with the above names, ["sample1.csv","sample2.csv"], and passed it to the ForEach.
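For reference, a minimal sketch of the variable and the ForEach wiring (the variable name filenames is illustrative):

"variables": {
  "filenames": {
    "type": "Array",
    "defaultValue": [ "sample1.csv", "sample2.csv" ]
  }
}

ForEach items: @variables('filenames')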
Using @item() in the dataset:
Source dataset and target dataset.
You can see it gives the same warning, but it produces the correct result when you debug. In the dataset preview, however, it gives an error.
Copied files to target successfully.
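For reference, scenario 1 boils down to putting the expression straight into the dataset JSON; a rough sketch (dataset name, type, and container are illustrative):

{
  "name": "SourceDataset",
  "properties": {
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "source",
        "fileName": { "value": "@item()", "type": "Expression" }
      }
    }
  }
}

The dataset itself has no @item() in scope, hence the authoring warning and the preview error; at debug time the enclosing ForEach supplies the value.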
Using @item() inside the ForEach with dataset parameters:
I have created the parameters and used them in the datasets.
Source:
Target:
Copy activity inside ForEach:
Source parameter @item():
Sink parameter @item():
Files copied to target successfully.
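A rough sketch of scenario 2 in JSON (all names are illustrative): the dataset declares a filename parameter and references it as @dataset().filename, and the copy activity inside the ForEach passes @item() into it.

Dataset:

{
  "name": "SourceDataset",
  "properties": {
    "parameters": { "filename": { "type": "string" } },
    "type": "DelimitedText",
    "typeProperties": {
      "location": {
        "type": "AzureBlobFSLocation",
        "fileSystem": "source",
        "fileName": { "value": "@dataset().filename", "type": "Expression" }
      }
    }
  }
}

Copy activity source dataset reference:

{
  "referenceName": "SourceDataset",
  "type": "DatasetReference",
  "parameters": {
    "filename": { "value": "@item()", "type": "Expression" }
  }
}

Because the dataset only ever sees its own @dataset().filename, there is no warning, and the dataset can be previewed by supplying a parameter value manually.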

Azure Data Factory: append array to array in ForEach

In an Azure Data Factory pipeline, I have a ForEach1 loop over Databricks activities. Those Databricks activities output arrays of different sizes. I would like to union those arrays and pass them to another ForEach2 loop so that every element of every array becomes an item in this new ForEach2 loop.
How could I collect the output arrays from ForEach1 into one big array? I've tried the Append Variable activity, but got the following error:
The value of type 'Array' cannot be appended to the variable of type 'Array'.
The action type 'AppendToArrayVariable' only supports values of types 'Float, Integer, String, Boolean, Object'.
Is there a way to union/merge the arrays inside the ForEach1 loop? Or is there any other way to pass the arrays to ForEach2 such that each element of each array is treated as a separate item for ForEach2 to loop over?
There is a collection function called union() in Azure Data Factory which takes two arguments (both of type array or object). This can be used to achieve your requirement. You can follow the example below, which I tried with a Get Metadata activity instead of a Databricks Notebook activity.
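For reference, union() also de-duplicates, so an item that appears in both collections shows up only once in the result. A small hedged illustration as a pipeline expression:

@union(createArray(1, 2, 3), createArray(3, 4))

returns [1,2,3,4].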
I have a container called input with 2 folders, a and b, in it. These folders contain some files. Using a similar approach, I am appending the array of child items (file names) generated by the Get Metadata activity in each iteration to get a list of all file names (one big array). The following is my folder structure, inside which the files are present:
First I used Get Metadata to get the names of the folders inside the container.
I used @activity('folder_names').output.childItems for the items value in the ForEach activity. Now inside the ForEach, I have again used Get Metadata to get the child items of each folder (I created a dataset and gave a dynamic value for the folder name in the path).
You can use the procedure below to meet the requirement:
I have given the output of the 2nd Get Metadata activity (files in folder) to a Set Variable activity. I created a new variable current_file_list (array type) and gave its value as:
@union(variables('list_of_files'),activity('files in folder').output.childItems)
Note: Instead of activity('files in folder').output.childItems in the above union, you can use the array returned by your Databricks activity in ForEach1 for each iteration.
list_of_files is another array-type variable (set using another Set Variable activity); it is the final array that contains all elements as one big array. I assign it the value @variables('current_file_list').
This indirectly means that the list_of_files value is the union of the previous array elements and the current folder's child-item array.
Reason: if we use one variable (say list_of_files) and give its value as @union(variables('list_of_files'),activity('files in folder').output.childItems), it throws an error. We cannot self-reference a variable in Azure Data Factory dynamic content, so we need to create 2 variables to overcome this.
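A minimal sketch of the two Set Variable activities inside ForEach1 (activity names are illustrative; the variable names follow the description above):

{
  "name": "set current_file_list",
  "type": "SetVariable",
  "typeProperties": {
    "variableName": "current_file_list",
    "value": {
      "value": "@union(variables('list_of_files'), activity('files in folder').output.childItems)",
      "type": "Expression"
    }
  }
},
{
  "name": "set list_of_files",
  "type": "SetVariable",
  "dependsOn": [ { "activity": "set current_file_list", "dependencyConditions": [ "Succeeded" ] } ],
  "typeProperties": {
    "variableName": "list_of_files",
    "value": { "value": "@variables('current_file_list')", "type": "Expression" }
  }
}

Note that the ForEach should run sequentially (Sequential checked, i.e. isSequential: true); with parallel iterations, the two variables can be overwritten out of order.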
When I debug the pipeline, it gives the expected output, and we can see the output produced after each activity.
The following are the output images:
Input for list of files in current folder first iteration will be:
Input for list of files in current folder second iteration will be:
Final list_of_files array with all elements in single array:
You can follow the above approach to build the combined array and use it as the items for your ForEach2 activity.

Creating an array of columns from an array of column names in data flow

How can I create an array of columns from an array of column names in dataflow?
The following creates a sorted array of column names, with the exception of the last column:
sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))
I want to get an array of the columns for this array of column names. I tried this:
toString(byNames(sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))))
But I keep getting the error:
Column name function 'byNames' does not accept column or argument parameters
Please can anyone help me with a workaround for this?
Update:
It seems that using columnNames() in any way (directly or by assigning it to a parameter) leads to an error, as at runtime on Spark it is fed to the byNames() function. Due to the unavailability of a way to re-introduce it as a parameter or assign a variable directly in Data Flow, see below for what works for me.
Have an empty string-array-type parameter in the data flow.
Use the sha2 function as usual in a derived column with the parameter: sha2(256,byNames($cols)).
Create a pipeline; there, use Get Metadata to get the Structure, from which you can get the column names.
Inside a ForEach activity, append each column name to a variable.
Next, connect to the data flow and pass the variable containing the column names (see the sketch after these steps).
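A rough sketch of the pipeline side of these steps (activity and variable names are illustrative): Get Metadata with the Structure field returns entries like { "name": "id", "type": "String" }, so the ForEach appends each name to an array variable, which is then handed to the data flow parameter:

ForEach items: @activity('Get Metadata1').output.structure
Append Variable (array variable column_names), value: @item().name
Data flow parameter $cols: @variables('column_names')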
The documentation for the byNames function states: 'Computed inputs are not supported, but you can use parameter substitutions'. This explains why you should use a parameter as the input to create the array used in the byNames function.
Example: where the $cols parameter holds the list of columns:
sha2(256,byNames(split($cols,',')))
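If the pipeline collects the names into an array variable as in the update above, the comma-separated $cols string for this variant could be built with something like (a hedged sketch; column_names is a hypothetical array variable):

@join(variables('column_names'), ',')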
You can use computed column names as input by creating the array prior to using it in the function. Instead of creating the expression inline in the function call, set the column values in a parameter first and then use the parameter in your function directly afterwards.
For a parameter $cols of type array:
$cols = sort(slice(columnNames(), 1, size(columnNames()) - 1), compare(#item1, #item2))
toString(byNames($cols))
Refer: byNames

Expression invalid

Azure Data Factory error:
The expression 'item().$v.collection.$v' cannot be evaluated because property '$v' doesn't exist, available properties are '$t, $v._id.$t, $v._id.$v, $v.id.$t, $v.id.$v, $v.database.$t, $v.database.$v, $v.collection.$t, $v.collection.$v, id, _self, _etag, _rid, _attachments, _ts'
How can I get around that ?
I am using this expression in a ForEach which is connected to a Lookup activity that reads values from Cosmos DB. I am interested in only a single column, but the SQL:
select collection from backups
didn't work, hence I switched from "Query" to "Table"; hence the output of the Lookup activity contains a JSON object with fields containing $.
This error results from the ForEach activity treating "." as the property accessor. Please use the expression @item()['$v.collection.$v'] to get around the error. Thanks.
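For reference, a minimal sketch of the wiring (the activity name Lookup1 is illustrative):

ForEach items: @activity('Lookup1').output.value
Inside the ForEach: @item()['$v.collection.$v']

The bracket syntax treats the whole string '$v.collection.$v' as a single property name, instead of walking down $v, then collection, then $v.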