How to add null value in Azure Datafactory Derived columns expression builder - azure-data-factory

I am currently using Azure Datafactory in that I am creating a Derived column and since the field will always will be blank, so I want the value to be NULL
currently Derived Column I am doing this for adding the expression e.g. toString("null") and toString(null()) but this is appearing as string. I only want null to appear without quotes in Json document

I have reproduced the above and got below results.
I tried to give null() to a column and it gave the error like below.
So, in ADF Dataflow, there should be any wrap over null() with the functions like toInteger() or toString()
When I give toString(null()) when id is 4 in derived column of dataflow and the sink is a JSON, it gave me the below output.
You can see the row with id==4 skipped the null valued key in JSON. If you give toString(null()) same key in every row will be skipped.
You can go through this link by #ShaikMaheer-MSFT to understand more about this.
AFAIK, The workaround for this can be to store the null as 'null' string to get that key in JSON like this and later use this as per your requirement.

Related

How to format the negative values in dataflow?

I have below column in my table
I need an output as below
I am using Dataflow in the Azure data factory and unable to get the above output. I used derived column but no success. I used replace function, but it's not coming correct. Can anyone advise how to format this in dataflow?
Source is taken in data flow with data as in below image.
Derived column transformation is added next to source.
New column is added and the expression is given as
iif(left(id,1)=='-', replace(replace(id,"USD",""),"-","-$"), concat("$", replace(id,"USD","")))
Output of Derived Column activity

Azure Data Factory DataFlow md5 specific columns

First of all, I have my array of columns parameter called $array_merge_keys
$array_merge_keys = ['Column1', 'Column2', 'NoColumnInSomeCases']
So then I am going to hash them, if the third NoColumnInSomeCases is not existed, I would like to treat it as null or some strings else there value.
But actually, when I use them with byNames(), it would return NULL because the last is not existed, even though first and second still have values. So I would expect byNames($array_merge_keys) always return value in order to hash them.
Since that problem cannot be solved, I am back to filter these only existed column
filter(columnNames('', true()), contains(['Column1', 'Column2', 'NoColumnInSomeCases'], #item_1 == #item)) => ['Column1', 'Column2']
But it comes to another problem that byNames() cannot compute on the fly, it said 'byNames' does not accept column or argument
array(byNames(filter(columnNames('', true()), contains(['Column1', 'Column2', 'NoColumnInSomeCases'], #item_1 == #item))))
Spark job failed: { "text/plain":
"{"runId":"649f28bf-35af-4472-a170-1b6ece50c551","sessionId":"a26089f4-b0f4-4d24-8b79-d2a91a9c52af","status":"Failed","payload":{"statusCode":400,"shortMessage":"DF-EXPR-030
at Derive 'CreateTypeFromFile'(Line 35/Col 36): Column name function
'byNames' does not accept column or argument
parameters","detailedMessage":"Failure 2022-04-13 05:26:31.317
failed DebugManager.processJob,
run=649f28bf-35af-4472-a170-1b6ece50c551, errorMessage=DF-EXPR-030 at
Derive 'CreateTypeFromFile'(Line 35/Col 36): Column name function
'byNames' does not accept column or argument parameters"}}\n" } -
RunId: 649f28bf-35af-4472-a170-1b6ece50c551
I have tried lots of ways, even created a new derived column (before that stream name) to store ['Column1', 'Column2']. But it said that column cannot be referenced within byNames() function
Do we have any elegant solution?
It is true that byName() cannot evaluate with late binding. You need to either use a Select transformation to set the columns in the stream you wish to hash first or send in the column names via a parameter. Since that is "early column binding", byName() will work.
You can use a get metadata activity in the pipeline to inspect which columns are present in your source before calling the data flow, allowing you to send a pipeline parameter with just those columns you wish to hash.
Alternatively, you can create a new branch, use a select matching rule, then hash the row based on those columns (see example below).

How to Validate Data issue for fixed length file in Azure Data Factory

I am reading a fixed-width file in mapping Data Flow and loading it to the table. I want to validate the fields, datatype, lengths of the field that I am extracting in the Derived column using substring.
How to Achieve this in ADF
Use a Conditional Split and add a condition for each property of the field that you wish to test for. For data type checking, we literally just landed new isInteger(), isString() ... functions today. The docs are still in the printing press, but you'll find them in the expression builder. For length use length().

Azure Data flow convert null to whitespace/blank

I am using an expression builder in derived column action of Azure data factory. I have an iif statement that that adds objects to a single array of objects based on whether 5 columns are null. Within the iif statement if the object is not null it adds it to the array object and I did not specify an action for when the columns is null. So if the 3 columns have a value then there should be 3 total objects in the array but the issue is for those 2 empty columns they show up as 2 "null" values within the array. I don't want that. I just want to cleanly have only the 3 objects in the array. How can I convert the null values to whitespace or is there a better way to get this done?
I've made a test to conver null value to whitespace successfully.
My source data is a csv file with 6 columns and some columns may contains Null value:
In the dataflow, I'm using Derived Column to convert the Null value.
In the data preview, we can see the Null value was replaced with whitespace/blank
Summary:
So we can use expression iif(isNull(<Column_Name>),'\n',<Column_Name>) to replace the NULL value to a whitespace.

Talend Data Itegration: Avoid nulls coming out of tExtractXMLField?

I have this simple flow in Talend DI 6 (simplified for posting on SO):
The last step crashes with a NullPointerException, because missing XML attributes are returned as null.
Is there a way to get empty string values instead of nulls?
For now I'm using a tReplace step to remove nulls as a work-around, but it's tedious and adds to the cost of maintenance by creating one more place where the list of attributes needs to be maintained.
In Talend DI 5.6.2 it is possible to add default data values to the schema. The column in the schema is called "Default". If you expect strings, you can set an empty string, which is set if the column value is null:
Talend schema view with Default column
Works also for other data types. Talend DI 6 should still be able to do this, although the field might be renamed.