I am using the expression builder in a Derived Column transformation in Azure Data Factory. I have an iif statement that adds objects to a single array of objects based on whether 5 columns are null. Within the iif statement, if the column is not null its object is added to the array, and I did not specify an action for when the column is null. So if 3 columns have a value, there should be 3 objects in the array, but the 2 empty columns show up as 2 "null" values within the array. I don't want that. I just want to cleanly have only the 3 objects in the array. How can I convert the null values to whitespace, or is there a better way to get this done?
I ran a test and converted the null values to whitespace successfully.
My source data is a CSV file with 6 columns, and some columns may contain null values:
In the dataflow, I'm using Derived Column to convert the Null value.
In the data preview, we can see the Null value was replaced with whitespace/blank:
Summary:
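So we can use the expression iif(isNull(<Column_Name>),'\n',<Column_Name>) to replace a NULL value with whitespace. Note that '\n' is a newline character; '' or ' ' can be substituted if an empty string or a single space is wanted.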
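The expression above replaces each null but still leaves one entry per column, so it does not by itself give an array with only 3 objects. The difference between replacing and dropping nulls can be sketched in plain Python (this is not ADF expression syntax; the function names and sample values are made up for illustration):

```python
def replace_nulls(values, blank=""):
    """Keep every position, swapping None for a blank placeholder."""
    return [blank if v is None else v for v in values]


def drop_nulls(values):
    """Keep only the entries that actually have a value."""
    return [v for v in values if v is not None]


# Five hypothetical column values, two of them null.
columns = ["a", None, "b", None, "c"]

replaced = replace_nulls(columns)
dropped = drop_nulls(columns)

print(replaced)  # ['a', '', 'b', '', 'c']  (still 5 entries)
print(dropped)   # ['a', 'b', 'c']          (only the 3 real values)
```

If the goal is an array with exactly 3 objects, the second behaviour is the one to aim for; replacing nulls only changes what the 2 extra entries look like.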
Related
I am currently using Azure Data Factory, where I am creating a Derived Column. Since the field will always be blank, I want its value to be NULL.
In the Derived Column I have tried expressions such as toString("null") and toString(null()), but the result appears as a string. I only want null to appear without quotes in the JSON document.
I have reproduced the above and got below results.
I tried assigning null() to a column directly and it gave the error below.
So, in an ADF data flow, null() has to be wrapped in a conversion function such as toInteger() or toString().
When I used toString(null()) for the row where id is 4 in the derived column, with a JSON sink, it gave me the output below.
You can see that the row with id==4 skipped the null-valued key in the JSON. If you use toString(null()) for every row, that key will be skipped in every row.
You can go through this link by #ShaikMaheer-MSFT to understand more about this.
AFAIK, the workaround is to store the null as the string 'null' so that the key is kept in the JSON, and then handle that value later as your requirement dictates.
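One way to handle the placeholder later is a small post-processing step outside ADF. The following is only a sketch in plain Python, assuming the sink wrote the literal string "null" for the missing value; the key name `name` and the helper `restore_nulls` are hypothetical:

```python
import json

PLACEHOLDER = "null"  # assumed placeholder string written by the data flow


def restore_nulls(record, placeholder=PLACEHOLDER):
    """Return a copy of record with placeholder strings turned into real nulls."""
    return {key: (None if value == placeholder else value)
            for key, value in record.items()}


# A row as it might come out of the JSON sink (hypothetical key names).
row_from_sink = {"id": 4, "name": "null"}
restored = restore_nulls(row_from_sink)

as_written = json.dumps(row_from_sink)
as_restored = json.dumps(restored)

print(as_written)   # {"id": 4, "name": "null"}
print(as_restored)  # {"id": 4, "name": null}
```

Note that this also converts any genuine value that happens to equal the string "null", so a more distinctive placeholder may be safer.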
I'm an absolute newbie trying out NiFi and PostgreSQL on Docker Compose.
I have a sample CSV file with 4 columns.
I want to split this CSV file into two,
based on whether or not a row contains a null value.
Grade ,BreedNm ,Gender ,Price
C++ ,beef_cattle ,Female ,10094
C++ ,milk_cow ,Female ,null
null ,beef_cattle ,Male ,12704
B++ ,milk_cow ,Female ,16942
For example, the table above should be split into two tables, one containing rows 1 and 4 and the other containing rows 2 and 3,
and save each of them into a Postgresql table.
Below is what I have tried so far.
I was trying to:
split the flowfile into 2, keeping rows without null values on the left side and rows with null values on the right side.
write each of them into a table, named 'valid' and 'invalid' respectively.
But I do not know how to split the CSV file and save the parts as PostgreSQL tables through NiFi.
Can anyone help?
What you could do is use a RouteOnContent with the "Content Must Contain Match" factor, with the match being null. Therefore, anything that matches null would be routed that way, and anything not matching null would be routed a different way. Not sure if it's possible the way you're doing it, but that is 1 possibility. The match could be something like (.*?)null
I used the QueryRecord processor with two SQL statements, one selecting the rows that contain a null value and the other selecting the rows that do not, and it worked as intended!
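The routing logic behind those two queries can be sketched outside NiFi in plain Python. This is not NiFi configuration; it only assumes, as in the sample above, that missing values appear as the literal text null, and it leaves out the PostgreSQL write step. The function name `split_rows` is made up for illustration:

```python
import csv
import io

SAMPLE = """Grade ,BreedNm ,Gender ,Price
C++ ,beef_cattle ,Female ,10094
C++ ,milk_cow ,Female ,null
null ,beef_cattle ,Male ,12704
B++ ,milk_cow ,Female ,16942
"""


def split_rows(text):
    """Split CSV text into (valid, invalid) lists of row dicts."""
    reader = csv.reader(io.StringIO(text))
    header = [name.strip() for name in next(reader)]
    valid, invalid = [], []
    for raw in reader:
        row = [cell.strip() for cell in raw]
        if not row:
            continue  # csv.reader yields an empty list for blank lines
        # Treat empty cells and the literal text "null" as missing.
        has_null = any(cell == "" or cell.lower() == "null" for cell in row)
        (invalid if has_null else valid).append(dict(zip(header, row)))
    return valid, invalid


valid, invalid = split_rows(SAMPLE)
print([r["Price"] for r in valid])    # ['10094', '16942']
print([r["Price"] for r in invalid])  # ['null', '12704']
```

Rows 1 and 4 end up in `valid` and rows 2 and 3 in `invalid`, matching the split described in the question.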
I have a crosstab element in my layout. One of the values in the column group is null, and I don't want to display the column for that null value in the output.
I have tried checking "Blank When Null" and modifying the "Print When Expression" property. But all that does is replace the null value with a blank; the column still appears in the output.
Current output
Expected output
To change the name in the column header from null to something else, you can modify the bucketExpression:
<bucketExpression><![CDATA[($F{myField}==null?"New name":$F{myField})]]></bucketExpression>
Using this you can also move the values into a new bucket (column).
If you would like to remove the whole column, AFAIK there is no other method than to filter out the null values in your datasource before you pass it to the crosstab.
The options are:
If you are using an SQL datasource, just add the field to the WHERE clause; for MySQL that would be something like WHERE myField is not null
Use a filterExpression on your datasource, e.g.
<filterExpression><![CDATA[($F{myField}!=null)]]></filterExpression>
Develop a custom JRDataSource that wraps the datasource and skips any record that has a null value.
Conclusion: To remove the column, you need to remove the records from datasource before you pass it to crosstab.
I am using a tFilterRow to filter out empty rows. While trying to use it, the only function offered to me is 'absolute value'.
I want to filter values with a length greater than 0.
Why am I not getting any other functions?
As mentioned in the comments, the length function is only available to schema columns that have the String data type.
To filter out any rows that have a null value in a column you can use a tFilterRow but configured so that the column being checked is not equal to null like so:
In the case you are dealing with the primitive int (rather than the Integer class) then the primitive can never be null and instead defaults to 0 so you'll want to set it as not equal to 0 instead.
When I run the following query:
str(db(db.items.id==int(row)).select(db.items.imageName)) + "\n"
The output includes the field name:
items.imageName
homegear\homegear.jpg
How do I remove it so that the field name is not included and I get just the selected imageName?
I tried indexing it like a list: [1] gives me an out of range error, and with [0] I end up with:
<Row {'imageName': 'homegear\\homegear.jpg'}>
The above is not a list. What object is it, and how can I reference its contents?
Thanks!
John
db(db.items.id==int(row)).select(db.items.imageName) returns a Rows object, and its __str__ method converts it to CSV output, which is what you are seeing.
A Rows object contains Row objects, and a Row object contains field values. To access an individual field value, you must first index the Rows object to extract the Row, and then get the individual field value as an attribute of the Row. So, in this case, it would be:
db(db.items.id==int(row)).select(db.items.imageName)[0].imageName
or:
db(db.items.id==int(row)).select(db.items.imageName).first().imageName
The advantage of rows.first() over rows[0] is that the former returns None in case there are no rows, whereas the latter will generate an exception (this doesn't help in the above case, because the subsequent attempt to access the .imageName attribute would raise an exception in either case if there were no rows).
Note, even when the select returns just a single row with a single field, you still have to explicitly extract the row and the field value as above.
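To cover the no-rows case mentioned above, the extraction can be wrapped in a small helper that checks the result of first() before reading the attribute. This is only a sketch: `first_field` is a hypothetical helper, and `FakeRow` / `FakeRows` are stand-ins so the example runs without a database; they are not web2py classes.

```python
def first_field(rows, field_name, default=None):
    """Return field_name from the first row, or default if there are no rows."""
    row = rows.first()  # None when the select matched nothing
    if row is None:
        return default
    return getattr(row, field_name)


# Minimal stand-ins for a Rows/Row pair (hypothetical, for illustration only).
class FakeRow:
    def __init__(self, **fields):
        self.__dict__.update(fields)


class FakeRows(list):
    def first(self):
        return self[0] if self else None


found = first_field(FakeRows([FakeRow(imageName="homegear\\homegear.jpg")]),
                    "imageName")
missing = first_field(FakeRows(), "imageName")

print(found)    # homegear\homegear.jpg
print(missing)  # None
```

With a real query this would presumably be called as first_field(db(db.items.id==int(row)).select(db.items.imageName), "imageName").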