Azure Data Factory Dataset - azure-data-factory

I have a DelimitedText ADF dataset. It is pipe delimited. When I use it as the source in a Copy data activity in a pipeline and write the file data to a SQL database table, blank values are loaded as NULL.
How can I avoid this? I want blank values to be read as blank values and written into the database table as blank values.
I tried keeping the NULL value as blank and setting "treatEmptyAsNull": false in the dataset JSON; neither worked.
Any suggestions?

I tested setting the NULL value property to '' and it works.
Result comparison:
I also tested the expression concat(''); it works as well.
Hope this helps.
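For reference, a minimal, abridged sketch of where that setting lives in the dataset JSON (the dataset name and file location are made up). nullValue is the property behind the NULL value box in the UI; with it set to the literal two-character token '', a genuinely empty field no longer matches the null marker and flows through as an empty string:

{
    "name": "PipeDelimitedSource",
    "properties": {
        "type": "DelimitedText",
        "typeProperties": {
            "location": {
                "type": "AzureBlobStorageLocation",
                "container": "input",
                "fileName": "data.txt"
            },
            "columnDelimiter": "|",
            "firstRowAsHeader": true,
            "nullValue": "''"
        }
    }
}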

Related

How to add a null value in the Azure Data Factory derived column expression builder

I am currently using Azure Data Factory and creating a derived column. Since the field will always be blank, I want the value to be NULL.
In the derived column I am adding expressions such as toString("null") and toString(null()), but the value appears as a string. I only want null to appear, without quotes, in the JSON document.
I have reproduced the above and got the results below.
I tried assigning null() to a column and it gave the error below.
So, in an ADF data flow, null() must be wrapped in a conversion function such as toInteger() or toString().
When I set toString(null()) for the row where id is 4 in a derived column of the data flow, with a JSON sink, it gave me the output below.
You can see that the row with id==4 omits the null-valued key in the JSON. If you assign toString(null()) in every row, that key will be omitted from every row.
You can go through this link by @ShaikMaheer-MSFT to understand more about this.
AFAIK, the workaround is to store the null as the string 'null', so that the key is kept in the JSON, and later handle it as per your requirement.
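A minimal sketch of that workaround as a derived column expression, using the id column from the example above (adapt the name to your data): when the value is null, emit the literal string 'null' so the key survives in the JSON sink.

iif(isNull(id), 'null', toString(id))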

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a csv file to an Azure SQL db.
I have added a data flow where I have ensured the whole schema matches the SQL table. I have a specific field which contains numbers with leading zeros. The data type in the source projection is set to string. The field is mapped to the SQL sink, showing as the string data type. The field in SQL has the nvarchar(50) data type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview displays correctly; however, for some reason it loses its formatting during the insert.
Any ideas how I can get it to insert correctly?
I reproduced this in my lab and was able to load the data as expected. Please see the repro details below.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are strings.
Source data preview:
Connect sink to Azure SQL database to load the data to the destination table.
Data in Azure SQL database table.
Note: You can also add a derived column before the sink to convert the value to a string, since the sink data type is a string.
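For example, a derived column with an expression like the following (the column name Code is hypothetical) pins the value to a string before it reaches the sink:

toString(Code)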
Thank you very much for your response.
As per your post, the ADF data flow appears to be working correctly. I have finally discovered an issue with my transformation: I have an Azure Batch service which runs a Python script that does a basic transformation and saves the output to a CSV file.
Interestingly, when I preview the data in the dataflow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue: my existing Python script converted a 'float' column to string type. Upon conversion it retained one decimal place, and since all of my numbers are integers, they were ending up with a trailing .0.
The solution was to convert values to integer and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')
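A self-contained sketch of that fix (the column name and sample values are made up):

import pandas as pd

# A column parsed as float: converting straight to string keeps the trailing .0
df = pd.DataFrame({'col_name': [12345.0, 67.0]})
print(df['col_name'].astype('str').tolist())   # ['12345.0', '67.0']

# Going through nullable Int64 first drops the .0
df['col_name'] = df['col_name'].astype('Int64').astype('str')
print(df['col_name'].tolist())                 # ['12345', '67']

Note that this removes the trailing .0 but cannot restore leading zeros already lost when the column was parsed as float; to keep those, the column would have to be read as a string in the first place (e.g. dtype=str in pandas.read_csv).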

Need recommendation on ADF pipeline source properties while loading delimited text files from Azure Blob to Snowflake

We are trying to load a delimited file, located in Azure Blob, which has blank data for a few columns, and we would like to get a value like NA in our target Snowflake table whenever we encounter a blank value in the source CSV file. We have been trying to provide NA against the NULL value option, but it is not working. Any suggestions?
Here is a screenshot of what I have mentioned above.
I used a Data Flow activity in Azure Data Factory to resolve this issue.
Source file with NULL value in “Name” column.
Now use a Derived Column transformation. In the derived column's settings, select the column name and use the expression iifNull({Name}, 'NA').
In the data preview, the NULL value in the Name column is replaced with NA.
You can follow the above steps to replace NULL values and sink the data from Blob storage to Snowflake.
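If several columns can be blank, a column pattern in the same Derived Column transformation can apply the fix to all of them at once; a sketch, assuming every string column should default to NA ($$ stands for the matched column's value):

Matching condition: type == 'string'
Expression: iifNull($$, 'NA')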

Azure ADF Copy Activity with Trailing Column Delimiter

I have a strange source CSV file where it contains a trailing column delimiter at the end of each record just before the carriage return/new line.
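For illustration, the records look something like this (header and values are made up), with a dangling comma before each line break:

Col1,Col2,
A1,B1,
A2,B2,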
When ADF previews this data, it displays only 2 columns and all the data rows without issue. However, when using the Copy activity, it fails with the following exception.
ErrorCode=DelimitedTextColumnNameNotAllowNull,'Type=Microsoft.DataTransfer.Common.Shared.HybridDeliveryException,Message=The name of column index 3 is empty. Make sure column name is properly specified in the header
Now I understand why it's complaining, given the trailing delimiter, but is there a way to deal with this condition? I've tried including the trailing comma in the record delimiter (,\r\n), but then it just pivots the data so that all the columns become rows.
Is there a way to address this condition in the Copy activity?
When previewing the data in the dataset, it seems correct:
But in the Copy activity, the data is actually split into 3 columns by the column delimiter ",", and the third column is empty or NULL. This causes the error.
If you import the projection from the source in a Data Flow, you can see the third column:
For now, the Copy activity doesn't support modifying the data schema. You must use a Data Flow Derived Column transformation to create a new schema for the source. For example:
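The following sketch shows the idea (the new column names are made up; byPosition(n) reads the source column at position n, which sidesteps the empty trailing column):

Column1 : toString(byPosition(1))
Column2 : toString(byPosition(2))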
Then mapping the new columns/schema to the sink will solve the problem.
HTH.
Use a different encoding for your CSV; CSV UTF-8 will do the trick.

DB2 import CSV with null date

I run this
db2 "IMPORT FROM C:\my.csv OF DEL MODIFIED BY COLDEL, LOBSINFILE DATEFORMAT=\"D/MM/YYYY\" SKIPCOUNT 1 REPLACE INTO scratch.table_name"
However, some of my rows have an empty date field, so I get this error:
SQL3191N which begins with """" does not match the user specified DATEFORMAT, TIMEFORMAT, or TIMESTAMPFORMAT. The row will be rejected.
My CSV file looks like this
"XX","25/10/1985"
"YY",""
"ZZ","25/10/1985"
I realise that if I inserted a character instead of a blank string, I could use the NULL INDICATORS parameter.
However, I do not have access to change the CSV file. Is there a way to import a blank string as a null?
This is an error in your input file. DB2 differentiates between a NULL and a zero-length string. If you need to have NULL dates, a NULL would have no quotes at all, like:
"AA",
If you can't change the format of the input file, you have 2 options:
Insert your data into a staging table (changing the DATE column to a char) and then use SQL to populate the ultimate target table.
Write a program to parse ("fix") the input file and then import the resulting fixed data. You can often do this without having to write the entire file out to disk: your program could write to a named pipe, and the DB2 IMPORT (and LOAD) utilities are capable of reading from named pipes.
I'm not aware of anything. Yes, ideally that date field should be null.
Probably the best thing to do would be to load the data into a scratch/temp table where that column isn't a date column (just leave it as character data; it looks like you're already using a scratch table anyway). After that, it should be trivial to use a CASE statement to transform a blank value into a null date when doing your INSERT into the real table.
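A sketch of that INSERT (table and column names are hypothetical; TO_DATE is the DB2 synonym for TIMESTAMP_FORMAT):

INSERT INTO target.table_name (name, my_date)
SELECT name,
       CASE WHEN my_date = '' THEN NULL
            ELSE DATE(TO_DATE(my_date, 'DD/MM/YYYY'))
       END
FROM scratch.table_name;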