Decimal number from JSON truncated when read from Mapping Data Flow - azure-data-factory

When reading a decimal number from JSON data files using a Mapping Data Flow, the decimal digits are truncated.
[Source data]
{
"value": 1123456789.12345678912345678912
}
In Data Factory, the source dataset is configured with no schema. The Mapping Data Flow projection defines a decimal data type with sufficient precision and scale.
[Mapping Data Flow script]
source(output(
value as decimal(35,20)
),
...
However, when viewing the value in the 'Data preview' window, or reviewing pipeline output, the value is truncated.
[Output]
1123456789.12345670000000000000
This issue doesn't occur with other file formats, such as delimited text.
When previewing the data from the source dataset, the decimal digits are truncated in the same way. This occurs whether or not a schema is set. If a schema is set, the data type is number rather than decimal since it's JSON. Mozilla Developer Network documentation calls out the varied number of decimal digits supported by browsers, so I wonder if this is down to the JSON parser being used.
Is this expected behaviour? Can Data Factory be configured to support the full number of decimal places when working with JSON? Unfortunately this is calling into question whether it's viable to perform aggregate calculations in Data Factory.
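As a rough illustration (not ADF itself, just Python's json module, which shows the same double-precision behaviour), a standard JSON parser already loses the extra digits at parse time:

import json

raw = '{"value": 1123456789.12345678912345678912}'

# The default parser stores the number as a 64-bit float,
# which only holds roughly 15-17 significant digits.
print(json.loads(raw)["value"])   # 1123456789.1234567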

I've created the same test as yours and got the same result as follows:
After that, I changed the source data and put double quotes around the value:
Then I used toDecimal(Value,35,20) to convert the string type to decimal type:
It seems to work well. So we can draw a conclusion:
Don't let ADF do the default data type conversion; it will truncate the value.
This issue doesn't occur with other file formats, such as delimited text, because there the default data type is string.
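For illustration only, here is the same quote-then-convert idea expressed in Python (Decimal plays the role of toDecimal(Value,35,20) in the data flow):

import json
from decimal import Decimal

# The value is quoted in the source file, so the parser keeps it as a string.
raw = '{"value": "1123456789.12345678912345678912"}'

as_string = json.loads(raw)["value"]   # nothing is lost at parse time
as_decimal = Decimal(as_string)        # analogous to toDecimal(Value, 35, 20)
print(as_decimal)                      # 1123456789.12345678912345678912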

It's a common issue with JSON parsers. The FloatParseHandling setting is available in the .NET library but not in ADF.
FloatParseHandling can be set to Decimal while parsing the file through .NET.
Until the setting is made available in ADF, try the workaround: put quotes ["] at both ends to make the value a string, and convert it after loading.
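For comparison, Python's json module exposes a similar switch; parsing floats into Decimal keeps the full precision. This is shown only to illustrate what such a setting does, since it is not available in ADF:

import json
from decimal import Decimal

raw = '{"value": 1123456789.12345678912345678912}'

# parse_float=Decimal is the Python analogue of FloatParseHandling.Decimal in .NET.
print(json.loads(raw, parse_float=Decimal)["value"])
# 1123456789.12345678912345678912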

Related

Azure Data Factory schema mapping not working with SQL sink

I have a simple pipeline that loads data from a csv file to an Azure SQL db.
I have added a data flow where I have ensured the schema matches the SQL table. I have a specific field which contains numbers with leading zeros. The data type in the source projection is set to string. The field is mapped to the SQL sink, showing as string data type. The field in SQL has the nvarchar(50) data type.
Once the pipeline is run, all the leading zeros are lost and the field appears to be treated as decimal:
Original data: 0012345
Inserted data: 12345.0
The CSV data shown in the data preview appears correct; however, for some reason it loses its formatting during the insert.
Any ideas how I can get it to insert correctly?
I reproduced this in my lab and was able to load the data as expected. Please see the repro details below.
Source file (CSV file):
Sink table (SQL table):
ADF:
Connect the data flow source to the CSV source file. As my file is in text format, all the source columns in the projection are strings.
Source data preview:
Connect sink to Azure SQL database to load the data to the destination table.
Data in Azure SQL database table.
Note: You can also add a derived column before the sink to convert the value to a string, as the sink data type is a string.
Thank you very much for your response.
As per your post, the ADF data flow appears to be working correctly. I have finally discovered an issue with the transformation: I have an Azure Batch service which runs a Python script that does a basic transformation and saves the output to a CSV file.
Interestingly, when I preview the data in the data flow, it looks as expected. However, the values stored in SQL are not.
For the sake of others having a similar issue: my existing Python script converted a 'float' data type column to string type. Upon conversion, it retained one decimal place, and as all of my numbers are integers, they ended up with a trailing .0.
The solution was to convert values to integer and then to string:
df['col_name'] = df['col_name'].astype('Int64').astype('str')
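For anyone else hitting this, here is a minimal pandas sketch of the behaviour and the fix (the column name and values are made up):

import pandas as pd

# Integer values that were read in as floats.
df = pd.DataFrame({"col_name": [12345.0, 67890.0]})

# Converting float -> str directly keeps the trailing ".0".
print(df["col_name"].astype("str").tolist())            # ['12345.0', '67890.0']

# Converting to a nullable integer type first drops the ".0".
df["col_name"] = df["col_name"].astype("Int64").astype("str")
print(df["col_name"].tolist())                          # ['12345', '67890']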

How to Validate Data issue for fixed length file in Azure Data Factory

I am reading a fixed-width file in a mapping Data Flow and loading it to a table. I want to validate the fields, data types, and lengths of the fields that I am extracting in the Derived Column using substring.
How can I achieve this in ADF?
Use a Conditional Split and add a condition for each property of the field that you wish to test for. For data type checking, we literally just landed new isInteger(), isString() ... functions today. The docs are still in the printing press, but you'll find them in the expression builder. For length use length().
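To make the kind of checks concrete, here is a rough Python sketch of validating fields sliced out of a fixed-width record; the layout and rules are invented for the example, and in ADF the equivalent logic would sit in the Derived Column and Conditional Split expressions (isInteger(), length(), and so on):

# Hypothetical layout: id (5 chars), name (10 chars), amount (8 chars).
record = "00042Widget     1234.50"

fields = {
    "id": record[0:5],
    "name": record[5:15],
    "amount": record[15:23],
}

def is_integer(s: str) -> bool:
    return s.strip().isdigit()

def is_decimal(s: str) -> bool:
    try:
        float(s)
        return True
    except ValueError:
        return False

errors = []
if not is_integer(fields["id"]):
    errors.append("id is not an integer")
if len(fields["name"].strip()) > 10:
    errors.append("name is too long")
if not is_decimal(fields["amount"]):
    errors.append("amount is not a decimal")

print(errors or "record is valid")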

ADF copy task field type boolean as lowercase

In ADF I have a copy task that copies data from JSON to delimited text. I get the result as:
A | B | C
"name"|False|"description"
Json record is like
{"A":"name","B":"false","C":"description"}
Expected result is as below:
A | B | C
"name"|false|"description"
The bool value has to be in lowercase in the resulting delimited text file. What am I missing?
I can reproduce this. The reason is you are converting the string to the ADF data type "Boolean", which for some reason renders the values in proper case.
Do you really have a receiving process which is case-sensitive? If you need to maintain the case of the source value, simply remove the mapping, i.e.
If you do need some kind of custom mapping, then simply change the mapping data type to String and not Boolean.
UPDATE after new JSON provided
OK, so your first JSON sample has the "false" value in quotes, so it is treated as a string. In your second example, your JSON "true" is not in quotes, so it is a genuine JSON boolean value. ADF is auto-detecting this at run time and it seems like it cannot be overridden, as far as I can tell. Happy to be corrected. As an alternative, consider altering your original JSON to a string, as per your original example, OR copying the file to Blob Store or Azure Data Lake, running some transform on it (e.g. Databricks) and then outputting the file. Alternatively, consider Mapping Data Flows.
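If you do end up transforming the file outside ADF, here is a tiny Python sketch of the kind of step I mean, turning the JSON record into pipe-delimited text with lowercase booleans (field names taken from the example record; this is just an illustration, not an ADF feature):

import json

record = json.loads('{"A": "name", "B": false, "C": "description"}')

# json.dumps renders Python booleans as lowercase true/false and
# wraps strings in double quotes, matching the expected output.
print("|".join(json.dumps(record[k]) for k in ("A", "B", "C")))
# "name"|false|"description"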

BLOB to String conversion - DB2

A table in DB2 contains BLOB data. I need to convert it into String so that it can be viewed in a readable format. I tried options like
getting blob object and converting to byte array
string buffer reader
sqoop import using --map-column-java and --map-column-hive options.
Even after these conversions I am not able to view the data in a readable format. It's in an unreadable format like 1f8b0000..
Please suggest a solution on how to handle this scenario.
I think you need to look at the CAST function.
SELECT CAST(BLOB_VAR as VARCHAR(SIZE) CCSID UNICODE) as CHAR_FLD
Also, be advised that the max value of SIZE is 32K.
Let me know if you tried this.
1f8b0000 indicates the data is in gzip form, and therefore you have to unzip it.
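For example, once the BLOB has been fetched as bytes it can be unzipped like this in Python (blob_bytes is a placeholder for the real BLOB content, and UTF-8 is assumed for the decoded text):

import gzip

# Stand-in for the real BLOB bytes; 1f 8b at the start is the gzip magic number.
blob_bytes = gzip.compress("example payload".encode("utf-8"))

text = gzip.decompress(blob_bytes).decode("utf-8")
print(text)   # example payload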

deserialize cassandra row key

I'm trying to use the sstablekeys utility for Cassandra to retrieve all the keys currently in the sstables for a Cassandra cluster. They come back in what appears to be a serialized format when I run sstablekeys on a single sstable. Does anyone know how to deserialize the keys or get them back into their original format? They were inserted into Cassandra using Astyanax, where the serialized type is a tuple in Scala. The key_validation_class for the column family is a CompositeType.
Thanks to the comment by Thomas Stets, I was able to figure out that the keys are actually just converted to hex and printed out. See here for a way of getting them back to their original format.
For the specific problem of figuring out the format of a CompositeType row key and unhexifying it, see the Cassandra source which describes the format of a CompositeType key that gets output by sstablekeys. With CompositeType(UTF8Type, UTF8Type, Int32Type), the UTF8Type treats bytes as ASCII characters (so the function in the link above works in this case), but with Int32Type, you must interpret the bytes as one number.
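As a rough sketch of that unhexifying step in Python, assuming the usual CompositeType layout of a 2-byte big-endian length, the component bytes, and one end-of-component byte per component (the sample key below is made up, following the example CompositeType(UTF8Type, UTF8Type, Int32Type)):

import struct

def split_composite(key_hex: str):
    """Split a hex-encoded CompositeType row key into its raw components."""
    data = bytes.fromhex(key_hex)
    parts, i = [], 0
    while i < len(data):
        (length,) = struct.unpack_from(">H", data, i)   # 2-byte length
        parts.append(data[i + 2 : i + 2 + length])      # component bytes
        i += 2 + length + 1                             # skip end-of-component byte
    return parts

# Made-up key: two UTF8 components ("foo", "bar") and one Int32 component (42).
raw = split_composite("0003666f6f0000036261720000040000002a00")
print(raw[0].decode("utf-8"))           # foo
print(raw[1].decode("utf-8"))           # bar
print(struct.unpack(">i", raw[2])[0])   # 42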