Convert Scientific Notation to decimal in Azure Data Flows - azure-data-factory

I have an input file that has a column with a mixture of decimal and scientific notation values. I have tried to convert the column using toDecimal, which seems to work for the non-zero decimals and the scientific notation values, but zeroes are converted to scientific notation. Is it possible to keep the zero values as plain zeroes?

It looks like 0 (or 0.0) is rendered in exponential notation by default when you use toDecimal. One workaround is to convert the decimal to a string and handle zero explicitly:
case(Value == 0.0, toString(0.0), toString(toDecimal(Value, 20, 10)))
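Python's decimal module shows a similar zero-only effect, sketched below purely as an illustration. That Data Flows behaves this way for the same reason is an assumption; the answer above does not confirm it. A zero that carries ten decimal places prints in exponent form, while fixed-point formatting keeps it plain.

```python
from decimal import Decimal

# Quantize to 10 decimal places, loosely mirroring toDecimal(Value, 20, 10).
scale = Decimal("1E-10")

zero = Decimal("0").quantize(scale)
nonzero = Decimal("1.5E-3").quantize(scale)

default_zero = str(zero)         # a scaled zero comes out in exponent form
default_nonzero = str(nonzero)   # a non-zero value of this size stays plain
plain_zero = format(zero, "f")   # fixed-point formatting avoids the exponent

print(default_zero, default_nonzero, plain_zero)
```

The design point is the same as in the expression above: decide how the value is rendered as a string rather than relying on the default rendering.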

Related

Precision of double values in Spark

I am reading some data from a CSV file, and I have custom code to parse string values into different data types. For numbers, I use:
val format = NumberFormat.getNumberInstance()
which returns a DecimalFormat, and I call its parse function to get my numeric value. DecimalFormat has arbitrary precision, so I am not losing any precision there. However, when the data is pushed into a Spark DataFrame, it is stored using DoubleType. At this point, I expect to see some precision issues, but I do not. I tried entering values from 0.1, 0.01, 0.001, ..., 1e-11 in my CSV file, and when I look at the values stored in the Spark DataFrame, they are all accurately represented (i.e. not like 0.099999999). I am surprised by this behavior since I do not expect a double value to store arbitrary precision. Can anyone help me understand the magic here?
Cheers!
There are probably two issues here: the number of significant digits that a Double can represent in its mantissa; and the range of its exponent.
Roughly, a Double has about 16 (decimal) digits of precision, and the exponent can cover the range from about 10^-308 to 10^+308. (Obviously, the actual limits are set by the binary representation used by the IEEE 754 format.)
When you try to store a number like 1e-11, it can be approximated very closely within the 53 bits of precision available in the mantissa (52 stored bits plus an implicit leading bit). Where you'll get accuracy issues is when you subtract two numbers that are so close together that they differ only in a few of the least significant bits (once their mantissas have been shifted so that their exponents are the same).
For example, if you try (1e20 + 2) - (1e20 + 1), you'd hope to get 1, but actually you'll get zero. This is because a Double does not have enough precision to represent the 20 (decimal) digits needed. However, (1e100 + 2e90) - (1e100 + 1e90) is computed to be almost exactly 1e90, as it should be.
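The arithmetic above can be checked in Python, whose float is an IEEE 754 double. This is a minimal sketch rather than Spark code; that Spark's display of DoubleType values follows the same shortest-round-trip idea is an assumption. It also shows why 0.1 looks exact: the printed form is the shortest decimal string that maps back to the stored double, not the stored value itself.

```python
import math
from decimal import Decimal

# 0.1 is not stored exactly; the default repr prints the shortest decimal
# string that round-trips to the same double, which is why it looks exact.
shown = repr(0.1)
exact = Decimal(0.1)  # the binary value that is actually stored

# Near 1e20 adjacent doubles are 16384 apart, so adding 1 or 2 is lost.
gap = math.ulp(1e20)  # math.ulp needs Python 3.9 or later
lost = (1e20 + 2) - (1e20 + 1)

# Near 1e100 the terms 1e90 and 2e90 are far larger than the spacing.
kept = (1e100 + 2e90) - (1e100 + 1e90)

print(shown, exact, gap, lost, kept)
```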

Matlab convert from hex to float

I'm working with a device that sends me hex values, and I need to convert those values to their actual float values. Does anyone know how to convert from hex values to float in MATLAB?
Thanks
Take a look at hex2dec, to convert your hex to decimal.
Hex format is inherently integer (the floating point position is not defined), so you will have to give more info: Does the hex represent a mantissa-exponent floating point number? Does it represent a fixed-point number?
The hex represents a mantissa-exponent floating-point number. For example, 0x44ADE000 is equal to 1391.0.
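The example value can be checked with a short Python sketch. It assumes the 8 hex digits are the bits of an IEEE 754 single-precision number written most significant byte first; hex_to_float32 is a hypothetical helper name, not a library function.

```python
import struct

def hex_to_float32(hex_string: str) -> float:
    """Interpret 8 hex digits as a big-endian IEEE 754 single-precision value."""
    # Drop an optional "0x" prefix before decoding the digits into bytes.
    digits = hex_string[2:] if hex_string.lower().startswith("0x") else hex_string
    raw = bytes.fromhex(digits)
    return struct.unpack(">f", raw)[0]

value = hex_to_float32("0x44ADE000")
print(value)
```

In MATLAB the same reinterpretation is commonly written as typecast(uint32(hex2dec('44ADE000')), 'single'), which reuses the hex2dec step suggested above.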

How to fscanf a combination of float and string?

The data is like this
5.1,3.5,1.4,0.2,Iris-setosa
while I read it using this
data = fscanf(file, '%f,%f,%f,%f,%s');
and it turned out that data is an array of floats rather than a combination of floats and a string. So how do I read this data from the text file?
From the Matlab docs for fscanf:
Output Arguments A: An array. If the format includes:
Only numeric specifiers, A is numeric. ... Only character or
string specifiers (%c or %s), A is a character array. ... A
combination of numeric and character specifiers, A is numeric, of
class double. MATLAB converts each character to its numeric
equivalent. This conversion occurs even when the format explicitly
skips all numeric values (for example, a format of '%*d %s').
So your best bet is to read everything in as strings, and then convert the numeric strings to numeric values, using str2num or str2double or similar.
Alternatively, you know that the first 4 elements of data hold the actual floating-point values and that the remaining elements hold the numeric ASCII codes of the string's characters, so you can split the array and cast the string part back to char. Something like:
flt = data(1:4);           % the four numeric fields
str = char(data(5:end))';  % fscanf returns a column vector, so transpose to get a row of characters
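Both approaches can be sketched in Python for comparison. parse_record is a hypothetical helper, and the record layout (four numbers followed by one label) is taken from the sample line in the question. The last lines mimic what fscanf does to the label: each character becomes its numeric code, and converting the codes back recovers the text.

```python
def parse_record(line: str):
    """Split a comma-separated record into four floats and a trailing label."""
    fields = line.strip().split(",")
    numbers = [float(field) for field in fields[:4]]
    label = fields[4]
    return numbers, label

numbers, label = parse_record("5.1,3.5,1.4,0.2,Iris-setosa\n")

# What a mixed numeric/character format yields: character codes as numbers.
codes = [float(ord(ch)) for ch in label]
# Casting the codes back, like char(data(5:end)) in the MATLAB snippet.
restored = "".join(chr(int(code)) for code in codes)

print(numbers, label, restored)
```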

num2hex vs dec2hex in MATLAB

I don't understand the difference between hex2dec and hex2num and their opposites in MATLAB.
Say I had a hex value, 3FD3B502C055FE00. When I use hex2dec, I get 4.5992e+018. When I use hex2num, I get 0.3079. What's going on?
These functions work very differently, as you noticed. hex2dec reads the hexadecimal string as an integer and returns that integer's value as a floating-point number. hex2num instead treats the hexadecimal string as the raw bit pattern of an IEEE double-precision number and returns the number those bits encode.
The IEEE 754 double-precision standard calls for a one-bit sign, an 11-bit exponent, and a 52-bit fraction. hex2num parses the hexadecimal digits in this format, which yields a very different result from hex2dec.
hex2dec -
Convert hexadecimal number string to decimal number
Description
d = hex2dec('hex_value') converts hex_value to its floating-point integer representation. The argument hex_value is a hexadecimal integer stored in a MATLAB string. The value of hex_value must be smaller than hexadecimal 10,000,000,000,000.
If hex_value is a character array, each row is interpreted as a hexadecimal string.
hex2num -
Convert hexadecimal number string to double-precision number
Description
n = hex2num(S), where S is a 16 character string representing a hexadecimal number, returns the IEEE® double-precision floating-point number n that it represents. Fewer than 16 characters are padded on the right with zeros. If S is a character array, each row is interpreted as a double-precision number.
NaNs, infinities and denorms are handled correctly.
Note that 3FD3B502C055FE00 is larger than hexadecimal 10,000,000,000,000, so it is outside the range that the hex2dec documentation allows.
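The two interpretations can be reproduced outside MATLAB. The Python sketch below is an analogue of hex2dec and hex2num, not their implementation, and assumes the hex digits are written most significant byte first.

```python
import struct

hex_value = "3FD3B502C055FE00"

# hex2dec-style: treat the digits as one large unsigned integer.
as_integer = int(hex_value, 16)

# hex2num-style: treat the same 64 bits as an IEEE 754 double (big-endian).
as_double = struct.unpack(">d", bytes.fromhex(hex_value))[0]

# The quoted hex2dec limit, hexadecimal 10,000,000,000,000, equals 2**52.
limit = int("10000000000000", 16)

print(f"{float(as_integer):.4e}", round(as_double, 4), as_integer > limit)
```

The integer reading gives about 4.5992e+18 and the bit-pattern reading gives about 0.3079, matching the two results in the question.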

Float to text behavior of MATLAB's fprintf()

When using fprintf to convert floats to text in a decimal representation, the output is a series of decimal digits (potentially beginning with 0).
How does this representation work?
>> fprintf('%tu\n',pi)
1078530011
>> fprintf('%bu\n',pi)
04614256656552045848
Apologies if this is very trivial; I can't find an answer elsewhere, in part because searches are swamped by the various decimal data types available.
Note that the %t and %b subtypes are two of the differences from C's fprintf(). According to the documentation, they make fprintf print a float or a double, respectively, "rather than an unsigned integer." The conversion characters o, x, and u switch between octal, hexadecimal, and decimal output.
This representation is the binary IEEE 754 floating point representation of the number, printed as an unsigned integer.
The IEEE 754 Converter website tells us that the IEEE 754 single-precision representation of Pi (approximately 3.1415927) is 40490FDB hexadecimal, which is 1078530011 decimal (the number that you saw printed). The '%bu' format specifier works similarly but outputs the double-precision representation.
The purpose of these format specifiers is to allow you to store a bit-exact representation of a floating-point value to a text file. The alternative approach of printing the floating-point value in human-readable form requires a lot of care if you want to guarantee bit-exact storage, and there might be some edge cases (denormalized values...?) that you won't be able to store precisely at all.
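The same reinterpretation can be sketched in Python with the struct module, as an analogue of '%tu' and '%bu' rather than MATLAB's own implementation. The value is packed as a float or double, and the identical bytes are read back as an unsigned integer.

```python
import math
import struct

# Pack pi as a single, then reread the same 4 bytes as an unsigned integer.
single_bits = struct.unpack(">I", struct.pack(">f", math.pi))[0]

# Same idea with the 8 bytes of a double.
double_bits = struct.unpack(">Q", struct.pack(">d", math.pi))[0]

# The integer alone is enough to rebuild the double bit for bit.
restored = struct.unpack(">d", struct.pack(">Q", double_bits))[0]

print(single_bits, double_bits)
print(format(single_bits, "08x"), format(double_bits, "016x"))
```

Because the integer carries every bit of the value, writing it to a text file and converting it back loses nothing, which is the bit-exact storage described above.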
If you were to print the number as hexadecimals:
>> fprintf('%bx\n', pi)
400921fb54442d18
>> fprintf('%tx\n', single(pi))
40490fdb
then the formatters '%bx' and '%tx' are simply equivalent to using NUM2HEX:
>> num2hex( pi )
400921fb54442d18
>> num2hex( single(pi) )
40490fdb
Another way is to simply set the default output format to hexadecimals using:
>> format hex
>> pi
400921fb54442d18
>> single(pi)
40490fdb
On a related note, there was a recent article by @Loren:
"How Many Digits to Write?"
where they try to find how many decimal digits you need to write in order to retain the number's full precision when re-read in MATLAB.
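The question of how many digits to write can be explored with a small Python sketch. It relies on the widely cited property that 17 significant decimal digits are enough to round-trip any IEEE 754 double; whether the article arrives at the same figure is not stated here. round_trips is a hypothetical helper name.

```python
import math

def round_trips(value: float, digits: int) -> bool:
    """True if `value`, written with `digits` significant digits, reads back unchanged."""
    text = f"{value:.{digits}g}"
    return float(text) == value

x = 0.1 + 0.2  # stored as 0.30000000000000004

# 15 digits collapse x to "0.3", which is a different double; 17 digits keep it.
print(round_trips(x, 15), round_trips(x, 17), round_trips(math.pi, 17))
```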