Exponential values when converted to double in Pyspark

I have a DataFrame with a string column.
I'm trying to limit its decimal precision to 6 digits.
I tried several methods, but they return values of type String or DecimalType.
However, I need the value as a double, and when I convert it, the values change to exponential form, which is not acceptable.
Methods already tried:
from pyspark.sql import functions as F
from pyspark.sql.types import DecimalType
cp = period.withColumn('converted', F.format_string('%.6f', period.qty.cast('double')))
cp = period.withColumn('qty', F.col("qty").cast(DecimalType(10, 6)))
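
One way to look at this, sketched in plain Python rather than Spark: the exponential form is only how a double is printed, not a different value. The snippet below (the sample value and variable names are assumptions for illustration) rounds a parsed string to 6 decimal places and shows that fixed-point text is produced only at display time. In PySpark, the analogous numeric step would be F.round(F.col('qty').cast('double'), 6), which keeps the column as a double.

```python
# Hypothetical sample value; any small magnitude triggers exponential display.
raw = "0.0000123456789"

as_double = float(raw)          # parsed into a binary double
rounded = round(as_double, 6)   # numeric rounding to 6 decimal places

# repr() switches to exponential notation for small magnitudes,
# but the stored number does not depend on how it is printed.
exponential_text = repr(rounded)

# Fixed-point text is a formatting decision made at display time.
fixed_text = format(rounded, ".6f")

print(exponential_text, fixed_text)
```

Both strings parse back to the same double, so if the downstream consumer needs a numeric column, the exponential display alone should not change any results.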

Related

filtering in Pyspark using integer vs decimal values

I am filtering a DataFrame and when I pass an integer value, it considers only those that satisfy the condition when the DataFrame column value is rounded to an integer. Why is this happening? See the screenshot below, the two filters give different results. I am using Spark 2.2. I tested it with python 2.6 and python 3.5. The results are the same.
Update
I tried it with Spark-SQL. If I do not convert the field to double, it gives the same answer as the first one above. However, if I cast the column to double before filtering, it gives correct answer.

For lat > 60:
Given a double and an integer, Spark implicitly converts both of them to integers. The result is consistent with that, showing latitudes >= 61.
For lat > cast(60 as double) or lat > 60.0:
Given two doubles, Spark returns everything in the interval (60.0, Infinity), as expected.
This might be slightly unintuitive, but you must remember that Spark performs implicit conversions between IntegerType() and DoubleType().

Although you use PySpark, under the hood Spark is written in Scala and runs on the JVM, so Java's conversion rules apply here.
To be specific:
https://docs.oracle.com/javase/specs/jls/se10/html/jls-5.html#jls-5.1.3
...Otherwise, if the floating-point number is not an infinity, the floating-point value is rounded to an integer value V, rounding toward zero using IEEE 754 round-toward-zero mode (§4.2.3).
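
The difference between the two filters can be imitated in plain Python. This is only a sketch of the explanation given in the answers (truncation toward zero before comparing), with made-up latitude values; it does not run Spark and does not verify what Spark 2.2 actually does.

```python
import math

# Hypothetical latitude values for illustration.
lats = [59.7, 60.0, 60.2, 60.9, 61.0, 61.5]

# Comparison after truncating toward zero, as in the narrowing rule quoted above.
as_int = [lat for lat in lats if math.trunc(lat) > 60]

# Comparison carried out on doubles.
as_double = [lat for lat in lats if lat > 60.0]

print(as_int)
print(as_double)
```

Values such as 60.2 and 60.9 truncate to 60 and so drop out of the first filter, which matches the "latitudes >= 61" observation.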

Precision of double values in Spark

I am reading some data from a CSV file, and I have custom code to parse string values into different data types. For numbers, I use:
val format = NumberFormat.getNumberInstance()
which returns a DecimalFormat, and I call parse function on that to get my numeric value. DecimalFormat has arbitrary precision, so I am not losing any precision there. However, when the data is pushed into a Spark DataFrame, it is stored using DoubleType. At this point, I am expecting to see some precision issues, however I do not. I tried entering values from 0.1, 0.01, 0.001, ..., 1e-11 in my CSV file, and when I look at the values stored in the Spark DataFrame, they are all accurately represented (i.e. not like 0.099999999). I am surprised by this behavior since I do not expect a double value to store arbitrary precision. Can anyone help me understand the magic here?
Cheers!

There are probably two issues here: the number of significant digits that a Double can represent in its mantissa, and the range of its exponent.
Roughly, a Double has about 16 (decimal) digits of precision, and the exponent can cover the range from about 10^-308 to 10^+308. (Obviously, the actual limits are set by the binary representation used by the IEEE 754 format.)
When you try to store a number like 1e-11, this can be accurately approximated within the 53 bits available in the mantissa. Where you'll get accuracy issues is when you want to subtract two numbers that are so close together that they only differ by a small number of the least significant bits (assuming that their mantissas have been shifted so that their exponents are the same).
For example, if you try (1e20 + 2) - (1e20 + 1), you'd hope to get 1, but actually you'll get zero. This is because a Double does not have enough precision to represent the 21 (decimal) digits needed. However, (1e100 + 2e90) - (1e100 + 1e90) is computed to be almost exactly 1e90, as it should be.
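
The arithmetic above can be checked with a short Python sketch, since Python floats are IEEE 754 doubles on common platforms. The last two lines are an addition for illustration: they show that 0.1 is not stored exactly even though it prints as 0.1, because the printed form is the shortest text that round-trips to the same double. That Spark's display behaves the same way is an assumption here, not something the original answer states.

```python
import sys
from decimal import Decimal

# Number of significand bits in a double (including the implicit leading bit).
mantissa_bits = sys.float_info.mant_dig

# The +2 and +1 are below the spacing between doubles near 1e20, so both vanish.
lost = (1e20 + 2) - (1e20 + 1)

# Here the difference (1e90) is well within the available precision.
kept = (1e100 + 2e90) - (1e100 + 1e90)
relative_error = abs(kept - 1e90) / 1e90

# Shortest round-trip text versus the exact stored value.
shortest = repr(0.1)
exact = Decimal(0.1)

print(mantissa_bits, lost, kept, shortest, exact)
```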

Convert two hexadecimal strings into datetime object

I have a hex number (uint32) stored as a character string. The character string is 'DA5CE697'. I want to convert this to a normal hex number on which I can perform some hex arithmetic operations. Are there any functions that can do this in matlab (like str2num for normal numbers)? Or if there is any other way of going about it?
Update
The character string provided above is the first part of an NTP timestamp. I am using:
datetime(t1 + 1/t2, 'ConvertFrom', 'epochtime', 'epoch', '1900-01-01')
To get the exact time from a data file. Both t1 and t2 are 4 bytes. The values for them are:
t1 = 'DA5CE697';
t2 = '7F14FCE7';
Ideally, I could have gone about reading 4 bytes at once and get the values for t1 and t2. But I have to traverse the file 1 byte at a time (some constraints). So, I am stitching back the values for t1 and t2 (to avoid missing zeros. Otherwise, it stores '05' as '5').

Your string is a hexadecimal representation of your uint32 number. If you instead want the integer (decimal) version of your hexadecimal string, you will need to convert it using hex2dec to be able to perform arithmetic within MATLAB.
f = hex2dec('DA5CE697')
%// 3663521431
Alternately, if you want your hexadecimal value to be cast as a floating point number you can instead use hex2num.
MATLAB does not have a hexadecimal datatype so I'm not sure what "hexadecimal arithmetic" you're expecting to be able to perform.
Update
Now that you have provided more information, you will combine t1 and t2 into a time stamp using hex2dec on both of them (as I showed above) and then perform the arithmetic using the decimal values.
t1 = 'DA5CE697';
t2 = '7F14FCE7';
datetime(hex2dec(t1) + 1/hex2dec(t2), 'ConvertFrom', 'epochtime', 'epoch', '1900-01-01')
%// 03-Feb-2016 20:50:31
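
For comparison, the same conversion can be sketched in Python. Parsing the hex strings and adding the seconds to the 1900-01-01 epoch follows the answer above. Treating t2 as a fraction in units of 2^-32 seconds is an assumption based on the standard NTP timestamp layout and differs from the 1/t2 term used in the post; it only affects the sub-second part.

```python
from datetime import datetime, timedelta

t1 = "DA5CE697"
t2 = "7F14FCE7"

# Hex string -> integer, the counterpart of hex2dec.
seconds = int(t1, 16)

# Assumption: standard NTP layout, where the second word counts 2**-32 s units.
fraction = int(t2, 16) / 2**32

ntp_epoch = datetime(1900, 1, 1)
whole = ntp_epoch + timedelta(seconds=seconds)
stamp = whole + timedelta(seconds=fraction)

print(seconds, whole, stamp)
```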

Understanding the datatype Double

I am trying to write a program in C to get the percent of even numbers in an array. I am thinking of writing it using int datatype. But some one mentioned to me using double will be easier. I don't understand that. Can anyone guide me with it?
What does double datatype return?
Can the return statement be given as return (double)? What will that give?
Can double convert a real number to a percent? Eg: 0.5 to 50.0

The int datatype holds, as the name suggests, integers (whole numbers), so it cannot represent a fractional value like 0.5. A double is a floating-point number, so it can hold values like 0.5. Common practice is to store your percentage as a simple fraction like 0.5 (using the double type). Then, when you need to display it nicely as 50.0%, just multiply by 100 before displaying.
Here's a useful link on C's basic datatypes: http://www.tutorialspoint.com/ansi_c/c_basic_datatypes.htm
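
The effect is sketched below in Python rather than C, with a made-up array. Python's // operator stands in for C's int / int division, and / stands in for division carried out in double; the variable names are illustrative.

```python
# Hypothetical input array.
values = [1, 2, 3, 4, 5, 6, 7, 8]

even_count = sum(1 for v in values if v % 2 == 0)

# Integer division discards the fractional part, like int / int in C.
int_ratio = even_count // len(values)

# Floating-point division keeps it, like dividing doubles in C.
ratio = even_count / len(values)
percent = 100.0 * even_count / len(values)

print(int_ratio, ratio, percent)
```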

How to use Bitxor for Double Numbers?

I want to use xor for my double numbers in matlab,but bitxor is only working for int numbers. Is there a function that could convert double to int in Matlab?

The functions you are looking for might be int8(number), int16(number), or uint32(number). Any of them will convert a double to an integer, but you must pick the one that best fits the result you want to achieve. Remember that you cannot cast from double to integer without rounding the number.
If I understood you correctly, you could create a function that simply removes the decimal point from the double by multiplying your starting value by 2^n, casting it to an integer using any of the functions mentioned earlier, performing whatever operation you want, and then returning the decimal point to its original position by dividing the result by 2^n.
Multiplying the starting value by 2^n is a hack that will decrease the rounding error.
The ideal value for n would be the number of digits after the decimal point, if this number is relatively small.
Please also specify why you are trying to do this. This doesn't seem to be the optimal solution.
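
The scaling idea can be sketched as follows, in Python rather than MATLAB. The function name xor_scaled and the sample inputs are hypothetical, and the sketch assumes non-negative inputs whose scaled values fit comfortably in an integer.

```python
def xor_scaled(a: float, b: float, n: int) -> float:
    """XOR two non-negative doubles after scaling them by 2**n."""
    scale = 2 ** n
    ia = round(a * scale)       # shift the fractional bits into the integer part
    ib = round(b * scale)
    return (ia ^ ib) / scale    # XOR as integers, then shift back

# 1.5 * 4 = 6 (0110), 2.25 * 4 = 9 (1001), 6 XOR 9 = 15, 15 / 4 = 3.75
result = xor_scaled(1.5, 2.25, 2)
print(result)
```

Any fractional bits beyond the n kept by the scaling are rounded away before the XOR, so the choice of n decides how much of each input takes part in the operation.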

You can just cast to an integer:
a = 1.003
int8(a)
ans =
1
That gives you an 8-bit signed integer. You can also get other sizes, e.g. int16, or unsigned types, e.g. uint8, depending on what you want to do.
