When I convert doubles such as 0.0001, 0.0002, 0.0003, 0.0004 to BigDecimal,
the result is 0.00010, 0.00020, 0.00030, and so on.
double d = 0.0001;
BigDecimal bigDecimal = BigDecimal.valueOf(d);
My double values have at most 4 decimal places, but an extra trailing zero gets appended, which causes problems in my code.
I want to truncate all decimals of a double without rounding. I have two possibilities here:
double x = 13.5;
int x1 = x.toInt(); // x1 = 13
int x2 = x.floor(); // x2 = 13
Is there any difference between those two approaches?
As explained by the documentation:
floor:
Rounds fractional values towards negative infinity.
toInt:
Equivalent to truncate.
truncate:
Rounds fractional values towards zero.
So floor rounds toward negative infinity, but toInt/truncate round toward zero. For positive values, this doesn't matter, but for negative fractional values, floor will return a number less than the original, whereas toInt/truncate will return a greater number.
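The difference is easy to see with a negative value. The following is a minimal sketch in Python rather than Dart, assuming that Python's math.floor and math.trunc (and int() applied to a float) follow the same two rounding rules described above; the variable names are only illustrative.

```python
import math

positive = 13.5
negative = -13.5

# floor rounds toward negative infinity.
floor_pos = math.floor(positive)   # 13
floor_neg = math.floor(negative)   # -14, less than the original value

# trunc (and int() on a float) rounds toward zero.
trunc_pos = math.trunc(positive)   # 13
trunc_neg = math.trunc(negative)   # -13, greater than the original value
int_neg = int(negative)            # -13, same as trunc

print(floor_pos, trunc_pos)        # identical for positive input
print(floor_neg, trunc_neg)        # differ by one for negative fractions
```

For positive input the two results agree, so the choice only matters if negative fractional values can occur.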
I am casting a 64-bit fixed-point number to floating point. How should that be done in MATLAB? The following two lines give different results. What is the difference between typecast(x, 'double') and double(x)?
temp = 2^32*uint64(MSB) + uint64(LSB);
out_0(1, 1) = typecast(temp, 'double');
out_1(1, 1) = double(temp);
An Example:
temp = 4618350711997530112
data = typecast(temp, 'double')
data =
5.9194
>> double(temp)
ans =
4.6184e+18
If you want to maintain the same number, you should definitely use double to convert it:
double Convert to double precision.
double(X) returns the double precision value for X.
typecast, by contrast, keeps the internal representation: the bytes stay the same but are interpreted differently:
typecast Convert datatypes without changing underlying data.
Y = typecast(X, DATATYPE) convert X to DATATYPE. If DATATYPE has
fewer bits than the class of X, Y will have more elements than X. If
DATATYPE has more bits than the class of X, Y will have fewer
elements than X.
Note that typecast only works when the total number of bytes matches. double has no such restriction, because it represents the same number as closely as possible in double precision. For example, you cannot typecast a single uint32 to double, but you can typecast two uint32 values to one double. If you use double instead, you get one double from one uint32 and two doubles from two.
C++ equivalent
X = double(uint64(123));
=> uint64_t x = 123; double X = static_cast<double>(x);
X = typecast(uint64(123), 'double')
=> uint64_t x = 123; double X; std::memcpy(&X, &x, sizeof X); // needs <cstring>; reinterpret_cast<double>(x) does not compile
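In addition, it seems you have two 32-bit unsigned values, MSB and LSB. To combine them into a uint64 you can use typecast. Both values must be of class uint32, and on a little-endian machine (such as x86) the first element supplies the low-order bytes, so LSB goes first:
U = typecast([uint32(LSB), uint32(MSB)], 'uint64')
Then convert to double, as suggested by m7913d:
D = double(U)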
So you see typecast has a very different function compared to double.
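To make the contrast concrete outside MATLAB, here is a small sketch in Python. It assumes that float() plays the role of double(x) (value conversion) and that packing and unpacking through the standard struct module plays the role of typecast (same 8 bytes, different interpretation). The variable names are illustrative only.

```python
import struct

temp = 4618350711997530112  # the 64-bit integer from the example above

# Value conversion, analogous to double(temp): the number is preserved.
as_value = float(temp)

# Bit reinterpretation, analogous to typecast(temp, 'double'):
# pack the integer into 8 bytes, then read those bytes as an IEEE 754 double.
as_bits = struct.unpack('<d', struct.pack('<Q', temp))[0]

# Combining two 32-bit halves arithmetically, independent of byte order.
msb = temp >> 32
lsb = temp & 0xFFFFFFFF
combined = (msb << 32) | lsb

print(as_value)  # on the order of 4.6184e+18
print(as_bits)   # approximately 5.9194
```

Which of the two is right depends on what the 64 bits mean: if they hold an integer or fixed-point count, convert the value (and scale it afterwards); if they hold the raw bytes of a double, reinterpret them.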
When I output a double (or float) such as 4.999999999999999999999, which has more digits than double precision can hold (about 15), the double prints as 5.000000000000000 and the float as 5.000000.
How can I print the original number without losing precision?
I would let the output speak for itself:
>> numFiles, meanTangle, sdTangle
numFiles =
526
meanTangle =
0.4405
sdTangle =
0.1285
Now, when I create a vector out of these variables:
>> [numFiles meanTangle sdTangle]
ans =
526 0 0
Also, just for clarification:
>> class(numFiles)
ans =
int32
>> class(meanTangle)
ans =
double
>> class(sdTangle)
ans =
double
Why does MATLAB convert floats (meanTangle and sdTangle) to int without cast?
Because your array contains an int32 element, MATLAB converts every double in it to int32 as well. Conversion to an integer class rounds to the nearest integer, so 0.4405 and 0.1285 both become 0.
It converts the entire array into type int32:
>> class(ans)
ans =
int32
For reasons not explained, combining an integer data type in an array with floating point data is defined by MATLAB to return an integer data type.
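For more information, see the related question "Float becomes integer".
Your numFiles is an integer, so MATLAB converts all the other variables to integers as well. To keep the fractional values, convert numFiles to double first: [double(numFiles) meanTangle sdTangle].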
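The zeros in the result follow directly from the conversion rule. The sketch below emulates it in Python; it is not MATLAB code, and to_int32_saturating is a hypothetical helper that assumes MATLAB's int32 conversion rounds to the nearest integer (ties away from zero) and saturates at the int32 limits. NaN and infinities are not handled.

```python
from decimal import Decimal, ROUND_HALF_UP

INT32_MIN = -2**31
INT32_MAX = 2**31 - 1

def to_int32_saturating(value):
    """Hypothetical helper: round to nearest (ties away from zero), clamp to int32."""
    nearest = int(Decimal(value).quantize(Decimal(1), rounding=ROUND_HALF_UP))
    return max(INT32_MIN, min(INT32_MAX, nearest))

# One int32 element forces the whole row to int32.
mixed_row = [526, 0.4405, 0.1285]
as_int32_row = [to_int32_saturating(v) for v in mixed_row]

# Converting the integer to floating point instead keeps the fractions.
as_double_row = [float(v) for v in mixed_row]

print(as_int32_row)
print(as_double_row)
```

The first row reproduces the 526 0 0 seen in the question, and the second shows what converting numFiles to double beforehand preserves.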
I'm trying to get a string from a double like this:
double aDouble;
NSString* doubleString = [NSString stringWithFormat:@"%g", aDouble];
With most numbers I get the desired results, but for
10000.03
I get:
10000
as my string. Is there a way to prevent this behavior? I would like the resulting string to be
10000.03
%g can be tricky in the absence of a precision specifier. Note that %f can be used with double values, and you can limit the number of digits after the decimal point by using %.2f (replace 2 with whichever number of digits you want).
%g is a format specifier that chooses between %f or %e according to the following algorithm:
If a non-zero precision has been specified, let P be that precision
If a zero precision has been specified, let P be 1
If a precision hasn’t been specified, let P be 6
Let X be the exponent if the conversion were to use the %e format specifier
If P > X >= -4, the conversion is with style %f and precision P - (X + 1)
Otherwise, the conversion is with style %e and precision P - 1.
Unless the # flag is used, any trailing zeros are removed from the fractional portion of the result, and the decimal point character is removed if there is no fractional portion remaining.
In your case, %g doesn’t specify a precision, hence P = 6. When converting 10000.03 with %e, which gives 1.000003e+04, the exponent is 4, hence X = 4. Since P > X >= -4, the conversion is with style %f and precision P - (X + 1) = 6 - (4 + 1) = 1. But 10000.03 with precision 1 (one digit after the decimal point) yields 10000.0, which has no actual fractional portion, hence %g formats 10000.03 as 10000.
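The selection rule above can be traced step by step. The sketch below is written in Python, on the assumption that Python's % operator applies the same C-style rules for %e, %f and %g. g_style is a hypothetical helper written for illustration, not a library function, and it ignores the # flag.

```python
def g_style(value, precision=None):
    """Hypothetical re-implementation of the %g selection rule described above."""
    # Determine P from the requested precision.
    if precision is None:
        p = 6
    elif precision == 0:
        p = 1
    else:
        p = precision

    # X is the exponent the %e conversion would produce.
    x = int(('%.*e' % (p - 1, value)).split('e')[1])

    if p > x >= -4:
        # Style %f with precision P - (X + 1).
        text = '%.*f' % (p - (x + 1), value)
        if '.' in text:
            text = text.rstrip('0').rstrip('.')
        return text

    # Style %e with precision P - 1; trim trailing zeros of the mantissa.
    mantissa, exponent = ('%.*e' % (p - 1, value)).split('e')
    if '.' in mantissa:
        mantissa = mantissa.rstrip('0').rstrip('.')
    return mantissa + 'e' + exponent

print(g_style(10000.03))      # P = 6, X = 4: one decimal, then the zero is stripped
print(g_style(10000.03, 7))   # P = 7, X = 4: two decimals survive
print('%.2f' % 10000.03)      # fixed two decimals, as suggested above
```

With the default precision the fractional part is lost, while a precision of 7 (that is, %.7g) or an explicit %.2f keeps 10000.03 intact.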
Try %.2f instead of %g
Floats and doubles are base-two representations of numbers that we like to see in base ten. Just as there is no way to represent 1/3 exactly in base ten with a finite number of digits, there are many base-ten numbers that cannot be represented exactly in base two. For example, 0.1 (1/10) cannot be represented exactly in base two.
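This can be checked directly. The sketch below uses Python, assuming its float is an IEEE 754 double (true on common platforms), and relies on the standard fractions and decimal modules to expose the exact value that is actually stored for 0.1.

```python
from decimal import Decimal
from fractions import Fraction

stored = Fraction(0.1)        # exact value of the double nearest to 0.1
intended = Fraction(1, 10)    # the decimal number we meant

# The stored value is a fraction whose denominator is a power of two,
# so it can only approximate 1/10.
denominator = stored.denominator
is_power_of_two = denominator & (denominator - 1) == 0

print(stored == intended)     # False
print(Decimal(0.1))           # the full decimal expansion of the stored double
print(0.1 + 0.2 == 0.3)       # False: the rounding errors do not cancel
```

When exact decimal digits matter, a decimal type built from a string (such as Decimal('0.1') here, or BigDecimal in Java) avoids the binary approximation.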