How to print a double or float number without losing precision?

When I want to output a double (or float) value such as 4.999999999999999999999, which has more digits than double precision can hold (about 15-17 significant digits), the double prints as 5.000000000000000 and the float prints as 5.000000.
How can I print the original number without losing precision?
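The extra digits are already gone by the time you print: a literal like this is rounded to the nearest representable value when it is parsed, so no format specifier can recover the original text. What you can do is print every digit the stored value actually carries. A minimal C sketch (DBL_DECIMAL_DIG and FLT_DECIMAL_DIG are C11; on older compilers, 17 and 9 work for IEEE doubles and floats):

#include <stdio.h>
#include <float.h>

int main(void)
{
    double d = 4.999999999999999999999;   /* rounds to exactly 5.0 at parse time */
    float  f = 4.999999999999999999999f;  /* likewise rounds to exactly 5.0f */

    /* Enough digits to round-trip the stored values exactly. */
    printf("%.*g\n", DBL_DECIMAL_DIG, d);
    printf("%.*g\n", FLT_DECIMAL_DIG, (double)f);

    /* Both print 5 here: the information was lost before printf ran. */
    return 0;
}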

Related

0.0001 double conversion to BigDecimal gives 0.00010

When I convert doubles like 0.0001, 0.0002, 0.0003, 0.0004 to BigDecimal,
the result is 0.00010, 0.00020, 0.00030
double d = 0.0001;
BigDecimal bigDecimal = BigDecimal.valueOf(d);
My double values have at most 4 decimal places, but one extra digit gets appended, and that causes an issue in my code.

Dart: What is the difference between floor() and toInt()

I want to truncate all decimals of a double without rounding. I have two possibilities here:
double x = 13.5;
int x1 = x.toInt(); // x1 = 13
int x2 = x.floor(); // x2 = 13
Is there any difference between those two approaches?
As explained by the documentation:
floor:
Rounds fractional values towards negative infinity.
toInt:
Equivalent to truncate.
truncate:
Rounds fractional values towards zero.
So floor rounds toward negative infinity, but toInt/truncate round toward zero. For positive values, this doesn't matter, but for negative fractional values, floor will return a number less than the original, whereas toInt/truncate will return a greater number.
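If it helps to see the same distinction outside Dart, C's math library draws the identical line between floor() and trunc(); a small sketch:

#include <stdio.h>
#include <math.h>

int main(void)
{
    double x = -13.5;
    /* floor() rounds toward negative infinity; trunc() rounds toward zero,
       matching Dart's floor() and toInt()/truncate() respectively. */
    printf("floor(%g) = %g\n", x, floor(x));  /* floor(-13.5) = -14 */
    printf("trunc(%g) = %g\n", x, trunc(x));  /* trunc(-13.5) = -13 */
    return 0;
}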

Is there a way to prevent stringWithFormat from rounding?

I'm trying to get a string from a double like this:
double aDouble;
NSString* doubleString = [NSString stringWithFormat:@"%g", aDouble];
With most numbers I get the desired results, but for
10000.03
I get:
10000
as my string. Is there a way to prevent this behavior? I would like a result a string of
10000.03
%g can be tricky in the absence of a precision specifier. Note that %f can be used with double values, and you can limit the number of digits after the decimal point by using %.2f (replace 2 with whichever number of digits you want).
%g is a format specifier that chooses between %f or %e according to the following algorithm:
If a non-zero precision has been specified, let P be that precision
If a zero precision has been specified, let P be 1
If a precision hasn’t been specified, let P be 6
Let X be the exponent if the conversion were to use the %e format specifier
If P > X >= -4, the conversion is with style %f and precision P - (X + 1)
Otherwise, the conversion is with style %e and precision P - 1.
Unless the # flag is used, any trailing zeros are removed from the fractional portion of the result, and the decimal point character is removed if there is no fractional portion remaining.
In your case, %g doesn’t specify a precision, hence P = 6. When converting 10000.03 with %e, which gives 1.000003e+04, the exponent is 4, hence X = 4. Since P > X >= -4, the conversion is with style %f and precision P - (X + 1) = 6 - (4 + 1) = 1. But 10000.03 with precision 1 (one digit after the decimal point) yields 10000.0, which has no actual fractional portion, hence %g formats 10000.03 as 10000.
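A quick C sketch to confirm that derivation (printf's %g follows the same rules stringWithFormat uses here):

#include <stdio.h>

int main(void)
{
    double d = 10000.03;
    printf("%g\n", d);    /* P = 6, X = 4: %f with precision 1, trailing zero stripped -> 10000 */
    printf("%.8g\n", d);  /* raising the precision keeps the fraction -> 10000.03 */
    printf("%.2f\n", d);  /* two digits after the point -> 10000.03 */
    return 0;
}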
Try %.2f instead of %g
Floats and doubles are base-two representations of numbers that we like to see in base ten. Just as there is no way to represent 1/3 exactly in base ten with a finite number of digits, there are many base-10 numbers that cannot be represented exactly in base 2. For example, 0.1 (1/10) cannot be represented exactly in base 2.
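You can watch this happen by asking printf for more digits than the double actually received; a tiny C sketch:

#include <stdio.h>

int main(void)
{
    /* 0.1 has no exact base-2 representation; the double stores the
       nearest representable value instead. */
    printf("%.20f\n", 0.1);  /* prints 0.10000000000000000555 */
    return 0;
}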

Strange problem comparing floats in objective-C

At some point in an algorithm I need to compare the float value of a property of a class to a float. So I do this:
if (self.scroller.currentValue <= 0.1) {
}
where currentValue is a float property.
However, when the values should be equal, i.e. self.scroller.currentValue is 0.1, the if statement is not satisfied and the code is not executed! I found out that I can fix this by casting 0.1 to float, like this:
if (self.scroller.currentValue <= (float)0.1) {
}
This works fine.
Can anyone explain to my why this is happening? Is 0.1 defined as a double by default or something?
Thanks.
I believe, having not found the standard that says so, that when comparing a float to a double the float is cast to a double before comparing. Floating point numbers without a modifier are considered to be double in C.
However, in C there is no exact representation of 0.1 in either floats or doubles. Using a float gives you a small error; using a double gives you an even smaller error. The problem is that by casting the float to a double you carry over the float's bigger error, so of course the two values aren't going to compare equal.
Instead of using (float)0.1 you could use 0.1f which is a bit nicer to read.
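A small C sketch of the promotion at work (the same arithmetic applies in Objective-C):

#include <stdio.h>

int main(void)
{
    float f = 0.1f;
    /* f is promoted to double before each comparison. The promoted value
       keeps float's larger rounding error, so it differs from the double
       literal 0.1 but matches the promoted float literal 0.1f. */
    printf("f == 0.1  -> %d\n", f == 0.1);   /* 0 */
    printf("f == 0.1f -> %d\n", f == 0.1f);  /* 1 */
    return 0;
}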
The problem is, as you have suggested in your question, that you are comparing a float with a double.
There is a more general problem with comparing floats, this happens because when you do a calculation on a floating point number the result from the calculation may not be exactly what you expect. It is fairly common that the last bit of the resulting float will be wrong (although the inaccuracy can be larger than just the last bit). If you use == to compare two floats then all the bits have to be the same for the floats to be equal. If your calculation gives a slightly inaccurate result then they won't compare equal when you expect them to. Instead of comparing the values like this, you can compare them to see if they are nearly equal. To do this you can take the positive difference between the floats and see if it is smaller than a given value (called an epsilon).
To choose a good epsilon you need to understand a bit about floating point numbers. Floating point numbers work similarly to representing a number to a given number of significant figures. If we work to 5 significant figures and your calculation results in the last digit of the result being wrong, then 1.2345 will have an error of ±0.0001 whereas 1234500 will have an error of ±100. If you always base your margin of error on the value 1.2345, then your compare routine will be identical to == for all values greater than 10 (when using decimal). It's worse in binary: there it's all values greater than 2. This means that the epsilon we choose has to be relative to the size of the floats we are comparing.
FLT_EPSILON is the gap between 1 and the next closest float. This means that it may be a good epsilon to choose if your number is between 1 and 2, but if your value is greater than 2 using this epsilon is pointless because the gap between 2 and the next nearest float is larger than epsilon. So we have to choose an epsilon relative to the size of our floats (as the error in the calculation is relative to the size of our floats).
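You can verify those gap sizes with nextafterf, which returns the next representable float in a given direction; a tiny sketch:

#include <stdio.h>
#include <float.h>
#include <math.h>

int main(void)
{
    /* The gap between adjacent floats doubles at each power of two. */
    printf("gap after 1.0f: %g (FLT_EPSILON = %g)\n",
           nextafterf(1.0f, 2.0f) - 1.0f, FLT_EPSILON);  /* 1.19209e-07 both */
    printf("gap after 2.0f: %g\n",
           nextafterf(2.0f, 4.0f) - 2.0f);               /* 2.38419e-07 */
    return 0;
}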
A good(ish) floating point compare routine looks something like this:
#include <float.h>    /* FLT_EPSILON */
#include <math.h>     /* scalbnf, ilogb, fabsf */
#include <stdbool.h>

bool compareNearlyEqual (float a, float b, unsigned epsilonMultiplier)
{
    float epsilon;
    /* May as well do the easy check first. */
    if (a == b)
        return true;
    if (a > b) {
        epsilon = scalbnf(1.0f, ilogb(a)) * FLT_EPSILON * epsilonMultiplier;
    } else {
        epsilon = scalbnf(1.0f, ilogb(b)) * FLT_EPSILON * epsilonMultiplier;
    }
    return fabsf (a - b) <= epsilon;
}
This comparison routine compares floats relative to the size of the largest float passed in. scalbnf(1.0f, ilogb(a)) * FLT_EPSILON finds the gap between a and the next nearest float. This is then multiplied by the epsilonMultiplier, so the size of the difference can be adjusted, depending on how inaccurate the result of the calculation is likely to be.
You can make a simple compareLessThan routine like this:
bool compareLessThan (float a, float b, unsigned epsilonMultiplier)
{
    /* Only report "less than" once the values are distinguishable. */
    if (compareNearlyEqual (a, b, epsilonMultiplier))
        return false;
    return a < b;
}
You could also write a very similar compareGreaterThan function.
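As a usage sketch, the classic failing case is summing 0.1f ten times: == fails on the accumulated error, while the relative comparison accepts it (this assumes compareNearlyEqual from above is in scope):

#include <stdio.h>

int main(void)
{
    float sum = 0.0f;
    for (int i = 0; i < 10; i++)
        sum += 0.1f;  /* accumulates one ulp of error: sum is 1.0000001f */

    printf("%d\n", sum == 1.0f);                       /* 0: exact compare fails   */
    printf("%d\n", compareNearlyEqual(sum, 1.0f, 4));  /* 1: within 4 gaps of 1.0f */
    return 0;
}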
It's worth noting that comparing floats like this may not always be what you want. For instance this will never find that a float is close to 0 unless it is 0. To fix this you'd need to decide what value you thought was close to zero, and write an additional test for this.
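One possible shape for that extra test; absoluteEpsilon here is an application-specific choice, and isNearlyZero is a hypothetical helper, not a standard function:

#include <math.h>
#include <stdbool.h>

/* Hypothetical helper: treats values within an application-chosen
   absolute tolerance as close enough to zero. */
bool isNearlyZero (float a, float absoluteEpsilon)
{
    return fabsf(a) <= absoluteEpsilon;
}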
Sometimes the inaccuracies you get won't depend on the size of the result of a calculation, but will depend on the values that you put into a calculation. For instance sin(1.0f + (float)(200 * M_PI)) will give a much less accurate result than sin(1.0f) (the results should be identical). In this case your compare routine would have to look at the number you put into the calculation to know the margin of error of the answer.
Doubles and floats store different numbers of mantissa bits in binary (float has 23, double has 52). These will almost never be equal.
The IEEE floating point article on Wikipedia may help you understand this distinction.
In C, a floating-point literal like 0.1 is a double, not a float. Since the types of the data items being compared are different, the comparison is done in the more precise type (double). In all implementations I know about, float has a shorter representation than double (something like 7 vs. 16 significant decimal digits). Moreover, the arithmetic is in binary, and 1/10 does not have an exact representation in binary.
Therefore, you're taking a float 0.1, which loses accuracy, extending it to double, and expecting it to compare equal to a double 0.1, which loses less accuracy.
Suppose we were doing this in decimal, with float being three digits and double being six, and we were comparing to 1/3.
We have the stored float value being 0.333. We're comparing it to a double with value 0.333333. We convert the float 0.333 to double 0.333000, and find it different.
0.1 is actually a very difficult value to store in binary. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
As several have pointed out, the comparison has to be made with a constant of the exact same precision.
Generally, in any language, you can't really count on equality of float-like types. In your case, it does appear that 0.1 is not a float by default; you can confirm that with sizeof(0.1) (vs. sizeof(self.scroller.currentValue)).
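sizeof does settle it; a quick C check (4 and 8 bytes are the typical sizes on mainstream platforms, though the standard doesn't mandate them):

#include <stdio.h>

int main(void)
{
    /* An unsuffixed floating-point literal is a double; the f suffix makes it a float. */
    printf("sizeof(0.1)  = %zu\n", sizeof(0.1));   /* 8 on typical platforms */
    printf("sizeof(0.1f) = %zu\n", sizeof(0.1f));  /* 4 on typical platforms */
    return 0;
}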
Convert it to a string, then compare:
NSString* numberA = [NSString stringWithFormat:@"%.6f", a];
NSString* numberB = [NSString stringWithFormat:@"%.6f", b];
return [numberA isEqualToString: numberB];

t-sql decimal assignment changes value

Why does the select statement below return two different values?
declare @tempDec decimal
set @tempDec = 1.0 / (1.0 + 1.0)
select @tempDec, 1.0 / (1.0 + 1.0)
That's fine for literals like 1.0, but if you're pulling the data from table columns, you need to cast/convert the first evaluated number in your equation:
convert(decimal, [col1]) / ([col2] + [col3])
-or-
convert(decimal(15, 2), [col1]) / ([col2] + [col3])
I found out from a coworker just as I posted this.
You need to specify the precision and scale; otherwise you get the defaults, which for decimal are precision 18 and scale 0, so the assigned value keeps no digits after the decimal point.
This works in this scenario:
declare @tempDec decimal(3,2)
From MSDN:
decimal[ (p[ , s] )] and numeric[ (p[ , s] )]
Fixed precision and scale numbers. When maximum precision is used, valid values are from -10^38 + 1 through 10^38 - 1. The SQL-92 synonyms for decimal are dec and dec(p, s). numeric is functionally equivalent to decimal.
p (precision)
The maximum total number of decimal digits that can be stored, both to the left and to the right of the decimal point. The precision must be a value from 1 through the maximum precision of 38. The default precision is 18.
s (scale)
The maximum number of decimal digits that can be stored to the right of the decimal point. Scale must be a value from 0 through p. Scale can be specified only if precision is specified. The default scale is 0; therefore, 0 <= s <= p. Maximum storage sizes vary, based on the precision.