I want to truncate all decimals of a double without rounding. I have two possibilities here:
double x = 13.5;
int x1 = x.toInt(); // x1 = 13
int x2 = x.floor(); // x2 = 13
Is there any difference between those two approaches?
As explained by the documentation:
floor:
Rounds fractional values towards negative infinity.
toInt:
Equivalent to truncate.
truncate:
Rounds fractional values towards zero.
So floor rounds toward negative infinity, but toInt/truncate round toward zero. For positive values, this doesn't matter, but for negative fractional values, floor will return a number less than the original, whereas toInt/truncate will return a greater number.
I tried this code on Dart: I get 28.5
void main() {
double modulo = -1.5 % 30.0;
print(modulo);
}
The same code in Javascript returns -1.5
let modulo = -1.5 % 30;
console.log(modulo);
What is the equivalent of the javascript code above in Dart ?
The documentation for num.operator % states (emphasis mine):
Returns the remainder of the Euclidean division. The Euclidean division of two integers a and b yields two integers q and r such that a == b * q + r and 0 <= r < b.abs().
...
The sign of the returned value r is always positive.
See remainder for the remainder of the truncating division.
Meanwhile, num.remainder says (again, emphasis mine):
The result r of this operation satisfies: this == (this ~/ other) * other + r. As a consequence the remainder r has the same sign as the divider this.
So if you use:
void main() {
double modulo = (-1.5).remainder(30.0);
print(modulo);
}
you'll get -1.5.
Note that both values are mathematically correct; there are two different answers that correspond to the two different ways that you can compute a negative quotient when performing integer division. You have a choice between rounding a negative quotient toward zero (also known as truncation) or toward negative infinity (flooring). remainder corresponds to a truncating division, and % corresponds to a flooring division.
An issue was raised about this in the dart-lang repo a while ago. Apparently the % symbol in dart is an "Euclidean modulo operator rather than remainder."
An equivalent of what you are trying to do can be accomplished with
(-1.5).remainder(30.0)
In this answer gire mentioned to better not use == when comparing doubles.
When creating a increment variable in a for loop using start:step:stop notation, it's type will be of double. If one wants to use this loop variable for indexing and == comparisons, might that cause problems due to floating point precision?!
Should one use integers? If so, is there a way to do so with the s:s:s notation?
Here's an example
a = rand(1, 5);
for ii = length(a):-1:1
if (ii == 1) % Comparing var of type double with ==
b = 0;
else
b = a(ii); % Using double for indexing
end
... % Code
end
Note that the floating point double specification uses 52 bits to store the mantissa (the part after the decimal point) so you can exactly represent any integer in the range
-4503599627370496 <= x <= 4503599627370496
Note that this is larger than the range of an int32, which can only represent
-2147483648 <= x <= 2147483647
If you are just using the double as a loop variable, and only incrementing it in integer steps, and you are not counting above 4,503,599,627,370,496 then you are fine to use a double, and to use == to compare doubles.
One reason people suggest for not using doubles is that you can't represent some common decimals exactly, e.g. 0.1 has no exact representation as a double. Therefore if you are working with monetary values, it may be better to separately store the data as an int and remember a scale factor of 10x or 100x or whatever.
It's sometimes bad to directly compare floating point numbers for equality because rounding issues can cause two floats to be not equal, even though the numbers are mathematically equal. This generally happens when the numbers are not exactly representable as floats, or when there is a significant size difference between the numbers, e.g.
>> 0.3 - 0.2 == 0.1
ans =
0
If you're indexing between integer bounds with integer steps (even though the variable class is actually double), it is ok to use == for comparisons with other integers.
You can cast the indices, if you really want to be safe.
For example:
for ii = int16(length(a):-1:1)
if (ii == 1)
b = 0;
end
end
newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entry in newT rounded to integer values? Why aren't they correctly assigned?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];
At some point in an algorithm I need to compare the float value of a property of a class to a float. So I do this:
if (self.scroller.currentValue <= 0.1) {
}
where currentValue is a float property.
However, when I have equality and self.scroller.currentValue = 0.1 the if statement is not fulfilled and the code not executed! I found out that I can fix this by casting 0.1 to float. Like this:
if (self.scroller.currentValue <= (float)0.1) {
}
This works fine.
Can anyone explain to my why this is happening? Is 0.1 defined as a double by default or something?
Thanks.
I believe, having not found the standard that says so, that when comparing a float to a double the float is cast to a double before comparing. Floating point numbers without a modifier are considered to be double in C.
However, in C there is no exact representation of 0.1 in floats and doubles. Now, using a float gives you a small error. Using a double gives you an even smaller error. The problem now is, that by casting the float to a double you carry over the bigger of error of the float. Of course they aren't gone compare equal now.
Instead of using (float)0.1 you could use 0.1f which is a bit nicer to read.
The problem is, as you have suggested in your question, that you are comparing a float with a double.
There is a more general problem with comparing floats, this happens because when you do a calculation on a floating point number the result from the calculation may not be exactly what you expect. It is fairly common that the last bit of the resulting float will be wrong (although the inaccuracy can be larger than just the last bit). If you use == to compare two floats then all the bits have to be the same for the floats to be equal. If your calculation gives a slightly inaccurate result then they won't compare equal when you expect them to. Instead of comparing the values like this, you can compare them to see if they are nearly equal. To do this you can take the positive difference between the floats and see if it is smaller than a given value (called an epsilon).
To choose a good epsilon you need to understand a bit about floating point numbers. Floating point numbers work similarly to representing a number to a given number of significant figures. If we work to 5 significant figures and your calculation results in the last digit of the result being wrong then 1.2345 will have an error of +-0.0001 whereas 1234500 will have an error of +-100. If you always base your margin of error on the value 1.2345 then your compare routine will be identical to == for all values great than 10 (when using decimal). This is worse in binary, it's all values greater than 2. This means that the epsilon we choose has to be relative to the size of the floats that we are comparing.
FLT_EPSILON is the gap between 1 and the next closest float. This means that it may be a good epsilon to choose if your number is between 1 and 2, but if your value is greater than 2 using this epsilon is pointless because the gap between 2 and the next nearest float is larger than epsilon. So we have to choose an epsilon relative to the size of our floats (as the error in the calculation is relative to the size of our floats).
A good(ish) floating point compare routine looks something like this:
bool compareNearlyEqual (float a, float b, unsigned epsilonMultiplier)
{
float epsilon;
/* May as well do the easy check first. */
if (a == b)
return true;
if (a > b) {
epsilon = scalbnf(1.0f, ilogb(a)) * FLT_EPSILON * epsilonMultiplier;
} else {
epsilon = scalbnf(1.0, ilogb(b)) * FLT_EPSILON * epsilonMultiplier;
}
return fabs (a - b) <= epsilon;
}
This comparison routine compares floats relative to the size of the largest float passed in. scalbnf(1.0f, ilogb(a)) * FLT_EPSILON finds the gap between a and the next nearest float. This is then multiplied by the epsilonMultiplier, so the size of the difference can be adjusted, depending on how inaccurate the result of the calculation is likely to be.
You can make a simple compareLessThan routine like this:
bool compareLessThan (float a, float b, unsigned epsilonMultiplier)
{
if (compareNearlyEqual (a, b, epsilonMultiplier)
return false;
return a < b;
}
You could also write a very similar compareGreaterThan function.
It's worth noting that comparing floats like this may not always be what you want. For instance this will never find that a float is close to 0 unless it is 0. To fix this you'd need to decide what value you thought was close to zero, and write an additional test for this.
Sometimes the inaccuracies you get won't depend on the size of the result of a calculation, but will depend on the values that you put into a calculation. For instance sin(1.0f + (float)(200 * M_PI)) will give a much less accurate result than sin(1.0f) (the results should be identical). In this case your compare routine would have to look at the number you put into the calculation to know the margin of error of the answer.
Doubles and floats have different values for the mantissa store in binary (float is 23 bits, double 54). These will almost never be equal.
The IEEE Float Point article on wikipedia may help you understand this distinction.
In C, a floating-point literal like 0.1 is a double, not a float. Since the types of the data items being compared are different, the comparison is done in the more precise type (double). In all implementations I know about, float has a shorter representation than double (usually expressed as something like 6 vs. 14 decimal places). Moreover, the arithmetic is in binary, and 1/10 does not have an exact representation in binary.
Therefore, you're taking a float 0.1, which loses accuracy, extending it to double, and expecting it to compare equal to a double 0.1, which loses less accuracy.
Suppose we were doing this in decimal, with float being three digits and double being six, and we were comparing to 1/3.
We have the stored float value being 0.333. We're comparing it to a double with value 0.333333. We convert the float 0.333 to double 0.333000, and find it different.
0.1 is actually a very dificult value to store binary. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
As several has pointed out, the comparison has to made with a constant of the exact same precision.
Generally, in any language, you can't really count on equality of float-like types. In your case since it looks like you have more control, it does appear that 0.1 is not float by default. You could probably find that out with sizeof(0.1) (vs. sizeof(self.scroller.currentValue).
Convert it to a string, then compare:
NSString* numberA = [NSString stringWithFormat:#"%.6f", a];
NSString* numberB = [NSString stringWithFormat:#"%.6f", b];
return [numberA isEqualToString: numberB];