Okay to use double for == comparison and indexing? - matlab

In this answer, gire mentioned that it is better not to use == when comparing doubles.
When creating an increment variable in a for loop using start:step:stop notation, its type will be double. If one wants to use this loop variable for indexing and == comparisons, might that cause problems due to floating point precision?
Should one use integers? If so, is there a way to do so with the start:step:stop notation?
Here's an example:
a = rand(1, 5);
for ii = length(a):-1:1
    if (ii == 1)    % Comparing var of type double with ==
        b = 0;
    else
        b = a(ii);  % Using double for indexing
    end
    % ... more code
end

Note that the IEEE double format stores 52 bits of mantissa (the fraction after the binary point) plus an implicit leading 1 bit, giving 53 bits of precision, so you can exactly represent every integer in the range
-9007199254740992 <= x <= 9007199254740992 (i.e. -2^53 <= x <= 2^53)
Note that this is larger than the range of an int32, which can only represent
-2147483648 <= x <= 2147483647
If you are just using the double as a loop variable, only incrementing it in integer steps, and not counting above 9,007,199,254,740,992, then you are fine to use a double and to use == to compare doubles.
One reason people suggest for not using doubles is that you can't represent some common decimals exactly, e.g. 0.1 has no exact representation as a double. Therefore if you are working with monetary values, it may be better to separately store the data as an int and remember a scale factor of 10x or 100x or whatever.
It's sometimes bad to directly compare floating point numbers for equality because rounding issues can cause two floats to be not equal, even though the numbers are mathematically equal. This generally happens when the numbers are not exactly representable as floats, or when there is a significant size difference between the numbers, e.g.
>> 0.3 - 0.2 == 0.1
ans =
0

If you're indexing between integer bounds with integer steps (even though the variable class is actually double), it is ok to use == for comparisons with other integers.
You can cast the indices, if you really want to be safe.
For example:
for ii = int16(length(a):-1:1)
if (ii == 1)
b = 0;
end
end


How do I prevent precision loss in table values?

I currently have the following code:
count = 20;
n = zeros(count, 1);
P = zeros(count, 1);
for i = 1:count
n(i) = i;
P(i) = sym(i)^i + (sym(1-i))^(i-1);
if i == (count)
T = table(n,P)
end
end
This gives me a table with a set of values. However, some of the values are losing precision because they have too many digits.
I am aware that MATLAB allows for up to 536870913 digits of precision. How do I make my table values not lose precision?
Note: if I were to just do the following operation (for example): sym(200)^2010, I would get the full precision value, which has 4626 digits. Doing this for table values doesn't seem to work, though, for some strange reason.
If someone could please help me with this I would be extremely grateful as I have been struggling with this for several hours now.
As @Daniel commented, the problem is that you are casting to double when storing the values in P. MATLAB only has the precision you mention when using symbolic variables; once you get into the numerical world, you can only store a finite amount of precision.
To be exact, once you define P as a double (zeros returns a double vector), the biggest integer you can store such that all smaller integers are also exact is 2^53, way smaller than your P(20). This means that an integer bigger than 2^53 is not guaranteed to be stored exactly in a double-valued vector.
Your solution is thus to avoid the cast by storing the values in a sym-typed P. Note that the above also applies to later maths: if you plan to use this variable in some equation, remember that when you convert it to numeric form, you will lose precision. Often this does not matter, as the precision lost is very small, but you should know it.
If you want to learn more about how numerical precision works on computers, I suggest reading the following Q&A: Why is 24.0000 not equal to 24.0000 in MATLAB?
Sym solution:
count = 20;
n = zeros(count, 1);
P = sym(zeros(count, 1));  % symbolic column vector, so no cast to double
for i = 1:count
    n(i) = i;
    P(i) = sym(i)^i + (sym(1-i))^(i-1);
end
T = table(n, P)
returns
>> T.P(20)
ans =
102879180344339686410876021

What happens when I change int 8 to int 16

I understand that the int type is related to storage capacity. But if I change int8 to int16, will only the capacity be altered?
Since the other answers have not yet formalized this, I will give a more detailed explanation, including some keywords to look up.
The range of a data type follows directly from its storage size (the number of bits).
int8 - 8 bits signed integer, MSB (most significant bit) representing the sign. Range [-2^7,2^7-1] = [-128,127]
uint8 - 8 bits unsigned integer, MSB denotes the highest power of 2. Range [0,2^8-1] = [0,255]
int16 - 16 bits signed integer
uint16 - 16 bits unsigned integer
I could keep going but you probably get the picture. There is also int32, uint32, int64, uint64.
There is also char, which is normally used for text but can also be used instead of uint8, with the difference that a char is displayed as a character rather than a number (note that in MATLAB a char is 16 bits, not 8). Using char this way is common in many C-like languages.
The types float (called single in MATLAB) and double are different, since they use a floating point representation standardized by the IEEE (IEEE 754). The format is designed to represent both very large and very small numbers with a fixed relative precision.
This data type uses an exponential representation of numbers: it allocates a fixed number of bits to the exponent and to the precision digits (the mantissa), plus one bit for the sign. A floating point number is laid out like this:
(sign), exponent, mantissa
For double the bit allocation is 1 bit for sign, 11 bits for exponent and 52 bits for mantissa. For single it is 1 bit for sign, 8 bits for exponent and 23 bits for mantissa.
This is some necessary background for discussing type conversion. When discussing type conversion you normally speak about implicit and explicit conversion. These terms are less relevant in MATLAB, since a variable simply takes the type of the value assigned to it:
a = 2;         % a is a double
b = a;         % b is a double
c = int8(57);  % c is an int8
d = c;         % d is an int8
Explicit conversion, however, is done with the built-in conversion functions:
c = int8(57);
d = char(c);   % d is the character '9' (character code 57)
When discussing different kinds of conversion we often talk about type promotion and type demotion. Type promotion is when a data type of lower precision is promoted to a type of higher precision.
a = int8(57);
b = int16(a);
This is lossless and considered safe. Type demotion, on the other hand, is when a type of higher precision is converted to a type of lower precision.
c = int16(1234);
d = int8(c);   % Unsafe! d saturates to 127: data loss
This is generally considered risky and should be avoided. The term type demotion is not used very often, since this kind of conversion is uncommon. A conversion from higher to lower precision needs to be checked:
function b = int16ToInt8(a)
    if (any(a < -128 | a > 127))
        error('Variable is out of range, conversion cannot be done.');
    end
    b = int8(a);
end
In many languages (Java, for example), type demotion cannot be done implicitly. Conversion between floating point types and integer types should also be avoided where possible.
An important note here is how MATLAB initializes variables. If a "constructor" is not used (like a = int8(57)), MATLAB automatically makes the variable double precision. Also, when you create a vector/matrix of integers like int64([257,3745,67]), the array literal is built first, as doubles, and int64() is then called on that matrix of doubles. This matters when the integers need more precision than the 53 bits a double can hold exactly, because they are rounded before the conversion. So
int64([2^53+2^0,2^54+2^1, 2^59+2^2]) ~=
[int64(2^53) + int64(2^0), int64(2^54)+ int64(2^1), int64(2^59)+ int64(2^2)]
Further, if memory allows, it is commonly recommended to use int32 or int64 for integers and double for floating point values.
Each type of integer has a different range of storage capacity:
int8: Values range from -128 to 127
int16: Values range from -32,768 to 32,767
You only need to worry about data loss when converting to a lower precision data type. Because int16 is higher precision than int8, your existing data will remain intact, and your data can span a much wider range of values (65,536 distinct values instead of 256) at the cost of taking up twice as much space (2 bytes vs. 1 byte):
a = int8(127);
b = int16(a);
a == b
% 1
whos('a', 'b')
% Name Size Bytes Class Attributes
%
% a 1x1 1 int8
% b 1x1 2 int16
int8 variables can range from -128 to 127, while this range for int16 class is from -32,768 to 32,767. Obviously, memory is the price to pay for the wider range ;)
Note 1: These limits apply not only to the variables when you define them, but usually also to the outputs of calculations!
Example:
>> A = int8([0, 10, 20, 30]);
>> A .^ 2
ans =
0 100 127 127
>> int16(A) .^ 2
ans =
0 100 400 900
Note 2: Once you switch to int16, usually you should do it for all variables that participate in calculations together.
Example:
>> A + int16(A)
Error using +
Integers can only be combined with integers of the same class, or scalar doubles.

MATLAB - integers vs decimals assignment strange bug

newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entry in newT rounded to integer values? Why aren't they correctly assigned?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];

Filter some type of data in float

How do I tell apart float values that have only zeros after the decimal point from those that also have non-zero digits after it?
For example:
13.000000
13.120001
I want it like this:
13.0
13.120001
If your application stores the floating point values in something like C/C++'s float's or double's, you may have a problem here.
First of all, float/double in the majority of cases represents values using base-2 and when you do something like double x = 0.1;, x in fact will never be equal to 0.1, but it will be very close to 0.1. That is because not every decimal (base-10) fraction can be represented exactly in base-2.
When you print or convert that x to a string using one of the printf-like functions, the resultant string can vary from something like 0.099999999 to 0.1 to 0.10000000000000000555. The result will depend on what's in x and on how you convert it to a string (how many digits you allow for the integer part, fractional part or all).
Generally there's no one-size-fits-all solution here. In one case the exactness and rounding issues can be unimportant, while in another they can be the most important. When performance is more important than the exact representation of decimal fractions, you use base-2 types. Otherwise, for example, when you're counting money, you use special types (and you may first need to construct them) that can represent decimal fractions exactly and then you avoid some of the described issues at a cost of extra computation time and maybe storage.
You may sometimes use fixed-point types and arithmetic. For example, if your numbers should not have more than 3 fractional digits and aren't too big, you can use scaled integers:
long x = 123456; // x represents 123.456
You add, subtract and compare them just as regular integers, but you multiply and divide them differently, for example like this:
long x = 123456; // x represents 123.456
long y = 12345; // y represents 12.345
long p = (long)(((long long)x * y) / 1000); // p=1524064, representing x*y=1524.064
long q = (long)(((long long)x * 1000) / y); // q=10000, representing x/y=10.000
And in this case it is relatively easy to figure out the digits of the fractional part, just look at (x%1000)/100, (x%100)/10 and x%10.
Split the value into its integer part and its fractional part, e.g. 13.20001 into 13 and 20001. If the fractional part is greater than 0 (as in 13.20001, where 20001 > 0), keep the value as it is; otherwise (as in 13.00000, where the fractional part is 0) print it with %.2f, giving 13.00.

Strange problem comparing floats in objective-C

At some point in an algorithm I need to compare the float value of a property of a class to a float. So I do this:
if (self.scroller.currentValue <= 0.1) {
}
where currentValue is a float property.
However, when I have equality and self.scroller.currentValue = 0.1 the if statement is not fulfilled and the code not executed! I found out that I can fix this by casting 0.1 to float. Like this:
if (self.scroller.currentValue <= (float)0.1) {
}
This works fine.
Can anyone explain to me why this is happening? Is 0.1 defined as a double by default or something?
Thanks.
I believe, having not found the part of the standard that says so, that when comparing a float to a double, the float is converted to a double before comparing. Floating point literals without a suffix are considered double in C.
However, in C there is no exact representation of 0.1 in either float or double. Using a float gives you a small error; using a double gives you an even smaller error. The problem is that by converting the float to a double you carry over the bigger error of the float, so of course they aren't going to compare equal.
Instead of using (float)0.1 you could use 0.1f which is a bit nicer to read.
The problem is, as you have suggested in your question, that you are comparing a float with a double.
There is a more general problem with comparing floats, this happens because when you do a calculation on a floating point number the result from the calculation may not be exactly what you expect. It is fairly common that the last bit of the resulting float will be wrong (although the inaccuracy can be larger than just the last bit). If you use == to compare two floats then all the bits have to be the same for the floats to be equal. If your calculation gives a slightly inaccurate result then they won't compare equal when you expect them to. Instead of comparing the values like this, you can compare them to see if they are nearly equal. To do this you can take the positive difference between the floats and see if it is smaller than a given value (called an epsilon).
To choose a good epsilon you need to understand a bit about floating point numbers. Floating point numbers work similarly to representing a number to a given number of significant figures. If we work to 5 significant figures and your calculation results in the last digit of the result being wrong then 1.2345 will have an error of +-0.0001 whereas 1234500 will have an error of +-100. If you always base your margin of error on the value 1.2345 then your compare routine will be identical to == for all values greater than 10 (when using decimal). This is worse in binary, where it's all values greater than 2. This means that the epsilon we choose has to be relative to the size of the floats that we are comparing.
FLT_EPSILON is the gap between 1 and the next closest float. This means that it may be a good epsilon to choose if your number is between 1 and 2, but if your value is greater than 2 using this epsilon is pointless because the gap between 2 and the next nearest float is larger than epsilon. So we have to choose an epsilon relative to the size of our floats (as the error in the calculation is relative to the size of our floats).
A good(ish) floating point compare routine looks something like this:
#include <float.h>
#include <math.h>
#include <stdbool.h>

bool compareNearlyEqual (float a, float b, unsigned epsilonMultiplier)
{
    float epsilon;
    /* May as well do the easy check first. */
    if (a == b)
        return true;
    /* Scale epsilon to the float with the larger magnitude. */
    if (fabsf(a) > fabsf(b)) {
        epsilon = scalbnf(1.0f, ilogbf(a)) * FLT_EPSILON * epsilonMultiplier;
    } else {
        epsilon = scalbnf(1.0f, ilogbf(b)) * FLT_EPSILON * epsilonMultiplier;
    }
    return fabsf(a - b) <= epsilon;
}
This comparison routine compares floats relative to the size of the largest float passed in. scalbnf(1.0f, ilogb(a)) * FLT_EPSILON finds the gap between a and the next nearest float. This is then multiplied by the epsilonMultiplier, so the size of the difference can be adjusted, depending on how inaccurate the result of the calculation is likely to be.
You can make a simple compareLessThan routine like this:
bool compareLessThan (float a, float b, unsigned epsilonMultiplier)
{
    if (compareNearlyEqual (a, b, epsilonMultiplier))
        return false;
    return a < b;
}
You could also write a very similar compareGreaterThan function.
It's worth noting that comparing floats like this may not always be what you want. For instance this will never find that a float is close to 0 unless it is 0. To fix this you'd need to decide what value you thought was close to zero, and write an additional test for this.
Sometimes the inaccuracies you get won't depend on the size of the result of a calculation, but will depend on the values that you put into a calculation. For instance sin(1.0f + (float)(200 * M_PI)) will give a much less accurate result than sin(1.0f) (the results should be identical). In this case your compare routine would have to look at the number you put into the calculation to know the margin of error of the answer.
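Doubles and floats store a different number of mantissa bits (float stores 23 bits, double 52), so a float and a double made from the same decimal constant will almost never be equal, unless the value is exactly representable as a float (0.5, for example).
The IEEE floating point article on Wikipedia may help you understand this distinction.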
In C, a floating-point literal like 0.1 is a double, not a float. Since the types of the data items being compared are different, the comparison is done in the more precise type (double). In all implementations I know about, float has a shorter representation than double (usually expressed as something like 6 vs. 15 significant decimal digits). Moreover, the arithmetic is in binary, and 1/10 does not have an exact representation in binary.
Therefore, you're taking a float 0.1, which loses accuracy, extending it to double, and expecting it to compare equal to a double 0.1, which loses less accuracy.
Suppose we were doing this in decimal, with float being three digits and double being six, and we were comparing to 1/3.
We have the stored float value being 0.333. We're comparing it to a double with value 0.333333. We convert the float 0.333 to double 0.333000, and find it different.
0.1 is actually a very difficult value to store in binary. In base 2, 1/10 is the infinitely repeating fraction
0.0001100110011001100110011001100110011001100110011...
As several have pointed out, the comparison has to be made with a constant of the exact same precision.
Generally, in any language, you can't really count on equality of float-like types. In your case, since it looks like you have more control, it does appear that 0.1 is not float by default. You could probably find that out with sizeof(0.1) (vs. sizeof(self.scroller.currentValue)).
Convert it to a string, then compare:
NSString* numberA = [NSString stringWithFormat:@"%.6f", a];
NSString* numberB = [NSString stringWithFormat:@"%.6f", b];
return [numberA isEqualToString: numberB];