As this example suggests:
0.1 + 0.1 + 0.1 == 0.3
gives the surprising result of false due to precision issues when storing 0.1.
However,
1 + 1 + 1 == 3
evaluates to true as expected. Note that this still involves comparing floating-point numbers. As pointed out in the comments, the default numeric data type is double unless specified otherwise.
My question is: if a + b + c + ... = T, and a, b, c, ..., T are all integers, is the evaluated value of
a + b + c + ... == T
always true?
What other ways are there to relax the condition of a, b, c, ... , T being integers and still maintain the truth of the above statement?
As helpfully pointed out in one answer, any sort of overflow encountered at any point of the evaluation will cause this to evaluate to false:
A = realmax('double')
A - A + A == A % evaluates to true
A + A - A == A % evaluates to false
Thus, for the purposes of my original question, assume that no overflow problems are encountered at any stage necessary to evaluate the expression.
If integers are represented using fixed-sized data types, then there is a limit to how large (in magnitude) a number can be represented. Thus, if a calculation would go over that limit (a condition known as overflow), the result would not be correct.
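For a concrete sketch of both failure modes (my addition, not part of the original answer; MATLAB's integer types saturate rather than wrap, and doubles overflow to Inf):
x = intmax('int32');   % 2147483647
x + 1 == x             % true: the addition silently saturated at intmax('int32')
realmax + realmax      % Inf: doubles overflow to infinity instead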
If you think about how integers are represented in floating point notation, as long as there is no representation error there will be no problem with equality comparison. For "small" integers it is never an issue, because you have plenty of bits for the mantissa and they can be represented exactly. If you try adding very (really) large integers, then issues may arise:
>> 2^50 == 2^50+1
ans =
0
While:
>> 2^53 == 2^53+1
ans =
1
This is the limit that Scott Hunter is talking about: the integer no longer fits exactly in the mantissa. Look up IEEE Standard 754 to learn more about the representation of floating point numbers.
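As a quick sketch of where that limit sits (my addition, using MATLAB's flintmax):
flintmax('double')        % 9007199254740992, i.e. 2^53, the largest consecutive exactly-representable integer
(2^53 - 1) + 1 == 2^53    % true: still exactly representable
2^53 + 1 == 2^53          % true: 2^53 + 1 cannot be represented and rounds back down to 2^53
0.1 + 0.1 + 0.1 == 0.3    % false: 0.1 itself has no exact binary representation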
Here is an example of convolution that I was given:
I have two questions here:
Why is the vector 𝑥 padded with two 0s on each side, given that the length of the kernel ℎ is 3? If 𝑥 were padded with just one 0 on each side, the middle element of the kernel would still fall within the range of 𝑥 for every output, so why not pad with one 0 on each side?
Explain the following output to me:
>> x = [1, 2, 1, 3];
>> h = [2, 0, 1];
>> y = conv(x, h, 'valid')
y =
3 8
>>
What is 'valid' doing here, in the context of the mathematics shown above for the vectors 𝑥 and ℎ?
I can't speak to how much zero padding is proper ... That being said, any zero padding is making up data that is not there. This isn't necessarily wrong, but you should be aware that the values computed from it may be biased. Sometimes you care about this, sometimes you don't. Introducing one zero (in this case) would keep the middle kernel value always over the data, but why should that be the stopping criterion? Importantly, adding two zeros still leaves one multiplication between values that are actually present in the data and the kernel (the x[0]*h[0] and x[3]*h[2] terms, using 0-based indexing). Adding a third zero (or more) would just yield zeros in the output, since 3 is the length of the kernel. In other words, zero padding will always yield an output that is partially (but not completely) based on the actual data, for any amount of padding from n = 1 to n = length(h)-1 (in this case either 1 or 2).
Even though zero padding of length 2 or 1 still produces multiplications based on real data, some output values are summed over "fake" data (the terms multiplied by a padded zero). In this case Matlab gives you 3 options for how you want the data returned. First, you can get the full convolution, which includes values that are biased because they sum in zeros that aren't really in the data. Alternatively you can get 'same', which means the length of the output equals the length of the data: y = [4 3 8 1]. This corresponds to padding with 1 zero, but note that for longer kernels you could technically get other lengths between 'full' and 'same'; Matlab just doesn't return those for you.
Finally, and probably the most important thing to understand out of all this, you have the 'valid' option. In your example only 2 samples of the output are computed from summations that involve only real data (i.e. from multiplying samples of the kernel with samples of x, and not with zeros). More specifically:
y[2] = h[2]*x[0] + h[1]*x[1] + h[0]*x[2] = 3 // 0-based indexing, as in the example
y[3] = h[2]*x[1] + h[1]*x[2] + h[0]*x[3] = 8
Note that none of the other y values are computed using only h and x; they all involve a padded zero, which is not necessarily indicative of the real data. For example:
y[4] = h[2]*x[2] + h[1]*x[3] + h[0]*0 <= padded zero
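Putting the three shape options side by side (a sketch; the numbers follow from the hand calculations above):
x = [1, 2, 1, 3];
h = [2, 0, 1];
conv(x, h, 'full')    % [2 4 3 8 1 3] : all outputs, using length(h)-1 padded zeros on each side
conv(x, h, 'same')    % [4 3 8 1]     : the central part, same length as x
conv(x, h, 'valid')   % [3 8]         : only the outputs that touch no padded zeros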
In this answer, gire mentioned that it is better not to use == when comparing doubles.
When creating an increment variable in a for loop using start:step:stop notation, its type will be double. If one wants to use this loop variable for indexing and for == comparisons, might that cause problems due to floating point precision?
Should one use integers instead? If so, is there a way to do so with the start:step:stop notation?
Here's an example
a = rand(1, 5);
for ii = length(a):-1:1
if (ii == 1) % Comparing var of type double with ==
b = 0;
else
b = a(ii); % Using double for indexing
end
... % Code
end
Note that the floating point double specification uses 52 bits to store the mantissa (the fractional bits of the significand), so you can exactly represent any integer in the range
-4503599627370496 <= x <= 4503599627370496
Note that this is larger than the range of an int32, which can only represent
-2147483648 <= x <= 2147483647
If you are just using the double as a loop variable, and only incrementing it in integer steps, and you are not counting above 4,503,599,627,370,496 then you are fine to use a double, and to use == to compare doubles.
One reason people suggest for not using doubles is that you can't represent some common decimals exactly, e.g. 0.1 has no exact representation as a double. Therefore if you are working with monetary values, it may be better to separately store the data as an int and remember a scale factor of 10x or 100x or whatever.
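A rough sketch of that scaled-integer idea (the variable names are made up for illustration):
priceCents = int64(1999);            % $19.99 stored as integer cents
taxCents   = int64(160);             % $1.60
totalCents = priceCents + taxCents;  % exact integer arithmetic, so == comparisons are safe
totalCents == int64(2159)            % true
double(totalCents) / 100             % 21.59, convert to double only for display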
It's sometimes bad to directly compare floating point numbers for equality because rounding issues can cause two floats to be not equal, even though the numbers are mathematically equal. This generally happens when the numbers are not exactly representable as floats, or when there is a significant size difference between the numbers, e.g.
>> 0.3 - 0.2 == 0.1
ans =
0
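If you do need to compare such values, the usual workaround (my sketch, not part of the original answer) is to test against a tolerance rather than using == directly:
a = 0.3 - 0.2;
b = 0.1;
a == b                                     % false: a and b carry different rounding errors
abs(a - b) < 1e-12                         % true: equal within an absolute tolerance
abs(a - b) <= 1e-9 * max(abs(a), abs(b))   % true: equal within a relative tolerance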
If you're indexing between integer bounds with integer steps (even though the variable class is actually double), it is ok to use == for comparisons with other integers.
You can cast the indices, if you really want to be safe.
For example:
for ii = int16(length(a):-1:1)
if (ii == 1)
b = 0;
end
end
I do:
T = inv([1.0 0.956 0.621; 1.0 -0.272 -0.647; 1.0 -1.106 1.703]);
obtaining:
T =
0.2989 0.5870 0.1140
0.5959 -0.2744 -0.3216
0.2115 -0.5229 0.3114
If I do
T(1,1)==0.2989
I obtain
ans =
0
The same happens for the other elements.
Why is this?
Because they are not equal. It's just a display artefact. To see this:
fprintf('%.8f\n', T(1,1))
will give you
0.29893602
MATLAB stores more digits than you usually see. The 0.2989 is actually 0.298936021293776 (and even that is not the end of the story).
Try your code and add
format long
T(1,1)
but still,
T(1,1) == 0.298936021293776
ans =
0
So try
T(1,1) - 0.298936021293776
You just don't see all the digits. T(1,1) is what it is supposed to be.
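Putting those steps together (a sketch; the residual is tiny but not zero):
format long
T(1,1)                          % 0.298936021293776, still only 15 significant digits
T(1,1) == 0.298936021293776     % false: the literal rounds to a different double
T(1,1) - 0.298936021293776      % a tiny nonzero residual, roughly at the level of eps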
It's dangerous to test floating point numbers for exact equality. By default, MATLAB uses 64 bits to store a floating point value, and some values such as 0.1 cannot be represented exactly no matter how many bits are used. T(1,1) is not exactly 0.2989, which you can see when printing the value with greater precision:
>> T = inv([1.0 0.956 0.621; 1.0 -0.272 -0.647; 1.0 -1.106 1.703])
T =
0.2989 0.5870 0.1140
0.5959 -0.2744 -0.3216
0.2115 -0.5229 0.3114
>> fprintf('%1.10f\n', T(1,1))
0.2989360213
This is why T(1,1) == 0.2989 returns false. It is safer to test whether two floating point values are almost equal, i.e. with regards to a tolerance value tol:
>> tol = 1/1000;
>> abs(T(1,1) - 0.2989) < tol
ans =
1
Here's something you should probably read: click
I hate having to ask this because I assume the answer must be simple, but I cannot for the life of me seem to track down the source. While trying to rewrite a function I ran across this problem:
a = -j
x = real(a)
y = imag(a)
y/x
Which spits out Inf, unexpectedly for me. However...
a = 0
b = -1
b/a
returns -Inf, like I would expect. Inquiring further, a == x and b == y. Clearly they aren't really the same, however. I finally tracked down the problem after a lot of frustration: if the original input for a is instead 0-j (vs. -j), then there is no problem.
Both real(-j) and real(0-j) return zero and test as zero, but obviously seem to retain some metadata relating to their origin that I absolutely cannot discover. What precisely am I missing here? It will feel downright wrong if I have to solve this with something like if (x == 0) then x = 0;
Not metadata, just the sign bit of the double precision float.
>> a = 0-j;
>> b = -j;
>> ra = real(a)
ra =
0
>> rb = real(b)
rb =
0
>> ra==0
ans =
1
>> isequal(ra,rb)
ans =
1
Looks the same so far. However, the difference is that with b, we set the sign bit for both the real and imaginary parts when we do -j = -complex(0,1) vs. 0-j = complex(0,-1) (see Creating Complex Numbers). Looking deeper with typecast, which does no conversion of the underlying data:
>> dec2bin(typecast(ra,'uint64'),64)
ans =
0000000000000000000000000000000000000000000000000000000000000000
>> dec2bin(typecast(rb,'uint64'),64)
ans =
1000000000000000000000000000000000000000000000000000000000000000
That 1 is bit 63 (counting from 0): the sign bit in the IEEE 754 double precision floating point representation.
Voila! -0 exists in MATLAB too!
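A quick way to see the effect of the signed zero (a sketch; this is exactly the y/x = -1/-0 situation from the question):
pz = 0;      % +0
nz = -0;     % -0, sign bit set
pz == nz     % true: IEEE 754 defines +0 and -0 to compare equal
1/pz         %  Inf
1/nz         % -Inf: the sign of the zero decides the sign of the infinity
-1/nz        %  Inf: this is why y/x in the question came out as Inf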
When using IEEE 754 floating point numbers, there is a convention for handling numbers approaching zero that cannot be represented by the smallest possible normal float, called an underflow, where precision is gradually lost with each step below that smallest float. Some operating systems will consider an underflowed value to be equal to zero.
I was surprised, while testing some software, to find that a threshold test against zero actually went below zero almost as far as the smallest possible negative float.
Perhaps this is why you're getting a negative infinity instead of a divide-by-zero error, which I am assuming is the problem you're referring to.
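To illustrate what gradual underflow looks like in MATLAB (a sketch added for context):
realmin                  % 2.2251e-308, the smallest positive normalised double
realmin/2                % 1.1125e-308: still nonzero, stored as a subnormal (gradual underflow)
realmin/2 > 0            % true
realmin/2^60 == 0        % true: far enough below realmin, the value finally becomes exactly zero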
newT = [b(i) d(i) a(i) z(i)];
newT, b(i), a(i)
Prints
newT =
123 364 123 902
ans =
1.234e+02
ans =
1.234e+02
What is the problem here? Why are the first and third entries in newT rounded to integer values? Why aren't they assigned correctly?
Unlike most other programming languages, integer types in Matlab take precedence over floating point types. When you combine them, either through concatenation or arithmetic, the floating point values are implicitly narrowed to integers, instead of the integers being widened to floating point.
>> int32(3) + 0.4
ans =
3
>> [int32(3) 0.4]
ans =
3 0
This is for historical reasons, because (IIRC) Matlab originally didn't have support for integers at all, so all numeric constants in Matlab produce double values, and the promotion rules were created to make it possible to mix integer types with floating-point constants.
To fix this, explicitly convert those int types to doubles before concatenating.
newT = [b(i) double(d(i)) a(i) double(z(i))];
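A quick check of the difference (a sketch with made-up scalar values, since b, d, a and z are not shown in the question; d and z are assumed to be integer typed):
b = 123.4;  a = 123.4;              % doubles, hypothetical values
d = int32(364);  z = int32(902);    % integer typed, as assumed from the question's output

bad = [b d a z];                    % narrowed to int32: [123 364 123 902]
class(bad)                          % 'int32'

good = [b double(d) a double(z)];   % everything widened to double: [123.4 364 123.4 902]
class(good)                         % 'double'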