How to recognize overflow bugs in Matlab? - matlab

I spent part of yesterday and today tracking down a bug in some Matlab code. I had thought my problem was indexing (with many structures that I didn't define and am still getting used to), but it turned out to be an overflow bug. I missed this for a very specific reason:
>> uint8(2) - uint8(1)
ans =
1
>> uint8(2) - uint8(2)
ans =
0
>> uint8(2) - uint8(3)
ans =
0
I would have expected the last one to be something like -1 (or 255). In the middle of a big vector, the erroneous 0s were difficult to detect, but a 255 would have stood out easily.
Any tips on how to detect these problems easily in the future? (Ideally, I'd like to turn off the overflow checking to make it work like C.) Changing to double works, of course, but if I don't realize it's a uint8 to begin with, that doesn't help.

You can start by turning on integer warnings:
intwarning('on')
This will give you a warning when integer arithmetic overflows.
Beware though, as outlined here, this does slow down integer arithmetic so only use this during debug.

Starting with release R2010b and later, the function INTWARNING has been removed, along with these warning messages for integer math and conversion:
MATLAB:intConvertNaN
MATLAB:intConvertNonIntVal
MATLAB:intConvertOverflow
MATLAB:intMathOverflow
So using INTWARNING is no longer a viable option for determining when integer overflows occur. An alternative is to use the CLASS function to test the class of your data and recast it accordingly before performing the operation. Here's an example:
if strcmp(class(data),'uint8') %# Check if data is a uint8
data = double(data); %# Convert data to a double
end
You could also use the ISA function as well:
if ~isa(data,'single') %# Check if data is not a single
data = single(data); %# Convert data to a single
end

See INTWARNING function to control warnings on integer operations.
http://www.mathworks.com/access/helpdesk/help/techdoc/ref/intwarning.html

Related

Accuracy error in binomials using MATLAB?

The value is absolute integer, not a floating point to be doubted, also, it is not about an overflow since a double value can hold until 2^1024.
fprintf('%f',realmax)
179769313486231570000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000
The problem I am facing in nchoosek function that it doesn't produce exact values
fprintf('%f\n',nchoosek(55,24));
2488589544741302.000000
While it is a percentage error of 2 regarding that binomian(n,m)=binomial(n-1,m)+binomial(n-1,m-1) as follows
fprintf('%f',nchoosek(55-1,24)+nchoosek(55-1,24-1))
2488589544741301.000000
Ps: The exact value is 2488589544741300
this demo shows
What is wrong with MATLAB?
Your understanding of the realmax function is wrong. It's the maximum value which can be stored, but with such large numbers you have a floating point precision error far above 1. The first integer which can not be stored in a double value is 2^53+1, try 2^53==2^53+1 for a simple example.
If the symbolic toolbox is available, the easiest to implement solution is using it:
>> nchoosek(sym(55),sym(24))
ans =
2488589544741300
There is a difference between something that looks like an integer (55) and something that's actually an integer (in terms of variable type).
The way you're calculating it, your values are stored as floating point (which is what realmax is pointing you to - the largest positive floating point number - check intmax('int64') for the largest possible integer value), so you can get floating point errors. An absolute difference of 2 in a large value is not that unexpected - the actual percentage error is tiny.
Plus, you're using %f in your format string - e.g. asking it to display as floating point.
For nchoosek specifically, from the docs, the output is returned as a nonnegative scalar value, of the same type as inputs n and k, or, if they are different types, of the non-double type (you can only have different input types if one is a double).
In Matlab, when you type a number directly into a function input, it generally defaults to a float. You have to force it to be an integer.
Try instead:
fprintf('%d\n',nchoosek(int64(55),int64(24)));
Note: %d not %f, converting both inputs to specifically integer. The output of nchoosek here should be of type int64.
I don't have access to MATLAB, but since you're obviously okay working with Octave I'll post my observations based on that.
If you look at the Octave source code using edit nchoosek or here you'll see that the equation for calculating the binomial coefficient is quite simple:
A = round (prod ((v-k+1:v)./(1:k)));
As you can see, there are k divisions, each with the possibility of introducing some small error. The next line attempts to be helpful and warn you of the possibility of loss of precision:
if (A*2*k*eps >= 0.5)
warning ("nchoosek", "nchoosek: possible loss of precision");
So, if I may slightly modify your final question, what is wrong with Octave? I would say nothing is wrong. The authors obviously knew of the possibility of imprecision and included a check to warn users when that possibility arises. So the function is working as intended. If you require greater precision for your application than the built-in function provides, it looks as though you'll need to code (or find) something that calculates the intermediate results with greater precision.

Matlab not taking last number in array?

I've got a Subscripted assignment dimension mismatch problem.
I've already localized the issue, and know exactly what is going on, I just don't know why.
Here's the problematic piece of code:
mFrames(:,i) = vSignal(round(start:1:frameLength*samplingRate));
start=start+frameShift*samplingRate;
frameLength = frameLength+frameShift;
I've already checked what's going on in debugmode; usually my resulting column length of mFrames is 128, this stays the same until i=1004. Then, my column length changes to 127.
I've checked the values involved while in debug mode, and it simply does not make sense what is going on. At i==1004 start=32097 and frameLength*samplingRate=32224.
That's a difference of 127 meaning 128 points, that should work.
BUT when i assign a vector A=round(start:1:frameLength*samplingRate)
OR B=start:1:frameLength*samplingRate
In both cases I get a vector going from 32097 to 32223. This ALTHOUGH when I give in frameLength*samplingRate matlab is giving me 32224.
In other words, matlab is telling me it's using one number, but when I test I find it's using a different one.
Any help appreciated.
I suspect your 32224 is not actually 32224. MATLAB's default format only displays so many decimal places, so when dealing with floating point numbers, what is printed on screen is not necessarily the "exact" value.
Let's go back a step and look at how the synatx x = start:step:end works.
1:1:10 should give us numbers in steps of 1 from 1 to 10. Fair enough, that makes sense. What if we set the end value to something that's slightly above 10?
e.g.:
1:1:10.1
Well, it still gives us 1:1:10, (or 1:10, 1 being the default step) because we can't have values higher than the end-point, so 11 isn't a correct step.
So what about this:
1:1:9.99
Spoiler: it's the same as 1:9
And this?
1:1:9.9999999
Yep, still 1:9
But if we do this:
a = 9.9999999;
Then with default format, the value of a will be shown on the command line and in your list of workspace variables as 10.0000.
Now, if frameLength and samplingRate are both stored as floating point numbers, it's possible that the number you see as 32224 is not 32224 but very slightly below that. You can check this by changing your default format - e.g. format long at the command line - to show more decimal places.
The simplest solution is probably to do something like:
B=start:1:round(frameLength*samplingRate)
Or try to store the relevant values as integers (e.g., uint32).

Logical indexing and double precision numbers

I am trying to solve a non-linear system of equations using the Newton-Raphson iterative method, and in order to explore the parameter space of my variables, it is useful to store the previous solutions and use them as my first initial guess so that I stay in the basin of attraction.
I currently save my solutions in a structure array that I store in a .mat file, in about this way:
load('solutions.mat','sol');
str = struct('a',Param1,'b',Param2,'solution',SolutionVector);
sol=[sol;str];
save('solutions.mat','sol');
Now, I do another run, in which I need the above solution for different parameters NewParam1 and NewParam2. If Param1 = NewParam1-deltaParam1, and Param2 = NewParam2 - deltaParam2, then
load('solutions.mat','sol');
index = [sol.a]== NewParam1 - deltaParam1 & [sol.b]== NewParam2 - deltaParam2;
% logical index to find solution from first block
SolutionVector = sol(index).solution;
I sometimes get an error message saying that no such solution exists. The problem lies in the double precisions of my parameters, since 2-1 ~= 1 can happen in Matlab, but I can't seem to find an alternative way to achieve the same result. I have tried changing the numerical parameters to strings in the saving process, but then I ran into problems with logical indexing with strings.
Ideally, I would like to avoid multiplying my parameters by a power of 10 to make them integers as this will make the code quite messy to understand due to the number of parameters. Other than that, any help will be greatly appreciated. Thanks!
You should never use == when comparing double precision numbers in MATLAB. The reason is, as you state in the the question, that some numbers can't be represented precisely using binary numbers the same way 1/3 can't be written precisely using decimal numbers.
What you should do is something like this:
index = abs([sol.a] - (NewParam1 - deltaParam1)) < 1e-10 & ...
abs([sol.b] - (NewParam2 - deltaParam2)) < 1e-10;
I actually recommend not using eps, as it's so small that it might actually fail in some situations. You can however use a smaller number than 1e-10 if you need a very high level of accuracy (but how often do we work with numbers less than 1e-10)?

Matlab not accepting whole number as index

I am using a while loop with an index t starting from 1 and increasing with each loop.
I'm having problems with this index in the following bit of code within the loop:
dt = 100000^(-1);
t = 1;
equi = false;
while equi==false
***some code that populates the arrays S(t) and I(t)***
t=t+1;
if (t>2/dt)
n = [S(t) I(t)];
np = [S(t-1/dt) I(t-1/dt)];
if sum((n-np).^2)<1e-5
equi=true;
end
end
First, the code in the "if" statement is accessed at t==200000 instead of at t==200001.
Second, the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005 .
I guess I can solve this using "round", but this worked before and suddenly doesn't work and I'd like to figure out why.
Thanks!
the expression S(t-1/dt) results in the error message "Subscript indices must either be real positive integers or logicals", even though (t-1/dt) is whole and equals 1.0000e+005
Is it really? ;)
mod(200000 - 1/dt, 1)
%ans = 1.455191522836685e-11
Your index is not an integer. This is one of the things to be aware of when working with floating point arithmetic. I suggest reading this excellent resource: "What every computer scientist should know about floating-point Arithmetic".
You can either use round as you did, or store 1/dt as a separate variable (many options exist).
Matlab is lying to you. You're running into floating point inaccuracies and Matlab does not have an honest printing policy. Try printing the numbers with full precision:
dt = 100000^(-1);
t = 200000;
fprintf('2/dt == %.12f\n',2/dt) % 199999.999999999971
fprintf('t - 1/dt == %.12f\n',t - 1/dt) % 100000.000000000015
While powers of 10 are very nice for us to type and read, 1e-5 (your dt) cannot be represented exactly as a floating point number. That's why your resulting calculations aren't coming out as even integers.
The statement
S(t-1/dt)
can be replaced by
S(uint32(t-1/dt))
And similarly for I.
Also you might want to save 1/dt hardcoded as 100000 as suggested above.
I reckon this will improve the comparison.

Numerical problems in Matlab: Same input, same code -> different output?

I am experiencing problems when I compare results from different runs of my Matlab software with the same input. To narrow the problem, I did the following:
save all relevant variables using Matlab's save() method
call a method which calculates something
save all relevant output variables again using save()
Without changing the called method, I did another run with
load the variables saved above and compare with the current input variables using isequal()
call my method again with the current input variables
load the out variables saved above and compare.
I can't believe the comparison in the last "line" detects slight differences. The calculations include single and double precision numbers, the error is in the magnitude of 1e-10 (the output is a double number).
The only possible explanation I could imagine is that either Matlab looses some precision when saving the variables (which I consider very unlikely, I use the default binary Matlab format) or that there are calculations included like a=b+c+d, which can be calculated as a=(b+c)+d or a=b+(c+d) which might lead to numerical differences.
Do you know what might be the reason for the observations described above?
Thanks a lot!
it really seems to be caused by the single/double mix in the calculations. Since I have switched to double precision only, the problem did not occur anymore. Thanks to everybody for your thoughts.
these could be rounding errors. you can find the floating point accuracy of you system like so:
>> eps('single')
ans =
1.1921e-07
On my system this reports 10^-7 which would explain discrepancies of your order
To ensure reproducible results, especially if you are using any random generating functions (either directly or indirectly), you should restore the same state at the beginning of each run:
%# save state (do this once)
defaultStream = RandStream.getDefaultStream;
savedState = defaultStream.State;
save rndStream.mat savedState
%# load state (do this at before each run)
load rndStream.mat savedState
defaultStream = RandStream.getDefaultStream();
defaultStream.State = savedState;