Numerical problems in Matlab: Same input, same code -> different output?

I am experiencing problems when I compare results from different runs of my Matlab software with the same input. To narrow down the problem, I did the following:
save all relevant variables using Matlab's save() method
call a method which calculates something
save all relevant output variables again using save()
Without changing the called method, I did another run in which I:
load the variables saved above and compare with the current input variables using isequal()
call my method again with the current input variables
load the output variables saved above and compare them with the new output.
To my surprise, the comparison in the last step detects slight differences. The calculations involve both single- and double-precision numbers, and the error is on the order of 1e-10 (the output is a double).
The only possible explanations I can imagine are that either Matlab loses some precision when saving the variables (which I consider very unlikely, since I use the default binary Matlab format) or that the calculations include expressions like a = b + c + d, which can be evaluated as a = (b + c) + d or a = b + (c + d) and might therefore give slightly different results.
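For example, a quick check with arbitrary values shows that floating-point addition is indeed not associative:
% Arbitrary example values: the two groupings differ in the last bits.
b = 0.1; c = 0.2; d = 0.3;
x = (b + c) + d;
y = b + (c + d);
isequal(x, y)   % returns logical 0
x - y           % about 1.1e-16, i.e. on the order of eps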
Do you know what might be the reason for the observations described above?
Thanks a lot!

It really seems to have been caused by the single/double mix in the calculations. Since I switched to double precision only, the problem has not occurred again. Thanks to everybody for your thoughts.

These could be rounding errors. You can find the floating-point accuracy of your system like so:
>> eps('single')
ans =
1.1921e-07
On my system this reports about 1e-07, which would explain discrepancies of the order you are seeing.
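To connect that to the 1e-10 differences reported above: single-precision round-off is a relative error, so on intermediate values of modest magnitude it shows up as a much smaller absolute difference. A rough sketch (the magnitude 1e-3 is only an assumed example):
relAccuracy = eps('single');       % ~1.19e-07, relative accuracy of single precision
valueMagnitude = 1e-3;             % assumed magnitude of an intermediate result
absoluteError = relAccuracy * valueMagnitude   % ~1.2e-10, the order reported in the question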

To ensure reproducible results, especially if you are using any random number generating functions (either directly or indirectly), you should restore the same state at the beginning of each run:
%# save state (do this once)
defaultStream = RandStream.getDefaultStream;
savedState = defaultStream.State;
save rndStream.mat savedState
%# load state (do this before each run)
load rndStream.mat savedState
defaultStream = RandStream.getDefaultStream();
defaultStream.State = savedState;
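As a side note, in newer MATLAB releases (R2011a and later) the same thing can be done with the rng function; a minimal sketch:
%# save the generator state (do this once)
s = rng;
save rngState.mat s

%# restore it before each run
load rngState.mat s
rng(s)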

Related

Clean methodology for running a function for a large set of input parameters (in Matlab)

I have a differential equation that's a function of around 30 constants. The differential equation is a system of (N^2+1) equations (where N is typically 4). Solving this system produces N^2+1 functions.
Often I want to see how the solution of the differential equation functionally depends on constants. For example, I might want to plot the maximum value of one of the output functions and see how that maximum changes for each solution of the differential equation as I linearly increase one of the input constants.
Is there a particularly clean method of doing this?
Right now I turn my differential-equation-solving script into a large function that returns an array of output functions. (Some of the inputs are vectors & matrices). For example:
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
Here I loop over the parameter values. For each value the function solves the differential equation and returns the set of solution functions, and each solution is appended as a row to a matrix.
There are a few issues I have with my method:
If I want to see the solution to the differential equation for a different parameter, I have to redefine the function so that one of the thirty other parameters becomes an input. For the sake of code readability, I cannot see myself explicitly writing all of the input parameters as individual inputs. (I've read that structures might be helpful here, but I'm not sure how that would be implemented; a rough sketch of that idea is included at the end of this question.)
I typically get lost in parameter space and often have to update the same parameter across multiple scripts. I have a script that runs the differential-equation-solving function, and I have a second script that plots the set of simulated data. (And I will save the local variables to a file so that I can load them explicitly for plotting, but I often get lost figuring out which file is associated with what set of parameters). The remaining parameters that are not in the input of the function are inside the function itself. I've tried making the parameters global, but doing so drastically slows down the speed of my code. Additionally, some of the inputs are arrays I would like to plot and see before running the solver. (Some of the inputs are time-dependent boundary conditions, and I often want to see what they look like first.)
I'm trying to figure out a good method for me to keep track of everything. I'm trying to come up with a smart method of saving generated figures with a file tag that displays all the parameters associated with that figure. I can save such a file as a notepad file with a generic tagging-number that's listed in the title of the figure, but I feel like this is an awkward system. It's particularly awkward because it's not easy to see what's different about a long list of 30+ parameters.
Overall, I feel as though what I'm doing is fairly simple, yet I feel as though I don't have a good coding methodology and consequently end up wasting a lot of time saving almost-identical functions and scripts to solve fairly simple tasks.
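For reference, the struct idea mentioned above might look roughly like this; all field names and the refactored DE_Simulation call are hypothetical:
% Hypothetical parameter struct: collect the ~30 constants in one place.
params.N     = 4;
params.gamma = 0.1;
params.omega = 2*pi;
% ... remaining constants ...

% DE_Simulation would then take the struct and read what it needs, e.g.
%   function [OutputArray1, OutputArray2, ...] = DE_Simulation(params)
% so sweeping one constant only touches one field:
params.gamma = 0.2;
[out1, out2] = DE_Simulation(params);   % hypothetical refactored call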
It seems like what you really want here is something that deals with N-D arrays instead of splitting up the outputs.
If all of the OutputArray_ variables have the same number of rows, then the line
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
seems to suggest that what you really want your function to return is an M x K array (where in this case, K = 5), and you want to pack that output into an M x K x N array. That is, it seems like you'd want to refactor your DE_Simulation to give you something like
for i = 1:N
OutputArray(:,:,i) = DE_Simulation(Parameter1Array(i));
end
If they aren't the same size, then a struct or a table is probably the best way to go, as you could assign to one element of the struct array per loop iteration or one row of the table per loop iteration (the table approach would assume that the size of the variables doesn't change from iteration to iteration).
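A minimal sketch of the struct-array variant, assuming for illustration that only two of the outputs are kept:
results = struct('param', {}, 'out1', {}, 'out2', {});  % empty struct array
for i = 1:N
    [o1, o2] = DE_Simulation(Parameter1Array(i));        % only two outputs shown for brevity
    results(i).param = Parameter1Array(i);                % keep the input next to its outputs
    results(i).out1  = o1;                                % sizes may differ between iterations
    results(i).out2  = o2;
end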
If, for some reason, you really need to have these as separate outputs (and perhaps later as separate inputs), then what you probably want is a cell array. In that case you'd be able to deal with the variable number of outputs by doing something like
OutputArray = cell(N, K);   % preallocate the cell array (K = 5 here)
for i = 1:N
[OutputArray{i, 1:K}] = DE_Simulation(Parameter1Array(i));
end
I hesitate to even write that, though, because this almost certainly seems like the wrong data structure for what you're trying to do.

What does 'maxfev' do in IPython Notebook?

I was using the curve_fit function to find two coefficients and could not get a result until I altered something called maxfev to be a much larger value. Since my error was that 'maxfev=600 has been reached', I took a total guess and added maxfev=10000 to my curve_fit call, and this seemed to work.
My question is: what is maxfev? What does it do, how does it work, and how has this affected my data?
The function curve_fit is a wrapper around leastsq (both from the scipy.optimize library). The parameter you are adjusting, maxfev, sets the maximum number of function evaluations, i.e. how many times the model parameters may be adjusted and the model re-evaluated while the routine searches for a local minimum of the sum of squared residuals (see the example below).
data = [(1,0),(2,1),(3,2),(4,3)...]
model = a*x+b
Let us assume that you initialize a and b to 0. The program evaluates the model once, gets an array of squared residuals back, then alters either a or b and runs it again. This repeats until optimal values for a and b are found (yielding the lowest sum of squares, which here is a=1 and b=-1).
The fact that your program cannot find the optimal values after 600 function evaluations is a clear indication that you are fitting the wrong model.
PS: Your problem has nothing to do with the IPython Notebook.

Logical indexing and double precision numbers

I am trying to solve a non-linear system of equations using the Newton-Raphson iterative method, and in order to explore the parameter space of my variables, it is useful to store the previous solutions and use them as my first initial guess so that I stay in the basin of attraction.
I currently save my solutions in a structure array that I store in a .mat file, roughly like this:
load('solutions.mat','sol');
str = struct('a',Param1,'b',Param2,'solution',SolutionVector);
sol=[sol;str];
save('solutions.mat','sol');
Now, I do another run, in which I need the above solution for different parameters NewParam1 and NewParam2. If Param1 = NewParam1-deltaParam1, and Param2 = NewParam2 - deltaParam2, then
load('solutions.mat','sol');
index = [sol.a]== NewParam1 - deltaParam1 & [sol.b]== NewParam2 - deltaParam2;
% logical index to find solution from first block
SolutionVector = sol(index).solution;
I sometimes get an error message saying that no such solution exists. The problem lies in the double precision of my parameters, since arithmetic like NewParam1 - deltaParam1 need not reproduce Param1 exactly (in the same way that 0.3 - 0.2 ~= 0.1 in floating point), but I can't seem to find an alternative way to achieve the same result. I have tried changing the numerical parameters to strings in the saving process, but then I ran into problems with logical indexing on strings.
Ideally, I would like to avoid multiplying my parameters by a power of 10 to make them integers as this will make the code quite messy to understand due to the number of parameters. Other than that, any help will be greatly appreciated. Thanks!
You should never use == when comparing double-precision numbers in MATLAB. The reason is, as you state in the question, that some numbers can't be represented exactly in binary, the same way 1/3 can't be written exactly as a decimal.
What you should do is something like this:
index = abs([sol.a] - (NewParam1 - deltaParam1)) < 1e-10 & ...
abs([sol.b] - (NewParam2 - deltaParam2)) < 1e-10;
I actually recommend not using eps as the tolerance, as it's so small that the comparison might still fail in some situations. You can however use a smaller number than 1e-10 if you need a very high level of accuracy (but how often do we work with differences smaller than 1e-10?).
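If you are on R2015a or newer, ismembertol does a similar tolerant comparison; a sketch of that variant (its tolerance is relative by default, so 'DataScale',1 is passed here to make it an absolute tolerance like the one above):
tol = 1e-10;
index = ismembertol([sol.a], NewParam1 - deltaParam1, tol, 'DataScale', 1) & ...
        ismembertol([sol.b], NewParam2 - deltaParam2, tol, 'DataScale', 1);
SolutionVector = sol(index).solution;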

How to determine if an output of a function-call is unused?

Say I have a function foo that can return three values given an input:
function [a,b,c] = foo(input)
The calculations of variables b and c take a long time, so sometimes I may wish to ignore their calculation within foo. If I want to ignore both calculations, I simply call the function like this:
output1 = foo(input);
and then include nargout within foo:
if nargout == 1
% Code to calculate "a" only
else
% Code to calculate other variables
end
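A fuller sketch of this pattern (the helper function names are hypothetical):
function [a, b, c] = foo(input)
    a = cheapCalculation(input);             % always computed (hypothetical helper)
    if nargout > 1
        b = expensiveCalculation(input);     % only when a 2nd output is requested
    end
    if nargout > 2
        c = veryExpensiveCalculation(input); % only when a 3rd output is requested
    end
end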
The problem occurs if I want to calculate the last output, but not the second. In that case my function call would be:
[output1,~,output3] = foo(input);
Now if I use nargout within foo to check how many outputs are in the function-call, it will always return 3 because the tilde operator (~) is considered a valid output. Therefore, I cannot use nargout to determine whether or not to calculate the second output, b, within foo.
Is there any other way to do this? I.e., is it possible to check which outputs of a function-call are discarded from within the function itself?
The commenters are basically right; this is not something that can be fully solved by the user unless The MathWorks adds the functionality. However, I wrote a small function, istilde (Wayback Machine Archive), a while back that attempts to do what you ask. It works in many cases, but it's really a bit of a hack and not a totally robust solution. For example, I did not attempt to get it to work for functions called directly from the Command Window (this could potentially be added with some work). Also, it relies on parsing the actual M-file, which can have issues. See the included demo file for how one might use istilde.
Feel free to edit my code for your needs – just don't use this in any production code due to the robustness issues. Any improvements would be welcome.

'exact' numerical value after numerical optimization MATLAB

I'm having the following issue with my code. I've been trying to use some other posts that I found online, like this one, but they didn't do what I'm looking for.
My code uses a MATLAB File Exchange function which optimizes a numerical value that I need to keep with 32 digits after the decimal point, such as
0.59329669191989231613604260928696
The optimization function can be found here and it is called fminsearchbnd
The optimization function calculates this value and stores it in a variable that I use all over my code. In order not to perform the optimization every time, I want to store the variable (I tried both a *.mat file and a label, in string form).
But when I retrieve it, MATLAB turns it into a double-precision variable, 'cutting' all the digits after the 14th. However, I need all of them because they are important!
Is it possible to read a number like that without using vpa()? With a symbolic value I can't do anything.
Any help is really appreciated.
Thanks
EDIT:
fminsearchbnd gives me class(bb) -> double, and when I look at it in the workspace it is 0.586675392365899. But I set formatSpec = '%.32f\n'; because I want to see all the digits that the optimization gives me, and type set(editLabel,'String',num2str(bb,formatSpec))...
You're trying to store/use a number that cannot be represented exactly in an IEEE 754 64-bit double-precision floating-point number.
I'm not sure how you got that number without using vpa() in the first place, since a 64-bit double is MATLAB's highest native precision...
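A quick way to see that limit, using the 32-digit value from the question:
x = 0.59329669191989231613604260928696;  % the literal is rounded to the nearest double on assignment
fprintf('%.32f\n', x)                    % prints that nearest double, not the original 32-digit value
eps(x)                                   % spacing between adjacent doubles near x, about 1e-16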
You could use the multiple precision toolbox by Ben Barrowes, or HPF by John d'Errico, also from the FEX. You'll have to convert/construct to/from string if you want to store/load it to/from file.
But I have to agree with John's comment there:
The fact is, most of the time, if you can't do it with a double, you are doing something wrong
so...why exactly are those 32-or-more digits important?