I was using the curve_fit function to find two coefficients and could not get a result until I altered something called maxfev to be a much larger value, since my error was that 'maxfev=600 has been reached', I took a total guess and added maxfev=10000 into my curve_fit function and this seemed to work.
My question is: what is maxfev? What does it do, how does it work, and how has this affected my data?
The function curve_fit is a wrapper around leastsq (both from the scipy.optimize library). The parameter that you are adjusting specifies how many times the parameters for the model that you are trying to fit are allowed to be altered, while the program is attempting to find a local minimum (see below example).
data = [(1,0),(2,1),(3,2),(4,3)...]
model = a*x+b
Let us assume that you initialize the a and b to 0. The program attempts it once, gets a given array of leastsquares back, then the program will attempt to alter either a or b and run it again. This repeats itself until an optimal value for a and b were found (yielding the lowest least squares, which should be a=1 and b=-1).
The fact that your program can not find the optimal value after 600 alterations of the parameters is a clear indication that you are fitting the wrong model.
PS: Your problem has nothing to do with the IPython Notebook
Related
I have a differential equation that's a function of around 30 constants. The differential equation is a system of (N^2+1) equations (where N is typically 4). Solving this system produces N^2+1 functions.
Often I want to see how the solution of the differential equation functionally depends on constants. For example, I might want to plot the maximum value of one of the output functions and see how that maximum changes for each solution of the differential equation as I linearly increase one of the input constants.
Is there a particularly clean method of doing this?
Right now I turn my differential-equation-solving script into a large function that returns an array of output functions. (Some of the inputs are vectors & matrices). For example:
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
Here I loop through the function. The function solves a differential equation, and then returns the set of solution functions for that input parameter, and then each is appended as a row to a matrix.
There are a few issues I have with my method:
If I want to see the solution to the differential equation for a different parameter, I have to redefine the function so that it is an input of one of the thirty other parameters. For the sake of code readability, I cannot see myself explicitly writing all of the input parameters as individual inputs. (Although I've read that structures might be helpful here, but I'm not sure how that would be implemented.)
I typically get lost in parameter space and often have to update the same parameter across multiple scripts. I have a script that runs the differential-equation-solving function, and I have a second script that plots the set of simulated data. (And I will save the local variables to a file so that I can load them explicitly for plotting, but I often get lost figuring out which file is associated with what set of parameters). The remaining parameters that are not in the input of the function are inside the function itself. I've tried making the parameters global, but doing so drastically slows down the speed of my code. Additionally, some of the inputs are arrays I would like to plot and see before running the solver. (Some of the inputs are time-dependent boundary conditions, and I often want to see what they look like first.)
I'm trying to figure out a good method for me to keep track of everything. I'm trying to come up with a smart method of saving generated figures with a file tag that displays all the parameters associated with that figure. I can save such a file as a notepad file with a generic tagging-number that's listed in the title of the figure, but I feel like this is an awkward system. It's particularly awkward because it's not easy to see what's different about a long list of 30+ parameters.
Overall, I feel as though what I'm doing is fairly simple, yet I feel as though I don't have a good coding methodology and consequently end up wasting a lot of time saving almost-identical functions and scripts to solve fairly simple tasks.
It seems like what you really want here is something that deals with N-D arrays instead of splitting up the outputs.
If all of the OutputArray_ variables have the same number of rows, then the line
for i = 1:N
[OutputArray1(i, :), OutputArray2(i, :), OutputArray3(i, :), OutputArray4(i, :), OutputArray5(i, :)] = DE_Simulation(Parameter1Array(i));
end
seems to suggest that what you really want your function to return is an M x K array (where in this case, K = 5), and you want to pack that output into an M x K x N array. That is, it seems like you'd want to refactor your DE_Simulation to give you something like
for i = 1:N
OutputArray(:,:,i) = DE_Simulation(Parameter1Array(i));
end
If they aren't the same size, then a struct or a table is probably the best way to go, as you could assign to one element of the struct array per loop iteration or one row of the table per loop iteration (the table approach would assume that the size of the variables doesn't change from iteration to iteration).
If, for some reason, you really need to have these as separate outputs (and perhaps later as separate inputs), then what you probably want is a cell array. In that case you'd be able to deal with the variable number of inputs doing something like
for i = 1:N
[OutputArray{i, 1:K}] = DE_Simulation(Parameter1Array(i));
end
I hesitate to even write that, though, because this almost certainly seems like the wrong data structure for what you're trying to do.
I've been working with matrices over GF(2) in Matlab. Well, I've been working with 0/1 matrices that I've been treating as being defined over GF(2). I was surprised/happy to see that Matlab provides some functionality in the Communications System Toolbox for working over finite fields. In particular, if I want to find the rank of a matrix over a finite field, there are a couple of methods: (1) use gfrank on the matrices that I already have defined, or (2) use rank on a Galois field array (created with gf). For matrices over GF(2), the former method seems to be significantly faster; however, there's a problem...
The documentation for gfrank says that the function doesn't work over fields of the form GF(2^m). I double checked on a toy example, and specifying GF(2) as the field to work over seems to output correct results. Moreover, the function's m-file specifies GF(2) as the default field (by specifying the second argument as 2 if nargin < 2). Something has to be wrong here, and it seems to be the documentation. However, I'd hate to assume that the documentation is wrong only to find out much later that the computation doesn't always work over GF(2^m). Does anybody know for sure what's wrong here? Thanks for your help.
I am having some issues with fminsearch of matlab. I have defined the TolX and TolFun as following
options = optimset('TolFun',1e-8, 'TolX', 1e-8)
Then I tried to estimate the parameters of my functions using
[estimates val] = fminsearch(model, start_point,options)
However, the val is around 3.3032e-04. Even though I specified the TolFun to be 1e-8, it still terminates before that with value around 3.3032e-04. Actually, the desired value of the parameter is obtained at something around 1.268e-04. So I tried to set the TolFun. Why is it not working, it should have converged to the least value of the function isn't it?
There are other reasons for termination of the search, for example, max number of function evaluations, max number of iterations, etc. fminsearch provides additional output arguments that give you information about the reason for termination. You especially want the full OUTPUT argument, which provides number of iterations, termination message, etc.
[X,FVAL,EXITFLAG,OUTPUT] = fminsearch(...) returns a structure
OUTPUT with the number of iterations taken in OUTPUT.iterations, the
number of function evaluations in OUTPUT.funcCount, the algorithm name
in OUTPUT.algorithm, and the exit message in OUTPUT.message.
Another possibility is that you've gotten stuck in local minimum. There's not much to be done for that problem, except to choose a different start point, or a different optimizer.
I hope I am on topic here. I'm asking here since it said on the faq page: a question concerning (among others) a software algorithm :) So here it goes:
I need to solve a system of ODEs (like $ \dot x = A(t) x$. The Matrix A may change and is given as a string in the function call (Calc_EDS_v2('Sys_EDS_a',...)
Then I'm using ode45 in a loop to find my x:
function [intervals, testing] = EDS_calc_v2(smA,options,debug)
[..]
for t=t_start:t_step:t_end)
[Te,Qe]=func_int(#intQ_2_v2,[t,t+t_step],q);
q=Qe(end,:);
[..]
end
[..]
with func_int being ode45 and #intQ_2_v2 my m-file. q is given back to the call as the starting vector. As you can see I'm just using ode45 on the intervall [t, t+t_step]. That's because my system matrix A can force ode45 to use a lot of steps, leading it to hit the AbsTol or RelTol very fast.
Now my A is something like B(t)*Q(t), so in the m-file intQ_2_v2.m I need to evaluate both B and Q at the times t.
I first done it like so: (v1 -file, so function name is different)
function q=intQ_2_v1(t,X)
[..]
B(1)=...; ... B(4)=...;
Q(1)=...; ...
than that is naturally only with the assumption that A is a 2x2 matrix. With that setup it took a basic system somewhere between 10 and 15 seconds to compute.
Instead of the above I now use the files B1.m to B4.m and Q1.m to B4.m (I know that that's not elegant, but I need to use quadgk on B later and quadgk doesn't support matrix functions.)
function q=intQ_2_v2(t,X)
[..]
global funcnameQ, funcnameB, d
for k=1:d
Q(k)=feval(str2func([funcnameQ,int2str(k)]),t);
B(k)=feval(str2func([funcnameB,int2str(k)]),t);
end
[..]
funcname (string) referring to B or Q (with added k) and d is dimension of the system.
Now I knew that it would cost me more time than the first version but I'm seeing the computing times are ten times as high! (getting 150 to 160 seconds) I do understand that opening 4 files and evaluate roughly 40 times per ode-loop is costly... and I also can't pre-evalute B and Q, since ode45 uses adaptive step sizes...
Is there a way to not use that last loop?
Mostly I'm interested in a solution to drive down the computing times. I do have a feeling that I'm missing something... but can't really put my finger on it. With that one taking nearly three minutes instead of 10 seconds I can get a coffee in between each testrun now... (plz don't tell me to get a faster computer)
(sorry for such a long question )
I'm not sure that I fully understand what you're doing here, but I can offer a few tips.
Use the profiler, it will help you understand exactly where the bottlenecks are.
Using feval is slower than using function handles directly, especially when using str2func to build the handle each time. There is also a slowdown from using the global variables (and it's a good habit to avoid these unless absolutely necessary). Each of these really adds up when using them repeatedly (as it looks like here). Store function handles to each of your mfiles in a cell array and either pass them directly to the function or use nested function for the optimization so that the cell array of handles is visible to the function being optimized. Personally, I prefer the nested method, but passing is better if you will use those mfiles elsewhere.
I expect this will get your runtime back to close to what the first method gave. Be sure to tell us if this was the problem or if you found another solution.
I am experiencing problems when I compare results from different runs of my Matlab software with the same input. To narrow the problem, I did the following:
save all relevant variables using Matlab's save() method
call a method which calculates something
save all relevant output variables again using save()
Without changing the called method, I did another run with
load the variables saved above and compare with the current input variables using isequal()
call my method again with the current input variables
load the out variables saved above and compare.
I can't believe the comparison in the last "line" detects slight differences. The calculations include single and double precision numbers, the error is in the magnitude of 1e-10 (the output is a double number).
The only possible explanation I could imagine is that either Matlab looses some precision when saving the variables (which I consider very unlikely, I use the default binary Matlab format) or that there are calculations included like a=b+c+d, which can be calculated as a=(b+c)+d or a=b+(c+d) which might lead to numerical differences.
Do you know what might be the reason for the observations described above?
Thanks a lot!
it really seems to be caused by the single/double mix in the calculations. Since I have switched to double precision only, the problem did not occur anymore. Thanks to everybody for your thoughts.
these could be rounding errors. you can find the floating point accuracy of you system like so:
>> eps('single')
ans =
1.1921e-07
On my system this reports 10^-7 which would explain discrepancies of your order
To ensure reproducible results, especially if you are using any random generating functions (either directly or indirectly), you should restore the same state at the beginning of each run:
%# save state (do this once)
defaultStream = RandStream.getDefaultStream;
savedState = defaultStream.State;
save rndStream.mat savedState
%# load state (do this at before each run)
load rndStream.mat savedState
defaultStream = RandStream.getDefaultStream();
defaultStream.State = savedState;