MATLAB how to make nlinfit() not quit out of loop and how to test distribution of results - matlab

Given some data (from the function 1/x), I solved for powers of a model function made of the sum of exponentials. I did this for several different samples of data and I want to plot the powers that I find and test to see what type of distribution they have.
My main problems are:
My loops are quitting when the nlinfit() function says the function gives -Inf values. I've tried playing around with initial conditions but can't see how to fix it. I know that I am working with the function 1/x, so there may not be a way around this problem. Is there at least a way to force the loop to keep going after getting this error? It only happens for some iterations. If I manually regenerate my sample data and run it again it works. So if I could force the loop to keep going it would be ideal. I have tried switching the for loop to a while loop and putting a try catch statement inside but this just seems to run endlessly.
I have code set to test how if the distribution of the resulting power is close to the exponential distribution. Is this the correct way to do this? When I add in the Monte Carlo tolerance ('MCTol') it has a drastic effect on p in some cases (going from 0.5 to 0.9). Do I have it set up correctly? Is there a way to also see empirically how far the curve is from the exponential distribution?
Are there better ways to fit this curve or do any of the above? I'm just learning so any suggestions are appreciated.
Here is my current code :
%Function that I use to fit curve and get error for each iteration
function[beta,error] = Fit_and_Calc_Error(T,X,modelfun, beta0)
opts = statset('nlinfit');
opts.RobustWgtFun = 'bisquare';
beta = nlinfit(T,X,modelfun,beta0,opts);
error = immse(X,modelfun(beta,T));
end
%Starting actual loop stuff
tracking_a = zeros();
tracking_error = zeros();
modelfun = #(a,x)(exp(-a(1)*x)+exp(-a(2)*x));
for i = 1:60
%make new random data
x = 20*rand(300,1)+1; %between 1 and 21 (Avoiding x=0)
X = sort(x);
Y = 1./(X);
%fit random data
beta0 = [max(X(:,1))/10;max(X(:,1))/10];
[beta,error]= Fit_and_Calc_Error(X,Y,modelfun, beta0);
%storing data found
tracking_a(i,1:length(beta)) = beta;
tracking_error(i) = error;
end
%Testing if first power found has exponential dist
pdN = fitdist(tracking_a(:,1),'Exponential');
histogram(tracking_a(:,1),10)
hold on
x_values2 = 0:0.001:5;
y2 = pdf(pdN,x_values2);
plot(x_values2,y2)
hold off
[h,p] = lillietest(tracking_a(:,1),'Distribution','exponential','MCTol',1e-4)
%Testing if second power found has exponential dist
pdN2 = fitdist(tracking_a(:,2),'Exponential');
histogram(tracking_a(:,2),10)
hold on
y3 = pdf(pdN2,x_values2);
plot(x_values2,y3)
hold off
[h2,p2] = lillietest(tracking_a(:,2),'Distribution','exponential','MCTol',1e-4)

Related

Finding the longest linear section of non-linear plot in MATLAB

Apologies for the long post but this takes a bit to explain. I'm trying to make a script that finds the longest linear portion of a plot. Sample data is in a csv file here, it is stress and strain data for calculating the shear modulus of 3D printed samples. The code I have so far is the following:
x_data = [];
y_data = [];
x_data = Data(:,1);
y_data = Data(:,2);
plot(x_data,y_data);
grid on;
answer1 = questdlg('Would you like to load last attempt''s numbers?');
switch answer1
case 'Yes'
[sim_slopes,reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
case 'No'
disp('Take a look at the plot, find a range estimate, and press any button to continue');
pause;
prompt = {'Eliminate values ABOVE this x-value:','Eliminate values BELOW this x-value:','Size of divisions on x-axis:','Factor for similarity of slopes:'};
dlg_title = 'Point elimination';
num_lines = 1;
defaultans = {'0','0','0','0.1'};
if isempty(answer2) < 1
defaultans = {answer2{1},answer2{2},answer2{3},answer2{4}};
end
answer2 = inputdlg(prompt,dlg_title,num_lines,defaultans);
uv_of_x_range = str2num(answer2{1});
lv_of_x_range = str2num(answer2{2});
x_div_size = str2num(answer2{3});
K = str2num(answer2{4});
close all;
iB = find(x_data > str2num(answer2{1}),1,'first');
iS = find(x_data > str2num(answer2{2}),1,'first');
new_x_data = x_data(iS:iB);
new_y_data = y_data(iS:iB);
[sim_slopes, reg_data] = regr_and_longest_part(new_x_data,new_y_data,str2num(answer2{3}),str2num(answer2{2}),K);
end
[longest_section0, Midx]= max(sim_slopes(:,4)-sim_slopes(:,3));
longest_section=1+longest_section0;
long_sec_x_data_start = x_div_size*(sim_slopes(Midx,3)-1)+lv_of_x_range;
long_sec_x_data_end = x_div_size*(sim_slopes(Midx,4)-1)+lv_of_x_range;
long_sec_x_data_start_idx=find(new_x_data >= long_sec_x_data_start,1,'first');
long_sec_x_data_end_idx=find(new_x_data >= long_sec_x_data_end,1,'first');
long_sec_x_data = new_x_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
long_sec_y_data = new_y_data(long_sec_x_data_start_idx:long_sec_x_data_end_idx);
[b_long_sec, longes_section_reg_data] = robustfit(long_sec_x_data,long_sec_y_data);
plot(long_sec_x_data,b_long_sec(1)+b_long_sec(2)*long_sec_x_data,'LineWidth',3,'LineStyle',':','Color','k');
function [sim_slopes,reg_data] = regr_and_longest_part(x_points,y_points,x_div,lv,K)
reg_data = cell(1,3);
scatter(x_points,y_points,'.');
grid on;
hold on;
uv = lv+x_div;
ii=0;
while lv <= x_points(end)
if uv > x_points(end)
uv = x_points(end);
end
ii=ii+1;
indices = find(x_points>lv & x_points<uv);
temp_x_points = x_points((indices));
temp_y_points = y_points((indices));
if length(temp_x_points) <= 2
break;
end
[b,stats] = robustfit(temp_x_points,temp_y_points);
reg_data{ii,1} = b(1);
reg_data{ii,2} = b(2);
reg_data{ii,3} = length(indices);
plot(temp_x_points,b(1)+b(2)*temp_x_points,'LineWidth',2);
lv = lv+x_div;
uv = lv+x_div;
end
sim_slopes = NaN(length(reg_data),4);
sim_slopes(1,:) = [reg_data{1,1},0,1,1];
idx=1;
for ii=2:length(reg_data)
coff =sim_slopes(idx,1);
if abs(reg_data{ii,1}-coff) <= K*coff
C=zeros(ii-sim_slopes(idx,3)+1,1);
for kk=sim_slopes(idx,3):ii
C(kk)=reg_data{kk,1};
end
sim_slopes(idx,1)=mean(C);
sim_slopes(idx,2)=std(C);
sim_slopes(idx,4)=ii;
else
idx = idx + 1;
sim_slopes(idx,1)=reg_data{ii,1};
sim_slopes(idx,2)=0;
sim_slopes(idx,3)=ii;
sim_slopes(idx,4)=ii;
end
end
end
Apologies for the code not being well optimized, I'm still relatively new to MATLAB. I did not use derivatives because my data is relatively noisy and derivation might have made it worse.
I've managed to get the get the code to find the longest straight part of the plot by splitting the data up into sections called x_div_size then performing a robustfit on each section, the results of which are written into reg_data. The code then runs through reg_data and finds which lines have the most similar slopes, determined by the K factor, by calculating the average of the slopes in a section of the plot and makes a note of it in sim_slopes. It then finds the longest interval with max(sim_slopes(:,4)-sim_slopes(:,3)) and performs a regression on it to give the final answer.
The problem is that it will only consider the first straight portion that it comes across. When the data is plotted, it has a few parts where it seems straightest:
As an example, when I run the script with answer2 = {'0.2','0','0.0038','0.3'} I get the following, where the black line is the straightest part found by the code:
I have the following questions:
It's clear that from about x = 0.04 to x = 0.2 there is a long straight part and I'm not sure why the script is not finding it. Playing around with different values the script always seems to pick the first longest straight part, ignoring subsequent ones.
MATLAB complains that Warning: Iteration limit reached. because there are more than 50 regressions to perform. Is there a way to bypass this limit on robustfit?
When generating sim_slopes there might be section of the plot whose slope is too different from the average of the previous slopes so it gets marked as the end of a long section. But that section sometimes is sandwiched between several other sections on either side which instead have similar slopes. How would it be possible to tell the script to ignore one wayward section and to continue as if it falls within the tolerance allowed by the K value?
Take a look at the Douglas-Peucker algorithm. If you think of your (x,y) values as the vertices of an (open) polygon, this algorithm will simplify it for you, such that the largest distance from the simplified polygon to the original is smaller than some threshold you can choose. The simplified polygon will be the set of straight lines. Find the two vertices that are furthest apart, and you're done.
MATLAB has an implementation in the Mapping Toolbox called reducem. You might also find an implementation on the File Exchange (but be careful, there is also really bad code on there). Or, you can roll your own, it's quite a simple algorithm.
You can also try using the ischange function to detect changes in the intercept and slope of the data, and then extract the longest portion from that.
Using the sample data you provided, here is what I see from a basic attempt:
>> T = readtable('Data.csv');
>> T = rmmissing(T); % Remove rows with NaN
>> T = groupsummary(T,'Var1','mean'); % Average duplicate timestamps
>> [tf,slopes,intercepts] = ischange(T.mean_Var2, 'linear', 'SamplePoints', T.Var1); % find changes
>> plot(T.Var1, T.mean_Var2, T.Var1, slopes.*T.Var1 + intercepts)
which generates the plot
You should be able to extract the longest segment based on the indices given by find(tf).
You can also tune the parameters of ischange to get fewer or more segments. Adding the name-value pair 'MaxNumChanges' with a value of 4 or 5 produces more linear segments with a tighter fit to the curve, for example, which effectively removes the kink in the plot that you see.

Speeding up matlab for loop

I have a system of 5 ODEs with nonlinear terms involved. I am trying to vary 3 parameters over some ranges to see what parameters would produce the necessary behaviour that I am looking for.
The issue is I have written the code with 3 for loops and it takes a very long time to get the output.
I am also storing the parameter values within the loops when it meets a parameter set that satisfies an ODE event.
This is how I have implemented it in matlab.
function [m,cVal,x,y]=parameters()
b=5000;
q=0;
r=10^4;
s=0;
n=10^-8;
time=3000;
m=[];
cVal=[];
x=[];
y=[];
val1=0.1:0.01:5;
val2=0.1:0.2:8;
val3=10^-13:10^-14:10^-11;
for i=1:length(val1)
for j=1:length(val2)
for k=1:length(val3)
options = odeset('AbsTol',1e-15,'RelTol',1e-13,'Events',#eventfunction);
[t,y,te,ye]=ode45(#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)]),0:time,[b,q,s,r,n],options);
if length(te)==1
m=[m;val1(i)];
cVal=[cVal;val2(j)];
x=[x;val3(k)];
y=[y;ye(1)];
end
end
end
end
Is there any other way that I can use to speed up this process?
Profile viewer results
I have written the system of ODEs simply with the a format like
function s=systemFunc(t,y,p)
s= zeros(2,1);
s(1)=f*y(1)*(1-(y(1)/k))-p(1)*y(2)*y(1)/(p(2)*y(2)+y(1));
s(2)=p(3)*y(1)-d*y(2);
end
f,d,k are constant parameters.
The equations are more complicated than what's here as its a system of 5 ODEs with lots of non linear terms interacting with each other.
Tommaso is right. Preallocating will save some time.
But I would guess that there is fundamentally not a lot you can do since you are running ode45 in a loop. ode45 itself may be the bottleneck.
I would suggest you profile your code to see where the bottleneck is:
profile on
parameters(... )
profile viewer
I would guess that ode45 is the problem. Probably you will find that you should actually focus your time on optimizing the systemFunc code for performance. But you won't know that until you run the profiler.
EDIT
Based on the profiler output and additional code, I see some things that will help
It seems like the vectorization of your values is hurting you. Instead of
#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)])
try
#(t,y)systemFunc(t,y,val1(i),val2(j),val3(k))
where your system function is defined as
function s=systemFunc(t,y,p1,p2,p3)
s= zeros(2,1);
s(1)=f*y(1)*(1-(y(1)/k))-p1*y(2)*y(1)/(p2*y(2)+y(1));
s(2)=p3*y(1)-d*y(2);
end
Next, note that you don't have to preallocate space in the systemFunc, just combine them in the output:
function s=systemFunc(t,y,p1,p2,p3)
s = [ f*y(1)*(1-(y(1)/k))-p1*y(2)*y(1)/(p2*y(2)+y(1)),
p3*y(1)-d*y(2) ];
end
Finally, note that ode45 is internally taking about 1/3 of your runtime. There is not much you will be able to do about that. If you can live with it, I would suggest increasing your 'AbsTol' and 'RelTol' to more reasonable numbers. Those values are really small, and are making ode45 run for a really long time. If you can live with it, try increasing them to something like 1e-6 or 1e-8 and see how much the performance increases. Alternatively, depending on how smooth your function is, you might be able to do better with a different integrator (like ode23). But your mileage will vary based on how smooth your problem is.
I have two suggestions for you.
Preallocate the vectors in which you store your results and use an
increasing index to populate them into each iteration.
Since the options you use are always the same, instantiate then
outside the loop only once.
Final code:
function [m,cVal,x,y] = parameters()
b = 5000;
q = 0;
r = 10^4;
s = 0;
n = 10^-8;
time = 3000;
options = odeset('AbsTol',1e-15,'RelTol',1e-13,'Events',#eventfunction);
val1 = 0.1:0.01:5;
val1_len = numel(val1);
val2 = 0.1:0.2:8;
val2_len = numel(val2);
val3 = 10^-13:10^-14:10^-11;
val3_len = numel(val3);
total_len = val1_len * val2_len * val3_len;
m = NaN(total_len,1);
cVal = NaN(total_len,1);
x = NaN(total_len,1);
y = NaN(total_len,1);
res_offset = 1;
for i = 1:val1_len
for j = 1:val2_len
for k = 1:val3_len
[t,y,te,ye] = ode45(#(t,y)systemFunc(t,y,[val1(i),val2(j),val3(k)]),0:time,[b,q,s,r,n],options);
if (length(te) == 1)
m(res_offset) = val1(i);
cVal(res_offset) = val2(j);
x(res_offset) = val3(k);
y(res_offset) = ye(1);
end
res_offset = res_offset + 1;
end
end
end
end
If you only want to preserve result values that have been correctly computed, you can remove the rows containing NaNs at the bottom of your function. Indexing on one of the vectors will be enough to clear everything:
rows_ok = ~isnan(y);
m = m(rows_ok);
cVal = cVal(rows_ok);
x = x(rows_ok);
y = y(rows_ok);
In continuation of the other suggestions, I have 2 more suggestions for you:
You might want to try with a different solver, ODE45 is for non-stiff problems, but from the looks of it, it might seem like your problem could be stiff (parameters have a different order of magnitude). Try for instance with the ode23s method.
Secondly, without knowing which event you are looking for, maybe it is possible for you to use a logarithmic search rather than a linear one. e.g. the Bisection method. This will severely cut down on the number of times you have to solve the equation.

MATLAB Piecewise function

I have to construct the following function in MATLAB and am having trouble.
Consider the function s(t) defined for t in [0,4) by
{ sin(pi*t/2) , for t in [0,1)
s(t) = { -(t-2)^3 , for t in [1,3)*
{ sin(pi*t/2) , for t in [3,4)
(i) Generate a column vector s consisting of 512 uniform
samples of this function over the interval [0,4). (This
is best done by concatenating three vectors.)
I know it has to be something of the form.
N = 512;
s = sin(5*t/N).' ;
But I need s to be the piecewise function, can someone provide assistance with this?
If I understand correctly, you're trying to create 3 vectors which calculate the specific function outputs for all t, then take slices of each and concatenate them depending on the actual value of t. This is inefficient as you're initialising 3 times as many vectors as you actually want (memory), and also making 3 times as many calculations (CPU), most of which will just be thrown away. To top it off, it'll be a bit tricky to use concatenate if your t is ever not as you expect (i.e. monotonically increasing). It might be an unlikely situation, but better to be general.
Here are two alternatives, the first is imho the nice Matlab way, the second is the more conventional way (you might be more used to that if you're coming from C++ or something, I was for a long time).
function example()
t = linspace(0,4,513); % generate your time-trajectory
t = t(1:end-1); % exclude final value which is 4
tic
traj1 = myFunc(t);
toc
tic
traj2 = classicStyle(t);
toc
end
function trajectory = myFunc(t)
trajectory = zeros(size(t)); % since you know the size of your output, generate it at the beginning. More efficient than dynamically growing this.
% you could put an assert for t>0 and t<3, otherwise you could end up with 0s wherever t is outside your expected range
% find the indices for each piecewise segment you care about
idx1 = find(t<1);
idx2 = find(t>=1 & t<3);
idx3 = find(t>=3 & t<4);
% now calculate each entry apprioriately
trajectory(idx1) = sin(pi.*t(idx1)./2);
trajectory(idx2) = -(t(idx2)-2).^3;
trajectory(idx3) = sin(pi.*t(idx3)./2);
end
function trajectory = classicStyle(t)
trajectory = zeros(size(t));
% conventional way: loop over each t, and differentiate with if-else
% works, but a lot more code and ugly
for i=1:numel(t)
if t(i)<1
trajectory(i) = sin(pi*t(i)/2);
elseif t(i)>=1 & t(i)<3
trajectory(i) = -(t(i)-2)^3;
elseif t(i)>=3 & t(i)<4
trajectory(i) = sin(pi*t(i)/2);
else
error('t is beyond bounds!')
end
end
end
Note that when I tried it, the 'conventional way' is sometimes faster for the sampling size you're working on, although the first way (myFunc) is definitely faster as you scale up really a lot. In anycase I recommend the first approach, as it is much easier to read.

Euler's approximation in MATLAB

I have a project in MATLAB where I am to approximate the solution of a diff equation. To do that, I use ode45 to get the "real solution" and compare it to Euler's approximation that I perform 3 times with halving the step every time. Here are my issues:
ode45 doesn't seem to work on my computer. I get this message:
No help found for ode45.m.
When I type help ode45 in the commando window and this
Error using ode45
Too many input arguments.
Error in Lab2 (line 51)
[f, u] = ode45(#myode, [pi/6 pi/2], 1);
So I switched to ode23 and got a what I thought to be a quite good result. Problem is though that I noticed that the error in the last approximation becomes slightly smaller which shouldn't happens since the error gets only larger by every step... right?
To makes things worse, I tried to run my code with ode23 on the school computer and got different results (different solution curve). I tried then ode45 and got the same results. When I look at the curve and its values it is totally wrong since in my diff equation values should be decreasing instead of increasing like they do when I run them on the school computer.
I don't understand how the same code can produce two different results on different computers.
I don't understand either why ode45 is missing from my computer. I have tried re-installing the new version and it is still the same.
I am totally confused...
Here is my code:
This is in a function file named myode
function dudf = myode(f,u)
k=1/20;
dudf = (-k*u.^3)/sin(f).^3;
end
This is the program
%Euler's method
f_init = pi/6;
f_final = pi/2;
u_init = 1;
k = 1/20;
n = 10; %number of steps
h(1) = (f_final-f_init)/n;
fh(1) = f_init;
uh(1) = u_init;
%calculate euler's approximation after every step
for i = 2:n+1
fh(i) = fh(i-1)+h;
uh(i) = uh(i-1)+h*(-k*uh(i-1)^3)/(sin(fh(i-1))^3);
end
%save vaules of ode45 at every step for first appr.
step0=pi/30;
fiend = pi/6 + step0;
odeu1 = [1];
step1 = [pi/6];
for i = 1:length(uh)-1
[f, u] = ode23(#myode, [pi/6 fiend], 1);
odeu1 = [odeu1 u(end)];
step1 = [step1 fislut];
fiend = fiend + step0;
end
Any help is appreciated!

Matlab: Optimizing speed of function call and cosine sign-finding in a loop

The code in question is here:
function k = whileloop(odefun,args)
...
while (sign(costheta) == originalsign)
y=y(:) + odefun(0,y(:),vars,param)*(dt); % Line 4
costheta = dot(y-normpt,normvec);
k = k + 1;
end
...
end
and to clarify, odefun is F1.m, an m-file of mine. I pass it into the function that contains this while-loop. It's something like whileloop(#F1,args). Line 4 in the code-block above is the Euler method.
The reason I'm using a while-loop is because I want to trigger upon the vector "y" crossing a plane defined by a point, "normpt", and the vector normal to the plane, "normvec".
Is there an easy change to this code that will speed it up dramatically? Should I attempt learning how to make mex files instead (for a speed increase)?
Edit:
Here is a rushed attempt at an example of what one could try to test with. I have not debugged this. It is to give you an idea:
%Save the following 3 lines in an m-file named "F1.m"
function ydot = F1(placeholder1,y,placeholder2,placeholder3)
ydot = y/10;
end
%Run the following:
dt = 1.5e-12 %I do not know about this. You will have to experiment.
y0 = [.1,.1,.1];
normpt = [3,3,3];
normvec = [1,1,1];
originalsign = sign(dot(y0-normpt,normvec));
costheta = originalsign;
y = y0;
k = 0;
while (sign(costheta) == originalsign)
y=y(:) + F1(0,y(:),0,0)*(dt); % Line 4
costheta = dot(y-normpt,normvec);
k = k + 1;
end
disp(k);
dt should be sufficiently small that it takes hundreds of thousands of iterations to trigger.
Assume I must use the Euler method. I have a stochastic differential equation with state-dependent noise if you are curious as to why I tell you to take such an assumption.
I would focus on your actual ODE integration. The fewer steps you have to take, the faster the loop will run. I would only worry about the speed of the sign check after you've optimized the actual integration method.
It looks like you're using the first-order explicit Euler method. Have you tried a higher-order integrator or an implicit method? Often you can increase the time step significantly.