How to fit a nonlinear function to experimental data - matlab

I have some experimental data and need to fit the following function to it in order to determine one of its parameters. The procedure uses a Levenberg–Marquardt least-squares algorithm.
I used the curve-fitting option in Igor Pro: I defined a new fit function and tried to specify the independent and dependent variables.
However, I get the following error and do not know why:
"The fitting function returned INF for at least one X variable"
My function is :
sin(theta) = -1+2*sqrt(alpha/x)*exp(-beta*(x-alpha)^2)
beta = 1.135e-4;
sin(theta) = [-0.81704 -0.67649 -0.83137 -0.73468 -0.66744 -0.43602 0.45368 0.75802 0.96705 0.99717 ]
x = [72.01 59.99 51.13 45.53 36.15 31.66 30.16 29.01 25.62 23.47 ]
Is there any suggestion for finding the parameter alpha here?
Is there any handy software or program for nonlinear curve fitting?
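As a hedged illustration of where a non-finite value can come from in this model, the sketch below evaluates the right-hand side for a few parameter choices. Whether any of these is what actually triggered Igor Pro's message is an assumption; the trial values (a negative alpha, a negative beta, and x = 0) are hypothetical and chosen only to provoke NaN or INF.

```python
import numpy as np

def model(x, alpha, beta):
    """Right-hand side of the fit function from the question."""
    return -1.0 + 2.0 * np.sqrt(alpha / x) * np.exp(-beta * (x - alpha) ** 2)

x = np.array([72.01, 59.99, 51.13, 45.53, 36.15, 31.66, 30.16, 29.01, 25.62, 23.47])

# Silence NumPy's floating-point warnings so the results can be inspected directly.
with np.errstate(all="ignore"):
    y_ok = model(x, 25.0, 1.135e-4)          # plausible parameters: all values finite
    y_neg_alpha = model(x, -25.0, 1.135e-4)  # square root of a negative number -> NaN
    y_neg_beta = model(x, 25.0, -1.0)        # exp of a large positive number -> inf
    y_zero_x = model(np.array([0.0]), 25.0, 1.135e-4)  # division by zero -> inf

print("finite for plausible parameters:", np.isfinite(y_ok).all())
print("NaN for negative alpha:", np.isnan(y_neg_alpha).all())
print("inf for negative beta:", np.isinf(y_neg_beta).any())
print("inf at x = 0:", np.isinf(y_zero_x).all())
```

If the optimizer is free to step into such regions, constraining alpha to positive values or starting from a closer initial guess is one way to keep the model finite.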

In gnuplot, it would look like this. The fit is not great, but that is not the fault of gnuplot; apparently this data cannot be fitted very well with this function.
Code:
### nonlinear curve fitting
reset session
$Data <<EOD
72.01 -0.81704
59.99 -0.67649
51.13 -0.83137
45.53 -0.73468
36.15 -0.66744
31.66 -0.43602
30.16 0.45368
29.01 0.75802
25.62 0.96705
23.47 0.99717
EOD
f(x) = -1+2*sqrt(alpha/x)*exp(-beta*(x-alpha)**2)
# initial guessed values
alpha = 25
beta = 1
set fit nolog results
fit f(x) $Data u 1:2 via alpha,beta
plot $Data u 1:2 w lp pt 7, \
f(x) lc rgb "red"
print sprintf("alpha=%g, beta=%g",alpha,beta)
### end of code
Result:
alpha=25.818, beta=0.0195229
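
The gnuplot fit above treats both alpha and beta as free. For comparison, here is a minimal Python sketch that keeps beta fixed at the value given in the question and fits only alpha. It is an illustration under assumptions, not the method used in Igor Pro: the bounds on alpha (in particular the upper limit of 200) are arbitrary, and with bounds SciPy switches from Levenberg–Marquardt to its trust-region reflective solver.

```python
import numpy as np
from scipy.optimize import curve_fit

BETA = 1.135e-4  # fixed value given in the question

x = np.array([72.01, 59.99, 51.13, 45.53, 36.15, 31.66, 30.16, 29.01, 25.62, 23.47])
y = np.array([-0.81704, -0.67649, -0.83137, -0.73468, -0.66744,
              -0.43602, 0.45368, 0.75802, 0.96705, 0.99717])

def model(x, alpha):
    """Fit function from the question with beta held constant."""
    return -1.0 + 2.0 * np.sqrt(alpha / x) * np.exp(-BETA * (x - alpha) ** 2)

alpha0 = 25.0  # same initial guess as in the gnuplot script

# Bounds keep alpha positive so the square root stays real.
# The upper bound of 200 is an arbitrary assumption.
popt, pcov = curve_fit(model, x, y, p0=[alpha0], bounds=(1e-6, 200.0))
alpha_fit = popt[0]

# Sum of squared errors before and after the fit, to judge the improvement.
sse_init = np.sum((model(x, alpha0) - y) ** 2)
sse_fit = np.sum((model(x, alpha_fit) - y) ** 2)
print("alpha =", alpha_fit)
print("SSE at initial guess =", sse_init, " SSE after fit =", sse_fit)
```

Comparing the two SSE values gives a quick check of how much a single free parameter can do for this data set.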

If it might be of some use, my equation search on your data turned up a good fit to a standard 4-parameter logistic equation "y = d + (a - d) / (1.0 + pow(x / c, b))" with parameters a = 0.96207949, b = 44.14292256, c = 30.67324939, and d = -0.74830947 yielding RMSE = 0.0565 and R-squared = 0.9943, and I have included code for a Python graphical fitter using this equation.
import numpy, scipy, matplotlib
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit
theta = [-0.81704, -0.67649, -0.83137, -0.73468, -0.66744, -0.43602, 0.45368, 0.75802, 0.96705, 0.99717]
x = [72.01, 59.99, 51.13, 45.53, 36.15, 31.66, 30.16, 29.01, 25.62, 23.47]
# rename to match previous example code
xData = numpy.array(x)
yData = numpy.array(theta)
# StandardLogistic4Parameter equation from zunzun.com
def func(x, a, b, c, d):
    return d + (a - d) / (1.0 + numpy.power(x / c, b))
# these are the same as the scipy defaults
initialParameters = numpy.array([1.0, 1.0, 1.0, 1.0])
# curve fit the test data
fittedParameters, pcov = curve_fit(func, xData, yData, initialParameters)
modelPredictions = func(xData, *fittedParameters)
absError = modelPredictions - yData
SE = numpy.square(absError) # squared errors
MSE = numpy.mean(SE) # mean squared errors
RMSE = numpy.sqrt(MSE) # Root Mean Squared Error, RMSE
Rsquared = 1.0 - (numpy.var(absError) / numpy.var(yData))
print('Parameters:', fittedParameters)
print('RMSE:', RMSE)
print('R-squared:', Rsquared)
print()
##########################################################
# graphics output section
def ModelAndScatterPlot(graphWidth, graphHeight):
    f = plt.figure(figsize=(graphWidth/100.0, graphHeight/100.0), dpi=100)
    axes = f.add_subplot(111)
    # first the raw data as a scatter plot
    axes.plot(xData, yData, 'D')
    # create data for the fitted equation plot
    xModel = numpy.linspace(min(xData), max(xData))
    yModel = func(xModel, *fittedParameters)
    # now the model as a line plot
    axes.plot(xModel, yModel)
    axes.set_xlabel('X Data') # X axis data label
    axes.set_ylabel('Y Data') # Y axis data label
    plt.show()
    plt.close('all') # clean up after using pyplot
graphWidth = 800
graphHeight = 600
ModelAndScatterPlot(graphWidth, graphHeight)

Matlab
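I changed the function slightly: the constant -1 is replaced by -gamma, and gamma is optimized together with alpha and beta. Note that the exponent below is written as exp(beta*...) without the minus sign, so the fitted beta comes out negative.
The code is as follows: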
ydata = [-0.81704 -0.67649 -0.83137 -0.73468 -0.66744 -0.43602 0.45368...
0.75802 0.96705 0.99717 ];
xdata = [72.01 59.99 51.13 45.53 36.15 31.66 30.16 29.01 25.62 23.47 ];
sin_theta = @(alpha, beta, gamma, xdata) -gamma+2.*sqrt(alpha./xdata).*exp(beta.*(xdata-alpha).^2);
%Fitting function as function of array(x) required by lsqcurvefit
f = @(x,xdata) sin_theta(x(1),x(2), x(3),xdata);
% [alpha, beta, gamma]
x0 = [25, 0, 1] ;
options = optimoptions('lsqcurvefit','Algorithm','levenberg-marquardt', 'FunctionTolerance', 1e-30);
[x,resnorm,residual,exitflag,output] = lsqcurvefit(f,x0,xdata,ydata,[], [], options);
% Accuracy
RMSE = sqrt(sum(residual.^2)/length(residual));
alpha = x(1); beta = x(2); gamma = x(3);
%Plotting data
data = linspace(xdata(1),xdata(end));
plot(xdata,ydata,'ro',data,f(x,data),'b-', 'linewidth', 3)
legend('Data','Fitted exponential')
title('Data and Fitted Curve')
set(gca,'FontSize',20)
Result
alpha = 26.0582, beta = -0.0329, gamma = 0.7881 instead of 1, RMSE = 0.1498


Modelling membrane evolution over time

I am trying to model the time evolution of a membrane based on the following code in MATLAB.
The basic outline is that the evolution is based on a differential equation
where j=0,1 and x^0 = x, x^1 = y and x^j(s_i) = x^j_i.
My code is the following.
import numpy as np
from matplotlib import pyplot as plt
R0 = 5 #radius
N = 360 #number of intervals
x0 = 2*np.pi*R0/(N/2) #resting membrane lengths
phi = np.linspace(0,2*np.pi, num=360, dtype=float)
R1 = R0 + 0.5*np.sin(20*phi)
X = R1*np.cos(phi)
Y = R1*np.sin(phi)
L = np.linspace(-1,358, num=360, dtype=int)
R = np.linspace(1,360, num=360,dtype=int) #right and left indexing vectors
R[359] = 0
X = R1*np.cos(phi)
Y = R1*np.sin(phi)
plt.plot(X,Y)
plt.axis("equal")
plt.show()
ds = 1/N
ds2 = ds**2
k = 1/10
w = 10**6
for i in range(0,20000):
    lengths = np.sqrt( (X[R]-X)**2 + (Y[R]-Y)**2 )
    Ex = k/ds2*(X[R] - 2*X + X[L] - x0*( (X[R]-X)/lengths - (X-X[L])/lengths[L]) )
    Ey = k/ds2*(Y[R] - 2*Y + Y[L] - x0*( (Y[R]-Y)/lengths - (Y-Y[L])/lengths[L]) )
    X = X + 1/w*Ex
    Y = Y + 1/w*Ey
plt.plot(X,Y)
plt.axis("equal")
plt.show()
The model is supposed to devolve into a circular membrane, as below
but this is what mine does
Your definition of x0 is wrong.
In the Matlab code, it is equal to
x0 = 2*pi*R/N/2 # which is pi*R/N
while in your Python code it is
x0 = 2*np.pi*R0/(N/2) # which is 4*np.pi*R0/N
Correcting that, the end result is a circular shape, but with a different radius. I'm assuming that this is because of the reduced number of iterations (20000 instead of 1000000).
Edit:
As expected, using the correct number of iterations results in a plot similar to your expected one.
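The difference between the two expressions comes down to operator precedence, and a short check makes it concrete. This is only a sketch using the values R0 = 5 and N = 360 from the question; the variable names x0_matlab and x0_python are labels introduced here for illustration.

```python
import math

R0 = 5   # radius, as in the question
N = 360  # number of intervals, as in the question

# Division is evaluated left to right, so this equals pi*R0/N.
x0_matlab = 2 * math.pi * R0 / N / 2

# Parenthesising (N/2) divides by a smaller number, giving 4*pi*R0/N.
x0_python = 2 * math.pi * R0 / (N / 2)

ratio = x0_python / x0_matlab
print(x0_matlab, x0_python, ratio)
```

The resting length in the Python version is therefore four times larger than in the MATLAB version.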

nlinfit appears better than fitgmdist for fitting normal mixture

I have a vector of data consisting of about 2 million samples that I suspect is a mixture of two Gaussians. I try to fit the data, Data, to a mixture using MATLAB's fitgmdist.
From histogram:
% histogram counts of X with 1000 bins.
[Yhist, x] = histcounts(Data, 1000, 'normalization', 'pdf');
x = (x(1:end-1) + x(2:end))/2;
Using fitgmdist:
% Increase no. of iterations. default is 100.
opts.MaxIter = 300;
% Ensure that it does all iterations.
opts.TolFun = 0;
GMModel = fitgmdist(Data, 2, 'Options', opts, 'Start', 'plus');
wts = GMModel.ComponentProportion;
mu = GMModel.mu;
sig = sqrt(squeeze(GMModel.Sigma));
Ygmfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));
Mixture results with fitgmdist:
wts = [0.6780, 0.322], mu = [-7.6444, -9.7831], sig = [0.8243, 0.5947]
Next I try using nlinfit:
% Define the callback function for nlinfit.
function y = nmmix(a, x)
a(1:2) = a(1:2)/sum(a);
y = a(1)*normpdf(x(:), a(3), a(5)) + a(2)*normpdf(x(:), a(4), a(6));
end
init_wts = [0.66, 1-0.66];
init_mu = [-7.7, -9.75];
init_sig = [0.5, 0.5];
a = nlinfit(x(:), Yhist(:), @nmmix, [init_wts, init_mu, init_sig]);
wts = a(1:2)/sum(a(1:2));
mu = a(3:4);
sig = a(5:6);
Ynlinfit = wts(1)*normpdf(x(:), mu(1), sig(1)) + wts(2)*normpdf(x(:), mu(2), sig(2));
Mixture results with nlinfit:
wts = [0.6349, 0.3651], mu = [-7.6305, -9.6991], sig = [0.6773, 0.6031]
% Plot to compare the results
figure;
hold on
plot(x(:), Yhist, 'b');
plot(x(:), Ygmfit, 'k');
plot(x(:), Ynlinfit, 'r');
It seems to me that the non-linear fit (red curve) is intuitively a better approximation to the histogram (blue curve) than "fitgmdist" (black curve).
The results are similar even if I use a finer histogram, say with 100,000 bins.
What can be the source of this discrepancy?
Added later: Of course one would not expect the results to be the same, but I would expect that the visual quality of the two fits would be comparable.

Matlab function for lorentzian fit with global variables

I want to fit a Lorentzian to my data, so first I want to test my fitting procedure to simulated data:
X = linspace(0,100,200);
Y = 20./((X-30).^2+20)+0.08*randn(size(X));
starting parameters
a3 = ((max(X)-min(X))/10)^2;
a2 = (max(X)+min(X))/2;
a1 = max(Y)*a3;
a0 = [a1,a2,a3];
find minimum for fit
afinal = fminsearch(@devsum,a0);
afinal is vector with parameters for my fit. If I test my function as follows
d= devsum(a0)
then d= 0, but if I do exactly what's in my function
a=a0;
d = sum((Y - a(1)./((X-a(2)).^2+a(3))).^2)
then d is not equal to zero. How is this possible? My function is super simple so I don't know what's going wrong.
my function:
%devsum.m
function d = devsum(a)
global X Y
d = sum((Y - a(1)./((X-a(2)).^2+a(3))).^2);
end
Basically I'm just implementing stuff I found here
http://www.home.uni-osnabrueck.de/kbetzler/notes/fitp.pdf
page 7
It is usually better to avoid using global variables. The way I usually solve these problems is to first define a function which evaluates the curve you want to fit as a function of x and the parameters:
% lorentz.m
function y = lorentz(param, x)
y = param(1) ./ ((x-param(2)).^2 + param(3))
In this way, you can reuse the function later for plotting the result of the fit.
Then, you define a small anonymous function with the property you want to minimize, with only a single parameter as input, since that is the format that fminsearch needs. Instead of using global variables, the measured X and Y are 'captured' (technical term is doing a closure over these variables) in the definition of the anonymous function:
fit_error = @(param) sum((y_meas - lorentz(param, x_meas)).^2)
And finally you fit your parameters by minimizing the error with fminsearch:
fitted_param = fminsearch(fit_error, starting_param);
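The same closure pattern can be sketched in Python. This is an illustration under assumptions, not a translation of the author's files: it uses scipy.optimize.minimize with method="Nelder-Mead" (the simplex method that fminsearch is based on), simulated data with a fixed random seed, and the starting guess from the question.

```python
import numpy as np
from scipy.optimize import minimize

def lorentz(param, x):
    """Lorentzian curve as a function of x and the three parameters."""
    return param[0] / ((x - param[1]) ** 2 + param[2])

# Simulated measurement, mirroring the question (the seed is an arbitrary choice).
rng = np.random.default_rng(0)
x_meas = np.linspace(0, 100, 200)
y_meas = lorentz([20.0, 30.0, 20.0], x_meas) + 0.08 * rng.standard_normal(x_meas.size)

def fit_error(param):
    """Sum of squared deviations; x_meas and y_meas are captured from the enclosing scope."""
    return np.sum((y_meas - lorentz(param, x_meas)) ** 2)

# Rough starting guess, computed the same way as in the question.
a3 = ((x_meas.max() - x_meas.min()) / 10) ** 2
a2 = (x_meas.max() + x_meas.min()) / 2
a1 = y_meas.max() * a3
starting_param = np.array([a1, a2, a3])

result = minimize(fit_error, starting_param, method="Nelder-Mead")
fitted_param = result.x
print("fitted parameters:", fitted_param)
print("SSE before:", fit_error(starting_param), "after:", fit_error(fitted_param))
```

Because fit_error takes only the parameter vector, no global state is needed, and lorentz can be reused for plotting.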
Quick demonstration:
% simulate some data
X = linspace(0,100,200);
Y = 20./((X-30).^2+20)+0.08*randn(size(X));
% rough guess of initial parameters
a3 = ((max(X)-min(X))/10)^2;
a2 = (max(X)+min(X))/2;
a1 = max(Y)*a3;
a0 = [a1,a2,a3];
% define lorentz inline, instead of in a separate file
lorentz = @(param, x) param(1) ./ ((x-param(2)).^2 + param(3));
% define objective function, this captures X and Y
fit_error = @(param) sum((Y - lorentz(param, X)).^2);
% do the fit
a_fit = fminsearch(fit_error, a0);
% quick plot
x_grid = linspace(min(X), max(X), 1000); % fine grid for interpolation
plot(X, Y, '.', x_grid, lorentz(a_fit, x_grid), 'r')
legend('Measurement', 'Fit')
title(sprintf('a1_fit = %g, a2_fit = %g, a3_fit = %g', ...
a_fit(1), a_fit(2), a_fit(3)), 'interpreter', 'none')

MATLAB - Exponential Curve Fitting without Toolbox

I have data points (x, y) that I need to fit an exponential function to,
y = A + B * exp(C * x),
but I can neither use the Curve Fitting Toolbox nor the Optimization Toolbox.
User rayryeng was good enough to help me with working code:
x = [0 0.0036 0.0071 0.0107 0.0143 0.0178 0.0214 0.0250 0.0285 0.0321 0.0357 0.0392 0.0428 0.0464 0.0464];
y = [1.3985 1.3310 1.2741 1.2175 1.1694 1.1213 1.0804 1.0395 1.0043 0.9691 0.9385 0.9080 0.8809 0.7856 0.7856];
M = [ones(numel(x),1), x(:)]; %// Ensure x is a column vector
lny = log(y(:)); %// Ensure y is a column vector and take ln
X = M\lny; %// Solve for parameters
A = exp(X(1)); %// Solve for A
b = X(2); %// Get b
xval = linspace(min(x), max(x));
yval = A*exp(b*xval);
plot(x,y,'r.',xval,yval,'b');
However, this code only fits the equation without offset
y = A * exp(B * x).
How can I extend this code to fit the three-parameter equation?
In another attempt, I managed to fit the function using fminsearch:
function [xval, yval] = curve_fitting_exponential_1_optimized(x,y,xval)
start_point = rand(1, 3);
model = @expfun;
est = fminsearch(model, start_point);
function [sse, FittedCurve] = expfun(params)
A = params(1);
B = params(2);
C = params(3);
FittedCurve = A + B .* exp(-C * x);
ErrorVector = FittedCurve - y;
sse = sum(ErrorVector .^ 2);
end
yval = est(1)+est(2) * exp(-est(3) * xval);
end
The problem here is that the result depends on the starting point which is randomly chosen, so I don't get a stable solution. But since I need the function for automatization, I need something stable. How can I get a stable solution?
How to adapt rayryeng's code for three parameters?
rayryeng used the strategy to linearize a nonlinear equation so that standard regression methods can be applied. See also Jubobs' answer to a similar question.
This strategy no longer works if there is a non-zero offset A. We can fix the situation by getting a rough estimate of the offset. As rubenvb mentioned in the comments, we could estimate A by min(y), but then the logarithm would be applied to zero. Instead, we can leave some space between our guess of A and the minimum of the data, say half the data's range. Then we subtract A from the data and use rayryeng's method:
x = x(:); % bring the data into the standard, more
y = y(:); % convenient format of column vectors
Aguess = min(y) - (max(y) - min(y)) / 2;
guess = [ones(size(x)), -x] \ log(y - Aguess);
Bguess = exp(guess(1));
Cguess = guess(2);
For the given data, this results in
Aguess = 0.4792
Bguess = 0.9440
Cguess = 21.7609
Unlike in the two-parameter case, we cannot expect this to be a good fit. Its SSE is 0.007331.
How to get a stable solution?
This guess is however useful as a starting point for the nonlinear optimization:
start_point = [Aguess, Bguess, Cguess];
est = fminsearch(@expfun, start_point);
Aest = est(1);
Best = est(2);
Cest = est(3);
Now the optimization arrives at a stable estimate, because the computation is deterministic:
Aest = -0.1266
Best = 1.5106
Cest = 10.2314
The SSE of this estimate is 0.004041.
This is what the data (blue dots) and fitted curves (green: guess, red: optimized) look like:
Here is the whole function in all its glory - special thanks to A. Donda!
function [xval, yval] = curve_fitting_exponential_1_optimized(x,y,xval)
x = x(:); % bring the data into the standard, more convenient format of column vectors
y = y(:);
Aguess = min(y) - (max(y)-min(y)) / 2;
guess = [ones(size(x)), -x] \ log(y - Aguess);
Bguess = exp(guess(1));
Cguess = guess(2);
start_point = [Aguess, Bguess, Cguess];
est = fminsearch(@expfun, start_point);
function [sse, FittedCurve] = expfun(params)
A = params(1);
B = params(2);
C = params(3);
FittedCurve = A + B .* exp(-C * x);
ErrorVector = FittedCurve - y;
sse = sum(ErrorVector .^ 2);
end
yval = est(1)+est(2) * exp(-est(3) * xval);
end
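For readers without MATLAB, the two-stage approach (a deterministic linearized guess followed by a simplex refinement) can be sketched in Python with the data from the question. This is an assumption-laden illustration, not the answer's code: it uses numpy.linalg.lstsq in place of the backslash operator and scipy.optimize.minimize with method="Nelder-Mead" in place of fminsearch, so the refined values need not match the MATLAB output digit for digit.

```python
import numpy as np
from scipy.optimize import minimize

x = np.array([0, 0.0036, 0.0071, 0.0107, 0.0143, 0.0178, 0.0214, 0.0250,
              0.0285, 0.0321, 0.0357, 0.0392, 0.0428, 0.0464, 0.0464])
y = np.array([1.3985, 1.3310, 1.2741, 1.2175, 1.1694, 1.1213, 1.0804, 1.0395,
              1.0043, 0.9691, 0.9385, 0.9080, 0.8809, 0.7856, 0.7856])

# Stage 1: guess the offset half a data range below the minimum,
# so that log(y - A_guess) is defined for every point.
A_guess = y.min() - (y.max() - y.min()) / 2

# Linear least squares on log(y - A) = log(B) - C*x.
M = np.column_stack([np.ones_like(x), -x])
coef, *_ = np.linalg.lstsq(M, np.log(y - A_guess), rcond=None)
B_guess = np.exp(coef[0])
C_guess = coef[1]

def sse(params):
    """Sum of squared errors of the model A + B*exp(-C*x)."""
    A, B, C = params
    return np.sum((A + B * np.exp(-C * x) - y) ** 2)

# Stage 2: refine all three parameters from the deterministic starting point.
start_point = np.array([A_guess, B_guess, C_guess])
result = minimize(sse, start_point, method="Nelder-Mead")
A_est, B_est, C_est = result.x

print("guess:  ", start_point, "SSE:", sse(start_point))
print("refined:", result.x, "SSE:", sse(result.x))
```

Since the starting point is computed from the data rather than drawn at random, repeated runs give the same estimate.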

MATLAB: 2-D plot with z-axis given in color

My friends and I have been struggling to generate a 2-D plot in MATLAB with
$\eta_1$ and $\eta_2$ both varying in $0:0.01:1$ and the z-axis given by color.
We have a system of 8 differential equations, with HIVinf representing the total new HIV infections in a population over 1 year (HIVinf is obtained by integrating a function of $\eta_1, \eta_2$).
We are looping through $\eta_1$ and $\eta_2$ (two 'for' loops) with the ode45 solver within the 'for' loops.
Based on our prior numerical results, we should be getting much color variation in the 2D-plot. There should be patterns of darkness (high concentration of HIVinfections) along the edges of the plot, and lightness along the diagonals (low concentrations).
However, the following snippet does not produce what we want (I have attached the figure below).
[X,Y] = meshgrid(eta_11,eta_22);
figure;
pcolor(X,Y,AA);
shading interp;
I have attached the code below, as concisely as possible. The function ydot works fine (it is required to run ode45).
We would greatly appreciate if you could help us fix the snippet.
function All()
global Lambda mu mu_A mu_T beta tau eta_1 eta_2 lambda_T rho_1 rho_2 gamma
alpha = 20;
TIME = 365;
eta_11 = zeros(1,alpha);
eta_22 = zeros(1,alpha);
AA = zeros(1,alpha);
BB = zeros(1,alpha);
CC = zeros(1,alpha);
for n = 1:1:alpha
for m = 1:1:alpha
Lambda = 531062;
mu = 1/25550;
mu_A = 1/1460;
mu_T = 1/1825;
beta = 187/365000;
tau = 4/365;
lambda_T = 1/10;
rho_1 = 1/180;
rho_2 = 1/90;
gamma = 1/1000;
eta_1 = (n-1)./(alpha-1);
eta_11(m) = (m-1)./(alpha-1);
eta_2 = (m-1)./(alpha-1);
eta_22(m) = (m-1)./(alpha-1);
y0 = [191564208, 131533276, 2405629, 1805024, 1000000, 1000000, 500000, 500000];
[t,y] = ode45('SimplifiedEqns',[0:1:TIME],y0);
N = y(:,1)+y(:,2)+y(:,3)+y(:,4)+y(:,5)+y(:,6)+y(:,7)+y(:,8);
HIVinf1=[0:1:TIME];
HIVinf2=[beta.*(S+T).*(C1+C2)./N];
HIVinf=trapz(HIVinf1,HIVinf2);
AA(n,m) = HIVinf;
end
end
[X,Y] = meshgrid(eta_11,eta_22);
figure;
pcolor(X,Y,AA);
shading interp;
function ydot = SimplifiedEqns(t,y)
global Lambda mu mu_A mu_T beta tau eta_1 eta_2 lambda_T rho_1 rho_2 gamma
S = y(1);
T = y(2);
H = y(3);
C = y(4);
C1 = y(5);
C2 = y(6);
CM1 = y(7);
CM2 = y(8);
N = S + T + H + C + C1 + C2 + CM1 + CM2;
ydot = zeros(8,1);
ydot(1)=Lambda-mu.*S-beta.*(H+C+C1+C2).*(S./N)-tau.*(T+C).*(S./N);
ydot(2)=tau.*(T+C).*(S./N)-beta.*(H+C+C1+C2).*(T./N)-(mu+mu_T).*T;
ydot(3)=beta.*(H+C+C1+C2).*(S./N)-tau.*(T+C).*(H./N)-(mu+mu_A).*H;
ydot(4)=beta.*(H+C+C1+C2).*(T./N)+tau.*(T+C).*(H./N)- (mu+mu_A+mu_T+lambda_T).*C;
ydot(5)=lambda_T.*C-(mu+mu_A+rho_1+eta_1).*C1;
ydot(6)=rho_1.*C1-(mu+mu_A+rho_2+eta_2).*C2;
ydot(7)=eta_1.*C1-(mu+rho_1+gamma).*CM1;
ydot(8)=eta_2.*C2-(mu+rho_2+gamma.*(rho_1)./(rho_1+rho_2)).*CM2+(rho_1).*CM1;
end
end
OK, I don't know what the plot should look like, but your eta_11 and eta_22 are indexed only by the inner loop variable. That means that for every n, elements 1,2,3,...,alpha of eta_11/eta_22 are overwritten as m runs over 1,2,3,...,alpha. Since your meshgrid call is outside the loops, that could be a problem. Usually, if you are plotting a function of two variables and you evaluate it in two nested loops, you can skip meshgrid altogether. Compare these two options:
Option 1:
x=[0:0.01:1];
[x1,x2]=meshgrid(x,x);
y=x1+cos(x2);
contour(x,x,y,30);
Option 2:
x=[0:0.01:1];
for i=1:101 %length(x)
for j=1:101
y(i,j)=x(j)+cos(x(i)); % It is important to index y with both
% loop variables; row i corresponds to the
% second variable, as in the meshgrid version
end
end
contour(x,x,y,30)
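The indexing convention is the part that is easy to get wrong, so here is a small Python sketch that builds the same surface both ways and checks that they agree. The names Z_vec and Z_loop are introduced here for illustration, and NumPy's meshgrid (default 'xy' indexing) is assumed to follow the same row/column layout as MATLAB's.

```python
import numpy as np

x = np.linspace(0.0, 1.0, 101)

# Vectorised version: X1 varies along columns, X2 along rows.
X1, X2 = np.meshgrid(x, x)
Z_vec = X1 + np.cos(X2)

# Loop version: preallocate a full 2-D array and index it with BOTH loop variables.
Z_loop = np.zeros((x.size, x.size))
for i in range(x.size):        # row index -> second variable
    for j in range(x.size):    # column index -> first variable
        Z_loop[i, j] = x[j] + np.cos(x[i])

print("shapes:", Z_vec.shape, Z_loop.shape)
print("loop and meshgrid versions agree:", np.allclose(Z_vec, Z_loop))
```

Either array can then be passed to a plotting call such as matplotlib's pcolormesh(x, x, Z_loop). The key point for the question's code is that the result matrix must be preallocated as alpha-by-alpha and the axis vectors filled once, not overwritten inside the inner loop.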