I want to fit the non-linear function (variation is T) with experimental data.
In here, I used lsqcurvefit, but I don’t know the exact principle of this function, and even I don't know how to use the command func, also. (so the script I wrote is bad.)
For this, how can I modify the script below?
(even w/o using lsqcurve fit, I don’t care. Chi-square, least-square, any form is okay.)
Code is as follows (3 parts):
%script 1
x=coefficient1
y=coefficient2
z=coefficient3
A1=x*0.321*T
A2=(y/0.2)+0.5*T
A3=(z+0.3)*0.17/T
%script 2
global A1
global A2
global A3
Result=(A1+0.3)*A2+0.3
%script 3
global Result
Sample=readmatrix(‘experimentaldata’);
XX=sample(1,:)’;
YY=sample(2,:)’;
Xdata=linespace(min(XX),max(XX),2000);
Yadata=(interpl(XX,YY,xdata);
Fitting=lsqscurvefit(Result,T,xdata,ydata,250,2000)
I'm afraid if I had mistake while understanding your script (including bizarre global variables and coefficients...). But until now it seems to me that you want to fit your data into quadratic polynomial (considering T as single variable and x,y,z as constant, inserting A1 & A2 to your Result gives quadratic polynomial)
%Result = (x*0.321*T+0.3)*((y/0.2)+0.5*T)+0.3
Result = #(c,T)c(1)*T^2+c(2)*T^3+c(3);
Here, your final equation form will be as above, c stands for your constant, and T stands for your variable(or xdata).
After you clarify your equation form, you should set your initial state of constant c. In lsqcurve, as given in their explanation, they find their final answer by iterating until error(or euclidean distance) becomes smaller than given criteria. Therefore setting valid initial value will be critical here, if not answer will converge to local minimum (you can refer to John's perfect answer about this issue here, curve fitting error:nlinfit rank deficient).
c0=[0.1, 0.2, 0.3];
But in this case I'll just randomly give initial value as above.
Afterward, answer(or c) is ready to be calculated
Fitting=lsqscurvefit(Result,c0,xdata,ydata)
Other matlab's non-linear fitting function including nlinfit have similar workflow. I think you can simply understand more detailed information about them if you take more attention to their description.
But personally, I recommend you to use matlab's polyfit if your final equation form is in polynomial. You don't have to take care tuning your initial value then...
Related
I am taking an Econometrics course, and have been trying to use Python rather than the propreitry STATA and EVIEWS they set the assignments in.
In one of the questions, I have consumption data over time. I am asked to compute it in two ways.
The first way is calculating a model of the form consumption = Aexp(Bt), and the second way is to log both sides and do ordinary OLS on log(consumption) = alpha + Bt
I know how to do the second way. Howver, when I try to do the first way it goes wrong. Using statsmodels, I can exponentiate the time data (after normalising), but this calculates a regression in the form consumption = Aexp(t) + B, which is not what I want. (I want to specify where the parameters go). In sklearn I could find a polynomial regression, but not exponential.
Then I found scipy.curve_fit
However this seems to have two problems:
(1) It seems to rely on initial guesses for parameters, which means my output will end up being different from proprietry software (whereas output for things like OLS are the same) [as I assume initial guesses means some iterative solution is done which is helpful for very weird and wonderful functions, but I assume fairly standard results hold for exponential regression]
(2) every time I try to implement it, it just returns the guess parameters.
Here is my code
`consumption_data = pd.read_csv(......\consumption.csv")
def func(x,a,b):
return a * np.exp(b*x)
xdata = consumption_data.YEAR
ydata = consumption_data.CONSUMPTION
ydata = (ydata - 1948)/100
popt, pcov = curve_fit(func, xdata, ydata, (1,1))
print(popt)
plt.plot(xdata, func(xdata, *popt), 'g--',)
`
The scipy.optimize code is basically just copy-pasted from their tutorial
https://docs.scipy.org/doc/scipy/reference/generated/scipy.optimize.curve_fit.html
short answer: use statsmodels GLM
statsmodels does not have nonlinear least squares. The best python library for that is lmfit https://pypi.org/project/lmfit/
curve_fit, lmfit and nonlinear least squares algorithm in general find an iterative solution to the optimization problem. Even when we have to provide starting values, the solution is in many cases the same across packages up to convergence tolerance, e.g. 1e-5 or 1e-6.
Many standard models in statistics and econometrics have a single global maximum with well behaved data. However, in other cases like mixture models, there might be many local optima and the estimation might converge to one of them.
To the specific case:
consumption = A exp(B t)
can be rewritten as
consumption = exp(a + B t)
So this is just a single index model or a generalized linear model with an exponential mean function.
The general version has the expectation of the dependent variable as a nonlinear function of a linear combination of the explanatory variables:
E(y | x) = g(x b)
This can be estimated with statsmodels with GLM with family Gaussian and the log-link.
Aside: In econometrics, there is a literature to use Poisson quasi-likelihood as an estimator for exp models instead of taking the log of the dependent variable.
Poisson usually uses the log-link function as in the above.
However, using GLM allows us to use log-link, i.e. exponential mean function, with any of the supported distribution families. The main difference is in the underlying variance assumption. Gaussian assumes constant variance, Poisson assumes that the variance is proportional to the mean and Gamma assumes that the variance is quadratic in the mean.
If we use a robust sandwich covariance estimator for parameter inference, then standard errors and inference are correct even if the variance function is misspecified.
I want to implement densities of probability measures in Matlab. For that I define density as a function handle such that the integral of some function f (given as a function handle) on the interval [a,b] can be computed by
syms x
int(f(x)*density(x),x,a,b)
When it comes to the Dirac measure the problem is that
int(dirac(x),x,0,b)
delivers the value 1/2 instead of 1 for all b>0. However if I type
int(dirac(x),x,a,b)
where a<0 and b>0 the returned value is 1 as it should be. For this reason multiplying by 2 will not suffice as I want my density to be valid for all intervals [a,b]. I also dont want to distinct cases before integrating so that the code remains valid for a large class of densities.
Does someone know how I can implement the Dirac probability measure (as defined here) in Matlab?
There's no unique, accepted definition for int(delta, 0, b). The problem here isn't that you're getting the "wrong" answer as that you want to impose a different convention somehow on what the delta function is than what was provided by Matlab. (Their choice is defensible but not unique.) If you evaluate this in Wolfram Alpha, for example, it will give you theta(0) - which is not defined as anything in particular. Here theta is the Heaviside function. If you want to impose your own convention, implement your own delta function.
EDIT
I see you wrote a comment on the question, while I was writing this answer, so.... Keep in mind that the Dirac measure or Dirac delta function is not a function at all. The problem you're having, along with what's described below, all relate to trying to give a functional form to something that is inherently not a function. What you are doing is not well-defined in the framework that you have in Matlab.
END OF EDIT
To put the point about conventions in context, the delta function can be defined by different properties. One is that int(delta(x) f(x), a, b) = f(0) when a < 0 < b. This doesn't tell you anything about the integral that you want. Another, which probably leads to an answer as you're getting from Matlab, is to define it as a limit. One (but not the only choice) is the limit of the zero-mean Gaussian as the variance goes to 0.
If you want to use a convention int(delta(x) f(x), a, b) = f(0) when a <= 0 < b, that probably won't get you in much trouble, but just keep in mind that it's a convention you've chosen more than a "right" or
"wrong" answer relative to what you got from Matlab.
As related note, there is a similar choice to be made on the step function (Heaviside function) at x=0. There are conventions in which is is (a) undefined, (b) -1, (c) +1, and (d) 1/2. None is "wrong." This probably corresponds roughly to the choice on the Dirac function since the Heaviside is (roughly) the integral of the Dirac.
Lets say, I have a function 'x' and a function '2sin(x)'
How do I output the intersects, i.e. the roots in MATLAB? I can easily plot the two functions and find them that way but surely there must exist an absolute way of doing this.
If you have two analytical (by which I mean symbolic) functions, you can define their difference and use fzero to find a zero, i.e. the root:
f = #(x) x; %defines a function f(x)
g = #(x) 2*sin(x); %defines a function g(x)
%solve f==g
xroot = fzero(#(x)f(x)-g(x),0.5); %starts search from x==0.5
For tricky functions you might have to set a good starting point, and it will only find one solution even if there are multiple ones.
The constructs seen above #(x) something-with-x are called anonymous functions, and they can be extended to multivariate cases as well, like #(x,y) 3*x.*y+c assuming that c is a variable that has been assigned a value earlier.
When writing the comments, I thought that
syms x; solve(x==2*sin(x))
would return the expected result. At least in Matlab 2013b solve fails to find a analytic solution for this problem, falling back to a numeric solver only returning one solution, 0.
An alternative is
s = feval(symengine,'numeric::solve',2*sin(x)==x,x,'AllRealRoots')
which is taken from this answer to a similar question. Besides using AllRealRoots you could use a numeric solver, manually setting starting points which roughly match the values you have read from the graph. This wa you get precise results:
[fzero(#(x)f(x)-g(x),-2),fzero(#(x)f(x)-g(x),0),fzero(#(x)f(x)-g(x),2)]
For a higher precision you could switch from fzero to vpasolve, but fzero is probably sufficient and faster.
I have a curve IxV. I also have an equation that I want to fit in this IxV curve, so I can adjust its constants. It is given by:
I = I01(exp((V-R*I)/(n1*vth))-1)+I02(exp((V-R*I)/(n2*vth))-1)
vth and R are constants already known, so I only want to achieve I01, I02, n1, n2. The problem is: as you can see, I is dependent on itself. I was trying to use the curve fitting toolbox, but it doesn't seem to work on recursive equations.
Is there a way to make the curve fitting toolbox work on this? And if there isn't, what can I do?
Assuming that I01 and I02 are variables and not functions, then you should set the problem up like this:
a0 = [I01 I02 n1 n2];
MinFun = #(a) abs(a(1)*(exp(V-R*I)/(a(3)*vth))-1) + a(2)*(exp((V-R*I)/a(4)*vth))-1) - I);
aout = fminsearch(a0,MinFun);
By subtracting I and taking the absolute value, the point where both sides are equal will be the point where MinFun is zero (minimized).
No, the CFTB cannot fit such recursively defined functions. And errors in I, since the true value of I is unknown for any point, will create a kind of errors in variables problem. All you have are the "measured" values for I.
The problem of errors in I MAY be serious, since any errors in I, or lack of fit, noise, model problems, etc., will be used in the expression itself. Then you exponentiate these inaccurate values, potentially casing a mess.
You may be able to use an iterative approach. Thus something like
% 0. Initialize I_pred
I_pred = I;
% 1. Estimate the values of your coefficients, for this model:
% (The curve fitting toolbox CAN solve this problem, given I_pred)
I = I01(exp((V-R*I_pred)/(n1*vth))-1)+I02(exp((V-R*I_pred)/(n2*vth))-1)
% 2. Generate new predictions for I_pred
I_pred = I01(exp((V-R*I_pred)/(n1*vth))-1)+I02(exp((V-R*I_pred)/(n2*vth))-1)
% Repeat steps 1 and 2 until the parameters from the CFTB stabilize.
The above pseudo-code will work only if your starting values are good, and there are not large errors/noise in the model/data. Even on a good day, the above approach may not converge well. But I see little hope otherwise.
I would like to measure the goodness-of-fit to an exponential decay curve. I am using the lsqcurvefit MATLAB function. I have been suggested by someone to do a chi-square test.
I would like to use the MATLAB function chi2gof but I am not sure how I would tell it that the data is being fitted to an exponential curve
The chi2gof function tests the null hypothesis that a set of data, say X, is a random sample drawn from some specified distribution (such as the exponential distribution).
From your description in the question, it sounds like you want to see how well your data X fits an exponential decay function. I really must emphasize, this is completely different to testing whether X is a random sample drawn from the exponential distribution. If you use chi2gof for your stated purpose, you'll get meaningless results.
The usual approach for testing the goodness of fit for some data X to some function f is least squares, or some variant on least squares. Further, a least squares approach can be used to generate test statistics that test goodness-of-fit, many of which are distributed according to the chi-square distribution. I believe this is probably what your friend was referring to.
EDIT: I have a few spare minutes so here's something to get you started. DISCLAIMER: I've never worked specifically on this problem, so what follows may not be correct. I'm going to assume you have a set of data x_n, n = 1, ..., N, and the corresponding timestamps for the data, t_n, n = 1, ..., N. Now, the exponential decay function is y_n = y_0 * e^{-b * t_n}. Note that by taking the natural logarithm of both sides we get: ln(y_n) = ln(y_0) - b * t_n. Okay, so this suggests using OLS to estimate the linear model ln(x_n) = ln(x_0) - b * t_n + e_n. Nice! Because now we can test goodness-of-fit using the standard R^2 measure, which matlab will return in the stats structure if you use the regress function to perform OLS. Hope this helps. Again I emphasize, I came up with this off the top of my head in a couple of minutes, so there may be good reasons why what I've suggested is a bad idea. Also, if you know the initial value of the process (ie x_0), then you may want to look into constrained least squares where you bind the parameter ln(x_0) to its known value.