How to extrapolate matrix valued functions on Matlab? - matlab

I have a matrix valued function which I'm trying to find its limit as x goes to 1.
So, in this example, I have three matrices v1-3, representing respectively the sampled values at [0.85, 0.9, 0.99]. What I do now, which is quite inefficient, is the following:
for i=1:101
for j = 1:160
v_splined = spline([0.85,0.9,0.99], [v1(i,j), v2(i,j), v3(i,j)], [1]);
end
end
There must be a better more efficient way to do this. Especially when soon enough I'll face the situation where v's will be 4-5 dimensional vectors.
Thanks!

Disclaimer: Naively extrapolating is risky business, do so at your own risk
Here's what I would say
Using a spline to extrapolate is risky business and not generally recommended. Do you know anything about the behavior of your function near x=1?
In the case where you only have 3 points you're probably better off using a 2nd order polynomial (a parabola) rather than fitting a spline through the three points. (unless you have a good reason not to do this.)
If you want to use a parabola (or higher order interpolating polynomial when you have more points), you can vectorize your code and use Lagrange or Newton polynomials to perform the extrapolation which will probably give you a nice speed up.
Using interpolating polynomials will also generalize easily to higher order polynomials with more points given. However, this will make extrapolation even more risky since high-order interpolating polynomials tend to oscillate severely near the ends of the domain.
If you want to use Lagrange polynomials to form a parabola, your result is given by:
v_splined = v1*(1-.9)*(1-.99)/( (.85-.9)*(.85-.99) ) ...
+v2*(1-.85)*(1-.99)/( (.9-.85)*(.9-.99) ) ...
+v3*(1-.85)*(1-.9)/( (.99-.85)*(.99-.9) );
I left this un-simplified so you can see how it comes from the Lagrange polynomials, but obviously simplifying is easy. Also note that this eliminates the need for loops.

Related

extrapolating a 2D matrix to predict a future output

I have a 2D 2401*266 matrix K which corresponds to x values (t: stored in a 1*266 array) and y values(z: stored in a 1*2401 array).
I want to extrapolate the matrix K to predict some future values (corresponding to t(1,267:279). So far I have extended t so that it is now a 1*279 matrix using a for loop:
for tq = 267:279
t(1,tq) = t(1,tq-1)+0.0333333333;
end
However I am stumped on how to extrapolate K without fitting a polynomial to each individual row?
I feel like there must be a more efficient way than this??
There are countless of extrapolation methods in the literature, "fitting a polynomial to each row" would be just one of them, not necessarily invalid, not sure why you mention that you do no wan't to do it. For 2D data perhaps fitting a surface would lead to better results though.
However, if you want an easy, simple way (that might or might not work with your problem), you can always use the function interp2, for interpolation. If you chose spline or makima as interpolation functions, it will also extrapolate for any query point outside the domain of K.

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of peaks. I want to approximate the curve by fitting a gaussian at each peak. How should I go about finding the optimized gaussian parameters ? I would like to know if there is any inbuilt function which will make my task simpler.
Edit
I have fixed mean of gaussians and tried to optimize on sigma using
lsqcurvefit() in matlab. MSE is less. However, I have an additional hard constraint that the value of approximate curve should be equal to the original function at the peaks. This constraint is not satisfied by my model. I am pasting current working code here. I would like to have a solution which obeys the hard constraint at peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = #(x,xdata)myFun(x,xdata,pks,locs); %pks,locs are the peak locations and amplitudes already available
x0=w(1:6)*0.25; % my initial guess based on domain knowledge
[sigma resnorm] = lsqcurvefit(fun,x0,xdata,ydata); %xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure;plot(ydata,'r');hold on;plot(recons);
function f=myFun(sigma,xdata,a,c)
% a is constant , c is mean of individual gaussians
f=zeros(size(xdata));
for i = 1:6 %use 6 gaussians to approximate function
f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the Mathworks optimization toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The curve fitting toolbox is probably an alternative. Otherwise, you can use an open source optimization routine (check out cvxopt).
A couple things to note. You need to impose the constraint that all values in sigma are greater than zero. You can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution is different, depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try with multiple different initial guesses, and pick the best result.
Edited to add:
In python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
you can use Expectation–Maximization algorithm for fitting Mixture of Gaussians on your data. it don't care about data dimension.
in documentation of MATLAB you can lookup gmdistribution.fit or fitgmdist.

Numerical evaluation of orthogonal polynomials

I've written some Matlab procedures that evaluate orthogonal polynomials, and as a sanity check I was trying to ensure that their dot product would be zero.
But, while I'm fairly sure there's not much that can go wrong, I'm finding myself with a slightly curious behaviour. My test is quite simple:
x = -1:.01:1;
for i0=0:9
v1 = poly(x, i0);
for i1=0:i0
v2 = poly(x,i1);
fprintf('%d, %d: %g\n', i0, i1, v1*v2');
end
end
(Note the dot product v1*v2' needs to be this way round because x is a horizontal vector.)
Now, to cut to the end of the story, I end up with values close to 0 (order of magnitude about 1e-15) for pairs of degrees that add up to an odd number (i.e., i0+i1=2k+1). When i0==i1 I expect the dot product not to be 0, but this also happens when i0+i1=2k, which I didn't expect.
To give you some more details, I initially did this test with Chebyshev polynomials of first kind. Now, they are orthogonal with respect to the weight
1 ./ sqrt(1-x.^2)
which goes to infinity when x goes to 1. So I thought that leaving this term out could be the cause of non-zero dot products.
But then, I did the same test with Legendre polynomials, and I get exactly the same result: when the sum of the degrees is even, the dot product is definitely far from 0 (order of magnitude 1e2).
One last detail, I used the trigonometric formula cos(n*acos(x)) to evaluate the Chebyshev polynomials, and I tried the recursive formula as well as one of the formulas involving the binomial coefficient to evaluate the Legendre polynomials.
Can anyone explain this odd (pun intended) behaviour?
You're being misled by symmetry. Both Chebyshev and Legendre polynomials are eigenfunctions of the parity operator, which means that they can all be classified as either odd or even functions. I guess the same goes for your custom orthogonal polynomials.
Due to this symmetry, if you multiply a polynomial P_n(x) by P_m(x), then the result will be an odd function if n+m is odd, and it will be even otherwise. You're computing sum_k P_n(x_k)*P_m(x_k) for a symmetric set of x_k values around the origin. This implies that for odd n+m you will always get zero. Try computing sum_k P_n(x_k)*Q_m(x_k) with P a Legendre, and Q a Chebyshev polynomial. My point is that for n+m=odd, the result doesn't tell you anything about orthogonality or the accuracy of your integration.
The problem is that probably you're not integrating accurately enough. These orthogonal polynomials defined on [-1,1] vary quite rapidly on their domain, especially close to the boundaries (x==+-1). Try increasing the points of your integration, using a non-equidistant mesh, or a proper integration using integral.
Final note: I'd advise you against calling your functions poly, since that's a MATLAB built-in. (And so is legendre.)

Goodness of fit with MATLAB and chi-square test

I would like to measure the goodness-of-fit to an exponential decay curve. I am using the lsqcurvefit MATLAB function. I have been suggested by someone to do a chi-square test.
I would like to use the MATLAB function chi2gof but I am not sure how I would tell it that the data is being fitted to an exponential curve
The chi2gof function tests the null hypothesis that a set of data, say X, is a random sample drawn from some specified distribution (such as the exponential distribution).
From your description in the question, it sounds like you want to see how well your data X fits an exponential decay function. I really must emphasize, this is completely different to testing whether X is a random sample drawn from the exponential distribution. If you use chi2gof for your stated purpose, you'll get meaningless results.
The usual approach for testing the goodness of fit for some data X to some function f is least squares, or some variant on least squares. Further, a least squares approach can be used to generate test statistics that test goodness-of-fit, many of which are distributed according to the chi-square distribution. I believe this is probably what your friend was referring to.
EDIT: I have a few spare minutes so here's something to get you started. DISCLAIMER: I've never worked specifically on this problem, so what follows may not be correct. I'm going to assume you have a set of data x_n, n = 1, ..., N, and the corresponding timestamps for the data, t_n, n = 1, ..., N. Now, the exponential decay function is y_n = y_0 * e^{-b * t_n}. Note that by taking the natural logarithm of both sides we get: ln(y_n) = ln(y_0) - b * t_n. Okay, so this suggests using OLS to estimate the linear model ln(x_n) = ln(x_0) - b * t_n + e_n. Nice! Because now we can test goodness-of-fit using the standard R^2 measure, which matlab will return in the stats structure if you use the regress function to perform OLS. Hope this helps. Again I emphasize, I came up with this off the top of my head in a couple of minutes, so there may be good reasons why what I've suggested is a bad idea. Also, if you know the initial value of the process (ie x_0), then you may want to look into constrained least squares where you bind the parameter ln(x_0) to its known value.

Polynomial data fitting in a special way in MATLAB

I have some data let's say the following vector:
[1.2 2.13 3.45 4.59 4.79]
And I want to get a polynomial function, say f to fit this data. Thus, I want to go with something like polyfit. However, what polyfit does is minimizing the sum of least square errors. But, what I want is to have
f(1)=1.2
f(2)=2.13
f(3)=3.45
f(4)=4.59
f(5)=4.79
That is to say, I want to manipulate the fitting algorithm so that it will give me the exact points that I already gave as well as some fitted values where exact values are not given.
How can I do that?
I think everyone is missing the point. You said that "That is to say, I want to manipulate the fitting algorithm so that I will give me the exact points as well as some fitted values where exact fits are not present. How can I do that?"
To me, this means you wish an exact (interpolatory) fit for a listed set, and for some other points, you want to do a least squares fit.
You COULD do that using LSQLIN, by setting a set of equality constraints on the points to be fit exactly, and then allowing the rest of the points to be fit in a least squares sense.
The problem is, this will require a high order polynomial. To be able to fit 5 points exactly, plus some others, the order of the polynomial will be quite a bit higher. And high order polynomials, especially those with constrained points, will do nasty things. But feel free to do what you will, just as long as you also expect a poor result.
Edit: I should add that a better choice is to use a least squares spline, which is something you CAN constrain to pass through a given set of points, while fitting other points in a least squares sense, and still not do something wild and crazy as a result.
Polyfit does what you want. An N-1 degree polynomial can fit N points exactly, thus, when it minimizes the sum of squared error, it gets 0 (which is what you want).
y=[1.2 2.13 3.45 4.59 4.79];
x=[1:5];
coeffs = polyfit(x,y,4);
Will get you a polynomial that goes through all of your points.
What you ask is known as Lagrange Interpolation . There is a MATLAB file exchange available. http://www.mathworks.com/matlabcentral/fileexchange/899-lagrange-polynomial-interpolation
However, you should note that least squares polynomial fitting is generally preferred to Lagrange Interpolation since the data you have in principle will have noise in it and Lagrange Interpolation will fit the noise as well as the data you have. So if you know that your data actually represents M dimensional polynomial and have N data, where N>>M, then you will have a order N polynomial with Lagrange.
You have options.
Use polyfit, just give it enough leeway to perform an exact fit. That is:
values = [1.2 2.13 3.45 4.59 4.79];
p = polyfit(1:length(values), values, length(values)-1);
Now
polyval(p,2) %returns 2.13
Use interpolation / extrapolation
values = [1.2 2.13 3.45 4.59 4.79];
xInterp = 0:0.1:6;
valueInterp = interp1(1:length(values), values, xInterp ,'linear','extrap');
Interpolation provides a lot of options for smoothing, extrapolation etc. For example, try:
valueInterp = interp1(1:length(values), values, xInterp ,'spline','extrap');