Can any function be decomposed as a sum of Gaussians?

In Fourier series, any function can be decomposed as a sum of sines and cosines.
In neural networks, any function can be decomposed as a weighted sum of logistic functions (a one-layer neural network).
In wavelet transforms, any function can be decomposed as a weighted sum of Haar functions.
Is there a similar property for decomposition into a mixture of Gaussians? If so, is there a proof?

If the sum is allowed to be infinite, then the answer is yes. See Yves Meyer's book "Wavelets and Operators", Section 6.6, Lemma 10.

There's a theorem, the Stone-Weierstrass theorem, which gives conditions for when a family of functions can approximate any continuous function. You need:
an algebra of functions (closed under addition, subtraction, and multiplication),
the constant functions,
and the functions to separate points (for any two distinct points you can find a function that assigns them different values).
You can approximate a constant function with increasingly wide Gaussians, and you can time-shift Gaussians to separate points. So if you form an algebra out of Gaussians, you can approximate any continuous function with them.
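As a minimal numerical illustration of this idea in MATLAB (the target function, centers, and width below are arbitrary choices, not part of the proof), a weighted sum of fixed Gaussian bumps can be fit to a smooth function by linear least squares:
x = linspace(0, 1, 200).';               % sample grid
f = sin(2*pi*x) + x;                     % example target function (assumption)
centers = linspace(0, 1, 15);            % time-shifted Gaussian centers
s = 0.08;                                % common width (assumption)
Phi = exp(-(x - centers).^2 / (2*s^2));  % 200x15 design matrix of Gaussian bumps
w = Phi \ f;                             % least-squares weights
maxErr = max(abs(Phi*w - f));            % uniform error of the approximation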

Yes. Decomposing any function into a sum of Gaussians is possible in a limiting sense, since any function can be decomposed into a sum of Dirac deltas :) (and a Dirac delta is the limit of a Gaussian as the variance approaches zero).
Some more interesting questions would be:
Can any function be decomposed into a sum of non-zero-variance Gaussians with a given, constant variance, defined around varying centers?
Can any function be decomposed into a sum of non-zero-variance Gaussians, all centered at 0 but with varying variances?
Mathematics Stack Exchange might be a better place to answer these questions, though.

Related

Matlab `xcorr(x,y)` for more than two inputs

`xcorr(x,y)` returns the cross-correlation of two discrete-time sequences. I would like to know if there is a similar function that applies to more than two discrete-time sequences.
The term correlation is precisely defined, so you can use corrcoef to get a sense of how three vectors are correlated; maybe that is what you want. If so, then:
correlation = corrcoef([V1(:) V2(:) V3(:)]);
will reflect the degree of correlation (negative or positive) of the vectors.
MATLAB's built-in xcorr is made for the specific case of two vectors, measuring the similarity between one vector and a time-shifted vector. Each time shift yields a scalar, and you loop over all time shifts. You can use xcorr(V1,V2), xcorr(V1,V3), and xcorr(V2,V3) to find the correlation per time shift between all pairs, and create a 3D map that visualizes the degree of similarity as a function of the time shifts.
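For example, a minimal sketch of the pairwise approach (assuming V1, V2, V3 are column vectors of equal length):
[c12, lags] = xcorr(V1, V2);   % cross-correlation of V1 and V2 at each lag
c13 = xcorr(V1, V3);           % V1 vs V3
c23 = xcorr(V2, V3);           % V2 vs V3
plot(lags, [c12 c13 c23]);
legend('V1 vs V2', 'V1 vs V3', 'V2 vs V3');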

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of its peaks, and I want to approximate the curve by fitting a Gaussian at each peak. How should I go about finding the optimal Gaussian parameters? I would also like to know if there is a built-in function that will make the task simpler.
Edit
I have fixed the means of the Gaussians and tried optimizing over sigma using lsqcurvefit() in MATLAB. The MSE is low. However, I have an additional hard constraint: the value of the approximate curve should equal the original function at the peaks. This constraint is not satisfied by my model. I am pasting my current working code here. I would like a solution that obeys the hard constraint at the peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = @(x,xdata) myFun(x,xdata,pks,locs); % pks, locs are the peak amplitudes and locations, already available
x0 = w(1:6)*0.25; % initial guess based on domain knowledge
[sigma, resnorm] = lsqcurvefit(fun,x0,xdata,ydata); % xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure; plot(ydata,'r'); hold on; plot(recons);

function f = myFun(sigma,xdata,a,c)
% a holds the (fixed) peak amplitudes, c the means of the individual Gaussians
f = zeros(size(xdata));
for i = 1:6 % use 6 Gaussians to approximate the function
    f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error: how badly the approximation using sigma differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the MathWorks Optimization Toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The Curve Fitting Toolbox is probably an alternative. Otherwise, you can use an open-source optimization routine (check out cvxopt).
A couple of things to note. You need to impose the constraint that all values in sigma are greater than zero; you can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. When the loss function is nonconvex, the final solution may differ depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to try multiple different initial guesses and pick the best result.
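A minimal sketch of this setup with lsqnonlin(), assuming x, y, pks, and locs hold the known data, peak amplitudes, and peak centers (these variable names are assumptions):
g = @(sigma) sum(pks(:).' .* exp(-(x(:) - locs(:).').^2 ./ (2*sigma(:).'.^2)), 2);
residual = @(sigma) y(:) - g(sigma);   % lsqnonlin minimizes sum(residual.^2)
sigma0 = ones(numel(pks), 1);          % initial guess for the widths
lb = eps * ones(numel(pks), 1);        % enforce sigma > 0 via lower bounds
sigmaOpt = lsqnonlin(residual, sigma0, lb);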
Edited to add:
In Python, you can use the optimization routines in the scipy.optimize module, e.g. curve_fit().
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
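A hedged sketch of that constrained formulation (the parameter packing and variable names here are assumptions, not a definitive implementation): treat per-peak weights as extra parameters and force the summed curve to match the known amplitudes exactly at the peaks:
k = numel(pks);
model = @(p, xq) sum(p(k+1:end).' .* exp(-(xq(:) - locs(:).').^2 ./ (2*p(1:k).'.^2)), 2);
L = @(p) sum((y(:) - model(p, x)).^2);            % squared-error loss
nonlcon = @(p) deal([], model(p, locs) - pks(:)); % equality constraint: exact peak heights
p0 = [ones(k,1); pks(:)];                         % [sigma; weights] initial guess
lb = [eps*ones(k,1); -Inf(k,1)];                  % widths must stay positive
pOpt = fmincon(L, p0, [], [], [], [], lb, [], nonlcon);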
You can use the Expectation-Maximization algorithm to fit a mixture of Gaussians to your data; it doesn't care about the data dimension.
In the MATLAB documentation you can look up gmdistribution.fit or fitgmdist.
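For example (toy sample data; fitgmdist requires the Statistics and Machine Learning Toolbox, and note that EM fits a density to samples rather than a curve):
data = [randn(500,1); 4 + 0.5*randn(300,1)];  % toy 1-D samples from two clusters
gm = fitgmdist(data, 2);                      % EM fit of a 2-component mixture
disp(gm.mu); disp(gm.Sigma);                  % fitted means and variances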

efficient inversion of known CDF in MATLAB

I need to compute, efficiently and in a numerically stable way, the inverse CDF F^-1(y) (cumulative distribution function) of a probability distribution, assuming that both the PDF f(x) and the CDF F(x) are known analytically but the inverse CDF is not. I am doing this in MATLAB.
This is a root-finding problem for F(x)-y and I could use fzero:
invcdf = @(y, x0) fzero(@(x) cdf(x) - y, x0);
However, fzero is for a generic nonlinear function.
I wonder if there is some function, or whether I can write some algorithm, that uses the explicit information that F(x) is a CDF (for example, that it is monotonically non-decreasing and that we have its derivative, f(x)).
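For instance, here is a minimal Newton-iteration sketch that exploits exactly that structure (cdf and pdf are assumed to be function handles; the iteration cap and tolerance are arbitrary):
function x = invcdf_newton(cdf, pdf, y, x0)
% Invert a CDF by Newton's method, using the PDF as the exact derivative.
x = x0;
for iter = 1:50
    step = (cdf(x) - y) / max(pdf(x), realmin);  % guard against tiny densities
    x = x - step;
    if abs(step) < 1e-12, break; end             % converged
end
end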
FYI, the PDFs I am working with are generic mixtures of Gaussian distributions multiplied by a polynomial of arbitrary degree (the CDF can be computed analytically in this case, although it's not pretty, and it becomes expensive for polynomials with many terms). Note that I need to compute the inverse CDF for millions of CDFs within this class; a lookup table is not feasible.
For more mathematical details, see also this related question on Mathematics Stack Exchange (here I am asking specifically for a MATLAB solution).

Implementation of integral and infinite summation

I am having some trouble implementing the following equation in MATLAB:
The trouble is with handling the numerical/symbolic variables in the implementation.
Can someone please write down the code to help me? An implementation would be great.
The constants for the equation are:
m=1; rho=0.5; H=1; I=1877; sigma=20;
For example if N=2, then:
for n=1: A_n = 0.257, Z_{n-1} = Inf, Z_n = 0.4146;
for n=2: A_n = 1, Z_{n-1} = 0.4146, Z_n = 0.1066;
Thanks for the help.
In numerical methods, all infinite quantities are approximated by finite ones. Therefore, you have to study this expression for convergence (analytically or by means of numerical experiment). Once you know how quickly it converges, you know what finite number to pick to stand in for the $\infty$ in the sums and integrals. The numerical evaluation of integrals is a large subject in itself (you can read any book on numerical methods, or specifically on numerical quadrature: https://en.wikipedia.org/wiki/Numerical_integration). The simplest numerical approximation of an integral is the rectangular rule on a regular uniform grid:
$$\int_a^b f(x)\,dx \approx \sum_j f(x_j)\,\Delta x, \quad \text{where } \Delta x = x_{j+1}-x_j$$
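A minimal sketch of that rule in MATLAB (the integrand and the finite limits standing in for the infinite ones are placeholders):
f = @(x) exp(-x.^2);            % example integrand (assumption)
a = 0; b = 5; N = 1000;         % finite limits and grid size (assumptions)
x = linspace(a, b, N+1);
dx = (b - a) / N;
I = sum(f(x(1:end-1))) * dx;    % left-endpoint rectangular rule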

Creating a 1D second derivative of Gaussian window

In MATLAB I need to generate a second derivative of a Gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noisy, hence the use of the Gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the Gaussian window and then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of a Gaussian?
Or is it best to apply the Gaussian window to the data and then take the second derivative of it all? (I know these last two are mathematically equivalent, but with discrete data points I do not know which will be more accurate.)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris
I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
The weights of a second derivative of a Gaussian are given by:
$$G''(t) = C \left( \frac{(t-\tau)^2}{\sigma^4} - \frac{1}{\sigma^2} \right) e^{-\frac{(t-\tau)^2}{2\sigma^2}}$$
where:
tau is the time shift of the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and let t vary over [-T/2, T/2].
sigma varies the scale of your operator. Set sigma to a value of around T/6; if you are concerned about long filter lengths, this can be reduced to T/4.
C is the normalizing factor. This can be derived algebraically, but in practice I always do it numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I set C = 1 / sum(G'').
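A minimal sketch of generating these weights and applying them by convolution (tau = 0; the length, scale, and variable names below are assumptions):
T = 21;                                       % odd filter length (assumption)
sigma = T/6;                                  % scale, per the note above
t = (-(T-1)/2 : (T-1)/2).';                   % t varies over [-T/2, T/2]
G2 = ((t.^2 - sigma^2) / sigma^4) .* exp(-t.^2 / (2*sigma^2));
% h = ...;                                    % your noisy height vector (100-200 samples)
% d2h = conv(h, G2, 'same');                  % second-derivative response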
Regarding your comment on the equivalence of smoothing first and taking the derivative later, I would say it is more involved than that: which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.
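A hedged sketch of that difference-of-Gaussians variant (the window length and shape parameters are assumptions; gausswin's alpha parameter is inversely proportional to the window's width):
T = 21;
g1 = gausswin(T, 2.5); g1 = g1 / sum(g1);  % narrower Gaussian, normalized to unity gain
g2 = gausswin(T, 1.5); g2 = g2 / sum(g2);  % wider Gaussian, normalized to unity gain
% dog = conv(h, g1 - g2, 'same');           % approximate 2nd-derivative response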