Creating a 1D Second derivative of gaussian Window - matlab

In MATLAB I need to generate a second derivative of a gaussian window to apply to a vector representing the height of a curve. I need the second derivative in order to determine the locations of the inflection points and maxima along the curve. The vector representing the curve may be quite noise hence the use of the gaussian window.
What is the best way to generate this window?
Is it best to use the gausswin function to generate the gaussian window then take the second derivative of that?
Or to generate the window manually using the equation for the second derivative of the gaussian?
Or even is it best to apply the gaussian window to the data, then take the second derivative of it all? (I know these last two are mathematically the same, however with the discrete data points I do not know which will be more accurate)
The maximum length of the height vector is going to be around 100-200 elements.
Thanks
Chris

I would create a linear filter composed of the weights generated by the second derivative of a Gaussian function and convolve this with your vector.
The weights of a second derivative of a Gaussian are given by:
Where:
Tau is the time shift for the filter. If you are generating weights for a discrete filter of length T with an odd number of samples, set tau to zero and allow t to vary from [-T/2,T/2]
sigma - varies the scale of your operator. Set sigma to a value somewhere between T/6. If you are concerned about long filter length then this can be reduced to T/4
C is the normalising factor. This can be derived algebraically but in practice I always do this numerically after calculating the filter weights. For unity gain when smoothing periodic signals, I will set C = 1 / sum(G'').
In terms of your comment on the equivalence of smoothing first and taking a derivative later, I would say it is more involved than that. As which derivative operator would you use in the second step? A simple central difference would not yield the same results.
You can get an equivalent (but approximate) response to a second derivative of a Gaussian by filtering the data with two Gaussians of different scales and then taking the point-wise differences between the two resulting vectors. See Difference of Gaussians for that approach.

Related

Matlab `xcorr(x,y)` for more than two inputs

xcorr(x,y)returns the cross-correlation of two discrete-time sequences. I would like to know if there is a similar function that applies to more than two discrete-time sequences.
the term correlation is accurately defined, so you can use corrcoef to obtain the sense of how 3 vectors are correlated, maybe that is what you want? if so then:
correlation = corrcoef([V1(:) V2(:) V3(:)]);
will reflect the degree of correlation (negative or positive) of the vectors.
Matlab's built-in xcorr is made for a specific case of two-vectors, measuring the similarity between one vector and a time shifted vector. Each time shift yields a scalar, and you loop over all time shifts. you can xcorr(V1,V2), xcorr(V1,V3), xcorr(V2,V3) to find the correlation per time shift between all pairs, and create a 3D map that visualize the degree of similarity as function of the time-shifts.

Mixture of 1D Gaussians fit to data in Matlab / Python

I have a discrete curve y=f(x). I know the locations and amplitudes of peaks. I want to approximate the curve by fitting a gaussian at each peak. How should I go about finding the optimized gaussian parameters ? I would like to know if there is any inbuilt function which will make my task simpler.
Edit
I have fixed mean of gaussians and tried to optimize on sigma using
lsqcurvefit() in matlab. MSE is less. However, I have an additional hard constraint that the value of approximate curve should be equal to the original function at the peaks. This constraint is not satisfied by my model. I am pasting current working code here. I would like to have a solution which obeys the hard constraint at peaks and approximately fits the curve at other points. The basic idea is that the approximate curve has fewer parameters but still closely resembles the original curve.
fun = #(x,xdata)myFun(x,xdata,pks,locs); %pks,locs are the peak locations and amplitudes already available
x0=w(1:6)*0.25; % my initial guess based on domain knowledge
[sigma resnorm] = lsqcurvefit(fun,x0,xdata,ydata); %xdata and ydata are the original curve data points
recons = myFun(sigma,xdata,pks,locs);
figure;plot(ydata,'r');hold on;plot(recons);
function f=myFun(sigma,xdata,a,c)
% a is constant , c is mean of individual gaussians
f=zeros(size(xdata));
for i = 1:6 %use 6 gaussians to approximate function
f = f + a(i) * exp(-(xdata-c(i)).^2 ./ (2*sigma(i)^2));
end
end
If you know your peak locations and amplitudes, then all you have left to do is find the width of each Gaussian. You can think of this as an optimization problem.
Say you have x and y, which are samples from the curve you want to approximate.
First, define a function g() that will construct the approximation for given values of the widths. g() takes a parameter vector sigma containing the width of each Gaussian. The locations and amplitudes of the Gaussians will be constrained to the values you already know. g() outputs the value of the sum-of-gaussians approximation at each point in x.
Now, define a loss function L(), which takes sigma as input. L(sigma) returns a scalar that measures the error--how badly the given approximation (using sigma) differs from the curve you're trying to approximate. The squared error is a common loss function for curve fitting:
L(sigma) = sum((y - g(sigma)) .^ 2)
The task now is to search over possible values of sigma, and find the choice that minimizes the error. This can be done using a variety of optimization routines.
If you have the Mathworks optimization toolbox, you can use the function lsqnonlin() (in this case you won't have to define L() yourself). The curve fitting toolbox is probably an alternative. Otherwise, you can use an open source optimization routine (check out cvxopt).
A couple things to note. You need to impose the constraint that all values in sigma are greater than zero. You can tell the optimization algorithm about this constraint. Also, you'll need to specify an initial guess for the parameters (i.e. sigma). In this case, you could probably choose something reasonable by looking at the curve in the vicinity of each peak. It may be the case (when the loss function is nonconvex) that the final solution is different, depending on the initial guess (i.e. you converge to a local minimum). There are many fancy techniques for dealing with this kind of situation, but a simple thing to do is to just try with multiple different initial guesses, and pick the best result.
Edited to add:
In python, you can use optimization routines in the scipy.optimize module, e.g. curve_fit().
Edit 2 (response to edited question):
If your Gaussians have much overlap with each other, then taking their sum may cause the height of the peaks to differ from your known values. In this case, you could take a weighted sum, and treat the weights as another parameter to optimize.
If you want the peak heights to be exactly equal to some specified values, you can enforce this constraint in the optimization problem. lsqcurvefit() won't be able to do it because it only handles bound constraints on the parameters. Take a look at fmincon().
you can use Expectation–Maximization algorithm for fitting Mixture of Gaussians on your data. it don't care about data dimension.
in documentation of MATLAB you can lookup gmdistribution.fit or fitgmdist.

Variable levels of smoothing within the same Matlab matrix

I currently have a large matrix M (~100x100x50 elements) containing both positive and negative values. At the moment, if I want to smooth this matrix, I use the smooth3 function to apply a gaussian kernel over the entire 3-D matrix.
What I want to achieve is a variable level of smoothing within this matrix - i.e.. different parts of the matrix M are smoothed to different levels of sigma depending of the value in a similar 3-D matrix, d (with values ranging from 0 to 1). Where d is 0, no smoothing occurs, where d is 1 a maximum level of smoothing occurs.
The fact that the matrix is 3-D is trivial. Smoothing in 3 dimensions is nice, but not essential, and my current code (performing various other manipulations) handles each of the 50 slices of M separately anyway. I am happy to replace smooth3 with a convolution of M with a gaussian function, and perform this convolution over each slice individually. What I can't figure out is how to vary the sigma level of this gaussian function (based on d) given its location in M and output the result accordingly.
An alternative approach may be to use matrix d as a mask for a very smooth version of matrix Ms and somehow manipulate M and Ms to give an equivalent result, however I'm not convinced that this will work as I can't think of a function to combine M and Md that won't give artefacts of each of M or Ms when 0 < d < 1...any thoughts?
[I'm using 2009b, and only have access to the Signal Processing toolbox.]
You should have a look at the Guided Image Filter. It is a computationally efficient generalization of the bilateral filter.
http://research.microsoft.com/en-us/um/people/jiansun/papers/guidedfilter_eccv10.pdf
It will allow you to do proper smoothing based on your guidance matrix.

Least squares optimal scaling

I have two waveforms which are linked by a numerical factor. I need to use optimal scaling (least squares) between the two waveforms to calculate this factor in Matlab. Unfortunately I have no idea how to do this. The two wave forms are seismic signals related by the velocity of the seismic waves, which I'm trying to calculate. Any ideas? need more info?
Call W1 and W2 the two vectors. For this to work, they must be column vectors. Transpose them if they are rows instead of columns. Then if we wish to find the value of k such that W1 = k*W2, just use backslash.
k = W2\W1;
Backslash here gives you a linear regression (least squares) estimator, as requested. This does not handle the unknown phase shift case of course.
one cheesy way to estimate the linear factor without having to deal with phase shift is to compute the ratio of the estimated scales of the waves. the cheesiest way is to use standard deviation:
k = std(W1) / std(W2);
if you care about robustness, I would substitute in the MAD or the IQR; the MAD is the median absolute deviation, which you can (somewhat inefficiently) 'inline' as so:
MAD = #(x)(median(abs(bsxfun(#minus,x,median(x)))));
k = MAD(W1) / MAD(W2);
the IQR is the interquartile range, which requires a proper quantile computation. you can implement this inefficiently using sort. I leave this as an exercise to the reader.

Help me understand FFT function (Matlab)

1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
2) If it is zero how do we plot zero on a logarithmic scale?
3) The result is always symmetrical? Or it just appears to be symmetrical?
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
1) Besides the negative frequencies, which is the minimum frequency provided by the FFT function? Is it zero?
fft(y) returns a vector with the 0-th to (N-1)-th samples of the DFT of y, where y(t) should be thought of as defined on 0 ... N-1 (hence, the 'periodic repetition' of y(t) can be thought of as a periodic signal defined over Z).
The first sample of fft(y) corresponds to the frequency 0.
The Fourier transform of real, discrete-time, periodic signals has also discrete domain, and it is periodic and Hermitian (see below). Hence, the transform for negative frequencies is the conjugate of the corresponding samples for positive frequencies.
For example, if you interpret (the periodic repetition of) y as a periodic real signal defined over Z (sampling period == 1), then the domain of fft(y) should be interpreted as N equispaced points 0, 2π/N ... 2π(N-1)/N. The samples of the transform at the negative frequencies -π ... -π/N are the conjugates of the samples at frequencies π ... π/N, and are equal to the samples at frequencies
π ... 2π(N-1)/N.
2) If it is zero how do we plot zero on a logarithmic scale?
If you want to draw some sort of Bode plot you may plot the transform only for positive frequencies, ignoring the samples corresponding to the lowest frequencies (in particular 0).
3) The result is always symmetrical? Or it just appears to be symmetrical?
It has Hermitian symmetry if y is real: Its real part is symmetric, its imaginary part is anti-symmetric. Stated another way, its amplitude is symmetric and its phase anti-symmetric.
4) If I use abs(fft(y)) to compare 2 signals, may I lose some accuracy?
If you mean abs(fft(x - y)), this is OK and you can use it to get an idea of the frequency distribution of the difference (or error, if x is an estimate of y). If you mean abs(fft(x)) - abs(fft(y)) (???) you lose at least phase information.
Well, if you want to understand the Fast Fourier Transform, you want to go back to the basics and understand the DFT itself. But, that's not what you asked, so I'll just suggest you do that in your own time :)
But, in answer to your questions:
Yes, (excepting negatives, as you said) it is zero. The range is 0 to (N-1) for an N-point input.
In MATLAB? I'm not sure I understand your question - plot zero values as you would any other value... Though, as rightly pointed out by duffymo, there is no natural log of zero.
It's essentially similar to a sinc (sine cardinal) function. It won't necessarily be symmetrical, though.
You won't lose any accuracy, you'll just have the magnitude response (but I guess you knew that already).
Consulting "Numerical Recipes in C", Chapter 12 on "Fast Fourier Transform" says:
The frequency ranges from negative fc to positive fc, where fc is the Nyquist critical frequency, which is equal to 1/(2*delta), where delta is the sampling interval. So frequencies can certainly be negative.
You can't plot something that doesn't exist. There is no natural log of zero. You'll either plot frequency as the x-axis or choose a range that doesn't include zero for your semi-log axis.
The presence or lack of symmetry in the frequency range depends on the nature of the function in the time domain. You can have a plot in the frequency domain that is not symmetric about the y-axis.
I don't think that taking the absolute value like that is a good idea. You'll want to read a great deal more about convolution, correction, and signal processing to compare two signals.
result of fft can be 0. already answered by other people.
to plot 0 frequency, the trick is to set it to a very small positive number (I use exp(-15) for that purpose).
already answered by other people.
if you are only interested in the magnitude, yes you can do that. this is applicable, say, in many image processing problems.
Half your question:
3) The results of the FFT operation depend on the nature of the signal; hence there's nothing requiring that it be symmetrical, although if it is you may get some more information about the properties of the signal
4) That will compare the magnitudes of a pair of signals, but those being equal do no guarantee that the FFTs are identical (don't forget about phase). It may, however, be enough for your purposes, but you should be sure of that.