Finding defined peaks with clusters in MATLAB

This is my problem:
I have the following data "A", which looks like this:
As you can see, I have drawn red circles around the apparent peaks; the most defined are 2 and 7. I say they are defined because their standard deviation is low in comparison with the other peaks (especially the second one).
What I need is a way (any way) to get the values and the standard deviations of the n peaks into a numeric array.
I have tried clustering, but I got no good results:
First of all, I used the MATLAB function kmeans, and I realized that this algorithm doesn't group peaks as I need. As you can see in the picture above, the cluster in the red circle contains at least 3 or 4 peaks. Also, kmeans requires you to set the number of clusters, and I need it to be identified automatically.
I hope someone can give me some ideas, or a way to get better results. Thanks.
PS: The data "A" is available at the link below.
https://drive.google.com/file/d/0B4WGV21GqSL5a2EyQ2l0SHZURzA/edit?usp=sharing

The problem is that your axes have very different meanings.
K-means optimizes variance, but variance in X is something entirely different from variance in Y, isn't it? Furthermore, each of these methods will split your data in both X and Y, whereas I assume you want the data to be partitioned along the X axis only.
I suggest the following: consider the Y axis to be a weight, and the X axis to be a position.
Then perform weighted density estimation, and look for low-density regions to separate your clusters.
I can't help you with MATLAB. I don't use it.
Mathematically, what you want to do is place a Gaussian at each point, with area Y and center X, and then find the minima and maxima of the sum of these Gaussians. See the Wikipedia article on kernel density estimation for details, except that you want to use the Y values as weights. You could maybe also use 1/Y as the standard deviation, if you don't want to use weights.
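For illustration, here is a minimal MATLAB sketch of this weighted-density idea, assuming the data sit in A as rows [x y]; the bandwidth h is a hypothetical choice you would need to tune:
% weighted kernel density estimate: one Gaussian per point, weighted by y
x = A(:,1);  y = A(:,2);
h = 0.05 * (max(x) - min(x));        % kernel bandwidth (hypothetical, tune it)
xi = linspace(min(x), max(x), 1000); % evaluation grid along X
d = zeros(size(xi));
for k = 1:numel(x)
    d = d + y(k) * exp(-(xi - x(k)).^2 / (2*h^2));
end
[~, cutIdx] = findpeaks(-d);         % local minima of the summed density
cuts = xi(cutIdx);                   % X positions that separate the clusters
Points between consecutive cuts then belong to one peak, and you can take the mean and standard deviation of each group.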

Related

Fitting gaussians to close peaks in MATLAB

I have a data set that has two peaks close together. I'd like to fit these peaks with Gaussians so that I come up with a new data set that replicates the original one. To this end, I am using MATLAB's findpeaks function, using the heights and widths of the peaks to come up with the appropriate number of Gaussians, and then adding those Gaussians together. However, because the peaks are so close together, the result looks like the following (with the original data set in blue and the replicated one in red):
Is there a better method to replicate the data with Gaussian peaks?
A Gaussian function is defined by two parameters: mean and variance. The two peaks would give you the means of the two Gaussians, and by the look of the figure they share the same variance (if the data have gone through a Gaussian process, the variance would be the same; I cannot think of a physical process where that would not be the case, unless it is just an arbitrary plot). So you only have to find one parameter. As for the peak heights, that is just the normalization so that the area under the curve sums to 1. A Gaussian integrates to 1 by default, so two of them integrate to 2; if the area under the plot you are trying to fit is 2, you do not need to do anything, otherwise scale accordingly.
My guess is something like this (pseudocode):
f = 0.5*gauss(-3,var)+0.5*gauss(3,var)
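As a runnable version of that pseudocode (the means -3 and 3 and the variance v here are hypothetical values you would read off your plot):
gauss = @(x, mu, v) exp(-(x - mu).^2 ./ (2*v)) / sqrt(2*pi*v); % unit-area Gaussian
x = linspace(-10, 10, 1000);
v = 1;                               % shared variance, to be tuned
f = 0.5*gauss(x, -3, v) + 0.5*gauss(x, 3, v);
plot(x, f)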
If you know more about the process that created the plot, then you can actually do better.

How to compute distance and estimate quality of heterogeneous grids in MATLAB?

I want to evaluate the grid quality where all coordinates differ in the real case.
The signal is an ECG signal, for which the average lifetime is 75 years.
My task is to evaluate its age at the moment of measurement, which is an inverse problem.
I think a 2D approximation of the 3D case is hard (done here by Abo-Zahhad) with 3 leads (2 on the chest and one at the left leg; MIT-BIH arrhythmia database):
where f is a piecewise continuous function in R^2, \epsilon is the error matrix and A is a 2D matrix.
Now I evaluate the average grid distance along the x-axis (time) and along the y-axis (energy).
I think this can be done with MATLAB's Image Analysis toolbox.
However, I am not sure how complete the toolbox's approaches are.
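For the simple average-distance part, a direct sketch that avoids the toolbox, assuming the grid node coordinates are in vectors x and y:
dx = mean(diff(sort(x)));            % average grid spacing along x (time)
dy = mean(diff(sort(y)));            % average grid spacing along y (energy)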
I think a transform approach must be used in the setting of uneven and non-continuous grids. One approach is the exact linear-time Euclidean distance transform of grid-line-sampled shapes by Joakim Lindblad et al.
The method presents a distance transform (DT) which assigns to each image point its smallest distance to a selected subset of image points.
This kind of approach often forms the basis of algorithms in image analysis.
I tested this unsuccessfully with bwdist (distance transform of a binary image): the chessboard option returns an empty square matrix, while cityblock, euclidean and quasi-euclidean return full matrices.
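For reference, here is a toy example of what bwdist computes; this only illustrates the DT concept and does not address the empty-matrix issue:
BW = false(5);  BW(3,3) = true;      % binary image with one foreground pixel
D1 = bwdist(BW, 'chessboard')        % chessboard distance to that pixel
D2 = bwdist(BW, 'euclidean')         % Euclidean distance to that pixel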
Another code example:
% https://stackoverflow.com/a/29956008/54964
%// retrieve picture
imgRGB = imread('dummy.png');
%// detect lines
imgHSV = rgb2hsv(imgRGB);
BW = (imgHSV(:,:,3) < 1);
BW = imclose(imclose(BW, strel('line',40,0)), strel('line',10,90));
%// clear those masked pixels by setting them to background white color
imgRGB2 = imgRGB;
imgRGB2(repmat(BW,[1 1 3])) = 255;
%// show extracted signal
imshow(imgRGB2)
where I think the approach will not work here because the grids are not necessarily continuous and not necessarily ideal.
pdist, based on Lumbreras' answer
In the real examples, all coordinates differ, so the pdist options hamming and jaccard always return 1 with real data.
The options euclidean, cityblock, minkowski, chebychev, mahalanobis, cosine, correlation, and spearman offer some description of the data.
However, these options now make little sense to me for such full matrices.
I want to estimate how long the signal can live.
Sources
J. Müller and S. Siltanen. Linear and Nonlinear Inverse Problems with Practical Applications.
EIT with the D-bar method: discontinuous heart-and-lungs phantom. http://wiki.helsinki.fi/display/mathstatHenkilokunta/EIT+with+the+D-bar+method%3A+discontinuous+heart-and-lungs+phantom Visited 29 Feb 2016.
There is a function in MATLAB called pdist which computes the pairwise distance between all row elements of a matrix and lets you choose the type of distance you want to use (Euclidean, cityblock, correlation). Are you after something like this? Not sure I understood your question!
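A minimal sketch of pdist usage (the coordinates here are made up):
X = [0 0; 3 4; 6 8];                 % three points, one per row
d = pdist(X, 'euclidean')            % pairwise distances: [5 10 5]
dc = pdist(X, 'cityblock')           % city block distances: [7 14 7]
D = squareform(d)                    % full symmetric distance matrix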
cheers!
Simply, do not do it in post-processing. Those artifacts of the body can be due to raster images, due to the viewer, and/or ... Do quality assurance in the signal generation/processing step.
It is much easier to evaluate the original signal than its views.

How to measure power spectral density in MATLAB?

I am trying to measure the PSD of a stochastic process in MATLAB, but I am not sure how to do it. I have posted the exact same question here, but I thought I might have more luck on this site.
The stochastic process describes wind speed and is represented by a vector of real numbers. Each entry corresponds to the wind speed at a point in space, measured in m/s. The points are 0.0005 m apart. How do I measure and plot the PSD? Let's call the vector V. My first idea was to use
[p, w] = pwelch(V);
loglog(w,p);
But is this correct? The thing is that I'm given an analytical expression which the PSD should (in theory) match. Plotted with these two lines of code, it looks all wrong. Specifically, it looks as though it needs a translation and a scaling. Other than that, the shapes of the two are similar.
UPDATE:
The image above actually doesn't depict the PSD obtained by using pwelch on a single vector, but rather the mean of the PSDs of 200 vectors, since these vectors stem from numerical simulations. As suggested, I have tried scaling by 2*pi/0.0005. I saw that you can actually pass this information to pwelch, so I tried using the code
[p, w] = pwelch(V,[],[],[],2*pi/0.0005);
loglog(w,p);
instead. As seen below, it looks much nicer. It is, however, still not perfect. Is that just something I should expect? Taking the square root is not the answer, by the way, but thanks for the suggestion. For one thing, it should follow Kolmogorov's -5/3 law, which it does now (it follows the green line, which has slope -5/3). The function I'm trying to match it with is the Shkarofsky spectral density function, which is the one-dimensional Fourier transform of the Shkarofsky correlation function. Is it not possible to mark up math here on the site?
UPDATE 2:
I have tried using [p, w] = pwelch(V,[],[],[],1/0.0005); as suggested. But as you can see, it still doesn't quite match up. It's hard for me to explain exactly what I'm looking for, but what I would like (or what I expected) is that the dip of the computed and the analytical PSD happens at the same place and falls off at the same rate. The data come from simulations of turbulence. The analytical expression has been fitted to actual measurements of turbulence, in which this dip is present as well. I'm no expert at all, but as far as I know the dip happens at the small length scales, where the energy is dissipated due to the viscosity of the air.
What about using the standard equation for obtaining a PSD? I would do it this way:
Sxx = fft(x) .* conj(fft(x)) * dt^2;
or
Sxx = fftshift(abs(fft(x)).^2) * dt^2;
Then, if you really need to, you may think of applying a windowing criterion, such as
Hanning
Hamming
Welch
which will only smooth your PSD somewhat.
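As a minimal sketch of the above, assuming V is the wind-speed vector sampled every dt = 0.0005 m; note that it uses the conventional dt/N periodogram normalization rather than the dt^2 scaling written above:
dt = 0.0005;                         % sample spacing in meters
N = numel(V);
Sxx = abs(fft(V - mean(V))).^2 * dt / N; % periodogram PSD estimate
f = (0:N-1) / (N*dt);                % cyclic frequency axis in 1/m
loglog(f(2:floor(N/2)), Sxx(2:floor(N/2))) % positive frequencies only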
Presumably you need to rescale the frequency (wavenumber) to units of 1/m.
The frequency units from pwelch should be rescaled, since as the documentation explains:
W is the vector of normalized frequencies at which the PSD is
estimated. W has units of rad/sample.
Off the cuff my guess is that the scaling factor is
scale = 1/0.0005/(2*pi);
or 318.3 (m^-1).
As for the intensity, it looks like taking a square root might help. Perhaps your equation reports an intensity, not PSD?
Edit
As you point out, since the analytical and computed PSD have nearly identical slopes they appear to obey similar power laws up to 800 m^-1. I am not sure to what degree you require exponents or offsets to match to be satisfied with a specific model, and I am not familiar with this particular theory.
As for the apparent inconsistency at high wavenumbers, I would point out that you are entering the domain of very small numbers and therefore (1) floating point issues and (2) noise are probably lurking. The very nice looking dip in the computed PSD on the other hand appears very real but I have no explanation for it (maybe your noise is not white?).
You may want to look at this submission at matlab central as it may be useful.
Edit #2
After inspecting the documentation of pwelch, it appears that you should pass 1/0.0005 (the sampling rate) and not 2*pi/0.0005. This should not affect the slope, but it will affect the intercept.
The dip in PSD in your simulation results looks similar to aliasing artifacts that I have seen in my data when the original data were interpolated with a low-order method. To make this clearer: say my original data were spaced at 0.002 m, but in the course of cleaning up missing data, trying to save space, or whatever, I linearly interpolated those data onto a 0.005 m spacing. The frequency response of linear interpolation is not well-behaved and will introduce peaks and valleys at the high-wavenumber end of your spectrum.
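A quick toy check of this effect, with hypothetical spacings mirroring the example above:
x0 = randn(1, 10000);                % white noise sampled every 0.002 m
xq = interp1(0:9999, x0, 0:2.5:9999); % linearly resampled onto 0.005 m spacing
pwelch(xq)                           % watch the high-wavenumber end roll off
For truly white input the PSD should be flat, so any structure at the high-wavenumber end is introduced by the interpolation.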
There are different conventions for spectral estimates: whether the wavenumber units are 1/m or radians/m, and single-sided versus double-sided spectra.
help pwelch
shows that the default settings return a one-sided spectrum, i.e. the bin for some frequency ω will include the power density for both +ω and -ω.
You should double-check that the idealized spectrum to which you are comparing is also a one-sided spectrum. Otherwise, you'll need to halve the values of your one-sided spectrum to get values representative of the +ω side of a two-sided spectrum.
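A quick way to verify which convention you have (an approximate sanity check, not exact due to windowing):
[p, f] = pwelch(V - mean(V), [], [], [], 1/0.0005);
df = f(2) - f(1);
sum(p) * df                          % close to var(V) for a one-sided PSD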
I agree with Try Hard that it is the cyclic frequency (generally Hz, or in this case 1/m) which should be specified to pwelch. That said, the returned frequency vector from pwelch is also in those units. Analytical spectral formulae are usually written in terms of angular frequency, so you'll want to be sure that you evaluate yours in terms of radians/m, but scale back to 1/m for plotting.
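As a sketch of that last point, where S_model is a hypothetical handle for the analytical spectrum written in terms of angular wavenumber (rad/m):
[p, f] = pwelch(V, [], [], [], 1/0.0005); % f returned in 1/m
k = 2*pi*f;                          % angular wavenumber in rad/m
S = 2*pi * S_model(k);               % change of variables: S(f) df = S(k) dk
loglog(f, p, f, S)                   % both curves on the 1/m axis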

Filter noisy points during interpolation

I am using MATLAB's interpolation feature to interpolate the values of some points inside the convex hull they form. However, some of the points inside the convex hull have noisy z values. My input points are two-dimensional, x and y, with z giving the value at location (x, y). In some cases the z value is just a noise value. So how can I make sure it doesn't affect the interpolation? I mean, I don't want such a value to be considered. Suggestions?
Define your criteria for the point to be 'too noisy'.
Too many standard deviations from the mean? (Calculate mean and standard deviation, then threshold at n standard deviations?)
Too different from the surrounding values? (Use a smoothing/lowpass filter.)
Some background on what this data represents, and some of its characteristics, would help here.
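As a minimal sketch of the first suggestion, assuming x, y, z are column vectors of your scattered data; n, xq and yq are hypothetical:
n = 3;                               % threshold in standard deviations
keep = abs(z - mean(z)) <= n*std(z); % drop points too far from the mean
F = scatteredInterpolant(x(keep), y(keep), z(keep), 'linear');
zq = F(xq, yq);                      % interpolate at the query points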

How to calculate residuals for two curves (matrixes) of different size?

I've got a theoretical curve which was calculated numerically, and an experimental curve (better to say, an array of experimental points). I need to calculate the residuals between these two curves to check the accuracy of the modeling with the least-squares method. These matrices (curves) are of different sizes. Is there any function in MATLAB providing the calculation of residuals for two matrices of different size?
I thought I'd elaborate a bit on what Aabaz said, in case others find this useful (although Aabaz's explanation is probably clear enough for people with an understanding of the necessary math).
First, I'm assuming you have a 2D plot, but it shouldn't be difficult to generalize to the N-D case.
Basically, for each point (xi, yi) in your experimental data, use your theoretical curve to estimate yi' at the value xi. This is probably what Aabaz is referring to by making the grid step size the same, so that you interpolate the points exactly at the x coordinates xi of your experimental data using the formula for your curve(s).
Next, to measure whether the fit is good, you could for example compute the sum of squared differences:
error = sum( (yi' - yi)^2 ), where i ranges over all points in your experimental data
Of course, error metrics other than least squares could be used to estimate how well the data fit your model (i.e. your curve), but for most applications least squares is by far the most common.
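In MATLAB this amounts to something like the following sketch, where xt, yt sample the theoretical curve and xe, ye are the experimental points (hypothetical names):
yhat = interp1(xt, yt, xe, 'linear'); % theoretical value at each experimental x
res = yhat - ye;                     % residuals
sse = sum(res.^2);                   % least-squares error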
Hope this helps.