I'm having trouble trying to find the 4 dominant peaks in this graph
The signal is values are very jittery, in that they go up then down, making it hard to find the maximum value and it's index.
function [peaks, locations] = findMaxs (mag, threshold)
len = length(mag);
prev = 1;
cur = 2;
next = 3;
k = 1; %number of peaks
while next < len
if mag(cur) - mag(prev) > threshold
if mag(cur) > mag(next)
peaks(k) = mag(cur);
fprintf('peak at %d\n', cur);
k = k + 1;
end
end
prev = cur;
cur = next;
next = next + 1;
end
end
findpeaks() gave me way too many results, so I'm using this function. However, if I set the threshold too low, I get too many results, and if I set it even very slightly too high, I miss one of the dominant peaks.
How can I do this?
If your dominant peaks are seperated like in the plot you included, there is a parameter for findpeaks() that can help a whole lot. Try:
findpeaks(x, 'MINPEAKDISTANCE', dist);
with x being your magnitudes and dist being a distance you can assume to be te smallest distance between 2 peaks. This might give you a false peek in between 2 peek that are more than 2*dist from each other, if so consider adding a small threshold with 'MINPEAKHEIGHT'
Another Option is calulating your threshold dynamicly, for exsample by calulating the mean m and the standard deviation sigma and setting a threshold by only counting peaks that are n*sigma above m.
you can still use findpeaks.
for example [pks,locs] = findpeaks(data) returns the indices of the local peaks.
then you can sort data(locs) and get the top 4 amplitudes.
[a ind]=sort(data(locs,'descend')
or set a threshold, data(locs)>threshold etc...
One way to do this is to compute the difference function for the magnitude array (which is equivalent to derivative for continuous functions). Look for points where the value for the difference function goes from positive to negative. Those are your peak points.
To find the most prominent peaks, compute the second order difference function at the points obtained from the first order difference and select the ones which are of highest magnitude.
If the number of prominent peaks is unknown before-hand you can employ a threshold at this time as a measure of prominence.
Related
I tried to generate 1000 the random values in normal distribution by the normrnd function.
A = normrnd(4,1,[1000 1]);
I would like to set the minimum value is 2. However, that function just can define the mean and sd. How can I set the minimum value is 2 ?
You can't. Gaussian or normally distributed numbers are in a bell curve, with the tails tailing off to infinity. What you can do is "censor" them by eliminating every number beyond a cut-off.
Since you choose mean = 4 and sigma = 1, you will end up ~95% elements of A fall within range [2,6]. The number of elements whose values smaller than 2 is about 2.5%. If you consider this figure is small, you can wrap these elements to a minimum value. For example:
A = normrnd(4,1,[1000 1]);
A(A < 2) = A(A<2) + 2 - min(A(A<2))
Of course, it is technically not gaussian distribution. However if you have total control of mean and sigma, you can get a "more gaussian like" distribution by adding an offset to A:
A = A + 2 - min(A)
Note: This assumes you can have an arbitrarily set standard deviation, which may not be the case
As others have said, you cannot specify a lower bound for a true Gaussian. However, you can generate a Gaussian and estimate 1-p percent of values to be above and then ignore p percent of values (which will fall outside your cutoff).
For example, in the following code, I am generating a Gaussian where 95% of data-points fall above 2. Then I am removing all points below 2, knowing that 5% of data will be removed.
This is a solution because setting as p gets closer to zero, your chances of getting uncensored sample data that follows your Gaussian curve and is entirely above your cutoff goes to 100% (Really it's defined by the p/n ratio, but if n is fixed this is true).
n = 1000; % number of samples
cutoff = 2; % Cutoff point for min-value
mu = 4; % Mean
p = .05; % Percentile you would like to cutoff
z = -sqrt(2) * erfcinv(p*2); % compute z score
sigma = (cutoff - mu)/z; % compute standard deviation
A = normrnd(mu,sigma,[n 1]);
I would recommend removing values below the cutoff rather than re-attributing them to the lower bound of your distribution, but that is up to you.
A(A<cutoff) = []; % removes all values of A less than cutoff
If you want to be symmetrical (which you should to prevent sample skew) the following should work.
A(A>(2*mu-cutoff)) = [];
Could someone explain the logic of this program.
I dont understand why the y=y/max(y)
and,
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
The script:
function width = fwhm(x,y)
y = y / max(y);
N = length(y);
MicroscopeMag=10;
PixelWidth=7.8; % Pixel Pitch is 7.8 Microns.
%------- find index of center (max or min) of pulse---------------%
[~,centerindex] = max(y);% 479 S10 find center peak and coordinate
%------- find index of center (max or min) of pulse-----------------%
i = 2;
while sign(y(i)-0.5) == sign(y(i-1)-0.5) %trying to see the curve raise
i = i+1; %474 S10
end %first crossing is between v(i-1) & v(i)
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
i=centerindex+1; %471
%------- start search for next crossing at center--------------------%
while ((sign(y(i)-0.5) == sign(y(i-1)-0.5)) && (i <= N-1))
i = i+1;
end
if i ~= N
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
ttrail = x(i-1) + interp*(x(i)-x(i-1));
%width = ttrail - tlead; % FWHM
width=((ttrail - tlead)/MicroscopeMag)*PixelWidth;
% Lateral Magnification x Pixel pitch of 7.8 microns.
end
Thanks.
The two segments of code you specifically mention are both housekeeping: it's more about the compsci of it than the optics.
So the first line
y = y/max(y);
is normalising it to 1, i.e. dividing the whole series through by the maximum value. This is a fairly common practice and it's sensible to do it here, it saves the programmer from having to divide through by it later.
The next part,
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
and the corresponding block later on for ttrail, are about trying to interpolate the exact point(s) where the signal's value would be 0.5. Earlier it identifies the centre of the peak and the last index position before half-maximum, so now we have a range containing the leading edge of the signal.
The 'half-maximum' criterion requires us to find the point where that leading edge's value is 0.5 (we normalised to 1, so the half-maximum is by definition 0.5). The data probably won't have a sample at exactly that value - it'll go [... 0.4856 0.5024 ...] or something similar.
So these two lines are an attempt to determine in fractions of an index exactly where the line would cross the 0.5 value. It does this by simple linear interpolation:
y(i)-y(i-1)
gives us the delta_y between the two values either side, and
0.5-y(i-1)
gives us the shortfall. By taking the ratio we can linearly interpolate how far between the two index positions we should go to hit exactly 0.5.
The next line then works out the corresponding delta_x, which gives you the actual distance in terms of the timebase.
It does the same thing for the trailing edge, then uses these two interpolated values to give you a more precise value for the full-width.
To visualise this I would put a breakpoint at the i = 2 line and step through it, noting or plotting the values of y(i) as you go. stem is helpful for visualising discrete data, especially when you're working between index positions.
The program computes the resolution of a microscope using the Full Width at Half Maximum (FWHM) of the Point Spread Function (PSF) characterizing the microscope with a given objective/optics/etc.
The PSF normally looks like a gaussian:
and the FWHM tells you how good is your microscope system to discern small objects (i.e. the resolution). Let's say you are looking at 2 point objects, then the resolution (indirectly FWHM) is the minimum size those objects need to be if you are indeed to tell that there are 2 objects close to one another instead of one big object.
Now for the above function, it looks like it first compute the maximum of the PSF and then progressively goes down along the curve until it approximately reaches the half maximum. Then it's possible to compute the FWHM from the distribution of the PSF.
Hope that makes things a bit clearer!
How to differentiate between a double peak and a single peak array?
Also if the array represents a double peak, how to find the minimum point between two peaks? The minimum points outside of the peaks (left of left peak and right of right peak) should not be considered in finding the minimum point.
I found PEAKDET function to be quite reliable and fast although it's loop based. It does not require pre-smoothing of noisy data, but finds local max and min extrema with difference larger than parameter delta.
Since PEAKDET runs from left to right it sometime misses peaks on the right site. To avoid it I prefer to run it twice:
%# some data
n = 100;
x = linspace(0,3*pi,n);
y = sin(x) + rand(1,n)/5;
%# run peakdet twice left-to-right and right-to-left
delta = 0.5;
[ymaxtab, ymintab] = peakdet(y, delta, x);
[ymaxtab2, ymintab2] = peakdet(y(end:-1:1), delta, x(end:-1:1));
ymaxtab = unique([ymaxtab; ymaxtab2],'rows');
ymintab = unique([ymintab; ymintab2],'rows');
%# plot the curve and show extreme points based on number of peaks
plot(x,y)
hold on
if size(ymaxtab,1) == 2 && size(ymintab,1) == 1 %# if double peak
plot(ymintab(:,1),ymintab(:,2),'r.','markersize',30)
elseif size(ymaxtab,1) == 1 && size(ymintab,1) == 0 %# if single peak
plot(ymaxtab(:,1),ymaxtab(:,2),'r.','markersize',30)
else %# if more (or less)
plot(ymintab(:,1),ymintab(:,2),'r.','markersize',30)
plot(ymaxtab(:,1),ymaxtab(:,2),'r.','markersize',30)
end
hold off
Here is one algorithm that might work depending on how noisy your signal is. Here I define a peak as the set of connected points greater than a given threshold value.
Assuming your original data is in the array A. First, find a threshold:
t = (max(A)+min(A))/2;
Next, find all the points greater than this threshold t:
P = A>t;
Count number of connected entries points that are greater than t using bwlabel
L = bwlabel(P);
numberOfPeaks = max(L);
Now numberOfPeaks should tell you how many peaks (connected points greater than the threshold value) you have in your data
Now to find the minimum point between the two peak we need to identify those points that seperate the two peaks using the label matrix L.
firstPoint = find(L==1,1,'last')+1;
lastPoint = find(L==2,1,'first')-1;
So the valley between the first two peaks is the points with index between firsPoint and lastPoint. The minimum would then be
minValue = min(A(firstPoint:lastPoint));
Solution that does not depend on the Image Processing Toolbox
As #Nzbuu notes the aboth relies on the image processing toolbox function bwlabel. So, here is away to avoid that. First, I Assume that the the array P correctly identifies points belonging to peak (P(i)=1) and those belonging to valleys (P(i)=-1). If this is the case the boundary between peaks and valleys can be identified when dP = P(i+1)-P(i) = 1 or -1.
dP = diff(P);
To calculate the number of peaks simply sum the number of 1's in dP:
numberOfPeaks = sum(dP==1);
And the points identifying the first valley are between
firstPoint = find(dP==-1,1,'first')+1 %# the -1 represents the last point of the peak so add 1
lastPoint = find(dP==1,2,'first'); #% Find the start of the second peak
lastPoint = lastPoint(end); #% Keep the last value
You can find the local min/max as follows:
x = 0:.1:4*pi;
y = sin(x);
plot(x,y)
diffy = diff(y);
localMin = find(diffy(1:end-1)<=0 & diffy(2:end) > 0)+1;
localMax = find(diffy(1:end-1)>=0 & diffy(2:end) < 0)+1;
hold on
plot(x(localMin),y(localMin),'dg')
plot(x(localMax),y(localMax),'*r')
Resulting in:
Basically you are finding where the delta between the y values change signs. If your data is noisy this will cause lots of local min/max values and you may need to filter your data.
To find the minimum value between two peaks you can do something like this:
if numel(localMax) == 1
fprintf('The max value is: %f',y(localMax));
elseif numel(localMax > 1)
betweenPeaksIndex = localMin(localMin > localMax(1) & localMin <localMax(2));
fprintf('The min between the first 2 peaks is: %f',y(betweenPeaksIndex));
else
fprintf('The was no local Max ..???');
end
I have a vector of data, which contains integers in the range -20 20.
Bellow is a plot with the values:
This is a sample of 96 elements from the vector data. The majority of the elements are situated in the interval -2, 2, as can be seen from the above plot.
I want to eliminate the noise from the data. I want to eliminate the low amplitude peaks, and keep the high amplitude peak, namely, peaks like the one at index 74.
Basically, I just want to increase the contrast between the high amplitude peaks and low amplitude peaks, and if it would be possible to eliminate the low amplitude peaks.
Could you please suggest me a way of doing this?
I have tried mapstd function, but the problem is that it also normalizes that high amplitude peak.
I was thinking at using the wavelet transform toolbox, but I don't know exact how to reconstruct the data from the wavelet decomposition coefficients.
Can you recommend me a way of doing this?
One approach to detect outliers is to use the three standard deviation rule. An example:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14;
subplot(211), plot(x)
%# tone down the noisy points
mu = mean(x); sd = std(x); Z = 3;
idx = ( abs(x-mu) > Z*sd ); %# outliers
x(idx) = Z*sd .* sign(x(idx)); %# cap values at 3*STD(X)
subplot(212), plot(x)
EDIT:
It seems I misunderstood the goal here. If you want to do the opposite, maybe something like this instead:
%# some random data resembling yours
x = randn(100,1);
x(75) = -14; x(25) = 20;
subplot(211), plot(x)
%# zero out everything but the high peaks
mu = mean(x); sd = std(x); Z = 3;
x( abs(x-mu) < Z*sd ) = 0;
subplot(212), plot(x)
If it's for demonstrative purposes only, and you're not actually going to be using these scaled values for anything, I sometimes like to increase contrast in the following way:
% your data is in variable 'a'
plot(a.*abs(a)/max(abs(a)))
edit: since we're posting images, here's mine (before/after):
You might try a split window filter. If x is your current sample, the filter would look something like:
k = [L L L L L L 0 0 0 x 0 0 0 R R R R R R]
For each sample x, you average a band of surrounding samples on the left (L) and a band of surrounding samples on the right. If your samples are positive and negative (as yours are) you should take the abs. value first. You then divide the sample x by the average value of these surrounding samples.
y[n] = x[n] / mean(abs(x([L R])))
Each time you do this the peaks are accentuated and the noise is flattened. You can do more than one pass to increase the effect. It is somewhat sensitive to the selection of the widths of these bands, but can work. For example:
Two passes:
What you actually need is some kind of compression to scale your data, that is: values between -2 and 2 are scale by a certain factor and everything else is scaled by another factor. A crude way to accomplish such a thing, is by putting all small values to zero, i.e.
x = randn(1,100)/2; x(50) = 20; x(25) = -15; % just generating some data
threshold = 2;
smallValues = (abs(x) <= threshold);
y = x;
y(smallValues) = 0;
figure;
plot(x,'DisplayName','x'); hold on;
plot(y,'r','DisplayName','y');
legend show;
Please do not that this is a very nonlinear operation (e.g. when you have wanted peaks valued at 2.1 and 1.9, they will produce very different behavior: one will be removed, the other will be kept). So for displaying, this might be all you need, for further processing it might depend on what you are trying to do.
To eliminate the low amplitude peaks, you're going to equate all the low amplitude signal to noise and ignore.
If you have any apriori knowledge, just use it.
if your signal is a, then
a(abs(a)<X) = 0
where X is the max expected size of your noise.
If you want to get fancy, and find this "on the fly" then, use kmeans of 3. It's in the statistics toolbox, here:
http://www.mathworks.com/help/toolbox/stats/kmeans.html
Alternatively, you can use Otsu's method on the absolute values of the data, and use the sign back.
Note, these and every other technique I've seen on this thread is assuming you are doing post processing. If you are doing this processing in real time, things will have to change.
I have a MATLAB function that finds charateristic points in a sample. Unfortunatley it only works about 90% of the time. But when I know at which places in the sample I am supposed to look I can increase this to almost 100%. So I would like to know if there is a function in MATLAB that would allow me to find the range where most of my results are, so I can then recalculate my characteristic points. I have a vector which stores all the results and the right results should lie inside a range of 3% between -24.000 to 24.000. Wheras wrong results are always lower than the correct range. Unfortunatley my background in statistics is very rusty so I am not sure how this would be called.
Can somebody give me a hint what I would be looking for? Is there a function build into MATLAB that would give me the smallest possible range where e.g. 90% of the results lie.
EDIT: I am sorry if I didn't make my question clear. Everything in my vector can only range between -24.000 and 24.000. About 90% of my results will be in a range which spans approximately 1.44 ([24-(-24)]*3% = 1.44). These are very likely to be the correct results. The remaining 10% are outside of that range and always lower (why I am not sure taking then mean value is a good idea). These 10% are false and result from blips in my input data. To find the remaining 10% I want to repeat my calculations, but now I only want to check the small range.
So, my goal is to identify where my correct range lies. Delete the values I have found outside of that range. And then recalculate my values, not on a range between -24.000 and 24.000, but rather on a the small range where I already found 90% of my values.
The relevant points you're looking for are the percentiles:
% generate sample data
data = [randn(900,1) ; randn(50,1)*3 + 5; ; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
Other approachs exist to detect outliers, such as the IQR outlier test and the three standard deviation rule, among many others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
and
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
BTW if you want to get the z value (|X|<z) that corresponds to 90% area under the curve, use:
area = 0.9; % two-tailed probability
z = norminv(1-(1-area)/2)
Maybe you should try mean value (in matlab: mean) and standard deviation (in matlab: std)?
What is the statistic distribution of your data?
See also this wiki page, section "Interpretation and application".
In general for almost every distribution, very useful Chebyshev's inequalities take place.
In most of the cases this should work:
meanval = mean(data)
stDev = std(data)
and probably the most (75%) of your values will be placed in range:
<meanVal - 2*stDev, meanVal + 2*stDev>
it seems like maybe you want to find the number x in [-24,24] that maximizes the number of sample points in [x,x+1.44]; probably the fastest way to do this involves a sort of the sample points, which is ultimately nlog(n) time; a cheesy approximation would be as follows:
brkpoints = linspace(-24,24-1.44,n_brkpoints); %choose n_brkpoints big, but < # of sample points?
n_count = histc(data,[brkpoints,inf]); %count # data points between breakpoints;
accbins = 1.44 / (brkpoints(2) - brkpoints(1); %# of bins to accumulate;
cscount = cumsum(n_count); %half of the boxcar sum computation;
boxsum = cscount - [zeros(accbins,1);cscount(1:end-accbins)]; %2nd half;
[dum,maxi] = max(boxsum); %which interval has the maximal # counts?
lorange = brkpoints(maxi); %the lower range;
hirange = lorange + 1.44
this solution does fudge some of the corner case stuff about the bottom and top bin, etc.
note that if you're going to go by the Chebyshev inequality route, Petunin's Inequality is probably applicable, and will give a slight boost.