Logic of this FWHM script? - matlab

Could someone explain the logic of this program?
I don't understand the purpose of y = y/max(y), or of these two lines:
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
The script:
function width = fwhm(x,y)
y = y / max(y); % normalise so the peak value is 1
N = length(y);
PixelWidth = 7.8; % Pixel pitch is 7.8 microns.
% MicroscopeMag (lateral magnification) is not defined in the posted code;
% it must be set before the final width calculation.
%------- find index of center (max or min) of pulse---------------%
[~,centerindex] = max(y); % 479 S10 find center peak and coordinate
%------- find first half-maximum crossing (leading edge)----------%
i = 2;
while sign(y(i)-0.5) == sign(y(i-1)-0.5) % step along until the curve crosses 0.5
    i = i+1; %474 S10
end % first crossing is between y(i-1) & y(i)
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
i = centerindex+1; %471
%------- start search for next crossing at center--------------------%
while ((sign(y(i)-0.5) == sign(y(i-1)-0.5)) && (i <= N-1))
    i = i+1;
end
if i ~= N
    interp = (0.5-y(i-1)) / (y(i)-y(i-1));
    ttrail = x(i-1) + interp*(x(i)-x(i-1));
    %width = ttrail - tlead; % FWHM
    width = ((ttrail - tlead)/MicroscopeMag)*PixelWidth;
    % Lateral Magnification x Pixel pitch of 7.8 microns.
end
end

The two segments of code you specifically mention are both housekeeping: it's more about the compsci of it than the optics.
So the first line
y = y/max(y);
is normalising it to 1, i.e. dividing the whole series through by the maximum value. This is fairly common practice and sensible here: it saves the programmer from having to divide by the maximum later.
The next part,
interp = (0.5-y(i-1)) / (y(i)-y(i-1));
tlead = x(i-1) + interp*(x(i)-x(i-1));
and the corresponding block later on for ttrail, are about trying to interpolate the exact point(s) where the signal's value would be 0.5. Earlier it identifies the centre of the peak and the last index position before half-maximum, so now we have a range containing the leading edge of the signal.
The 'half-maximum' criterion requires us to find the point where that leading edge's value is 0.5 (we normalised to 1, so the half-maximum is by definition 0.5). The data probably won't have a sample at exactly that value - it'll go [... 0.4856 0.5024 ...] or something similar.
So these two lines are an attempt to determine, in fractions of an index, exactly where the line would cross the 0.5 value. It does this by simple linear interpolation:
(y(i)-y(i-1)) gives us the delta_y between the two values either side, and
(0.5-y(i-1)) gives us the shortfall. By taking the ratio we can linearly interpolate how far between the two index positions we should go to hit exactly 0.5.
The next line then works out the corresponding delta_x, which gives you the actual distance in terms of the timebase.
It does the same thing for the trailing edge, then uses these two interpolated values to give you a more precise value for the full-width.
To visualise this I would put a breakpoint at the i = 2 line and step through it, noting or plotting the values of y(i) as you go. stem is helpful for visualising discrete data, especially when you're working between index positions.
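As a small worked example of the interpolation described in this answer, here is a sketch using the two sample values quoted above. The x positions (10 and 11) are made up for illustration; only the formula comes from the script.

```matlab
% Two neighbouring samples that straddle the half-maximum.
x = [10 11];            % assumed sample positions (hypothetical)
y = [0.4856 0.5024];    % normalised values either side of 0.5

shortfall = 0.5 - y(1);          % how far y(1) is below 0.5
delta_y   = y(2) - y(1);         % rise between the two samples
interp    = shortfall / delta_y; % fraction of the way from x(1) to x(2)

% Convert that fraction into a position on the x axis.
xhalf = x(1) + interp * (x(2) - x(1));
```

Here interp comes out at about 0.857, so the crossing sits roughly 86% of the way from the first sample to the second.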

The program computes the resolution of a microscope using the Full Width at Half Maximum (FWHM) of the Point Spread Function (PSF) characterizing the microscope with a given objective/optics/etc.
The PSF normally looks like a Gaussian, and the FWHM tells you how well your microscope system can discern small objects (i.e. the resolution). Say you are looking at 2 point objects: the resolution (indirectly, the FWHM) is the minimum separation those objects need if you are to tell that there are 2 objects close to one another instead of one big object.
As for the function above, it first finds the maximum of the PSF, then walks along the curve until it brackets the half-maximum on each side of the peak. The FWHM is then the distance between those two half-maximum crossings.
Hope that makes things a bit clearer!
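To check the idea on a curve whose width is known, here is a minimal sketch that applies the same half-maximum interpolation to a synthetic Gaussian. The value of sigma and the sampling grid are assumptions chosen for illustration. For a Gaussian the FWHM is 2*sqrt(2*log(2))*sigma, so the two numbers should agree closely.

```matlab
sigma = 2;                          % assumed width of the synthetic PSF
x = -10:0.1:10;                     % assumed sampling grid
y = exp(-x.^2 / (2*sigma^2));
y = y / max(y);                     % normalise so half-maximum is 0.5

above = find(y >= 0.5);             % samples at or above half-maximum
i1 = above(1);                      % first sample above 0.5 (leading edge)
i2 = above(end);                    % last sample above 0.5 (trailing edge)

% Linear interpolation between the samples that straddle 0.5 on each side.
xlead  = x(i1-1) + (0.5 - y(i1-1)) / (y(i1) - y(i1-1)) * (x(i1) - x(i1-1));
xtrail = x(i2)   + (0.5 - y(i2))   / (y(i2+1) - y(i2)) * (x(i2+1) - x(i2));

width    = xtrail - xlead;              % interpolated FWHM
expected = 2*sqrt(2*log(2)) * sigma;    % analytic FWHM of a Gaussian
```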


How to Deal with Edge Cases: For Loops and Modulo

I'm trying to apply bare-bones image processing to images like this one. My for-loop does almost exactly what I want: it lets me find the pixels of highest intensity and also remember the coordinates of each pixel. However, the code breaks whenever the index is a multiple of the number of rows, which in this case is 18.
For example, the length of this image (rows * columns of image) is 414. So there are 414/18 = 23 cases where the program fails (i.e., the number of columns).
Perhaps there is a better way to accomplish my goal, but this is the only way I could think of sorting an image by pixel intensity while also knowing the coordinates of each pixel. Happy to take suggestions of alternative code, but it'd be great if someone had an idea of how to handle the cases where mod(x,18) = 0 (i.e., when the index of the vector is divisible by the total # of rows).
image = imread('test.tif'); % feed program an image
image_vector = image(:); % vectorize image
[sortMax,sortIndex] = sort(image_vector, 'descend'); % sort vector so
% that highest intensity pixels are at top
max_sort = [];
[rows,cols] = size(image);
for i=1:length(image_vector)
    x = mod(sortIndex(i,1),rows); % retrieve original coordinates
    % of pixels from matrix "image"
    y = floor(sortIndex(i,1)/rows) +1;
    if image(x,y) > 0.5 * sortMax(1) % filter out background noise (sortMax(1) is the brightest pixel)
        max_sort(i,:) = [x,y];
    end
end
You know that MATLAB indexing starts at 1, because you do +1 when you compute y. But you forgot to subtract 1 from the index first. Here is the correct computation:
index = sortIndex(i,1) - 1;
x = mod(index,rows) + 1;
y = floor(index/rows) + 1;
This computation is performed by the function ind2sub, which I recommend you use.
Edit: Actually, ind2sub does the equivalent of:
x = rem(sortIndex(i,1) - 1, rows) + 1;
y = (sortIndex(i,1) - x) / rows + 1;
(You can see this by typing edit ind2sub.) rem and mod are the same for positive inputs, so x is computed identically. For computing y they avoid the floor; I guess it is slightly more efficient.
Note also that
image(x,y)
is the same as
image(sortIndex(i,1))
That is, you can use the linear index directly to index into the two-dimensional array.
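A minimal sketch of the ind2sub route, without the loop. The 4-by-4 matrix from magic(4) is only a stand-in for the image, and the 50% cut-off mirrors the question's filter.

```matlab
img = magic(4);                                  % hypothetical stand-in for the image
[sortMax, sortIndex] = sort(img(:), 'descend');  % brightest pixels first

% Convert all linear indices to (row, column) pairs in one call.
[r, c] = ind2sub(size(img), sortIndex);

% Keep only the pixels brighter than half the maximum.
keep   = sortMax > 0.5 * sortMax(1);
coords = [r(keep), c(keep)];

% Linear index 4 is a multiple of the row count; it maps to row 4, column 1.
[r4, c4] = ind2sub(size(img), 4);
```

Because ind2sub handles the last row of every column correctly, no special case is needed for indices divisible by the number of rows.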

How do I find exact rest points?

I have displacement and time data for the movement of an object.
The object oscillates around zero. That is, it is first set into motion by a small force, then comes to rest. Then a small force is applied again and the object is set into motion again.
I have found out the velocity and acceleration using
V= [0 ; diff(disp) ./ diff(times)];
A= [0; diff(V) ./ diff(times)];
I was thinking of finding the points where the velocity is zero, but I guess there are more such points than I need. See the graph below:
velocity plot
I am only interested in the time values marked with circles. Is there a way to get these?
I observe a pattern:
1) The velocity increases, then decreases by almost the same amount.
2) Then, due to friction, it crosses zero by a smaller amount and becomes negative again.
3) It finally comes to rest, but a very small velocity is still present.
It is this point, where the velocity settles to zero, that I want. Then force is applied again and the same cycle repeats.
Please note that I do not know the times at which the force is applied; otherwise there would be nothing to do.
I also plotted the acceleration, but it does not seem to help.
I am using matlab.
Here's one way to find approximate zeros in gridded data:
% some dummy synthetic data
x = linspace(0, 10, 1e3);
y = exp(-0.3*x) .* sin(x) .* cos(pi*x);
% its derivative (presumably your "acceleration")
yp = diff(y) ./ diff(x);
% Plot data to get an overview
plot(x,y), hold on
% Find zero crossings (product of two consecutive data points is negative)
zero_x = y(1:end-1) .* y(2:end) < 0;
% Use derivative for linear interpolation between those points:
% y + yp*(x_cross - x) = 0  =>  x_cross = x - y/yp
x_cross = x(zero_x) - y(zero_x)./yp(zero_x);
% Plot those zeros
plot(x_cross, zeros(size(x_cross)), 'ro')
It is then up to you to select which zeros you need, because I could not understand from the question what made those points in the circles so special...
The resting points you asked about have the following properties:
dx / dt = v = 0
d^2 x / dt^2 = a = 0 # at the instant the object comes to rest (v = 0), there is no net force on it.
So you may also want to check the second condition to filter the resting points.
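One possible way to apply both conditions is sketched below. It marks samples where the velocity and the acceleration are both small, then keeps only runs that last long enough to count as rest. The synthetic signal, the 1% tolerances and the minimum rest duration are all assumptions for illustration and would need tuning on real data. The displacement is called pos here so that it does not shadow MATLAB's disp function.

```matlab
% Synthetic stand-in for the measurements (assumed, not the asker's data):
% two damped bursts that start at t = 1 s and t = 11 s.
times = (0:0.01:20)';
burst = @(t) (t >= 0) .* t .* exp(-1.5*t) .* sin(2*pi*t);
pos   = burst(times - 1) + burst(times - 11);

V = [0; diff(pos) ./ diff(times)];
A = [0; diff(V)   ./ diff(times)];

tolV    = 0.01 * max(abs(V));   % assumed tolerance: 1% of peak speed
tolA    = 0.01 * max(abs(A));   % assumed tolerance: 1% of peak acceleration
minRest = 0.75;                 % assumed minimum rest duration in seconds

% Samples where both v and a are close to zero.
atRest = abs(V) < tolV & abs(A) < tolA;

% Find the start and end of each run of consecutive rest samples.
edges    = diff([0; atRest; 0]);
runStart = find(edges == 1);
runEnd   = find(edges == -1) - 1;

% Discard brief runs near velocity zero crossings; keep sustained rest.
isLong    = (times(runEnd) - times(runStart)) >= minRest;
restTimes = times(runStart(isLong));
```

With this synthetic signal, restTimes holds the initial rest at t = 0 and one rest point after each burst has died away.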

Computing a moving average

I need to compute a moving average over a data series, within a for loop. I have to get the moving average over N=9 days. The array I'm working with is 4 series of 365 values (M), which are themselves mean values of another set of data. I want to plot the mean values of my data together with the moving average in one plot.
I googled a bit about moving averages and the "conv" command and found something which I tried implementing in my code:
hold on
for ii=1:4;
    wts = [1/24;repmat(1/12,11,1);1/24];
end
hold off
So basically, I compute my mean and plot it with a (wrong) moving average. I picked the "wts" value right off the mathworks site, so that is incorrect. (source: http://www.mathworks.nl/help/econ/moving-average-trend-estimation.html) My problem though, is that I do not understand what this "wts" is. Could anyone explain? If it has something to do with the weights of the values: that is invalid in this case. All values are weighted the same.
And if I am doing this entirely wrong, could I get some help with it?
My sincerest thanks.
There are two more alternatives:
1) filter
From the doc:
You can use filter to find a running average without using a for loop.
This example finds the running average of a 16-element vector, using a
window size of 5.
data = [1:0.2:4]';
windowSize = 5;
filter(ones(1,windowSize)/windowSize,1,data)
2) smooth as part of the Curve Fitting Toolbox (which is available in most cases)
From the doc:
yy = smooth(y) smooths the data in the column vector y using a moving
average filter. Results are returned in the column vector yy. The
default span for the moving average is 5.
%// Create noisy data with outliers:
x = 15*rand(150,1);
y = sin(x) + 0.5*(rand(size(x))-0.5);
y(ceil(length(x)*rand(2,1))) = 3;
%// Smooth the data using the loess and rloess methods with a span of 10%:
yy1 = smooth(x,y,0.1,'loess');
yy2 = smooth(x,y,0.1,'rloess');
In 2016 MATLAB added the movmean function that calculates a moving average:
N = 9;
M_moving_average = movmean(M,N)
Using conv is an excellent way to implement a moving average. In the code you are using, wts is how much you are weighting each value (as you guessed). The sum of that vector should always be equal to one. If you wish to weight each value evenly and apply a moving filter of size N, then you would want to do
N = 7;
wts = ones(N,1)/N;
sum(wts) % result = 1
Using the 'valid' argument in conv will result in having fewer values in Ms than you have in M. Use 'same' if you don't mind the effects of zero padding. If you have the Signal Processing Toolbox you can use cconv if you want to try a circular moving average. Something like
N = 7;
wts = ones(N,1)/N;
Ms = cconv(M, wts, length(M));
should work.
You should read the conv and cconv documentation for more information if you haven't already.
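To make the difference between 'valid' and 'same' concrete for the 9-day case in the question, here is a small sketch. The series M below is a made-up ramp standing in for one column of the real means.

```matlab
M   = (1:365)';            % hypothetical series (one value per day)
N   = 9;                   % window length from the question
wts = ones(N,1) / N;       % equal weights that sum to 1

MsValid = conv(M, wts, 'valid');   % only windows that fit entirely inside M
MsSame  = conv(M, wts, 'same');    % same length as M, zero padded at the ends

% The 'valid' result is centred on days 5..361, so plot it against those days.
half      = floor(N/2);
daysValid = (1+half : numel(M)-half)';
```

The 'valid' output has 365 - 9 + 1 = 357 values, so plotting it against daysValid keeps it aligned with M. With 'same', the first and last four values are pulled towards zero by the padding.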
I would use this:
% does moving average on signal x, window size is w
function y = movingAverage(x, w)
    k = ones(1, w) / w; % w equal weights that sum to 1
    y = conv(x, k, 'same');
end
ripped straight from here.
To comment on your current implementation: wts is the weighting vector, which in the MathWorks example is a 13-point average in which the first and last points are weighted half as much as the rest.
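A quick check of that description, using the wts vector quoted in the question, next to the equal-weight vector a plain 9-day average would use (N = 9 is taken from the question):

```matlab
wts = [1/24; repmat(1/12, 11, 1); 1/24];   % weights copied from the question

nTerms   = numel(wts);        % number of points in the average
total    = sum(wts);          % should be 1 (up to rounding)
endRatio = wts(1) / wts(2);   % end points relative to interior points

N        = 9;
equalWts = ones(N,1) / N;     % every day weighted the same
```

So nTerms is 13, the weights sum to one, and each end point carries half the weight of an interior point.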

How to find dominant peaks in matlab (fft)

I'm having trouble trying to find the 4 dominant peaks in this graph
The signal values are very jittery, in that they go up then down, making it hard to find each maximum value and its index.
function [peaks, locations] = findMaxs(mag, threshold)
len = length(mag);
peaks = [];
locations = [];
prev = 1;
cur = 2;
next = 3;
k = 1; % index of the next peak to store
while next < len
    if mag(cur) - mag(prev) > threshold
        if mag(cur) > mag(next)
            peaks(k) = mag(cur);
            locations(k) = cur;
            fprintf('peak at %d\n', cur);
            k = k + 1;
        end
    end
    prev = cur;
    cur = next;
    next = next + 1;
end
end
findpeaks() gave me way too many results, so I'm using this function. However, if I set the threshold too low, I get too many results, and if I set it even very slightly too high, I miss one of the dominant peaks.
How can I do this?
If your dominant peaks are separated like in the plot you included, there is a parameter for findpeaks() that can help a whole lot. Try:
findpeaks(x, 'MINPEAKDISTANCE', dist);
with x being your magnitudes and dist being a distance you can assume to be the smallest distance between 2 peaks. This might give you a false peak in between 2 peaks that are more than 2*dist from each other; if so, consider adding a small threshold with 'MINPEAKHEIGHT'.
Another option is calculating your threshold dynamically, for example by calculating the mean m and the standard deviation sigma and only counting peaks that are n*sigma above m.
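The two suggestions can be combined, as in the sketch below. The spectrum is synthetic (four Gaussian bumps plus a deterministic jitter term), and the values of n and dist are assumptions to be tuned on real data. findpeaks requires the Signal Processing Toolbox.

```matlab
% Synthetic magnitude spectrum: four dominant peaks plus small jitter.
f       = 1:1000;
centres = [100 300 550 800];          % assumed peak positions
heights = [1.0 0.8 0.9 0.7];          % assumed peak heights
mag     = 0.05 * abs(sin(37*f));      % deterministic stand-in for jitter
for k = 1:numel(centres)
    mag = mag + heights(k) * exp(-(f - centres(k)).^2 / (2*5^2));
end

% Dynamic threshold: n standard deviations above the mean.
m     = mean(mag);
sigma = std(mag);
n     = 3;                            % assumed multiplier
dist  = 50;                           % assumed minimum spacing between peaks

[pks, locs] = findpeaks(mag, 'MinPeakHeight', m + n*sigma, ...
                             'MinPeakDistance', dist);
```

The height threshold removes the jitter, and the distance constraint collapses the cluster of small local maxima around each bump into a single peak.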
You can still use findpeaks.
For example, [pks,locs] = findpeaks(data) returns the values and indices of the local peaks.
Then you can sort data(locs) and get the top 4 amplitudes:
[a, ind] = sort(data(locs), 'descend');
or set a threshold, data(locs) > threshold, etc.
One way to do this is to compute the difference function for the magnitude array (which is equivalent to derivative for continuous functions). Look for points where the value for the difference function goes from positive to negative. Those are your peak points.
To find the most prominent peaks, compute the second order difference function at the points obtained from the first order difference and select the ones which are of highest magnitude.
If the number of prominent peaks is unknown before-hand you can employ a threshold at this time as a measure of prominence.

Find only relevant points in MATLAB

I have a MATLAB function that finds characteristic points in a sample. Unfortunately it only works about 90% of the time. But when I know where in the sample I am supposed to look, I can increase this to almost 100%. So I would like to know if there is a function in MATLAB that would let me find the range where most of my results are, so I can then recalculate my characteristic points. I have a vector which stores all the results, and the right results should lie inside a range of 3% between -24.000 and 24.000, whereas wrong results are always lower than the correct range. Unfortunately my background in statistics is very rusty, so I am not sure what this would be called.
Can somebody give me a hint as to what I should be looking for? Is there a function built into MATLAB that would give me the smallest possible range in which e.g. 90% of the results lie?
EDIT: I am sorry if I didn't make my question clear. Everything in my vector can only range between -24.000 and 24.000. About 90% of my results will be in a range which spans approximately 1.44 ([24-(-24)]*3% = 1.44). These are very likely to be the correct results. The remaining 10% are outside of that range and always lower (which is why I am not sure taking the mean value is a good idea). These 10% are false and result from blips in my input data. To find the remaining 10% I want to repeat my calculations, but now I only want to check the small range.
So, my goal is to identify where my correct range lies, delete the values I have found outside of that range, and then recalculate my values, not on the range between -24.000 and 24.000, but on the small range where I already found 90% of my values.
The relevant points you're looking for are the percentiles:
% generate sample data
data = [randn(900,1) ; randn(50,1)*3 + 5 ; randn(50,1)*3 - 5];
subplot(121), hist(data)
subplot(122), boxplot(data)
% find 5th, 95th percentiles (range that contains 90% of the data)
limits = prctile(data, [5 95])
% find data in that range
reducedData = data(limits(1) < data & data < limits(2));
Other approaches exist to detect outliers, such as the IQR outlier test and the three standard deviation rule, among many others:
%% three standard deviation rule
z = 3;
bounds = z * std(data)
reducedData = data( abs(data-mean(data)) < bounds );
%% IQR outlier test
Q = prctile(data, [25 75]);
IQ = Q(2)-Q(1);
%a = 1.5; % mild outlier
a = 3.0; % extreme outlier
bounds = [Q(1)-a*IQ , Q(2)+a*IQ]
reducedData = data(bounds(1) < data & data < bounds(2));
BTW if you want to get the z value (|X|<z) that corresponds to 90% area under the curve, use:
area = 0.9; % two-tailed probability
z = norminv(1-(1-area)/2)
Maybe you should try the mean value (in MATLAB: mean) and the standard deviation (in MATLAB: std)?
What is the statistical distribution of your data?
See also this wiki page, section "Interpretation and application".
In general, Chebyshev's inequality holds for almost every distribution and is very useful here.
In most cases this should work:
meanVal = mean(data)
stDev = std(data)
and, by Chebyshev's inequality, at least 75% of your values will lie in the range:
<meanVal - 2*stDev, meanVal + 2*stDev>
It seems like maybe you want to find the number x in [-24,24] that maximizes the number of sample points in [x,x+1.44]. Probably the fastest way to do this involves a sort of the sample points, which is ultimately n*log(n) time. A cheesy approximation would be as follows:
n_brkpoints = 1000; % assumed value; choose it big, but < # of sample points?
brkpoints = linspace(-24,24,n_brkpoints);
n_count = reshape(histc(data,[brkpoints,inf]),[],1); % count # data points between breakpoints
accbins = round(1.44 / (brkpoints(2) - brkpoints(1))); % whole # of bins to accumulate
cscount = cumsum(n_count); % half of the boxcar sum computation
boxsum = cscount - [zeros(accbins,1);cscount(1:end-accbins)]; % 2nd half
[dum,maxi] = max(boxsum); % which window has the maximal # counts?
lorange = brkpoints(max(maxi-accbins+1,1)); % the window ending at bin maxi starts accbins-1 bins earlier
hirange = lorange + 1.44
This solution does fudge some of the corner-case stuff about the bottom and top bins, etc.
Note that if you're going to go the Chebyshev inequality route, Petunin's inequality is probably applicable, and will give a slight boost.
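For comparison, the sort-based approach mentioned above can be sketched with two indices sliding over the sorted sample. This is one possible implementation, not the answerer's code, and the data vector is made up: 90 results packed into a narrow band plus 10 lower outliers.

```matlab
% Made-up sample: 90 "good" results in [10, 11.4] and 10 low outliers.
data = [linspace(10, 11.4, 90)'; linspace(-20, 5, 10)'];
w    = 1.44;                       % window width from the question

s = sort(data(:));
n = numel(s);
bestCount = 0;
lorange   = s(1);
j = 1;
for i = 1:n
    % Advance j to the first point that lies beyond s(i) + w.
    while j <= n && s(j) <= s(i) + w
        j = j + 1;
    end
    % Points s(i) .. s(j-1) all fall inside [s(i), s(i) + w].
    if j - i > bestCount
        bestCount = j - i;
        lorange   = s(i);
    end
end
hirange = lorange + w;

% Logical mask of the results that fall inside the densest window.
inRange = data >= lorange & data <= hirange;
```

Since j only ever moves forward, the scan after the sort is linear, and the window always starts exactly on a sample point, so there are no bin-edge effects.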