Median of arbitrary data points around an index - MATLAB

I've been using the findpeaks function with great success to detect peaks in my signal. My next step is to clean these identified peaks, for which I have the indices.
My goal is to calculate the median of Y data points before and Y data points after a given index and replace whatever values (noise) there are with these new values (the calculated median).
Something like this:
% points before, peak, points after
% ↓ ↓ ↓
x = [1, 2, 3, 1, 34, 3, 2, 1, 3]
Calculate the median of the 4 data points preceding and the 4 data points following my peak of 34...
Median of [1,2,3,1,3,2,1,3] is 2.
Replace my peak with this new value:
% Replaced peak with surrounding median
% ↓
x1 = [1, 2, 3, 1, 2, 3, 2, 1, 3]
Any suggestion on how to implement this?

Find the peaks and replace them with the output of medfilt1(). Note that a window of 9 covers the peak itself plus the 4 points on either side:
[~,idx]=findpeaks(x);
if ~isempty(idx)
m = medfilt1(x,9);
x(idx) = m(idx);
end

I think it is most efficient to process each peak individually. I'll demonstrate in a step-by-step manner in the following.
Take the neighborhood of each peak
x(idx_max-N:idx_max+N)
with N the number of elements to the left and right of the peak, respectively. The median of the neighborhood around each peak can be computed by using MATLAB's median() function:
median(x(idx_max-N:idx_max+N))
Now, you can replace either only the element at the peak position with the median of the neighborhood:
x(idx_max) = median(x(idx_max-N:idx_max+N))
or easily replace all elements of the neighborhood with the median value:
x(idx_max-N:idx_max+N) = median(x(idx_max-N:idx_max+N))
(Note that scalar expansion is used in the last example to assign a scalar value to multiple elements of an array.)
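The steps above can be put together on the example from the question. This is only a sketch under a few assumptions: the peak index is hard-coded here (in practice it would come from findpeaks), the peak itself is left out of the neighbourhood so the result matches the median of the 8 surrounding points, and the window is clamped at the array edges, which the indexing expressions above do not handle on their own.

```matlab
x = [1, 2, 3, 1, 34, 3, 2, 1, 3];
N = 4;                 % number of points taken on each side of the peak
idx_max = 5;           % assumed peak index; normally returned by findpeaks

x1 = x;
for k = idx_max(:).'                     % loop works for several peaks too
    lo = max(1, k - N);                  % clamp the window at the left edge
    hi = min(numel(x), k + N);           % clamp the window at the right edge
    neighbours = x([lo:k-1, k+1:hi]);    % surrounding points, peak excluded
    x1(k) = median(neighbours);          % replace only the peak sample
end
disp(x1)
```

For the sample vector this replaces the 34 with 2, as asked for in the question.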

Related

Reduce three dimensional array to a vector of significant numbers

Hi, I have a three-dimensional array in MATLAB, something like <10 x 10 x 100>, and I would like to reduce it to a vector of significant numbers. For example, I would like to take each matrix (picture), split it in half by columns, compute sum(left)-sum(right), and return the resulting <1 x 100> vector. Unfortunately I cannot figure out how to do that. Is it possible? And how could I achieve it?
Thanks a lot for any help.
Here's a one-liner, given a matrix A:
result = -squeeze(diff(sum(reshape(A, [50 2 100]), 1), 1, 2)).';
How it works:
First, reshape the data into a 50-by-2-by-100 matrix where values from the left half of each matrix are in column 1 and values from the right half of each matrix are in column 2. Then apply sum down each column to get a 1-by-2-by-100 matrix. You can then take the difference between the columns with diff, although this subtracts the left column from the right, so you have to add a minus to negate the result. The resulting 1-by-1-by-100 matrix can be collapsed to a 100-by-1 column vector with squeeze, and this can be transposed into a row vector. Alternatively, you can use another reshape instead of the squeeze and transpose:
result = -reshape(diff(sum(reshape(A, [50 2 100]), 1), 1, 2), [1 100]);
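For comparison, the same quantity can be computed by indexing the two halves explicitly, which avoids hard-coding the sizes. This is a sketch that assumes an even number of columns; the random A and the variable names are made up for illustration.

```matlab
A = rand(10, 10, 100);             % made-up data with the sizes from the question
half = size(A, 2) / 2;             % assumes size(A,2) is even

% Sum over rows and columns of each half, one value per slice
leftSum  = sum(sum(A(:, 1:half, :),     1), 2);
rightSum = sum(sum(A(:, half+1:end, :), 1), 2);

result = reshape(leftSum - rightSum, 1, []);   % 1-by-100 row vector

% The reshape/diff one-liner gives the same values up to rounding
oneLiner = -squeeze(diff(sum(reshape(A, [50 2 100]), 1), 1, 2)).';
disp(max(abs(result - oneLiner)))
```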

Get Matrix of minimum coordinate distance to point set

I have a set of points or coordinates like {(3,3), (3,4), (4,5), ...} and want to build a matrix with the minimum distance to this point set. Let me illustrate using a runnable example:
width = 10;
height = 10;
% Get min distance to those points
pts = [3 3; 3 4; 3 5; 2 4];
sumSPts = length(pts);
% Helper to determine element coordinates
[cols, rows] = meshgrid(1:width, 1:height);
PtCoords = cat(3, rows, cols);
AllDistances = zeros(height, width,sumSPts);
% To get Roh_I of every pt
for k = 1:sumSPts
% Get coordinates of current Scribble Point
currPt = pts(k,:);
% Get Row and Col diffs
RowDiff = PtCoords(:,:,1) - currPt(1);
ColDiff = PtCoords(:,:,2) - currPt(2);
AllDistances(:,:,k) = sqrt(RowDiff.^2 + ColDiff.^2);
end
MinDistances = min(AllDistances, [], 3);
This code runs perfectly fine, but I have to deal with array sizes of about 700 million entries (height = 700, width = 500, sumSPts = 2k), and this slows down the calculation. Is there a better algorithm to speed things up?
As stated in the comments, you don't necessarily have to put everything into one gigantic matrix. You can:
1. Slice the pts matrix into reasonably small slices (say of length 100)
2. Loop on the slices and calculate the Mindistances slice over these points
3. Take the global min
tic
Mindistances=[];
width = 500;
height = 700;
Np=2000;
pts = [randi(width,Np,1) randi(height,Np,1)];
SliceSize=100;
[Xcoords,Ycoords]=meshgrid(1:width,1:height);
% Compute the minima for the slices from 1 to floor(Np/SliceSize)
for i=1:floor(Np/SliceSize)
% Calculate indexes of the next slice
SliceIndexes=((i-1)*SliceSize+1):i*SliceSize;
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(SliceIndexes,1),1,1,[]);
Ypts=reshape(pts(SliceIndexes,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(@minus,Xcoords,Xpts);
Ydiffs=bsxfun(@minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(@hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Check if last slice needed
if mod(Np,SliceSize)~=0
% Get the corresponding points and reshape them to a vector along the 3rd dim.
Xpts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,1),1,1,[]);
Ypts=reshape(pts(floor(Np/SliceSize)*SliceSize+1:end,2),1,1,[]);
% Do all the diffs between your coordinates and your points using bsxfun singleton expansion
Xdiffs=bsxfun(@minus,Xcoords,Xpts);
Ydiffs=bsxfun(@minus,Ycoords,Ypts);
% Calculate all the distances of the slice in one call
Alldistances=bsxfun(@hypot,Xdiffs,Ydiffs);
% Concatenate the mindistances
Mindistances=cat(3,Mindistances,min(Alldistances,[],3));
end
% Get global minimum
Mindistances=min(Mindistances,[],3);
toc
Elapsed time is 9.830051 seconds.
Note :
You won't end up doing fewer calculations, but it will be far less demanding on memory (700M doubles take about 5.6 GB), which speeds up the process (vectorizing helps as well).
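A possible variation on the slicing idea keeps a running minimum instead of concatenating one result per slice, and lets the loop bounds absorb the last partial slice. It is a sketch with deliberately small, made-up sizes so it runs quickly; the names MinDist, first and last are not from the code above.

```matlab
width = 50; height = 70;           % small sizes for illustration only
Np = 230; SliceSize = 100;         % Np deliberately not a multiple of SliceSize
pts = [randi(width, Np, 1) randi(height, Np, 1)];
[Xcoords, Ycoords] = meshgrid(1:width, 1:height);

MinDist = inf(height, width);      % running minimum over all slices
for first = 1:SliceSize:Np
    last = min(first + SliceSize - 1, Np);       % last slice may be shorter
    Xpts = reshape(pts(first:last, 1), 1, 1, []);
    Ypts = reshape(pts(first:last, 2), 1, 1, []);
    % Distances from every grid cell to every point of the slice
    D = hypot(bsxfun(@minus, Xcoords, Xpts), bsxfun(@minus, Ycoords, Ypts));
    % Fold the slice minimum into the running minimum
    MinDist = min(MinDist, min(D, [], 3));
end
```

With this layout only one height-by-width result and one slice of distances are held in memory at any time.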
About bsxfun singleton expansion
One of the great strengths of bsxfun is that you don't have to feed it matrices whose values are along the same dimensions.
For example :
Say I have two vectors X and Y defined as :
X=[1 2]; % row vector X
Y=[1;2]; % Column vector Y
And that I want a 2x2 matrix Z built as Z(i,j)=X(j)+Y(i) for 1<=i<=2 and 1<=j<=2.
Suppose you don't know about the existence of meshgrid (The example is a bit too simple), then you'll have to do :
Xs=repmat(X,2,1);
Ys=repmat(Y,1,2);
Z=Xs+Ys;
While with bsxfun you can just do :
Z=bsxfun(@plus,X,Y);
To calculate the value of Z(2,2) for example, bsxfun will automatically fetch the second value of X and Y and compute. This has the advantage of saving a lot of memory space (No need to define Xs and Ys in this example) and being faster with big matrices.
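The equivalence of the two approaches can be checked directly on the small example. This is a minimal sketch; the variable names Z_repmat and Z_bsxfun are introduced here only for the comparison.

```matlab
X = [1 2];      % row vector
Y = [1; 2];     % column vector

% Explicit replication: both operands are expanded to 2-by-2 first
Z_repmat = repmat(X, 2, 1) + repmat(Y, 1, 2);

% Singleton expansion: no intermediate 2-by-2 copies are created by the caller
Z_bsxfun = bsxfun(@plus, X, Y);

disp(isequal(Z_repmat, Z_bsxfun))   % the two results are identical
disp(Z_bsxfun)

% Assumption about your release: from R2016b on, plain X + Y expands
% singleton dimensions implicitly and gives the same matrix.
```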
Bsxfun Vs Repmat
If you're interested in comparing the computational time of bsxfun and repmat, here are two excellent (the word is not even strong enough) SO posts by Divakar:
Comparing BSXFUN and REPMAT
BSXFUN on memory efficiency with relational operations

Generating a random list of (x, y) points that satisfy a condition?

So I need to generate a matrix of x and y points given that they meet the condition that at these (x,y) points concentration is greater than 10. Note that I first run a code that gives me concentration at each location, and now I need Matlab to "randomly" pick (x,y) points with the above condition.
Would appreciate any suggestions on how to go about this.
Assuming your data looks something like this:
data= [... x y concentration
1, 1, 1; ...
2, 1, 11; ...
1, 2, 12; ...
2, 2, 1 ...
]
You could find all concentrations bigger than 10 with:
data_cbigger10=data(data(:,3)>10,:) % using logical indexing
and choose a random point from there with:
randomPoint=data_cbigger10(ceil(size(data_cbigger10,1)*rand),:) % pick a random row index
If the dimensions are as follows:
The dimension of concentration is 52x61x61, as concentration is c(x,y,time); x is 1x61 and y is 1x52. @PetrH – s2015
this should do the trick:
This is your data; I'm just making something up (con is 3-D, indexed as con(x,y,time), so that it matches the loop below):
x=linspace(0,1,61);
y=linspace(0,1,52);
time=linspace(0,1,61);
con=20*rand(61,52,61);
Now I find all positions in con which are bigger than 10. This results in a logical array. By multiplying it with a random array of the same size I get an array with random values where 'con' is bigger than 10, and zeros everywhere else.
data_cbigger10=rand(size(con)).*(con>10);
By finding the max (or min) value, a random point is chosen:
for n=1:1:10
data_cbigger10=rand(size(con)).*(con>10);
[vals,xind]=max(data_cbigger10);
xind=squeeze(xind);
[vals,yind]=max(squeeze(vals));
[~,time_ind]=max(squeeze(vals));
yind=yind(time_ind);
xind=xind(yind,time_ind);
x_res(n)=x(xind);
y_res(n)=y(yind);
time_res(n)=time(time_ind);
con_res(n)=con(xind,yind,time_ind);
con(xind,yind,time_ind)=0; % setting the chosen point to zero, so it will not be chosen again.
end
Hope this works now for you.
Assuming you have the concentration for each point (x,y) stored in an array concentration you can use the find() and randsample() functions like so:
conGT10 = find(concentration>10); % find where concentration is greater than 10 (gives you indices)
randomPoints = randsample(conGT10,nn); % choose nn random numbers from those that satisfy the condition
x1 = x(randomPoints); % given the randomly drawn indices pull the corresponding numbers for x and y
y1 = y(randomPoints);
EDIT:
The above assumes that arrays x, y, and concentration are 1d and of the same length. Apparently this is not true for your problem.
You have a grid of points on a (x,y) plane and you measure concentration on this grid in different time periods. So the length of x is nx, the length of y is ny and the size of concentration is nx by ny by nt. For simplicity I will assume that you measure concentration only once, i.e. nt=1 and concentration is only 2d array.
The modified version of my previous answer would then be as follows:
[rows,cols] = find(concentration>10); % find where concentration is greater than 10 (gives you indices)
randomIndices = randsample(length(rows),nn); % choose nn random integers from 1 to n, where n is the number of observations that satisfy the condition 'concentration>10'
randomX = x(rows(randomIndices));
randomY = y(cols(randomIndices));
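If the Statistics Toolbox is not available, randsample can be swapped for randperm, which is part of base MATLAB. The following is a sketch under the same simplification as above (a single time step, so concentration is 2-D); the data and the value of nn are made up.

```matlab
x = linspace(0, 1, 61);
y = linspace(0, 1, 52);
concentration = 20 * rand(numel(x), numel(y));   % made-up data, nx-by-ny

idx = find(concentration > 10);          % linear indices satisfying the condition
nn  = min(5, numel(idx));                % cannot draw more points than exist

pick = idx(randperm(numel(idx), nn));    % nn distinct indices, no replacement
[ix, iy] = ind2sub(size(concentration), pick);

randomX = x(ix);                         % x coordinates of the drawn points
randomY = y(iy);                         % y coordinates of the drawn points
```

Because randperm draws without replacement, no grid point is picked twice.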

Find mean of non-zero elements

I am assuming that the mean function takes a matrix and calculates its mean by summing all elements of the array and dividing by the total number of elements.
I am using this function to calculate the mean of my matrix, but I have come across a case where I don't want mean to consider the 0 elements. Specifically, my matrix is a 1x100000 array, and maybe 1/3 to 1/2 of its elements are 0. In that case, can I replace the 0 elements with NULL so that MATLAB won't consider them when calculating the mean? What else can I do?
Short version:
Use nonzeros:
mean( nonzeros(M) );
A longer answer:
If you are working with an array of 100K entries, a significant number of which are 0, you might consider working with a sparse representation. It might also be worth storing it as a column vector rather than a row vector.
sM = sparse(M(:)); %// sparse column
mean( nonzeros(sM) ); %// mean of only non-zeros
mean( sM ); %// mean including zeros
As you were asking "What else can I do?", here is another approach, which does not depend on the Statistics Toolbox or any other toolbox.
You can compute the mean yourself by summing up the values and dividing by the number of nonzero elements (nnz()). Since adding zeros does not affect the sum, this gives the desired result. For the 1-dimensional case, which you seem to have, this can be done as follows:
% // 1 dimensional case
M = [1, 1, 0 4];
sum(M)/nnz(M) % // 6/3 = 2
For a 2-dimensional case (or n-dimensional case) you have to specify the dimension along which the summation should happen
% // 2-dimensional case (or n-dimensional)
M = [1, 1, 0, 4
2, 2, 4, 0
0, 0, 0, 1];
% // column means of nonzero elements
mean_col = sum(M, 1)./sum(M~=0, 1) % // [1.5, 1.5, 4, 2.5]
% // row means of nonzero elements
mean_row = sum(M, 2)./sum(M~=0, 2) % // [2; 2.667; 1.0]
To find the mean of only the non-zero elements, use logical indexing to extract the non-zero elements and then call mean on those:
mean(M(M~=0))
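The three approaches suggested in this thread can be compared on the small 1-dimensional example. This is only a quick sanity check; the names m1, m2 and m3 are introduced here for illustration.

```matlab
M = [1, 1, 0, 4];

m1 = mean(nonzeros(M));    % extract the nonzero values, then average
m2 = sum(M) / nnz(M);      % sum divided by the count of nonzero values
m3 = mean(M(M ~= 0));      % logical indexing, then average

disp([m1, m2, m3])         % all three are 6/3 = 2
```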

Matlab ordfilt2 or alternatives for weighted local max

I would like to compute the weighted maxima of a vector in MATLAB. By weighted maxima I mean the following:
Given a vector of 2*N+1 weights W={w[-N], w[-N+1] .. w[0] .. w[N]} and given an input sequence A, weighted maxima is a vector M where m[i]=max(w[-N]*a[i-N], w[-N+1]*a[i-N+1], ... w[N]*a[i+N])
So for example given a vector A= [1, 4, 12, 2, 4] and weights W=[0.5, 1, 0.5], the weighted maxima would be M=[2, 6, 12, 6, 4].
This can be done using ordfilt2, but ordfilt2 treats the weights as additive rather than multiplicative.
I am actually working on 4D matrixes, but any 1D solution would work as the 4D weight matrix is separable.
My current solution is to generate shifted copies of the input array A, weight them according to the shift, and take the maximum over all the arrays. The shift is performed using circshift and is the bottleneck in the process. Generating the shifted matrices "manually" through indexing turned out to be even slower.
Can you suggest any more efficient solution?
EDIT: For a positive A, M=exp(ordfilt2(log(A), length(W), ones(size(W)), log(W))) does the job, but still takes longer than the circshift solution above. I am still looking for more efficient solutions.
>> B = padarray(A, [0 floor(numel(W)/2)], 0); % Pad A with zeros
>> B = bsxfun(@times, B(bsxfun(@plus, 1:numel(B)-numel(W)+1, (0:numel(W)-1)')), W(:)); % Apply the weights
>> M = max(B) % Compute the local maxima
M =
2 6 12 6 4
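The same sliding-window idea can be written without padarray, which belongs to the Image Processing Toolbox, by padding through concatenation. This is a sketch for the 1-D case only; it assumes an odd-length W and, like the version above, relies on the zero padding never winning the maximum, which holds when all weighted values are non-negative.

```matlab
A = [1, 4, 12, 2, 4];
W = [0.5, 1, 0.5];
N = floor(numel(W) / 2);                 % half-width of the weight window

B = [zeros(1, N), A, zeros(1, N)];       % zero-pad by hand instead of padarray

% Column j of idx holds the positions in B covered by the window centred on A(j)
idx = bsxfun(@plus, 1:numel(A), (0:numel(W)-1)');

% Weight every window element, then take the maximum down each column
M = max(bsxfun(@times, B(idx), W(:)), [], 1);
disp(M)
```

For the example from the question this reproduces M = [2, 6, 12, 6, 4].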