Outlier detection in probability/frequency distribution - MATLAB

I have the following two-dimensional dataset. Both X and Y are continuous random variables.
Z = (X, Y) = {(1, 7), (2, 15), (3, 24), (4, 25), (5, 29), (6, 32), (7, 34), (8, 35), (9, 27), (10, 39)}
I want to detect outliers with respect to the values of the Y variable. The normal range for Y is 10-35, so the first and last pairs in the dataset above are outliers and the others are normal pairs. I want to transform Z = (X, Y) into a probability/frequency distribution in which the outlier values (the first and last pairs) lie more than one standard deviation from the mean. Can anyone help me solve this problem?
PS: I have tried different distances, such as Euclidean and Mahalanobis distances, but they didn't work.

I'm not exactly sure what your end goal is, but I'm going to assume you format your x,y variables in an n-by-2 matrix, so z = [x,y] where x and y are n-by-1 vectors.
So what you are asking for is a way to separate out the data points where y is outside the 10-35 range? For that you can use a logical condition to find the indices where that occurs:
index = z(:,2) <= 35 & z(:,2) >= 10; %This gives vector of 0's & 1's length nx1
z_inliers = z(index,:); %This has a [x,y] matrix of only inlier data points
z_outliers = z(~index,:); %This has a [x,y] matrix of outlier data points
If you want to do this according to standard deviation then instead of 10 and 35 do:
low_range = mean(z(:,2)) - std(z(:,2));
high_range = mean(z(:,2)) + std(z(:,2));
index = z(:,2) <= high_range & z(:,2) >= low_range;
Then you can plot your pdf's or whatever with those points.
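Putting both variants together on the data from the question gives the following minimal sketch. The variable names (in_range, zscore_y and so on) are only illustrative, and the one-standard-deviation rule uses the sample standard deviation returned by std.

```matlab
% Data from the question, stored as an n-by-2 matrix z = [x, y]
x = (1:10)';
y = [7 15 24 25 29 32 34 35 27 39]';
z = [x, y];

% Variant 1: fixed normal range 10-35 for y
in_range = z(:,2) >= 10 & z(:,2) <= 35;
range_outliers = z(~in_range, :);

% Variant 2: more than one standard deviation away from the mean of y
zscore_y = (z(:,2) - mean(z(:,2))) / std(z(:,2));
std_outliers = z(abs(zscore_y) > 1, :);
```

On this dataset the fixed range flags (1, 7) and (10, 39), while the one-standard-deviation rule also flags (2, 15), so the two criteria are not interchangeable here.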

Related

Median of arbitrary datapoint around index - MATLAB

I've been using the findpeaks function with great success to detect peaks in my signal. My next step is to clean these identified peaks, for which I have the indices.
My goal is to calculate the median of Y data points before and Y data points after a given index and replace whatever values (noise) there are with these new values (the calculated median).
Something like this:
% points before, peak, points after
% ↓ ↓ ↓
x = [1, 2, 3, 1, 34, 3, 2, 1, 3]
Calculate the median of the 4 data points preceding and following my peak of 34...
Median of [1,2,3,1,3,2,1,3] is 2.
Replace my peak with this new value:
% Replaced peak with surrounding median
% ↓
x1 = [1, 2, 3, 1, 2, 3, 2, 1, 3]
Any suggestion on how to implement this?
Find the peaks and replace them with the results of medfilt1()
[~,idx]=findpeaks(x);
if ~isempty(idx)
m = medfilt1(x,9);
x(idx) = m(idx);
end
I think it is most efficient to process each peak individually. I'll demonstrate in a step-by-step manner in the following.
Take the neighborhood of each peak
x(idx_max-N:idx_max+N)
with N the number of elements to the left and right of the peak, respectively. The median of the neighborhood around each peak can be computed by using MATLAB's median() function:
median(x(idx_max-N:idx_max+N))
Now, you can replace either only the element at the peak position with the median of the neighborhood:
x(idx_max) = median(x(idx_max-N:idx_max+N))
or easily replace all elements of the neighborhood with the median value:
x(idx_max-N:idx_max+N) = median(x(idx_max-N:idx_max+N))
(Note that scalar expansion is used in the last example to assign a scalar value to multiple elements of an array.)
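Note that the neighborhood x(idx_max-N:idx_max+N) includes the peak itself and can run past the ends of the array. A minimal sketch that excludes the peak, as the question asks, and clamps the window to the array bounds could look like this (lo, hi and neighbors are illustrative names, and a single known peak index is assumed):

```matlab
x = [1, 2, 3, 1, 34, 3, 2, 1, 3];
idx_max = 5;   % index of the peak, e.g. as returned by findpeaks
N = 4;         % number of neighbors taken on each side

% Clamp the window so it never leaves the array
lo = max(idx_max - N, 1);
hi = min(idx_max + N, numel(x));

% Neighbors on both sides, without the peak itself
neighbors = x([lo:idx_max-1, idx_max+1:hi]);

% Replace only the peak with the median of its neighbors
x1 = x;
x1(idx_max) = median(neighbors);
```

For several peaks the same lines can be wrapped in a loop over the index vector.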

Calculating the barycenter of multiple triangles

I want to calculate each individual barycenter (centroid) of a list of triangles. Thus far I've managed to write this much:
function Triangle_Source_Centroid(V_Epoch0, F_Epoch0)
for i = 1:length(F_Epoch0)
Centroid_X = F_Epoch0(V_Epoch0(:,1),1) + F_Epoch0(V_Epoch0(:,1),2) + F_Epoch0(V_Epoch0(:,1),3);
Centroid_Y = F_Epoch0(V_Epoch0(:,2),1) + F_Epoch0(V_Epoch0(:,2),2) + F_Epoch0(V_Epoch0(:,2),3);
Centroid_Z = F_Epoch0(V_Epoch0(:,3),1) + F_Epoch0(V_Epoch0(:,3),2) + F_Epoch0(V_Epoch0(:,3),3);
Triangle_Centroid = [Centroid_X; Centroid_Y; Centroid_Z];
end
end
It doesn't work, and only gives me an error message:
Subscript indices must either be real positive integers or logicals.
Given how the variables are named, I'm guessing that V_Epoch0 is an N-by-3 matrix of vertices (X, Y, and Z for the columns) and F_Epoch0 is an M-by-3 matrix of face indices (each row is a set of row indices into V_Epoch0 showing which points make each triangle). Assuming this is right...
You can actually avoid using a for loop in this case by making use of matrix indexing. For example, to get the X coordinates for every point in F_Epoch0, you can do this:
allX = reshape(V_Epoch0(F_Epoch0, 1), size(F_Epoch0));
Then you can take the mean across the columns to get the average X coordinate for each triangular face:
meanX = mean(allX, 2);
And meanX is now an M-by-1 column vector. You can then repeat this for the Y and Z coordinates:
allY = reshape(V_Epoch0(F_Epoch0, 2), size(F_Epoch0));
meanY = mean(allY, 2);
allZ = reshape(V_Epoch0(F_Epoch0, 3), size(F_Epoch0));
meanZ = mean(allZ, 2);
centroids = [meanX meanY meanZ];
And centroids is an M-by-3 matrix of triangle centroid coordinates.
Bonus:
All of the above can actually be done with just this one line:
centroids = squeeze(mean(reshape(V_Epoch0(F_Epoch0, :), [size(F_Epoch0, 1) 3 3]), 2));
Check out the documentation for multidimensional arrays to learn more about how this works.
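As a quick check, here is a small sketch with two made-up triangles. V and F are hypothetical stand-ins for V_Epoch0 and F_Epoch0, under the same assumption as above that V holds vertex coordinates and F holds row indices into V. It compares the one-liner with a plain sum of the three corner vertices divided by 3.

```matlab
% Hypothetical test data: 4 vertices, 2 triangular faces
V = [0 0 0; 3 0 0; 0 3 0; 3 3 3];   % N-by-3 vertex coordinates
F = [1 2 3; 2 3 4];                 % M-by-3 indices into the rows of V

% One-liner: gather the corners into an M-by-3-by-3 array, average over corners
centroids = squeeze(mean(reshape(V(F, :), [size(F, 1) 3 3]), 2));

% Alternative: add the three corner vertices of each face and divide by 3
centroids_alt = (V(F(:,1), :) + V(F(:,2), :) + V(F(:,3), :)) / 3;
```

Both give [1 1 0; 2 2 1] for this data. The alternative also keeps its M-by-3 shape when there is only one face, whereas squeeze would return a 3-by-1 column in that case.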

Select values with a matrix of indices in MATLAB?

In MATLAB, I am looking for an efficient (and/or vectorized) way of filling a matrix by selecting from multiple matrices given a "selector matrix." For instance, given three source matrices
M1 = [0.1, 0.2; 0.3, 0.4]
M2 = [1, 2; 3, 4]
M3 = [10, 20; 30, 40]
and a matrix of indices
I = [1, 3; 1, 2]
I want to generate a new matrix M = [0.1, 20; 0.3, 4] by selecting the first entry from M1, second from M3, etc.
I can definitely do it in a nested loop, going through each entry and filling in the value, but I am sure there is a more efficient way.
What if M1, M2, M3 and M are all 3D matrices (RGB images)? Each entry of I tells us from which matrix we should take a 3-vector. Say, if I(1, 3) = 3, then we know entries indexed by (1, 3, :) of M should be M3(1, 3, :).
One way of doing this without changing how you store your variables is to use masks. With only a few matrices, this does the job without a for loop. You won't be able to fully vectorize it without going through the cat function or using cells.
M = zeros(size(M1));
Itmp = repmat(I==1,[1 1 size(M1,3)]); M(Itmp) = M1(Itmp);
Itmp = repmat(I==2,[1 1 size(M1,3)]); M(Itmp) = M2(Itmp);
Itmp = repmat(I==3,[1 1 size(M1,3)]); M(Itmp) = M3(Itmp);
I think the best way to approach this is to stack dimensions (i.e. have one matrix that holds each of your individual matrices). Unfortunately MATLAB doesn't really support array-level indexing, so you end up using linear indexing to convert your subscripts through the sub2ind command. I believe you can use the code below.
M1 = [0.1, 0.2; 0.3, 0.4]
M2 = [1, 2; 3, 4]
M3 = [10, 20; 30, 40]
metamatrix=cat(3,M1,M2,M3)
%Create a 3-dimensional (or higher-dimensional) matrix by concatenating
%lower-order matrices
%Each row of I is a (row, column, page) subscript. The rows are listed in
%column-major order so that reshape puts each value in the right position
I=[1,1,1;2,1,1;1,2,3;2,2,2]
M=reshape(metamatrix(sub2ind(size(metamatrix),I(:,1),I(:,2),I(:,3))),size(metamatrix(:,:,1)))
For a more complex case (3-dimensional source matrices), you would have to extend the code to higher dimensions.
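If you would rather keep the selector in its original 2-by-2 form instead of listing the subscripts by hand, a possible sketch is to build the row and column subscripts with ndgrid and pass the selector as the page subscript. The names stack, r and c are illustrative only.

```matlab
M1 = [0.1, 0.2; 0.3, 0.4];
M2 = [1, 2; 3, 4];
M3 = [10, 20; 30, 40];
I  = [1, 3; 1, 2];          % selector matrix from the question

stack = cat(3, M1, M2, M3); % page k holds source matrix k

% Row and column subscripts for every element of I
[r, c] = ndgrid(1:size(I, 1), 1:size(I, 2));

% Element (i, j) is taken from page I(i, j) of the stack
M = stack(sub2ind(size(stack), r, c, I));
```

Because sub2ind returns an index array with the same shape as its inputs, M comes out as [0.1, 20; 0.3, 4] without any reshape.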
One way of doing this could be to generate a 4D matrix with your images. It has the cost of increasing the amount of memory, or at least changing your memory scheme.
Mcat = cat(4, M1, M2, M3);
Then you can use the function sub2ind to get a vectorized Matrix creation.
% get the index for the basic Image matrix
I = repmat(I,[1 1 3]); % repeat the index for RGB images
Itmp = sub2ind(size(I),reshape(1:numel(I),size(I)));
% update so that indices reach the I(x) value element on the 4th dim of Mcat.
Itmp = Itmp + (I-1)*numel(I);
% get the matrix
M = Mcat(Itmp);
I haven't tested it properly, but it should work.
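A variant of the same 4D idea that spells out every subscript explicitly is sketched below. The three constant "images" are made up purely so that the result is easy to verify; ch, Isel and the other names are illustrative.

```matlab
% Hypothetical 2-by-2 RGB images filled with a constant per source
M1 = 1 * ones(2, 2, 3);
M2 = 2 * ones(2, 2, 3);
M3 = 3 * ones(2, 2, 3);
I  = [1, 3; 1, 2];              % selector: which image each pixel comes from

Mcat = cat(4, M1, M2, M3);      % 4th dimension enumerates the source images
ch   = size(M1, 3);             % number of color channels

% Subscripts for every (row, column, channel) position
[r, c, k] = ndgrid(1:size(I, 1), 1:size(I, 2), 1:ch);
Isel = repmat(I, [1 1 ch]);     % same selector for every channel

% Pixel (i, j, :) is taken from image I(i, j)
M = Mcat(sub2ind(size(Mcat), r, c, k, Isel));
```

With these constant images every channel of M simply equals I, which makes the selection easy to check by eye.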

How do I plot points / vectors as points?

p = [3,3]
plot(p, 'x')
This weirdly generates this:
I'd like it to be a point at x=3/y=3 on the plot. How?
@mathematician1975 is right, but I feel like this requires a bit more explanation:
Like the official documentation states:
plot(Y) plots the columns of Y versus the index of each value when Y is a real number.
So in fact it is not weird at all that plot(p, 'x') plots each value in p against its index, i.e. the points (1, 3) and (2, 3).
This is actually handy in some cases (when you want the x-coordinates to be a running index), but not in yours. To plot point p correctly, use the syntax plot(X, Y), that is:
plot(p(2), p(1), 'x')
(here I assumed that the y-coordinate is the first in p, but if it's the x-coordinates you can just swap the places of the input arguments).
In the general case, if p is a matrix with two columns (say, the first contains all y-coordinates and the second all x-coordinates), you can plot all points like so:
plot(p(:, 2), p(:, 1), 'x')
You need vectors of each coordinate. For example:
x = [3,4]
y = [5,6]
plot(x,y,'x')
will plot the points (3,5) and (4,6)

random floating points generator in matlab?

I have generated a figure of a 4 x 4 area in MATLAB. Now I need to place more than 200 points (actually moving devices) on this area randomly, but distributed evenly all over the 4 x 4 area. I am using the following lines to randomly generate the x and y coordinates that place each of the points.
a =200;
x_base = randi([1 5], 1, a);
b = rand([10 8], 1);
y_base = randi([3 7],1, a);
With the above code I only get integer coordinates for x and y, so I am not able to distribute the points evenly over the area. This is because I am using the randi function, which generates integers only. Is there any way of generating floating-point numbers randomly so that I can distribute the points more evenly?
I am looking for random floating-point numbers between 1 and 20.
rand
Generates a number between 0 and 1.
rand(m,n) generates an m-by-n array of such numbers.
You want to select n random points in the 4x4 area from (0, 0) to (4, 4)?
unifinv(rand(n, 2), 0, 4)
minVal = 1;
maxVal = 20;
r = rand(1) * (maxVal - minVal) + minVal
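Applied to the code in the question, the same scaling gives the sketch below. It assumes the area spans x in [1, 5] and y in [3, 7], as the randi calls in the question suggest; adjust the bounds if your axes differ.

```matlab
a = 200;                             % number of points

% Assumed bounds of the 4 x 4 area, taken from the randi calls
x_min = 1; x_max = 5;
y_min = 3; y_max = 7;

% rand returns uniform values between 0 and 1; scale and shift them
x_base = x_min + (x_max - x_min) * rand(1, a);
y_base = y_min + (y_max - y_min) * rand(1, a);
```

This uses only base MATLAB, whereas unifinv needs the Statistics and Machine Learning Toolbox.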