Vectorized spatial distance between values in multidimensional arrays - scipy

Given a (2,2,3,3,3) array of 3D Cartesian coordinates along the last dimension, what is the syntax for computing the Euclidean distance between pairwise values in XA and XB using scipy.spatial.distance.cdist to yield an output array with shape (2, 3, 3)?
import numpy as np
from scipy.spatial.distance import cdist
XA = np.random.normal(size=(2,2,3,3,3))
XB = np.random.normal(size=(2,2,3,3,3))
dist = cdist(XA[:, 0, ...], XB[:, 1, ...], 'seuclidean')
This raises ValueError: XA must be a 2-dimensional array. Short of looping, what is the Pythonic syntax for computing cdist(XA[:, 0], XB[:, 1])?

Does this do the job? If only pairwise distances are needed and the coordinates are on the last dimension:
import numpy as np
XA = np.random.normal(size=(2,2,3,3,3))
XB = np.random.normal(size=(2,2,3,3,3))
distances = np.sqrt(np.sum((XA - XB)**2, axis=-1))
but here distances.shape is (2, 2, 3, 3)...
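A minimal sketch of one way to get the (2, 3, 3) shape, assuming what is wanted is the distance between points at matching positions of XA[:, 0] and XB[:, 1] (not the all-pairs matrix that cdist returns for 2-D inputs). The idea is to slice the second axis before subtracting, then reduce over the coordinate axis. The seeded generator is only there to make the example reproducible:

```python
import numpy as np

rng = np.random.default_rng(0)  # seeded only for reproducibility
XA = rng.normal(size=(2, 2, 3, 3, 3))
XB = rng.normal(size=(2, 2, 3, 3, 3))

# Pick index 0 / index 1 along the second axis, leaving shape (2, 3, 3, 3),
# then take the Euclidean norm over the last (x, y, z) axis.
dist = np.linalg.norm(XA[:, 0] - XB[:, 1], axis=-1)

print(dist.shape)  # (2, 3, 3)
```

If the full distance matrix between every point in one slice and every point in the other is needed instead, the inputs would have to be reshaped to 2-D (n_points, 3) arrays before calling cdist.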

Related

How to identify multiple intersecting polygons in MATLAB?

I'm trying to identify overlapping/intersecting polygons. The techniques I have found only compare two polygons at a time. I have tens-of-thousands of cells in a dataset, and in each one there are 2-20 polygons, each described by x-y coordinates. I want to find the overlapping polygons in each cell. Looping between every pair to check for an intersection is very slow, so I want to ask...
Is there a way to compare all polygons at the same time and extract the IDs of those that are overlapping?
Here is a simple example of a single entry from the dataset:
shapes = cell(4,2);
shapes{1,1} = 'poly1';
shapes{2,1} = 'poly2';
shapes{3,1} = 'poly3';
shapes{4,1} = 'poly4';
shapes{1,2} = [1, 3, 3; 1, 1, 3]';
shapes{2,2} = [2, 4, 2; 2, 2, 5]';
shapes{3,2} = [4, 5, 5, 4; 3, 3, 5, 5]';
shapes{4,2} = [1, 3, 3, 1; 4, 4, 6, 6]';
This example contains these 4 polygons:
This plot was made with separate 'polyshape' objects, but that doesn't mean I need to use this kind of object in the solution.
The output I would like is a record of each overlapping pair:
result =
2×2 cell array
{'poly1'} {'poly2'}
{'poly2'} {'poly4'}
P.S. My current method is to loop through each pair and use the poly2mask function on each polygon of the pair, then combine the two binary masks with the & operator. This produces a logical array of 1's wherever there is overlap.
P.P.S. The actual polygons I am looking at are all annular sectors, therefore they are not all convex
Here is a solution that makes use of 'polyshape' vectors and avoids making all those pairwise comparisons in extra loops (although I don't know how the 'overlaps' function works internally).
% Set up empty vector to hold the different shapes
polyvec = [];
% Loop all shapes and combine into polyshape vector
for ii = 1 : size(shapes, 1)
poly = polyshape(shapes{ii,2}(:,1), shapes{ii,2}(:,2));
% When you combine polyshape objects together you get
% a vector that is of the polyshape object type
polyvec = [polyvec, poly];
end
% Use the overlaps function to compute a symmetric binary matrix
% of which polygons in the polygon vector overlap.
interMatSym = overlaps(polyvec);
% I only need the upper triangle of the symmetric interaction
% matrix and all polygons overlap with themselves so use 'triu'
interMat = triu(interMatSym, 1);
% Find the coordinates of the overlap in the interaction matrix
[x, y] = find(interMat);
% Save the result
result = [shapes(x,1), shapes(y,1)];
result =
2×2 cell array
{'poly1'} {'poly2'}
{'poly2'} {'poly4'}
If there is a more efficient way to create a polyshape vector, I'd love to know!
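For comparison, the triu and find steps have direct NumPy counterparts. The sketch below assumes the symmetric overlap matrix has already been computed by some polygon library; here it is typed in by hand to match the result shown above, and the variable names are illustrative only:

```python
import numpy as np

names = ["poly1", "poly2", "poly3", "poly4"]

# Hand-entered symmetric overlap matrix consistent with the result above
# (assumption: a polygon library would normally produce this).
overlap = np.array(
    [[1, 1, 0, 0],
     [1, 1, 0, 1],
     [0, 0, 1, 0],
     [0, 1, 0, 1]],
    dtype=bool,
)

# Keep only the strict upper triangle so each pair appears once
# and self-overlaps on the diagonal are dropped.
upper = np.triu(overlap, k=1)

# Row/column indices of the remaining True entries.
rows, cols = np.nonzero(upper)
pairs = [(names[r], names[c]) for r, c in zip(rows, cols)]

print(pairs)  # [('poly1', 'poly2'), ('poly2', 'poly4')]
```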

Calculating the barycenter of multiple triangles

I want to calculate each individual barycenter (centroid) of a list of triangles. Thus far I've managed to write this much :
function Triangle_Source_Centroid(V_Epoch0, F_Epoch0)
for i = 1:length(F_Epoch0)
Centroid_X = F_Epoch0(V_Epoch0(:,1),1) + F_Epoch0(V_Epoch0(:,1),2) + F_Epoch0(V_Epoch0(:,1),3);
Centroid_Y = F_Epoch0(V_Epoch0(:,2),1) + F_Epoch0(V_Epoch0(:,2),2) + F_Epoch0(V_Epoch0(:,2),3);
Centroid_Z = F_Epoch0(V_Epoch0(:,3),1) + F_Epoch0(V_Epoch0(:,3),2) + F_Epoch0(V_Epoch0(:,3),3);
Triangle_Centroid = [Centroid_X; Centroid_Y; Centroid_Z];
end
end
It doesn't work and only gives me an error message:
Subscript indices must either be real positive integers or logicals.
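Given how the variables are named, I'm guessing that V_Epoch0 is an N-by-3 matrix of vertices (X, Y, and Z for the columns) and F_Epoch0 is an M-by-3 matrix of face indices (each row is a set of row indices into V_Epoch0 showing which points make each triangle). Assuming this is right, the error comes from indexing F_Epoch0 with the coordinate values in V_Epoch0, which are generally not positive integers. The indexing has to go the other way around, with the faces indexing into the vertices.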
You can actually avoid using a for loop in this case by making use of matrix indexing. For example, to get the X coordinates for every point in F_Epoch0, you can do this:
allX = reshape(V_Epoch0(F_Epoch0, 1), size(F_Epoch0));
Then you can take the mean across the columns to get the average X coordinate for each triangular face:
meanX = mean(allX, 2);
And meanX is now an M-by-1 column vector. You can then repeat this for Y and Z coordinates:
allY = reshape(V_Epoch0(F_Epoch0, 2), size(F_Epoch0));
meanY = mean(allY, 2);
allZ = reshape(V_Epoch0(F_Epoch0, 3), size(F_Epoch0));
meanZ = mean(allZ, 2);
centroids = [meanX meanY meanZ];
And centroids is an M-by-3 matrix of triangle centroid coordinates.
Bonus:
All of the above can actually be done with just this one line:
centroids = squeeze(mean(reshape(V_Epoch0(F_Epoch0, :), [size(F_Epoch0, 1) 3 3]), 2));
Check out the documentation for multidimensional arrays to learn more about how this works.
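The same indexing trick carries over to NumPy, where indexing the vertex array with the face array yields an M-by-3-by-3 array directly. A small sketch with made-up vertices and zero-based face indices (both are assumptions for illustration, not data from the question):

```python
import numpy as np

# Hypothetical vertices (N-by-3) and faces (M-by-3, zero-based indices).
V = np.array(
    [[0.0, 0.0, 0.0],
     [3.0, 0.0, 0.0],
     [0.0, 3.0, 0.0],
     [0.0, 0.0, 3.0]]
)
F = np.array(
    [[0, 1, 2],
     [0, 1, 3]]
)

# V[F] has shape (M, 3, 3): face, vertex within face, coordinate.
# Averaging over the vertex axis gives one centroid per face.
centroids = V[F].mean(axis=1)

print(centroids)
# [[1. 1. 0.]
#  [1. 0. 1.]]
```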

Select values with a matrix of indices in MATLAB?

In MATLAB, I am looking for an efficient (and/or vectorized) way of filling a matrix by selecting from multiple matrices given a "selector matrix." For instance, given three source matrices
M1 = [0.1, 0.2; 0.3, 0.4]
M2 = [1, 2; 3, 4]
M3 = [10, 20; 30, 40]
and a matrix of indices
I = [1, 3; 1, 2]
I want to generate a new matrix M = [0.1, 20; 0.3, 4] by selecting the first entry from M1, second from M3, etc.
I can definitely do it in a nested loop, going through each entry and filling in the value, but I am sure there is a more efficient way.
What if M1, M2, M3 and M are all 3D matrices (RGB images)? Each entry of I tells us from which matrix we should take a 3-vector. Say, if I(1, 3) = 3, then we know entries indexed by (1, 3, :) of M should be M3(1, 3, :).
One way of doing this without changing how you store your variables is to use masks. If you only have a few matrices, this does the job while avoiding a for loop over the entries. You won't be able to fully vectorize without going through the cat function or using cells.
M = zeros(size(M1));
Itmp = repmat(I==1,[1 1 size(M1,3)]); M(Itmp) = M1(Itmp);
Itmp = repmat(I==2,[1 1 size(M1,3)]); M(Itmp) = M2(Itmp);
Itmp = repmat(I==3,[1 1 size(M1,3)]); M(Itmp) = M3(Itmp);
I think the best way to approach this is to stack the matrices along a new dimension (i.e. build one array whose pages are your individual matrices). Unfortunately MATLAB doesn't really support array-level indexing, so you end up using linear indexing, converting your subscripts with the sub2ind command. I believe you can use the code below.
M1 = [0.1, 0.2; 0.3, 0.4]
M2 = [1, 2; 3, 4]
M3 = [10, 20; 30, 40]
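metamatrix=cat(3,M1,M2,M3)
%Create a 3-dimensional (or however many dimensions) matrix by concatenating
%lower-order matrices
%Each row of I is a [row, column, source matrix] subscript, listed in
%column-major order so that reshape puts each value in the right place
I=[1,1,1;2,1,1;1,2,3;2,2,2]
M=reshape(metamatrix(sub2ind(size(metamatrix),I(:,1),I(:,2),I(:,3))),size(metamatrix(:,:,1)))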
For a more complex case (e.g. 3-dimensional source matrices), you would have to extend the code to higher dimensions.
One way of doing this could be to generate a 4D matrix with your images. It has the cost of increasing the amount of memory used, or at least of changing your memory layout.
Mcat = cat(4, M1, M2, M3);
Then you can use the function sub2ind to get a vectorized Matrix creation.
% get the index for the basic Image matrix
I = repmat(I,[1 1 3]); % repeat the index for RGB images
Itmp = sub2ind(size(I),reshape(1:numel(I),size(I)));
% update so that indices reach the I(x) value element on the 4th dim of Mcat.
Itmp = Itmp + (I-1)*numel(I);
% get the matrix
M = Mcat(Itmp);
I haven't tested it properly, but it should work.
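For readers working in NumPy, the stack-then-index idea can be sketched as follows. This is an illustrative analogue rather than a translation of the MATLAB code above: it assumes a one-based selector as in the question, and the RGB arrays are fabricated by repeating each 2-D matrix across three channels purely to have something to index:

```python
import numpy as np

M1 = np.array([[0.1, 0.2], [0.3, 0.4]])
M2 = np.array([[1.0, 2.0], [3.0, 4.0]])
M3 = np.array([[10.0, 20.0], [30.0, 40.0]])
I = np.array([[1, 3], [1, 2]])  # one-based selector, as in the question

# 2-D case: stack the sources on a new leading axis, then pick along it.
stacked = np.stack([M1, M2, M3])           # shape (3, 2, 2)
idx = (I - 1)[np.newaxis]                  # zero-based, shape (1, 2, 2)
M = np.take_along_axis(stacked, idx, axis=0)[0]
print(M)
# [[ 0.1 20. ]
#  [ 0.3  4. ]]

# RGB case: made-up images of shape (2, 2, 3), built by repeating each matrix.
rgb = np.stack([np.repeat(m[..., np.newaxis], 3, axis=-1) for m in (M1, M2, M3)])
idx_rgb = (I - 1)[np.newaxis, :, :, np.newaxis]   # broadcasts over channels
M_rgb = np.take_along_axis(rgb, idx_rgb, axis=0)[0]
print(M_rgb.shape)  # (2, 2, 3)
```

As with the cat approach, this trades extra memory for the stacked array against a loop-free selection.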

Outlier detection in probability/ frequency distribution

I have the following two-dimensional dataset. Both X and Y are continuous random variables.
Z = (X, y) = {(1, 7), (2, 15), (3, 24), (4, 25), (5, 29), (6, 32), (7, 34), (8, 35), (9, 27), (10, 39)}
I want to detect outliers with respect to the values of the y variable. The normal range for y is 10-35, so the first and last pairs in the above dataset are outliers and the others are normal pairs. I want to transform the variable z = (x, y) into a probability/frequency distribution in which the outlier values (first and last pair) lie more than one standard deviation from the mean. Can anyone help me solve this problem?
PS: I have tried different distances such as the Euclidean and Mahalanobis distances, but they didn't work.
I'm not exactly sure what your end goal is, but I'm going to assume you format your x, y variables in an n-by-2 matrix, so z = [x, y] where x and y are n-by-1 vectors.
So what you are asking is for a way to separate out data points where y is outside of 10-35 range? For that you can use a conditional statement to find indexes where that occurs:
index = z(:,2) <= 35 & z(:,2) >= 10; %This gives vector of 0's & 1's length nx1
z_inliers = z(index,:); %This has a [x,y] matrix of only inlier data points
z_outliers = z(~index,:); %This has a [x,y] matrix of outlier data points
If you want to do this according to standard deviation then instead of 10 and 35 do:
low_range = mean(z(:,2)) - std(z(:,2));
high_range = mean(z(:,2)) + std(z(:,2));
index = z(:,2) <= high_range & z(:,2) >= low_range;
Then you can plot your pdf's or whatever with those points.
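The same masking logic can be sketched in NumPy using the dataset from the question. The variable names are illustrative, and ddof=1 is chosen on the assumption that the sample standard deviation (MATLAB's default for std) is what is wanted:

```python
import numpy as np

z = np.array(
    [[1, 7], [2, 15], [3, 24], [4, 25], [5, 29],
     [6, 32], [7, 34], [8, 35], [9, 27], [10, 39]],
    dtype=float,
)
y = z[:, 1]

# Fixed range: keep rows whose y value lies in [10, 35].
inlier = (y >= 10) & (y <= 35)
z_inliers = z[inlier]
z_outliers = z[~inlier]

# One-standard-deviation band around the mean (sample std, ddof=1).
low = y.mean() - y.std(ddof=1)
high = y.mean() + y.std(ddof=1)
sigma_inlier = (y >= low) & (y <= high)
z_sigma_outliers = z[~sigma_inlier]

print(z_outliers)        # rows with y = 7 and y = 39
print(z_sigma_outliers)  # rows with y = 7, 15 and 39
```

Note that on this dataset the one-standard-deviation band is narrower than the 10-35 range at the low end, so it also flags the pair (2, 15).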

FLANN in matlab returns different distance from my own calculation

I'm using FLANN in matlab and using SIFT feature descriptor as my data. There is a function:
[result, ndists] = flann_search(index, testset, ...);
Here the index is built with a kd-tree. The user manual says result returns the nearest neighbors of the samples in testset, and ndists contains the corresponding distances between the test samples and their nearest neighbors. I used the Euclidean distance and found that the distances in ndists differ from those I computed from the original data. Even stranger, all the numbers in ndists are integers, which is rarely the case for Euclidean distances. Can you help me explain this?
FLANN by default returns the squared Euclidean distance (x_1^2 + ... + x_n^2, where x_i is the difference in the i-th coordinate). You can change the metric with flann_set_distance_type(type, order) (see manual).
An example:
from pyflann import *
import numpy as np
dataset = np.array(
[[1., 1, 1, 2, 3],
[10, 10, 10, 3, 2],
[100, 100, 2, 30, 1]
])
testset = np.array(
[[1., 1, 1, 1, 1],
[90, 90, 10, 10, 1]
])
result, dists = FLANN().nn(
dataset, testset, 1, algorithm="kmeans", branching=32, iterations=7, checks=16)
Output:
>>> result
array([0, 2], dtype=int32)
>>> dists
array([ 5., 664.])
>>> ((testset[0] - dataset[0])**2).sum()
5.0
>>> ((testset[1] - dataset[2])**2).sum()
664.0
SIFT features are integers so the resulting distances are also integers in case of the squared euclidean distance.
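To check the returned values without FLANN, a brute-force NumPy sketch over the same dataset and testset reproduces the squared distances, and taking the square root recovers the ordinary Euclidean distances. This is a plain exhaustive search written for verification, not how FLANN computes its neighbors:

```python
import numpy as np

dataset = np.array(
    [[1.0, 1, 1, 2, 3],
     [10, 10, 10, 3, 2],
     [100, 100, 2, 30, 1]]
)
testset = np.array(
    [[1.0, 1, 1, 1, 1],
     [90, 90, 10, 10, 1]]
)

# Coordinate differences for every (test point, dataset point) pair.
diff = testset[:, np.newaxis, :] - dataset[np.newaxis, :, :]  # shape (2, 3, 5)

# Squared Euclidean distance: sum of squared differences.
sq = (diff ** 2).sum(axis=-1)                                 # shape (2, 3)

nearest = sq.argmin(axis=1)   # index of the closest dataset row
sq_dists = sq.min(axis=1)     # squared distances, as FLANN reports by default
dists = np.sqrt(sq_dists)     # ordinary Euclidean distances

print(nearest)   # [0 2]
print(sq_dists)  # [  5. 664.]
```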