I am looking for an efficient way to calculate the pairwise distances between all points in a coordinate matrix of size (nodeCount x 2) using MATLAB. I do not desire to calculate the pairwise distance twice (for example, between nodes 1-2 and between nodes 2-1). I have constructed an outer 'for' loop that increments through each node with an inner loop that evaluates only nodes of a higher index number. The result is an upper triangular matrix populated by the nodal separation distances. I would like to vectorize these computations, or at a minimum increase the efficiency of this operation. Any help would be appreciated.
gap = 10;
for s = 1:(nodeCount);
for ss = s+1:(nodeCount);
if abs(nodeCoord(s,1)-nodeCoord(ss,1)) < gap;
sep(s,ss) = sqrt((nodeCoord(s,1)-nodeCoord(ss,1))^2+(nodeCoord(s,2)-nodeCoord(ss,2))^2);
end
end
end
The loop iterations do not really depend on each other in that respect. I guess you want to find the distance to all other coordinates; try this:
xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5];
xSquare = bsxfun(@(x,y) power((x-y),2),xCoord,xCoord.');
ySquare = bsxfun(@(x,y) power((x-y),2),yCoord,yCoord.');
dist = sqrt(xSquare+ySquare);
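You can also use the function pdist2 (Statistics and Machine Learning Toolbox). It expects one point per row and already returns the Euclidean distance, so no extra sqrt is needed:
xCoord = [1;2;3;4;5];
yCoord = [1;2;3;4;5];
coords = [xCoord, yCoord];
dist = pdist2(coords,coords,'euclidean');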
Rather than trying to use the fact that the lower triangular elements are not needed, as they are zeros in the output, I think you are better off using a technique that's based on fast matrix multiplication as discussed in this very smart solution to get the full matrix of euclidean distances. To get the desired output of upper triangular matrix, you can wrap the output with triu.
The code that follows is a slightly modified version of it, adapted to the case where both sets of points are the same coordinates from nodeCoord.
Code
numA = size(nodeCoord,1);
helpA = ones(numA,6);
helpB = ones(numA,6);
for idx = 1:2
sqA_idx = nodeCoord(:,idx).^2;
helpA(:,3*idx-1:3*idx) = [-2*nodeCoord(:,idx), sqA_idx ];
helpB(:,3*idx-2:3*idx-1) = [sqA_idx , nodeCoord(:,idx)];
end
sep = triu(sqrt(helpA(:,1:3) * helpB(:,1:3)')<gap).* sqrt(helpA * helpB');
pdist(nodeCoord) does it in a fast way, but returns the data in a vector. Mapping it back to a matrix costs about the same time as computing the distances:
sep3=zeros(nodeCount,nodeCount);
sep3(tril(true(nodeCount),-1))=pdist(nodeCoord);
sep3=sep3+sep3.';
If you are happy with a lower triangular matrix, you can leave out the last line.
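For readers without the toolbox, here is a minimal sketch of the same idea that also applies the gap filter from the question. It assumes MATLAB R2016b or newer (for implicit expansion), and the sample coordinates are made up for illustration:

```matlab
% Hypothetical sample data: four nodes, one (x,y) pair per row
nodeCoord = [0 0; 3 4; 20 0; 6 8];
gap = 10;

% Pairwise coordinate differences (implicit expansion, R2016b+)
dx = nodeCoord(:,1) - nodeCoord(:,1).';
dy = nodeCoord(:,2) - nodeCoord(:,2).';

% Full distance matrix, zeroed where the x-separation is not below gap,
% then reduced to the strict upper triangle as in the original loop
sep = triu(hypot(dx, dy) .* (abs(dx) < gap), 1);
```

On older releases, the two subtractions could be written with bsxfun(@minus, ...) instead.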
I'm trying to apply bare-bones image processing to images like this: My for-loop does exactly what I want it to: it allows me to find the pixels of highest intensity, and also remember the coordinates of that pixel. However, the code breaks whenever it encounters a multiple of rows – which in this case is equal to 18.
For example, the length of this image (rows * columns of image) is 414. So there are 414/18 = 23 cases where the program fails (i.e., the number of columns).
Perhaps there is a better way to accomplish my goal, but this is the only way I could think of sorting an image by pixel intensity while also knowing the coordinates of each pixel. Happy to take suggestions of alternative code, but it'd be great if someone had an idea of how to handle the cases where mod(x,18) = 0 (i.e., when the index of the vector is divisible by the total # of rows).
image = imread('test.tif'); % feed program an image
image_vector = image(:); % vectorize image
[sortMax,sortIndex] = sort(image_vector, 'descend'); % sort vector so
%that highest intensity pixels are at top
max_sort = [];
[rows,cols] = size(image);
for i=1:length(image_vector)
x = mod(sortIndex(i,1),rows); % retrieve original coordinates
% of pixels from matrix "image"
y = floor(sortIndex(i,1)/rows) +1;
if image(x,y) > 0.5 * max % filter out background noise
max_sort(i,:) = [x,y];
else
continue
end
end
You know that MATLAB indexing starts at 1, because you do +1 when you compute y. But you forgot to subtract 1 from the index first. Here is the correct computation:
index = sortIndex(i,1) - 1;
x = mod(index,rows) + 1;
y = floor(index/rows) + 1;
This computation is performed by the function ind2sub, which I recommend you use.
Edit: Actually, ind2sub does the equivalent of:
x = rem(sortIndex(i,1) - 1, rows) + 1;
y = (sortIndex(i,1) - x) / rows + 1;
(You can see this by typing edit ind2sub.) rem and mod are the same for positive inputs, so x is computed identically. But for computing y they avoid the floor; I guess that is slightly more efficient.
Note also that
image(x,y)
is the same as
image(sortIndex(i,1))
That is, you can use the linear index directly to index into the two-dimensional array.
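Putting these points together, the whole loop could probably be replaced by a few vectorized lines. This is only a sketch: the 18-by-23 array stands in for the image read from test.tif, and using sortMax(1) as the maximum intensity is an assumption about what the max in the question was meant to be.

```matlab
% Hypothetical 18-by-23 image (414 pixels), standing in for imread('test.tif')
img = reshape(1:414, 18, 23);

% Sort all pixels by intensity, brightest first
[sortMax, sortIndex] = sort(img(:), 'descend');

% Convert every linear index to (row, column) subscripts in one call
[x, y] = ind2sub(size(img), sortIndex);

% Keep only pixels brighter than half the maximum; sortMax already holds
% the sorted intensities, so no 2-D indexing is needed
keep = sortMax > 0.5 * sortMax(1);
max_sort = [x(keep), y(keep)];
```

Because ind2sub handles indices that are multiples of the row count (index 18 maps to row 18, column 1), the mod(x,18) = 0 cases need no special treatment.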
I am still wrapping my head around vectorization and I'm having a difficult time trying to resolve the following function I made...
for i = 1:size(X, 1)
min_n = inf;
for j=1:K
val = X(i,:)' - centroids(j,:)';
diff = val'*val;
if (diff < min_n)
idx(i) = j;
min_n = diff;
end
end
end
X is an array of (x,y) coordinates...
2 5
5 6
...
...
centroids in this example is limited to 3 rows. It is also in (x,y) format as shown above.
For every pair in X I am computing the closest pair of centroids. I then store the index of the centroid in idx.
So idx(i) = j means that I am storing the index j of the centroid at index i, where i corresponds to the index of X. This means the closest centroid to pair X(i, :) is at idx(i).
Can I possibly simplify this via vectorization? I struggle with just vectorizing the inner loop.
Here are three options. But please note that the disadvantage of vectorization, as compared to your double loops, is that it stores all the difference operation results at once, which means that if your matrices have many rows, you might run out of memory. On the other hand, the vectorized approach is probably much faster.
Option 1
If you have access to Statistics and Machine Learning Toolbox, you can use the function pdist2 to get all the pairwise distances between rows of two matrices. Then, the min function gives you the minimum of each column of the result. Its first output holds the minimal values and its second holds their indices, which is what you need for idx:
diff = pdist2(centroids,X);
[~,idx] = min(diff);
Option 2
If you don't have access to the toolbox, you can use bsxfun. This lets you compute the difference between the two matrices even though their dimensions don't agree. All you need to do is use shiftdim to reshape X' to size [1,size(X,2),size(X,1)]; then reshapedX and centroids have compatible dimensions (see the documentation of bsxfun), and you can take the difference between their values. The result is a three-dimensional array. Square it and sum along the second dimension to get the squared norm of the differences between rows. At this point you can proceed as in Option 1.
reshapedX = shiftdim(X',-1);
diff = bsxfun(@minus,centroids,reshapedX);
diff = squeeze(sum(diff.^2,2));
[~,idx] = min(diff);
Note: Starting with MATLAB R2016b, implicit expansion does what bsxfun does here, so you no longer need to call it. The line with bsxfun can be replaced with the simpler line diff = centroids-reshapedX.
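As a small worked sketch of Option 2 with implicit expansion (so R2016b or newer is assumed), using made-up points and centroids whose nearest neighbours are easy to check by hand:

```matlab
% Hypothetical data: four points and three centroids, one (x,y) pair per row
X = [2 5; 5 6; 9 1; 0 0];
centroids = [1 1; 5 5; 8 2];

% Move the points to the third dimension: 1-by-2-by-N
reshapedX = shiftdim(X', -1);

% K-by-2-by-N differences, squared and summed over the coordinate dimension
sqDist = squeeze(sum((centroids - reshapedX).^2, 2));   % K-by-N

% Index of the nearest centroid for each point (minimum over rows)
[~, idx] = min(sqDist, [], 1);
```

The square root is skipped because it does not change which centroid is closest. Note that squeeze returns a column when there is only one centroid, so that case would need separate handling.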
Option 3
Use the function dsearchn, which performs exactly what you need:
idx = dsearchn(centroids,X);
It could be done using pdist2, which computes pairwise distances between rows of two matrices:
% random data
X = rand(500,2);
centroids = rand(3,2);
% pairwise distances
D = pdist2(X,centroids);
% closest centroid index for each X coordinates
[~,idx] = min(D,[],2)
% plot
scatter(centroids(:,1),centroids(:,2),300,(1:size(centroids,1))','filled');
hold on;
scatter(X(:,1),X(:,2),30,idx);
legend('Centroids','data');
What I am currently doing is computing the euclidean distance between all elements in a vector (the elements are pixel locations in a 2D image) to see if the elements are close to each other. I create a reference vector that takes on the value of each index within the vector incrementally. The euclidean distance between the reference vector and all the elements in the pixel location vector is computed using the MATLAB function "pdist2" and the result is applied to some conditions; however, upon running the code, this function seems to be taking the longest to compute (i.e. for one run, the function was called upon 27,245 times and contributed to about 54% of the overall program's run time). Is there a more efficient method to do this and speed up the program?
[~, n] = size(xArray); %xArray and yArray are same size
%Pair the x and y coordinates of the interest pixels
pairLocations = [xArray; yArray].';
%Preallocate cells with the max amount (# of interest pixels)
p = cell(1,n);
for i = 1:n
ref = [xArray(i), yArray(i)];
d = pdist2(ref,pairLocations,'euclidean');
d = d < dTh;
d = find(d==1);
[~,k] = size(d);
if (k >= num)
p{1,i} = d;
end
end
For squared Euclidean distance, there is a trick using matrix dot product:
||a-b||² = <a-b, a-b> = ||a||² - 2<a,b> + ||b||²
Let C = [xArray; yArray]; a 2×n matrix of all locations, then
n2 = sum(C.^2); % sq norm of coordinates
D = bsxfun(@plus, n2, n2.') - 2 * C.' * C;
Now D(ii,jj) holds the square distance between point ii and point jj.
Should run quite quickly.
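To connect this back to the loop in the question, here is one possible sketch. The sample coordinates and the values of dTh and num are hypothetical, and isNear is a name introduced here. The threshold is compared against squared distances, so no square root is needed:

```matlab
% Hypothetical inputs: four pixels on a line, threshold and minimum count
xArray = [0 1 2 10];
yArray = [0 0 0 0];
dTh = 1.5;
num = 2;

% Squared distances between all pairs via ||a||^2 - 2<a,b> + ||b||^2
C = [xArray; yArray];                         % 2-by-n matrix of locations
n2 = sum(C.^2, 1);                            % squared norm of each column
D = bsxfun(@plus, n2, n2.') - 2 * (C.' * C);  % n-by-n squared distances

% Compare with the squared threshold once, outside the loop
isNear = D < dTh^2;

n = numel(xArray);
p = cell(1, n);
for i = 1:n
    d = find(isNear(i, :));     % indices of pixels close to pixel i
    if numel(d) >= num
        p{1, i} = d;
    end
end
```

The loop that remains only gathers indices; the distance computation itself happens once. For inputs that are not whole numbers, rounding can leave tiny negative entries in D, which is harmless for this comparison.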
I'm writing a program for school and I have nested for-loops that create a 4-dimensional array (of the distances between two points with coordinates (x,y) and (x',y')) as below:
pos_x=1:20;
pos_y=1:20;
Lx = length(pos_x);
Ly = length(pos_y);
Lx2 = Lx/2;
Ly2 = Ly/2;
%Distance function, periodic boundary conditions
d_x=abs(repmat(1:Lx,Lx,1)-repmat((1:Lx)',1,Lx));
d_x(d_x>Lx2)=Lx-d_x(d_x>Lx2);
d_y=abs(repmat(1:Ly,Ly,1)-repmat((1:Ly)',1,Ly));
d_y(d_y>Ly2)=Ly-d_y(d_y>Ly2);
for l=1:Ly
for k=1:Lx
for j=1:Ly
for i=1:Lx
distance(l,k,j,i)=sqrt(d_x(k,i).^2+d_y(l,j).^2);
end
end
end
end
d_x and d_y are just 20x20 matrices and Lx=Ly for trial purposes. It's very slow and obviously not a very elegant way of doing it. I tried to vectorize the nested loops and succeeded in getting rid of the two inner loops as:
dx2=zeros(Ly,Lx,Ly,Lx);
dy2=zeros(Ly,Lx,Ly,Lx);
distance=zeros(Ly,Lx,Ly,Lx);
for l=1:Ly
for k=1:Lx
dy2(l,k,:,:)=repmat(d_y(l,:),Ly,1);
dx2(l,k,:,:)=repmat(d_x(k,:)',1,Lx);
end
end
distance=sqrt(dx2.^2+dy2.^2);
which basically replaces the 4 for-loops above. I've now been trying for 2 days but I couldn't find a way to vectorize all the loops. I wanted to ask:
whether it's possible to actually get rid of these 2 loops
if so, i'd appreciate any tips and tricks to do so.
I have so far tried using repmat again in 4 dimensions, but you can't transpose a 4 dimensional matrix so I tried using permute and repmat together in many different combinations to no avail.
Any advice will be greatly appreciated.
thanks for the replies. Sorry for the bad wording, what I basically want is to have a population of oscillators uniformly located on the x-y plane. I want to simulate their coupling and the coupling function is a function of the distance between every oscillator. And every oscillator has an x and a y coordinate, so i need to find the distance between osci(1,1) and osci(1,1),..osci(1,N),osci(2,1),..osci(N,N)... and then the same for osci(1,2) and osci(1,1)...osci(N,N) and so on.. (so basically the distance between all oscillators and all other oscillators plus the self-coupling) if there's an easier way to do it other than using a 4-D array, i'd also definitely like to know it..
If I understand you correctly, you have oscillators all over the place, like this:
Then you want to calculate the distance between oscillator 1 and oscillators 1 through 100, and then between oscillator 2 and oscillators 1 through 100, etc. I believe that this can be represented by a 2D distance matrix, where the first dimension goes from 1 to 100, and the second dimension goes from 1 to 100.
For example
%# create 100 evenly spaced oscillators
[xOscillator,yOscillator] = ndgrid(1:10,1:10);
oscillatorXY = [xOscillator(:),yOscillator(:)];
%# calculate the euclidean distance between the oscillators
xDistance = abs(bsxfun(@minus,oscillatorXY(:,1),oscillatorXY(:,1)')); %'# abs distance x
xDistance(xDistance>5) = 10-xDistance(xDistance>5); %# add periodic boundary conditions
yDistance = abs(bsxfun(@minus,oscillatorXY(:,2),oscillatorXY(:,2)')); %'# abs distance y
yDistance(yDistance>5) = 10-yDistance(yDistance>5); %# add periodic boundary conditions
%# and we get the Euclidean distance
euclideanDistance = sqrt(xDistance.^2 + yDistance.^2);
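If the 4-D layout from the question is still needed, the two remaining loops can probably be removed with reshape and bsxfun. This is a sketch under the assumption that distance(l,k,j,i) should equal sqrt(d_x(k,i)^2 + d_y(l,j)^2) as in the original nested loops; the small grid sizes and the names dx4 and dy4 are chosen here for illustration:

```matlab
% Hypothetical small grid sizes so the result is easy to check
Lx = 4;
Ly = 6;

% Periodic 1-D separations, as in the question: d = min(|a-b|, L-|a-b|)
d_x = abs(bsxfun(@minus, 1:Lx, (1:Lx)'));
d_x = min(d_x, Lx - d_x);
d_y = abs(bsxfun(@minus, 1:Ly, (1:Ly)'));
d_y = min(d_y, Ly - d_y);

% Put d_x(k,i) on dimensions 2 and 4, and d_y(l,j) on dimensions 1 and 3
dx4 = reshape(d_x, [1 Lx 1 Lx]);
dy4 = reshape(d_y, [Ly 1 Ly 1]);

% bsxfun expands the singleton dimensions, giving distance(l,k,j,i)
distance = sqrt(bsxfun(@plus, dx4.^2, dy4.^2));
```

No permute is needed because reshape only inserts singleton dimensions and leaves the element order unchanged.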
I find that imaginary numbers can sometimes help convey coupled information quite well while reducing clutter. My method will double the number of calculations necessary (i.e. I find the distance X and Y then Y and X), and I still need a single for loop
x = 1:20;
y = 1:20;
[X,Y] = meshgrid(x,y);
Z =X + Y*i;
z = Z(:);
leng = length(z);
store = zeros(leng);
for looper = 1:(leng-1)
dummyz = circshift(z,looper);
store(:,looper+1) = z - dummyz;
end
final = abs(store);
I am working towards comparing multiple images. I have these image data as column vectors of a matrix called "images." I want to assess the similarity of images by first computing their Euclidean distance. I then want to create a matrix over which I can execute multiple random walks. Right now, my code is as follows:
% clear
% clc
% close all
%
% load tea.mat;
images = Input.X;
M = zeros(size(images, 2), size (images, 2));
for i = 1:size(images, 2)
for j = 1:size(images, 2)
normImageTemp = sqrt((sum((images(:, i) - images(:, j))./256).^2));
%Need to accurately select the value of gamma_i
gamma_i = 1/10;
M(i, j) = exp(-gamma_i.*normImageTemp);
end
end
My matrix M however, ends up having a value of 1 along its main diagonal and zeros elsewhere. I'm expecting "large" values for the first few elements of each row and "small" values for elements with column index > 4. Could someone please explain what is wrong? Any advice is appreciated.
Since you're trying to compute a Euclidean distance, it looks like you have an error in where your parentheses are placed when you compute normImageTemp. You have this:
normImageTemp = sqrt((sum((...)./256).^2));
%#                   ^--- Note that this parenthesis...
But you actually want to do this:
normImageTemp = sqrt(sum(((...)./256).^2));
%#                       ^--- ...should be here
In other words, you need to perform the element-wise squaring, then the summation, then the square root. What you are doing now is summing elements first, then squaring and taking the square root of the summation, which essentially cancel each other out (or are actually the equivalent of just taking the absolute value).
Incidentally, you can actually use the function NORM to perform this operation for you, like so:
normImageTemp = norm((images(:, i) - images(:, j))./256);
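A tiny numeric sketch shows how much the parenthesis placement matters. The two vectors below are made up and stand in for a pair of image columns:

```matlab
% Hypothetical "images": two short column vectors
u = [3; 0];
v = [0; 4];
d = u - v;                      % element-wise difference: [3; -4]

% Summing first and squaring afterwards only gives |sum(d)|
wrong = sqrt((sum(d)).^2);      % |3 + (-4)| = 1

% Squaring element-wise, then summing, gives the Euclidean distance
right = sqrt(sum(d.^2));        % sqrt(9 + 16) = 5

% norm computes the same quantity directly
viaNorm = norm(d);
```

With the misplaced parenthesis, positive and negative differences cancel in the sum, so two very different images can appear to be close.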
The results you're getting seem reasonable. Recall the behavior of the exp(-x). When x is zero, exp(-x) is 1. When x is large exp(-x) is zero.
Perhaps if you make M(i,j) = normImageTemp; you'd see what you expect to see.
Consider this solution:
I = Input.X;
D = squareform( pdist(I') ); %'# euclidean distance between columns of I
M = exp(-(1/10) * D); %# similarity matrix between columns of I
PDIST and SQUAREFORM are functions from the Statistics Toolbox.
Otherwise consider this equivalent vectorized code (using only built-in functions):
%# we know that: ||u-v||^2 = ||u||^2 + ||v||^2 - 2*u.v
X = sum(I.^2,1);
D = real( sqrt(bsxfun(@plus,X,X')-2*(I'*I)) );
M = exp(-(1/10) * D);
As was explained in the other answers, D is the distance matrix, while exp(-D) is the similarity matrix (which is why you get ones on the diagonal).
There is an already implemented function, pdist. If you have a matrix A with one observation per row, you can directly do
Sim= squareform(pdist(A))