MATLAB: Computing Euclidean distance in an efficient way?

I am computing the Euclidean distance between all elements in a vector (the elements are pixel locations in a 2D image) to see whether the elements are close to each other. I create a reference vector that takes on the value of each index within the vector in turn. The Euclidean distance between the reference vector and all the elements in the pixel-location vector is computed with the MATLAB function pdist2, and the result is then tested against some conditions. However, when I run the code, this function takes the longest to compute: in one run it was called 27,245 times and accounted for about 54% of the overall program's run time. Is there a more efficient way to do this and speed up the program?
n = numel(xArray); %xArray and yArray are row vectors of the same size
%Pair the x and y coordinates of the interest pixels (n-by-2)
pairLocations = [xArray; yArray].';
%Preallocate one cell per interest pixel (the maximum possible)
p = cell(1,n);
for i = 1:n
    ref = [xArray(i), yArray(i)];
    d = pdist2(ref,pairLocations,'euclidean'); %1-by-n distances
    d = find(d < dTh); %indices of pixels closer than the threshold dTh
    if numel(d) >= num %keep only pixels with at least num close pixels
        p{1,i} = d;
    end
end

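For the squared Euclidean distance, there is a trick using the inner product:
||a-b||² = <a-b, a-b> = ||a||² - 2<a,b> + ||b||²
Let C = [xArray; yArray], a 2×n matrix of all locations; then
n2 = sum(C.^2); % squared norm of each column (point)
D = bsxfun(@plus, n2, n2.') - 2 * C.' * C;
Now D(ii,jj) holds the squared distance between point ii and point jj, so you can compare it against dTh^2 instead of taking a square root.
This replaces the n separate pdist2 calls with a single matrix product, so it should run much faster. Note that D is n-by-n, so memory grows quadratically with the number of points.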
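Combining the trick with the question's thresholds, a minimal sketch of the whole loop might look like this. The coordinates and the values of dTh and num are made up for illustration, and, as in the question's loop, each pixel's list includes the pixel itself.

```matlab
% Sketch: replace the per-pixel pdist2 loop with one squared-distance matrix.
% The coordinates, dTh and num below are made-up example values.
xArray = [1 2 10 11 30];
yArray = [1 1 10 10 30];
dTh = 2;                                   % distance threshold (assumed)
num = 2;                                   % minimum number of close pixels (assumed)

C  = [xArray; yArray];                     % 2-by-n matrix of locations
n2 = sum(C.^2, 1);                         % 1-by-n squared norms
D  = bsxfun(@plus, n2.', n2) - 2 * (C.' * C);   % n-by-n squared distances
D  = max(D, 0);                            % clip tiny negatives caused by round-off

isClose = D < dTh^2;                       % compare squared values, so no sqrt is needed
counts  = sum(isClose, 2).';               % close pixels per pixel (each pixel counts itself)

n = numel(xArray);
p = cell(1, n);
for i = find(counts >= num)
    p{i} = find(isClose(i, :));            % row-vector indices, like the original loop
end
```

Row i of isClose plays the role of one pdist2 call. With many thousands of pixels the n-by-n matrices may not fit in memory; processing the rows in chunks is one possible compromise.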

Related

Vectorize cosine similarity in MATLAB

I have a 1000x1000x3 matrix. I want to calculate the cosine of the angle between each 3-D vector of a certain set and each 3-D vector I can extract along the third dimension of the matrix. Then I want to create a 1000x1000 matrix holding, at each position, the index of the vector in the set that has the maximum cosine similarity (i.e. minimum angle) with the original data.
How can I vectorize this calculation, or at least some parts of it? Currently I use nested for loops, which take a huge amount of time and overhead.
I couldn't find a function that takes the norm over the third dimension (newer releases have vecnorm(a,2,3)), but computing it component by component works as well:
a = rand(1000,1000,3)-.5; %dataset
na = sqrt(a(:,:,1).^2+a(:,:,2).^2+a(:,:,3).^2); %the norm of each vector
b = [1.2,1,3]; %vector to compare angle against
nb = norm(b); %the norm of the compare vector
b = reshape(b,[1,1,3]); %put the components of b along the third dimension
b = repmat(b,[1000,1000,1]); %1000x1000 copies of b
pl = a.*nb + na.*b;
mn = a.*nb - na.*b;
npl = sqrt(pl(:,:,1).^2+pl(:,:,2).^2+pl(:,:,3).^2);
nmn = sqrt(mn(:,:,1).^2+mn(:,:,2).^2+mn(:,:,3).^2);
theta = 2 * atan(nmn./npl); %angle between [0 and pi] as expected
The math is an adaptation of this formula:
theta = 2 * atan(norm(x*norm(y) - norm(x)*y) / norm(x * norm(y) + norm(x) * y))
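The answer handles a single comparison vector. For the original question (a set of vectors, and the index of the best match at every position), one possible vectorization, assuming the set is stored as the rows of a K-by-3 matrix V (a hypothetical name), is to flatten the image to one 3-vector per row, normalize, and take a single matrix product. The sizes are kept small for illustration.

```matlab
% Sketch: index of the most similar vector (max cosine) for every pixel.
% The small sizes and the set V are assumptions for illustration.
a = rand(4, 5, 3) - 0.5;            % stands in for the 1000x1000x3 dataset
V = [1.2 1 3; -1 0 0; 0 2 -1];      % K-by-3 set of comparison vectors (hypothetical)

[r, c, ~] = size(a);
A = reshape(a, r*c, 3);                        % one 3-vector per row, column-major order
A = bsxfun(@rdivide, A, sqrt(sum(A.^2, 2)));   % unit-length data vectors
U = bsxfun(@rdivide, V, sqrt(sum(V.^2, 2)));   % unit-length comparison vectors
cosSim = A * U.';                              % (r*c)-by-K cosine similarities
[~, idx] = max(cosSim, [], 2);                 % best match per pixel
idx = reshape(idx, r, c);                      % back to the image layout
```

Since the maximum cosine is the minimum angle, no atan or acos is needed just to pick the index.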

Mahalanobis distance in MATLAB

I would like to calculate the Mahalanobis distance of an input feature vector Y (1x14) to all feature vectors in a matrix X (18x14). Each block of 6 rows in X represents one class, so I have 3 classes. Based on the Mahalanobis distances, I will then choose the vector nearest to the input and assign the input to the corresponding class.
My problem is that the following code gives me only one value. How can I get the Mahalanobis distance between the input Y and every vector in X, so that I end up with 18 values and can choose the smallest one? Any help will be appreciated. Thank you.
Note: I know that the Mahalanobis distance is a measure of the distance between a point P and a distribution D, but I don't know how this applies in my situation.
Y = test1; % Y: 1x14 vector
S = cov(X); % X: 18x14 matrix
mu = mean(X,1);
d = ((Y-mu)/S)*(Y-mu)'
I also tried splitting X into 3 matrices, each holding the feature vectors of one class. This is the code, but it doesn't work properly: I get 3 distances and some of them are negative!
Y = test1;
X1 = Action1;
S1 = cov(X1);
mu1 = mean(X1,1);
d1 = ((Y-mu1)/S1)*(Y-mu1)'
X2 = Action2;
S2 = cov(X2);
mu2 = mean(X2,1);
d2 = ((Y-mu2)/S2)*(Y-mu2)'
X3= Action3;
S3 = cov(X3);
mu3 = mean(X3,1);
d3 = ((Y-mu3)/S3)*(Y-mu3)'
d= [d1,d2,d3];
MahalanobisDist= min(d)
One last thing: when I used the mahal function provided by MATLAB, I got this warning:
Warning: Matrix is close to singular or badly scaled. Results may be inaccurate.
If you have to implement the distance yourself (a school assignment, for instance), this is of absolutely no use to you, but if you just need the distance as an intermediate step for other calculations, I highly recommend d = pdist2(a, b, distance_measure); the documentation is on the MathWorks site.
It computes the pairwise distances between the rows of a and the rows of b (b can be a single vector or a matrix) and stores them in d, where the rows correspond to rows of a and the columns to rows of b. So d(i,j) is the distance between row i of a and row j of b. It can even take extra parameters ('Smallest', K) to return only the K nearest neighbors; it's a great function.
In your case you could use the following code, which returns the distance to the nearest row of X together with its index:
%number of neighbors
K = 1;
% X=18x14, Y=1x14; with K=1, dist and iidx are both 1x1
[dist, iidx] = pdist2(X,Y,'mahalanobis','smallest',K);
%to find the class, you can do something like this
num_samples_per_class = 6;
matching_class = ceil(iidx/ num_samples_per_class);
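For reference, a base-MATLAB sketch (no Statistics Toolbox) that returns all 18 squared Mahalanobis distances at once, using a single covariance estimated from all of X as in the question's first attempt. The random data is a placeholder. It also helps to know that a sample covariance built from m observations has rank at most m-1, so with only 6 vectors per class and 14 features each per-class covariance is singular, which is a likely cause of the warning and of the negative values in the per-class attempt.

```matlab
% Sketch: squared Mahalanobis distance from Y to every row of X, without toolboxes.
% One covariance for all of X (as in the question's first attempt);
% the random data below is a placeholder for the real features.
X = randn(18, 14);                  % 18 feature vectors, 6 per class (assumed layout)
Y = randn(1, 14);                   % input feature vector
S = cov(X);                         % 14-by-14 covariance estimated from all rows

Dm = bsxfun(@minus, X, Y);          % 18-by-14 differences X(i,:) - Y
d2 = sum((Dm / S) .* Dm, 2);        % 18-by-1: d2(i) = (X(i,:)-Y) * inv(S) * (X(i,:)-Y)'
[~, nearest] = min(d2);             % index of the closest row
matchingClass = ceil(nearest / 6);  % rows 1-6 -> class 1, 7-12 -> class 2, 13-18 -> class 3
```

Dm / S solves a linear system instead of forming inv(S) explicitly, which is generally the more accurate choice.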

Inverted pendulum matrix derivative approximation

Here I've written a dynamic function as:
function dAx = dynamic(t,x)
global u;
g = 9.8;
l = 0.5;
m = 0.5;
h = 2;
dx(1,1) = x(2);
dx(2,1) = g/l*sin(x(1))-h/(m*l^2)*x(2)+1/(m*l)*cos(x(1))*u(1,1);
dx(3,1) = g*lcos(x(3))-u(2,1);
A = [x(1)*x(2)+10*x(1);10*x(2)-5*x(1);x(3)]
dx = 1e-3
dAx = [(((x(1)+dx)+(x(1)-dx))*((x(2)+dx)+(x(2)-dx)))/(2*dx)+(10*(x(1)+dx)+(x(1)-dx))/(2*dx);
((10*(x(2)+dx)+(x(2)-dx))-5*((x(1)+dx)+(x(1)-dx)))/(2*dx);
((x(3)+dx)+(x(3)-dx))/(2*dx)]; % dA/dx using central derivative method computation
Here A is a 3x1 vector, and the function output should be the derivative of A with respect to the system states.
I've tried to use the central difference method. Is my derivative matrix calculation correct?
@Lahidj dA/dx is an nA-by-nx matrix, and you compute it column by column. Take the first state, x(1), and increment and decrement it by a small value dx, such as 1e-3. Evaluate the A vector at the increased and at the decreased value, take the difference of these two vectors, and divide it by 2*dx: (A_plus - A_minus)/(2*dx). That is the first column; repeat this for the remaining states to get the full dA/dx.
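The column-by-column procedure described in the answer can be sketched as follows, using the A from the question. The state value is arbitrary, and the step is called h here so it does not clash with the dx vector in the question's function. Because A is at most bilinear in the states, the central difference should match the analytical Jacobian up to round-off.

```matlab
% Sketch of the column-by-column central difference described above.
% A(x) is taken from the question; the state value x is an arbitrary example.
Afun = @(x) [x(1)*x(2) + 10*x(1); 10*x(2) - 5*x(1); x(3)];
x  = [0.3; -1.2; 2.0];
h  = 1e-3;                            % perturbation size (the answer's dx)
nA = numel(Afun(x));
nx = numel(x);
J  = zeros(nA, nx);                   % dA/dx, nA-by-nx
for k = 1:nx
    e = zeros(nx, 1);
    e(k) = h;                         % perturb only state k
    J(:, k) = (Afun(x + e) - Afun(x - e)) / (2*h);   % column k of dA/dx
end
% Analytical Jacobian of the same A, for comparison
Jexact = [x(2) + 10, x(1), 0; -5, 10, 0; 0, 0, 1];
```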

Defining an efficient distance function in matlab

I'm using the kNN search function in MATLAB, but I'm calculating the distance between two objects of a class I defined myself, so I've written a new distance function. This is it:
function d = allRepDistance(obj1, obj2)
%calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
M = dist(obj1.Repr, [obj2(i,:).Repr]');
d(i) = min(min(M));
end
end
The difference is that obj.Repr may be a matrix, and I want the minimal distance between any row of one argument and any row of the other. But even when obj1.Repr is just a vector, which essentially gives the normal Euclidean distance between two vectors, the kNN search is slower by a factor of 200!
I've checked the performance of the distance function alone (without kNN). I measured the time it takes to calculate the distance between a vector and the rows of a matrix (when they are stored in the objects), and it is slower by a factor of 3 than the normal distance function.
Does that make any sense? Is there a solution?
You are using dist(), which is the Euclidean distance weight function. However, you are not weighting your data, i.e. you don't treat any dimension as more important than the others. Thus, you can use the Euclidean distance function pdist() directly:
function d = allRepDistance(obj1, obj2)
% calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
X = [obj1.Repr, obj2(i,:).Repr'];
M = pdist(X,'euclidean');
d(i) = min(min(M));
end
end
BTW, I don't know your matrix dimensions, so you will need to deal with the concatenation of elements to create X correctly.
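If the goal is the smallest distance between any row of obj1.Repr and any row of obj2(i).Repr, one possible loop body uses the same squared-norm expansion as the first answer on this page and needs no toolbox. R1 and R2 below are hypothetical stand-ins for the two Repr matrices (rows are points of the same dimension). Note that stacking both matrices and calling pdist would also return distances between rows of the same object, which the minimum should exclude.

```matlab
% Sketch: minimum distance between any row of R1 and any row of R2.
% R1 and R2 are hypothetical stand-ins for obj1.Repr and obj2(i).Repr.
R1 = [0 0; 3 4; 10 0];
R2 = [3 5; 20 20];

sq1 = sum(R1.^2, 2);                                 % p-by-1 squared norms
sq2 = sum(R2.^2, 2);                                 % q-by-1 squared norms
D2  = bsxfun(@plus, sq1, sq2.') - 2 * (R1 * R2.');   % p-by-q squared distances
D2  = max(D2, 0);                                    % guard against round-off below zero
minDist = sqrt(min(D2(:)));                          % smallest distance over all row pairs
```

Inside allRepDistance, minDist would take the place of min(min(M)) for each i.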

Multiply an arbitrary number of matrices an arbitrary number of times

I have found several questions/answers for vectorizing and speeding up routines for multiplying a matrix and a vector in a single loop, but I am trying to do something a little more general, namely multiplying an arbitrary number of matrices together, and then performing that operation an arbitrary number of times.
I am writing a general routine for calculating thin-film reflection from an arbitrary number of layers vs. optical frequency. For each optical frequency W, each layer has an index of refraction N, an associated 2x2 transfer matrix L, and a 2x2 interface matrix I, which depend on the index of refraction and the thickness of the layer. If n is the number of layers and m is the number of frequencies, I can vectorize the index into an n x m matrix, but to calculate the reflection at each frequency I then have to use nested loops. Since I am ultimately using this as part of a fitting routine, anything I can do to speed it up would be greatly appreciated.
This should provide a minimum working example:
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
for x = 1:m %loop over frequencies
C = eye(2); % first medium is air
for y = 2:n %loop over layers
na = N(y-1,x);
nb = N(y,x);
%I = InterfaceMatrix(na,nb); % calculate the 2x2 interface matrix
I = [1 na*nb;na*nb 1]; % dummy matrix
%L = TransferMatrix(nb) % calculate the 2x2 transfer matrix
L = [exp(-1i*nb*W(x)*D(y)) 0; 0 exp(+1i*nb*W(x)*D(y))]; % dummy matrix
C = C*I*L;
end
a = C(1,1);
c = C(2,1);
r(x) = c/a; % reflectivity, the answer I want.
end
Running this twice, for two different polarizations, for a three-layer (air/stuff/substrate) problem with 2562 frequencies takes 0.952 seconds, while solving the exact same problem with the explicit (vectorized) formula for a three-layer system takes 0.0265 seconds. The problem is that beyond 3 layers the explicit formula rapidly becomes intractable, and I would need a different subroutine for each number of layers, while the above is completely general.
Is there hope for vectorizing this code or otherwise speeding it up?
(edited to add that I've left several things out of the code to shorten it, so please don't try to use this to actually calculate reflectivity)
Edit: To clarify, I and L are different for each layer and for each frequency, so they change in every loop iteration. Simply raising one matrix to a power will not work. For a real-world example, take the simplest case of a soap bubble in air. There are three layers (air/soap/air) and two interfaces. For a given frequency, the full transfer matrix C is:
C = L_air * I_air2soap * L_soap * I_soap2air * L_air;
and I_air2soap ~= I_soap2air. Thus, I start with L_air = eye(2) and then go down successive layers, computing I_(y-1,y) and L_y, multiplying them with the result from the previous loop, and going on until I get to the bottom of the stack. Then I grab the first and third values, take the ratio, and that is the reflectivity at that frequency. Then I move on to the next frequency and do it all again.
I suspect that the answer is going to somehow involve a block-diagonal matrix for each layer as mentioned below.
I'm not next to MATLAB right now, so this is only a starting point.
Instead of the double loop, you can compute all the products na*nb at once as Nab=N(1:end-1,:).*N(2:end,:);
The term in the exponent, nb*W(x)*D(y), can be written for all layers and frequencies at once as e=N(2:end,:).*(D(2:end).'*W); (an (n-1)-by-m matrix, one row per layer y = 2..n and one column per frequency).
The result of I*L is a 2x2 block matrix of this form:
M = [1, Nab; Nab, 1]*[e-, 0; 0, e+] = [e-, Nab*e+; Nab*e-, e+]
with e- = exp(-1i*e) and e+ = exp(1i*e).
See kron for how to build the block matrix form; to vectorize the propagation C=C*I*L, just take M^n.
@Lama put me on the right path by suggesting block matrices, but the final answer ended up being more complicated, so I'm posting it here for posterity. Since the transfer and interface matrices are different for each layer, I keep the loop over the layers, but construct a large sparse block-diagonal matrix in which each 2x2 block represents one frequency.
W = 1260:0.1:1400; %frequency in cm^-1
N = rand(4,numel(W))+1i*rand(4,numel(W)); %dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4; %thicknesses in cm
[n,m] = size(N);
r = zeros(size(W));
C = speye(2*m); % first medium is air
even = 2:2:2*m;
odd = 1:2:2*m-1;
for y = 2:n %loop over layers
na = N(y-1,:);
nb = N(y,:);
% get the reflection and transmission coefficients from subroutines as a vector
% of length m, one value for each frequency
%t = Tab(na, nb);
%r = Rab(na, nb);
t = rand(size(W)); % dummy vector for MWE
r = rand(size(W)); % dummy vector for MWE
% create diagonal and off-diagonal elements. each block is [1 r;r 1]/t
Id(even) = 1./t;
Id(odd) = Id(even);
Io(even) = 0;
Io(odd) = r./t;
It = [Io;Id/2].';
I = spdiags(It,[-1 0],2*m,2*m);
I = I + I.';
b = 1i.*(2*pi*D(y).*nb).*W; % thickness of the current layer y
B(even) = -b;
B(odd) = b;
L = spdiags(exp(B).',0,2*m,2*m);
C = C*I*L;
end
a = spdiags(C,0);
a = a(odd).';
c = spdiags(C,-1);
c = c(odd).';
r = c./a; % reflectivity, the answer I want.
With the three-layer system mentioned above, it isn't quite as fast as the explicit formula, but it's close and can probably get a little faster after some profiling. The full version of the original code clocks in at 0.97 seconds, the formula at 0.012 seconds, and the sparse diagonal version here at 0.065 seconds.
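A different technique from both answers, offered only as a sketch: since every C is just 2x2, the product C*I*L can be expanded by hand. The loop over layers stays, but each of the four entries of C is stored as a 1-by-m vector covering all frequencies, so no sparse matrices are needed. This uses the dummy I and L from the question's minimal working example; the real InterfaceMatrix/TransferMatrix entries would have to be substituted, and it has not been profiled against the sparse version.

```matlab
% Sketch: loop over layers only, vectorized over all frequencies.
% Each 2x2 product C*I*L is expanded by hand, so the four entries of C
% are 1-by-m vectors. Uses the dummy I and L from the question's MWE.
W = 1260:0.1:1400;                               % frequency in cm^-1
N = rand(4,numel(W)) + 1i*rand(4,numel(W));      % dummy complex index of refraction
D = [0 0.1 0.2 0]/1e4;                           % thicknesses in cm
[n, m] = size(N);

c11 = ones(1,m);  c12 = zeros(1,m);              % C = eye(2) at every frequency
c21 = zeros(1,m); c22 = ones(1,m);
for y = 2:n
    p  = N(y-1,:) .* N(y,:);                     % na*nb for all frequencies
    em = exp(-1i * N(y,:) .* W * D(y));          % L(1,1) for all frequencies
    ep = exp(+1i * N(y,:) .* W * D(y));          % L(2,2) for all frequencies
    % C*I with I = [1 p; p 1], then scale the columns by L = diag(em, ep)
    t11 = (c11 + c12.*p) .* em;   t12 = (c11.*p + c12) .* ep;
    t21 = (c21 + c22.*p) .* em;   t22 = (c21.*p + c22) .* ep;
    c11 = t11; c12 = t12; c21 = t21; c22 = t22;
end
rVec = c21 ./ c11;                               % C(2,1)/C(1,1), as in the original loop
```

The cost is a handful of element-wise operations on length-m vectors per layer, which is the same work the nested loops do, just without the per-frequency interpreter overhead.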