Vectorization for PCA in Matlab - matlab

I have a matrix A of 3D point coordinates, so A is N by 3, where N is the number of points. I ran knnsearch on A for 20 neighbors and got an index matrix idN, which is N by 21. I then want to calculate PCA for each point using its 20 neighbors. For example, for the first point in A, that would be
pca(A(idN(1,2:end),:))
And I want to calculate this for all the points. I can use a loop to do this, but it is very slow when the number of points is high. Is there a way to calculate pca without using a loop, or at least a way to make this process faster?
The loop part of the code:
idN = knnsearch(A,A,'k',21);
result = zeros(N,12);
for i = 1:N
[coef,~,latent] = pca(A(idN(i,2:end),:));
result(i,1:9) = coef(:)';
result(i,10:12) = latent';
end
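One possible way to reduce the cost, sketched here under assumptions: much of the time in the loop is likely spent on overhead inside pca rather than on the arithmetic, so the 3-by-3 covariance matrices can be built for all points at once, leaving only a small eig call inside the loop. The random A and the names nb, P, cxx, ... are illustrative and not part of the original code. Eigenvector signs may differ from pca's sign convention, while the eigenvalues should match latent.

```matlab
% Sketch: per-point PCA from batched 3x3 covariance matrices.
% A is random stand-in data; replace it with the real N-by-3 point matrix.
rng(0);
N = 500;  k = 20;
A = randn(N, 3);
idN = knnsearch(A, A, 'k', k + 1);   % column 1 is the query point itself
nb  = idN(:, 2:end);                 % N-by-k neighbour indices

% Gather neighbour coordinates: P(i,j,:) is the j-th neighbour of point i.
P = reshape(A(nb, :), N, k, 3);
P = bsxfun(@minus, P, mean(P, 2));   % centre each neighbourhood

% Six unique covariance entries for every point at once (each N-by-1).
cxx = sum(P(:,:,1).^2, 2)          / (k - 1);
cyy = sum(P(:,:,2).^2, 2)          / (k - 1);
czz = sum(P(:,:,3).^2, 2)          / (k - 1);
cxy = sum(P(:,:,1).*P(:,:,2), 2)   / (k - 1);
cxz = sum(P(:,:,1).*P(:,:,3), 2)   / (k - 1);
cyz = sum(P(:,:,2).*P(:,:,3), 2)   / (k - 1);

result = zeros(N, 12);
for i = 1:N
    C = [cxx(i) cxy(i) cxz(i); cxy(i) cyy(i) cyz(i); cxz(i) cyz(i) czz(i)];
    [V, D] = eig(C);                             % eigen-decomposition of a 3x3 matrix
    [latent, order] = sort(diag(D), 'descend');  % pca orders by decreasing variance
    V = V(:, order);
    result(i, 1:9)   = V(:)';
    result(i, 10:12) = latent';
end
```

The loop is still there, but each iteration now only touches a 3-by-3 matrix; whether this is faster in practice should be checked with tic/toc on the real data.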

Related

Calculate Euclidean distance for every row with every other row in a NxM matrix?

I have a matrix that I generate from a CSV file as follows:
X = xlsread('filename.csv');
I am looping through the matrix based on the number of records and I need to find the Euclidean distance for each of the rows of this matrix :
for i = 1:length(X)
j = X(:, [2:5])
end
The resulting matrix is of 150 X 4. What would be the best way to calculate the Euclidean distance of each row (with 4 columns as the data points) with every row and getting an average of the same?
In order to find the Euclidean distance between any pair of rows, you could use the function pdist.
X = randn(6, 4);
D = pdist(X,'euclidean');
res=mean(D);
The average is stored in res.
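If "an average of the same" means one average per row rather than a single overall number, a sketch along the following lines may help. It assumes squareform (also in the Statistics Toolbox) to expand the pdist output into a full matrix, and the random X is a stand-in for columns 2:5 of the spreadsheet data.

```matlab
X = randn(150, 4);                       % stand-in for X(:, 2:5)
n = size(X, 1);
D = squareform(pdist(X, 'euclidean'));   % n-by-n matrix, zeros on the diagonal
rowMean = sum(D, 2) / (n - 1);           % average distance from each row to all others
overall = mean(pdist(X));                % single average over all distinct pairs
```

Dividing by n - 1 rather than n leaves the zero self-distance out of each row's average.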

Remove for loop: Extracting Neighbour Distances from Matrix containing distances between all instances

I am trying to create a matrix of distances between my N data points and their K neighbours. The data matrix is NxA, so the distance matrix Y_distances is NxN and each (i,j)th entry is the distance between data points i and j. Using knnsearch I have a matrix, called IDX, of the row numbers of each data point and its K neighbours. I then perform dimensionality reduction and want to use the distances between the neighbouring points in the lower-dimensional space. I am currently using a for loop as such:
no_neighbours=k;
IDX = knnsearch(X,X,'K',no_neighbours);
Y_Distances = sqrt(dist2(y, y));
for i = 1:N
for j= 1:A
Y_neighbour_distances = Y_Distances(i,IDX(i,j));
end
end
Any suggestions on how to avoid these loops? They are quite time-consuming on large datasets.
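You probably need linear indexing via sub2ind, followed by this call to sparse if a sparse N-by-N result is wanted:
I = ndgrid(1:N, 1:K);
V = Y_Distances(sub2ind(size(Y_Distances), I, IDX)); % N-by-K neighbour distances
sparse(I, IDX, V, N, N)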
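The lookup of Y_Distances(i, IDX(i,j)) for all i and j at once can be checked on a small case. The sketch below compares a sub2ind lookup against the double loop. It uses pdist and squareform in place of dist2 (which is not a built-in), and random data for X and y; all of these are assumptions made for illustration.

```matlab
rng(1);
N = 50;  K = 5;
X = randn(N, 10);                     % stand-in for the original data
y = randn(N, 2);                      % stand-in for the reduced data
IDX = knnsearch(X, X, 'K', K);
Y_Distances = squareform(pdist(y));   % N-by-N distances in the low-dimensional space

[I, ~] = ndgrid(1:N, 1:K);            % I(i,j) = i
Y_neighbour_distances = Y_Distances(sub2ind([N N], I, IDX));   % N-by-K

% Reference: the double loop, storing every value instead of overwriting.
ref = zeros(N, K);
for i = 1:N
    for j = 1:K
        ref(i, j) = Y_Distances(i, IDX(i, j));
    end
end
```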

Matlab calculating nearest neighbour distance for all (u, v) vectors in an array

I am trying to calculate the distance between nearest neighbours within a nx2 matrix like the one shown below
point_coordinates =
11.4179 103.1400
16.7710 10.6691
16.6068 119.7024
25.1379 74.3382
30.3651 23.2635
31.7231 105.9109
31.8653 36.9388
%for loop going from the top of the vector column to the bottom
for counter = 1:size(point_coordinates,1)
%current point defined selected
current_point = point_coordinates(counter,:);
%math to calculate distance between the current point and all the points
distance_search= point_coordinates-repmat(current_point,[size(point_coordinates,1) 1]);
dist_from_current_point = sqrt(distance_search(:,1).^2+distance_search(:,2).^2);
%line to omit self subtraction that gives zero
dist_from_current_point (dist_from_current_point <= 0)=[];
%gives the shortest distance calculated for a certain vector and current_point
nearest_dist=min(dist_from_current_point);
end
%final line to plot the u,v vectors and the corresponding nearest neighbour
%distances
matnndist = [point_coordinates nearest_dist]
I am not sure how to structure the 'for' loop/nearest_neighbour line to be able to get the nearest neighbour distance for each u,v vector.
I would like to have, for example ;
for the first vector, its coordinates and the corresponding shortest distance; for the second vector, its coordinates and its shortest distance; and so on up to n.
Hope someone can help.
Thanks
I understand you want to obtain the minimum distance between different points.
You can compute the distance for each pair of points with bsxfun; remove self-distances; minimize. It's more computationally efficient to work with squared distances, and take the square root only at the end.
n = size(point_coordinates,1);
dist = bsxfun(@minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
bsxfun(@minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf; %// remove self-distances
min_dist = sqrt(min(dist(:)));
Alternatively, you could use pdist. This avoids computing each distance twice, and also avoids self-distances:
dist = pdist(point_coordinates);
min_dist = min(dist(:));
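If what is wanted is the nearest-neighbour distance for each point rather than one overall minimum, the same squared-distance matrix can be reduced along one dimension instead of over all elements. A sketch, using the coordinates from the question (the names nearest_sq and nearest_idx are illustrative):

```matlab
point_coordinates = [11.4179 103.1400
                     16.7710  10.6691
                     16.6068 119.7024
                     25.1379  74.3382
                     30.3651  23.2635
                     31.7231 105.9109
                     31.8653  36.9388];
n = size(point_coordinates, 1);
dist = bsxfun(@minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
       bsxfun(@minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf;                          % ignore self-distances
[nearest_sq, nearest_idx] = min(dist, [], 2);   % one minimum per row
nearest_dist = sqrt(nearest_sq);                % square root only at the end
matnndist = [point_coordinates nearest_dist];   % coordinates plus nearest distance
```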
If I can suggest a built-in function, use knnsearch from the statistics toolbox. What you are essentially doing is a K-Nearest Neighbour (KNN) algorithm, but you are ignoring self-distances. The way you would call knnsearch is in the following way:
[idx,d] = knnsearch(X, Y, 'k', k);
In simple terms, the KNN algorithm returns the k points in your data set that are closest to a given query point. Usually, the Euclidean distance is the distance metric that is used. For MATLAB's knnsearch, X is a 2D array that consists of your dataset, where each row is an observation and each column is a variable. Y holds the query points: it is also a 2D array where each row is a query point, and it must have the same number of columns as X. The 'k' flag specifies how many closest points you want returned. By default, k = 1.
As such, idx is an N x K matrix, where N is the total number of query points (the number of rows of Y) and each row holds the indices of the K points in the dataset that are closest to that query point. d is also an N x K matrix, holding the corresponding distances.
As such, what you want to do is find, for each point in your dataset, the closest other point, ignoring self-distances. Therefore, you would set both X and Y to be the same, and set k = 2, discarding the first column of both outputs to get the result you're looking for.
Therefore:
[idx,d] = knnsearch(point_coordinates, point_coordinates, 'k', 2)
idx = idx(:,2);
d = d(:,2);
We thus get for idx and d:
>> idx
idx =
3
5
1
1
7
3
5
>> d
d =
17.3562
18.5316
17.3562
31.9027
13.7573
20.4624
13.7573
As such, this tells us that for the first point in your data set, it matched with point #3 the best. This matched with the closest distance of 17.3562. For the second point in your data set, it matched with point #5 the best with the closest distance being 18.5316. You can continue on with the rest of the results in a similar pattern.
If you don't have access to the statistics toolbox, consider reading my StackOverflow post on how I compute KNN from first principles.
Finding K-nearest neighbors and its implementation
In fact, it is very similar to Luis Mendo's post to you earlier.
Good luck!

Defining an efficient distance function in matlab

I'm using kNN search function in matlab, but I'm calculating the distance between two objects of my own defined class, so I've written a new distance function. This is it:
function d = allRepDistance(obj1, obj2)
%calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
M = dist(obj1.Repr, [obj2(i,:).Repr]');
d(i) = min(min(M));
end
end
The difference is that obj.Repr may be a matrix, and I want to calculate the minimal distance between all the rows of each argument. But even if obj1.Repr is just a vector, which gives essentially the normal Euclidean distance between two vectors, the kNN function is slower by a factor of 200!
I've checked the performance of just the distance function (no kNN). I measured the time it takes to calculate the distance between a vector and the rows of a matrix (when they are in the object), and it works slower by a factor of 3 than the normal distance function.
Does that make any sense? Is there a solution?
You are using dist(), which corresponds to the Euclidean distance weight function. However, you are not weighting your data, i.e. you don't consider one dimension to be more important than the others. Thus, you can directly use the Euclidean distance pdist():
function d = allRepDistance(obj1, obj2)
% calculates the min dist. between repr.
% obj2 is a vector, to fit kNN function requirements
n = size(obj2,1);
d = zeros(n,1);
for i=1:n
X = [obj1.Repr, obj2(i,:).Repr'];
M = pdist(X,'euclidean');
d(i) = min(min(M));
end
end
BTW, I don't know your matrix dimensions, so you will need to deal with the concatenation of elements to create X correctly.
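Since the goal is the minimum distance between the rows of one representation and the rows of another, pdist2 (Statistics Toolbox) may be a closer fit than pdist, because it only computes cross-distances between two sets of rows. A sketch with plain matrices standing in for obj1.Repr and obj2(i).Repr; the helper name minRowDistance is hypothetical:

```matlab
% R1 and R2 stand in for obj1.Repr and obj2(i).Repr (each row is a point).
R1 = [0 0; 1 1];
R2 = [4 5; 1 3; 7 7];

% Hypothetical helper: smallest distance between any row of P and any row of Q.
minRowDistance = @(P, Q) min(min(pdist2(P, Q)));
d = minRowDistance(R1, R2);
```

Here the closest pair is (1,1) and (1,3), so d is 2.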

vectorizing 4 nested for-loops in Matlab

I'm writing a program for school and I have nested for-loops that create a 4-dimensional array (of the distances between two points with coordinates (x,y) and (x',y')) as below:
pos_x=1:20;
pos_y=1:20;
Lx = length(pos_x);
Ly = length(pos_y);
Lx2 = Lx/2;
Ly2 = Ly/2;
%Distance function, periodic boundary conditions
d_x=abs(repmat(1:Lx,Lx,1)-repmat((1:Lx)',1,Lx));
d_x(d_x>Lx2)=Lx-d_x(d_x>Lx2);
d_y=abs(repmat(1:Ly,Ly,1)-repmat((1:Ly)',1,Ly));
d_y(d_y>Ly2)=Ly-d_y(d_y>Ly2);
for l=1:Ly
for k=1:Lx
for j=1:Ly
for i=1:Lx
distance(l,k,j,i)=sqrt(d_x(k,i).^2+d_y(l,j).^2);
end
end
end
end
d_x and d_y are just 20x20 matrices and Lx=Ly for trial purposes. It's very slow and obviously not a very elegant way of doing it. I tried to vectorize the nested loops and succeeded in getting rid of the two inner loops as:
dx2=zeros(Ly,Lx,Ly,Lx);
dy2=zeros(Ly,Lx,Ly,Lx);
distance=zeros(Ly,Lx,Ly,Lx);
for l=1:Ly
for k=1:Lx
dy2(l,k,:,:)=repmat(d_y(l,:),Ly,1);
dx2(l,k,:,:)=repmat(d_x(k,:)',1,Lx);
end
end
distance=sqrt(dx2.^2+dy2.^2);
which basically replaces the 4 for-loops above. I've now been trying for 2 days but I couldn't find a way to vectorize all the loops. I wanted to ask:
whether it's possible to actually get rid of these 2 loops
if so, i'd appreciate any tips and tricks to do so.
I have so far tried using repmat again in 4 dimensions, but you can't transpose a 4 dimensional matrix so I tried using permute and repmat together in many different combinations to no avail.
Any advice will be greatly appreciated.
thanks for the replies. Sorry for the bad wording, what I basically want is to have a population of oscillators uniformly located on the x-y plane. I want to simulate their coupling and the coupling function is a function of the distance between every oscillator. And every oscillator has an x and a y coordinate, so i need to find the distance between osci(1,1) and osci(1,1),..osci(1,N),osci(2,1),..osci(N,N)... and then the same for osci(1,2) and osci(1,1)...osci(N,N) and so on.. (so basically the distance between all oscillators and all other oscillators plus the self-coupling) if there's an easier way to do it other than using a 4-D array, i'd also definitely like to know it..
If I understand you correctly, you have oscillators all over the place, like this:
Then you want to calculate the distance between oscillator 1 and oscillators 1 through 100, and then between oscillator 2 and oscillators 1 through 100, etc. I believe that this can be represented by a 2D distance matrix, where the first dimension goes from 1 to 100, and the second dimension goes from 1 to 100.
For example
%# create 100 evenly spaced oscillators
[xOscillator,yOscillator] = ndgrid(1:10,1:10);
oscillatorXY = [xOscillator(:),yOscillator(:)];
%# calculate the euclidean distance between the oscillators
xDistance = abs(bsxfun(@minus,oscillatorXY(:,1),oscillatorXY(:,1)')); %# abs distance x
xDistance(xDistance>5) = 10-xDistance(xDistance>5); %# add periodic boundary conditions
yDistance = abs(bsxfun(@minus,oscillatorXY(:,2),oscillatorXY(:,2)')); %# abs distance y
yDistance(yDistance>5) = 10-yDistance(yDistance>5); %# add periodic boundary conditions
%# and we get the Euclidean distance
euclideanDistance = sqrt(xDistance.^2 + yDistance.^2);
I find that imaginary numbers can sometimes help convey coupled information quite well while reducing clutter. My method will double the number of calculations necessary (i.e. I find the distance X and Y then Y and X), and I still need a single for loop
x = 1:20;
y = 1:20;
[X,Y] = meshgrid(x,y);
Z = X + Y*1i; % 1i avoids clashes with a variable named i
z = Z(:);
leng = length(z);
store = zeros(leng);
for looper = 1:(leng-1)
dummyz = circshift(z,looper);
store(:,looper+1) = z - dummyz;
end
final = abs(store);
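Returning to the 4-D array in the question, the two remaining loops can probably be removed by reshaping d_x and d_y so that their indices land on the right dimensions, and letting bsxfun expand the singleton ones. A sketch, assuming the same index order distance(l,k,j,i) as the original loops (the names dx4 and dy4 are illustrative):

```matlab
Lx = 20;  Ly = 20;

% Periodic 1-D distances, equivalent to the repmat version in the question.
d_x = abs(bsxfun(@minus, (1:Lx)', 1:Lx));
d_x = min(d_x, Lx - d_x);
d_y = abs(bsxfun(@minus, (1:Ly)', 1:Ly));
d_y = min(d_y, Ly - d_y);

% Put d_x(k,i) on dimensions 2 and 4, and d_y(l,j) on dimensions 1 and 3.
dx4 = reshape(d_x, [1 Lx 1 Lx]);
dy4 = reshape(d_y, [Ly 1 Ly 1]);

% bsxfun expands the singleton dimensions, giving an Ly-by-Lx-by-Ly-by-Lx array.
distance = sqrt(bsxfun(@plus, dx4.^2, dy4.^2));
```

No permute is needed, because reshape alone places each index on its target dimension.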