knnsearch from Matlab to Julia - matlab

I am trying to run a nearest neighbour search in Julia using NearestNeighbors.jl package. The corresponding Matlab code is
X = rand(10);
Y = rand(100);
Z = zeros(size(Y));
Z = knnsearch(X, Y);
This generates Z, a vector of length 100, where the i-th element is the index of X whose element is nearest to the i-th element in Y, for all i=1:100.
Could really use some help converting the last line of the Matlab code above to Julia!

Use:
X = rand(1, 10)
Y = rand(1, 100)
nn(KDTree(X), Y)[1]
The storing the intermediate KDTree object would be useful if you wanted to reuse it in the future (as it will improve the efficiency of queries).
Now what is the crucial point of my example. The NearestNeighbors.jl accepst the following input data:
It can either be:
a matrix of size nd × np with the points to insert in the tree where nd is the dimensionality of the points and np is the number of points
a vector of vectors with fixed dimensionality, nd, which must be part of the type.
I have used the first approach. The point is that observations must be in columns (not in rows as in your original code). Remember that in Julia vectors are columnar, so rand(10) is considered to be 1 observation that has 10 dimensions by NearestNeighbors.jl, while rand(1, 10) is considered to be 10 observations with 1 dimension each.
However, for your original data since you want a nearest neighbor only and it is single-dimensional and is small it is enough to write (here I assume X and Y are original data you have stored in vectors):
[argmin(abs(v - y) for v in X) for y in Y]
without using any extra packages.
The NearestNeighbors.jl is very efficient for working with high-dimensional data that has very many elements.

Related

What is the reverse process of the repmat or repemel command?

I have a matrix of 50-by-1 that is demodulated data. As this matrix has only one element in each row, I want to repeat this only element 16 times in each row so the matrix become 50 by 16. I did it using the repmat(A,16) command in Matlab. Now at receiving end noise is also added in matrix of 50 by 16. I want to get it back of 50 by 1 matrix. How can I do this?
I tried averaging of all rows but it is not a valid method. How can I know when an error is occurring in this process?
You are describing a problem of the form y = A * x + n, where y is the observed data, A is a known linear transform, and n is noise. The least squares estimate is the simplest estimate of the unknown vector x. The keys here are to express the repmat() function as a matrix and the observed data as a vector (i.e., a 50*16x1 vector rather than a 50x16 matrix).
x = 10 * rand(50,1); % 50x1 data vector;
A = repmat(eye(length(x)),[16,1]); % This stacks 16 replicas of x.
n = rand(50*16,1); % Noise
y = A * x + n; % Observed data
xhat = A \ y; % Least squares estimate of x.
As for what the inverse (what I assume you mean by 'reverse') of A is, it doesn't have one. If you look at its rank, you'll see it is only 50. The best you can do is to use its pseudoinverse, which is what the \ operator does.
I hope the above helps.

Matlab calculating nearest neighbour distance for all (u, v) vectors in an array

I am trying to calculate the distance between nearest neighbours within a nx2 matrix like the one shown below
point_coordinates =
11.4179 103.1400
16.7710 10.6691
16.6068 119.7024
25.1379 74.3382
30.3651 23.2635
31.7231 105.9109
31.8653 36.9388
%for loop going from the top of the vector column to the bottom
for counter = 1:size(point_coordinates,1)
%current point defined selected
current_point = point_coordinates(counter,:);
%math to calculate distance between the current point and all the points
distance_search= point_coordinates-repmat(current_point,[size(point_coordinates,1) 1]);
dist_from_current_point = sqrt(distance_search(:,1).^2+distance_search(:,2).^2);
%line to omit self subtraction that gives zero
dist_from_current_point (dist_from_current_point <= 0)=[];
%gives the shortest distance calculated for a certain vector and current_point
nearest_dist=min(dist_from_current_point);
end
%final line to plot the u,v vectors and the corresponding nearest neighbour
%distances
matnndist = [point_coordinates nearest_dist]
I am not sure how to structure the 'for' loop/nearest_neighbour line to be able to get the nearest neighbour distance for each u,v vector.
I would like to have, for example ;
for the first vector you could have the coordinates and the corresponding shortest distance, for the second vector another its shortest distance, and this goes on till n
Hope someone can help.
Thanks
I understand you want to obtain the minimum distance between different points.
You can compute the distance for each pair of points with bsxfun; remove self-distances; minimize. It's more computationally efficient to work with squared distances, and take the square root only at the end.
n = size(point_coordinates,1);
dist = bsxfun(#minus, point_coordinates(:,1), point_coordinates(:,1).').^2 + ...
bsxfun(#minus, point_coordinates(:,2), point_coordinates(:,2).').^2;
dist(1:n+1:end) = inf; %// remove self-distances
min_dist = sqrt(min(dist(:)));
Alternatively, you could use pdist. This avoids computing each distance twice, and also avoids self-distances:
dist = pdist(point_coordinates);
min_dist = min(dist(:));
If I can suggest a built-in function, use knnsearch from the statistics toolbox. What you are essentially doing is a K-Nearest Neighbour (KNN) algorithm, but you are ignoring self-distances. The way you would call knnsearch is in the following way:
[idx,d] = knnsearch(X, Y, 'k', k);
In simple terms, the KNN algorithm returns the k closest points to your data set given a query point. Usually, the Euclidean distance is the distance metric that is used. For MATLAB's knnsearch, X is a 2D array that consists of your dataset where each row is an observation and each column is a variable. Y would be the query points. Y is also a 2D array where each row is a query point and you need to have the same number of columns as X. We would also specify the flag 'k' to denote how many closest points you want returned. By default, k = 1.
As such, idx would be a N x K matrix, where N is the total number of query points (number of rows of Y) and K would be those k closest points to the dataset for each query point we have. idx indicates the particular points in your dataset that were closest to each query. d is also a N x K matrix that returns the smallest distances for these corresponding closest points.
As such, what you want to do is find the closest point for your dataset to each of the other points, ignoring self-distances. Therefore, you would set both X and Y to be the same, and set k = 2, discarding the first column of both outputs to get the result you're looking for.
Therefore:
[idx,d] = knnsearch(point_coordinates, point_coordinates, 'k', 2)
idx = idx(:,2);
d = d(:,2);
We thus get for idx and d:
>> idx
idx =
3
5
1
1
7
3
5
>> d
d =
17.3562
18.5316
17.3562
31.9027
13.7573
20.4624
13.7573
As such, this tells us that for the first point in your data set, it matched with point #3 the best. This matched with the closest distance of 17.3562. For the second point in your data set, it matched with point #5 the best with the closest distance being 18.5316. You can continue on with the rest of the results in a similar pattern.
If you don't have access to the statistics toolbox, consider reading my StackOverflow post on how I compute KNN from first principles.
Finding K-nearest neighbors and its implementation
In fact, it is very similar to Luis Mendo's post to you earlier.
Good luck!

Matlab-Select particular values in a matrix

I am a beginner in matlab and I have a particular z matrix of size m×1 with values 0,1,3,5,2 etc..with above values repeating. Now I have 4 other column matrix x1,x2,x3 and y and I want to do regression.
I have used lm = LinearModel.fit(x,y,'linear') specifying columns.Now I want to do regression only for values in matrix x1,x2,x3 and y for those corresponding to z matrix with value of 1 and neglect the other rows.How do I do it?
That's very simple. I'm going to assume that your matrix of predictor variables and outputs are also of size m (number of samples). All you have to do is find the locations within z that are 1, subset your 3 column matrix of x1,x2,x3 and y, then use LinearModel.fit to fit your data. Assuming your matrix of predictors is stored in X, and your outputs are stored in y, you would do this:
ind = z == 1;
xOut = X(ind,:);
yOut = y(ind);
lm1 = LinearModel.fit(xOut, yOut, 'linear');
BTW, these are very simple subsetting operations in MATLAB. Suggest you read a tutorial before asking any further questions here.

Distance matrix calculation

I hope you will have the right answer to this question, which is rather challanging.
I have two uni-dimensional vectors Y and Z, which hold the coordinates of N grid points located on a square grid. So
Ny points along Y
Nz points along Z
N = Ny*Nz
Y = Y[N]; (Y holds N entries)
Z = Z[N]; (Z holds N entries)
Now, the goal would be generating the distance matrix D, which holds N*N entries: so each row of matrix D is defined as the distance between the i-th point on the grid and the remnant (N - i) points.
Generally, to compute the whole matrix I would call
D = squareform(pdist([Y Z]));
or,equivalently,
D = pdist2([Y Z],[Y Z]).
But, since D is a symmetric matrix, I'd like to generate only the N(N + 1)/2 independent entries and store them into a row-ordered vector Dd.
So the question is: how to generate a row-ordered array Dd whose entries are defined by the lower triangular terms of matrix D? I'd, furthermore, like storing the entries in a column-major order.
I hope the explanation is clear enough.
As woodchips commented, it is simpler and faster to compute the whole matrix and extract the elements you care about. Here's a way to do it:
ndx = find(tril(true(size(D))));
Dd = D(ndx);
If you absolutely must compute elements of Dd without computing matrix D first, you probably need a double for loop.

Matlab - relational matricies?

I have a vector of coordinates called x. I want to get the element(s) with the min y coordinate:
a = find(x(:,2)==min(x(:,2))); % Contains indices
This returns the indexes of the elements with the smallest y coordinates. I say element*s* because sometimes this would return more than 1 value (e.g. (10,2) and (24,2) both have 2 as y coordinate and if 2 is the min y coordinate...).
Anyway, my next step is to sort (ascending) the elements with the min y coordinates according to their x coordinates. First I do:
b = sort(x(a,1));
The above operation might rearrange the elements with min y coordinates so I want to apply this rearrangement to a as well. So I do:
[v i] = ismember(b, x(:, 1));
Unfortunately, if there are elements with the same x value but different y values and one of these elements turns out to be a member of a (i.e. b) then the above matrix chooses it. For example if (10,2) and (24,2) are the elements with smallest y coordinates and there is a 3rd element (24, 13) then it will mess up the above operation. Is there a better way? I wrote my script using loops and everything was fine but in line with Matlab's methodology I rewrote it and I fear my unfamiliarity with matlab is causing this error.
Sorry, I might have misunderstood your question but lemme rephrase what I think you want here:
You have a set of 2D coordinates:
x = [24,2; 10,2; 24,13];
You want the pairs of coordinates to stay together (24,2) (10,2) and (24,13). And you want to find the pairs of coordinates that has the min y-coordinate and if there are multiples, then you want to sort them by x-coordinate. And you want the row indices of what those coordinate pairs were in the original matrix x. So in other words, you want a final answer of:
v = [10,2; 24,2];
i = [2,1];
If I understood correctly, then this is how you can do it:
(Note: I changed x to have one more pair (40,13) to illustrate the difference between idx(i) and i)
>> x = [40,13; 24,2; 10,2; 24,13];
>> idx = find(x(:,2)==min(x(:,2))) %Same as what you've done before.
idx =
2
3
>> [v,i] = sortrows(x(idx,:)) %Use sortrows to sort by x-coord while preserving pairings
v =
10 2
24 2
i = % The indices in x(idx,:)
2
1
>> idx(i) %The row indices in the original matrix x
ans =
3
2
And finally, if this is not what you wanted, can you indicate what you think your answer [v,i] should be in the example you gave?