How to calculate the nearest neighbours for high-dimensional data (especially 15000-dimensional data)? - distance

How do I calculate the nearest neighbours for high-dimensional data (especially 15000-dimensional data)?
I have been told that Euclidean distance is not a good way to represent the distance between high-dimensional data points. Are there better ways to calculate distance for high-dimensional data?

Related

How to calculate and plot the fast Fourier transform of SLA data (3-D matrix) in MATLAB?

I want to check for dominant peaks in SLA data (a 3-D matrix of lon, lat, time) so that I can decide which period to choose when filtering the data to look for waves propagating in the region. My data covers all latitudes and longitudes (global data). However, I want to average over the longitude range 70E to 100E and then over the latitude range 10S to 10N. This average gives a vector in the time dimension only, on which I can perform a fast Fourier transform. Then I want to plot this with time as the x axis. I'm new to MATLAB, so any help is appreciated. Also, if anyone has done this before, it would be nice to know whether my logic is correct.
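The procedure described above can be sketched as follows. This is only a minimal sketch under stated assumptions: the variable names (sla, lon, lat, dt) are hypothetical, the data here is synthetic (a 32-day oscillation on a coarse 5-degree grid with daily samples), longitudes are assumed to be in degrees east, and no area weighting or NaN handling is applied.

```matlab
% Synthetic stand-in for the real data (assumption): sla is nlon-by-nlat-by-nt.
lon = 0:5:355;                    % degrees east
lat = -87.5:5:87.5;               % degrees north
nt  = 512;                        % number of time steps
dt  = 1;                          % sampling interval in days (assumption)
t   = (0:nt-1) * dt;
signal = sin(2*pi*t/32);          % 32-day oscillation we hope to recover
sla = repmat(reshape(signal, 1, 1, nt), [numel(lon), numel(lat), 1]);

% Select the box 70E-100E, 10S-10N.
ilon = lon >= 70 & lon <= 100;
ilat = lat >= -10 & lat <= 10;
box  = sla(ilon, ilat, :);

% Average over longitude, then latitude, leaving one value per time step.
% (Real data with NaN land points would need NaN-aware averaging.)
ts = squeeze(mean(mean(box, 1), 2));   % nt-by-1 time series
ts = ts - mean(ts);                    % remove the mean so bin 0 does not dominate

% One-sided spectrum: keep the positive frequencies only.
Y      = fft(ts);
nh     = floor(nt/2);
amp    = abs(Y(2:nh+1));               % unnormalised spectral magnitude
freq   = (1:nh)' / (nt*dt);            % cycles per day
period = 1 ./ freq;                    % days

[~, k] = max(amp);
dominant_period = period(k);           % period of the strongest peak, in days

plot(period, amp);
xlabel('Period (days)');
ylabel('Spectral magnitude');
```

Note that the spectrum is plotted against period (or frequency) rather than time; the time axis belongs to the averaged series ts itself, which can be inspected with plot(t, ts).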

Stability of pose estimation using n points

I am using a chessboard to estimate the translation vector between it and the camera. First, the intrinsic camera parameters are calculated; then the translation vector is estimated using n points detected on the chessboard.
I found a very strange phenomenon: the translation vector is more accurate and stable when more chessboard points are used, and the effect is more obvious at greater distances. For instance, with 1 cm x 1 cm squares on the chessboard and a distance of 3 m, the translation vector is estimated accurately using 25 points, but it is inaccurate and unstable using the minimal 4 points. However, at a distance of 0.6 m, the estimates using 4 points and 25 points are similar, and both are accurate.
How can this phenomenon be explained (in theory)? What is the relationship between the stability of the estimate, the distance, and the number of points?
Thanks.
When you use a smaller number of points, the calculation of the translation vector is more sensitive to noise in the coordinates of those points. Point coordinates are noisy due to the finite resolution of the camera (among other things), and the effect of that noise only increases with distance. So using a larger number of points should provide a better estimate.
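A rough back-of-envelope calculation may make this concrete. It is not a full pose-estimation analysis: it assumes a pinhole camera, independent pixel noise on each point, and looks at lateral error only. The focal length and noise level are hypothetical numbers chosen for illustration.

```matlab
% All values below are hypothetical, for illustration only.
f_px     = 1000;          % focal length in pixels (assumption)
sigma_px = 0.5;           % corner-detection noise in pixels (assumption)
Z        = [0.6 3.0];     % camera-to-board distances in metres
n        = [4; 25];       % number of points used

% Pinhole model: a pixel error sigma_px corresponds to a lateral error of
% Z * sigma_px / f_px at depth Z. Averaging n independent points reduces
% the standard deviation by a factor of sqrt(n).
lateral_err_mm = 1000 * (sigma_px / f_px) * (1 ./ sqrt(n)) * Z;

% Rows: n = 4, 25. Columns: Z = 0.6 m, 3 m.
disp(lateral_err_mm)
```

Under these assumptions the error with 4 points at 3 m is five times the error with 4 points at 0.6 m, and using 25 points instead of 4 reduces it by a factor of 2.5.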

Spectral clustering distance/similarity

All papers about spectral clustering use a similarity matrix as the input to the spectral clustering algorithm.
Is it also possible to use a pairwise distance matrix? I haven't seen any version of spectral clustering code that uses pairwise distances.
I am implementing spectral clustering in MATLAB, which has the function pdist; the output of this function is a pairwise distance matrix.
A similarity (or affinity) matrix describes how close the data points are to each other. Distance, on the other hand, measures how dissimilar they are. The easiest and most frequently used way to turn pairwise distances into a similarity matrix is to apply a Gaussian kernel to get the affinity measure.
For points a and b, let D be the distance between them (for example, D = pdist([a; b])). Then the similarity entry for your matrix can be obtained as sim_ab = exp(-D^2 / f), where f is a scaling factor (f = 2*sigma^2 for a Gaussian kernel of width sigma).
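A minimal sketch of this conversion for a whole data set, using only base functions. The toy data X and the kernel width sigma are assumptions; sigma normally needs tuning for the data at hand. With the Statistics Toolbox, squareform(pdist(X)) should give the same distance matrix D.

```matlab
% Toy data (assumption): one observation per row.
X = [0 0; 0 1; 5 5; 5 6];

% Pairwise squared Euclidean distances: |a|^2 + |b|^2 - 2*a*b'.
sq = sum(X.^2, 2);
D2 = bsxfun(@plus, sq, sq') - 2*(X*X');
D2 = max(D2, 0);                 % clip tiny negative values from round-off
D  = sqrt(D2);                   % pairwise distance matrix

% Gaussian kernel: the scaling factor is written here as 2*sigma^2.
sigma = 1;                       % kernel width (assumption)
W = exp(-D2 / (2*sigma^2));      % affinity matrix for spectral clustering
```

Nearby points get an affinity close to 1 and distant points an affinity close to 0, so W can be passed to a spectral clustering routine that expects a similarity matrix.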

Calculating Jaccard distance of a large matrix in Matlab

I have a large matrix of size 40K * 900K. It is a sparse, binary matrix and I would like to calculate the Jaccard distance between its rows (a 40K by 40K Jaccard distance matrix in total). I'm aware of the built-in function pdist, which calculates this distance for me, but due to the matrix size it seems that it can't, and it shows me the following error message.
Matrix is too large to convert to linear index.
Error in ==> pdist at 139
elseif any(imag(X(:)))
Any suggestion on how to resolve this problem?
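One possible approach, sketched here under assumptions, is to skip pdist and compute the distances from sparse matrix products, one block of rows at a time. For binary rows, the intersection size is the dot product of the two rows, and the union size is the sum of the two row counts minus the intersection. The toy matrix and the block size blk are placeholders; for the real 40K-by-40K result, each block would probably need to be written to disk or processed immediately rather than stored in one full matrix.

```matlab
% Toy sparse binary matrix standing in for the 40K-by-900K one (assumption).
X = sparse([1 1 0 0; 1 0 1 0; 0 0 1 1; 1 1 0 0]);
n = size(X, 1);
r = full(sum(X, 2));             % number of ones in each row
blk = 2;                         % rows per block (assumption; size it to fit memory)

D = zeros(n);                    % fine for the toy case only
for s = 1:blk:n
    rows  = s:min(s + blk - 1, n);
    inter = full(X(rows, :) * X');                  % intersection sizes
    uni   = bsxfun(@plus, r(rows), r') - inter;     % union sizes
    Dblk  = nan(size(inter));                       % empty union: left as NaN
    ok    = uni > 0;
    Dblk(ok) = 1 - inter(ok) ./ uni(ok);            % Jaccard distance
    D(rows, :) = Dblk;           % for large n, save or process Dblk here instead
end
```

In this toy example rows 1 and 4 are identical (distance 0), rows 1 and 3 share nothing (distance 1), and rows 1 and 2 share one of three distinct columns (distance 2/3).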

Find the measurement which best corresponds to the same variable at other locations

If I have wind speed measurements for 4 different locations within a geographical radius of approximately 400 km for one year, is there a method for determining which wind speed measurement best fits all of the locations, i.e. does one of the locations have a wind speed similar to that of all the other locations? Can this be achieved?
I suppose you could find the one that gives the minimum loss (e.g. quadratic loss) with respect to all the others:
%speeds is an N-by-4 matrix, with N windspeed measurements for each location.
%loss finds the squared loss for location i. It subtracts column i from each column in speeds and squares the difference (for column i, this is always 0), then averages over all rows and the 3 non-zero columns.
loss = @(i) sum(mean((speeds - repmat(speeds(:, i), 1, 4)).^2)) ./ 3;
%Apply loss to each of the 4 locations and find the minimum.
[v, idx] = min(arrayfun(loss, 1:4));
The loss function takes the average squared difference between each windspeed and the speeds at all other locations. Then we use arrayfun to calculate this loss for each location.
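As a small worked example of the same idea, here is the loss applied to made-up data. The numbers are purely illustrative, and the names nLoc, losses, and best are introduced only for this sketch.

```matlab
% Toy data (assumption): 3 measurements at each of 4 locations.
speeds = [1 2 3 10;
          2 3 4 11;
          3 4 5 12];
nLoc = size(speeds, 2);

% Mean squared difference between location i and each other location,
% averaged over the nLoc - 1 other locations.
loss = @(i) sum(mean((speeds - repmat(speeds(:, i), 1, nLoc)).^2)) / (nLoc - 1);

losses = arrayfun(loss, 1:nLoc);   % one loss value per location
[v, best] = min(losses);           % best is the most representative location
```

Here location 3 has the smallest loss (18), because its values lie closest, in the squared-error sense, to those of the other three locations taken together.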