Plotting Jaccard Index against spatial data (lat-long) - vegan

I am trying to find a way to plot a matrix (in this case a matrix of Jaccard indices) against spatial distances (I have latitude and longitude data). I have been told to use the "geosphere" package, but I haven't been able to fully understand how to use it.
So if anyone here is well versed in doing such things, please help me out.
Kind regards

Package geosphere is indeed a good choice, as it is based on an ellipsoid rather than a sphere and gives more accurate results than many other alternatives (and directly in metres). However, it is more tedious to use, as distGeo only calculates the distance between two points or, when given a matrix, the track lengths from each point to the next instead of the full distance matrix. The following is an easy way that does a lot of unnecessary calculation that you throw away, but it is much simpler than the alternatives (and sufficiently fast for any practical purpose).
Assume you have a matrix x of dimensions N times 2, where the two columns are the longitude and latitude (in this order) in decimal degrees and N is the number of observations:
library(geosphere)
N <- NROW(x)
geodists <- matrix(0, N, N)
for (i in 1:N) for(j in 1:N) geodists[i,j] <- distGeo(x[i,], x[j,])
## alternative for only lower diagonal:
## for(j in 1:(N-1)) for(i in (j+1):N) geodists[i,j] <- distGeo(x[i,], x[j,])
geodists <- as.dist(geodists)
The geodists will then be arranged in the same way as the Jaccard distances (assuming you used vegan or another package that returns standard dist structures), and the two can be plotted directly against each other. If you used a package that gives you a full matrix (which is symmetric and has a zero diagonal), it is best to convert the result to distances with as.dist(), which keeps only the lower triangle without the zero diagonal, as this gives nicer plots.
Package sp also uses the WGS84 ellipsoid, and its results differ little (less than 0.01% in my test for distances up to 2500 km) from those given by geosphere. It may be that geosphere is more accurate (and it also allows alternatives to the WGS84 ellipsoid). However, sp is much easier to use and directly gives you the symmetric matrix of distances (in kilometres rather than metres, contrary to what I claimed in my comment), and you can do it with one command:
library(sp)
geodists <- as.dist(spDists(x, longlat=TRUE))*1000 # in metres
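To make the pairing of geographic and Jaccard distances concrete outside R, here is a minimal Python sketch. It is not what geosphere or sp do internally: it assumes a spherical Earth of radius 6371 km and the haversine formula (less accurate than the ellipsoidal distGeo), treats the community data as presence/absence sets, and all function names are hypothetical. Both distance vectors are produced in the same column-wise lower-triangle order that R's dist objects use, so they can be plotted against each other element by element.

```python
import math

EARTH_RADIUS_M = 6371000.0  # assumed mean Earth radius; a sphere, not the WGS84 ellipsoid

def haversine_m(p, q):
    """Great-circle distance in metres between (lon, lat) points in decimal degrees."""
    lon1, lat1, lon2, lat2 = map(math.radians, (p[0], p[1], q[0], q[1]))
    a = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(min(1.0, a)))

def jaccard(a, b):
    """Jaccard dissimilarity between two sets of species (presence/absence)."""
    union = a | b
    return 1 - len(a & b) / len(union) if union else 0.0

def lower_triangle(items, dist):
    """Pairwise distances in column-wise lower-triangle order: (2,1), (3,1), ..., (3,2), ..."""
    n = len(items)
    return [dist(items[i], items[j]) for j in range(n - 1) for i in range(j + 1, n)]

# Hypothetical example data: three sites with coordinates and species lists
sites = [(24.9, 60.2), (25.5, 65.0), (18.1, 59.3)]
species = [{"a", "b"}, {"b", "c"}, {"a", "b", "c"}]
pairs = list(zip(lower_triangle(sites, haversine_m), lower_triangle(species, jaccard)))
```

Each element of pairs is one (geographic distance, Jaccard dissimilarity) point for a scatter plot.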

Related

Finding length between a lot of elements

I have an image of a cytoskeleton. There are a lot of small objects inside and I want to calculate the length between all of them in every axis and to get a matrix with all this data. I am trying to do this in matlab.
My final aim is to figure out if there is any axis with a constant distance between the object.
I've tried bwdist and to use connected components without any luck.
Do you have any other ideas?
So, the end goal is that you want to globally stretch this image in a certain direction (linearly) so that the distances between nearest pairs end up as close together as possible, hopefully the same? Or may you do more complex stretching? (Note that with an arbitrarily complex one you can always make it work :) )
If it is a linear global stretch, the distance in x' and y' is a simple multiple of the old distance in x and y, applied to every pair of points. So the final Euclidean distance will be sqrt((SX*x)^2 + (SY*y)^2), where SX is the stretch in x, SY is the stretch in y, and x and y are the separations along each axis between a pair of points.
If you are interested in just "the same" part, solution is not so difficult:
Find all objects of interest and put their X and Y coordinates in a N*2 matrix.
Calculate distances between all pairs of objects in X and Y. You will end up with 2 matrices sized N*N (with 0 on the diagonal, symmetric and real, not sure what is the name for that type of matrix).
Find the minimum distance (say this is between A and B).
You probably already have this. Now:
Take C. Make N-1 transformations, each of which ends up with C->nearestToC = A->B. It is a simple system of equations: you have X1^2*SX^2 + Y1^2*SY^2 = X2^2*SX^2 + Y2^2*SY^2.
So, first say A->B = C->A, then A->B = C->B, then A->B = C->D etc etc. Make sure transformation is normalized => SX^2 + SY^2 = 1. If it cannot be found, the only valid transformation is SX = SY = 0 which means you don't have solution here. Obviously, SX and SY need to be real.
Note that this solution is unique except in case where X1 = X2 and Y1 = Y2. In this case, grab some other point than C to find this transformation.
For each transformation check the remaining points and find all nearest neighbours of them. If distance is always the same as these 2 (to a given tolerance), great, you found your transformation. If not, this transformation does not work and you should continue with the next one.
If you want a transformation that minimizes variations between distances (but doesn't require them to be nearly equal), I would do some optimization method and search for a minimum - I don't know how to find an exact solution otherwise. I would pick this also in case you don't have linear or global stretch.
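The equal-length condition with the normalization SX^2 + SY^2 = 1 is linear in SX^2 and SY^2, so it can be solved in closed form. A minimal Python sketch, assuming (x1, y1) and (x2, y2) are the axis separations of the two point pairs being equalized; the function name is hypothetical:

```python
import math

def stretch_for_equal_lengths(x1, y1, x2, y2):
    """Solve X1^2*SX^2 + Y1^2*SY^2 = X2^2*SX^2 + Y2^2*SY^2 with SX^2 + SY^2 = 1.

    Returns (SX, SY), or None when no unique real solution exists.
    """
    a = x1 ** 2 - x2 ** 2   # coefficient of SX^2 once everything is on one side
    b = y1 ** 2 - y2 ** 2   # coefficient of SY^2
    if a == b:
        # a = b != 0: no solution; a = b = 0: every stretch works (not unique)
        return None
    sx2 = b / (b - a)       # from a*u + b*(1 - u) = 0 with u = SX^2
    sy2 = 1.0 - sx2
    if sx2 < 0 or sy2 < 0:  # a negative square means SX or SY would not be real
        return None
    return math.sqrt(sx2), math.sqrt(sy2)
```

For example, pairs separated by (3, 0) and (0, 4) give SX = 0.8 and SY = 0.6, so both stretched lengths become 2.4. The result would then be checked against the remaining nearest-neighbour pairs, to a tolerance, as described above.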
If I understand your question correctly, the first step is to obtain the center-of-mass points of all the objects in the image as (x,y) coordinates. Then you can easily compute the distances between all pairs of points. I suggest looking at a histogram of those distances, which may provide some information about the nature of the distance distribution (for example, whether it is uniformly random or whether any patterns appear).
Obtaining the center-of-mass points is not an easy task. Consider transforming the image into a binary one, or using some sort of background subtraction with blob detection and/or an edge detector.
For building a histogram you can use the histogram function.
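As a rough illustration of the distance-histogram idea, here is a minimal Python sketch. It assumes the centroids have already been extracted as (x, y) tuples; the function name and the fixed bin width are hypothetical choices.

```python
import math
from collections import Counter
from itertools import combinations

def pairwise_distance_histogram(centroids, bin_width):
    """Count pairwise Euclidean distances between (x, y) centroids per bin."""
    distances = [math.dist(p, q) for p, q in combinations(centroids, 2)]
    # Bin k holds the distances in [k*bin_width, (k+1)*bin_width)
    return Counter(int(d // bin_width) for d in distances)

# Four evenly spaced objects on a line: distances are 1, 1, 1, 2, 2, 3
hist = pairwise_distance_histogram([(0, 0), (1, 0), (2, 0), (3, 0)], bin_width=1.0)
```

Regular spacing along an axis shows up as sharp peaks at multiples of the spacing, whereas randomly placed objects give a smooth distribution.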

Kullback Leibler Divergence of 2 Histograms in MatLab

I would like a function to calculate the KL distance between two histograms in MatLab. I tried this code:
http://www.mathworks.com/matlabcentral/fileexchange/13089-kldiv
However, it says that I should have two distributions P and Q of sizes n x nbins. However, I am having trouble understanding how the author of the package wants me to arrange the histograms. I thought that providing the discretized values of the random variable together with the number of bins would suffice (I would assume the algorithm would use an arbitrary support to evaluate the expectations).
Any help is appreciated.
Thanks.
The function you link to requires that the two histograms passed be aligned and thus have the same length NBIN x N (not N X NBIN), that is, if N>1 then the number of rows in the inputs should be equal to the number of bins in the histograms. If you are just going to compare two histograms (that is if N=1) it doesn't really matter, you can pass either row or column vector versions of these as long as you are consistent and the order of bins matches.
A generic call to the function looks like this:
dists = kldiv(bins,P,Q)
The implementation allows comparison of multiple histograms to each other (that is, N>1), in which case pairs of columns (with matching column index) in each array are compared and the result is a row vector with distances for each matching pair.
Array bins should be the same size as P and Q and is used to perform a very minimal check that the inputs are of the same size, but is not used in the computation. The routine expects bins to contain the numeric labels of your bins so that it can check for repeated bin labels and warn you if repeats occur, but otherwise doesn't use the information.
You could do away with bins and compute the distance with
KL = sum(P .* (log2(P)-log2(Q)));
without using the Matlab Central versions. However the version you link to performs the abovementioned minimal checks and in addition allows computation of two alternative distances (consult the documentation).
The version linked to by eigenchris checks that no histogram bins are empty (which would make the computation blow up numerically) and, if there are any, removes their contribution to the sum (not sure this is entirely appropriate - consult an expert on the subject). You should probably also be aware of the exact form of the formula; specifically, note the use of log2 above versus the natural logarithm in the version linked to by eigenchris.
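For comparison, here is a minimal Python sketch of the same sum with explicit handling of empty bins. It is not a port of either File Exchange function: the zero-bin convention (a bin with P = 0 contributes nothing, a bin with Q = 0 and P > 0 makes the result infinite) and the normalization of raw counts are assumptions of this sketch, and the function name is hypothetical.

```python
import math

def kl_divergence(p, q, base=2.0):
    """KL divergence sum(p * log(p / q)) between two aligned histograms.

    p and q are sequences of bin counts or probabilities over the same bins.
    """
    if len(p) != len(q):
        raise ValueError("histograms must have the same number of bins")
    p_total, q_total = float(sum(p)), float(sum(q))
    total = 0.0
    for pi, qi in zip(p, q):
        pi, qi = pi / p_total, qi / q_total  # normalize counts to probabilities
        if pi == 0.0:
            continue                         # 0 * log(0) is taken as 0
        if qi == 0.0:
            return math.inf                  # P has mass where Q has none
        total += pi * math.log(pi / qi, base)
    return total
```

With base=2.0 the result is in bits, matching the log2 expression above; passing base=math.e gives nats, as with the natural logarithm.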

How can I generate a set of n dimensional vectors that contains all integer points in an n-dimensional rectangular prism

Okay, so I'm working on a problem related to quantum chaos and one of the things I need to do is to map the unit cube in n-dimensions to a parallelepiped in n-dimensions and find all integer points in the interior of this parallelepiped. I have been trying to do this using the following scheme:
1. Given the linear map B and the dimension of the cube n, we find the coordinates of the corners of the unit hypercube by converting numbers j from 0 to (2^n - 1) into their binary representation and turning them into vectors that describe the vertices of the cube.
2. The next step was to apply the map B to each of these vectors, which gives me a set of 2^n vectors describing the coordinates of the vertices of the parallelepiped in n dimensions.
3. Now, we take the maximum and minimum value attained by any of these vertices in each coordinate direction, i.e. the first element of my vectors might have a maximum value of 4 across all of the vertices and a minimum value of -3 etc. This gives me an n-dimensional rectangular prism that contains my parallelepiped and some extra unwanted space.
4. I now find all points with integer coordinates in this bounding rectangular prism, described as vectors in n dimensions.
5. Finally, I apply the inverse of the map B to each of the points and throw away any points that have any coefficients greater than 1, as they must originally have lain outside my unit hypercube.
My issue arises in step 4: I'm struggling to come up with a way of generating all vectors with integer coordinates in my rectangular hyper-prism such that I can change the number of dimensions n on the fly. Ideally, I'd like to be able to increase n at will until it becomes too computationally heavy to do so, but every method of finding all integer points in the prism I've tried so far has relied on n for loops to permute each element, and thus I need to rewrite the code every time.
So I guess my question is this: is there any way to code this up so that I can change n on the fly? Also, any thoughts on the idea of the algorithm itself would be appreciated :) It wouldn't surprise me if I've overcomplicated things massively...
EDIT:
Of course as soon as I post the question I see a lovely little link in the side-bar where a clever method has been given already for how to do this: Generate a matrix containing all combinations of elements taken from n vectors
I'll leave this up for the moment just in case anyone has any comments on the method in general, but otherwise (since I can't upvote yet I'll just say it here) Luis Mendo, you are a hero!
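For illustration, here is a minimal Python sketch of the dimension-independent enumeration, using a Cartesian product over one integer range per axis instead of n nested loops. This is not the linked MATLAB method; the function names are hypothetical, and the inverse map is assumed to be supplied as a list of rows. The sketch also rejects coefficients below 0, since a point mapped back to a negative coefficient lies outside the unit hypercube just as one above 1 does.

```python
import math
from itertools import product

def integer_points_in_box(lows, highs):
    """Iterate over every integer point with lows[k] <= x[k] <= highs[k], in any dimension."""
    return product(*(range(math.ceil(lo), math.floor(hi) + 1)
                     for lo, hi in zip(lows, highs)))

def in_unit_cube(point, b_inv, strict=True):
    """Apply the inverse map and test that every coefficient lies in (0, 1), or [0, 1] if not strict."""
    coeffs = [sum(r * x for r, x in zip(row, point)) for row in b_inv]
    if strict:
        return all(0 < c < 1 for c in coeffs)
    return all(0 <= c <= 1 for c in coeffs)

# Example in 2D: B = [[2, 1], [0, 2]] maps the unit square to a parallelogram
# with vertices (0,0), (2,0), (1,2), (3,2), so the bounding box is [0,3] x [0,2].
b_inv = [[0.5, -0.25], [0.0, 0.5]]
interior = [p for p in integer_points_in_box([0, 0], [3, 2]) if in_unit_cube(p, b_inv)]
```

The same two functions work unchanged for any n, because the number of ranges handed to product is decided at run time. With a general floating-point inverse, points lying exactly on a face may be misclassified by rounding, so a small tolerance may be needed.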

How to find the nearest points to given coordinates with MATLAB?

I need to solve a minimization problem with Matlab and I'm wondering which is the easiest solution. All the potential solutions that I have thought of require a lot of programming effort.
Suppose that I have a lat/long coordinate point (A,B), what I need is to search for the nearest point to this one in a map of lat/lon coordinates.
In particular, the latitude and longitude arrays are two matrices of 2030x1354 elements (1km distance) and the idea is to find the unique indexes in those matrices that minimize the distance to the coordinates (A,B), i.e., to find the closest values to the given coordinates (A,B).
Any help would be very appreciated.
Thanks!
This is always a fun one :)
First off: Mohsen Nosratinia's answer is OK, as long as
you don't need to know the actual distance
you can guarantee with absolute certainty that you will never go near the polar regions
and will never go near the ±180° meridian
For a given latitude, -180° and +180° longitude are actually the same point, so simply looking at differences between angles is not sufficient. This will be more of a problem in the polar regions, since large longitude differences there will have less of an impact on the actual distance.
Spherical coordinates are very useful and practical for purposes of navigation, mapping, and that sort of thing. For spatial computations however, like the on-surface distances you are trying to compute, spherical coordinates are actually pretty cumbersome to work with.
Although it is possible to do such calculations using the angles directly, I personally don't consider it very practical: you often have to have a strong background in spherical trigonometry, and considerable experience to know its many pitfalls -- very often there are instabilities or "special points" you need to work around (the poles, for example), quadrant ambiguities you need to consider because of trig functions you've introduced, etc.
I've learned to do all this in university, but I also learned that the spherical trig approach often introduces complexity that mathematically speaking is not strictly required, in other words, the spherical trig is not the simplest representation of the underlying problem.
For example, your distance problem is pretty trivial if you convert your latitudes and longitudes to 3D Cartesian X,Y,Z coordinates, and then find the distances through the simple formula
distance (a, b) = R · arccos( a/|a| · b/|b| )
where a and b are two such Cartesian vectors on the sphere. Note that |a| = |b| = R, with R = 6371 km the mean radius of the Earth.
In MATLAB code:
% Some example coordinates (degrees are assumed)
lon = 360*rand(2030, 1354);
lat = 180*rand(2030, 1354) - 90;
% Your point of interest
P = [4, 54];
% Radius of Earth (km), so the resulting distances are in km
RE = 6371;
% Convert the array of lat/lon coordinates to Cartesian vectors
% NOTE: sph2cart expects radians
% NOTE: use radius 1, so we don't have to normalize the vectors
[X,Y,Z] = sph2cart( lon*pi/180, lat*pi/180, 1);
% Same for your point of interest
[xP,yP,zP] = sph2cart(P(1)*pi/180, P(2)*pi/180, 1);
% The minimum distance, and the linear index where that distance was found
% NOTE: force the dot product into the interval [-1 +1]. This prevents
% slight overshoots due to numerical artifacts
dotProd = xP*X(:) + yP*Y(:) + zP*Z(:);
[minDist, index] = min( RE*acos( min(max(-1,dotProd),1) ) );
% Convert that linear index to 2D subscripts
[ii,jj] = ind2sub(size(lon), index)
If you insist on skipping the conversion to Cartesian and use lat/lon directly, you'll have to use the Haversine formula, as outlined on this website for example, which is also the method used by distance() from the mapping toolbox.
Now, all of this is valid for the whole Earth, provided you find the smooth spherical Earth accurate enough an approximation. If you want to include the Earth's oblateness or some higher order shape model (or God forbid, distances including terrain), you need to do far more complicated stuff. But I don't think that is your goal here :)
PS - I wouldn't be surprised if, after writing out everything I did, you ended up re-discovering the Haversine formula. I just prefer to be able to calculate something as simple as distances along the sphere from first principles alone, rather than from some black-box formula you had implanted in your head sometime long ago :)
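The same unit-vector approach can be sketched in Python to show that it handles the ±180° meridian without special cases. This is a minimal illustration under the same spherical-Earth assumption (radius 6371 km); the function names are hypothetical, and the candidates are a flat list of (lon, lat) pairs rather than two 2D arrays.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean radius; spherical-Earth assumption

def unit_vector(lon_deg, lat_deg):
    """Cartesian unit vector for a longitude/latitude given in degrees."""
    lon, lat = math.radians(lon_deg), math.radians(lat_deg)
    return (math.cos(lat) * math.cos(lon), math.cos(lat) * math.sin(lon), math.sin(lat))

def nearest_point(target, candidates):
    """Return (index, distance_km) of the candidate (lon, lat) closest to target."""
    t = unit_vector(*target)
    best_index, best_dot = -1, -2.0
    for k, (lon, lat) in enumerate(candidates):
        dot = sum(a * b for a, b in zip(t, unit_vector(lon, lat)))
        if dot > best_dot:  # largest dot product = smallest central angle
            best_index, best_dot = k, dot
    # Clamp to [-1, 1] before acos to absorb rounding overshoot
    angle = math.acos(max(-1.0, min(1.0, best_dot)))
    return best_index, EARTH_RADIUS_KM * angle

# The target sits just west of the date line; the true neighbour is just east of it
index, dist_km = nearest_point((179.9, 0.0), [(170.0, 0.0), (-179.9, 0.0), (0.0, 0.0)])
```

A comparison of raw longitude differences would pick the point at 170°, whereas the dot product correctly selects the point at -179.9°, about 22 km away.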
Let Lat and Long denote latitude and longitude matrices, then
dist2=sum(bsxfun(@minus, cat(3,A,B), cat(3,Lat,Long)).^2,3);
[I,J]=find(dist2==min(dist2(:)));
I and J contain the row and column indices into Lat and Long of the point nearest to (A,B). Note that if several points are equally near, I and J will not be scalar values, but vectors.

Using triplequad to calculate density (in Matlab)

As I've explained in a previous question: I have a dataset consisting of a large semi-random collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is closest to the area with the highest density of points.
As High Performance Mark answered:
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested I use triplequad for this, but this is not a function I am familiar with or understand very well. Does anyone have any pointers on how I could go about using this function in Matlab for what I am trying to do?
For example, say I have a random normally distributed matrix A = randn([300,300,300]); how could I use triplequad to find the point I am looking for? As I currently understand it, I also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10);
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
    % ceil maps a coordinate in (k-1, k] to voxel index k
    v = ceil(A(ix,:));
    counts(v(1),v(2),v(3)) = counts(v(1),v(2),v(3)) + 1;
end
will count up the number of points in each of the voxels in counts.
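The same voxel-counting idea, followed by picking the fullest voxel, can be sketched in Python. This is a minimal illustration, not part of the answer's code: it assumes cubic voxels of a chosen edge length, uses 0-based floor indices rather than the 1-based ceil indices above, and the function name is hypothetical. Counts are kept in a dictionary, so only occupied voxels take up memory.

```python
import math
from collections import Counter

def densest_voxel(points, voxel_size=1.0):
    """Count points per cubic voxel and return (voxel_index, count) for the fullest one."""
    counts = Counter(
        tuple(math.floor(c / voxel_size) for c in p) for p in points
    )
    return counts.most_common(1)[0]

# Two points fall in voxel (0, 0, 0), one in voxel (3, 4, 9)
voxel, count = densest_voxel([(0.1, 0.2, 0.3), (0.5, 0.5, 0.5), (3.2, 4.1, 9.9)])
```

The refinement described in the quoted suggestion would repeat this on the points inside the winning voxel with a smaller voxel_size.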
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.