Being neither great at math nor coding, I am trying to understand the output I am getting when I try to calculate the linear distance between pairs of 3D points. Essentially, I have the 3D points of a bird that is moving in a confined area towards a stationary reward. I would like to calculate the distance of the animal to the reward at each point. However, when looking online for the best way to do this, I tried several options and get different results that I'm not sure how to interpret.
Example data:
reward = [[0.381605200000000,6.00214980000000,0.596942400000000]];
animal_path = = [2.08638710671220,-1.06496059617432,0.774253689976102;2.06262715454806,-1.01019576900787,0.773933446776898;2.03912411242035,-0.954888684677576,0.773408777383975;2.01583648760496,-0.898935333316342,0.772602855030873];
distance1 = sqrt(sum(([animal_path]-[reward]).^2));
distance2 = norm(animal_path - reward);
distance3 = pdist2(animal_path, reward);
Distance 1 gives 3.33919107083497 13.9693378592353 0.353216791787775
Distance 2 gives 14.3672145652704
Distance 3 gives 7.27198528565078
7.21319284516199
7.15394253573951
7.09412041863743
Why do these all yield different values (and different numbers of values)? Distance 3 seems to make the most sense for my purposes, even though the values are too large for the dimensions of the animal enclosure, which should be something like 3 or 4 meters.
Can someone please explain this in simple terms and/or point me to something less technical and jargon-y than the Matlab pages?
There are many things mathematicians call distance. What you normally associate with distance is the eucledian distance. This is what you want in this situation. The length of the line between two points. Now to your problem. The Euclidean distance distance is also called norm (or 2-norm).
For two points you can use the norm function, which means with distance2 you are already close to a solution. The problem is only, you input all your points at once. This does not calculate the distance for each point, instead it calculates the norm of the matrix. Something of no interest for you. This means you have to call norm once for each row point on the path:
k=nan(size(animal_path,1),1)
for p=1:size(animal_path,1),
k(p)=norm(animal_path(p,:) - reward);
end
Alternatively you can follow the idea you had in distance1. The only mistake you made there, you calculated the sum for each column, where the sum of each row was needed. Simple fix, you can control this using the second input argument of sum:
distance1 = sqrt(sum((animal_path-reward).^2,2))
Related
We have set of (x,y) coordinates in matrix form. we want to fit a curve through those points. but it does not take points in sequence(nearest point) rather it takes the points arranged according to the sequence arranged in matrix
I think #mathematician1975 is quite right, you do not need the points arranged in sequence before do curve fitting, at least for OLS. http://en.wikipedia.org/wiki/Ordinary_least_squares If you do want the sequence, sort the coordinates by variable is fine. In both case, you do not need the nearest point.
The question is quite unclear but based on your comments, you could simply create a new matrix of ordered points by deciding on a start point and using the Euclidean norm to identify the nearest neighbour and add that to your new matrix. Continue this for each point (of course making sure that you exclude all previous points from the nearest neighbour evaluation at each step) until you have just one point left.
Once this is done you have your points in the order you want and can fit your parametric curve accordingly.
It is quite possible that there is a magic command in matlab that will do this for you but if there is I am not aware of it. If there is I am sure you will get an answer telling you so in due course.
If not you can still obtain what (I think) you want with loops and the norm function. It may not be optimal but it will give you what you want.
I need to solve a minimization problem with Matlab and I'm wondering which is the easiest solution. All the potential solutions that I've been thinking in require lot of programming effort.
Suppose that I have a lat/long coordinate point (A,B), what I need is to search for the nearest point to this one in a map of lat/lon coordinates.
In particular, the latitude and longitude arrays are two matrices of 2030x1354 elements (1km distance) and the idea is to find the unique indexes in those matrices that minimize the distance to the coordinates (A,B), i.e., to find the closest values to the given coordinates (A,B).
Any help would be very appreciated.
Thanks!
This is always a fun one :)
First off: Mohsen Nosratinia's answer is OK, as long as
you don't need to know the actual distance
you can guarantee with absolute certainty that you will never go near the polar regions
and will never go near the ±180° meridian
For a given latitude, -180° and +180° longitude are actually the same point, so simply looking at differences between angles is not sufficient. This will be more of a problem in the polar regions, since large longitude differences there will have less of an impact on the actual distance.
Spherical coordinates are very useful and practical for purposes of navigation, mapping, and that sort of thing. For spatial computations however, like the on-surface distances you are trying to compute, spherical coordinates are actually pretty cumbersome to work with.
Although it is possible to do such calculations using the angles directly, I personally don't consider it very practical: you often have to have a strong background in spherical trigonometry, and considerable experience to know its many pitfalls -- very often there are instabilities or "special points" you need to work around (the poles, for example), quadrant ambiguities you need to consider because of trig functions you've introduced, etc.
I've learned to do all this in university, but I also learned that the spherical trig approach often introduces complexity that mathematically speaking is not strictly required, in other words, the spherical trig is not the simplest representation of the underlying problem.
For example, your distance problem is pretty trivial if you convert your latitudes and longitudes to 3D Cartesian X,Y,Z coordinates, and then find the distances through the simple formula
distance (a, b) = R · arccos( a/|a| · b/|b| )
where a and b are two such Cartesian vectors on the sphere. Note that |a| = |b| = R, with R = 6371 the radius of Earth.
In MATLAB code:
% Some example coordinates (degrees are assumed)
lon = 360*rand(2030, 1354);
lat = 180*rand(2030, 1354) - 90;
% Your point of interest
P = [4, 54];
% Radius of Earth
RE = 6371;
% Convert the array of lat/lon coordinates to Cartesian vectors
% NOTE: sph2cart expects radians
% NOTE: use radius 1, so we don't have to normalize the vectors
[X,Y,Z] = sph2cart( lon*pi/180, lat*pi/180, 1);
% Same for your point of interest
[xP,yP,zP] = sph2cart(P(1)*pi/180, P(2)*pi/180, 1);
% The minimum distance, and the linear index where that distance was found
% NOTE: force the dot product into the interval [-1 +1]. This prevents
% slight overshoots due to numerical artifacts
dotProd = xP*X(:) + yP*Y(:) + zP*Z(:);
[minDist, index] = min( RE*acos( min(max(-1,dotProd),1) ) );
% Convert that linear index to 2D subscripts
[ii,jj] = ind2sub(size(lon), index)
If you insist on skipping the conversion to Cartesian and use lat/lon directly, you'll have to use the Haversine formula, as outlined on this website for example, which is also the method used by distance() from the mapping toolbox.
Now, all of this is valid for the whole Earth, provided you find the smooth spherical Earth accurate enough an approximation. If you want to include the Earth's oblateness or some higher order shape model (or God forbid, distances including terrain), you need to do far more complicated stuff. But I don't think that is your goal here :)
PS - I wouldn't be surprised that if you would write everything out that I did, you'll probably re-discover the Haversine formula. I just prefer to be able to calculate something as simple as distances along the sphere from first principles alone, rather than from some black box formula you had implanted in your head sometime long ago :)
Let Lat and Long denote latitude and longitude matrices, then
dist2=sum(bsxfun(#minus, cat(3,A,B), cat(3,Lat,Long)).^2,3);
[I,J]=find(dist2==min(dist2(:)));
I and J contain the indices in A and B that correspond to nearest point. Note that if there are multiple answers, I and J will not be scalar values, but vectors.
I have a matrix I am working with which 300x5000 and I wanted to test which distance calculation parameter is the most effective. I got the following results:
'Sqeuclidean' = 17 iterations, total sum of distances = 25175.4
'Correlation' = 9 iterations, total sum of distances = 32.7
'Cityblock' = 34 iterations, total sum of distances = 105175.3
'Cosine' = 11 iterations, total sum of distances = 11.9
I am having trouble understanding why the results vary so much and how to choose the most effective distance parameter. Any advice?
EDIT:
I have 300 features with 5000 instances of each feature.
the function looks like this:
[idx, ctrs, sumd, d] = kmeans(matrix, 25, 'distance', 'cityblock', 'replicate', 20)
with interchanging the distance parameter. The features were already normalized.
Thanks!
As slayton commented, you really need to define what 'best' means for your particular problem.
The only thing that matters is how well the distance function clusters the data. In general, clustering is highly-dependent on the distance function. The two metrics that you've selected (number of iterations, sum of distances) are pretty irrelevant to how well the clustering works.
You need to know what you're trying to achieve with clustering, and you need some metric for how well you've achieved that goal. If there's an objective metric to determine how good your clusters are, then use that. Often, the metric is fuzzier: does this look right when I visualize the data. Look at your data, and look at how each distance function clusters the data. Select the distance function that seems to generate the best clusters. Do this for several subsets of your data, to make sure that your intuition is correct. You should also try to understand the result that each distance function gives you.
Lastly, some problems lend themselves to a particular distance function. If your problem has spatial features, then a Euclidean (geometric) distance is often a natural choice. Other distance functions will perform better for different problems.
Distance values from different
distance functions
data sets
normalizations
are generally not comparable. Simple example from reality: measure distances in "meter" or in "inch", and you get very different results. The result in meters will not be better just because it is measured on a different scale. So you must not compare the variances of different results.
Notice that k-means is meant to be used with euclidean distance only, and may not converge with other distance functions. IMHO, L_p norms should be fine, and on TF-IDF maybe also cosine. But I do not know a proof for that.
Oh, and k-means works really bad with high-dimensional data. It is meant for low dimensionality.
As i've explained in a previous question: I have a dataset consisting of a large semi-random collection of points in three dimensional euclidian space. In this collection of points, i am trying to find the point that is closest to the area with the highest density of points.
As high performance mark answered;
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested i use triplequad for this, but this is not a function i am familiar with, or understand very well. Does anyone have any pointers on how i could go about using this function in Matlab for what i am trying to do?
For example, say i have a random normally distributed matrix A = randn([300,300,300]), how could i use triplequad to find the point i am looking for? Because as i understand currently, i also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10)
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3))) = counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3)))+1
end
will count up the number of points in each of the voxels in counts.
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.
Please can you help me understand how to calculate the Mean-Squared Displacement for a single particle moving randomly within a given period of time. I have read a lot of articles on this (including Saxton,1991,Single-Particle Tracking: The Distribution of Diffusion Coefficients), but still confused (not getting the right answer).
Let me start by showing you how I do it and please correct me if I'm wrong:
The way I'm doing it is as follows:
1.Run the program from t=0 to t=100
2.Calculate the displacement, (s(t)-s(t+tau)), at each timestep (ie. at t=1,2,3,...100) and store it in a vector
3.Square the answer to number 2
4.find the mean to the answer of 3
In essence, this is what I'm doing in Matlab
%Initialise the lattice with a square consisting of 16 nonzero lattice sites then proceed %as follows to calculate the MSD:
for t=1:tend
% Allow the particle to move randomly in the lattice. Then do the following
[row,col]=find(lattice>0);
centroid=mean([row col]);
xvec=[xvec centroid(2)];
yvec=[yvec centroid(1)];
k=length(xvec)-1; % Time
dt=1;
diffx = xvec(1:k) - xvec((1+dt):(k+dt));
diffy = yvec(1:k) - yvec((1+dt):(k+dt));
xsquare = diffx.^2;
ysquare = diffy.^2;
MSD=mean(xsquare+ysquare);
end
I'm trying to find the MSD in order to compute the diffusion co-efficient. Note that I'm modelling a collection of lattice sites (16) to represent a single particle (more biologically realistic), instead of just one. I have been brief with the comment within the for loop as it is quite long, but I'm happy to send it to you.
So far, I'm getting very small MSD values (in the range of 0.001-1), whereas I'm supposed to get values in the range of (10-50). The particle moves very large distances so surely my range of 0.001-1 cannot be right!
This is an extract from the article which I'm trying to reproduce their figure:
" We began by running some simulations in 1D for a single
cell. We allowed the cell to move for a given number of
Monte Carlo time steps (MCS), worked out the mean square
distance traveled in that time, repeated this process 500
times, and evaluate the mean squared distance for this t.
We then repeated this process ten times to get the mean of
. The reason for this choice of repetitions was to
keep the time required to run the simulations within a reasonable
level yet ensuring that the standard deviation of the
mean was relatively small (<7%)".
You can access the article here "From discrete to a continuous model of biological cell movement, 2004, by Turner et al., Physical Review E".
Any hints are greatly appreciated.
How many dimensions does the particle move along ?
I don't have Matlab right now, but here is how I'd do that over one dimension :
% pos is the vector of positions
delta = pos(2:100) - pos(1:99);
meanSquared = mean(delta .* delta);
First of all, why have a particle cover multiple lattice sites? What counts for MSD, in the end, is the displacement of the centroid, which can be represented as a point. If your particle (or cell) is large, or only takes large steps, you can always just make a wider grid. Also, if you're trying to reproduce a figure from somewhere else, you should really use the same algorithm.
For your Monte Carlo simulation, what do you do? If all you really want is get a displacement, you can generate a bunch of random movement vectors in one go (using rand or randi), and use cumsum to calculate the positions. Also, have you plotted your random walks to make sure the data is sensible?
Then, your code looks a bit funny (see comments). Why don't you just use the code provided in this answer to calculate MSD from the positions?
for t=1:tend
% Allow the particle to move randomly in the lattice. Then do the following
[row,col]=find(lattice>0); %# what do you do this for?
centroid=mean([row col]);
xvec=[xvec centroid(2)];
yvec=[yvec centroid(1)]; %# till here, I have no idea what you want to do
k=length(xvec)-1; % Time %# you should subtract dt here
dt=1; %# dt should depend on t!
diffx = xvec(1:k) - xvec((1+dt):(k+dt));
diffy = yvec(1:k) - yvec((1+dt):(k+dt));
xsquare = diffx.^2;
ysquare = diffy.^2;
MSD=mean(xsquare+ysquare);
end