Using the TSP package in R to calculate the shortest distance through multiple points without forming a closed loop

I have multiple latitude/longitude points for which I have generated a distance matrix. I use the TSP package to calculate the minimum distance connecting all the points, and I have added a dummy city because I don't want a closed loop. This works fine for a small number of data points; however, when there are more than 20 data points, I get a different distance every time I rerun the code, because the order in which the points are connected differs from run to run. I have also tried keeping the starting and ending cities constant, but the issue persists.
tsp=TSP(dat4) ## dat4 is the matrix of distance between the cities
tsp1=insert_dummy(tsp,label="cut")#create a dummy variable to avoid forming closed loop
tour1=solve_TSP(tsp1,method = "repetitive_nn",label="cut")
dis=tour_length(tour1)
plot(dat4)
lines(dat4[cut_tour(tour1,cut="cut"),])

Related

Nearest neighbor analysis that provides me a distribution of adjacent values?

I am trying to perform a nearest neighbor type of analysis on an array based on a given set of (x,y) coordinate indexes (NN.HEME_indices). The array on which I am evaluating is a 142x128 double that has a number of positive values throughout (ATV_ng_g_raw). In this work, I’m trying to limit this nearest neighbor analysis to within 10 pixels of the indexes (including 0 and 10).
My goal is to retrieve two histograms from this analysis. One graph is where the x-axis is the number of pixels away from the indexes (from 0 to 10, inclusive) and the y-axis is the average of the non-zero numbers at each distance. The second histogram is the same but the y-axis is the average of all numbers (including the 0’s) at each distance.
I’ve placed a .mat file (“NN_04062021.mat”) here. This includes:
“NN” data structure – contains the coordinate indices that serve as the central points (HEME_indices; 2046 x 2 double). This was derived from a separate 142x128 double.
ATV_ng_g_raw – contains the array on which I would be performing the nearest neighbor analysis (142x128 double).
Anyone have any thoughts? Happy to clarify further or provide additional workspace variables if needed. Thank you!
yname='yname';
xname='xname';
indexname='raw_index';
[NN.yname,NN.xname]=find(ATV_ng_g_raw);
NN.(indexname)=[NN.(xname),NN.(yname)];
nnidxname='HEME_raw_idx';
nndistname='HEME_raw_dist';
[NN.(nnidxname),NN.(nndistname)]=knnsearch(NN.HEME_indices,NN.(indexname)); %knnsearch provides counts vs actual values in the ATV_ng_g_raw array
figure,histogram(NN.(nndistname),'BinMethod','integers');
ylabel('Counts of HEME+ voxels');
xlabel('Distance to ARVs (voxels)');
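One possible way to get the two averages described in the question is sketched below. It is not a tested solution for the real data: the small array A and the single centre point stand in for ATV_ng_g_raw and NN.HEME_indices, the centre coordinates are assumed to be stored as [x, y] (column, row), and distances are rounded to whole pixels. It reuses knnsearch from the code above, but queries every pixel of the array rather than only the non-zero ones, so that zeros are available for the second average.

```matlab
% Synthetic stand-ins (assumptions): A plays the role of ATV_ng_g_raw,
% centers plays the role of NN.HEME_indices with columns [x, y].
A = zeros(5, 5);
A(3, 3) = 4;
A(3, 4) = 2;
centers = [3 3];
maxDist = 10;                          % analysis limited to 0..10 pixels

% Coordinates of every pixel, in the same column-major order as A(:).
[X, Y] = meshgrid(1:size(A, 2), 1:size(A, 1));
[~, d] = knnsearch(centers, [X(:), Y(:)]);   % distance to nearest centre
d = round(d);                          % bin distances to whole pixels
vals = A(:);

% Keep only pixels within maxDist and turn distance d into bin index d+1.
keep = d <= maxDist;
bins = d(keep) + 1;
v = vals(keep);

% Mean of all values (zeros included) at each distance; empty bins give 0.
meanAll = accumarray(bins, v, [maxDist + 1, 1], @mean);

% Mean of the non-zero values only at each distance.
nz = v > 0;
meanNonzero = accumarray(bins(nz), v(nz), [maxDist + 1, 1], @mean);
```

Something like bar(0:maxDist, meanAll) and bar(0:maxDist, meanNonzero) would then draw the two plots. Whether rounding is the right way to bin the distances is a judgment call that depends on the data.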

Understanding 3D distance outputs in matlab

Being neither great at math nor coding, I am trying to understand the output I am getting when I try to calculate the linear distance between pairs of 3D points. Essentially, I have the 3D points of a bird that is moving in a confined area towards a stationary reward. I would like to calculate the distance of the animal to the reward at each point. However, when looking online for the best way to do this, I tried several options and get different results that I'm not sure how to interpret.
Example data:
reward = [[0.381605200000000,6.00214980000000,0.596942400000000]];
animal_path = [2.08638710671220,-1.06496059617432,0.774253689976102;2.06262715454806,-1.01019576900787,0.773933446776898;2.03912411242035,-0.954888684677576,0.773408777383975;2.01583648760496,-0.898935333316342,0.772602855030873];
distance1 = sqrt(sum(([animal_path]-[reward]).^2));
distance2 = norm(animal_path - reward);
distance3 = pdist2(animal_path, reward);
Distance 1 gives 3.33919107083497 13.9693378592353 0.353216791787775
Distance 2 gives 14.3672145652704
Distance 3 gives 7.27198528565078
7.21319284516199
7.15394253573951
7.09412041863743
Why do these all yield different values (and different numbers of values)? Distance 3 seems to make the most sense for my purposes, even though the values are too large for the dimensions of the animal enclosure, which should be something like 3 or 4 meters.
Can someone please explain this in simple terms and/or point me to something less technical and jargon-y than the Matlab pages?
There are many things mathematicians call distance. What you normally associate with distance is the Euclidean distance: the length of the straight line between two points. This is what you want in this situation. The Euclidean distance is also called the norm (or 2-norm) of the difference between the two points.
For two points you can use the norm function, which means that with distance2 you are already close to a solution. The only problem is that you pass in all your points at once. That does not calculate the distance for each point; instead it calculates the norm of the whole matrix, which is of no interest to you. So you have to call norm once for each row (each point on the path):
k = nan(size(animal_path,1),1); % one distance per point on the path
for p = 1:size(animal_path,1)
    k(p) = norm(animal_path(p,:) - reward); % distance from point p to the reward
end
Alternatively, you can follow the idea you had in distance1. The only mistake you made there is that you calculated the sum over each column, where the sum over each row was needed. That is why you got three values (one per coordinate axis) instead of one per point. The fix is simple: you control the direction with the second input argument of sum:
distance1 = sqrt(sum((animal_path-reward).^2,2))
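As a quick check of the explanation above, the following sketch computes the row-wise distance two ways and also reproduces the column-wise result from the question. It assumes MATLAB R2016b or later for the implicit expansion in animal_path - reward, and R2017b or later for vecnorm; the variable names d_rows, d_vecnorm and d_cols are only for illustration.

```matlab
reward = [0.381605200000000, 6.00214980000000, 0.596942400000000];
animal_path = [2.08638710671220, -1.06496059617432,  0.774253689976102;
               2.06262715454806, -1.01019576900787,  0.773933446776898;
               2.03912411242035, -0.954888684677576, 0.773408777383975;
               2.01583648760496, -0.898935333316342, 0.772602855030873];

% Row-wise: one Euclidean distance per point on the path (4x1 result).
d_rows = sqrt(sum((animal_path - reward).^2, 2));

% Same thing using vecnorm: 2-norm taken along dimension 2 (across columns).
d_vecnorm = vecnorm(animal_path - reward, 2, 2);

% Column-wise sum, as in the original distance1: one value per axis (1x3).
d_cols = sqrt(sum((animal_path - reward).^2, 1));
```

The row-wise results agree with the pdist2 output in the question (7.27, 7.21, 7.15, 7.09), so distance3 was already the per-point Euclidean distance. If those values look too large for the enclosure, the units or the coordinate origin of the data are worth checking.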

Controlled random number/dataset generation in MATLAB

Say, I have a cube of dimensions 1x1x1 spanning between coordinates (0,0,0) and (1,1,1). I want to generate a random set of points (assume 10 points) within this cube which are somewhat uniformly distributed (i.e. within certain minimum and maximum distance from each other and also not too close to the boundaries). How do I go about this without using loops? If this is not possible using vector/matrix operations then the solution with loops will also do.
Let me provide some more background details about my problem (This will help in terms of what I exactly need and why). I want to integrate a function, F(x,y,z), inside a polyhedron. I want to do it numerically as follows:
$\int_V F(x,y,z)\,dV \approx \sum_{i} F(x_i,y_i,z_i) \times V_i(x_i,y_i,z_i)$
Here, $F(x_i,y_i,z_i)$ is the value of the function at point $(x_i,y_i,z_i)$ and $V_i$ is the weight. To calculate the integral accurately, I need to identify a set of random points that are neither too close to nor too far from each other. (Sorry, but I don't yet know what this range is; I will only be able to figure it out with a parametric study once I have working code.) Also, I need to do this for a 3D mesh that has multiple polyhedra, so I want to avoid loops to speed things up.
Check out this nice random vectors generator with fixed sum FEX file.
The code "generates m random n-element column vectors of values, [x1;x2;...;xn], each with a fixed sum, s, and subject to a restriction a<=xi<=b. The vectors are randomly and uniformly distributed in the n-1 dimensional space of solutions. This is accomplished by decomposing that space into a number of different types of simplexes (the many-dimensional generalizations of line segments, triangles, and tetrahedra.) The 'rand' function is used to distribute vectors within each simplex uniformly, and further calls on 'rand' serve to select different types of simplexes with probabilities proportional to their respective n-1 dimensional volumes. This algorithm does not perform any rejection of solutions - all are generated so as to already fit within the prescribed hypercube."
Use P = rand(3,10), where each column corresponds to one point and each row corresponds to the coordinate along one axis (x, y, z).
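Plain rand does not enforce any spacing between points or any margin from the cube faces. One simple way to add both is rejection sampling, sketched below with a loop (which the question allows as a fallback). The values of margin, dMin and maxTries are assumptions chosen for illustration, only a minimum distance is enforced (no maximum), and points are stored one per row rather than one per column.

```matlab
rng(0);                 % fixed seed so the sketch is repeatable
nPoints  = 10;
margin   = 0.1;         % assumed minimum distance from each cube face
dMin     = 0.2;         % assumed minimum distance between any two points
maxTries = 10000;       % safety cap so the loop always terminates

pts = zeros(0, 3);      % accepted points, one per row
tries = 0;
while size(pts, 1) < nPoints && tries < maxTries
    tries = tries + 1;
    % Candidate drawn uniformly from the shrunken cube [margin, 1-margin]^3.
    candidate = margin + (1 - 2*margin) * rand(1, 3);
    % Distances from the candidate to every point accepted so far.
    dists = sqrt(sum((pts - candidate).^2, 2));
    if all(dists >= dMin)   % all([]) is true, so the first point is accepted
        pts = [pts; candidate]; %#ok<AGROW>
    end
end
```

If the loop hits maxTries before collecting nPoints, the constraints are probably too tight for the cube and pts will hold fewer rows than requested.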

Using triplequad to calculate density (in Matlab)

As I've explained in a previous question: I have a dataset consisting of a large semi-random collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is closest to the area with the highest density of points.
As High Performance Mark answered:
the most straightforward thing to do would be to divide your subset of
Euclidean space into lots of little unit volumes (voxels) and count
how many points there are in each one. The voxel with the most points
is where the density of points is at its highest. Perhaps initially
dividing your space into 2 x 2 x 2 voxels, then choosing the voxel
with most points and sub-dividing that in turn until your criteria are
satisfied.
Mark suggested I use triplequad for this, but it is not a function I am familiar with or understand very well. Does anyone have any pointers on how I could go about using this function in Matlab for what I am trying to do?
For example, say I have a random normally distributed matrix A = randn([300,300,300]); how could I use triplequad to find the point I am looking for? As I currently understand it, I also have to provide triplequad with a function fun when using it. Which function should that be for this problem?
Here's an answer which doesn't use triplequad.
For the purposes of exposition I define an array of data like this:
A = rand([30,3])*10;
which gives me 30 points uniformly distributed in the box (0:10,0:10,0:10). Note that in this explanation a point in 3D space is represented by each row in A. Now define a 3D array for the counts of points in each voxel:
counts = zeros(10,10,10);
Here I've chosen to have a 10x10x10 array of voxels, but this is just for convenience, it would be only a little more difficult to have chosen some other number of voxels in each dimension, and there don't have to be the same number of voxels along each axis. Then the code
for ix = 1:size(A,1)
    % voxel subscripts of point ix; the trailing semicolon suppresses printing counts on every pass
    counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3))) = counts(ceil(A(ix,1)),ceil(A(ix,2)),ceil(A(ix,3)))+1;
end
will count up the number of points in each of the voxels in counts.
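The same counting can be done without a loop, and the densest voxel can then be used to pick a data point. The sketch below is one way to do that under the same assumptions as above (coordinates in the box from 0 to 10, unit voxels); taking the point nearest to the centre of the fullest voxel is an illustrative choice, not the only reasonable one, and ties between equally full voxels are resolved by whichever max returns first.

```matlab
rng(1);                          % fixed seed so the sketch is repeatable
A = rand(30, 3) * 10;            % 30 points, one per row, in the 0..10 box
nVox = 10;                       % voxels per axis (unit-sized here)

% Voxel subscripts for every point; max() guards against a coordinate of 0.
subs = max(ceil(A), 1);

% Count the points falling in each voxel in one call.
counts = accumarray(subs, 1, [nVox nVox nVox]);

% Locate the fullest voxel and convert its linear index to subscripts.
[maxCount, linearIdx] = max(counts(:));
[ix, iy, iz] = ind2sub(size(counts), linearIdx);

% Centre of that voxel, and the data point closest to it.
voxelCentre = [ix, iy, iz] - 0.5;
[~, nearestRow] = min(sum((A - voxelCentre).^2, 2));
closestPoint = A(nearestRow, :);
```

With only 30 points in 1000 voxels most counts are 0 or 1, so a coarser grid (or the recursive subdivision described in the quoted answer) would give a more meaningful density estimate.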
EDIT
Unfortunately I have to do some work this afternoon and won't be able to get back to wrestling with the triplequad solution until later. Hope this is OK in the meantime.

Mean-Squared Displacement (MATLAB)

Please can you help me understand how to calculate the Mean-Squared Displacement for a single particle moving randomly within a given period of time? I have read a lot of articles on this (including Saxton, 1991, Single-Particle Tracking: The Distribution of Diffusion Coefficients), but I am still confused (not getting the right answer).
Let me start by showing you how I do it, and please correct me if I'm wrong:
1. Run the program from t=0 to t=100.
2. Calculate the displacement, (s(t)-s(t+tau)), at each timestep (i.e. at t=1,2,3,...,100) and store it in a vector.
3. Square the answer to step 2.
4. Find the mean of the answer to step 3.
In essence, this is what I'm doing in Matlab
% Initialise the lattice with a square consisting of 16 nonzero lattice sites, then proceed
% as follows to calculate the MSD:
for t=1:tend
% Allow the particle to move randomly in the lattice. Then do the following
[row,col]=find(lattice>0);
centroid=mean([row col]);
xvec=[xvec centroid(2)];
yvec=[yvec centroid(1)];
k=length(xvec)-1; % Time
dt=1;
diffx = xvec(1:k) - xvec((1+dt):(k+dt));
diffy = yvec(1:k) - yvec((1+dt):(k+dt));
xsquare = diffx.^2;
ysquare = diffy.^2;
MSD=mean(xsquare+ysquare);
end
I'm trying to find the MSD in order to compute the diffusion coefficient. Note that I'm modelling a collection of lattice sites (16) to represent a single particle (more biologically realistic), instead of just one site. I have kept the comment within the for loop brief because the full code is quite long, but I'm happy to send it to you.
So far, I'm getting very small MSD values (in the range of 0.001-1), whereas I'm supposed to get values in the range of (10-50). The particle moves very large distances so surely my range of 0.001-1 cannot be right!
This is an extract from the article which I'm trying to reproduce their figure:
" We began by running some simulations in 1D for a single
cell. We allowed the cell to move for a given number of
Monte Carlo time steps (MCS), worked out the mean square
distance traveled in that time, repeated this process 500
times, and evaluate the mean squared distance for this t.
We then repeated this process ten times to get the mean of
. The reason for this choice of repetitions was to
keep the time required to run the simulations within a reasonable
level yet ensuring that the standard deviation of the
mean was relatively small (<7%)".
You can access the article here "From discrete to a continuous model of biological cell movement, 2004, by Turner et al., Physical Review E".
Any hints are greatly appreciated.
How many dimensions does the particle move along?
I don't have Matlab right now, but here is how I'd do that over one dimension:
% pos is the vector of positions
delta = pos(2:100) - pos(1:99);
meanSquared = mean(delta .* delta);
First of all, why have a particle cover multiple lattice sites? What counts for MSD, in the end, is the displacement of the centroid, which can be represented as a point. If your particle (or cell) is large, or only takes large steps, you can always just make a wider grid. Also, if you're trying to reproduce a figure from somewhere else, you should really use the same algorithm.
For your Monte Carlo simulation, what do you do? If all you really want is get a displacement, you can generate a bunch of random movement vectors in one go (using rand or randi), and use cumsum to calculate the positions. Also, have you plotted your random walks to make sure the data is sensible?
Then, your code looks a bit funny (see comments). Why don't you just use the code provided in this answer to calculate MSD from the positions?
for t=1:tend
% Allow the particle to move randomly in the lattice. Then do the following
[row,col]=find(lattice>0); %# what do you do this for?
centroid=mean([row col]);
xvec=[xvec centroid(2)];
yvec=[yvec centroid(1)]; %# till here, I have no idea what you want to do
k=length(xvec)-1; % Time %# you should subtract dt here
dt=1; %# dt should depend on t!
diffx = xvec(1:k) - xvec((1+dt):(k+dt));
diffy = yvec(1:k) - yvec((1+dt):(k+dt));
xsquare = diffx.^2;
ysquare = diffy.^2;
MSD=mean(xsquare+ysquare);
end
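To illustrate the two suggestions in this answer (build the walk with randi and cumsum, and let the time lag vary), here is a minimal sketch. It is not the algorithm from the paper: it assumes a point particle on a 2D lattice that moves by +1 or -1 along each axis at every step, and the names nSteps, maxLag and msd are only for illustration.

```matlab
rng(2);                                   % fixed seed so the sketch is repeatable
nSteps = 1000;

% Each row is one move: +1 or -1 along x and along y.
steps = 2 * randi([0 1], nSteps, 2) - 1;

% Positions are the running sum of the moves, starting at the origin.
pos = [0 0; cumsum(steps, 1)];

% MSD as a function of the lag tau, averaged over all start times.
maxLag = 50;
msd = zeros(maxLag, 1);
for tau = 1:maxLag
    disp2 = sum((pos(1+tau:end, :) - pos(1:end-tau, :)).^2, 2);
    msd(tau) = mean(disp2);
end
```

For this walk every single step has squared length 2, so msd(1) is exactly 2, and the curve should grow roughly linearly with tau. In two dimensions the MSD of free diffusion is 4*D*tau, so a diffusion coefficient could be estimated from the slope of msd against tau divided by 4. Calling plot(pos(:,1), pos(:,2)) is a quick way to check that the walk looks sensible.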