Efficient way to compute coincidence of space and time - MATLAB

Given a 3-dimensional polyline P = {(x1, y1, t1), ..., (xn, yn, tn)} and another polyline Q = {(x1, y1, t1), ..., (xm, ym, tm)} (m not necessarily equal to n, so the polylines can have different lengths), coincidence in space and time occurs when the trajectories of the moving objects P and Q have some position and time in common. Point A, as seen in the example figure, is a coincidence point because (xa, ya, ta) == (xb, yb, tb); note that a coincidence point can lie outside the initial sets of points.
The concept is quite simple, and from a visual perspective it is easy to identify where colocation happens. The hard part is designing an algorithm that efficiently computes coincidences and returns the calculated x, y coordinates and time t where colocation happens (remember: the point can lie outside the given sets of points). The algorithm will be developed in MATLAB, so I have everything necessary to work quickly.
Best regards

Assuming that x, y, z are functions of t for all segments of each polyline, here's a brute-force start in 4 dimensions: P has segments p1 from (x_start(t), y_start(t), z_start(t), t) to (x_end(t), y_end(t), z_end(t), t), and similarly for Q.
for each segment p of P
    for each segment q of Q
        if p intersects q (in 4 dimensions)
            output intersection point
the intersection condition is:
there exist alpha and beta in [0, 1] such that alpha * px_start + (1 - alpha) * px_end = beta * qx_start + (1 - beta) * qx_end, plus two more similar conditions for y and z
Whether the intersection condition can be solved depends on what the functions x(t), y(t), z(t) are -- linear? polynomials? etc.
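If the motion is linear in t along every segment (the simplest case), the brute force above can be written directly in MATLAB. Here is a minimal sketch under that assumption for the 2D-plus-time version of the question; P and Q are n-by-3 and m-by-3 matrices of [x y t] rows with strictly increasing timestamps, and the function name and tolerance are my own choices:
function C = coincidences(P, Q)
% C: rows of [x y t] where the two trajectories coincide in space and time.
C = zeros(0, 3);
for i = 1:size(P, 1) - 1
    for j = 1:size(Q, 1) - 1
        % Overlapping time window of the two segments.
        t0 = max(P(i, 3), Q(j, 3));
        t1 = min(P(i+1, 3), Q(j+1, 3));
        if t0 >= t1, continue; end
        % Positions of both objects at t0 and t1 (linear interpolation).
        a0 = interp1(P(i:i+1, 3), P(i:i+1, 1:2), t0);
        a1 = interp1(P(i:i+1, 3), P(i:i+1, 1:2), t1);
        b0 = interp1(Q(j:j+1, 3), Q(j:j+1, 1:2), t0);
        b1 = interp1(Q(j:j+1, 3), Q(j:j+1, 1:2), t1);
        % Relative displacement d(s) = d0 + s*(d1 - d0), s in [0, 1].
        d0 = a0 - b0;
        d1 = a1 - b1;
        den = d0 - d1;
        if all(abs(den) > eps)
            s = d0 ./ den;   % per-coordinate zero crossing of d(s)
            % Coincidence: both coordinates vanish at the same s in [0, 1].
            if abs(s(1) - s(2)) < 1e-9 && s(1) >= 0 && s(1) <= 1
                t = t0 + s(1) * (t1 - t0);
                C(end+1, :) = [a0 + s(1) * (a1 - a0), t]; %#ok<AGROW>
            end
        end
    end
end
end
Degenerate cases (e.g. a coordinate difference that stays constant over the window) are filtered out by the all(abs(den) > eps) guard and would need separate handling in a full implementation.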

Related

How to generate n points and restrict the distance between them to be greater than a given value?

I would like to generate, randomly with uniform distribution, n points such that the distance between any two points is greater than some fixed value.
Here is the scenario that I have in MATLAB:
I have M red points which I generate randomly with uniform distribution as follows: the x-coordinates of the M red points are xr = rand(1, M) and the y-coordinates are yr = rand(1, M).
Also, I have M black points which I generate similarly as the red points, i.e., xb = rand(1, M) and yb = rand(1, M).
Then, I calculate the distances between all the points as follows:
x = [xr, xb];
y = [yr, yb];
D = sqrt(bsxfun(@minus, x, x').^2 + bsxfun(@minus, y, y').^2);
d = D(1:M, M + 1:end);
Now, I have to restrict the distance d to be always greater than some given value, say d0=0.5.
How to do this?
While such sampling (which is equivalent to generating non-overlapping circles) is discussed on Mathematica Stack Exchange, see https://mathematica.stackexchange.com/questions/2594/efficient-way-to-generate-random-points-with-a-predefined-lower-bound-on-their-p and https://mathematica.stackexchange.com/questions/69649/generate-nonoverlapping-random-circles, I would like to point out another potential solution which involves quasi-random numbers. For quasi-random Sobol sequences there is a statement that there is a minimum positive distance between points, which amounts to 0.5*sqrt(d)/N, where d is the dimension of the problem and N is the number of points sampled in the hypercube. Here is a paper from the man himself: http://www.sciencedirect.com/science/article/pii/S0378475406002382
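As a concrete sketch of both routes (my own illustration, not from the linked posts): a simple rejection-sampling loop enforcing the hard minimum distance d0, and, as suggested above, a Sobol set via sobolset from the Statistics Toolbox:
n = 50; d0 = 0.05;   % assumes d0 is feasible for n points in the unit square
% Rejection sampling: draw uniform candidates, accept one only if it is
% at least d0 away from every point accepted so far.
pts = zeros(n, 2);
k = 0;
while k < n
    c = rand(1, 2);
    if k == 0 || all(sqrt(sum(bsxfun(@minus, pts(1:k, :), c).^2, 2)) >= d0)
        k = k + 1;
        pts(k, :) = c;
    end
end
% Quasi-random alternative: 2-D Sobol points, which come with the minimum
% separation guarantee mentioned above.
qpts = net(sobolset(2), n);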

Cosine distance as vector distance function for k-means

I have a graph of N vertices where each vertex represents a place. I also have one vector per user, each with N coefficients, where the coefficient's value is the duration in seconds spent at the corresponding place, or 0 if that place was not visited.
E.g., for a graph of 5 vertices, the vector:
v1 = {100, 50, 0, 30, 0}
would mean that we spent:
100secs at vertex 1
50secs at vertex 2 and
30secs at vertex 4
(vertices 3 & 5 were not visited, thus the 0s).
I want to run a k-means clustering and I've chosen cosine_distance = 1 - cosine_similarity as the metric for the distances, where cosine_similarity(a, b) = dot(a, b) / (norm(a) * norm(b)).
But I noticed the following. Assume k=2 and one of the vectors is:
v1 = {90,0,0,0,0}
In the process of solving the optimization problem of minimizing the total distance from candidate centroids, assume that at some point, 2 candidate centroids are:
c1 = {90,90,90,90,90}
c2 = {1000, 1000, 1000, 1000, 1000}
Running the cosine_distance formula for (v1, c1) and (v1, c2) we get exactly the same distance of 0.5527864045 for both.
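A quick MATLAB check of that number (my own verification):
v1 = [90 0 0 0 0];
c1 = [90 90 90 90 90];
c2 = [1000 1000 1000 1000 1000];
cosdist = @(a, b) 1 - dot(a, b) / (norm(a) * norm(b));
cosdist(v1, c1)   % 0.5528
cosdist(v1, c2)   % 0.5528 -- identical, since c1 and c2 point the same way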
I would assume that v1 is more similar (closer) to c1 than c2. Apparently this is not the case.
Q1. Why is this assumption wrong?
Q2. Is the cosine distance a correct distance function for this case?
Q3. What would be a better one given the nature of the problem?
Let's divide cosine similarity into parts and see how and why it works.
Cosine between 2 vectors - a and b - is defined as:
cos(a, b) = sum(a .* b) / (norm(a) * norm(b))
where .* is an element-wise multiplication and norm() is the Euclidean vector length. The denominator is there just for normalization, so let's simply call it L. With it, our function turns into:
cos(a, b) = sum(a .* b) / L
which, in turn, may be rewritten as:
cos(a, b) = (a[1]*b[1] + a[2]*b[2] + ... + a[k]*b[k]) / L =
= a[1]*b[1]/L + a[2]*b[2]/L + ... + a[k]*b[k]/L
Let's get a bit more abstract and replace x * y / L with a function g(x, y) (L is a constant here, so we don't pass it as an argument). Our cosine function thus becomes:
cos(a, b) = g(a[1], b[1]) + g(a[2], b[2]) + ... + g(a[k], b[k])
That is, each pair of elements (a[i], b[i]) is treated separately, and the result is simply the sum of all treatments. And this is good for your case, because you don't want different pairs (different vertices) to interfere with each other: if user1 visited only vertex2 and user2 visited only vertex1, then they have nothing in common, and the similarity between them should be zero. What you actually don't like is how the similarity between individual pairs - i.e. the function g() - is calculated.
With the cosine function, the similarity between individual pairs looks like this:
g(x, y) = x * y / L
where x and y represent the time users spent at the vertex. And here's the main question: does multiplication represent the similarity between individual pairs well? I don't think so. A user who spent 90 seconds at some vertex should be close to a user who spent, say, 70 or 110 seconds there, but much farther from users who spent 1000 or 0 seconds there. Multiplication (even normalized by L) is totally misleading here. What would it even mean to multiply two time periods?
The good news is that you are the one who designs the similarity function. We have already decided that we are satisfied with the independent treatment of pairs (vertices), and we only want the individual similarity function g(x, y) to do something reasonable with its arguments. And what is a reasonable function for comparing time periods? I'd say subtraction is a good candidate:
g(x, y) = abs(x - y)
This is not a similarity function but a distance function - the closer the values are to each other, the smaller the result of g() - but eventually the idea is the same, so we can interchange the two when needed.
We may also want to increase the impact of large mismatches by squaring the difference:
g(x, y) = (x - y)^2
Hey! We've just reinvented (mean) squared error! We can now stick with MSE to calculate distance, or we can keep looking for a good g() function.
Sometimes we may want not to increase, but to smooth out the difference. In this case we can use log:
g(x, y) = log(abs(x - y))
We can use special treatment for zeros like this:
g(x, y) = sign(x)*sign(y)*abs(x - y)   % sign(0) turns the whole expression into 0
Or we can go back from distance to similarity by inversing the difference:
g(x, y) = 1 / abs(x - y)
Note that in the last few options we haven't used a normalization factor. In fact, you can come up with a good normalization for each case, or just omit it - normalization is not always needed or helpful. For example, in the cosine similarity formula, if you change the normalization constant L = norm(a) * norm(b) to L = 1, you will get different, but still reasonable, results. E.g., for any fixed vector x with positive elements:
cos([90, 90, 90], x) == cos([1000, 1000, 1000], x)                 % measuring angle only
cos_no_norm([90, 90, 90], x) < cos_no_norm([1000, 1000, 1000], x)  % measuring both - angle and magnitude
Summarizing this long and mostly boring story, I would suggest rewriting cosine similarity/distance to use some kind of difference between the variables of the two vectors.
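As a minimal sketch of that suggestion (my own illustration), here are two difference-based distances built from the g() candidates above, evaluated on the question's vectors -- both now rank c1 closer than c2:
v1 = [90 0 0 0 0];
c1 = [90 90 90 90 90];
c2 = [1000 1000 1000 1000 1000];
d_abs = @(a, b) sum(abs(a - b));   % g(x, y) = abs(x - y)
d_sq  = @(a, b) sum((a - b).^2);   % g(x, y) = (x - y)^2
[d_abs(v1, c1), d_abs(v1, c2)]     % 360 vs 4910
[d_sq(v1, c1),  d_sq(v1, c2)]      % 32400 vs 4828100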
Cosine similarity is meant for the case where you do not want to take length into account, but only the angle.
If you want to also include length, choose a different distance function.
Cosine distance is closely related to squared Euclidean distance (the only distance for which k-means is really defined); which is why spherical k-means works.
The relationship is quite simple:
squared Euclidean distance sum_i (x_i-y_i)^2 can be factored into sum_i x_i^2 + sum_i y_i^2 - 2 * sum_i x_i*y_i. If both vectors are normalized, i.e. length does not matter, then the first two terms are 1. In this case, squared Euclidean distance is 2 - 2 * cos(x,y)!
In other words: Cosine distance is squared Euclidean distance with the data normalized to unit length.
If you don't want to normalize your data, don't use Cosine.
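A quick numerical check of that relationship (my own sketch):
x = rand(5, 1); x = x / norm(x);   % two random unit-length vectors
y = rand(5, 1); y = y / norm(y);
sum((x - y).^2)                    % squared Euclidean distance
2 - 2 * dot(x, y)                  % 2 - 2*cos(x, y) -- the same value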
Q1. Why is this assumption wrong?
As we see from the definition, cosine similarity measures angle between 2 vectors.
In your case, vector v1 lies flat along the first dimension, while c1 and c2 are both equally aligned relative to the axes, and thus the cosine similarity has to be the same.
Note that the trouble lies with c1 and c2 pointing in the same direction: any v1 will have the same cosine similarity with both of them.
Q2. Is the cosine distance a correct distance function for this case?
As we see from the example in hand, probably not.
Q3. What would be a better one given the nature of the problem?
Consider Euclidean Distance.
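A quick check on the question's numbers (my own sketch):
v1 = [90 0 0 0 0];
c1 = [90 90 90 90 90];
c2 = [1000 1000 1000 1000 1000];
norm(v1 - c1)   % 180 -- v1 is much closer to c1 ...
norm(v1 - c2)   % about 2197.3 -- ... than to c2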

Least distance error between two lines

I have two plotted lines and I want to find the least-distance error between them. When I simply subtract one from the other, I get the error in the x-direction. But I am looking for the error measured as the least distance between the two lines.
Any help is greatly appreciated!
Best regards,
Gidi
With d = pdist2(L1, L2, 'euclidean', 'smallest', 1); you get a vector d with the distance from each point in L2 to its closest neighbor in L1. The shortest distance overall is then min(d).
I am assuming that L1 and L2 are n-by-2 and m-by-2 matrices, with n and m being the numbers of points (n and m are allowed to differ). From your comment I would guess you had not included the x-component. To fix that, you could write L1 = [y_n, u_new] and build L2 likewise from z, assuming y_n is the x-component. If y_n is a row vector, you should transpose it, as in L1 = [y_n', u_new].
If you wish to plot the minimum distance for each point and both lines, plot(y_n, [u_new, z, d]) should work. Again here, check the orientation of your vectors.
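For reference, here is a self-contained version with synthetic curves (my own sketch; pdist2 is from the Statistics Toolbox):
x  = linspace(0, 2*pi, 100)';                     % common x-axis
L1 = [x, sin(x)];                                 % first line, n-by-2
L2 = [x, sin(x) + 0.3*cos(3*x)];                  % second line, m-by-2
d  = pdist2(L1, L2, 'euclidean', 'smallest', 1);  % nearest distance per point
fprintf('least distance error: %g\n', min(d));
plot(x, L1(:, 2), x, L2(:, 2), x, d(:))
legend('line 1', 'line 2', 'distance to nearest point on line 1')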

Finding the first point of great circle Intersection

I have a problem I've been trying to solve and I cannot come up with the answer. I have written a function in Matlab which, given two lat/lon points and two bearings, will return the two great circle points of intersection.
However, what I really need is the first great circle point of intersection along the two initial headings. I.e. if two airplanes begin at lat/lon points 1 and 2, with initial bearings of bearing1 and bearing2, which of the two great circle intersection points is the first one they encounter? There are many solutions (using haversine) which will give me the closer of the two points, but I don't actually care about which is closer, I care about which I will encounter first given specific start points and headings. There are many cases where the closer of the two intersections is actually the second intersection encountered.
Now, I realize I could do this with lots of conditional statements for handling the different cases, but I figure there has to be a way to handle it through the order in which I take all my cross products (pseudocode given below), but I simply can't come up with the right solution! I should also mention that this function is going to be used in a large, computationally intensive model, so the solution to this problem needs to be rather elegant/speedy. Can anyone help me with this problem?
The following is not my function code (I can't list that here), but it is the pseudocode that my function was based on:
%Given inputs of lat1,lon1,Bearing1,lat2,lon2,Bearing2:
%Calculate arbitrary secondary point along same initial bearing from first
%point
dAngle = 45;
lat3 = asind( sind(lat1)*cosd(dAngle) + cosd(lat1)*sind(dAngle)*cosd(Bearing1));
lon3 = lon1 + atan2( sind(Bearing1)*sind(dAngle)*cosd(lat1), cosd(dAngle)-sind(lat1)*sind(lat3) )*180/pi;
lat4 = asind( sind(lat2)*cosd(dAngle) + cosd(lat2)*sind(dAngle)*cosd(Bearing2));
lon4 = lon2 + atan2( sind(Bearing2)*sind(dAngle)*cosd(lat2), cosd(dAngle)-sind(lat2)*sind(lat4) )*180/pi;
%% Calculate unit vectors
% We now have two points defining each of the two great circles. We need
% to calculate unit vectors from the center of the Earth to each of these
% points
[Uvec1(1),Uvec1(2),Uvec1(3)] = sph2cart(lon1*pi/180,lat1*pi/180,1);
[Uvec2(1),Uvec2(2),Uvec2(3)] = sph2cart(lon2*pi/180,lat2*pi/180,1);
[Uvec3(1),Uvec3(2),Uvec3(3)] = sph2cart(lon3*pi/180,lat3*pi/180,1);
[Uvec4(1),Uvec4(2),Uvec4(3)] = sph2cart(lon4*pi/180,lat4*pi/180,1);
%% Cross product
%Need to calculate the "plane normals" for each of the two great
%circles
N1 = cross(Uvec1,Uvec3);
N2 = cross(Uvec2,Uvec4);
%% Plane of intersecting line
%With two plane normals, the cross product defines their intersecting line
L = cross(N1,N2);
L = L./norm(L);
L2 = -L;   % antipodal intersection point (already unit length)
%% Convert normalized intersection line to geodetic coordinates
[lonRad,latRad,~]=cart2sph(L(1),L(2),L(3));
lonDeg = lonRad*180/pi;
latDeg = latRad*180/pi;
[lonRad,latRad,~]=cart2sph(L2(1),L2(2),L2(3));
lonDeg2 = lonRad*180/pi;
latDeg2 = latRad*180/pi;
UPDATE: A user on the Mathworks forums pointed this out:
Actually they might each reach a different point. I might have misunderstood the question, but the way you worded it suggests that both trajectories will converge towards the same point, which is not true. If you imagine the first point being a little after one intersection and the second point being a little after the other intersection, you have a situation where each "plane" will travel towards the intersection that was closest to the other plane at the beginning.
I didn't think about this. It is actually very possible that each of the planes reaches a different great circle intersection first. Which just made things much more complicated...
Presumably you have a velocity vector for the direction of the plane (the direction in which you want to look for your first intersection - the "bearing vector on the surface of the earth"). You didn't actually specify this. Let us call it v.
You also have the cartesian coordinates of two points, P1 and P2, as well as the initial position of the plane, P0. Each is assumed to be already on the unit sphere (length = 1).
Now we want to know which distance is shorter - to P1 or to P2 - so we need the angle "as seen from the direction of the normal vector". For this we need both the sin and the cos - then we can find the angle using atan2. We obtain the sin from the cross product (remember, all vectors were normalized) and the cos from the dot product. To get the sin "as seen from the right direction", we take the dot product with the normal vector.
Here is a piece of code that puts it all together - I am using very simple coordinates for the points and direction vector so I can confirm in my head that this gives the correct answer:
% sample points of P0...P2 and v
P0 = [1 0 0];
P1 = [0 1 0];
P2 = [0 -1 0];
v = [0 1 0];
% compute the "start direction normal":
n0 = cross(P0, v);
n0 = n0 / norm( n0 ); % unit vector
% compute cross and dot products:
cr01 = cross(P0, P1);
cr02 = cross(P0, P2);
cos01 = dot(P0, P1);
cos02 = dot(P0, P2);
% to get sin with correct sign, take dot product with direction normal:
sin01 = dot(cr01, n0);
sin02 = dot(cr02, n0);
% Note: since P0, P1 and P2 are all in the same plane,
% cr01 and cr02 point either in the same direction as n0
% or in the opposite direction. In the latter case we get a sign flip for
% the sin; in the former case this does nothing.
% Now get the angle, mapped from 0 to 2 pi:
ang1 = mod(atan2(sin01, cos01), 2*pi);
ang2 = mod(atan2(sin02, cos02), 2*pi);
if( ang1 < ang2 )
    fprintf(1,'point 1 is reached first\n');   % point 1 is the first one reached
else
    fprintf(1,'point 2 is reached first\n');   % point 2 is the first
end
When you change the direction of the velocity vector (pointing towards P2 instead of P1), the program correctly tells you "point 2 is reached first".
Let me know if this works for you!

Calculate distance from point p to high dimensional Gaussian (M, V)

I have a high dimensional Gaussian with mean M and covariance matrix V. I would like to calculate the distance from point p to M, taking V into consideration (I guess it's the distance in standard deviations of p from M?).
Phrased differently, I take the ellipsoid one sigma away from M, and would like to check whether p is inside that ellipsoid.
If V is a valid covariance matrix of a Gaussian, then it is symmetric positive definite and therefore defines a valid scalar product. Incidentally, so does inv(V).
Therefore, assuming that M and p are column vectors, you could define distances as:
d1 = sqrt((M-p)'*V*(M-p));
d2 = sqrt((M-p)'*inv(V)*(M-p));
In MATLAB style, one would rewrite d2 as (with probably some unnecessary parentheses):
d2 = sqrt((M-p)'*(V\(M-p)));
The nice thing is that when V is the identity matrix, then d1 == d2, and both correspond to the classical Euclidean distance. Finding out whether you have to use d1 or d2 is left as an exercise (sorry, part of my job is teaching). Write out the multi-dimensional Gaussian formula and compare it to the 1D case, since the 1D case is just a particular case of the multi-dimensional one (or perform some numerical experiment).
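As a quick numerical experiment along those lines (my own sketch): with V equal to the identity matrix, both definitions collapse to the Euclidean distance:
k = 3;
M = randn(k, 1); p = randn(k, 1); V = eye(k);
d1 = sqrt((M - p)' * V * (M - p));
d2 = sqrt((M - p)' * (V \ (M - p)));
[d1, d2, norm(M - p)]   % all three identical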
NB: in very high dimensional spaces, or when testing very many points, you might find a cleverer / faster way using the eigenvectors and eigenvalues of V (i.e. the principal axes of the ellipsoid and their corresponding variances).
Hope this helps.
A.
Consider computing the probability of the point given the normal distribution:
M = [1 -1]; %# mean vector
V = [.9 .4; .4 .3]; %# covariance matrix
p = [0.5 -1.5]; %# 2d-point
prob = mvnpdf(p,M,V); %# probability P(p|mu,cov)
The function MVNPDF is provided by the Statistics Toolbox
Maybe I'm totally off, but isn't this the same as just asking for each dimension: Am I inside the sigma?
PSEUDOCODE:
foreach(dimension d)
    (M(d) - sigma(d) < p(d) < M(d) + sigma(d)) ?
Because you want to know whether p is inside your Gaussian in every dimension. So actually this is just a spatial problem, and your Gaussian doesn't have anything to do with it anymore (except for M and sigma, which are just distances).
In MATLAB you could try something like:
all(M - sigma < p & p < M + sigma)   % chained comparisons like a < b < c do not do what you expect in MATLAB
A distance to that place would be the Euclidean distance between M and p; in MATLAB, norm computes it:
norm(M - p)
Because M is just a point in space and p as well. Just 2 vectors.
And now the final one. You want to know the distance in the form of sigmas:
% create a distance vector and divide it element-wise by sigma
(M - p) ./ sigma
I think that will do the trick.
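A consolidated sketch of this per-dimension check (my own illustration, reusing the example numbers from the mvnpdf answer above and taking sigma = sqrt(diag(V)); note that it ignores the off-diagonal covariance):
M = [1 -1];  V = [.9 .4; .4 .3];  p = [0.5 -1.5];
sigma  = sqrt(diag(V))';            % per-dimension standard deviations
inside = all(abs(p - M) < sigma)    % 1: p is within one sigma in every dimension
zdist  = (M - p) ./ sigma           % distance in sigmas, per dimension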