Calculating the "distance" between two two-dimensional data series - matlab

I have two datasets (tracks) with points in x/y which represent GPS positions. I want to analyze the distance between both tracks. The points are not necessarily in sync, but they are sampled at the same frequency, as shown in this little excerpt (each track consists of 1000+ points):
Example Picture
Because the tracks are not in sync, I can't just compare the two points which are closest to each other. And since the paths are not exactly the same, I can't sync the tracks. One solution might be to interpolate a curve for each dataset and then calculate the integral in between. Since the tracks are much longer than shown in the example, I can't just use regression functions like polyfit.
How can this be done, or are there other/better strategies for analyzing (mean/mean square...) the distance?

am304's answer is by far the easiest, and probably the way to go.
However, I'd like to add a few other ways to do this, which are much more complicated, but could greatly enhance accuracy depending on your use case.
And if it's not for you, then it could be useful for anyone else passing by.
Method 1
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks
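Determine the B-spline representation for both tracks. You then have a parametric relation for both tracks:
$$\mathbf{r}_1(t) = \bigl(x_1(t),\, y_1(t)\bigr), \qquad \mathbf{r}_2(t) = \bigl(x_2(t),\, y_2(t)\bigr)$$
The distance between both tracks is then the average of the function
$$d(t) = \sqrt{\bigl(x_1(t) - x_2(t)\bigr)^2 + \bigl(y_1(t) - y_2(t)\bigr)^2}$$
for all applicable t, which is computed through the following integral:
$$\bar{d} = \frac{1}{t_{\max} - t_{\min}} \int_{t_{\min}}^{t_{\max}} d(t)\, dt$$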
Method 2
Pros: closest to the "physics" of the situation
Cons: hard to get right, specific to the situation and thus non-reusable
1. Use the equations of motion of whatever was following that track to derive a transition matrix for any arbitrary time step t. When possible, also come up with an appropriate noise model.
2. Use a Kalman filter to re-sample both tracks to some equally-spaced time vector, which is preferably different from the time vector of both track 1 and track 2.
3. Compute the distances between the x,y pairs thus computed, and take the average.
Method 3
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Both fits are biased.
1. Fit a space curve through track 1.
2. Compute the distances of all points in track 2 to this space curve.
3. Repeat 1 and 2, but vice versa.
4. Take the average of all these distances.
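The steps above can be sketched roughly as follows. This is a minimal Python sketch (not MATLAB) under a simplifying assumption: instead of a fitted smooth space curve, the "curve" is just the polyline through the track's own points. The function names are made up for illustration.

```python
import math

def point_segment_distance(p, a, b):
    """Shortest planar distance from point p to the segment a-b."""
    ax, ay = a
    bx, by = b
    px, py = p
    dx, dy = bx - ax, by - ay
    seg_len2 = dx * dx + dy * dy
    if seg_len2 == 0.0:  # degenerate segment: a and b coincide
        return math.hypot(px - ax, py - ay)
    # Projection parameter of p onto the line through a and b, clamped to the segment.
    t = ((px - ax) * dx + (py - ay) * dy) / seg_len2
    t = max(0.0, min(1.0, t))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))

def distances_to_polyline(points, polyline):
    """Distance from every point to the nearest segment of the polyline."""
    segments = list(zip(polyline[:-1], polyline[1:]))
    return [min(point_segment_distance(p, a, b) for a, b in segments)
            for p in points]

def mean_track_distance(track1, track2):
    """Average of the point-to-curve distances taken in both directions."""
    d = distances_to_polyline(track2, track1) + distances_to_polyline(track1, track2)
    return sum(d) / len(d)

# Two roughly parallel tracks whose samples are not aligned with each other.
track1 = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0), (3.0, 0.0)]
track2 = [(0.5, 1.0), (1.5, 1.0), (2.5, 1.0)]
print(mean_track_distance(track1, track2))
```

The brute-force search checks every point against every segment, which should still be manageable for tracks of a few thousand points; a smoothed curve could be swapped in for the raw polyline without changing the rest.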
Method 4
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Fit will be of lesser quality due to inherently larger noise terms.
1. Fit a space curve to the union of both tracks. That is, treat points from track 1 and track 2 as a single data set, through which to fit a space curve.
2. Compute the perpendicular residuals of both tracks with respect to this space curve.
3. Compute the average of all these distances.
Remarks
Note that all methods here use the flat-Earth assumption. If the tracks are truly long and cover a non-negligible portion of the Earth's surface, you'll have to compute distances via the Haversine formula rather than a mere Pythagorean root. The Kalman filter is less sensitive to this, provided your equations of motion take care of a spherical Earth.
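For reference, the haversine formula mentioned above can be written as a minimal Python sketch (not MATLAB). It assumes a perfectly spherical Earth with a mean radius of 6371 km; both the constant and the function name are choices made for this example.

```python
import math

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius (spherical model)

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    # min() guards against rounding pushing sqrt(a) marginally above 1.
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))

# A quarter of the equator, i.e. a quarter of the sphere's circumference.
print(haversine_km(0.0, 0.0, 0.0, 90.0))
```

This would replace the Pythagorean distance wherever two GPS positions are compared in the methods above.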
If you have an elevation model of the region of interest, use that. Depending on the area, you'd be surprised how much of a difference that makes compared to a smooth Earth.

Is the x/y data logged as a function of time? If so, you can resample one or both datasets to have the same sample time vector using the resample function for timeseries. You'll have to convert your data to a timeseries object first, but it's worth it. Once both data sets are resampled to the same time vector, you simply subtract one from the other.
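To illustrate the idea outside MATLAB, here is a minimal Python sketch that resamples two tracks onto a shared time vector with plain linear interpolation and then takes the pointwise distance. The (t, x, y) tuple layout, the function names and the choice of linear interpolation are assumptions made for this example; it does not reproduce the timeseries resample function.

```python
import math
from bisect import bisect_right

def interp(t, ts, vs):
    """Linear interpolation of samples (ts, vs) at time t; ts strictly increasing."""
    t = min(max(t, ts[0]), ts[-1])             # clamp to the sampled interval
    i = min(bisect_right(ts, t), len(ts) - 1)  # index of the right-hand neighbour
    t0, t1 = ts[i - 1], ts[i]
    w = (t - t0) / (t1 - t0)
    return vs[i - 1] + w * (vs[i] - vs[i - 1])

def track_distances(track1, track2, n=200):
    """Resample both (t, x, y) tracks on a common time vector; return distances."""
    t1, x1, y1 = zip(*track1)
    t2, x2, y2 = zip(*track2)
    start, stop = max(t1[0], t2[0]), min(t1[-1], t2[-1])  # overlapping interval
    common = [start + k * (stop - start) / (n - 1) for k in range(n)]
    return [math.hypot(interp(t, t1, x1) - interp(t, t2, x2),
                       interp(t, t1, y1) - interp(t, t2, y2))
            for t in common]

# Same sampling frequency, but shifted by half a sample.
track1 = [(0.0, 0.0, 0.0), (1.0, 1.0, 0.0), (2.0, 2.0, 0.0), (3.0, 3.0, 0.0)]
track2 = [(0.5, 0.5, 1.0), (1.5, 1.5, 1.0), (2.5, 2.5, 1.0)]
d = track_distances(track1, track2, n=50)
mean_d = sum(d) / len(d)
rms_d = math.sqrt(sum(v * v for v in d) / len(d))
print(mean_d, rms_d)
```

Only the time interval covered by both tracks is used, so nothing is extrapolated.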

Related

take center of mass or average for matched features

I have an application for tracking, which gives me the player object as the following photo shows. I need to do the following:
1- detect features in each frame and match them with the next frame; I use SURF
2- calculate the average point from the feature points which I have estimated from step 1
3- calculate distance between the average point that estimated at step 2, between each two frames.
then I am able to save the location for the matched features,
surfPoints.Location
but I still don't know the best way to get the center of mass of these points, or to take their average.
Also, how can I filter out the mismatched points? I see that there is a function estimateGeometricTransform, but this function removes many points from the matched ones!
Is there any good approach for that?
So let me sum up :
You have two keypoint arrays, and matching function that gives you indices of matches in both lists ("keypoint 7 in original list is ~ matching keypoint 12 in the second")
So now your question is to evaluate global shift from these local displacements, taking into account outliers ?
In that case (fitting a model given outliers) you should really look into RANSAC (and the eternally funny RANSAC song)
Although the algorithm works great, it is non-deterministic (as it will involve trying out models based on random samples and evaluating the number of outliers)
I'll let you do the reading on RANSAC's theory (simple statistics), now let's see how to use RANSAC in your case :
Your problem is thus : given a list of 2D vectors, find the best 2D vector that minimizes the number of "outliers"
The model fitting step is then just picking a vector out of the list of vectors
Outliers are vectors that go "CRAZY WRONG" in direction or norm
Also, RANSAC explained by Mathworks
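A rough Python sketch (not MATLAB) of the scheme described above, where the model is a single displacement vector drawn at random and inliers are the vectors that lie close to it. The tolerance, the iteration count, the fixed seed and the final averaging of the inliers are assumptions of this example, not part of any library API.

```python
import math
import random

def ransac_shift(displacements, tol=2.0, iterations=100, seed=0):
    """Estimate the dominant 2-D shift from match displacements, ignoring outliers."""
    rng = random.Random(seed)  # fixed seed so the sketch gives repeatable results
    best_inliers = []
    for _ in range(iterations):
        cx, cy = rng.choice(displacements)  # candidate model: one of the vectors
        inliers = [(dx, dy) for dx, dy in displacements
                   if math.hypot(dx - cx, dy - cy) <= tol]
        if len(inliers) > len(best_inliers):
            best_inliers = inliers
    # Refinement: average the inliers of the best candidate.
    n = len(best_inliers)
    shift = (sum(v[0] for v in best_inliers) / n,
             sum(v[1] for v in best_inliers) / n)
    return shift, best_inliers

# Each displacement is (matched location in frame k+1) minus (location in frame k).
displacements = [(5.0, 1.0), (5.2, 0.9), (4.9, 1.1), (5.1, 1.0),
                 (-30.0, 12.0), (40.0, -25.0)]
shift, inliers = ransac_shift(displacements)
print(shift, len(inliers))
```

The averaged inlier displacement then plays the role of the movement of the "average point" between two frames, without the mismatched pairs dragging it around.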
The difficulty here is that you have non-rigid motion. estimateGeometricTransform is great when the motion can be described by an affine or a projective transformation. However, because you are tracking a complex articulated object, like a person, the motion is much more complicated. This is why estimateGeometricTransform rejects a lot of matches as outliers.
There are several things you can try. One is to try using vision.PointTracker to track the points. It uses the KLT (Kanade-Lucas-Tomasi) algorithm.
Alternatively, if your camera is stationary, you can try using vision.ForegroundDetector, which implements background subtraction. It will give you a binary mask showing all moving objects.

Matlab calculate geographical distance to lat/lng polyline

In Matlab I would like to calculate the (shortest) distances between a set of independent points (m-by-2 matrix of lat/lng) and a set of polylines (n-by-2 matrix of lat/lng). The resulting table should be an n-by-m matrix with distances in km.
I have rewritten this JavaScript implementation (http://www.bdcc.co.uk/Gmaps/BdccGeo.js) to Matlab, but it does not seem to perform well.
Currently I am working on a project with a relatively large set of data and running into performance issues. I have roughly 40,000 points and 150 polylines. The polylines are subsets of the original set of 40,000 points. With about 15 seconds per polyline, calculating all these distances can take up to an hour. Also, the intermediate matrices of 40000x150x3 cause out-of-memory errors on my lesser machines.
Instead of optimizing or revising this implementation I am wondering if Matlab doesn't already have some (smarter) functions built in for this. But as far as I can see, the documentation mainly has information on how to display geodata as opposed to doing calculations on it.
Does anyone know of or have experience with this kind of calculation in Matlab? Has anything like this already been written which I can reuse so I don't have to reinvent the wheel? And finally, is this the expected performance, given these numbers, or should my function be able to perform much better?

Shape Context - Rotation Invariance

I was trying to implement Shape Context (in MatLab). I was trying to achieve rotation invariance.
The general approach for shape context is to compute distances and angles between each set of interest points in a given image. You then bin into a histogram based on whether these calculated values fall into certain ranges. You do this for both a standard and a test image. To match two different images, you then use a chi-square function to estimate a "cost" between each possible pair of points in the two different histograms. Finally, you use an optimization technique such as the Hungarian algorithm to find optimal assignments of points and then sum up the total cost, which will be lower for good matches.
I've checked several websites and papers, and they say that to make the above approach rotation invariant, you need to calculate each angle between each pair of points using the tangent vector as the x-axis (e.g. http://www.cs.berkeley.edu/~malik/papers/BMP-shape.pdf, page 513).
What exactly does this mean? No one seems to explain it clearly. Also, from which of each pair of points would you get the tangent vector - would you average the two?
A couple of other people suggested I could use gradients (which are easy to find in Matlab) as a substitute for the tangent vectors, though this does not seem to produce reasonable cost scores. Is it feasible to do this with gradients?
Should gradient work for this dominant orientation?
What do you mean by ordering the bins with respect to that orientation? I was originally going to have a square matrix of bins - with the radius between two given points determining the column in the matrix and the calculated angle between two given points determining the row.
Thank you for your insight.
One way of achieving (some) rotation invariance is to make sure that wherever you compute your image descriptors, their orientation (that is, the ordering of the bins) is (roughly) the same. To achieve that, you pick the dominant orientation at the point where you extract each descriptor and order the bins with respect to that orientation. This way you can compare different descriptors bin-to-bin, knowing that their ordering is the same: with respect to their local dominant orientation.
From my personal experience (which is not too much), these methods look better on paper than in practice.
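To make "ordering the bins with respect to that orientation" concrete, here is a small Python sketch (not MATLAB). It assumes the dominant orientation at the reference point p is already known as an angle (from a tangent or a gradient, however you estimate it) and only shows how the angular bin of a neighbour q is measured relative to it. The function names and the choice of 12 bins are illustrative.

```python
import math

def angular_bin(p, q, orientation, n_bins=12):
    """Angular bin of the direction p -> q, measured from the orientation at p."""
    absolute = math.atan2(q[1] - p[1], q[0] - p[0])
    relative = (absolute - orientation) % (2 * math.pi)  # angle in [0, 2*pi)
    return int(relative / (2 * math.pi) * n_bins) % n_bins

def rotate(pt, angle):
    """Rotate a 2-D point about the origin by the given angle."""
    c, s = math.cos(angle), math.sin(angle)
    return (c * pt[0] - s * pt[1], s * pt[0] + c * pt[1])

p, q, orientation = (0.0, 0.0), (-1.0, 2.0), 0.2
before = angular_bin(p, q, orientation)

# Rotate the whole shape; the local orientation at p rotates along with it.
turn = 2.5
after = angular_bin(rotate(p, turn), rotate(q, turn), orientation + turn)
print(before, after)
```

Because the angle is taken relative to the orientation at the point where the descriptor is extracted, rotating the whole shape leaves the bin index unchanged, apart from points that sit right on a bin boundary.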

Measuring density for three dimensional data (in Matlab)

I have a dataset consisting of a large collection of points in three dimensional Euclidean space. In this collection of points, I am trying to find the point that is nearest to the area with the highest density of points.
So my problem consists of two steps:
1: Determine where density of the distribution of points is at its highest
2: Determine which point is nearest to the point found in 1
Point 2 I can manage, but I'm not sure how to solve point 1. I know there are a lot of functions for density estimation in Matlab, but I'm not sure which one would be the most suitable, or straightforward to use.
Does anyone know?
My command of statistics is a little bit rusty, but as far as I can tell, this type of problem calls for multivariate analysis. Someone suggested I use multivariate kernel density estimation, but I'm not really sure if that's the best solution.
Density is a measure of mass per unit volume. On the assumption that your points all have the same mass then you are, I suppose, trying to measure the number of points per unit volume. So one approach is to divide your subset of Euclidean space into lots of little unit volumes (let's call them voxels like everyone does) and count how many points there are in each one. The voxel with the most points is where the density of points is at its highest. This is, of course, numerical integration of a sort. If your points were distributed according to some analytic function (and I guess they are not) you could solve the problem with pencil and paper.
You might make this approach as sophisticated as you like, perhaps initially dividing your space into 2 x 2 x 2 voxels, then choosing the voxel with most points and sub-dividing that in turn until your criteria are satisfied.
I hope this will get you started on your point 1; you seem to be OK with point 2 so I'll stop now.
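A minimal Python sketch (not MATLAB) of the voxel-counting idea, assuming cubic voxels of a fixed edge length chosen by hand. The function names are made up for this example, and the second function covers point 2 by picking the data point closest to the centre of the fullest voxel.

```python
import math
from collections import Counter

def densest_voxel(points, size):
    """Return (voxel index, count) for the cubic voxel holding the most points."""
    counts = Counter(tuple(math.floor(c / size) for c in p) for p in points)
    return counts.most_common(1)[0]

def nearest_point_to_voxel_centre(points, voxel, size):
    """Data point closest to the centre of the given voxel."""
    centre = [(i + 0.5) * size for i in voxel]
    return min(points, key=lambda p: math.dist(p, centre))

points = [(0.1, 0.2, 0.3), (1.1, 1.2, 1.3), (1.4, 1.6, 1.5),
          (1.6, 1.4, 1.7), (3.5, 0.2, 2.2)]
voxel, count = densest_voxel(points, size=1.0)
print(voxel, count, nearest_point_to_voxel_centre(points, voxel, size=1.0))
```

The result depends on the voxel size and on where the grid happens to fall, which is what the recursive subdivision described above is meant to mitigate.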
EDIT
It looks as if triplequad might be what you are looking for.

Process for comparing two datasets

I have two datasets at a time (in the form of vectors) and I plot them on the same axes to see how they relate to each other. I specifically look for places where both graphs have a similar shape (i.e. places where both seem to have a positive/negative gradient over approximately the same intervals). Example:
So far I have been working through the data graphically, but the amount of data is so large that plotting every time I want to check how two sets correlate takes far too much time.
Are there any ideas, scripts or functions that might be useful to automate this process somewhat?
The first thing you have to think about is the nature of the criteria you want to apply to establish similarity. There is a wide variety of ways to measure similarity, and the more precisely you can describe what you want "similar" to mean in your problem, the easier it will be to implement, regardless of the programming language.
Having said that, here are some of the things you could look at:
correlation of the two datasets
difference of the derivative of the datasets (but I don't think it would be robust enough)
spectral analysis as mentioned by @thron of three
etc. ...
Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.
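As an illustration of the first suggestion, here is a minimal Python sketch (not MATLAB) of a sliding-window Pearson correlation. Windows where the value is close to +1 are candidates for "similar shape". The window length and the convention of returning 0 for a flat window are assumptions of this example.

```python
import math

def pearson(a, b):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    if va == 0.0 or vb == 0.0:
        return 0.0  # correlation is undefined for a flat window; treated as 0 here
    return cov / math.sqrt(va * vb)

def windowed_correlation(a, b, window):
    """Correlation of a and b over every window of the given length."""
    return [pearson(a[i:i + window], b[i:i + window])
            for i in range(len(a) - window + 1)]

a = [0, 1, 2, 3, 2, 1, 0, 1, 2, 3]
b = [5, 6, 7, 8, 9, 10, 11, 12, 13, 14]
r = windowed_correlation(a, b, window=4)
print(r)
```

A threshold on the windowed values then replaces the visual check of whether both curves rise or fall together.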
Sure. Call your two vectors A and B.
1) (Optional) Smooth your data either with a simple averaging filter (Matlab 'smooth'), or the 'filter' command. This will get rid of local changes in velocity ("gradient") that appear to be essentially noise (as in the ascending component of the red trace).
2) Differentiate both A and B. Now you are directly representing the velocity of each vector (Matlab 'diff').
3) Add the two differentiated vectors together (element-wise). Call this C.
4) Look for all points in C whose absolute value is above a certain threshold (you'll have to eyeball the data to get a good idea of what this should be). Points above this threshold indicate highly similar velocity.
5) Now look for where a high positive value in C is followed by a high negative value, or vice versa. In between these two points you will have similar curves in A and B.
Note: a) You could do the smoothing after step 3 rather than after step 1. b) Re 5), you could have a situation in which a 'hill' in your data is at the edge of the vector and so is 'cut in half', and the vectors descend to baseline before ascending in the next hill. Then 5) would misidentify the hill as coming between the initial descent and subsequent ascent. To avoid this, you could also require that the points in A and B in between the two points of velocity similarity have high absolute values.
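The five steps can be sketched as follows. This is a minimal Python sketch (not MATLAB); the centred moving average, the function names and the threshold value are choices made for this example and are not taken from the 'smooth' or 'diff' functions mentioned above.

```python
def moving_average(v, width=3):
    """Step 1: centred moving average; the ends use a shorter window."""
    half = width // 2
    out = []
    for i in range(len(v)):
        lo, hi = max(0, i - half), min(len(v), i + half + 1)
        out.append(sum(v[lo:hi]) / (hi - lo))
    return out

def diff(v):
    """Step 2: first difference of a sequence."""
    return [b - a for a, b in zip(v[:-1], v[1:])]

def summed_slopes(a, b, width=3):
    """Steps 1 to 3: smooth, differentiate, and add the two slope vectors."""
    da = diff(moving_average(a, width))
    db = diff(moving_average(b, width))
    return [x + y for x, y in zip(da, db)]

def hills(c, threshold):
    """Steps 4 and 5: index pairs where a strong slope is followed by a strong
    slope of the opposite sign."""
    strong = [(i, v) for i, v in enumerate(c) if abs(v) > threshold]
    return [(i, j) for (i, vi), (j, vj) in zip(strong[:-1], strong[1:])
            if vi * vj < 0]

a = [0, 0, 1, 3, 5, 5, 5, 3, 1, 0, 0]
b = [1, 1, 2, 4, 6, 6, 6, 4, 2, 1, 1]
c = summed_slopes(a, b)
print(hills(c, threshold=2.0))
```

Each returned pair brackets a stretch where A and B rise and then fall together (or the reverse); the threshold still has to be tuned by looking at the data, as noted in step 4.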