MATLAB: calculate geographical distance to lat/lng polyline

In MATLAB I would like to calculate the (shortest) distances between a set of independent points (an m-by-2 matrix of lat/lng) and a set of polylines (each an n-by-2 matrix of lat/lng). The resulting table should be an n-by-m matrix with distances in km.
I have rewritten this JavaScript implementation (http://www.bdcc.co.uk/Gmaps/BdccGeo.js) in MATLAB, but it does not seem to perform well.
Currently I am working on a project with a relatively large data set and am running into performance issues. I have roughly 40,000 points and 150 polylines. The polylines are subsets of the original set of 40,000 points. At about 15 seconds per polyline, calculating all these distances can take up to an hour. Also, the intermediate 40000x150x3 matrices cause out-of-memory errors on my lesser machines.
Instead of optimizing or revising this implementation, I am wondering whether MATLAB doesn't already have some (smarter) functions built in for this. But as far as I can see, the documentation mainly covers how to display geodata rather than how to do calculations on it.
Does anyone know of or have experience with this kind of calculation in MATLAB? Has anything like this already been written that I can reuse, so I don't have to reinvent the wheel? And finally, is this the expected performance given these numbers, or should my function be able to perform much better?
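For reference, a minimal sketch of one vectorized approach (not the BdccGeo port itself), using a local equirectangular flat-Earth approximation; the function and variable names are illustrative, and accuracy degrades for long segments or polar latitudes:

    function d = pointsToPolylineKm(points, polyline)
        % points: m-by-2 [lat lng] in degrees; polyline: n-by-2 [lat lng]
        R = 6371;                               % mean Earth radius in km
        lat0 = mean(polyline(:,1));             % reference latitude for x-scaling
        toXY = @(ll) [deg2rad(ll(:,2)) * R * cosd(lat0), deg2rad(ll(:,1)) * R];
        P = toXY(points);                       % m-by-2 planar points, km
        V = toXY(polyline);                     % n-by-2 planar vertices, km
        A = V(1:end-1, :);  B = V(2:end, :);    % segment endpoints
        d = inf(size(P, 1), 1);
        for k = 1:size(A, 1)                    % per segment, vectorized over points
            AB = B(k,:) - A(k,:);
            t = ((P - A(k,:)) * AB.') / max(AB * AB.', eps);
            t = min(max(t, 0), 1);              % clamp projection onto the segment
            C = A(k,:) + t .* AB;               % closest point on this segment
            d = min(d, sqrt(sum((P - C).^2, 2)));
        end
    end

Called once per polyline, this fills the distance table one row at a time and avoids the 40000x150x3 intermediate entirely.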

Related

Comparing 2 sets of coordinates and finding points that are apart by less than a certain distance

I have two large databases (500k to 3M rows) inside PostgreSQL, each containing a set of GPS lat/longs. I need to compare all coordinates in one database with those in the other database, and find points that are within 300 m of each other.
I started using PostgreSQL because I had heard about its spatial indexing, which speeds up geometry-related tasks a lot. The idea is to use spatial indexing, e.g. R-trees, to check only nodes that have already been determined to be close to each other, instead of scanning the entire database every time, which is O(n^2).
However, I couldn't find anything related to this.
*Edit: I am not looking for a distance calculation algorithm; I am looking for optimizations to speed up the comparison of locations in my 2 tables. So it is not a duplicate question.
Just giving a tip here.
Distance is the result of a fixed calculation whose inputs are the two pairs of latitude and longitude.
Work with the formulas you can find on the internet to determine the deviation (in latitude and longitude) beyond which the distance is certainly greater than 300 m.
This deviation will be bigger the closer a point is to the poles and smaller the closer it is to the equator. So divide the coordinates into 3-4 areas based on how close to the equator they are, then calculate per area how different the other coordinates must be for the distance to certainly exceed 300 m, and then work only with the remaining coordinates.
That would drastically improve the performance of any type of search you plan to perform.
good luck!
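For concreteness, a minimal sketch of the banding idea in MATLAB (this page's main language; the same inequalities translate directly into a SQL WHERE clause). The band edges and threshold handling are my own illustration:

    R = 6371000;                           % mean Earth radius in metres
    thresh = 300;                          % the 300 m threshold from the question
    dLat = rad2deg(thresh / R);            % latitude deviation that equals 300 m
    % Within each absolute-latitude band, bound the longitude deviation using
    % the band's highest latitude, so the filter never drops a true match:
    bandHi = [30 60 85];                   % upper band edges in degrees (example)
    dLon = dLat ./ cosd(bandHi);           % per-band longitude cut-off
    % Keep a pair p, q ([lat lng] in degrees, both in band b) for the exact
    % distance check only if both coarse tests pass:
    keep = @(p, q, b) abs(p(1) - q(1)) <= dLat && abs(p(2) - q(2)) <= dLon(b);

Only the pairs that survive this cheap test need an exact distance calculation.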

MATLAB: How can I get griddata() to cache things to run faster for many identical interpolations?

I'm using the griddata() command in MATLAB to go from a spherical grid with dimensions on the order of 128x256x1500 to a Cartesian cubic grid, centered on the sphere and containing N^3 regularly spaced points (where N is between 128 and 512). I need to do this for dozens or hundreds of checkpoints in my simulation, and several variables per checkpoint. I'm going to need to interpolate from the same spherical grid to the same cubic grid several hundred or several thousand times over, using new data on the spherical grid each time!
Since the most computationally expensive part of this routine is the triangulation and interpolation, I would like to cache some information the first time the routine is run and use that information for subsequent runs.
I think I could probably cache a table of vertex indices and associated interpolation weights for every point on the cubic grid, but I'm not sure how/where to do this....
As far as I can tell, this is not possible using the current implementation of griddata(). Is there any way I could do something like this, perhaps by rewriting the griddata() routine?
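For what it's worth, griddata's 'linear' method boils down to a Delaunay triangulation plus barycentric weights, and MATLAB exposes both pieces directly. A hedged sketch of caching them, assuming the spherical grid has already been converted to Cartesian vectors x, y, z and the cube points are Xq, Yq, Zq (the names are mine):

    % Done once, for the first checkpoint:
    DT = delaunayTriangulation(x(:), y(:), z(:));       % triangulate the source grid
    [ti, bc] = pointLocation(DT, Xq(:), Yq(:), Zq(:));  % enclosing tetra + barycentric weights
    inside = ~isnan(ti);                                % cube points outside the hull get NaN
    verts = DT.ConnectivityList(ti(inside), :);         % 4 vertex indices per cube point

    % Done for every checkpoint/variable, reusing ti/bc/verts:
    vals = v(:);                                        % new data on the same spherical grid
    Vq = nan(numel(Xq), 1);
    Vq(inside) = sum(bc(inside, :) .* vals(verts), 2);  % linear interpolation
    Vq = reshape(Vq, size(Xq));

The expensive triangulation and point location then happen once instead of once per call.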

Calculating the "distance" between two two-dimensional data series

I have two datasets (tracks) with points in x/y which represent GPS positions. I want to analyze the distance between the two tracks. The points are not necessarily in sync, but they have the same frequency, as shown in this little excerpt (each track consists of 1000+ points):
[Example picture of the two tracks]
Because they are not in sync, I can't just compare the two points which are closest to each other. And since the paths are not exactly the same, I can't sync the tracks. One solution might be to interpolate a curve through each dataset and then calculate the area between the two curves. Since the tracks are much longer than shown in the example, I can't just use regression functions like polyfit.
How can this be done, or are there other/better strategies for analyzing the distance (mean, mean square, ...)?
am304's answer is by far the easiest, and probably the way to go.
However, I'd like to add a few other ways to do this, which are much more complicated, but could greatly enhance accuracy depending on your use case.
And if it's not for you, then it could be useful for anyone else passing by.
Method 1
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks
Determine the B-spline representation of both tracks. You then have a parametric relation for each track:

x1 = f1(t), y1 = g1(t)    and    x2 = f2(t), y2 = g2(t)

The distance between both tracks is then the average of the pointwise distance function

d(t) = sqrt( (f1(t) - f2(t))^2 + (g1(t) - g2(t))^2 )

over all applicable t, computed through the integral

D = 1/(t_end - t_start) * integral of d(t) from t_start to t_end
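A rough MATLAB sketch of this, assuming T1 and T2 are k-by-2 [x y] matrices; the normalised chord-length parameterisation is one reasonable choice, not the only one:

    chord = @(T) sqrt(sum(diff(T).^2, 2));              % segment lengths
    param = @(T) [0; cumsum(chord(T))] / sum(chord(T)); % t in [0,1] per point
    pp1 = spline(param(T1), T1.');                      % vector-valued spline, track 1
    pp2 = spline(param(T2), T2.');                      % vector-valued spline, track 2
    d = @(t) sqrt(sum((ppval(pp1, t) - ppval(pp2, t)).^2, 1));
    Davg = integral(d, 0, 1);                           % interval has length 1, so this is the average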
Method 2
Pros: closest to the "physics" of the situation
Cons: hard to get right, specific to the situation and thus non-reusable
Use the equations of motion of whatever was following that track to derive a transition matrix for any arbitrary time step t. When possible, also come up with an appropriate noise model.
Use a Kalman filter to re-sample both tracks to some equally-spaced time vector, which is preferably different from the time vector of both track 1 and track 2.
Compute the distances between the x,y pairs thus computed, and take the average.
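This one is hard to show generically, but below is a bare-bones constant-velocity sketch (state [x; vx; y; vy]); the motion model and the noise levels q and r are placeholders for your own equations of motion and noise model:

    function P = kfResample(T, tt, tq, q, r)
        % T: k-by-2 measured [x y]; tt: k-by-1 measurement times (sorted);
        % tq: m-by-1 requested output times; q, r: noise levels to tune.
        H = [1 0 0 0; 0 0 1 0];                 % we observe position only
        x = [T(1,1); 0; T(1,2); 0];             % initial state
        C = eye(4);                             % crude initial covariance
        P = zeros(numel(tq), 2);
        allT = unique([tt(:); tq(:)]);          % every relevant time stamp, sorted
        tPrev = allT(1);
        for ti = allT.'
            dt = ti - tPrev;  tPrev = ti;
            F = kron(eye(2), [1 dt; 0 1]);      % constant-velocity transition
            Q = q^2 * kron(eye(2), [dt^3/3 dt^2/2; dt^2/2 dt]);
            x = F * x;  C = F * C * F.' + Q;    % predict
            k = find(tt == ti, 1);
            if ~isempty(k)                      % measurement update
                K = C * H.' / (H * C * H.' + r^2 * eye(2));
                x = x + K * (T(k,:).' - H * x);
                C = (eye(4) - K * H) * C;
            end
            m = (tq == ti);
            if any(m)                           % emit state at requested times
                P(m, :) = repmat([x(1) x(3)], nnz(m), 1);
            end
        end
    end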
Method 3
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Both fits are biased.
1. Fit a space curve through track 1.
2. Compute the distances of all points in track 2 to this space curve.
3. Repeat steps 1 and 2, but vice versa.
4. Take the average of all these distances.
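A sketch of this, again for k-by-2 [x y] tracks; the point-to-curve distance is approximated by the nearest of many dense samples on the fitted spline (pdist2 is from the Statistics Toolbox):

    function D = meanTrackDist(T1, T2)
        D = (oneWay(T1, T2) + oneWay(T2, T1)) / 2;          % steps 1-3: both directions
    end
    function d = oneWay(Tfit, Tpts)
        s  = [0; cumsum(sqrt(sum(diff(Tfit).^2, 2)))];      % chord-length parameter
        pp = spline(s, Tfit.');                             % space curve through Tfit
        C  = ppval(pp, linspace(0, s(end), 20 * size(Tfit, 1))).';  % dense samples
        d  = mean(min(pdist2(Tpts, C), [], 2));             % mean nearest-sample distance
    end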
Method 4
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Fit will be of lesser quality due to inherently larger noise terms.
1. Fit a space curve to the union of both tracks. That is, treat the points from track 1 and track 2 as a single data set, through which to fit a space curve.
2. Compute the perpendicular residuals of both tracks with respect to this space curve.
3. Compute the average of all these distances.
Remarks
Note that all methods here use the flat-Earth assumption. If the tracks are truly long and cover a non-negligible portion of the Earth's surface, you'll have to compute distances via the haversine formula rather than a mere Pythagorean root (a one-liner is sketched after these remarks). The Kalman filter is less sensitive to this, provided your equations of motion take care of a spherical Earth.
If you have an elevation model of the region of interest, use that. Of course depending on the area, you'd be surprised how much of a difference that makes compared to a smooth Earth.
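For reference, a haversine one-liner in MATLAB (inputs are [lat lng] rows in degrees; 6371 km mean Earth radius assumed):

    % Great-circle distance in km between matching rows of P and Q:
    hav = @(P, Q) 2 * 6371 * asin(sqrt( ...
        sind((Q(:,1) - P(:,1)) / 2).^2 + ...
        cosd(P(:,1)) .* cosd(Q(:,1)) .* sind((Q(:,2) - P(:,2)) / 2).^2));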
Is the x/y data logged as a function of time? If so, you can resample one or both datasets to have the same sample time vector using the resample function for timeseries. You'll have to convert your data to a timeseries object first, but it's worth it. Once both data sets are resampled to the same time vector, you simply subtract one from the other.
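A minimal sketch of this, with assumed variable names x1, y1, t1 and x2, y2, t2 for the logged positions and time stamps:

    ts1 = timeseries([x1(:) y1(:)], t1(:));        % track 1 as a timeseries
    ts2 = timeseries([x2(:) y2(:)], t2(:));        % track 2 as a timeseries
    tc  = linspace(max(t1(1), t2(1)), min(t1(end), t2(end)), 1000);
    rs1 = resample(ts1, tc);                       % both tracks on one time vector
    rs2 = resample(ts2, tc);
    d   = sqrt(sum((rs1.Data - rs2.Data).^2, 2));  % pointwise distance over time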

Prediction avoiding landmass

I am working on a project where the following functions have to be implemented.
Predict the location of ships (in a maritime environment) at a future time (can be done with a Kalman filter, an IMM filter and some other algorithms).
Ships can be in any part of the world.
Avoiding landmass during prediction
Shortest path along the shorelines
I am totally done with the first part, which is predicting without considering the shoreline information. I have problems with functions 2 and 3.
Problem in function 2
At times, the predicted location can fall on the landmass, which is totally unacceptable.
I am using the following coastal-area shp file: http://openstreetmapdata.com/data/coastlines
This file contains the world shoreline data converted to XY values.
I have loaded this shp file into PostgreSQL and used PostGIS to read it from the database.
So my idea is to go through all the polygons (the shoreline is defined as polygons) and check whether the line connecting the present location and the predicted location crosses a polygon. If it crosses, we have to find where the ship intercepts the shoreline first.
If I follow this approach and go through all the polygons, it is going to take forever (there are around 62,000 polygons, each with thousands of points). So, any advice on this? I thought about dividing the world map into hierarchical areas (level 1: 10 polygons; level 2: each polygon contains 10 polygons), but I am not sure how to divide the world map from the above shp file into the levels of polygons I require.
Is any functionality of PostGIS helpful for this? Or any other library for this purpose? I believe this kind of functionality should already be available, but I have not been able to figure it out so far.
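Not a PostGIS answer, but to make the prefilter idea concrete in this page's main language: precompute one bounding box per polygon and run the exact crossing test (polyxpoly, from the Mapping Toolbox) only on polygons whose box overlaps the movement segment's box; a spatial index does essentially this for you. Here polys is assumed to be a column cell array of k-by-2 [x y] rings, and p1/p2 the present and predicted positions:

    boxes = cell2mat(cellfun(@(P) [min(P) max(P)], polys(:), 'UniformOutput', false));
    segBox = [min([p1; p2]) max([p1; p2])];       % box of the movement segment
    cand = find(boxes(:,1) <= segBox(3) & boxes(:,3) >= segBox(1) & ...
                boxes(:,2) <= segBox(4) & boxes(:,4) >= segBox(2));
    for i = cand.'                                % exact test on the few survivors
        P = polys{i};
        [xi, yi] = polyxpoly([p1(1) p2(1)], [p1(2) p2(2)], P(:,1), P(:,2));
        if ~isempty(xi)
            hit = [xi yi];   % interceptions; sort by distance to p1 for the first one
            break
        end
    end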
Function 3
Since we now know where the ship intercepts the shoreline first, we can predict its path along the shoreline using a shortest-path algorithm, given that we know the destination. But to do this, the above shoreline map needs to be divided into a grid on which the shortest-path search can run.
So how can you make a grid along the shorelines based on this? I am not doing image processing here; what I have is this shp file. Any advice is appreciated. Or should I go with some image-processing approach and make the grid of shorelines? If so, please provide some links.
First, PostGIS is pretty fast, and with proper indexes, as long as you keep your polygons reasonably small, you should be able to make up for their number with good indexing and overlapping-operator support (overlap queries can use GiST and GIN indexes, with the latter performing better than the former for reads and worse for writes).
62,000 polygons globally is nothing. Write back when you have to check more than a few thousand whose bounding boxes overlap with your line....
For the third problem, you are going in one direction, right? I am wondering how hard it would be to write a tangent(point, vector, polygon) function that returns the closest tangent to a polygon along a certain vector (a vector could be represented by a (point, point) tuple). If you were to combine this with KNN searches, you ought to be able to plot a course using a WITH RECURSIVE query.

Measuring density for three dimensional data (in Matlab)

I have a dataset consisting of a large collection of points in three-dimensional Euclidean space. In this collection of points, I am trying to find the point that is nearest to the area with the highest density of points.
So my problem consists of two steps:
1: Determine where the density of the distribution of points is at its highest
2: Determine which point is nearest to the point found in 1
Point 2 I can manage, but I'm not sure how to solve point 1. I know there are a lot of functions for density estimation in MATLAB, but I'm not sure which one would be the most suitable, or the most straightforward to use.
Does anyone know?
My command of statistics is a little bit rusty, but as far as I can tell, this type of problem calls for multivariate analysis. Someone suggested I use multivariate kernel density estimation, but I'm not really sure if that's the best solution.
Density is a measure of mass per unit volume. On the assumption that your points all have the same mass then you are, I suppose, trying to measure the number of points per unit volume. So one approach is to divide your subset of Euclidean space into lots of little unit volumes (let's call them voxels like everyone does) and count how many points there are in each one. The voxel with the most points is where the density of points is at its highest. This is, of course, numerical integration of a sort. If your points were distributed according to some analytic function (and I guess they are not) you could solve the problem with pencil and paper.
You might make this approach as sophisticated as you like, perhaps initially dividing your space into 2 x 2 x 2 voxels, then choosing the voxel with most points and sub-dividing that in turn until your criteria are satisfied.
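A minimal sketch of the voxel count plus the nearest-point step, assuming pts is an n-by-3 matrix of coordinates and a fixed number of voxels per dimension:

    nBins = 32;                                % voxels per dimension, to taste
    sub = zeros(size(pts));
    edges = cell(1, 3);
    for d = 1:3
        edges{d} = linspace(min(pts(:,d)), max(pts(:,d)), nBins + 1);
        sub(:,d) = discretize(pts(:,d), edges{d});
    end
    counts = accumarray(sub, 1, [nBins nBins nBins]);   % points per voxel
    [~, lin] = max(counts(:));                          % densest voxel
    [vi, vj, vk] = ind2sub(size(counts), lin);
    v = [vi vj vk];
    c = arrayfun(@(d) (edges{d}(v(d)) + edges{d}(v(d) + 1)) / 2, 1:3);  % its centre
    [~, nearest] = min(sum((pts - c).^2, 2));           % step 2: closest actual point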
I hope this will get you started on your point 1; you seem to be OK with point 2 so I'll stop now.
EDIT
It looks as if triplequad might be what you are looking for.