MATLAB: How can I get griddata() to cache things to run faster for many identical interpolations? - matlab

I'm using the griddata() command in MATLAB to go from a spherical grid of sizes on the order (128x256x1500) to a Cartesian cubic grid, centered on the sphere and containing N^3 regularly-spaced points (where N is between 128 and 512). I need to do this for dozens or hundreds of checkpoints in my simulation, and several variables per checkpoint. I'm going to need to interpolate from the same spherical grid to the same cubic grid several hundred or several thousand times over, using new data on the spherical grid each time!
Since the most computationally expensive part of this routine is the triangulation and interpolation, I would like to cache some information the first time the routine is run and use that information for subsequent runs.
I think I could probably cache a table of vertex indices and associated interpolation weights for every point on the cubic grid, but I'm not sure how/where to do this....
As far as I can tell, this is not possible using the current implementation of griddata(). Is there any way I could do something like this -- perhaps re-writing the griddata() routine?

Related

Matlab calculate geographical distance to lat/lng polyline

In Matlab I would like to calculate the (shortest) distances between a set of independent points (m-by-2 matrix of lat/lng) and a set of polylines (n-by-2 matrix of lat/lng). The resulting table should be an n-m matrix with distances in KM.
I have rewritten this JavaScript implementation (http://www.bdcc.co.uk/Gmaps/BdccGeo.js) to Matlab, but it does not seem to perform well.
Currently I am working on a project with a relatively large set of data and running into performance issues. I have roughly 40.000 points and 150 polylines. The polylines are subsets of the original set of 40.000 points. With about 15 seconds per polyline, calculating all these distances can take up to an hour. Also, the intermediate matrixes of 40000x150x3 cause out of memory errors on my lesser machines.
Instead of optimizing or revising this implementation I am wondering if Matlab doesn't already have some (smarter) functions built in for this. But as far as I can see, the documentation mainly has information on how to display geodata as opposed to doing calculations on it.
Does anyone know or have experience with these kind of calculations in Matlab? Has anything like this already been written which I can reuse so I don't have to reinvent the wheel. And finally, is this expected performance, given these numbers, or should my function be able to perform much better?

Calculating the "distance" between two two-dimensional data series

I have two datasets (tracks) with points in x/y which represent GPS positions. I want to analyze the distance between both tracks. The points are not necessary in sync, but having the same frequency, as shown in this little excerpt (each track consists of 1000+ points):
Example Picture
Due to being not in sync I can't just compare the two points which are closest to each other. And since the path is not exactly the same I can't sync the tracks. It might be a solution interpolating a curve for each dataset and then calculating the integral in between. Since the tracks are much longer than shown in the example I can't just use regression functions like polyfit.
How can this be done or are there other/better strategies for analyzing (mean/mean square...) the distance?
am304's answer is by far the easiest, and probably the way to go.
However, I'd like to add a few other ways to do this, which are much more complicated, but could greatly enhance accuracy depending on your use case.
And if it's not for you, then it could be useful for anyone else passing by.
Method 1
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks
Determine the B-spline representation for both tracks. You then have a parametric relation for both tracks:
The distance between both tracks is then the average of the function
for all applicable t, which is computed through the following integral:
Method 2
Pros: closest to the "physics" of the situation
Cons: hard to get right, specific to the situation and thus non-reusable
Use the equations of motion of whatever was following that track to derive a transition matrix for any arbitrary time step t. When possible, also come up with an appropriate noise model.
Use a Kalman filter to re-sample both tracks to some equally-spaced time vector, which is preferably different from the time vector of both track 1 and track 2.
Compute the distances between the x,y pairs thus computed, and take the average.
Method 3
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Both fits are biased.
Fit a space curve through track 1
Compute the distances of all points in track 2 to this space curve.
Repeat 1 and 2, but vice versa.
Take the average of all these distances.
Method 4
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Fit will be of lesser quality due to inherently larger noise terms.
Fit a space curve to the union of both tracks. That is, treat points from track 1 and track 2 as a single data set, through which to fit a space curve.
Compute the perpendicular residuals of both tracks with respect to this space curve.
compute the average all these distances.
Remarks
Note that all methods here use the flat-Earth assumption. If the tracks are truly long and cover a non-negligible portion of the Earth's surface, you'll have to compute distances via the Haversine formula rather than a mere Pythagorean root. The Kalman filter is less sensitive to this, provided your equations of motion take care of a spherical Earth.
If you have an elevation model of the region of interest, use that. Of course depending on the area, you'd be surprised how much of a difference that makes compared to a smooth Earth.
Is the x/y data logged as a function of time? If so, you can resample one or both datasets to have to same sample time vector using the resample function for timeseries. You'll have to convert your data to a timeseries object first, but it's worth it. Once both data sets are resampled to the same time vector, you simply subtract one from the other.

finding the global maximum of an unknown surface

I have a model that is solved, returns a single output value and plots it. From those values, I plot a surface using x-values varying from 1-35 and y-values varying from 1-39, and have the returned values as values on the z-axis. See below.
This figure does not behave according to a defined function, it is simply a plot of output values.
I've been trying to use a random optimization algorithm that I created in an attempt to find the global maximum, but it takes a very long time and isn't always correct (when compared to a grid-search algorithm that I use as a comparison). The surface that is created has subtle changes in it, enough to create multiple troublesome local minima and maxima. I'm looking for a way to find the global maximum of this non-convex surface in a relatively quick fashion.
EDIT:
35-by-39 is the search area and that's as big as it gets. The values of the x and y axes are the input values of the model (probably shouldve mentioned that), so each of the z-values are associated with an x and y input coordinate. And my initial guess is usually smack dab in the middle of the search area.
The creation of this figure took about 50 minutes, because each of the 1365 z-values takes about 3 seconds to compute. I'd like to do this without having to use exhaustive enumeration (evaluating every point for a z-value). I'd like this to take around 5 minutes instead of 50.
EDIT(2):
Sorry for the confusion. The figure below is a 35-by-39 grid of z-values and is used purely for reference. In the actual executing of the program, all I have is the x- and y-coordinates, and I am trying to find the global maximum z-value in the fewest function evaluations possible in order to save time. So horchler, in reference to your comment, the latter.
EDIT(3):
The thing with this figure is its only a single example. There are multiple different figures that are formed when I use data from a separate source (i.e. the left side might be uninteresting in this example, but for a separate set of data, it may or may not contain the global max). And this adds to the complexity. It is impossible to tell from the data where the location of the global max will be.
Some surfaces are incredibly smooth, others have large and frequent peaks throughout.

Data interpolation over a non-homogeneous surface

In my project i deal with big data surfaces.
In a certain point, i have a line across the data, and I need the values of the points of the line.
The grid is non,homogeneous, it doesnt go from n:m with fixed steps nor nothing.
Lets ilustrate!
In the figure the 2D proyection of my data can be seen. Each of the points has also other 3 data information. I defined a arbitrary red line with the form y=ax+b. a and b are known.
How can I define i.e. 50 points in the line that has not only the x and y coords (wich is straigforward) but also the interpolation of the 3 data information of each of the points around it.
I know is not an easy question but I can't seem to step forward even a bit.
PD: realize I DONT want code written for me, but the idea of how to achieve my objective.
You could use a tool like triScatteredInterp, which will triangulate the 2-d domain, then interpolate a list of points along your line. Griddata is also an option.
I have a toolbox for problems like this (of course.) It allows me to build a triangulation of the non-convex domain in the (x,y) plane. Then it can form a completely general slice through that surface, interpolating in z also as it does so. The result will be a 1-manifold, in this case a piecewise linear function along that path in (x,y,z). While those tools are not posted on the file exchange, they are available for the person willing to invest the time to learn to use them.
If the surface you describe is a completely general one in 3-d, that might be fairly complex, then you might need a CRUST based tool to define that surface triangulation. These can be found online too. Once a triangulation is available, my tools can then be used to slice them. (Sorry, I never did finish that piece.)
What I did was to define several points in the crack line and then cheack for each one of them in wich quadrilateral it is with inpoligon matlab function (no tthe fastest way but less than 2 secs).
Then I created a triangular plane in the used quadrilaterals using x,y and Z or the othre data , achieving a linear interpolation between the data.
finally i take out all the points that are 0 o Nan.

Sea Ice data - MATLAB 3D matrix

I want to make a matrix in which data is in a matrix and I can pull out each grid in the matrix as a certain long, lat point. The data lasts over 3 years so I would also need a 3rd dimension as time.
What I have now is three 1437x159 doubles of lat, long, and sea ice data. How do I combine them into a 3d matrix that fits the criteria I mentioned above? Basically, I want to be able to say, I want data at -50S lat and 50W lon at day 47 and be able to index into the array and find that answer.
Thanks!
Yes- this can be done without too much difficulty. I have done analogous work in analyzing atmospheric data across time.
The fundamental problem is that you have data organized by hour, with grids varying dynamically over time, and you need to analyze data over time. I would recommend one of two approaches.
Approach 1: Grid Resampling
This involves resampling the grid data over a uniform, standardized grid. Define the grid using the Matlab ndgrid() function, then resample each point using interp2(), and concatonate into a uniform 3D matrix. You can then directly interpolate within this resampled data using interp3(). This approach involves minimal programming, with the trade-off of losing some of the original data in the resampling process.
Approach 2: Dynamic Interpolation
Define a custom class wrapper around your data object, say 'SeaIce', and write your own SeaIce.interp3() method. This method would load the grid information for each hour, perform the interpolation first in the lateral dimension, and subsequently in the time dimension. This ensures that no information is lost via interpolation, with the tradeoff of more coding involved.
The second approach is described in detail (for the wind domain) in my publication "Wind Analysis in Aviation Applications". Slides available here.