Sea Ice data - MATLAB 3D matrix

I want to build a matrix from which I can pull out the value at any given long/lat grid point. The data spans three years, so I also need a third dimension for time.
What I have now is three 1437x159 doubles of lat, long, and sea ice data. How do I combine them into a 3D matrix that fits the criteria above? Basically, I want to be able to say "I want the data at 50S latitude and 50W longitude on day 47" and index into the array to find that answer.
Thanks!

Yes, this can be done without too much difficulty. I have done analogous work analyzing atmospheric data across time.
The fundamental problem is that your data are organized by time step, on grids that may vary over time, and you need to analyze them across time. I would recommend one of two approaches.
Approach 1: Grid Resampling
This involves resampling the grid data onto a uniform, standardized grid. Define the grid using the Matlab ndgrid() function, then resample each time step onto it using interp2(), and concatenate the results into a uniform 3D matrix. You can then interpolate directly within this resampled data using interp3(). This approach involves minimal programming, with the trade-off of losing some of the original data in the resampling process.
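A rough sketch of this approach (all variable names are placeholders; it assumes the lat/long matrices describe a monotonic, plaid grid so interp2 applies, and meshgrid is used instead of ndgrid only so that the interp2/interp3 row/column conventions line up; for a genuinely curvilinear grid, griddata or scatteredInterpolant would take interp2's place):

    latVec = linspace(-90, -40, 200);             % example target latitudes
    lonVec = linspace(-180, 180, 400);            % example target longitudes
    [LonG, LatG] = meshgrid(lonVec, latVec);      % fixed uniform grid
    nDays   = numel(iceByDay);                    % iceByDay{d}: the 1437x159 field for day d
    iceGrid = nan(numel(latVec), numel(lonVec), nDays);
    for d = 1:nDays
        iceGrid(:,:,d) = interp2(lon, lat, iceByDay{d}, LonG, LatG);
    end
    % A query such as "day 47 at 50S, 50W" is then a single interp3 call:
    val = interp3(lonVec, latVec, 1:nDays, iceGrid, -50, -50, 47);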
Approach 2: Dynamic Interpolation
Define a custom class wrapper around your data, say SeaIce, and write your own SeaIce.interp3() method. This method would load the grid information for each time step, perform the interpolation first in the spatial (lateral) dimensions, and subsequently in the time dimension. This ensures that no information is lost to resampling, with the trade-off of more coding involved.
The second approach is described in detail (for the wind domain) in my publication "Wind Analysis in Aviation Applications". Slides available here.
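A minimal sketch of what such a wrapper could look like (the class layout, property names, and per-day cell arrays are my own illustration, not taken from the publication):

    classdef SeaIce
        % Wraps one grid (lat/lon) and one field (ice) per time step,
        % and interpolates in space first, then linearly in time.
        properties
            lat   % cell array: lat{d} is the latitude grid for day d
            lon   % cell array: lon{d} is the longitude grid for day d
            ice   % cell array: ice{d} is the sea-ice field for day d
        end
        methods
            function obj = SeaIce(lat, lon, ice)
                obj.lat = lat;  obj.lon = lon;  obj.ice = ice;
            end
            function v = interp3(obj, latQ, lonQ, dayQ)
                d0 = floor(dayQ);
                d1 = min(d0 + 1, numel(obj.ice));
                v0 = obj.spatial(d0, latQ, lonQ);   % value on day d0's own grid
                v1 = obj.spatial(d1, latQ, lonQ);   % value on day d1's own grid
                w  = dayQ - d0;                     % linear weight in time
                v  = (1 - w) * v0 + w * v1;
            end
        end
        methods (Access = private)
            function v = spatial(obj, d, latQ, lonQ)
                % spatial interpolation on the (possibly irregular) grid of day d
                F = scatteredInterpolant(obj.lon{d}(:), obj.lat{d}(:), obj.ice{d}(:));
                v = F(lonQ, latQ);
            end
        end
    end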

Related

Matlab calculate geographical distance to lat/lng polyline

In Matlab I would like to calculate the (shortest) distances between a set of independent points (an m-by-2 matrix of lat/lng) and a set of polylines (an n-by-2 matrix of lat/lng). The resulting table should be an n-by-m matrix of distances in km.
I have rewritten this JavaScript implementation (http://www.bdcc.co.uk/Gmaps/BdccGeo.js) to Matlab, but it does not seem to perform well.
Currently I am working on a project with a relatively large data set and running into performance issues. I have roughly 40,000 points and 150 polylines. The polylines are subsets of the original set of 40,000 points. At about 15 seconds per polyline, calculating all these distances can take up to an hour. Also, the intermediate 40000x150x3 matrices cause out-of-memory errors on my lesser machines.
Instead of optimizing or revising this implementation, I am wondering whether Matlab doesn't already have some (smarter) functions built in for this. But as far as I can see, the documentation mainly covers how to display geodata rather than how to do calculations on it.
Does anyone know of, or have experience with, this kind of calculation in Matlab? Has anything like this already been written that I can reuse, so I don't have to reinvent the wheel? And finally, is this the expected performance given these numbers, or should my function be able to perform much better?
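The core computation the question describes, the shortest distance from each point to each segment of a polyline, can be sketched as follows (the function name is illustrative and a flat-Earth/equirectangular approximation is assumed, so this is not a drop-in replacement for a great-circle calculation):

    function D = pointsToPolylineKm(pts, poly)
        % pts:  m-by-2 [lat lon] of independent points
        % poly: n-by-2 [lat lon] of polyline vertices
        % D:    m-by-1 shortest distances in km (equirectangular approximation)
        R    = 6371;                                 % mean Earth radius, km
        lat0 = mean(poly(:,1));                      % reference latitude for the projection
        toXY = @(ll) [R*(pi/180)*ll(:,2)*cosd(lat0), R*(pi/180)*ll(:,1)];
        P = toXY(pts);   V = toXY(poly);
        A = V(1:end-1,:);  B = V(2:end,:);           % segment endpoints
        D = inf(size(P,1), 1);
        for k = 1:size(A,1)                          % loop over segments, points vectorized
            ab = B(k,:) - A(k,:);
            t  = ((P - A(k,:)) * ab.') / max(sum(ab.^2), eps);
            t  = min(max(t, 0), 1);                  % clamp projection onto the segment
            C  = A(k,:) + t * ab;                    % closest point on this segment
            D  = min(D, sqrt(sum((P - C).^2, 2)));
        end
    end

Calling this once per polyline keeps the intermediate arrays at m-by-2 rather than m-by-n-by-3, which should also ease the memory problem.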

MATLAB: How can I get griddata() to cache things to run faster for many identical interpolations?

I'm using the griddata() command in MATLAB to go from a spherical grid with dimensions on the order of 128x256x1500 to a Cartesian cubic grid, centered on the sphere and containing N^3 regularly-spaced points (where N is between 128 and 512). I need to do this for dozens or hundreds of checkpoints in my simulation, and several variables per checkpoint. In other words, I'm going to need to interpolate from the same spherical grid to the same cubic grid several hundred or several thousand times over, using new data on the spherical grid each time!
Since the most computationally expensive part of this routine is the triangulation and interpolation, I would like to cache some information the first time the routine is run and use that information for subsequent runs.
I think I could probably cache a table of vertex indices and associated interpolation weights for every point on the cubic grid, but I'm not sure how/where to do this....
As far as I can tell, this is not possible using the current implementation of griddata(). Is there any way I could do something like this -- perhaps re-writing the griddata() routine?
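One possibility, assuming a reasonably recent MATLAB release, is scatteredInterpolant: it stores the triangulation inside the object, so rebuilding it can be avoided by swapping only the Values property between checkpoints (variable names below are placeholders):

    % xs, ys, zs: the spherical grid points converted to Cartesian coordinates (column vectors)
    % Xc, Yc, Zc: the fixed N^3 cubic grid (e.g. from meshgrid), reused every time
    F = scatteredInterpolant(xs, ys, zs, v1, 'linear', 'none');   % triangulation built once
    Vq1 = F(Xc, Yc, Zc);

    F.Values = v2;            % new checkpoint/variable on the same grid: no re-triangulation
    Vq2 = F(Xc, Yc, Zc);

The expensive Delaunay triangulation is computed when the object is constructed, and reassigning Values reuses it, so subsequent interpolations only pay for the evaluation.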

Calculating the "distance" between two two-dimensional data series

I have two datasets (tracks) with points in x/y which represent GPS positions. I want to analyze the distance between the two tracks. The points are not necessarily in sync, but they have the same frequency, as shown in this little excerpt (each track consists of 1000+ points):
Example Picture
Because the tracks are not in sync, I can't simply compare the two points that are closest to each other. And since the paths are not exactly the same, I can't sync the tracks either. One solution might be to interpolate a curve through each dataset and then calculate the integral of the separation between them. Since the tracks are much longer than shown in the example, I can't just use regression functions like polyfit.
How can this be done or are there other/better strategies for analyzing (mean/mean square...) the distance?
am304's answer is by far the easiest, and probably the way to go.
However, I'd like to add a few other ways to do this, which are much more complicated, but could greatly enhance accuracy depending on your use case.
And if it's not for you, then it could be useful for anyone else passing by.
Method 1
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks
Determine the B-spline representation of both tracks. You then have a parametric relation for each track:

    x1 = A1(t),  y1 = B1(t)
    x2 = A2(t),  y2 = B2(t)

The distance between the two tracks is then the average of the pointwise distance

    d(t) = sqrt( (A1(t) - A2(t))^2 + (B1(t) - B2(t))^2 )

over all applicable t, which is computed through the following integral:

    D = 1/(t1 - t0) * integral from t0 to t1 of d(t) dt
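A small sketch of this in MATLAB (variable names are illustrative; the point index, normalized to [0, 1], is used as the parameter t, and the integral is approximated with trapz):

    t1 = linspace(0, 1, size(track1,1)).';     % track1, track2: N-by-2 [x y]
    t2 = linspace(0, 1, size(track2,1)).';
    t  = linspace(0, 1, 2000).';               % dense common parameter
    xy1 = interp1(t1, track1, t, 'spline');    % spline representation of track 1
    xy2 = interp1(t2, track2, t, 'spline');    % spline representation of track 2
    d   = sqrt(sum((xy1 - xy2).^2, 2));        % pointwise distance d(t)
    D   = trapz(t, d);                         % average distance (interval has length 1)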
Method 2
Pros: closest to the "physics" of the situation
Cons: hard to get right, specific to the situation and thus non-reusable
Use the equations of motion of whatever was following that track to derive a transition matrix for any arbitrary time step t. When possible, also come up with an appropriate noise model.
Use a Kalman filter to re-sample both tracks to some equally-spaced time vector, which is preferably different from the time vector of both track 1 and track 2.
Compute the distances between the x,y pairs thus computed, and take the average.
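A very rough sketch under a constant-velocity motion model (the function name, the simplified process noise, and the exact-time matching are all assumptions made purely for illustration):

    function xyQ = kfResample(t, xy, tq, q, r)
        % t: N-by-1 timestamps, xy: N-by-2 positions, tq: query times
        % q: process-noise intensity, r: measurement-noise variance
        s = [xy(1,1); 0; xy(1,2); 0];              % state [x; vx; y; vy]
        P = eye(4);   H = [1 0 0 0; 0 0 1 0];   R = r * eye(2);
        allT = unique([t(:); tq(:)]);              % merged time line
        xyQ  = nan(numel(tq), 2);
        tPrev = allT(1);
        for i = 1:numel(allT)
            dt = allT(i) - tPrev;   tPrev = allT(i);
            F  = [1 dt 0 0; 0 1 0 0; 0 0 1 dt; 0 0 0 1];
            Q  = q * dt * eye(4);                  % crude process-noise approximation
            s  = F * s;   P = F * P * F.' + Q;     % predict
            j  = find(t == allT(i), 1);
            if ~isempty(j)                         % measurement at this time: update
                K = P * H.' / (H * P * H.' + R);
                s = s + K * (xy(j,:).' - H * s);
                P = (eye(4) - K * H) * P;
            end
            k = find(tq == allT(i));               % record the estimate at query times
            if ~isempty(k)
                xyQ(k,:) = repmat((H * s).', numel(k), 1);
            end
        end
    end

Running this on both tracks with the same tq and averaging sqrt(sum((xyQ1 - xyQ2).^2, 2)) then gives the distance estimate.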
Method 3
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Both fits are biased.
1) Fit a space curve through track 1.
2) Compute the distances of all points in track 2 to this space curve.
3) Repeat 1) and 2), but vice versa.
4) Take the average of all these distances.
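A small sketch of this method (names are illustrative; the "distance to the space curve" is approximated by the distance to a densely sampled spline through the other track):

    function d = trackToCurveDist(trackA, trackB)
        % distance of every point of trackB to a space curve fitted through trackA
        tA = [0; cumsum(sqrt(sum(diff(trackA).^2, 2)))];   % chord-length parameter
        tA = tA / tA(end);
        td = linspace(0, 1, 5000).';
        curveA = interp1(tA, trackA, td, 'spline');        % dense samples of the curve
        d = zeros(size(trackB,1), 1);
        for i = 1:size(trackB,1)
            d(i) = min(sqrt(sum((curveA - trackB(i,:)).^2, 2)));
        end
    end

which the four steps above then use as:

    d12 = trackToCurveDist(track1, track2);   % steps 1) and 2)
    d21 = trackToCurveDist(track2, track1);   % step 3)
    D   = mean([d12; d21]);                   % step 4)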
Method 4
Pros: fast, easy
Cons: method is overly optimistic about the smoothness of the tracks. Fit will be of lesser quality due to inherently larger noise terms.
1) Fit a space curve to the union of both tracks. That is, treat the points from track 1 and track 2 as a single data set, and fit a space curve through it.
2) Compute the perpendicular residuals of both tracks with respect to this space curve.
3) Compute the average of all these distances.
Remarks
Note that all methods here use the flat-Earth assumption. If the tracks are truly long and cover a non-negligible portion of the Earth's surface, you'll have to compute distances via the Haversine formula rather than a mere Pythagorean root. The Kalman filter is less sensitive to this, provided your equations of motion take care of a spherical Earth.
If you have an elevation model of the region of interest, use that. Of course depending on the area, you'd be surprised how much of a difference that makes compared to a smooth Earth.
Is the x/y data logged as a function of time? If so, you can resample one or both datasets to the same sample time vector using the resample function for timeseries objects. You'll have to convert your data to timeseries objects first, but it's worth it. Once both data sets are resampled to the same time vector, you simply subtract one from the other.
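A short sketch of that (assuming x/y positions and a time vector per track; names are illustrative):

    ts1 = timeseries(xy1, t1);                 % xy1: N1-by-2 [x y], t1: N1-by-1 times
    ts2 = timeseries(xy2, t2);
    tCommon = linspace(max(t1(1), t2(1)), min(t1(end), t2(end)), 1000).';
    ts1r = resample(ts1, tCommon);             % both tracks on the same time vector
    ts2r = resample(ts2, tCommon);
    d    = sqrt(sum((ts1r.Data - ts2r.Data).^2, 2));   % pointwise separation
    meanDist = mean(d);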

Resampling data with minimal loss of information in time-domain

I am trying to resample/recreate already recorded data for plotting purposes. I thought this was the best place to ask the question (besides dsp.se).
The data is sampled at a high frequency and contains too many data points to plot in the time domain (not enough memory). I want to resample it with minimal loss. The sampling interval of the resulting data does not need to be uniform (again, this is for plotting purposes, not analysis), although the input data is equally sampled.
When we use the regular resample command from matlab/octave, it can distort stiff pieces of the curve.
What is the best approach here?
For reference, here are two pictures (found on tex.se):
First image is regular resample
Second image is a better resampled data that can well behave around peaks.
You should try this set of files from the File Exchange. It computes an optimal lookup table based on either a maximum number of points or a given error. You can choose from natural, linear, or spline interpolation methods. Spline will have the smallest table size but is slower than linear. I don't use natural unless I have a really good reason.
Sincerely,
Jason
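If the File Exchange route is not an option, a crude but effective alternative for plotting is min/max decimation: keep only the extremes of each time bucket so that peaks survive the reduction. This is just an illustration of that idea, not the submission linked above:

    function [tOut, yOut] = minmaxDecimate(t, y, nBuckets)
        % keep the minimum and maximum sample of each of nBuckets time buckets
        % t, y assumed to be column vectors of equal length
        edges = linspace(t(1), t(end), nBuckets + 1);
        tOut = [];  yOut = [];
        for b = 1:nBuckets
            if b < nBuckets
                in = t >= edges(b) & t < edges(b+1);
            else
                in = t >= edges(b) & t <= edges(b+1);   % include the final sample
            end
            idx = find(in);
            if isempty(idx), continue; end
            [~, iMin] = min(y(idx));
            [~, iMax] = max(y(idx));
            keep = sort(idx([iMin, iMax]));             % keep chronological order
            tOut = [tOut; t(keep)];                     %#ok<AGROW>
            yOut = [yOut; y(keep)];                     %#ok<AGROW>
        end
    end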

Process for comparing two datasets

I have two datasets at a time (in the form of vectors) and I plot them on the same axes to see how they relate to each other. I specifically note and look for places where both graphs have a similar shape (i.e. places where both have a seemingly positive/negative gradient over approximately the same intervals). Example:
So far I have been working through the data graphically, but I realize that since the amount of data is so large, plotting it every time I want to check how two sets correlate will take far too much time.
Are there any ideas, scripts or functions that might be useful to automate this process somewhat?
The first thing you have to think about is the nature of the criteria you want to apply to establish similarity. There is a wide variety of ways to measure similarity, and the more precisely you can describe what you want "similar" to mean in your problem, the easier it will be to implement, regardless of the programming language.
Having said that, here are some of the things you could look at:
correlation of the two datasets
difference of the derivatives of the datasets (but I don't think this would be robust enough)
spectral analysis, as mentioned by thron of three
etc.
Knowing the origin of the datasets and their variability can also help a lot in formulating robust enough algorithms.
Sure. Call your two vectors A and B.
1) (Optional) Smooth your data, either with a simple averaging filter (Matlab 'smooth') or the 'filter' command. This will get rid of local changes in velocity ("gradient") that appear to be essentially noise (as in the ascending component of the red trace).
2) Differentiate both A and B. Now you are directly representing the velocity of each vector (Matlab 'diff').
3) Add the two differentiated vectors together (element-wise). Call this C.
4) Look for all points in C whose absolute value is above a certain threshold (you'll have to eyeball the data to get a good idea of what this should be). Points above this threshold indicate highly similar velocity.
5) Now look for where a high positive value in C is followed by a high negative value, or vice versa. In between these two points you will have similar curves in A and B.
Note: a) You could do the smoothing after step 3 rather than after step 1. b) Re 5), you could have a situation in which a 'hill' in your data is at the edge of the vector and so is 'cut in half', and the vectors descend to baseline before ascending in the next hill. Then 5) would misidentify the hill as coming between the initial descent and subsequent ascent. To avoid this, you could also require that the points in A and B in between the two points of velocity similarity have high absolute values.
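A compact sketch of steps 1)-4) (the window length and threshold are placeholders to be tuned by eye, and step 5's pairing of opposite-signed excursions is simplified here to contiguous above-threshold runs):

    win = 11;                                     % 1) smoothing window
    As  = smooth(A, win);   Bs = smooth(B, win);  %    (or movmean, without the toolbox)
    C   = diff(As) + diff(Bs);                    % 2)-3) summed velocities
    thr = 0.5 * std(C);                           % 4) similarity threshold (tune this)
    hot = abs(C) > thr;                           %    samples where both move strongly together
    d   = diff([0; hot; 0]);                      % group hot samples into runs
    runStart = find(d == 1);                      % each [runStart, runEnd] interval marks
    runEnd   = find(d == -1) - 1;                 % a stretch where A and B behave similarly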