Goodness of fit - Comparing few data points with Simulated Equation Curves - MATLAB

I have a set of reference data points to which I want to fit a sigmoidal curve. I could use the Curve Fitting Tool in MATLAB to do this, but I have a custom equation to fit to the data. The equation has 4-5 parameters that I want to vary and then test for goodness of fit.
I tried using the goodnessOfFit function for this, but it requires the test data and reference data matrices to be of the same size. The number of reference data points I have is small (15-20), while the number of test points generated by the custom equation is large.
Is there any other way to check the goodness of fit of the curve? Or do I have to find the test data points corresponding to the points in the reference data and then use the goodnessOfFit function? (One problem with this approach is that I don't have the same x-axis resolution in the test and reference data, e.g. for an x-point of 1.2368 in the reference data I have either 1.23 or 1.24 in my test data. I would have to round off the data and then calculate the fit.)

"do I have to find the test data points corresponding to the points in the reference data and then use the goodnessOfFit function ... I will have to round off the data and then calculate the fit."
Yes, buddy! It seems like you have to do it the hard way! :/
But instead of simply rounding off, you can find the two points in the test data just before and after the corresponding reference sample point. Then use linear interpolation to guess the value corresponding to the reference point.
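For example, a minimal sketch of that idea using interp1 (xSim/ySim stand for the simulated curve and xRef/yRef for the reference points; the names are placeholders):

ySimAtRef = interp1(xSim, ySim, xRef, 'linear');      % simulated curve evaluated at the reference x-values
fit = goodnessOfFit(ySimAtRef(:), yRef(:), 'NRMSE');  % both vectors now have the same size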
Or easier, there is a resample function in MATLAB which would resample your test data to match your reference data. This would work if the reference data have a constant sample interval.
All the best!

Related

Measuring curve “closeness” with unequal data ranges

Suppose I have an example similar to the following:
where the blue data is my calculated/measured data and the red data is the given ground-truth data. The task is to get the similarity/closeness between the data and each of the given curves so that a classification can be done; it could also be possible to choose multiple classes if the results are very close.
I can divide the problem in my mind to several subproblems:
The data range is not the same
The resolution of the calculated/measured data is higher than the ground-truth data
The calculated data has some bias/shift
The following questions come to my mind when trying to solve those problems:
Is it better to fit the calculated/measured data first then attempting to solve the problem?
Would it be fine to use the data points as they are and calculate the mean squared error against each curve, treating it as a fitting attempt and taking the best fit? What would the effect of the bias/shift be in this case?
What is a good approach to dealing with the data/range mismatch: decreasing the number of samples in the more densely sampled version, or increasing the number of samples in the more sparsely sampled data over the given range?
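For what it's worth, a minimal sketch of the interpolate-then-MSE idea raised in the last two questions (xMeas/yMeas is the densely sampled measured curve, xGt/yGt one ground-truth curve; the names are placeholders):

lo = max(min(xMeas), min(xGt));                  % restrict to the overlapping x-range
hi = min(max(xMeas), max(xGt));
inRange = xGt >= lo & xGt <= hi;

yMeasOnGt = interp1(xMeas, yMeas, xGt(inRange)); % measured curve at the coarser x-values
mse = mean((yMeasOnGt - yGt(inRange)).^2);       % lower = closer to this curve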

Resampling data with minimal loss of information in time-domain

I am trying to resample/recreate already recorded data for plotting purposes. I thought this was the best place to ask the question (besides dsp.se).
The data is sampled at a high frequency and contains too many data points, so it is not suitable for plotting in the time domain (not enough memory). I want to resample it with minimal loss. The sampling interval of the resulting data doesn't need to be the same (again, this is for plotting purposes, not analysis), although the input data is equally sampled.
When we use the regular resample command from MATLAB/Octave, it can distort stiff pieces of the curve.
What is the best approach here?
For reference, here are two pictures found on tex.se:
The first image shows a regular resample.
The second image shows better-resampled data that behaves well around the peaks.
You should try this set of files from the File Exchange. It computes an optimal lookup table based on either a maximum number of points or a given error. You can choose natural, linear, or spline interpolation. Spline will give the smallest table size but is slower than linear. I don't use natural unless I have a really good reason.
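This is not the File Exchange code, just a sketch of the underlying idea: greedily keep the samples where a spline through the kept points still misses the original data the most, until the error drops below a chosen tolerance. The signal and the tolerance below are made up.

t = linspace(0, 5, 5000);                 % densely sampled example signal
y = exp(-t) .* sin(20*t);
tol = 1e-3;                               % assumed maximum allowed error

keep = [1, numel(t)];                     % always keep the endpoints
while true
    yhat = interp1(t(keep), y(keep), t, 'spline');
    [err, idx] = max(abs(yhat - y));
    if err < tol, break; end
    keep = sort([keep, idx]);             % add the worst-fitting sample
end

plot(t, y, '-', t(keep), y(keep), 'o');   % far fewer points, peaks preserved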
Sincerely,
Jason

Sea Ice data - MATLAB 3D matrix

I want to make a matrix from which I can pull out each grid cell as a certain lon/lat point. The data spans 3 years, so I would also need a third dimension for time.
What I have now is three 1437x159 doubles of lat, lon, and sea ice data. How do I combine them into a 3-D matrix that fits the criteria I mentioned above? Basically, I want to be able to say "give me the data at 50S lat and 50W lon on day 47" and index into the array to find that answer.
Thanks!
Yes, this can be done without too much difficulty. I have done analogous work analyzing atmospheric data across time.
The fundamental problem is that you have data organized per time step, with grids that may vary over time, and you need to analyze the data across time. I would recommend one of two approaches.
Approach 1: Grid Resampling
This involves resampling the grid data onto a uniform, standardized grid. Define the grid using the MATLAB ndgrid() function, then resample each time slice onto it using interp2(), and concatenate the slices into a uniform 3-D matrix. You can then directly interpolate within this resampled data using interp3(). This approach involves minimal programming, with the trade-off of losing some of the original data in the resampling process.
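A rough sketch of that approach follows. The target grid spacing, the day count, and the loadSeaIceDay() helper are assumptions; it also assumes each day's native lat/lon matrices are plaid and monotonic, and uses meshgrid() rather than ndgrid() because interp2()/interp3() expect meshgrid ordering.

latVec = -90:0.25:-40;                      % assumed uniform target latitudes
lonVec = -180:0.25:180;                     % assumed uniform target longitudes
[lonQ, latQ] = meshgrid(lonVec, latVec);    % one fixed grid used for every day

nDays = 3 * 365;
ice3d = nan(numel(latVec), numel(lonVec), nDays);   % lat x lon x time

for d = 1:nDays
    [latD, lonD, iceD] = loadSeaIceDay(d);          % hypothetical loader for day d
    ice3d(:, :, d) = interp2(lonD, latD, iceD, lonQ, latQ, 'linear');
end

% Query a point in space and time, e.g. 50S, 50W, day 47:
val = interp3(lonVec, latVec, 1:nDays, ice3d, -50, -50, 47);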
Approach 2: Dynamic Interpolation
Define a custom class wrapper around your data object, say 'SeaIce', and write your own SeaIce.interp3() method. This method would load the grid information for each hour, perform the interpolation first in the lateral dimension, and subsequently in the time dimension. This ensures that no information is lost via interpolation, with the tradeoff of more coding involved.
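A skeleton of what such a wrapper might look like (the per-day cell-array storage and field names are assumptions; it also assumes the native grids are plaid and that the query day lies within the stored range):

classdef SeaIce
    % Hypothetical wrapper around per-day native grids.
    properties
        lat   % cell array of native latitude grids, one per day
        lon   % cell array of native longitude grids, one per day
        ice   % cell array of sea ice grids, one per day
        days  % vector of day numbers, sorted ascending
    end
    methods
        function obj = SeaIce(lat, lon, ice, days)
            obj.lat = lat; obj.lon = lon; obj.ice = ice; obj.days = days;
        end
        function v = interp3(obj, latQ, lonQ, dayQ)
            % Interpolate laterally on the two bracketing days, then in time.
            k  = find(obj.days <= dayQ, 1, 'last');      % assumes dayQ >= days(1)
            k2 = min(k + 1, numel(obj.days));
            v1 = interp2(obj.lon{k},  obj.lat{k},  obj.ice{k},  lonQ, latQ);
            v2 = interp2(obj.lon{k2}, obj.lat{k2}, obj.ice{k2}, lonQ, latQ);
            if k2 == k
                v = v1;                                   % at or past the last stored day
            else
                w = (dayQ - obj.days(k)) / (obj.days(k2) - obj.days(k));
                v = (1 - w) * v1 + w * v2;                % linear in time
            end
        end
    end
end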
The second approach is described in detail (for the wind domain) in my publication "Wind Analysis in Aviation Applications". Slides available here.

Numerical integration using Simpson's Rule on discrete data

I am looking to do numerical integration in MATLAB. I know that there is a trapz function in MATLAB, but the precision is not good enough. Searching online, I found there is a quad function, but it seems to only accept a symbolic expression as input. My data is all discrete and one-dimensional. Is there any way to use quad on my data? Thanks.
An answer to your question would be no. The only built-in way to perform numerical integration on data with no underlying expression in MATLAB is the trapz function. If it's not accurate enough for you, try writing your own quad function as Li-aung said; it's very simple, and this may help.
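For what it's worth, a minimal sketch of composite Simpson's rule for uniformly spaced discrete data (it assumes an even number of intervals, i.e. an odd number of samples):

function I = simpson_discrete(x, y)
% Composite Simpson's rule for uniformly spaced samples y at locations x.
% Example: x = linspace(0, pi, 101); y = sin(x); simpson_discrete(x, y) is ~2.
n = numel(x) - 1;                       % number of intervals
if mod(n, 2) ~= 0
    error('Simpson''s rule needs an even number of intervals.');
end
h = (x(end) - x(1)) / n;                % uniform spacing assumed
I = h/3 * (y(1) + y(end) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)));
end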
Another method you may try is to use the powerful Curve Fitting Tool, cftool, to make a fit and then use the integrate function, which can operate on cfit objects (it has a weird convention: the upper limit is the first argument!). I don't think you will get much more accurate answers than with trapz; it depends on the fit.
Use the spline function in MATLAB to interpolate your data, then integrate this data. This is the standard method for integrating data in discrete form.
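For example (the sine samples are just a stand-in for your discrete data):

x = linspace(0, pi, 25);                          % discrete sample locations
y = sin(x);                                       % discrete sample values
pp = spline(x, y);                                % cubic spline in piecewise-polynomial form
I  = integral(@(t) ppval(pp, t), x(1), x(end));   % ~2.0000 for this example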
You can use quadl() to integrate your data if you first create a function in which you interpolate them.
function f = int_fun(x,xdata,ydata)
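% Evaluate the tabulated data at the quadrature points x by linear interpolation.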
f = interp1(xdata,ydata,x);
And then feed it to the quadl() function:
q = quadl(@int_fun, A, B, [], [], x, y); % extra arguments after the tol and trace
                                         % placeholders are passed on to int_fun
Integration of a function of one variable is the computation of the area under the curve of the graph of the function. For this answer I'll leave aside the nasty functions and the corner cases and all the twists and turns that trip up writers of numerical integration routines, most of which are probably not relevant here.
Simpson's rule is an approach to the numerical integration of a function for which you have code to evaluate the function at points within its domain. That's irrelevant here.
Let's suppose that your data represents a time series of values collected at regular intervals. Then you can plot your data as a histogram with bars of equal width. The integral you seek is the sum of the areas of the bars in the histogram between the limits you are interested in.
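As a tiny sketch of that (the example series and the 1-second interval are made up):

dt = 1;                       % assumed sampling interval in seconds
t  = 0:dt:10;                 % example time series
y  = exp(-t);                 % sampled values
I  = sum(y) * dt;             % sum of the bar areas; compare with trapz(t, y)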
You should be able to apply this approach quite easily to data sets where the x-axis (i.e. the width of the bars in the histogram) does not show time, to situations where the bars are not of equal width, to situations where the data crosses the x-axis, and to most reasonable data sets.
The discretisation of your data sets a limit on the accuracy of the result you can get. If, for example, your time series is sampled at 1-second intervals, you can't integrate over an interval that is not a whole number of seconds with this approach. But then, you don't really have the data on which to compute a figure with any more accuracy by any approach. Sure, you can use MATLAB (or anything else) to generate extra digits of precision, but they don't carry any meaning.

Using MATLAB to calculate the properties of a polygon defined as a list of points

Does MATLAB have a built-in function to find general properties like center of mass & moments of inertia for a polygon defined as a list of (non-integer valued) points?
regionprops performs this task for integer-valued points, on the assumption that these represent indices of pixels in an image. But the only functions I can find that handle non-integral point lists are polyarea and inpolygon.
My kludge for now is to create a bwconncomp structure with all the points multiplied by some large value (like 10,000) and feed it into regionprops, but I wondered if there is a more elegant solution.
You should check out the submission POLYGEOM by H.J. Sommer on the MathWorks File Exchange. It looks like it has all the property measurements you want, and nice documentation describing the formulae used in the code.
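If you'd rather not download anything, the underlying math is short. This is a sketch of the standard Green's-theorem (shoelace) formulas for area and centroid, not POLYGEOM's actual code, and it works fine for non-integer vertices:

x = [0 4 4 0];  y = [0 0 3 3];                  % example vertex list, CCW order
xs = circshift(x, -1);  ys = circshift(y, -1);  % next vertex, wrapping around
c  = x.*ys - xs.*y;                             % cross terms of the shoelace formula
A  = sum(c) / 2;                                % signed area (here 12)
Cx = sum((x + xs) .* c) / (6*A);                % centroid x (here 2)
Cy = sum((y + ys) .* c) / (6*A);                % centroid y (here 1.5)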
I don't know of a function in MATLAB that would do this for you.
However, poly2mask might be of use for you to create the pixel masks to feed into regionprops. I also suggest that, should you decide to go this route, you carefully test how much the discretization affects the results, so that you don't create crazy large arrays (and waste time) for no real gain in accuracy.
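A rough sketch of that route (the scale factor and example vertices are arbitrary, and the results depend on the rasterization resolution):

x = [0.5 4.2 4.2 0.5];  y = [0.5 0.5 3.7 3.7];  % non-integer vertices (example)
scale = 100;                                    % assumed pixels per unit length
mask  = poly2mask(x*scale, y*scale, ...
                  ceil(max(y*scale)) + 1, ceil(max(x*scale)) + 1);
props = regionprops(mask, 'Centroid', 'Area');
centroid = props.Centroid / scale;              % convert back to the original units
area_est = props.Area / scale^2;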
One possibility is to farm out the calculations to the Java Topology Suite. I don't know about "moments of inertia", but it does at least have a centroid method.