Numerical integration using Simpson's Rule on discrete data - matlab

I am looking for numerical integration with MATLAB. I know that there is a trapz function in MATLAB, but its precision is not good enough. Searching online, I found there is a quad function, but it seems to accept only a symbolic expression as input. My data is all discrete and one-dimensional. Is there any way to use quad on my data? Thanks.

The short answer to your question is no. The only built-in way to perform numerical integration on data with no underlying expression in MATLAB is the trapz function. If it's not accurate enough for you, try writing your own Simpson's rule routine, as Li-aung said; it's very simple, and this may help.
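For instance, a composite Simpson's rule for uniformly spaced samples takes only a few lines. This is a minimal sketch, assuming the spacing is uniform and the number of intervals is even; the name simpson_discrete is invented for this example:
function I = simpson_discrete(x, y)
% composite Simpson's rule for uniformly spaced samples (x, y)
n = numel(x) - 1;                % number of intervals, must be even
if mod(n, 2) ~= 0
    error('Simpson''s rule needs an even number of intervals.');
end
h = (x(end) - x(1)) / n;         % uniform spacing
I = h/3 * (y(1) + y(end) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)));
end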
Another method you may try is to use the powerful Curve Fitting Tool cftool to make a fit and then use the integrate function, which can operate on cfit objects (it has a weird convention: the upper limit is the first argument!). I don't think you will get much more accurate answers than trapz; it depends on the fit.
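A minimal sketch of that route, assuming the Curve Fitting Toolbox, column vectors x and y, and 'smoothingspline' as just one illustrative fit type:
fo = fit(x, y, 'smoothingspline');   % any cftool fit type could be used here
I  = integrate(fo, x(end), x(1));    % integral of the fit from x(1) to x(end)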

Use the spline function in MATLAB to interpolate your data, then integrate the resulting interpolant. This is a standard method for integrating data in discrete form.
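A minimal sketch of that idea, assuming vectors xdata and ydata; ppval wraps the piecewise polynomial so that integral can evaluate it:
pp = spline(xdata, ydata);                               % cubic spline interpolant
I  = integral(@(x) ppval(pp, x), xdata(1), xdata(end));  % integrate the interpolant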

You can use quadl() to integrate your data if you first create a function that interpolates them:
function f = int_fun(x, xdata, ydata)
% interpolate the data (xdata, ydata) at the quadrature points x
f = interp1(xdata, ydata, x);
end
And then feed it to the quadl() function:
I = quadl(@int_fun, A, B, [], [], x, y)  % extra arguments after the tolerance
                                         % and trace slots are passed to int_fun

Integration of a function of one variable is the computation of the area under the curve of the graph of the function. For this answer I'll leave aside the nasty functions and the corner cases and all the twists and turns that trip up writers of numerical integration routines, most of which are probably not relevant here.
Simpson's rule is an approach to the numerical integration of a function for which you have code to evaluate the function at arbitrary points within its domain. That's not your situation here.
Let's suppose that your data represents a time series of values collected at regular intervals. Then you can plot your data as a histogram with bars of equal width. The integral you seek is the sum of the areas of the bars in the histogram between the limits you are interested in.
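A minimal sketch of that sum for a uniformly sampled series, assuming y holds the samples and dt is the (constant) sample interval:
dt = 1;              % sample interval, e.g. 1 second
I  = dt * sum(y);    % each bar has width dt and height y(k)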
You should be able to apply this approach, quite easily, to data sets where the x-axis (i.e. the width of the bars in the histogram) does not represent time, to situations where the bars are not of equal width or the data crosses the x-axis, and to most reasonable data sets.
The discretisation of your data establishes a limit to the accuracy of the result you can get. If, for example, your time series is sampled at 1sec intervals you can't integrate over an interval which is not a whole number of seconds by this approach. But then, you don't really have the data on which to compute a figure with any more accuracy by any approach. Sure, you can use Matlab (or anything else) to generate extra digits of precision but they don't carry any meaning.

Related

How calculating hessian works for Neural Network learning

Can anyone explain to me, in an easy and less mathematical way, what a Hessian is and how it works in practice when optimizing the learning process for a neural network?
To understand the Hessian you first need to understand the Jacobian, and to understand the Jacobian you need to understand the derivative.
The derivative is a measure of how fast the function value changes with a change of the argument. So if you have the function f(x)=x^2 you can compute its derivative and learn how fast f(x+t) changes for small enough t. This gives you knowledge about the basic dynamics of the function.
The gradient shows you, for multidimensional functions, the direction of the biggest value change (it is based on the directional derivatives), so given a function, e.g. g(x,y)=-x+y^2, you know that it is better to minimize the value of x while strongly maximizing the value of y. This is the basis of gradient-based methods, like the steepest descent technique (used in traditional backpropagation).
The Jacobian is yet another generalization, as your function might have many output values, like g(x,y)=(x+1, xy, x-y); you then have one gradient per output value (each a vector of 2 partial derivatives), together forming a matrix of 2*3=6 values.
Now, the derivative shows you the dynamics of the function itself. But you can go one step further: if you can use this dynamics to find the optimum of the function, maybe you can do even better by finding out the dynamics of this dynamics, that is, by computing derivatives of second order. This is exactly what the Hessian is: a matrix of second-order derivatives of your function. It captures the dynamics of the derivatives, i.e. how fast (and in what direction) the change changes. It may seem a bit complex at first sight, but if you think about it for a while it becomes quite clear. You want to go in the direction of the gradient, but you do not know "how far" (what the correct step size is). So you define a new, smaller optimization problem, where you ask "OK, I have this gradient, how far should I go?" and solve it analogously, using derivatives (and the derivatives of the derivatives form the Hessian).
You may also look at this geometrically: gradient-based optimization approximates your function with a line. You simply try to find the line which is closest to your function at the current point, and it defines the direction of change. Now, lines are quite primitive; maybe we could use some more complex shapes, like... parabolas? Second-derivative, Hessian methods are just trying to fit a parabola (a quadratic function, f(x)=ax^2+bx+c) to your current position, and based on this approximation, choose a valid step.
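As a hedged one-dimensional sketch of the fit-a-parabola idea (in 1-D the Hessian is just the second derivative), compare a plain gradient step with a Newton step; the function and the step size are invented for illustration:
f   = @(x) x.^4 - 3*x.^2 + x;     % illustrative function
df  = @(x) 4*x.^3 - 6*x + 1;      % first derivative (gradient)
d2f = @(x) 12*x.^2 - 6;           % second derivative (1-D Hessian)
x0       = 2;                     % starting point
x_grad   = x0 - 0.01*df(x0);      % gradient step with a guessed step size
x_newton = x0 - df(x0)/d2f(x0);   % Newton step: the fitted parabola sets the step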

Function with errors in numerical integration

I'm looking for a function that generates significant errors in numerical integration using Gaussian quadrature or Simpson quadrature.
Since Simpson's rule and Gaussian quadrature try to fit a supposedly smooth function with pieces of simple smooth functions, such as 2nd-order polynomials, and otherwise make use of low-order polynomials and other simple algebraic expressions such as a + 5/6, it makes sense that the biggest challenges would be functions that aren't 2nd-order polynomials and don't resemble those simple functions.
Step functions, or more generally functions that are constant for short runs then jump to another value. A staircase, or the Walsh functions (used for a kind of binary Fourier transform) should be interesting. Just a plain simple single step does not fit any polynomial approximation very well.
Try a high-order polynomial. Just x^n for a large n should be interesting. Maybe try x^n - x^(n-1) for some large n. How large is "large"? For Simpson, perhaps 4 or more. For Gaussian quadrature using k points, n>k. (Don't go nuts trying n beyond modest two-digit numbers; that just becomes nasty calculation apart from any integration.)
Few numerical integration methods cope well with poles, that is, functions resembling 1/(x-a) in some neighborhood of a. Since it may be trouble to deal with an actual infinity, try pushing the pole off the real line to a complex conjugate pair: make a big but finite spike using 1/((x-a)^2 + b), where b>0 is small. Or take the square root of that expression, or its sine or exponential. You could also replace the "2" with a bigger power; I bet that will be nasty.
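A hedged MATLAB sketch of how such a spike can be used as a test case, comparing a coarse composite Simpson's rule against the adaptive integral() as a reference; the constants a, b and the grid size are illustrative only:
a = 0.3;  b = 1e-4;
f = @(x) 1 ./ ((x - a).^2 + b);           % narrow but finite spike near x = a
n = 100;                                  % even number of Simpson intervals
x = linspace(0, 1, n + 1);
y = f(x);
h = (x(end) - x(1)) / n;
I_simpson = h/3 * (y(1) + y(end) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)));
I_adapt   = integral(f, 0, 1);            % adaptive reference value
rel_err   = abs(I_simpson - I_adapt) / abs(I_adapt);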
Once upon a time I wanted to test a numerical integration routine. I started with a stairstep function, or train of rectangular pulses, sampled on some set of points.
I computed an approximate derivative using a Savitzky-Golay filter. SG can differentiate numerical data using a finite window of neighboring points, though normally it's used for smoothing. It takes a window size (number of points), polynomial order (2 or 4 in practice, but you may want to go nuts with higher), and differentiation order (normally 0 to smooth, 1 to get derivatives).
The result was a series of pulses, which I then integrated. A good routine will recreate the original stairstep or rectangular pulses. I imagine if the SG parameters are chosen right, you will make Simpson and Gauss roll over in their graves.
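For reference, a sketch of the differentiation step with MATLAB's sgolay (Signal Processing Toolbox), loosely following the documented pattern for derivative filters; the window length, polynomial order, and sample interval are illustrative:
order = 4;  framelen = 21;  dt = 0.01;                 % illustrative SG parameters
[~, g] = sgolay(order, framelen);                      % g(:,p+1) is the p-th derivative filter
dy = conv(y, factorial(1)/(-dt)^1 * g(:,2), 'same');   % approximate dy/dt of the samples y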
If you are looking for a difficult function to integrate as a test case, you could consider the one in this CS Stack Exchange question:
Method for numerical integration of difficult oscillatory integral
In this question, one of the answers suggests using the chebfun library for MATLAB, which contains an implementation of a basic Levin-type method. This suggests to me that the function would defeat a simpler method such as Simpson's rule.

Scipy/Python indirect spline interpolation

I need to fit data in quite an indirect way. The original data to be recovered by the fit is some linear function with small oscillations and drifts on it, which I would like to identify. Let's call it f(t). We cannot record this parameter in the experiment directly, but only indirectly, say as g(f) = sin(a f(t)). (The real transfer function is more complex, but it should not play a role here.)
So if f(t) changes direction near the turning points of the sine function, it is difficult to identify, and I tried an alternative approach to recovering f(t), rather than just using the inverse function of g together with some guesses for continuing the data:
I create a model function fm(t) which undergoes the same, known transfer function g() and fit g(fm(t)) to the data. As the dataset is huge, I do this piecewise for successive chunks of data, guaranteeing the continuity of fm across the whole set.
A first try was to use linear functions with optimize.leastsq, where the error estimate is derived from g(fm). It is not completely satisfactory, and I think it would be far better to fit a spline to the data to get fspline(t) as a model for f(t), guaranteeing the continuity of the data and of its derivative.
The problem is that spline fitting from the interpolate package works on the data directly, so I cannot wrap the spline in g(fspline) and do the spline fit on that. Is there a way this can be done in scipy?
Any other ideas?
I tried quadratic functions, fixing the offset and slope to match those of the preceding fitted chunk of data, so there is only one fitting parameter, the curvature, but the fit very quickly starts to deviate.
Thanks
What you would need is a matrix of spline basis functions, b(t), so you can approximate f(t) as a linear combination of spline basis functions
f(t) = np.dot(b(t), coefs)
and then estimate the coefficients, coefs, by optimize.leastsq.
However, spline basis functions are not readily available in python, as far as I know (unless you borrow experimental scripts or search through the code of some packages).
Instead you could also use polynomials, for example
b(t) = np.polynomial.chebyshev.chebvander(t, order)
and use a polynomial approximation instead of the splines.
The structure of this problem is very similar to generalized linear models where g is your known link function and similar to index problems in econometrics.
It would be possible to use the scipy splines in an indirect way if you create artificial data
y_i = f(t_i)
where f(t_i) are scipy.interpolate splines, and the y_i are the parameters to be estimated in the least squares optimization. (Loosely based on a script that I saw some time ago that used this for creating a different kind of smoothing splines than the scipy version. I don't remember where I saw this.)
Thank you for these comments. I tried out the polynomial basis suggested above, but polynomials are not an option for my needs, as they tend to create ringing, which is difficult to control.
The solution on using splines I now found is quite simple and straightforward, and I think it is what you meant by "using the splines in an indirect way".
The fitting function f(t) is obtained from the interpolate.splev(x, (t,c,k)) function, but with the spline coefficients c provided by the optimize.leastsq function. In this way, f(t) is not a direct spline fit (as one would usually obtain with the splrep(x, y) function) but is indirectly optimized in the fit, and therefore it is possible to use the link function g on it. The initial guess for c can be obtained by one evaluation of splrep(xinit, yinit, t=knots) on model data.
One trick is to restrict the number of knots for the spline to below the number of data points by explicitly specifying them in the call to splrep() and passing this reduced set to splev() during evaluation.

Functional form of 2D interpolation in Matlab

I need to construct an interpolating function from a 2D array of data. The reason I need something that returns an actual function is that I need to be able to evaluate it as part of an expression that I need to numerically integrate.
For that reason, "interp2" doesn't cut it: it does not return a function.
I could use "TriScatteredInterp", but that's heavy-weight: my grid is equally spaced (and big), so I don't need the Delaunay triangulation.
Are there any alternatives?
(Apologies for the 'late' answer, but I have some suggestions that might help others if the existing answer doesn't help them)
It's not clear from your question how accurate the resulting function needs to be (or how big 'big' is), but one approach you could adopt is to regress the data points that you have using a least-squares or Kalman filter-based method. You'd need to do this with a number of candidate function forms and then choose the one that is 'best', for example by using a measure such as MAE or MSE.
Of course this requires some idea of what the form of the underlying function could be, but your question isn't clear as to whether you have this kind of information.
Another approach that could work (and requires no knowledge of what the underlying function might be) is the use of the fuzzy transform (F-transform) to generate line segments that provide local approximations to the surface.
The method for this would be:
Define a 2D universe that includes the x and y domains of your input data
Create a 2D fuzzy partition of this universe, choosing partition sizes that give the accuracy you require
Apply the discrete F-transform using your input data to generate fuzzy data points in a 3D fuzzy space
Pass the inverse F-transform as a function handle (along with the fuzzy data points) to your integration function
If you're not familiar with the F-transform then I posted a blog a while ago about how the F-transform can be used as a universal approximator in a 1D case: http://iainism-blogism.blogspot.co.uk/2012/01/fuzzy-wuzzy-was.html
To see the mathematics behind the method and extend it to a multidimensional case, the University of Ostrava has published a PhD thesis that explains its application to various engineering problems and also provides an example of how it is constructed for the case of a 2D universe: http://irafm.osu.cz/f/PhD_theses/Stepnicka.pdf
If you want a function handle, why not define f = @(xi,yi) interp2(X,Y,Z,xi,yi)?
It might be a little slow, but I think it should work.
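A minimal sketch of using that handle in a 2-D integration, assuming gridded data X, Y, Z with limits taken from the grid; 'spline' is just one possible interp2 method:
f = @(xi, yi) interp2(X, Y, Z, xi, yi, 'spline');   % interpolant as a function handle
I = integral2(f, min(X(:)), max(X(:)), min(Y(:)), max(Y(:)));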
If I understand you correctly, you want to perform a surface/line integral of 2-D data. There are ways to do it but maybe not the way you want it. I had the exact same problem and it's annoying! The only way I solved it was using the Surface Fitting Tool (sftool) to create a surface then integrating it.
After you create your fit using the tool (it has a GUI as well), it will generate an sfit object, which you can then integrate (in 2-D) using quad2d.
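A hedged sketch of that workflow with the Curve Fitting Toolbox, assuming column vectors x, y, z of scattered samples and 'cubicinterp' as one illustrative surface fit type:
fo = fit([x, y], z, 'cubicinterp');                                % returns an sfit object
I  = quad2d(@(xq, yq) fo(xq, yq), min(x), max(x), min(y), max(y));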
I also tried your method of using interp2 and got results similar to the sfit object, but I had no idea how to do a numerical (line/surface) integration with the data. Creating the sfit object and then integrating it was much faster.
It was the first time I had done something like this, so I confirmed it using a numerically evaluated line integral. According to Stokes' theorem the surface integral and the line integral should agree, and they did turn out to be the same.
I asked this question in the mathematics stackexchange, wanted to do a line integral of 2-d data, ended up doing a surface integral and then confirming the answer using a line integral!

Using matlab to calculate the properties of a polygon defined as a list of points

Does MATLAB have a built-in function to find general properties like center of mass & moments of inertia for a polygon defined as a list of (non-integer valued) points?
regionprops performs this task for integer-valued points, on the assumption that these represent indices of pixels in an image. But the only functions I can find that treat non-integer point lists are polyarea and inpolygon.
My kludge for now is to create a bwconncomp structure with all the points multiplied by some large value (like 10,000) and then feed it into regionprops, but I wondered if there is a more elegant solution.
You should check out the submission POLYGEOM by H.J. Sommer on the MathWorks File Exchange. It looks like it has all the property measurements you want, and nice documentation describing the formulae used in the code.
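For reference, the signed area and centroid of a simple polygon have a short closed form (the same kind of formulae POLYGEOM documents). A minimal sketch, assuming xv and yv list the vertex coordinates in order without repeating the first point:
x2 = circshift(xv, -1);  y2 = circshift(yv, -1);   % next vertex, wrapping around
c  = xv.*y2 - x2.*yv;                              % shoelace cross terms
A  = sum(c)/2;                                     % signed area
cx = sum((xv + x2).*c) / (6*A);                    % centroid x
cy = sum((yv + y2).*c) / (6*A);                    % centroid y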
I don't know of a function in MATLAB that would do this for you.
However, poly2mask might be of use for you to create the pixel masks to feed into regionprops. I also suggest that, should you decide to go this route, you carefully test how much the discretization affects the results, so that you don't create crazy large arrays (and waste time) for no real gain in accuracy.
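A sketch of the poly2mask route (Image Processing Toolbox), assuming the vertex lists xv and yv have positive coordinates; the scale factor is an illustrative choice that trades accuracy against array size:
scale = 100;                                          % rasterisation resolution
mask  = poly2mask(xv*scale, yv*scale, ...
                  ceil(max(yv)*scale), ceil(max(xv)*scale));
props = regionprops(mask, 'Centroid', 'Area');
centroid = props.Centroid / scale;                    % back to the original units
area_est = props.Area / scale^2;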
One possibility is to farm out the calculations to the Java Topology Suite. I don't know about "moments of inertia", but it does at least have a centroid method.