Function with errors in numerical integration - matlab

I'm looking for a function that generates significant errors in numerical integration using Gaussian quadrature or Simpson quadrature.

Since Simpson's and Gaussian methods try to fit a supposedly smooth function with pieces of simple smooth functions, such as 2nd-order polynomials, and otherwise make use of low-order polynomials and other simple algebraic expressions, it makes sense that the biggest challenges would be functions that don't resemble low-order polynomials or those simple building blocks.
Try step functions, or more generally functions that are constant for short runs and then jump to another value. A staircase, or the Walsh functions (used for a kind of binary Fourier transform), should be interesting. Even a plain single step does not fit any polynomial approximation very well.
Try a high-order polynomial. Just x^n for a large n should be interesting. Or try the difference x^n - x^(n-1) for some large n. How large is "large"? For Simpson, perhaps 4 or more. For Gaussian using k points, n > k. (Don't go nuts trying n beyond modest two-digit numbers; that just becomes nasty calculation apart from any integration.)
Few numerical integration methods like poles, that is, functions resembling 1/(x-a) for some neighborhood around a. Since it may be trouble to deal with an actual infinity, try pushing it off the real line to a complex conjugate pair. Make a big but finite spike using 1/((x-a)^2 + b) where b > 0 is small. Or the square root of that expression, or the sine or exponential of it. You could replace the "2" with a bigger power; I bet that'll be nasty.
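A quick MATLAB sketch of these stress tests, comparing a deliberately coarse composite Simpson's rule against MATLAB's adaptive integral; the choices n = 20, b = 1e-4, and the jump location are all arbitrary illustrations:

    % Hard test integrands on [0,1], following the suggestions above.
    n = 20;                                   % "large" polynomial order
    b = 1e-4;                                 % small spike width
    fstep  = @(x) double(x > 0.5);            % single step at x = 0.5
    fpoly  = @(x) x.^n - x.^(n-1);            % high-order polynomial
    fspike = @(x) 1 ./ ((x - 0.5).^2 + b);    % near-pole spike

    for f = {fstep, fpoly, fspike}
        % Reference value from adaptive quadrature with tight tolerances
        % (integral may warn about the non-smooth integrands; that is the point).
        I_ref = integral(f{1}, 0, 1, 'AbsTol', 1e-12, 'RelTol', 1e-12);
        % Crude composite Simpson's rule on a fixed 11-point grid.
        x = linspace(0, 1, 11);  h = x(2) - x(1);  y = f{1}(x);
        I_simp = h/3 * (y(1) + y(end) + 4*sum(y(2:2:end-1)) + 2*sum(y(3:2:end-2)));
        fprintf('ref = %.6g, Simpson = %.6g, error = %.2g\n', ...
                I_ref, I_simp, abs(I_ref - I_simp));
    end

The fixed-grid Simpson result degrades badly on the spike and the step, which is exactly the behavior the question is after.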
Once upon a time I wanted to test a numerical integration routine. I started with a stairstep function, or train of rectangular pulses, sampled on some set of points.
I computed an approximate derivative using a Savitzky-Golay filter. SG can differentiate numerical data using a finite window of neighboring points, though normally it's used for smoothing. It takes a window size (number of points), polynomial order (2 or 4 in practice, but you may want to go nuts with higher), and differentiation order (normally 0 to smooth, 1 to get derivatives).
The result was a series of pulses, which I then integrated. A good routine will recreate the original stairstep or rectangular pulses. I imagine if the SG parameters are chosen right, you will make Simpson and Gauss roll over in their graves.
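Here is a minimal sketch of that experiment, assuming the Signal Processing Toolbox's sgolay function, an arbitrary 4th-order/11-point filter, and a simple square pulse train (all illustrative choices):

    dt = 0.01;                            % sample spacing
    t = (0:dt:10)';
    x = double(mod(floor(t), 2) == 0);    % train of rectangular pulses
    order = 4;  framelen = 11;            % SG polynomial order and window size
    [~, g] = sgolay(order, framelen);     % g(:,2) is the 1st-derivative filter
    dxdt = conv(x, factorial(1)/(-dt)^1 * g(:,2), 'same');  % approximate derivative
    % Hand dxdt to the integrator under test; a good routine should recover
    % something close to the original pulse train (up to an integration constant).
    x_rec = cumtrapz(t, dxdt);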

If you are looking for a difficult function to integrate as a test case, you could consider the one in the CS Stack Exchange question:
Method for numerical integration of difficult oscillatory integral
In this question, one of the answers suggests using the chebfun library for Matlab, which contains an implementation of a basic Levin-type method. This suggests to me that the function would defeat a simpler method such as Simpson's rule.


integral or trapz: which one is more appropriate in MATLAB?

I'm computing multiple integrals using MATLAB.
I'm using the integral function to compute the integral, but I was wondering: is it faster to use trapz instead of integral?
I know that trapz introduces a bit of error in the computation, but despite that, which is the best function to compute integrals in MATLAB?
Short and sweet:
Use trapz for discrete data, or for sampled functional data, if you don't care about the (potentially extremely) low accuracy of the integral value
Use integral for integrands that have a functional form, adjusting tolerances as needed for speed.
As mentioned by the MATLAB documentation, trapz is intended "to perform numerical integrations on discrete data sets" and leverages the trapezoidal rule for the integrations. The error between the true integral and the trapz approximation is almost entirely dependent on the input x vector (sometimes called the abscissa in integration parlance) with no automatic adaptability. The good part is that if the underlying function is "nice" (i.e., continuous, smooth, no sharp peaks or excessive oscillations, etc.), trapz will likely be the fastest function to approximate the integral since it
Doesn't have to call a function for values (they're input)
Doesn't automatically adapt (which takes time and can be complex to implement).
However, for general integrals, trapz may also be the most inaccurate and may require a denser x vector to calculate a low-error value.
For discrete data, this is a shortcoming that must be lived with, but if the integrand has a functional form, integral and its family are highly recommended.
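A minimal sketch of the two use cases (the integrand here is an arbitrary choice):

    f = @(x) exp(-x.^2) .* cos(5*x);     % an arbitrary smooth integrand
    % Functional form available: use integral (adaptive, accurate).
    I1 = integral(f, 0, 2);
    % Only sampled data available: use trapz (accuracy set by the grid).
    x = linspace(0, 2, 50);
    I2 = trapz(x, f(x));
    fprintf('integral: %.10f, trapz: %.10f\n', I1, I2);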
The black-box numeric integrators in MATLAB have evolved over the years, and MathWorks co-founder Cleve Moler has a nice blog post going over some of the evolutions. The post discusses the quad, quadl, and quadgk functions and how quadgk became the core for integral and its ilk. The basic breakdown of the three functions is
quad uses a three-point and five-point Simpson's Rule
quadl uses a four-seven-thirteen point1 Lobatto-Kronrod2 rule
quadgk uses a seven-fifteen point Gauss-Kronrod2 rule
to acquire both an approximation of the integral and an error approximation for adaptive quadrature. The summary of the history lesson and test problems is that quadgk was written with vectorization incorporated3, uses a higher-order rule which excludes end-points, and gives extremely accurate answers faster than its competitors. As a result, quadgk is the core of the new and highly-recommended integral family.
1 Adaptive quadrature usually lists the number of points used to form its approximation of the value and the error. Typically, there are two numbers that indicate the number of points to form the low-order and high-order approximations. quadl is interesting in that it uses a four-point Gauss-Lobatto rule and seven-point and thirteen-point Kronrod extensions for its error handling.
2 Gaussian Quadrature, which is an integration technique that chooses its abscissa to exactly integrate a family of polynomials over a given interval instead of prescribing them as in Newton-Cotes, has a lot of names associated with it to indicate a lot of "stuff" that's going on without being explicit about it (which can be very annoying to newcomers). "Gauss" refers to the aforementioned method of choosing abscissa and associated weights for the integration. "Lobatto" indicates an extension to Gauss-Legendre integration methods that incorporates end-points (others may not like my link between these two, but I find the parallels pleasing). "Kronrod" indicates an extension to any particular Gauss rule that creates a high-order rule using a given set of abscissa and adding to it; this creates a "nesting" (the low-order points are part of the high-order point set) that results in fewer function evaluations overall.
3 Since vectorization is written into integral, integrands or limits that are vector-valued must use the 'ArrayValued' flag to tell the program to make functional evaluations differently so as not to create a size-mismatch error. It might be possible to program around this to a certain extent, but the MathWorks decided not to.
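For example, a sketch of the 'ArrayValued' flag in use (the integrand is an arbitrary choice):

    % Vector-valued integrand: integrate [x; x^2; sin(x)] componentwise.
    fvec = @(x) [x; x.^2; sin(x)];                   % called with scalar x
    I = integral(fvec, 0, pi, 'ArrayValued', true);  % I is a 3x1 vector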

What's the best way to calculate a numerical derivative in MATLAB?

(Note: This is intended to be a community Wiki.)
Suppose I have a set of points xi = {x0,x1,x2,...xn} and corresponding function values fi = f(xi) = {f0,f1,f2,...,fn}, where f(x) is, in general, an unknown function. (In some situations, we might know f(x) ahead of time, but we want to do this generally, since we often don't know f(x) in advance.) What's a good way to approximate the derivative of f(x) at each point xi? That is, how can I estimate values of dfi == d/dx fi == df(xi)/dx at each of the points xi?
Unfortunately, MATLAB doesn't have a very good general-purpose, numerical differentiation routine. Part of the reason for this is probably because choosing a good routine can be difficult!
So what kinds of methods are there? What routines exist? How can we choose a good routine for a particular problem?
There are several considerations when choosing how to differentiate in MATLAB:
Do you have a symbolic function or a set of points?
Is your grid evenly or unevenly spaced?
Is your domain periodic? Can you assume periodic boundary conditions?
What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
Does it matter to you that your derivative is evaluated on the same points as your function is defined?
Do you need to calculate multiple orders of derivatives?
What's the best way to proceed?
These are just some quick-and-dirty suggestions. Hopefully somebody will find them helpful!
1. Do you have a symbolic function or a set of points?
If you have a symbolic function, you may be able to calculate the derivative analytically. (Chances are, you would have done this if it were that easy, and you would not be here looking for alternatives.)
If you have a symbolic function and cannot calculate the derivative analytically, you can always evaluate the function on a set of points, and use some other method listed on this page to evaluate the derivative.
In most cases, you have a set of points (xi,fi), and will have to use one of the following methods....
2. Is your grid evenly or unevenly spaced?
If your grid is evenly spaced, you probably will want to use a finite difference scheme (see either of the Wikipedia articles here or here), unless you are using periodic boundary conditions (see below). Here is a decent introduction to finite difference methods in the context of solving ordinary differential equations on a grid (see especially slides 9-14). These methods are generally computationally efficient, simple to implement, and the error of the method can be simply estimated as the truncation error of the Taylor expansions used to derive it.
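For example, a minimal sketch of a second-order central difference on a uniform grid, with second-order one-sided formulas at the boundaries:

    h  = 0.01;
    x  = (0:h:2*pi)';
    f  = sin(x);                          % example function (derivative: cos)
    df = zeros(size(f));
    df(2:end-1) = (f(3:end) - f(1:end-2)) / (2*h);             % interior: central
    df(1)       = (-3*f(1) + 4*f(2) - f(3)) / (2*h);           % left boundary
    df(end)     = (3*f(end) - 4*f(end-1) + f(end-2)) / (2*h);  % right boundary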
If your grid is unevenly spaced, you can still use a finite difference scheme, but the expressions are more difficult and the accuracy varies very strongly with how uniform your grid is. If your grid is very non-uniform, you will probably need to use large stencil sizes (more neighboring points) to calculate the derivative at a given point. People often construct an interpolating polynomial (often the Lagrange polynomial) and differentiate that polynomial to compute the derivative. See for instance, this StackExchange question. It is often difficult to estimate the error using these methods (although some have attempted to do so: here and here). Fornberg's method is often very useful in these cases....
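As a sketch of the interpolate-then-differentiate idea, here is the derivative formula obtained from the three-point Lagrange interpolant on an uneven grid (the loop is kept naive for clarity, and the endpoints are deliberately left untreated; see the boundary discussion below):

    % Derivative at interior points of an uneven grid, from the quadratic
    % Lagrange interpolant through (x(i-1), x(i), x(i+1)).
    x  = sort(rand(20,1));                % an uneven grid
    f  = exp(x);                          % example data (derivative: exp)
    df = zeros(size(f));
    for ii = 2:numel(x)-1
        x0 = x(ii-1); x1 = x(ii); x2 = x(ii+1);
        df(ii) = f(ii-1)*(x1-x2)/((x0-x1)*(x0-x2)) ...
               + f(ii)  *(2*x1-x0-x2)/((x1-x0)*(x1-x2)) ...
               + f(ii+1)*(x1-x0)/((x2-x0)*(x2-x1));
    end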
Care must be taken at the boundaries of your domain because the stencil often involves points that are outside the domain. Some people introduce "ghost points" or combine boundary conditions with derivatives of different orders to eliminate these "ghost points" and simplify the stencil. Another approach is to use right- or left-sided finite difference methods.
Here's an excellent "cheat sheet" of finite difference methods, including centered, right- and left-sided schemes of low orders. I keep a printout of this near my workstation because I find it so useful.
3. Is your domain periodic? Can you assume periodic boundary conditions?
If your domain is periodic, you can compute derivatives to very high order accuracy using Fourier spectral methods. This technique sacrifices some performance to gain high accuracy. In fact, if you are using N points, your estimate of the derivative is approximately Nth-order accurate. For more information, see (for example) this WikiBook.
Fourier methods often use the Fast Fourier Transform (FFT) algorithm to achieve roughly O(N log(N)) performance, rather than the O(N^2) algorithm that a naively-implemented discrete Fourier transform (DFT) might employ.
If your function and domain are not periodic, you should not use the Fourier spectral method. If you attempt to use it with a function that is not periodic, you will get large errors and undesirable "ringing" phenomena.
Computing derivatives of any order requires 1) a transform from grid space to spectral space (O(N log(N))), 2) multiplication of the Fourier coefficients by their spectral wavenumbers (O(N)), and 3) an inverse transform from spectral space to grid space (again O(N log(N))).
Care must be taken when multiplying the Fourier coefficients by their spectral wavenumbers. Every implementation of the FFT algorithm seems to have its own ordering of the spectral modes and normalization parameters. See, for instance, the answer to this question on the Math StackExchange, for notes about doing this in MATLAB.
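A minimal MATLAB sketch, assuming an even number of points N and the wavenumber ordering used by MATLAB's fft (the test function is an arbitrary smooth periodic choice):

    % Spectral first derivative of a periodic function on [0, 2*pi).
    N = 64;
    x = 2*pi*(0:N-1)'/N;
    f = exp(sin(x));                  % smooth, periodic example
    k = [0:N/2-1, 0, -N/2+1:-1]';     % MATLAB fft ordering; Nyquist mode
                                      % zeroed, as is standard for odd derivatives
    df = real(ifft(1i*k .* fft(f)));  % compare with cos(x).*exp(sin(x))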
4. What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
For many purposes, a 1st or 2nd order finite difference scheme may be sufficient. For higher precision, you can use schemes derived from Taylor expansions that retain more terms before truncating.
If you need to compute the derivatives within a given tolerance, you may want to look around for a high-order scheme that has the error you need.
Often, the best way to reduce error is reducing the grid spacing in a finite difference scheme, but this is not always possible.
Be aware that higher-order finite difference schemes almost always require larger stencil sizes (more neighboring points). This can cause issues at the boundaries. (See the discussion above about ghost points.)
5. Does it matter to you that your derivative is evaluated on the same points as your function is defined?
MATLAB provides the diff function to compute differences between adjacent array elements. This can be used to calculate approximate derivatives via a first-order forward-differencing (or forward finite difference) scheme, but the estimates are only first-order accurate. As described in MATLAB's documentation of diff (link), if you input an array of length N, it will return an array of length N-1. When you estimate derivatives using this method on N points, you will only have estimates of the derivative at N-1 points. (Note that this can be used on uneven grids, if they are sorted in ascending order.)
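For example (a minimal sketch):

    % First-order forward differences: N points in, N-1 estimates out.
    x  = sort(rand(10,1));            % works on uneven (sorted) grids too
    f  = x.^2;
    df = diff(f) ./ diff(x);          % estimates between the sample points
    xm = (x(1:end-1) + x(2:end))/2;   % often attributed to interval midpoints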
In most cases, we want the derivative evaluated at all points, which means we want to use something besides the diff method.
6. Do you need to calculate multiple orders of derivatives?
One can set up a system of equations in which the grid point function values and the 1st and 2nd order derivatives at these points all depend on each other. This can be found by combining Taylor expansions at neighboring points as usual, but keeping the derivative terms rather than cancelling them out, and linking them together with those of neighboring points. These equations can be solved via linear algebra to give not just the first derivative, but the second as well (or higher orders, if set up properly). I believe these are called combined finite difference schemes, and they are often used in conjunction with compact finite difference schemes, which will be discussed next.
Compact finite difference schemes (link). In these schemes, one sets up a design matrix and calculates the derivatives at all points simultaneously via a matrix solve. They are called "compact" because they are usually designed to require fewer stencil points than ordinary finite difference schemes of comparable accuracy. Because they involve a matrix equation that links all points together, certain compact finite difference schemes are said to have "spectral-like resolution" (e.g. Lele's 1992 paper--excellent!), meaning that they mimic spectral schemes by depending on all nodal values and, because of this, they maintain accuracy at all length scales. In contrast, typical finite difference methods are only locally accurate (the derivative at point #13, for example, ordinarily doesn't depend on the function value at point #200).
A current area of research is how best to solve for multiple derivatives in a compact stencil. The results of such research, combined compact finite difference methods, are powerful and widely applicable, though many researchers tend to tune them for particular needs (performance, accuracy, stability, or a particular field of research such as fluid dynamics).
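As a small illustration, here is a sketch of the classic 4th-order compact (Pade) scheme for the first derivative on a uniform periodic grid; the coefficients 1/4 and 3/4 are the standard ones for this scheme, and the periodic setup is an illustrative simplification:

    % 4th-order compact (Pade) first derivative, uniform periodic grid:
    %   (1/4) f'(i-1) + f'(i) + (1/4) f'(i+1) = 3/(4h) * (f(i+1) - f(i-1))
    N = 64;  h = 2*pi/N;
    x = h*(0:N-1)';
    f = sin(x);
    e = ones(N,1);
    A = spdiags([e/4, e, e/4], -1:1, N, N);
    A(1,N) = 1/4;  A(N,1) = 1/4;                  % periodic wrap-around
    rhs = 3/(4*h) * (f([2:N,1]) - f([N,1:N-1]));  % periodic shifts of f
    df = A \ rhs;   % one matrix solve couples all points together

Note how the single matrix solve links every nodal value to every derivative, which is where the "spectral-like resolution" of these schemes comes from.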
Ready-to-Go Routines
As described above, one can use the diff function (link to documentation) to compute rough derivatives between adjacent array elements.
MATLAB's gradient routine (link to documentation) is a great option for many purposes. It implements a second-order, central difference scheme. It has the advantages of computing derivatives in multiple dimensions and supporting arbitrary grid spacing. (Thanks to @thewaywewalk for pointing out this glaring omission!)
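For example:

    x  = [0, 0.1, 0.25, 0.5, 1.0];   % arbitrary (uneven) spacing
    f  = x.^3;
    df = gradient(f, x);             % central differences, derivative at
                                     % every point (unlike diff)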
I used Fornberg's method (see above) to develop a small routine (nderiv_fornberg) to calculate finite differences in one dimension for arbitrary grid spacings. I find it easy to use. It uses sided stencils of 6 points at the boundaries and a centered, 5-point stencil in the interior. It is available at the MATLAB File Exchange here.
Conclusion
The field of numerical differentiation is very diverse. For each method listed above, there are many variants with their own set of advantages and disadvantages. This post is hardly a complete treatment of numerical differentiation.
Every application is different. Hopefully this post gives the interested reader an organized list of considerations and resources for choosing a method that suits their own needs.
This community wiki could be improved with code snippets and examples particular to MATLAB.
I believe there is more to these particular questions, so I have elaborated on the subject further as follows:
(4) Q: What level of accuracy are you looking for? Do you need to compute the derivatives within a given tolerance?
A: The required accuracy of numerical differentiation (ND) depends on the application of interest. Usually it works like this: if you are using ND in a forward problem, to approximate derivatives in order to estimate features from a signal of interest, then you should be aware of noise perturbations. Such artifacts usually contain high-frequency components, and by the definition of the differentiator, the noise is amplified on the order of ω^n (the frequency response of an nth-order differentiator is (iω)^n). So increasing the accuracy of the differentiator (increasing the polynomial order) will not help at all. In this case you need to cancel the effect of the noise before or during differentiation. This can be done in cascade: first smooth the signal, then differentiate. But a better way is to use a lowpass differentiator. A good example of a MATLAB library can be found here.
However, if this is not the case and you're using ND in inverse problems, such as solving PDEs, then the global accuracy of the differentiator is very important. Depending on what kind of boundary condition (BC) suits your problem, the design will be adapted accordingly. The rule of thumb is to increase the numerical accuracy over the whole band, which is known as a full-band differentiator. You need to design a derivative matrix that takes care of the suitable BC. You can find comprehensive solutions to such designs using the above link.
(5) Does it matter to you that your derivative is evaluated on the same points as your function is defined?
A: Yes, absolutely. Evaluating the ND on the same grid points is called a "centralized" scheme, and evaluating off the points a "staggered" scheme. Note that for odd derivative orders, a centralized ND degrades the accuracy of the differentiator's frequency response; if you use such a design in inverse problems, this will perturb your approximation. The opposite applies to even derivative orders computed with staggered schemes. You can find a comprehensive explanation of this subject at the link above.
(6) Do you need to calculate multiple orders of derivatives?
This totally depends on your application at hand. You can refer to the same link I have provided, which also covers designs for multiple derivatives.

How calculating the Hessian works for neural network learning

Can anyone explain to me, in an easy and less mathematical way, what a Hessian is and how it works in practice when optimizing the learning process for a neural network?
To understand the Hessian you first need to understand the Jacobian, and to understand the Jacobian you need to understand the derivative.
The derivative is the measure of how fast a function's value changes with a change of the argument. So if you have the function f(x)=x^2, you can compute its derivative and obtain knowledge of how fast f(x+t) changes for small enough t. This gives you knowledge of the basic dynamics of the function.
The gradient shows you, for multidimensional functions, the direction of the biggest value change (which is based on the directional derivatives). Given a function, e.g. g(x,y)=-x+y^2, you know that it is better to minimize the value of x while strongly maximizing the value of y. This is the basis of gradient-based methods, like the steepest descent technique (used in traditional backpropagation methods).
The Jacobian is yet another generalization, for functions with many output values, like g(x,y)=(x+1, x*y, x-y). You now have 2*3 = 6 partial derivatives: one gradient per output value (three of them), each with two components, which together form a matrix.
Now, the derivative shows you the dynamics of the function itself. But you can go one step further: if you can use these dynamics to find the optimum of the function, maybe you can do even better by finding out the dynamics of these dynamics, that is, by computing derivatives of second order. This is exactly what the Hessian is: a matrix of second-order derivatives of your function. It captures the dynamics of the derivatives: how fast (and in what direction) the change changes. It may seem a bit complex at first sight, but if you think about it for a while it becomes quite clear. You want to go in the direction of the gradient, but you do not know "how far" (what the correct step size is). So you define a new, smaller optimization problem, where you ask "OK, I have this gradient; how can I tell where to go?" and solve it analogously, using derivatives (and the derivatives of the derivatives form the Hessian).
You may also look at this geometrically: gradient-based optimization approximates your function with a line. You simply try to find the line which is closest to your function at the current point, and it defines a direction of change. Now, lines are quite primitive; maybe we could use some more complex shape, like... a parabola? Second-derivative, Hessian-based methods just try to fit a parabola (a quadratic function, f(x)=ax^2+bx+c) at your current position, and, based on this approximation, choose a valid step.
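As a toy MATLAB illustration of this idea (a hand-built quadratic example, not anything network-specific): the gradient gives the direction, dividing by the Hessian fixes the step, and on a quadratic function Newton's method therefore lands on the minimum in one step.

    % One Newton step on f(x,y) = (x-1)^2 + 10*(y+2)^2.
    xy   = [5; 5];                            % starting point
    grad = @(p) [2*(p(1)-1); 20*(p(2)+2)];    % gradient of f
    H    = [2, 0; 0, 20];                     % Hessian (constant, since f is quadratic)
    xy   = xy - H \ grad(xy);                 % lands on the minimum [1; -2]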

maximum of a polynomial

I have a polynomial of order N (where N is even). The polynomial tends to minus infinity as x goes to plus or minus infinity (thus it has a maximum). What I am doing right now is taking the derivative of the polynomial with polyder, then finding the roots of the resulting (N-1)th-order polynomial using Matlab's roots function, which returns N-1 solutions. Then I pick the real root that actually maximizes the polynomial. The problem is that I update my polynomial a lot, and at each time step I use the above procedure to find the maximizer, so the roots function takes too much computation time and makes my application slow. Is there a way, either in Matlab or via a proposed algorithm, to do this maximization in a computationally efficient fashion (i.e., finding just one solution instead of N-1 solutions)? Thanks.
Edit: I would also like to know whether there is a routine in Matlab that returns only the real roots, instead of roots, which returns all real and complex ones.
I think that you are probably out of luck. If the coefficients of the polynomial change at every time step in an arbitrary fashion, then ultimately you are faced with a distinct and unrelated optimisation problem at every stage. There is insufficient information available to consider calculating just a subset of roots of the derivative polynomial: how could you know which derivative root provides the maximum stationary point of the polynomial without comparing the function value at ALL of the derivative roots? If your polynomial coefficients were being perturbed at each step by only a (bounded) small amount, or in a predictable manner, then it is conceivable that you could try something iterative to refine the solution at each step (for example, something crude such as using your previous roots as the starting points of a new set of Newton iterations to identify the updated derivative roots). But the question does not suggest that this is the case, so I am just guessing. I could be completely wrong here, but you might just be out of luck in getting something faster unless you can provide more information or have some kind of relationship between the polynomials generated at each step.
There is a file exchange submission by Steve Morris which finds all real roots of functions on a given interval. It does so by interpolating the polynomial by a Chebyshev polynomial and finding the latter's roots.
You can modify the eig evaluation of the companion matrix in there to eigs. This allows you to find only one (or a few) roots and save time. (There's a fair chance it's also possible to compute the roots or extrema of a Chebyshev polynomial analytically, although I could not find a good reference for that, or even a bad one for that matter...)
Another attempt you can make at speeding things up is to note that polyder does nothing more than
Pprime = (numel(P)-1:-1:1) .* P(1:end-1);
for your polynomial P. Also, roots does nothing more than find the eigenvalues of the companion matrix, so you could find these eigenvalues yourself, which avoids a call to roots. Both could be beneficial, because calls to non-built-in functions inside a loop prevent Matlab's JIT compiler from translating the loop to machine language; avoiding them can give you a large speed gain (factors of 100 or more are not uncommon).
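A sketch of that approach (compan builds the companion matrix; for very large polynomials you could swap eig for eigs on a sparse copy, as suggested above):

    % Find the maximizer without calling roots(): take the eigenvalues
    % of the companion matrix of the derivative yourself.
    p  = [-1, 0, 5, 0, -1];                 % example: -x^4 + 5x^2 - 1
    dp = (numel(p)-1:-1:1) .* p(1:end-1);   % polyder, done inline
    C  = compan(dp);                        % companion matrix of derivative
    r  = eig(C);                            % stationary points
    r  = real(r(abs(imag(r)) < 1e-10));     % keep the (numerically) real ones
    [~, imax] = max(polyval(p, r));
    xmax = r(imax);                         % the maximizer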

Scipy/Python indirect spline interpolation

I need to fit data in quite an indirect way. The original data to be recovered in the fit is some linear function with small oscillations and drifts on it that I would like to identify. Let's call this f(t). We cannot record this parameter in the experiment directly, but only indirectly, let's say as g(f) = sin(a f(t)). (The real transfer function is more complex, but it should not play a role here.)
So if f(t) changes direction near the turning points of the sine function, it is difficult to identify, and I tried an alternative approach to recovering f(t), rather than just the inverse function of g plus some continuity-based guesses:
I create a model function fm(t) which undergoes the same and known transfer function g() and fit g(fm(t)) to the data. As the dataset is huge, I do this piecewise for successive chunks of data guaranteeing the continuity of fm across the whole set.
A first try was to use linear functions with optimize.leastsq, where the error estimate is derived from g(fm). It is not completely satisfactory, and I think it would be far better to fit a spline to the data to get fspline(t) as a model for f(t), guaranteeing the continuity of the data and of its derivative.
The problem is that spline fitting from the interpolate package works on the data directly, so I cannot wrap the spline in g(fspline) and do the spline interpolation on that. Is there a way this can be done in scipy?
Any other ideas?
I tried quadratic functions, fixing the offset and slope to match those of the preceding fitted chunk of data, so there is only one fitting parameter, the curvature, which very quickly starts to deviate.
Thanks
What you would need is a matrix of spline basis functions, b(t), so you can approximate f(t) as a linear combination of spline basis functions
f(t) = np.dot(b(t), coefs)
and then estimate the coefficients, coefs, by optimize.leastsq.
However, spline basis functions are not readily available in python, as far as I know (unless you borrow experimental scripts or search through the code of some packages).
Instead you could also use polynomials, for example
b(t) = np.polynomial.chebyshev.chebvander(t, deg)
and use a polynomial approximation instead of the splines.
The structure of this problem is very similar to generalized linear models where g is your known link function and similar to index problems in econometrics.
It would be possible to use the scipy splines in an indirect way if you create artificial data
y_i = f(t_i)
where f(t_i) are scipy.interpolate splines, and the y_i are the parameters to be estimated in the least squares optimization. (Loosely based on a script that I saw some time ago that used this for creating a different kind of smoothing splines than the scipy version. I don't remember where I saw this.)
Thank you for these comments. I tried the polynomial basis suggested above, but polynomials are not an option for my needs, as they tend to create ringing, which is difficult to control.
The solution on using splines I now found is quite simple and straightforward, and I think it is what you meant by "using the splines in an indirect way".
The fitting function f(t) is obtained with the interpolate.splev(x, (t,c,k)) function, but with the spline coefficients c provided by the optimize.leastsq optimization. In this way, f(t) is not a direct spline fit (as one would usually obtain with splrep(x, y)) but is indirectly optimized in the fit, and therefore it is possible to apply the link function g to it. The initial guess for c can be obtained by one evaluation of splrep(xinit, yinit, t=knots) on model data.
One trick is to restrict the number of knots for the spline to below the number of data points by explicitly specifying them in the call to splrep(), and to supply this reduced set during evaluation with splev().
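A minimal scipy sketch of the approach just described; the transfer function g(f) = sin(a*f), the knot placement, and the arcsin-based initial guess are all illustrative assumptions:

    import numpy as np
    from scipy import interpolate, optimize

    a = 3.0                                    # known transfer-function parameter
    t = np.linspace(0.0, 10.0, 500)            # sample times
    f_true = 0.5 * t + 0.1 * np.sin(2.0 * t)   # hidden signal (for demonstration)
    data = np.sin(a * f_true)                  # what the experiment records: g(f(t))

    k = 3                                      # cubic spline
    knots = np.linspace(t[0], t[-1], 12)[1:-1] # interior knots, fewer than data points

    # Crude initial guess: a direct spline fit to a naive inversion of g
    # (arcsin is only valid away from the turning points, but it is a start).
    tknots, c0, _ = interpolate.splrep(
        t, np.arcsin(np.clip(data, -1, 1)) / a, t=knots, k=k)
    n_coef = len(tknots) - k - 1               # number of active B-spline coefficients

    def residuals(c_active):
        # splev expects the coefficient array padded with k+1 trailing zeros.
        c = np.r_[c_active, np.zeros(k + 1)]
        f_model = interpolate.splev(t, (tknots, c, k))
        return np.sin(a * f_model) - data      # compare through the link function g

    c_fit, _ = optimize.leastsq(residuals, c0[:n_coef])
    f_recovered = interpolate.splev(t, (tknots, np.r_[c_fit, np.zeros(k + 1)], k))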