Why should a kernel function satisfy Mercer's condition in ELM or SVM? - neural-network

In kernelized ELM, they (www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf) mention that a kernel should satisfy Mercer's condition. I didn't find a specific reason for that. Please explain the reason.

The reason is explained here.
Let me quote it:
"Finally, what happens if one uses a kernel which does not satisfy Mercer's condition? In general, there may exist data such that the Hessian is indefinite, and for which the quadratic programming problem will have no solution (the dual objective function can become arbitrarily large). However, even for kernels that do not satisfy Mercer's condition, one might still find that a given training set results in a positive semidefinite Hessian, in which case the training will converge perfectly well. In this case, however, the geometrical interpretation described above is lacking."
Burges (1998)
So with a kernel that does not satisfy Mercer's condition, you lose at least some convergence guarantees. You may lose even more, e.g. convergence speed, or approximation accuracy when stopping early.
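To make this concrete, here is a minimal sketch (my own illustration, not from the quoted tutorial). In the SVM dual the Hessian has entries y_i y_j K(x_i, x_j) with labels y_i = +1 or -1, so it is positive semidefinite exactly when the kernel's Gram matrix is. The code builds the Gram matrix for an RBF kernel (which satisfies Mercer's condition) and for the negative squared distance (which does not), and attempts a Cholesky factorization of each. Note that Cholesky tests strict positive definiteness, which is slightly stronger than the semidefiniteness that Mercer's condition guarantees. The sample points and all function names are assumptions made for the example.

```python
import math

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

def rbf(a, b, gamma=0.5):
    """Gaussian (RBF) kernel: satisfies Mercer's condition."""
    return math.exp(-gamma * sq_dist(a, b))

def neg_sq_dist(a, b):
    """Negative squared distance: NOT a Mercer kernel (zero diagonal)."""
    return -sq_dist(a, b)

def gram(kernel, points):
    """Gram matrix K[i][j] = kernel(points[i], points[j])."""
    return [[kernel(a, b) for b in points] for a in points]

def is_positive_definite(K, tol=1e-12):
    """Try a Cholesky factorization; it succeeds only for positive definite K."""
    n = len(K)
    L = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = K[i][j] - sum(L[i][k] * L[j][k] for k in range(j))
            if i == j:
                if s <= tol:  # non-positive pivot: factorization fails
                    return False
                L[i][j] = math.sqrt(s)
            else:
                L[i][j] = s / L[j][j]
    return True

# Hypothetical sample points, chosen only for illustration.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 2.0), (3.0, 1.0)]
rbf_ok = is_positive_definite(gram(rbf, points))
neg_ok = is_positive_definite(gram(neg_sq_dist, points))
print("RBF Gram matrix positive definite:", rbf_ok)
print("negative squared distance Gram matrix positive definite:", neg_ok)
```

The second Gram matrix has a zero diagonal and nonzero off-diagonal entries, so its eigenvalues sum to zero and at least one must be negative; that is the indefinite case the quote warns about.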

Related

Dymola: why choosing which integration method

Simulating models with dymola, I get different results depending on the chosen integration method. So my question is: why choose which method?
Ideally the choice of method should be based on which one most quickly gives a result close enough to the exact result.
But we don't know the exact result, and in this case at least some of the solvers (likely all) don't generate a result close enough to it.
One possibility is to first simulate with a much stricter tolerance, e.g. 1e-9. (Note: for the fixed step-size solvers, Euler and Rkfix*, this would mean a smaller step size, but don't start using them.)
Hopefully the difference between the solvers will decrease and the different solvers give more similar results (which should be close to the exact one).
You can then always use this stricter tolerance. Or in case one solver already gave the same result with less strict tolerance - then you can use that one at less strict tolerance; but you also have to make a trade-off between the accuracy and simulation-time.
Sometimes this does not happen, and even the same solver generates different results for different tolerances, without converging to a true solution.
(Presumably the solutions are close together at the start but then quickly diverge.)
It is likely that the model is chaotic in that case. That is a bit more complicated to handle and there are several options:
It could be due to a modeling error that can be corrected
It could be that the model is correct, but the system could be changed to become smoother
It could be that the important outputs converge regardless of differences in other variables
It could also be some other error (including problems during initialization), but that requires a more complete example to investigate.
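The convergence check suggested above can be sketched outside Dymola. This is only a toy illustration under my own assumptions: the test equation y' = -2y + sin(t), the hand-written Euler and RK4 integrators, and the step counts are all made up for the example, and the number of steps plays the role that the tolerance plays for Dymola's variable-step solvers. The point is just the procedure: refine, compare solvers, and see whether the gap between them shrinks.

```python
import math

def euler(f, y0, t_end, n):
    """Explicit Euler with n fixed steps on [0, t_end]."""
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        y += h * f(t, y)
        t += h
    return y

def rk4(f, y0, t_end, n):
    """Classical fourth-order Runge-Kutta with n fixed steps on [0, t_end]."""
    h = t_end / n
    t, y = 0.0, y0
    for _ in range(n):
        k1 = f(t, y)
        k2 = f(t + h / 2, y + h / 2 * k1)
        k3 = f(t + h / 2, y + h / 2 * k2)
        k4 = f(t + h, y + h * k3)
        y += h / 6 * (k1 + 2 * k2 + 2 * k3 + k4)
        t += h
    return y

def rhs(t, y):
    """Hypothetical test model: a stable linear ODE with sinusoidal forcing."""
    return -2.0 * y + math.sin(t)

# Refine step by step and record how far apart the two solvers end up.
gaps = []
for n in (10, 100, 1000, 10000):
    gap = abs(euler(rhs, 1.0, 5.0, n) - rk4(rhs, 1.0, 5.0, n))
    gaps.append(gap)
    print(f"steps={n:6d}  |euler - rk4| at t=5: {gap:.3e}")
```

If the gap keeps shrinking as here, the solvers are converging to the same answer. If it does not, you are in the situation described above and should suspect the model rather than the solver.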
Choose the solver that best matches the exact solution.
Maybe Robert Piché's work on numerical solvers gives some more clues.

lapack simple vs expert driver speed comparison

I want to use LAPACK to solve problems of the type Ax=b, least squares, Cholesky decomposition, SVD, etc. The manual says two types of drivers exist, simple and expert, where the expert driver gives more output information at the cost of more workspace.
I want to know about the speed difference between the two drivers.
Are they essentially the same, apart from the time spent copying/saving data to pointers in the expert driver, which is not that significant?
It depends on the driver. For solving a square linear system, the difference between ?GESV and ?GESVX is that the expert driver also returns a condition number estimate and, more importantly, passes the solution to ?GERFS for iterative refinement to reduce the error.
A considerable slowdown, in relative terms(!), is often to be expected from expert routines. You can test it yourself by calling both with the same input. For the GESV/GESVX comparison we had a significant slowdown, which is now fixed in SciPy 1.0: solution refinement is skipped while the condition number is still reported.
See https://github.com/scipy/scipy/issues/7847 for more information.
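To show where the extra work comes from, here is a rough pure-Python sketch of the idea behind the refinement step. It is not LAPACK's code and not its API: `simple_solve` and `refined_solve` are hypothetical stand-ins, and the real expert driver additionally does things like condition estimation and error bounds. The sketch only shows that the refined variant reuses the same LU factorization but adds a residual computation and another pair of triangular solves per refinement step.

```python
def lu_factor(a):
    """In-place style LU factorization with partial pivoting; returns (lu, perm)."""
    n = len(a)
    lu = [row[:] for row in a]
    perm = list(range(n))
    for k in range(n):
        # Pick the row with the largest pivot candidate in column k.
        p = max(range(k, n), key=lambda i: abs(lu[i][k]))
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for i in range(k + 1, n):
            lu[i][k] /= lu[k][k]
            for j in range(k + 1, n):
                lu[i][j] -= lu[i][k] * lu[k][j]
    return lu, perm

def lu_solve(lu, perm, b):
    """Solve using the factors: permute b, then forward and back substitution."""
    n = len(lu)
    y = [b[p] for p in perm]
    for i in range(n):
        y[i] -= sum(lu[i][j] * y[j] for j in range(i))
    for i in reversed(range(n)):
        y[i] = (y[i] - sum(lu[i][j] * y[j] for j in range(i + 1, n))) / lu[i][i]
    return y

def simple_solve(a, b):
    """Stand-in for a simple driver: factor once, solve once."""
    lu, perm = lu_factor(a)
    return lu_solve(lu, perm, b)

def refined_solve(a, b, steps=1):
    """Stand-in for the refinement idea: extra residual + solve per step."""
    lu, perm = lu_factor(a)
    x = lu_solve(lu, perm, b)
    for _ in range(steps):
        residual = [bi - sum(aij * xj for aij, xj in zip(row, x))
                    for row, bi in zip(a, b)]
        correction = lu_solve(lu, perm, residual)
        x = [xi + di for xi, di in zip(x, correction)]
    return x

a = [[4.0, 3.0], [6.0, 3.0]]
b = [10.0, 12.0]
x_simple = simple_solve(a, b)
x_refined = refined_solve(a, b)
print(x_simple, x_refined)
```

For a real comparison, time the actual LAPACK drivers on the same input, as suggested above.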

Who knows the computational complexity of the function quadprog in MATLAB?

The QP problem is convex. According to Wikipedia, the problem can be solved in polynomial time.
But what exactly is the order?
That is an interesting question with (in my opinion) no clear answer. I am going to assume your problem is convex and you are interested in run-time complexity (as opposed to Iteration complexity).
As you may know, quadprog is not a single algorithm but rather a generic name for a solver of quadratic programs. It uses a set of algorithms underneath, viz. interior-point (the default), trust-region and active-set. Source.
Depending upon what you choose, each of these algorithms will have its own complexity analysis. For Trust-Region and Active-Set methods, the complexity analysis is extremely hard. In fact, Active-Set methods are not polynomial to begin with. Counterexamples exist where Active-Set methods take exponential "time" to converge (This is true also for the Simplex Method for Linear Programs). Source.
Now, assuming that you choose Interior Point methods, the answer is still not straightforward because there are various flavours of these methods. When Karmarkar first proposed this method, it was the first known polynomial algorithm for solving Linear Programs and it had a complexity of O(n^3.5). Source. These bounds were improved quite a lot later. However, this is for Linear Programs.
Finally, to answer your question, Ye and Tse proved in 1989 that we can have an Interior Point method with complexity O(n^3). However, whether MATLAB uses this exact flavor of Interior Point method is a little tricky to know but O(n^3) would be my best guess.
Of course, my answer is rather theoretical; if you want to empirically test it out, you can do so by gradually increasing the number of variables and plotting the CPU time required to get an estimate.
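A minimal sketch of that empirical test, in Python rather than MATLAB: time the solver at several problem sizes and fit the slope of log(time) against log(size), which estimates the exponent. Everything here is an assumption for illustration. In particular, `stand_in_solver` is a hypothetical placeholder doing naive cubic work; in practice you would replace it with a call that builds and solves a QP with n variables.

```python
import math
import time

def estimate_order(sizes, times):
    """Least-squares slope of log(time) versus log(size)."""
    xs = [math.log(n) for n in sizes]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    num = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    den = sum((x - mean_x) ** 2 for x in xs)
    return num / den

def stand_in_solver(n):
    """Hypothetical placeholder for the QP solve: naive O(n^3) work."""
    total = 0.0
    for i in range(n):
        for j in range(n):
            for k in range(n):
                total += i * j - k
    return total

def time_solver(solver, sizes):
    """Wall-clock time of solver(n) for each n in sizes."""
    times = []
    for n in sizes:
        start = time.perf_counter()
        solver(n)
        times.append(time.perf_counter() - start)
    return times

sizes = [30, 60, 120]
measured = time_solver(stand_in_solver, sizes)
print("estimated exponent:", estimate_order(sizes, measured))
```

Keep in mind that the measured exponent depends on problem structure (sparsity, number of constraints) and on the range of sizes tested, so treat it as an estimate for your problem class rather than a property of the algorithm.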

Alternatives to FMINCON

Are there any faster and more efficient solvers than fmincon? I'm using fmincon for a specific problem and I run out of memory for a modestly sized vector variable. I don't have any supercomputers or cloud computing options at my disposal, either. I know that any alternative solution will still run out of memory, but I'm just trying to see where the problem is.
P.S. I don't want a solution that would change the way I'm approaching the actual problem. I know convex optimization is the way to go and I have already done enough work to get up until here.
P.P.S. I saw the other question regarding the open source alternatives. That's not what I'm looking for. I'm looking for more efficient ones, in case someone faced the same problem and shifted to a better solver.
Hmmm...
Without further information, I'd guess that fmincon runs out of memory because it needs the Hessian (which, given that your decision variable has about 10^4 elements, is a dense 10^4 x 10^4 matrix for a scalar objective, i.e. roughly 800 MB in double precision).
It also takes a lot of time to determine the values of the Hessian, because fmincon normally uses finite differences for that if you don't specify derivatives explicitly.
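A quick back-of-the-envelope calculation shows the scale of the memory issue. This is a sketch under stated assumptions: a scalar objective (so the Hessian is n x n), 8-byte doubles, and a simplified coordinate-style sparse layout that stores one value plus two 8-byte indices per nonzero (MATLAB's actual sparse format differs in detail). The tridiagonal pattern is a hypothetical example of a sparse Hessian.

```python
def dense_hessian_bytes(n, bytes_per_entry=8):
    """Memory for a dense n x n Hessian stored in double precision."""
    return n * n * bytes_per_entry

def sparse_hessian_bytes(nnz, bytes_per_entry=8, bytes_per_index=8):
    """Memory for nnz nonzeros, each stored as a value plus row and column index.
    Simplified coordinate-style layout, assumed for illustration only."""
    return nnz * (bytes_per_entry + 2 * bytes_per_index)

n = 10 ** 4
dense_mb = dense_hessian_bytes(n) / 1e6

# Hypothetical sparsity pattern: a tridiagonal Hessian has 3n - 2 nonzeros.
tridiagonal_nnz = 3 * n - 2
sparse_mb = sparse_hessian_bytes(tridiagonal_nnz) / 1e6

print(f"dense:  {dense_mb:.1f} MB")
print(f"sparse: {sparse_mb:.2f} MB")
```

On top of the storage, a finite-difference Hessian needs many extra function or gradient evaluations, which is where the time goes.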
There's a couple of things you can do to speed things up here.
If you know beforehand that there will be a lot of zeros in your Hessian, you can pass sparsity patterns of the Hessian matrix via HessPattern. This saves a lot of memory and computation time.
If it is fairly easy to come up with explicit formulae for the Hessian of your objective function, create a function that computes the Hessian and pass it on to fmincon via the HessFcn option in optimset.
The same holds for the gradients. The GradConstr (for your non-linear constraint functions) and/or GradObj (for your objective function) apply here.
There's probably a few options I forgot here, that could also help you. Just go through all the options in the optimization toolbox' optimset and see if they could help you.
If all this doesn't help, you'll really have to switch optimizers. Given that fmincon is the pride and joy of MATLAB's optimization toolbox, there really isn't anything much better readily available, and you'll have to search elsewhere.
TOMLAB is a very good commercial solution for MATLAB. If you don't mind going to C or C++...There's SNOPT (which is what TOMLAB/SNOPT is based on). And there's a bunch of things you could try in the GSL (although I haven't seen anything quite as advanced as SNOPT in there...).
I don't know what version of MATLAB you have, but I know for a fact that in R2009b (and possibly also later), fmincon has a few real weaknesses for certain types of problems. I know this very well, because I once lost a very prestigious competition (the GTOC) because of it. Our approach turned out to be exactly the same as that of the winners, except that they had access to SNOPT, which made their few-million-variable optimization problem converge in a couple of iterations, whereas fmincon could not be brought to converge at all, whatever we tried (and trust me, WE TRIED). To this day I still don't know exactly why this happens, but I verified it myself when I had access to SNOPT. One day, when I have an infinite amount of time, I'll find this out and report it to the MathWorks. But until then...I lost a bit of trust in fmincon :)

Return elements of the Groebner Basis as they are found

This question could apply to any computer algebra system that can compute a Groebner basis from a set of polynomials (Mathematica, Singular, GAP, Macaulay2, MATLAB, etc.).
I am working with an overdetermined system of polynomials for which the full Groebner basis is too difficult to compute. However, it would be valuable to print out the Groebner basis elements as they are found, so that I can tell whether a particular polynomial is in the Groebner basis. Is there any way to do this?
If you implement Buchberger's algorithm on your own, then you can simply print out the elements as they are found.
If you have Mathematica, you can use this code as your starting point.
https://www.msu.edu/course/mth/496/snapshot.afs/groebner.m
See the function BuchbergerSteps.
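If you prefer to roll your own, a bare-bones version could look like the sketch below. It is not the linked Mathematica code, just a textbook Buchberger loop written as a Python generator so that each new element is handed back the moment it is found. Assumptions made for the example: polynomials are dicts mapping exponent tuples to rational coefficients, the monomial order is lex (Python's tuple comparison), there are no pair-selection or reduction optimizations, and the input ideal (x*y - 1, x^2 - y) is hypothetical. The yielded elements are additions to the generating set; they are neither reduced nor, until the loop finishes, a complete basis.

```python
from fractions import Fraction
from itertools import combinations

def lead(p):
    """Leading monomial and coefficient under lex order (tuple comparison)."""
    m = max(p)
    return m, p[m]

def add_scaled(p, q, coeff, shift):
    """Return p + coeff * x^shift * q, dropping zero terms."""
    r = dict(p)
    for m, c in q.items():
        key = tuple(a + b for a, b in zip(m, shift))
        val = r.get(key, 0) + coeff * c
        if val:
            r[key] = val
        else:
            r.pop(key, None)
    return r

def remainder(p, basis):
    """Remainder of p on multivariate division by the polynomials in basis."""
    p, r = dict(p), {}
    while p:
        m, c = lead(p)
        for g in basis:
            gm, gc = lead(g)
            if all(x <= y for x, y in zip(gm, m)):  # gm divides m
                shift = tuple(y - x for x, y in zip(gm, m))
                p = add_scaled(p, g, -c / gc, shift)
                break
        else:
            r[m] = c  # no leading term divides m: move it to the remainder
            del p[m]
    return r

def s_poly(f, g):
    """S-polynomial of f and g."""
    fm, fc = lead(f)
    gm, gc = lead(g)
    lcm = tuple(max(a, b) for a, b in zip(fm, gm))
    f_shift = tuple(l - a for l, a in zip(lcm, fm))
    g_shift = tuple(l - a for l, a in zip(lcm, gm))
    return add_scaled(add_scaled({}, f, 1 / fc, f_shift), g, -1 / gc, g_shift)

def buchberger_steps(generators):
    """Yield each new basis element as soon as Buchberger's loop finds it."""
    basis = [{m: Fraction(c) for m, c in g.items()} for g in generators]
    pairs = list(combinations(range(len(basis)), 2))
    while pairs:
        i, j = pairs.pop(0)
        r = remainder(s_poly(basis[i], basis[j]), basis)
        if r:
            pairs.extend((k, len(basis)) for k in range(len(basis)))
            basis.append(r)
            yield r

# Hypothetical input in variables (x, y): x*y - 1 and x^2 - y.
f1 = {(1, 1): 1, (0, 0): -1}
f2 = {(2, 0): 1, (0, 1): -1}
found = []
for element in buchberger_steps([f1, f2]):
    found.append(element)
    print("new element:", element)
```

Because it is a generator, you can stop consuming it at any time, e.g. as soon as the polynomial you are looking for shows up.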
Due to the way the Buchberger algorithm works (see, for instance, Wikipedia or IVA), the partial results that you could obtain by printing intermediate results are not guaranteed to constitute a Gröbner basis.
Depending on your ultimate goal, you may want to try instead an algorithm for triangularization of ideals, such as Ritt-Wu's algorithm (see IVA or Shang-Ching Chou's book). This is somewhat similar to reduction to row echelon form in Linear Algebra, and you may interrupt the algorithm at any point to get a partially reduced system of polynomial equations.