LAPACK simple vs expert driver speed comparison

I want to use LAPACK to solve problems of the type Ax=b, least squares, Cholesky decomposition, SVD, etc. The manual says two types of drivers exist: simple and expert, where the expert driver gives more output information at the cost of more workspace.
I want to know about the speed difference between the two drivers.
Is it that both are essentially the same, except for the time spent copying/saving data to the extra output arrays in expert mode, which is not significant?

It depends on the driver. For the linear solve drivers ?GESV and ?GESVX, the difference is that a condition number estimate is also returned and, more importantly, the solution is fed to ?GERFS for iterative refinement to reduce the error.
Often a relatively(!) considerable slowdown is expected from the expert routines. You can test it yourself by using the same input for both. For the GESV/GESVX comparison, SciPy had a significant slowdown which is fixed as of SciPy 1.0: the solution refinement is skipped while the condition number reporting is kept.
See https://github.com/scipy/scipy/issues/7847 for more information.
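As a rough sketch of such a test (this uses SciPy's low-level LAPACK wrappers, which the linked issue is about; the exact contents of dgesvx's return tuple vary between SciPy versions, so it is deliberately not unpacked here):

import time
import numpy as np
from scipy.linalg import lapack

n = 2000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, 1))

t0 = time.perf_counter()
lu, piv, x, info = lapack.dgesv(a, b)   # simple driver: factorize and solve
t1 = time.perf_counter()
out = lapack.dgesvx(a, b)               # expert driver: adds equilibration, an
                                        # rcond estimate, error bounds, refinement
t2 = time.perf_counter()

print(f"dgesv  (simple): {t1 - t0:.3f} s")
print(f"dgesvx (expert): {t2 - t1:.3f} s")

Such a test should show the expert driver taking longer because of the extra refinement and condition estimation work, in line with the slowdown described above.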

Related

Dymola: how to choose which integration method

Simulating models with Dymola, I get different results depending on the chosen integration method. So my question is: how should I choose which method to use?
Ideally the choice of method should be based on which one most quickly gives a result close enough to the exact result.
But we don't know the exact result, and in this case at least some of the solvers (likely all) don't generate a result close enough to it.
One possibility is to first try simulating with a much stricter tolerance, e.g. 1e-9. (Note: for the fixed step-size solvers, Euler and Rkfix*, it would be a smaller step-size instead, but don't start using them.)
Hopefully the difference between the solvers will decrease and they will give more similar results (which should be close to the exact one).
You can then always use this stricter tolerance. Or, if one solver already gave the same result at a less strict tolerance, you can use that solver at the less strict tolerance; but you also have to make a trade-off between accuracy and simulation time.
Sometimes this does not happen, and even the same solver generates different results for different tolerances, without converging to a true solution.
(Presumably the solutions are close together at the start but then quickly diverge.)
It is likely that the model is chaotic in that case. That is a bit more complicated to handle and there are several options:
It could be due to a modeling error that can be corrected
It could be that the model is correct, but the system could be changed to become smoother
It could be that the important outputs converge regardless of differences in other variables
It could also be some other error (including problems during initialization), but that requires a more complete example to investigate.
Choose the solver that best matches the exact solution.
Maybe Robert Piché's work on numerical solvers gives some more clues.
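As a sketch of the "tighten the tolerance and compare solvers" check described above, here is the same idea in Python/SciPy rather than Dymola (the model, tolerances and solver names are purely illustrative):

import numpy as np
from scipy.integrate import solve_ivp

def rhs(t, y):
    # a small, moderately stiff stand-in model
    return [y[1], -1000.0 * y[0] - 1001.0 * y[1]]

y0 = [1.0, 0.0]
for method in ("RK45", "BDF", "Radau", "LSODA"):
    for rtol in (1e-4, 1e-9):            # loose vs. strict tolerance
        sol = solve_ivp(rhs, (0.0, 5.0), y0, method=method,
                        rtol=rtol, atol=1e-12)
        print(f"{method:6s} rtol={rtol:g}: y(5) = {sol.y[0, -1]: .10f}")

If the strict-tolerance results of the different solvers agree, that common value is your reference; a solver that already matches it at a loose tolerance is the cheap choice.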

Why should a kernel function satisfy Mercer's condition in ELM or SVM?

In kernelized ELM, they (www.ntu.edu.sg/home/egbhuang/pdf/ELM-Unified-Learning.pdf) mention that a kernel should satisfy Mercer's condition. I didn't find a specific reason behind that. Please explain the reason.
The reason is explained here.
Let me quote it:
"Finally, what happens if one uses a kernel which does not satisfy Mercer's condition? In general, there may exist data such that the Hessian is indefinite, and for which the quadratic programming problem will have no solution (the dual objective function can become arbitrarily large). However, even for kernels that do not satisfy Mercer's condition, one might still find that a given training set results in a positive semidefinite Hessian, in which case the training will converge perfectly well. In this case, however, the geometrical interpretation described above is lacking."
Burges (1998)
So with a kernel that does not satisfy Mercer's condition, you lose at least some convergence guarantees (and possibly more, e.g. convergence speed or approximation accuracy when stopping early)!
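A small numeric illustration (my own sketch, not from the paper) of what an indefinite Hessian means here: the Gram matrix of a Mercer kernel (e.g. the RBF kernel) is positive semidefinite, while a sigmoid/tanh kernel can produce negative eigenvalues and hence an indefinite dual problem.

import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((50, 5))
sq = ((X[:, None, :] - X[None, :, :]) ** 2).sum(-1)   # pairwise squared distances

K_rbf = np.exp(-0.5 * sq)               # RBF kernel: satisfies Mercer's condition
K_sig = np.tanh(2.0 * X @ X.T - 1.0)    # sigmoid kernel: not Mercer in general

for name, K in (("rbf", K_rbf), ("sigmoid", K_sig)):
    print(name, "min eigenvalue:", np.linalg.eigvalsh(K).min())

The RBF Gram matrix has no negative eigenvalues (up to roundoff), while the sigmoid one typically does for these parameters.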

Efficiently extracting column of a matrix

I currently have a piece of code that I'm trying to optimise, and the bottleneck seems to be extracting a given column from a fairly large matrix.
In particular, my code spends 50% of its time doing Wi=W(:,minColIdx). I've also tried linear indexing, but there was no change.
I was wondering if anyone knows why this is the case, and if anyone has any tips that could help me optimise this part of my code.
Thanks!
EDIT: Here is my code: http://pastebin.com/TnTy6a8D
It's really poorly optimised right now; I was just playing around a bit with gpuArray on my new GPU. Lines 44 and 53, where I try to extract columns from W, are where the code bottlenecks.
Can the speed of the operation be improved?
Of course
Is it worth it to optimize the indexing code?
Probably not
MATLAB is REALLY good at basic matrix operations (I would be really surprised if doing it in C++ were even 10% faster). You can forget about finding a better way to index a matrix; if you really want a noticeable performance increase, improving your hardware is probably your best bet.
That being said, it is of course always worth thinking about whether you really need to do the heavy calculation that you are attempting, or whether you can think of a smarter algorithm.
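To put the claim in perspective, here is a sketch in Python/NumPy rather than MATLAB (the sizes are made up): extracting a column of a column-major matrix is essentially one contiguous memory copy, so it already runs close to memory bandwidth, and the same holds for W(:,idx) in MATLAB.

import time
import numpy as np

# Fortran (column-major) order, like MATLAB's internal layout
W = np.asfortranarray(np.random.default_rng(0).standard_normal((20000, 2000)))

t0 = time.perf_counter()
for idx in range(W.shape[1]):
    col = W[:, idx].copy()              # contiguous copy of one column
t1 = time.perf_counter()

print(f"{W.nbytes / (t1 - t0) / 1e9:.1f} GB/s effective copy bandwidth")

If the copies already run at a few GB/s, the indexing itself is not where a smarter implementation would save time.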
I'll take at face value your statement about the bottleneck being extracting the column from a matrix since you don't provide enough detail for me to speculate otherwise, though I find this somewhat surprising.
If you have access to the MATLAB Compiler, I suggest you try compiling your bottleneck function. Try:
help mcc
From within that help you will see that a typical use is:
Make a stand-alone C executable for myfun.m:
mcc -m myfun
You could also try writing a C function to extract your column and compiling it with mex:
http://www.mathworks.com/help/matlab/ref/mex.html

Matlab `corr` gives different results on the same dataset. Is floating-point calculation deterministic?

I am using Matlab's corr function to calculate the correlation of a dataset. While the results agree to within double-precision accuracy (<10^-14), they are not exactly the same even on the same computer across different runs.
Is floating-point calculation deterministic? Where is the source of the randomness?
Yes and no.
Floating point arithmetic, as in a sequence of operations +, *, etc. is deterministic. However in this case, linear algebra libraries (BLAS, LAPACK, etc) are most likely being used, which may not be: for example, matrix multiplication is typically not performed as a "triple loop" as some references would have you believe, but instead matrices are split up into blocks that are optimised for maximum performance based on things like cache size. Therefore, you will get different sequences of operations, with different intermediate rounding, which will give slightly different results. Typically, however, the variation in these results is smaller than the total rounding error you are incurring.
I have to admit, I am a little bit surprised that you get different results on the same computer, but it is difficult to know why without knowing what the library is doing (IIRC, Matlab uses the Intel BLAS libraries, so you could look at their documentation).
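The underlying mechanism is easy to demonstrate (sketch in Python/NumPy; MATLAB behaves the same way): floating-point addition is not associative, so summing the same numbers in a different order gives a slightly different result, which is what a different BLAS blocking or threading schedule amounts to.

import numpy as np

x = np.random.default_rng(1).standard_normal(10**5)
s_seq = sum(float(v) for v in x)        # plain left-to-right summation
s_np = float(np.sum(x))                 # NumPy's blocked/pairwise summation
print(s_seq - s_np)                     # typically a tiny nonzero difference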

MATLAB sparse matrix solvers: memory errors

In the context of a finite element problem, I have a 12800x12800 sparse matrix. I'm trying to solve the linear system using MATLAB's \ operator (mldivide) and I get an out-of-memory error. So I'm just wondering if there's a way to speed this up.
I mean, will something like LU factorization actually help here in terms of not getting the memory error anymore? I increased the heap size to 256 GB in preferences, which is the max I can get it to, and I still get the out-of-memory error.
Also, just a general question. I have 8GB of RAM on my laptop right now. Will upgrading to 16GB help at all? Or maybe something I can do to allocate more memory to MATLAB? I'm pretty unfamiliar with this stuff.
According to this and this, you have some options to avoid the out-of-memory problem in MATLAB:
Increase operating system's virtual memory
Give higher priority to the MATLAB process in the task manager
Use 64-bit version of MATLAB
A few months ago, I was working on integer programming in MATLAB. I faced the "out of memory" problem, so I used sparse matrices and followed the tips mentioned above, and finally the problem was solved!
Are you locked in to using mldivide? This sounds like the perfect situation for an iterative method - bicg, gmres, etc.
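The same idea sketched in Python/SciPy (the MATLAB gmres/bicg calls are analogous): a preconditioned iterative solve needs far less memory than a full factorization. The matrix below is just an illustrative stand-in, not a real FEM matrix.

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 12800
A = sp.diags([-1.0, 2.0, -1.0], [-1, 0, 1], shape=(n, n), format="csc")
b = np.ones(n)

ilu = spla.spilu(A, drop_tol=1e-5)               # incomplete LU as preconditioner
M = spla.LinearOperator((n, n), ilu.solve)
x, info = spla.gmres(A, b, M=M)                  # pass rtol/tol for tighter accuracy
print("converged" if info == 0 else f"gmres stopped with info={info}")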
While backslash takes advantage of the sparsity of A, the QR method it uses produces full matrices that require (number_occupied_elements)^3 memory to be allocated. A few things you can try:
If you're dividing sparse matrices with a few diagonals, you can try to solve the system with forward/backward substitution
Try breaking the problem up into smaller pieces and solving them sequentially
Run whos to see what variables are occupying your memory before you start the matrix division; can any of these be cleared beforehand?
Not applicable to your problem as you've stated it here, but if your system is overdetermined (A has more rows than columns), then the normal-equations form of the pseudo-inverse, (A.'*A)\(A.'*b), produces a result using the smaller column dimension
As for adding additional memory: the 32-bit version of MATLAB can address at most 2^32 bytes (4 GB), so increasing the physical RAM on your computer won't help unless you're using the 64-bit version.
MATLAB's \ usually tries several methods to solve a problem. First, if it sees that the structure of your matrix is symmetric, it tries a Cholesky factorization. If after several steps it cannot find a suitable answer, the current version of MATLAB uses UMFPACK from the SuiteSparse package.
UMFPACK is a specific LU implementation known for its speed and good memory usage in practice. It also tries to reduce fill-in and keep the factors as sparse as possible, which is why MATLAB uses this code.
(I am working on UMFPACK for my PhD under the supervision of Dr Tim Davis, its creator.)
Therefore, using another LU factorization won't help: it is an LU factorization already.
One of the easiest ways to address your problem is to test it on another machine with more memory to see if it works.
I guess MATLAB does some garbage collection and wastes some memory, so using UMFPACK directly might help you. You can either call it from C/C++ or use its MATLAB interface. Take a look at the SuiteSparse package.
Based on the structure of your matrix, I think MATLAB tries Cholesky first; I don't know what MATLAB's strategy is if Cholesky runs out of memory. Keep in mind that Cholesky is easier to manage in terms of memory.
There are other packages that might help you as well. CSparse is a lightweight package that might help. There are other well-known packages too; search for SuperLU.
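For reference outside MATLAB: SciPy exposes the same family of sparse direct solvers (SuperLU by default, UMFPACK via scikit-umfpack), so a minimal sketch of calling such a factorization directly looks like this (the matrix is an illustrative stand-in):

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

n = 12800
A = sp.diags([4.0, -1.0, -1.0], [0, -1, 1], shape=(n, n), format="csc")
b = np.ones(n)

lu = spla.splu(A)                    # sparse LU with a fill-reducing ordering
x = lu.solve(b)
print("residual norm:", np.linalg.norm(A @ x - b))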