What is a High Performance version of LAPACK and BLAS?

This page of the IMSL documentation says:
To obtain improved performance we recommend linking with High Performance versions of LAPACK and BLAS, if available.
What are High Performance versions of LAPACK and BLAS?

There are plenty of good implementations to pick from:
Intel MKL is likely the best on Intel machines. It's not free though, so that may be a problem.
According to their benchmark, OpenBLAS compares quite well with Intel MKL and is free.
Eigen is also an option and has a largish (albeit old) benchmark showing good performance on small matrices (though it's not technically a drop-in BLAS library).
ATLAS, OSKI and POSKI are examples of auto-tuned kernels which claim to work on many architectures.
Generally, it is quite hard to pick one of these without benchmarking, because:
some implementations work better on different types of matrices. For example, Eigen works better on matrices of small size (in the hundreds)
some are optimised for specific architectures (e.g. Intel's)
in some cases the multithreading of the BLAS library may conflict with a multithreaded application (e.g. OpenBLAS)
developers' benchmarks tend to emphasise the cases that work best on their own implementation.
I would suggest picking one or two of these libraries that fit your use case and benchmarking them for your particular application on your particular (or a similar) machine. This is quite easy to do even after compiling your code; a minimal sketch follows.
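For example (a hedged sketch, assuming the numpy in each environment is linked against the BLAS build you want to test), timing a large matrix product is a quick way to compare backends on your own machine:

# Time the BLAS-backed matrix multiply that numpy dispatches to. Run the same
# script in environments built against different BLAS libraries (OpenBLAS,
# MKL, ...) to compare them; the size n = 2000 is an arbitrary choice.
import time
import numpy as np

n = 2000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
b = rng.standard_normal((n, n))

a @ b  # warm-up call so one-time overhead is excluded from the timing

t0 = time.perf_counter()
a @ b  # dispatched to the underlying BLAS dgemm
elapsed = time.perf_counter() - t0
print(f"{2 * n**3 / elapsed / 1e9:.1f} GFLOP/s")  # dgemm costs ~2*n^3 flops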

LAPACK and BLAS are performance libraries that provide basic linear algebra operations, such as solving systems of linear equations. You can find such libraries useful in computer vision (e.g. object detection and classification), classical algorithms, modelling, and so on.
TASKING provides a full C implementation of the LAPACK and BLAS performance libraries. Both libraries are ISO C99 compliant, with full documentation and examples; you can check it here:
http://www.tasking.com/products/tasking-lapack-performance-libraries

Related

GPy and GPflow mathematical background - references

Do GPy and GPflow share a common mathematical background? I'm asking this because I'm using GPy but I cannot see the references. However, GPflow provides references in its examples.
Is it OK to keep using GPy, or would you suggest moving to GPflow immediately for Gaussian process purposes?
GPy and GPflow definitely share a common mathematical background: Gaussian processes in the sense of Rasmussen and Williams, and many of the concepts are very similar in both frameworks: kernels, likelihoods, mean functions, inducing points, etc. For me, the biggest difference between GPy and GPflow is the computational backend: AFAIK GPy uses plain Python and numpy to perform all its computations, whereas GPflow relies on TensorFlow. This gives GPflow several nice features for free: GPU acceleration, automatic gradients, compatibility with the TF ecosystem, etc. Depending on your use case, these features can be crucial or simply nice-to-have.
Here is more information on the technical details between the two frameworks:
https://gpflow.readthedocs.io/en/master/intro.html#what-s-the-difference-between-gpy-and-gpflow
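To make the comparison concrete, here is a hedged side-by-side sketch of the same exact GP regression in both frameworks; constructor signatures vary between releases, so this assumes GPy 1.x and GPflow 2.x, with synthetic data to keep it self-contained:

import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (50, 1))
Y = np.sin(12 * X) + 0.1 * rng.standard_normal((50, 1))

# GPy: numpy-based backend, gradients implemented by hand inside the library
import GPy
m_gpy = GPy.models.GPRegression(X, Y, GPy.kern.RBF(input_dim=1))
m_gpy.optimize()

# GPflow: TensorFlow-based backend, so gradients come from autodiff
import gpflow
m_gpf = gpflow.models.GPR((X, Y), kernel=gpflow.kernels.SquaredExponential())
gpflow.optimizers.Scipy().minimize(m_gpf.training_loss, m_gpf.trainable_variables)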
That would depend on what you are actually doing.
The very basic GPs should be similar; the main difference is that GPflow relies on TensorFlow for the gradients (if used), plus probably some technical implementation differences.
For the other, more advanced models, both libraries provide references to the respective papers in the docs. In my opinion, GPflow's design is mainly centered around the SVGP framework from [1] and [2] (and many other extensions; I can really recommend [2] if you are interested in the theory). A minimal SVGP sketch follows the references below.
But they do still provide some other implementations.
I use GPflow since it works on the GPU and offers a lot of state-of-the-art implementations. However, the disadvantage is that it is under a lot of change.
If you want to use classic GPs and are not too concerned with performance or very up-to-date methods, I'd say GPy should be sufficient and the more stable variant.
[1] Hensman, James, Alexander Matthews, and Zoubin Ghahramani. "Scalable variational Gaussian process classification." (2015).
[2] Matthews, Alexander Graeme de Garis. Scalable Gaussian process inference using variational methods. Diss. University of Cambridge, 2017.
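As referenced above, a minimal sketch of an SVGP model (hedged: this assumes the GPflow 2.x API, and the data is synthetic):

# Sparse variational GP from [1]: a handful of inducing points summarise the
# data, so training scales to large N (and to minibatches via num_data).
import numpy as np
import gpflow

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, (500, 1))
Y = np.sin(12 * X) + 0.1 * rng.standard_normal((500, 1))

model = gpflow.models.SVGP(
    kernel=gpflow.kernels.SquaredExponential(),
    likelihood=gpflow.likelihoods.Gaussian(),
    inducing_variable=X[:20].copy(),  # 20 inducing locations, taken from the data
    num_data=len(X),
)
loss = model.training_loss_closure((X, Y))
gpflow.optimizers.Scipy().minimize(loss, model.trainable_variables)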

pycuda vs theano vs pylearn2

I am currently learning GPU programming to improve the performance of machine learning algorithms. Initially I tried to learn CUDA programming with pure C; then I found pycuda, which to me is a wrapper of the CUDA library; and then I found theano and pylearn2 and got a little confused.
I understand them in this way:
pycuda: a Python wrapper for the CUDA library
theano: similar to numpy but transparent to GPU and CPU
pylearn2: a deep learning package built on theano which implements several machine learning/deep learning models
Since I am new to GPU programming, should I start learning from a C/C++ implementation, or is starting from pycuda enough, or even starting from theano? E.g. I would like to implement a random forest model after learning GPU programming. Thanks.
Your understanding is almost right. I would just add some remarks about Theano. It's much more than a numpy which can run on the GPU. Theano is in fact a math expression compiler, which translates symbolic math expressions into highly optimized C/CUDA code, targeted at both CPU and GPU. The code it generates is often much more efficient than what most programmers would write. Theano can also perform symbolic differentiation (very useful for gradient-based optimization), and has features for achieving better numerical stability (which is probably useful, though I don't know to what real extent). It's very likely Theano will be enough to implement what you need. If you still decide to learn CUDA or PyCUDA, choose the one based on the language you will use, C++ or Python.
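To illustrate the expression-compiler point, here is a minimal sketch (assuming a classic theano installation):

# Define a symbolic expression, derive its gradient symbolically, then compile
# both into one callable; theano generates optimized C (or CUDA) code for it.
import theano
import theano.tensor as T

x = T.dvector('x')                  # symbolic vector of doubles
y = T.sum(x ** 2)                   # symbolic expression: sum of squares
gy = T.grad(y, x)                   # symbolic gradient, no manual derivation

f = theano.function([x], [y, gy])   # compiled function computing both
print(f([1.0, 2.0, 3.0]))           # -> [array(14.0), array([2., 4., 6.])]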

matlab parallel eigenvalue decomposition

It's been a while that I've been trying to come up with an algorithm for parallel eigenvalue decomposition, but none of the algorithms I've tried can beat MATLAB's eig, so does anyone know which algorithm MATLAB uses for the eig function?
Or can anyone suggest a good parallel algorithm for eigenvalue decomposition?
MATLAB uses LAPACK for its higher level linear algebra. According to MATLAB's version command, it is Intel's Math Kernel Library (MKL):
>> version('-lapack')
ans =
Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1
Intel MKL includes very fast implementations of BLAS and LAPACK, but it is not free. For open source options, try Eigen and Armadillo. Their APIs are very intuitive and they are quite fast. And if you believe Eigen's claims, it is the fastest open BLAS available, with a superior API to the reference netlib LAPACK (IMO the API claim is pretty obvious once you take a look at the Fortran version!)
You can use Armadillo with OpenBLAS. Both are open source. Recent versions of OpenBLAS also provide LAPACK functions, and OpenBLAS uses multiple cores (i.e. it runs in parallel).
When using Armadillo's eig_sym() function, specify that the divide-and-conquer method is to be used. This makes a big difference on large matrices. For example:
eig_sym(eigval, eigvec, X, "dc")
By the way, you can also link Armadillo-based code with Intel MKL instead of OpenBLAS; MKL also provides highly optimised LAPACK functions.
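If you would rather explore the same trade-off from Python than C++, the divide-and-conquer LAPACK routine (dsyevd) is also reachable via scipy (a hedged sketch: the driver argument of scipy.linalg.eigh assumes SciPy >= 1.5, and scipy calls whichever BLAS/LAPACK it was built against):

# Compare LAPACK symmetric eigensolver drivers: "ev" (dsyev) versus the
# divide-and-conquer "evd" (dsyevd), which is typically much faster on
# large matrices, matching the Armadillo "dc" option above.
import time
import numpy as np
from scipy.linalg import eigh

n = 2000
rng = np.random.default_rng(0)
a = rng.standard_normal((n, n))
x = a + a.T  # make the matrix symmetric

for driver in ("ev", "evd"):
    t0 = time.perf_counter()
    w, v = eigh(x, driver=driver)
    print(driver, time.perf_counter() - t0)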

iOS5 Objective-C library for numerical analysis or GNU Octave wrapper class?

I'm doing some numerical estimation and correction with the Kalman filter, and would like to better estimate my parameters of Q and R, preferably dynamically.
http://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_the_noise_covariances_Qk_and_Rk
That article mentions that GNU Octave is currently the best way of determining these parameters from data:
http://en.wikipedia.org/wiki/GNU_Octave#C.2B.2B_integration
Unfortunately it is written for Matlab, and there's supposedly a C++ implementation. I'm very weak in C++ and would not even know how to import a C++ library and link it properly in Xcode. All of my C++ libraries to date have been wrapped in 3rd party Objective-C classes.
Has anyone used the C++ implementation for scientific computing or engineering applications on iPhone? I'd appreciate any pointers or tutorials on how to do this kind of analysis with Objective-C.
Additional keywords:
estimating covariance from data
Autocovariance Least-Squares (ALS) technique
noise covariance
Thank you!
I do not know of any such C++ library. If you fancy doing numerical analysis on iOS, the best way to go is the Accelerate framework, specifically (from this description):
Linear Algebra: LAPACK and BLAS
The Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package
(LAPACK) libraries contain—as you would expect—functions to perform
linear algebra computations such as solving simultaneous linear
equations, least squares solutions of linear equations, and eigenvalue
problems. The BLAS library serves as a building block for the LAPACK
library. The BLAS and LAPACK libraries are widely distributed and
industry standard computational libraries. They are available on a
number of different platforms and architectures. So, if you are
already using these libraries you should feel right at home, as the
APIs are exactly the same on Mac OS X.
You'll need a fairly good grounding in C, pointers, arrays and such, though; no way around it, I feel. There is a detailed description of how to use these linear algebra primitives to implement Kalman filtering (although that one uses R, so it is probably not of much use to you).
This is an SO post on Kalman filtering which expresses my opinion quite well. I'm afraid I think the chances of finding a magic Objective-C wrapper for Kalman filtering are fairly low, though I would be very happy to be proven wrong!

compiler-optimization for numerical stability

Do GCC or similar compilers perform optimizations that are aimed at improving the numerical stability of floating-point operations?
It is known that seemingly simple operations like addition or computing the norm of a vector are numerically unstable if implemented in the obvious manner, and, on the other hand, compilers sometimes destroy workarounds for these problems for the sake of speed optimization.
What is the state of the art in optimizing compiler output for numerically stable computation? Is anything better pending?
Oracle's (formerly Sun's) compilers for Linux and Solaris.
Their C++ and Fortran compilers have support for interval arithmetic.
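To make the instability the question refers to concrete, here is a minimal sketch in Python (which sidesteps compiler flags but shows the same floating-point effect):

# Naive left-to-right summation absorbs the small addends into the large one,
# while a compensated summation (math.fsum is exactly rounded) recovers them.
# A hand-written compensated loop (Kahan summation) is exactly the kind of
# workaround that aggressive compiler modes such as -ffast-math may rewrite
# back into the naive form.
import math

values = [1e16] + [1.0] * 1000 + [-1e16]

print(sum(values))        # 0.0     -- each 1.0 is lost to rounding against 1e16
print(math.fsum(values))  # 1000.0  -- exactly-rounded result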