It's been a wile I'm trying to come up with an algorithm for parallel eigenvalue decomposition, but non of the algorithms I tried can beat matlab's eig algorithm, so is there anyone who knows which algorithm does matlab use for the eig function ?
or is can anyone suggest me a good parallel algorithm for eigenvalue decomposition ?
MATLAB uses LAPACK for its higher level linear algebra. According to MATLAB's version command, it is Intel's Math Kernel Library (MKL):
>> version('-lapack')
ans =
Intel(R) Math Kernel Library Version 11.0.2 Product Build 20130124 for Intel(R) 64 architecture applications
Linear Algebra PACKage Version 3.4.1
Intel MKL includes very fast implementations of BLAS and LAPACK, but it is not free. For open source options, try Eigen and Armadillo. Their APIs are very intuitive and they are quite fast. And if you believe Eigen's claims, they are the fastest open BLAS available with a superior API to the reference netlib LAPACK (IMO the API claim is pretty obvious once you take a look at the Fortran version!)
You can use Armadillo with OpenBLAS. Both are open source. Recent versions of OpenBLAS also provide LAPACK functions. OpenBLAS uses multiple cores (ie. runs in parallel).
When using Armadillo's eig_sym() function, specify that the divide-and-conquer method is to be used. This makes a big difference on large matrices. For example:
eig_sym(eigval, eigvec, X, "dc")
btw, you can also link Armadillo based code with Intel MKL instead of OpenBLAS. MKL also provides highly optimised LAPACK functions.
Related
Is there any maximum limit for decision variables in scipy linear programming module (minimization) in python? If so, Can it be extended the number of decision variables to 10000? If scipy is limited to number of decision variables, Is there any other software which can be installed in python so that I can proceed with?
The original scipy Simplex LP solver was only for very small problems. The newer scipy Interior Point solver can handle larger problems more reliably. Also make sure to pass on A_eq and/or A_ub as sparse matrices. If you don't do this you may run out of memory.
Having said this, I would be more comfortable with LP solvers that have seen more large, sparse problems than scipy. Most LP solvers have a Python interface.
Finally, larger problems are often (but not always) more complex and it may help to use a modeling tool. This will allow you to express the problem in a more natural way than using matrices. For Python there is PuLP and Pyomo (among others). Some commercial solvers also provide excellent modeling tools.
I like to use cuSolver code for Eigen value decomposition of complex matrix in Matlab.
I am using MATLAB CUDA kernel and it seems that its not possible to interface cuSolver with MATLAB as the cuSolver contains the code for host as well as for device (as mentioned here: http://docs.nvidia.com/cuda/cusolver/#syevd-example1)
while MATLAB CUDA kernel works only for the kernel function..
Please comment.
Any other idea to compute Eigenvalue decomposition of large no of matrices containing complex data in parallel on GPU by using Matlab environment?
You almost certainly need to use the MEX interface. This allows you to take in gpuArray data, and call kernels and other CUDA library functions.
See the doc: http://uk.mathworks.com/help/distcomp/run-mex-functions-containing-cuda-code.html for more.
This page of IMSL says
To obtain improved performance we recommend linking with High
Performance versions of LAPACK and BLAS, if available.
What is High Performance versions of LAPACK and BLAS ?
There are plenty of good implementations to pick from:
Intel MKL is likely the best on Intel machines. It's not free though, so that may be a problem.
According to their benchmark, OpenBLAS compares quite well with Intel MKL and is free
Eigen is also an option and has a largish (albeit old) benchmark showing good performance on small matrices (though it's not technically a drop-in BLAS library)
ATLAS, OSKI, POSKI are examples of auto-tuned kernels which will claim to work on many architectures
Generally, it is quite hard to pick one of these without benchmarking because:
some implementations work better on different types of matrices. For example Eigen works better on matrices with small rank (100s)
some are optimised for specific architectures (e.g. Intel's)
in some cases the multithreading of the BLAS library may conflict with a multithreaded application (e.g. OpenBLAS)
developer's benchmarks may tend to emphasise cases which work better on their implementation.
I would suggest pick one or two of these libraries that apply for your use case and benchmark them for your particular application on your particular (or similar) machine. This is quite easy to do even after compiling your code.
LAPACK and BLAS are performance libraries that provides basically Linear Algebra mathematical operations for a system of linear equations. you can find such libraries useful in the computer vision for example ( Object detection and classifications ) , Classical algorithms, Modelling , ...
TAsking provides a full C implementation of the LAPACK and BLAS performance libraries, both libraries are ISO-C99 Compliant with full documentation and examples, you can check it here
http://www.tasking.com/products/tasking-lapack-performance-libraries
AccelerEyes announced in December 2012 that it works with Mathworks on the GPU code and has discontinued its product Jacket for MATLAB:
http://blog.accelereyes.com/blog/2012/12/12/exciting-updates-from-accelereyes/
Unfortunately they do not sell Jacket licences anymore.
As far as I understand, the Jacket GPU Array solution based on ArrayFire was much faster than the gpuArray solution provided by MATLAB.
I started working with gpuArray, but I see that many functions are implemented poorly. For example a simple
myArray(:) = 0
is very slow. I have written some custom CUDA-Kernels, but the poorly-implemented standard MATLAB functionality adds a lot of overhead, even if working with gpuArrays consistently throughout the code. I fixed some issues by replacing MATLAB code with hand written CUDA code - but I do not want to reimplement the MATLAB standard functionality.
Another feature I am missing is sparse GPU matrices.
So my questions are:
How do is speed up the badly implemented default GPU implementations provided by MATLAB? In particular, how do I speed up sparse matrix operations in MATLAB using the GPU?
MATLAB does support CUDA based GPU. You have to access it from the "Parallel Computing Toolbox". Hope these 2 links also help:
Parallel Computing Toolbox Features
Key Features
Parallel for-loops (parfor) for running task-parallel algorithms on multiple processors
Support for CUDA-enabled NVIDIA GPUs
Full use of multicore processors on the desktop via workers that run locally
Computer cluster and grid support (with MATLAB Distributed Computing Server)
Interactive and batch execution of parallel applications
Distributed arrays and single program multiple data (spmd) construct for large dataset handling and data-parallel algorithms
MATLAB GPU Computing Support for NVIDIA CUDA-Enabled GPUs
Using MATLAB for GPU computing lets you accelerate your applications with GPUs more easily than by using C or Fortran. With the familiar MATLAB language you an take advantage of the CUDA GPU computing technology without having to learn the intricacies of GPU architectures or low-level GPU computing libraries.
You can use GPUs with MATLAB through Parallel Computing Toolbox, which supports:
CUDA-enabled NVIDIA GPUs with compute capability 2.0 or higher. For releases 14a and earlier, compute capability 1.3 is sufficient.
GPU use directly from MATLAB
GPU-enabled MATLAB functions such as fft, filter, and several linear algebra operations
GPU-enabled functions in toolboxes: Image Processing Toolbox, Communications System Toolbox, Statistics and Machine Learning Toolbox, Neural Network Toolbox, Phased Array Systems Toolbox, and Signal Processing Toolbox (Learn more about GPU support for signal processing algorithms)
CUDA kernel integration in MATLAB applications, using only a single line of MATLAB code
Multiple GPUs on the desktop and computer clusters using MATLAB workers in Parallel Computing Toolbox and MATLAB Distributed Computing Server
I had the pleasure of attending a talk by John, the founder of AccelerEyes. They did not get the speedup because they just removed poorly written code and replaced it with code that saved a few bits here and there. Their speedup was mostly from exploiting the availability of cache and doing a lot of operations in-memory (GPU's). Matlab relied on transferring data between GPU and CPU, if I remember correctly, and hence the speedup was crazy.
I'm doing some numerical estimation and correction with the Kalman filter, and would like to better estimate my parameters of Q and R, preferably dynamically.
http://en.wikipedia.org/wiki/Kalman_filter#Estimation_of_the_noise_covariances_Qk_and_Rk
That article mentions that GNU Octave is currently the best way of determining these parameters from data:
http://en.wikipedia.org/wiki/GNU_Octave#C.2B.2B_integration
Unfortunately it is written for Matlab, and there's supposedly a C++ implementation. I'm very weak in C++ and would not even know how to import a C++ library and link it properly in XCode. All of my C++ libraries to date have been wrapped in 3rd party Objective-C classes.
Has anyone used the C++ implementation for scientific computing or engineering applications on iPhone? I'd appreciate any pointers or tutorials on how to do this kind of analysis with Objective-C.
Additional keywords:
estimating covariance from data
Autocovariance Least-Squares (ALS) technique
noise covariance
Thank you!
I do not know of any such C++ library, if you fancy doing numerical analysis on iOS, the best way to go is the accelerate framework, specifically (from this description):
Linear Algebra: LAPACK and BLAS
The Basic Linear Algebra Subprograms (BLAS) and Linear Algebra Package
(LAPACK) libraries contain—as you would expect—functions to perform
linear algebra computations such as solving simultaneous linear
equations, least squares solutions of linear equations, and eigenvalue
problems. The BLAS library serves as a building block for the LAPACK
library. The BLAS and LAPACK libraries are widely distributed and
industry standard computational libraries. They are available on a
number of different platforms and architectures. So, if you are
already using these libraries you should feel right at home, as the
APIs are exactly the same on Mac OS X.
You'll need a fairly good grounding in C, pointers, arrays and such though, no way around it I feel. There is a detailed description of how to use these linear algebra primitives to implement kalman filtering (although this is using R, so probably not of mush use to you).
This is a SO post on Kalman Filtering which expressed my opinion quite well. I'm afraid I think the chances of finding a magic Objective-C wrapper for Kalman Filtering are fairly low, though I would be very happy to be proven wrong!