Armadillo linear system solver (with OpenBLAS) - multicore

I've been testing various open source codes for solving a linear system of equations in C++. So far the fastest I've found is Armadillo, used together with the OpenBLAS package. Solving a dense N x N linear system with N = 5000 takes around 8.3 seconds on my system, which is really fast (without OpenBLAS installed, it takes around 30 seconds).
One reason for this speedup is that Armadillo + OpenBLAS seems to enable the use of multiple threads: it runs on two of my cores, whereas Armadillo without OpenBLAS only uses one. I have an i7 processor, so I want to increase the number of cores used and test it further. I'm using Ubuntu, so according to the OpenBLAS documentation I can do, in the terminal:
export OPENBLAS_NUM_THREADS=4
However, running the code again doesn't seem to increase the number of cores being used, or the speed. Am I doing something wrong, or is 2 the maximum for Armadillo's "solve(A,b)" command? I wasn't able to find Armadillo's source code anywhere to take a look.
Incidentally, does anybody know which method Armadillo/OpenBLAS uses for solving Ax=b (standard LU decomposition with parallelism, or something else)? Thanks!
Edit: the number of cores being stuck at 2 actually seems to be a bug when installing OpenBLAS with the Synaptic package manager (see here). Reinstalling from source allows it to detect how many cores I actually have (8). Now I can use export OPENBLAS_NUM_THREADS=4 etc. to govern it.

Armadillo doesn't prevent OpenBLAS from using more cores. It's possible that the current implementation of OpenBLAS simply chooses 2 cores for certain operations.
You can see Armadillo's source code directly in the downloadable package (it's open source), in the folder "include". Specifically, have a look at the file "include/armadillo_bits/fn_solve.hpp" (which contains the user accessible solve() function), and the file "include/armadillo_bits/auxlib_meat.hpp" (which contains the wrapper and housekeeping code for calling the torturous Blas and Lapack functions).
If you already have Armadillo installed on your machine, have a look at "/usr/include/armadillo_bits" or "/usr/local/include/armadillo_bits".

Related

Very slow execution of MATLAB code under Ubuntu

I was using MATLAB R2012a under Windows 7 and executing some intense code, and by intense I mean in terms of memory usage and processing time; the code was working fine on Windows. Now I have changed my OS to Ubuntu 12.04 and installed MATLAB R2013a. The amount of memory used is considerably less than it was on Windows, but the time MATLAB takes to execute the same code is extremely high.
I should mention that my code contains nothing that should take such a huge amount of time except a call to sparse with a symbolic substitution as one of its arguments, as follows:
K = zeros(Np, Np);
for i = 1:ord
    K = K + sparse(t(1:ord,:), repmat(t(i,:), ord, 1), double(subs(Kv(:,i), Arg(Kv,1,1,6), Arg(Kv,1,2,6))), Np, Np);
end
Note that Kv is a symbolic matrix and Arg is a function that provides the OLD and NEW arguments for subs; it depends on a number of global variables.
I have the feeling that I missed adding something to Ubuntu that might help accelerate the execution of MATLAB code.
Any ideas?
I had a similar problem on Windows, but I believe the solution is the same on Ubuntu LTS.
If you increase MATLAB's Java heap memory, MATLAB will consume more memory from your system, but it will be faster.
To do that, go to:
File -> Preferences -> General -> Java Heap Memory and increase it to the maximum.
The default value is 128 MB, which is too little.
If raising the heap memory limit doesn't fix the issue, then try increasing the priority of the MATLAB process.
First start MATLAB, then run:
ps aux|grep MATLAB
In my case the result is:
comtom 9769 28.2 19.8 4360632 761808 tty2 S<l+ 14:00 1:50 /usr/local/MATLAB/MATLAB_Production_Server/R2015a/bin/glnxa64/MATLAB -desktop
Look at the first number (the PID). Then use it with the renice command to change the process priority:
renice -3 -p 9769
That's it. The GUI is very slow because it's built against outdated Xorg libraries, so changing the priority helps. You may notice some tearing in GNOME's effects, but MATLAB's interface will work a lot better.

Will standalone MATLAB be faster than MATLAB from the UI for long-running code?

I have built a standalone MATLAB application. I was expecting it to be faster than running the application from the MATLAB environment, but it is actually a bit slower (1.3 s per iteration vs 1.5 s per iteration).
I am not counting the initialization time required by the MCR, only the execution of my code.
Is that the expected performance, or should I be getting a performance improvement?
I haven't found any settings in the deployment tool that could help reduce execution time.
Thanks in advance
Applications built with MATLAB Compiler should execute at pretty much exactly the same speed as within MATLAB.
MATLAB Compiler does not convert your MATLAB code into machine code in the same way as a C compiler does for C. What it does is archive and encrypt your MATLAB code (note: it properly encrypts it, rather than just pcoding it as a comment suggests), create a thin executable wrapper, and package them together, possibly also with the MATLAB Compiler Runtime (MCR). The MCR is very similar to MATLAB itself, without a graphical user interface, and is freely redistributable.
When you run the executable, it dearchives and decrypts your MATLAB code and runs it against the MCR. It should run exactly the same, both in terms of results and speed.
Very old versions of MATLAB Compiler (pre-version 4.0) worked in a different way, converting a subset of the MATLAB language into C code, and compiling this. This provided a potentially significant speed-up, but only a subset of the language was supported and results, unless you were careful, could sometimes be different. Similar functionality is now available in the separate MATLAB Coder product.
There are a few small things you can do to improve performance: for example, within deploytool you can specify which toolboxes your application uses. deploytool uses a dependency checker to package up all MATLAB functionality that it thinks your code might possibly depend on, but it can't always tell exactly, as the functions your code needs might change at runtime. It therefore errs on the side of caution and includes more than necessary. By specifying only the toolboxes you know to be necessary, you can speed things up a little (it also speeds up the build process quite a bit).
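Roughly the command-line equivalent of that deploytool setting, as a sketch only (the script and toolbox names below are made-up examples, and the mcc flags should be checked against your release's documentation):
% Build a standalone app, clearing the default path (-N) and adding back
% only the toolboxes the code actually needs (-p), so less gets packaged.
mcc -m myapp.m -N -p signal -p images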

Determine the Maximum Number of Processors Available for matlabpool (MATLAB Parallel Toolbox)

I'm currently writing some code in MATLAB that uses the parfor loop to speed up some tedious calculations.
My issue is that the code will be run on a remote cluster, and could be run on 4-core, 8-core or 12-core machines (I won't know which one in advance)...
I basically need a code snippet that will allow MATLAB to determine the maximum number of cores that can be used by matlabpool. If we call this variable maxcores, I can then go ahead and use
matlabpool('open', maxcores)
so that I can make sure I am using all the cores available to me.
You can get the number of cores on the machine through feature('numCores'), which is undocumented but seems unlikely to break. (source)
Someone claims there that getNumberOfComputationalThreads also works since R2007a, but it doesn't on my R2012a.
Beyond Dougal's response, I found getenv('NUMBER_OF_PROCESSORS') returns the number of threads on my Windows systems.
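Putting those answers together, a minimal sketch (this assumes the pre-R2013b matlabpool interface that the question uses, and that feature('numCores') behaves as described above):
maxcores = feature('numCores');      % undocumented query for the number of physical cores
if matlabpool('size') == 0           % only open a pool if one isn't already running
    matlabpool('open', maxcores);    % use every core the machine reports
end
% ... parfor work goes here ...
matlabpool('close');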

MATLAB and using multiple cores to run calculations

Hey all. I'm trying to sort out how to get MATLAB running as well as possible. I have a pretty decent new machine:
12 GB RAM
Core i7 3.2 GHz CPU
lots of free space
and a strong graphics card.
However, when I run MATLAB's benchmark test (the bench command), it lists the computer as being near the worst, around the level of a Windows XP single-core 1.7 GHz machine.
Any ideas why, and how I can improve this?
Thanks very much
Firstly, I would recommend re-running the bench command a few times to make sure MATLAB has fully loaded all the libraries etc. it needs. Much of MATLAB is loaded on demand, so it's always best to time the second or third run.
MATLAB automatically takes advantage of multiple cores when executing certain operations that are multithreaded: for example, many element-wise operations such as + and .*, as well as BLAS-backed operations (and probably others). This page lists the operations that are multithreaded.
Parallel Computing Toolbox is useful when MATLAB's intrinsic multithreading can't help (if it can, then it's usually the fastest way to do things). This gives you explicit parallelism via PARFOR, SPMD and distributed arrays.
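As an illustration, a minimal PARFOR sketch (my own example, assuming Parallel Computing Toolbox is installed and using the matlabpool interface current at the time):
matlabpool('open');                  % start a pool of local workers (parpool in newer releases)
n = 200;
results = zeros(1, n);
parfor k = 1:n
    % each iteration is independent, so MATLAB can run them on different cores
    results(k) = max(svd(rand(300)));
end
matlabpool('close');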
You need the Parallel Computing Toolbox. A lot of MATLAB functions are multithreaded, but to parallelize your own code you'll need it. A dumb hack is to open several instances of command-line MATLAB. You could also write multithreaded MEX files, but the right way to go about it would be to purchase and use the aforementioned toolbox.
This may be obvious, but make sure that you have enabled multithreaded computation in the preferences (File > Preferences > General > Multithreading). In some versions of MATLAB, it's not enabled by default.
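As an aside (not part of the original answer), the thread count can also be checked or changed programmatically via maxNumCompThreads, which exists in these releases but was later deprecated, so treat it as version-dependent:
n = maxNumCompThreads                % query how many computational threads MATLAB is using
maxNumCompThreads(4);                % request four threads for multithreaded built-ins
maxNumCompThreads('automatic');      % hand control back to MATLAB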

MATLAB shared C++ libraries and OpenCL

I have a project that requires lots of image processing and wanted to add GPU support to speed things up.
I was wondering: if I compiled my MATLAB code into a C++ shared library and called it from within an OpenCL program, does that mean the MATLAB code will run on the GPU?
My own (semi-educated) guess is that you are going to find this very difficult to do. But others have trodden the same path. This paper might be a good place to start your research, and Googling turned up AccelerEyes and a couple of references to items on the MathWorks File Exchange which you might want to follow up.
Everything in Jacket is written in C / C++ / CUDA.
In fact, we now have a beta version, libjacket (http://www.accelereyes.com/downloadLibjacket), which can be used to extend not just MATLAB but other languages, if you are willing.
@OSaad: most of our functions are the fastest options out there, be it in C or MATLAB.
The Parallel Computing Toolbox in the upcoming release R2010b (due September 1st) supports GPU processing for several functions. Unfortunately, it only supports CUDA (version 1.3 and later), so with an ATI graphics card, you're out of luck. However, you may just want to buy a dedicated GPU, anyway.
Typically, if you can write your Matlab code in a "vectorized" way, then the packages like AccelerEyes and Jacket have a reasonable chance of making things run on the GPU. You can verify this to some extent beforehand by checking whether Matlab itself is able to run on multiple cores on the CPU (these days Matlab will use multiple cores if things are parallelizable in an obvious way).
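As a trivial illustration of what "vectorized" means here (my own example, not from the answer):
x = rand(1, 1e6);                    % some example data

% loop version: hard for Jacket/AccelerEyes-style packages to accelerate
y = zeros(size(x));
for k = 1:numel(x)
    y(k) = x(k)^2 + sin(x(k));
end

% vectorized version: one array expression, the form these packages (and
% MATLAB's own multithreading) can map onto many cores or a GPU
y = x.^2 + sin(x);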
If that doesn't work, then you need to drop down to C/C++ via MEX and then, from there, call OpenCL yourself. MEX is how MATLAB talks to C code, so you write C code that is called by MATLAB (and receives the matrices, etc.), and that code then initialises and calls OpenCL. This is more work, but may be your only route (and, even if the automated packages work to some extent, this approach can still give bigger speedups because you can be smarter about memory management, for example, if you know what you are doing).
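A rough sketch of the MATLAB side of that MEX workflow (the file name and the OpenCL linker flag are illustrative assumptions, not something from the answer; the C file itself still has to implement the mexFunction gateway and the OpenCL calls):
mex myclfilter.c -lOpenCL            % compile the C gateway and link against OpenCL
out = myclfilter(inputImage);        % call it like any other MATLAB function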