MATLAB and using multiple cores to run calculations - matlab

Hey all. I'm trying to work out how to get MATLAB running as well as possible. I have a pretty decent new machine:
12 GB RAM
Core i7 3.2 GHz CPU
lots of free disk space
a strong graphics card
However, when I run MATLAB's benchmark (the bench command), it ranks the computer near the bottom, around a Windows XP single-core 1.7 GHz machine.
Any ideas why, and how I can improve this?
Thanks very much

Firstly, I would recommend re-running the bench command a few times to make sure MATLAB has fully loaded all the libraries etc. it needs. Much of MATLAB is loaded on demand, so it's always best to time the second or third run.
MATLAB automatically takes advantage of multiple cores when executing operations that are multithreaded. These include many elementwise operations such as + and .*, as well as BLAS-backed operations (and probably others). This page lists those things which are multithreaded.
Parallel Computing Toolbox is useful when MATLAB's intrinsic multithreading can't help (if it can, then it's usually the fastest way to do things). This gives you explicit parallelism via PARFOR, SPMD and distributed arrays.
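To see whether the built-in multithreading is actually being used, one option is to time a BLAS-backed operation with the default thread count and again with MATLAB restricted to a single thread. The following is a minimal sketch, not a proper benchmark: the matrix size is an arbitrary choice, and the timings will vary by machine and release.

```matlab
% Sketch: compare a BLAS-backed multiply with default threads vs. one thread.
n = 1500;                        % arbitrary size chosen for illustration
A = rand(n);
B = rand(n);
C = A * B;                       % warm-up run so on-demand loading is not timed

nThreads = maxNumCompThreads;    % current number of computational threads
tic; C = A * B; tMulti = toc;

oldN = maxNumCompThreads(1);     % restrict to one thread, remember old setting
tic; C = A * B; tSingle = toc;
maxNumCompThreads(oldN);         % restore the previous setting

fprintf('%d threads: %.3f s, 1 thread: %.3f s\n', nThreads, tMulti, tSingle);
```

If the two timings are about the same on a multicore machine, multithreading is probably disabled or not available for that operation.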

You need the Parallel Computing Toolbox. A lot of MATLAB functions are multithreaded, but to parallelize your own code you'll need it. A crude workaround is to open several instances of command-line MATLAB. You could also write multithreaded MEX files, but the right way to go about it is to purchase and use the aforementioned toolbox.

This may be obvious, but make sure that you have enabled multithreaded computation in the preferences (File > Preferences > General > Multithreading). In some versions of MATLAB, it's not enabled by default.

Related

How to use parallel 'for' loop in Octave or Scilab?

I have two for loops running in my MATLAB code. The inner loop is parallelized using matlabpool on 12 processors (which is the maximum MATLAB allows on a single machine).
I don't have a Distributed Computing license. Please help me do this in Octave or Scilab. I ONLY want to parallelize the 'for' loop.
The links I found while searching on Google are broken.
parfor is not really implemented in Octave yet. The keyword is accepted, but is a mere synonym of for (http://octave.1599824.n4.nabble.com/Parfor-td4630575.html).
The pararrayfun and parcellfun functions of the parallel package are handy on multicore machines.
They are often a good replacement to a parfor loop.
For examples, see
http://wiki.octave.org/Parallel_package.
To install, issue (just once)
pkg install -forge parallel
And then, once on each session
pkg load parallel
before using the functions
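As a rough illustration of replacing a parfor loop with pararrayfun, here is a minimal Octave sketch (Octave shares MATLAB's syntax, but this snippet needs Octave with the parallel package installed). The function slow_square is a hypothetical stand-in for the real loop body.

```matlab
% Octave sketch: apply a function to each element using several processes.
pkg load parallel                 % once per session, as described above

slow_square = @(x) x.^2;          % hypothetical stand-in for the loop body
n_workers = nproc();              % number of available processors (Octave built-in)

% Equivalent to: for k = 1:10, results(k) = slow_square(k); end
results = pararrayfun(n_workers, slow_square, 1:10);
disp(results)
```

For loop bodies that return non-scalar values, parcellfun with cell-array inputs is usually the better fit.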
In Scilab you can use parallel_run:
function a=g(arg1)
a=arg1*arg1
endfunction
res=parallel_run(1:10, g);
Limitations
uses only one core on Windows platforms.
For now, parallel_run only handles arguments and results of scalar matrices of real values and the types argument is not used
one should not rely on side effects such as modifying variables from outer scope : only the data stored into the result variables will be copied back into the calling environment.
macros called by parallel_run are not allowed to use the JVM
no stack resizing (via gstacksize() or via stacksize()) should take place during a call to parallel_run
In GNU Octave you can use the parfor construct:
parfor i=1:10
# do stuff that may run in parallel
endparfor
For more info: help parfor
To see a list of Free and Open Source alternatives to MATLAB-SIMULINK please check its Alternativeto page or my answer here. Specifically for SIMULINK alternatives see this post.
Something you should consider is the difference between vectorized, parallel, concurrent, asynchronous and multithreaded computing. Without going too far into the details, vectorized programming is a way to avoid ugly for-loops. For example, the map function and list comprehensions in Python are vectorized computation. It is about the way you write the code, not necessarily how it is handled by the computer. Parallel computation, mostly used for GPU computing (data parallelism), is when you run a massive amount of arithmetic on big arrays using the GPU's computational units. There is also task parallelism, which mostly refers to running a task on multiple threads, each processed by a separate CPU core. Concurrent or asynchronous computing is when you have just one computational unit, but it handles multiple jobs over the same period without blocking the processor unconditionally. Basically like a mom cooking, cleaning and taking care of her kid at the same time, but doing only one job at a time :)
Given the above description, there is a lot in the FOSS world for each of these. For Scilab specifically, check this page. There is an MPI interface for distributed computation (multithreading/parallelism on multiple computers), OpenCL interfaces for GPU/data-parallel computation, and an OpenMP interface for multithreading/task parallelism. The feval function is not parallelism but a way to vectorize a conventional function. Scilab matrix arithmetic and parallel_run are vectorized or parallel depending on the platform, hardware and version of Scilab.
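To make the "vectorized" idea concrete, here is a small sketch in MATLAB syntax (it should also run in Octave). The formula is an arbitrary example. Both versions compute the same values; the second simply expresses the loop as whole-array arithmetic.

```matlab
% Sketch: the same computation written as a loop and as a vectorized expression.
x = linspace(0, 1, 1e5);

% Loop version: one element at a time
y_loop = zeros(size(x));
for k = 1:numel(x)
    y_loop(k) = x(k)^2 + 3*x(k);
end

% Vectorized version: elementwise operators act on the whole array
y_vec = x.^2 + 3*x;

max_diff = max(abs(y_loop - y_vec));   % largest difference between the two results
```

Whether the vectorized form then runs on several cores is up to the interpreter and its libraries, which is the distinction drawn above.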

How to get significant speed up using Parallel Computing Toolbox of MATLAB in core i7 processor?

I am working on image processing. My computer has an Intel(R) Core(TM) i7-3770 CPU @ 3.40 GHz and 4 GB of RAM. I want to parallelize the code of an image processing algorithm using the SPMD command of PCT. To do this, I divided the image vertically into 8 parts, sent them to different labs, and used SPMD to run the algorithm in parallel on the different parts, one per lab.
I got the correct result, the same as from the sequential code. But it takes more time than the sequential code. I have tried this with images from very large to very small but didn't get a significant speed-up.
How can I get a significant speed-up using the SPMD command?
Since you did not provide any code I'll have to stick to a general answer. In all parallel computing there are several design considerations, the two most important are: is your code able to run in parallel, and secondly: how much communication overhead do you create.
Calling workers means sending information back and forth, so there is an optimum in parallel computing. Make sure you provide your workers with enough work so that the communication to and from your workers requires less time than the speed-up gained from parallel computing.
Last but not least: if you provide a working code example the community is able to help you much better!
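The trade-off between work per worker and communication cost can be sketched with a deliberately crude model. It assumes the work divides perfectly across workers and that communication adds a fixed cost; all numbers below are made up for illustration.

```matlab
% Toy model: parallel time = serial time / workers + fixed communication cost.
tSerial  = 10;                    % hypothetical serial run time in seconds
nWorkers = 1:8;

tCommSmall = 0.5;                 % hypothetical cheap communication
tCommLarge = 10;                  % hypothetical expensive communication

speedupSmall = tSerial ./ (tSerial ./ nWorkers + tCommSmall);
speedupLarge = tSerial ./ (tSerial ./ nWorkers + tCommLarge);

disp([nWorkers; speedupSmall; speedupLarge]);
```

In this model the expensive-communication case never reaches a speed-up of 1, so the parallel version is always slower than the serial one, which matches the symptom described in the question.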
If you want to apply the same operation to several blocks within an image, then rather than worry about constructs such as spmd, you can just apply the command blockproc and set the UseParallel option to true. It will parallelize everything for you, without you needing to do anything.
If that doesn't work for you and you really have a requirement to implement your algorithm directly using spmd, you'll need to post some example code to indicate what you've tried, and where it's going wrong.
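A minimal sketch of the blockproc approach, splitting an image into 8 vertical strips as in the question. It needs the Image Processing Toolbox, plus the Parallel Computing Toolbox for the UseParallel option. The random matrix stands in for a real image, and the square root stands in for the real per-block algorithm.

```matlab
% Sketch: process 8 vertical strips of an image with blockproc.
I = rand(512, 512);                        % stand-in for a real grayscale image

% Each block spans the full height and one eighth of the width
blockSize = [size(I, 1), size(I, 2) / 8];

% blockproc passes a struct; the pixel data is in its .data field
fun = @(blk) sqrt(blk.data);               % hypothetical per-block operation

J = blockproc(I, blockSize, fun, 'UseParallel', true);
```

A pointwise operation like this gives the same result as processing the whole image at once. Neighborhood filters would need the 'BorderSize' option to avoid seams between blocks.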

Gforth parallel processing

I have written a Forth Mandelbrot fractal plotter, and as much as a technical exercise as anything else I would like to try to speed it up with some parallel processing.
For the time being I would be happy if I could just use both of my cores (have one core do one half of the image and the other the other half).
I have noticed that Windows XP will quite happily manage two instances of Gforth and try to use as much processor capacity as possible, so running two processes could be a start. However, I am not sure if they can share memory, or if they can both write to a file at the same time (or how to tell one process to start writing at x bytes from the start of the file).
In summary, how can I do parallel processing using Gforth on Windows XP?
You could have each program do a grid of pixels rather than a single pixel, and then recombine them in the end.
AFAIK, pixels in Mandelbrot sets are independent of each other (someone correct me if I am wrong). However, the amount of computation each one needs varies unpredictably from pixel to pixel, making the work hard to divide evenly without some kind of central dispatch thread (then again, you run into potential problems with contention).
See GForth Pipes.

How to utilise parallel processing in Matlab

I am working on a time series based calculation. Each iteration of the calculation is independent. Could anyone share some tips / online primers on utilising parallel processing in Matlab? How can this be specified inside the actual code?
Since you have access to the Parallel toolbox, I suggest that you first check whether you can do it the easy way.
Basically, instead of writing
for i=1:lots
out(:,i)=do(something);
end
You write
parfor i=1:lots
out(:,i)=do(something);
end
Then, you use matlabpool to create a number of workers (you can have a maximum of 8 on your local machine with the toolbox, and tons on a remote cluster if you also have a Distributed Computing Server license), and you run the code, and see nice speed gains when your iterations are run by 8 cores instead of one.
Even though the parfor route is the easiest, it may not work right out of the box, since you might do your indexing wrong, or you may be referencing an array in a problematic way etc. Look at the mlint warnings in the editor, read the documentation, and rely on good old trial and error, and you should figure it out reasonably fast. If you have nested loops, it's often best to parallelize only the innermost one and ensure it does tons of iterations. This is not only good design, it also reduces the amount of code that could give you trouble.
Note that especially if you run the code on a local machine, you may run into memory issues (which might show up as really slow execution in parallel mode because you're paging): every worker gets a copy of the workspace, so if your calculation involves creating a 500 MB array, 8 workers will need a total of 4 GB of RAM, and then you haven't even started counting the RAM of the parent process! In addition, it can be good to use only N-1 cores on your machine, so that there is still one core left for other processes that may run on the computer (such as a mandatory antivirus...).
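A minimal sketch of the pattern described above, with a preallocated output that each iteration fills one column of. The loop body is a made-up placeholder. The pool commands are only comments because they depend on the release: matlabpool was replaced by parpool in R2013b.

```matlab
% Sketch: parfor with a sliced, preallocated output array.
% Older releases:  matlabpool open 4
% R2013b or later: parpool(4)

nIter = 100;
out = zeros(3, nIter);            % preallocate; column i belongs to iteration i

parfor i = 1:nIter
    % Placeholder work: every iteration is independent of the others
    out(:, i) = [i; i^2; sqrt(i)];
end

% Rough memory estimate from the text: one 500 MB array per worker
perWorkerMB = 500;
nWorkers = 8;
totalMB = perWorkerMB * nWorkers; % 4000 MB, about 4 GB, excluding the client
```

Indexing out only as out(:, i) is what lets MATLAB treat it as a sliced variable, so each worker is sent only the columns it needs.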
MathWorks offers its own Parallel Computing Toolbox. If you do not want to purchase that, there are a few options:
You could write your own MEX file and use pthreads or OpenMP.
However, make sure you do not call any MEX API functions in the parallel part of the code, because they aren't thread-safe.
If you want coarser-grained parallelism via MPI, you can try pmatlab.
The same goes for parmatlab.
Edit: Adding link Parallel MATLAB with openmp mex files
I have only tried the first.
Don't forget that many Matlab functions are already multithreaded. By careful programming you may be able to take advantage of them -- check the documentation for your version as the Mathworks seem to be increasing the range and number of multithreaded functions with each new release. For example, it seems that 2010a has multithreaded ffts which may be useful for time series processing.
If the intrinsic multithreading is not what you need, then as @srean suggests, the Parallel Computing Toolbox is available. For my money (or rather, my employers' money) it's the way to go, allowing you to program in parallel in Matlab, rather than having to bolt things on. I have to admit, too, that I'm quite impressed by the toolbox and the facilities it offers.

matlab shared c++ libraries and OpenCL

I have a project that requires lots of image processing and wanted to add GPU support to speed things up.
I was wondering: if I compile my MATLAB code into a C++ shared library and call it from within an OpenCL program, does that mean that the MATLAB code is going to run on the GPU?
My own (semi-educated) guess is that you are going to find this very difficult to do. But, others have trodden the same path. This paper might be a good place to start your research. And Googling turned up Accelereyes and a couple of references to items on the Mathworks File Exchange which you might want to follow up.
Everything in Jacket is written in C / C++ / CUDA.
In fact, we now have a beta version of libjacket (http://www.accelereyes.com/downloadLibjacket) which can be used to extend not just MATLAB but other languages, if you are willing.
@OSaad
Most of our functions are the fastest options out there, be it in C or MATLAB.
The Parallel Computing Toolbox in the upcoming release R2010b (due September 1st) supports GPU processing for several functions. Unfortunately, it only supports CUDA (devices with compute capability 1.3 and later), so with an ATI graphics card, you're out of luck. However, you may just want to buy a dedicated GPU anyway.
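For reference, GPU use through the Parallel Computing Toolbox looks roughly like the sketch below. It assumes the toolbox is installed and falls back to the CPU when no supported GPU is found. The elementwise arithmetic is an arbitrary stand-in for real image processing.

```matlab
% Sketch: run elementwise arithmetic on the GPU if one is available.
A = rand(1000);                   % stand-in for image data

if gpuDeviceCount > 0
    G = gpuArray(A);              % copy the data to GPU memory
    R = gather(G .* G + 2);       % compute on the GPU, then copy the result back
else
    R = A .* A + 2;               % CPU fallback, same arithmetic
end
```

The transfers in gpuArray and gather are not free, so this tends to pay off only when a lot of computation happens between them.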
Typically, if you can write your Matlab code in a "vectorized" way, then packages like Jacket from AccelerEyes have a reasonable chance of making things run on the GPU. You can verify this to some extent beforehand by checking whether Matlab itself is able to run on multiple cores on the CPU (these days Matlab will use multiple cores if things are parallelizable in an obvious way).
If that doesn't work, then you need to drop down to C/C++ via mex and, from there, call OpenCL yourself. Mex is how Matlab talks to C code, so you write C code that is called by Matlab (and receives the matrices, etc.), then initialises and calls OpenCL. This is more work, but may be your only route. Even if the automated packages work to some extent, this approach can still give more speed-up, because you can be smarter about memory management, for example, if you know what you are doing.