Parallel computing with MATLAB for dependent loops

I have a kinetic Monte Carlo code. Because it is kinetic, each loop iteration updates the current state to a future state, which makes it a dependent for loop.
I want to use MATLAB's parallel computing features, but it seems the well-known 'parfor' command only works for independent loops.
So my question is: is it possible to use parallel computing in MATLAB to parallelize code where the loops are not independent?

Usually these kinds of calculations are done on a grid, and the grid is distributed across the workers, each worker having its own part of the grid to calculate. This can't be done independently in general because the value at one point on the grid will depend on neighbouring points. These boundary values are communicated between the workers using some mechanism such as message passing or shared memory.
In MATLAB you can do this either with spmd (or communicating jobs) and the labSend and labReceive functions, or with distributed arrays.
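As a rough sketch of that boundary-exchange idea on a 1-D grid (the chunk size, stencil, and variable names below are illustrative, not taken from any particular code):
spmd
    % Each worker owns a chunk of the grid plus two ghost cells.
    localN = 100;
    u = rand(1, localN + 2);    % u(1) and u(end) hold the neighbours' boundary values
    for step = 1:50
        % Exchange boundary values with the left and right neighbours.
        % labSendReceive sends to one worker and receives from another in a single call.
        if labindex > 1
            u(1) = labSendReceive(labindex - 1, labindex - 1, u(2));
        end
        if labindex < numlabs
            u(end) = labSendReceive(labindex + 1, labindex + 1, u(localN + 1));
        end
        % Update the interior points from the neighbouring values (toy averaging stencil).
        u(2:localN + 1) = 0.5 * (u(1:localN) + u(3:localN + 2));
    end
end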

Related

Possible to run multiple iterations of fsolve (in serial) on multiple cores?

I am trying to solve a complex optimization problem using Matlab's Particle Swarm Optimization. The objective function called in PSO solves an interior optimization problem using matlab's fsolve() function, where the solution from fsolve is used to compute the function evaluation that PSO will try to minimize.
Typically, without PSO wrapped around the problem, we run fsolve() in parallel to compute the central differences to perform gradient calculations; however, I am aware that I cannot do this if I want to run PSO in parallel since there cannot be nested parallel functions.
The next thought was to run fsolve in serial, when computing the gradients, and free up the parallel pool to allow PSO to evaluate the objective function (which involves solving the interior optimization problem) on multiple cores at the same time.
i.e.: if there are 5 particles in the swarm, I want to run each interior optimization problem on a different core simultaneously for each iteration. My question is then: is it possible (even in serial) to run fsolve on different cores simultaneously?
Unfortunately, due to the nature of the work, I am unable to share the code, but I mainly need to figure out whether this is possible before moving forward with the current direction.
Any and all thoughts on this would be greatly appreciated. Thanks!
I have tried to implement the approach described above, but am getting errors that I am unable to diagnose at the moment. I expected each particle iteration to run slower, since fsolve cannot use multiple cores to compute the gradients, but overall I thought that letting PSO evaluate each particle's solution on a different core would still be more efficient than running all of the iterations in serial with fsolve running in parallel.
Edit:
Despite what I say below, I wasn't able to get fsolve to work inside a parfor loop or using parfeval. I get an Undefined function 'fsolve' ... error. It appears that parallel execution of fsolve is unsupported: even if you only call it once per worker, it still causes that worker to fail, so the number of fsolve calls inside each worker (and hence my answer below) is beside the point.
It shouldn’t matter how many times a function is called inside each parallel worker. Each iteration is separated from the others in time. As long as the overall code block is compatible with parallel execution, the instruction stream within is executed as normal MATLAB code on each worker.
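For reference, the structure described above would look roughly like the sketch below; the residual, objective, and particle data are toy stand-ins, and, as noted in the edit, this exact pattern failed with the Undefined function 'fsolve' error in the poster's environment:
% Sketch: each particle's interior problem runs on a different worker, with fsolve kept serial.
nParticles = 5;
particlePositions = rand(nParticles, 2);          % one row per particle (toy data)
cost = zeros(nParticles, 1);

% Toy stand-ins for the real interior system and outer objective.
interiorResidual = @(x, p) [x(1)^2 + x(2) - p(1); x(1) - x(2)^2 - p(2)];
objectiveFcn     = @(x, p) sum((x - p).^2);

opts = optimoptions('fsolve', 'UseParallel', false, 'Display', 'off');

parfor k = 1:nParticles
    p = particlePositions(k, :);
    xSol = fsolve(@(x) interiorResidual(x, p), [1 1], opts);   % serial inner solve
    cost(k) = objectiveFcn(xSol, p);                            % value PSO would minimize
end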

Is it beneficial to run Matlab calculations in parallel on a multi-core computer?

I have a laptop with a multi-core processor and I would like to run a lengthy loop in which Simulink simulations are performed. Is it beneficial to split the loop into two parts (it is possible in my case), open the Matlab application twice, and run a Matlab script in each of them?
Someone told me that Matlab/Simulink always uses one core per opened Matlab application. Is that correct?
MATLAB splits some built-in functions across multiple cores, but standard MATLAB code uses just one core. Generally, if you are running several independent iterations, the computation time can benefit from parallelization. You can do this easily using either parfor (if you have the Parallel Computing Toolbox) or batch_job.
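As a minimal sketch of the parfor pattern, assuming the iterations really are independent (runOneCase below is just a placeholder for the per-iteration work; for Simulink models specifically, parsim is the dedicated route in recent releases):
% Sketch: independent iterations spread across the workers in the pool.
nCases = 20;
results = zeros(nCases, 1);

parfor i = 1:nCases                   % parfor opens the default pool if none is running
    results(i) = runOneCase(i);       % each call depends only on i
end

function y = runOneCase(i)
% Toy stand-in for one lengthy, independent piece of work.
y = sum(sin(i * (1:1e6)));
end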

How does Matlab implement GPU computation in CPU parallel loops?

Can we improve performance by computing some parts of CPU parfor or spmd blocks using gpuArray and GPU functions? Is this a sensible way to improve performance, or are there limitations to this approach? I read somewhere that this approach can be used when we have several GPUs. Is this the only way to use GPU computing alongside CPU parallel loops?
It is possible that using gpuArray within a parfor loop or spmd block can give you a performance benefit, but really it depends on several factors:
How many GPUs you have on your system
What type of GPUs you have (some are better than others at dealing with being "oversubscribed" - i.e. where there are multiple processes using the same GPU)
How many workers you run
How much GPU memory you need for your algorithm
How well suited the problem is to the GPU in the first place.
So, if you had two high-powered GPUs in your machine and ran two workers in a parallel pool on a problem that could keep a single GPU fully occupied - you'd expect to see good speedup. You might still get decent speedup if you ran 4 workers.
One thing that I would recommend is: if possible, try to avoid transferring gpuArray data from client to workers, as this is slower than usual data transfers (the gpuArray is first gathered to the CPU and then reconstituted on the worker).
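As a rough sketch of a one-worker-per-GPU setup, following the advice above to create the GPU data on the workers rather than transferring gpuArray data from the client (the computation itself is just a placeholder):
% Sketch: one pool worker per GPU, each computing on its own gpuArray data.
nGPUs = gpuDeviceCount;
parpool(nGPUs);

spmd
    gpuDevice(labindex);               % bind this worker to its own GPU
    A = gpuArray.rand(4000);           % create the data directly on that GPU
    B = A * A.';                       % placeholder GPU computation
    localResult = gather(trace(B));    % bring only a scalar back to this worker's CPU
end

allResults = [localResult{:}];         % Composite on the client: one value per worker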

SPMD vs. Parfor

I'm new to parallel computing in MATLAB. I have a function which creates a classifier (SVM) and I'd like to test it with several datasets. I've got a 2-core workstation, so I'd like to run the tests in parallel. Can someone explain to me the difference between:
dataset_array = {dataset1, dataset2};
matlabpool open 2
spmd
    my_function(dataset_array{labindex});
end
and
dataset_array = {dataset1, dataset2};
matlabpool open 2
parfor i = 1:2
    my_function(dataset_array{i});
end
spmd is a parallel region, while parfor is a parallel for loop. The difference is that in an spmd region you have much greater flexibility in the tasks you can perform in parallel. You can write a for loop, and you can operate on distributed arrays and vectors. You can program an entire workflow, which in general consists of more than loops. This comes at a price: you need to know more about distributing the work and the data among your workers. Parallelizing a loop, for example, requires explicitly dividing the loop index ranges among the workers (which you did in your code by using labindex), and possibly creating distributed arrays.
parfor, on the other hand, does only that: it is a parallelized for loop in which MATLAB automatically divides the work between the workers.
If you only want to run a single loop in parallel and later work on the result on your local client, you should use parfor. If you want to parallelize your entire MATLAB program, you will have to deal with the complexities of spmd and work distribution.
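To illustrate the point about explicitly dividing the loop index ranges, here is a sketch for a generic loop of N iterations with toy per-iteration work (in current releases you would open the pool with parpool rather than matlabpool):
N = 100;
spmd
    % Manually split the iteration range 1:N between the workers.
    edges = round(linspace(0, N, numlabs + 1));
    myRange = edges(labindex) + 1 : edges(labindex + 1);
    localSum = 0;
    for i = myRange
        localSum = localSum + sin(i);   % toy per-iteration work
    end
end
totalSum = sum([localSum{:}]);          % combine the per-worker results on the client

% The parfor equivalent: MATLAB divides the range and handles the reduction automatically.
total2 = 0;
parfor i = 1:N
    total2 = total2 + sin(i);
end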

Parallelizing a for loop to run simultaneously on multiple GPU cores?

I understand that you can use a matlabpool and parfor to run for-loop iterations in parallel; however, I want to try to take advantage of the high number of cores in my GPU to run a larger number of simultaneous iterations. I was wondering if there is any built-in functionality to do this?
To my understanding, the way MATLAB runs code on the GPU is through a gpuArray, but that does not seem to parallelize a loop, only certain functions inside the loop.
For the loop that I am running, each iteration can run independently, and the only variables that need to exist outside of the loop are the data to be processed (a 3-D array, where the first index is time, and each iteration operates on a different time) and a 2-D output array in which each iteration stores the result for a particular time. Each time is independent.
Thanks
With a gpuArray, you can run elementwise operations in parallel by structuring your algorithm in terms of MATLAB's arrayfun. Effectively, this implicitly loops over each element of your arrays and applies the body of a MATLAB function to each element. See the documentation for running arrayfun on gpuArray inputs, and the simple demo that goes with it.
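A minimal sketch of the arrayfun route, assuming the per-element operation can be written as a scalar MATLAB function (the kernel below is only a placeholder):
% Sketch: elementwise GPU execution via arrayfun on a gpuArray.
data = gpuArray.rand(1000, 500);          % e.g. time along the first dimension

% arrayfun compiles the anonymous function into a GPU kernel and applies it
% to every element in parallel.
out = arrayfun(@(x) x.^2 + sin(x), data);

result = gather(out);                     % bring the result back to CPU memory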