Parallelizing a for loop to run simultaneously on multiple GPU cores? - matlab

I understand that you can use a matlabpool and parfor to run for-loop iterations in parallel. However, I want to take advantage of the high number of cores in my GPU to run a larger number of simultaneous iterations. Is there any built-in functionality to do this?
To my understanding, MATLAB runs code on the GPU through a gpuArray, but that does not seem to parallelize a loop, only certain functions inside the loop.
For the loop that I am running, each iteration can run independently. The only variables that need to exist outside the loop are the data to be processed (a 3-D array, where the first index is time and each iteration operates on a different time) and a 2-D output array where each iteration stores the result for a particular time. Each time is independent.
Thanks

With a gpuArray, you can run element-wise operations in parallel by structuring your algorithm in terms of MATLAB's arrayfun. Effectively, this implicitly loops over each element of your arrays and applies the body of a MATLAB function to each element. See the documentation for running arrayfun on gpuArray data; MathWorks also provides a simple demo of the approach.
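A minimal sketch of that pattern, assuming the Parallel Computing Toolbox and a supported GPU (the anonymous function here is just a placeholder for the real per-element computation):

A = gpuArray(rand(1000));    % move the input data onto the GPU
B = gpuArray(rand(1000));
% The function body is applied to every element pair in parallel on the GPU;
% inside it you can use scalar arithmetic, if/else, while, and so on.
f = @(a, b) a.^2 + sin(b);
C = arrayfun(f, A, B);
result = gather(C);          % copy the result back to host memory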

Related

How can we have two parfor loops running simultaneously in MATLAB to reduce calculation idle times?

Suppose that I want to run my code using a parfor loop in MATLAB R2016a. The first four iterations of this loop (1:4) have a higher computational load than the second four iterations (5:8), so there is a large difference in the calculation time of the two parts.
How can we have two parfor loops running simultaneously in this case, so that, for example, the second part of the loop (5:8) does its job without waiting for completion of the first part (1:4)? I want to reduce the idle time using a new coding structure or any other tricks.

MATLAB: Is it inefficient to use parfor (parallel for loop) within a while loop?

I'm having trouble doing MCMC (Markov chain Monte Carlo). For MCMC, say I will run 10000 iterations, and within each iteration I will draw some parameters. In each iteration I also have some individual data which are independent, so I can use parfor for them. However, the problem is that the time to finish one iteration seems to grow quickly as the MCMC goes on; soon it becomes extremely time-consuming.
My question is: is there any efficient way to combine parfor and while loop?
I have the following pseudo-code:
r = 1;
while r < 10000
    parfor i = 1:I
        % make draws from the proposal distribution
        % calculate the acceptance rate
        % accept or reject the current draw
    end
    r = r + 1;
end
Launching lots of separate parfor loops can be inefficient if each loop duration is small. Unfortunately, as you are probably aware, you cannot break out of a parfor loop. One alternative might be to use parfeval. The idea would be to make many parfeval calls (but not too many), and then you can terminate when you have sufficient results.
This (fairly long) blog article shows an example of using parfeval in a situation where you might wish to terminate the computations early.
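A rough sketch of that parfeval pattern, assuming a running parallel pool; drawSample and enoughResults are hypothetical placeholders for one sampling step and the stopping test:

p = gcp();                                   % get (or start) the current parallel pool
numTasks = 200;                              % submit a manageable batch, not all 10000 at once
f(1:numTasks) = parallel.FevalFuture;        % preallocate the array of futures
for k = 1:numTasks
    f(k) = parfeval(p, @drawSample, 1, k);   % request one output from each task
end
results = cell(numTasks, 1);
for k = 1:numTasks
    [idx, value] = fetchNext(f);             % collect results as they complete
    results{idx} = value;
    if enoughResults(results)                % hypothetical early-termination test
        cancel(f);                           % stop the remaining tasks
        break
    end
end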

parallel computing with matlab for dependent loops

I have a kinetic Monte Carlo code. Since it is kinetic, each iteration updates the current state to a future state, which makes it a dependent for loop.
I want to use the parallel computing features of MATLAB, but it seems the famous 'parfor' command only works for independent loops.
So my question is: is it possible to use parallel computing in MATLAB to parallelize code where the loops are not independent?
Usually these kinds of calculations are done on a grid, and the grid is distributed across the workers, each worker having its own part of the grid to calculate. This can't be done independently in general because the value at one point on the grid will depend on neighbouring points. These boundary values are communicated between the workers using some mechanism such as message passing or shared memory.
In MATLAB you can either use spmd blocks or communicating jobs together with the labSend and labReceive functions, or you can use distributed arrays.
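A rough sketch of that idea inside spmd for a 1-D grid split across workers, using labSendReceive (a combined send/receive that avoids deadlock); the neighbour-averaging update rule is only a placeholder:

spmd
    local = rand(1, 100);                      % each worker owns one slice of the grid
    % neighbours in a 1-D decomposition ([] means no neighbour on that side)
    left = labindex - 1;   if left < 1,        left = [];  end
    right = labindex + 1;  if right > numlabs, right = []; end
    for step = 1:10
        % swap boundary values with the neighbouring workers
        rightHalo = labSendReceive(left, right, local(1));    % send my left edge, get right neighbour's
        leftHalo  = labSendReceive(right, left, local(end));  % send my right edge, get left neighbour's
        % placeholder update: average each point with its neighbours
        newLocal = local;
        newLocal(2:end-1) = 0.5 * (local(1:end-2) + local(3:end));
        if ~isempty(leftHalo),  newLocal(1)   = 0.5 * (leftHalo + local(2));      end
        if ~isempty(rightHalo), newLocal(end) = 0.5 * (local(end-1) + rightHalo); end
        local = newLocal;
    end
end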

SPMD vs. Parfor

I'm new to parallel computing in MATLAB. I have a function which creates a classifier (SVM), and I'd like to test it with several datasets. I've got a 2-core workstation, so I'd like to run the tests in parallel. Can someone explain to me the difference between:
dataset_array = {dataset1, dataset2};
matlabpool open 2
spmd
    my_function(dataset_array{labindex});
end
and
dataset_array = {dataset1, dataset2};
matlabpool open 2
parfor i = 1:2
    my_function(dataset_array{i});
end
spmd is a parallel region, while parfor is a parallel for loop. The difference is that in an spmd region you have much greater flexibility regarding the tasks you can perform in parallel. You can write a for loop, you can operate on distributed arrays and vectors, and you can program an entire workflow, which in general consists of more than loops. This comes at a price: you need to know more about distributing the work and the data among your workers. Parallelizing a loop, for example, requires explicitly dividing the loop index ranges among the workers (which you did in your code by using labindex), and maybe creating distributed arrays.
parfor, on the other hand, does only that: a parallelized for loop. Automatically parallelized, one might add, so the work is divided between the workers by MATLAB.
If you only want to run a single loop in parallel and later work on the result on your local client, you should use parfor. If you want to parallelize your entire MATLAB program, you will have to deal with the complexities of spmd and work distribution.

Matlab parfor work distribution

I have a parfor loop of, say, 100 iterations, and the workload of every iteration is different, but it changes linearly so that the first iteration takes the most time and the last one is the fastest. When I run the parfor loop with my four instances/labs, during the last few hours only one lab is active, because it is working through the first few iterations on its own.
So I know which iterations are the slow ones. How could I make the workload between cores more even? For example, could I somehow force all labs to start working on the first four slow iterations and then proceed in order? Or something similar, to prevent a single core from being left to run the few slow iterations alone.
MATLAB's parfor does nothing more than split up the indices and distribute them to the workers. It does this by creating contiguous chunks from the indices. I don't know the exact algorithm, but it means that iterations with similar indices end up in the same chunk and are computed by the same worker.
The simplest solution would be a stochastic one: just shuffle your indices so that the work-intensive steps are distributed evenly. While this doesn't give you any guarantees on performance, it is simple and will work most of the time.
Some example code:
% dummy data
N = 10;
data = 1:N;
% generate the permuted indices
permIndex = randperm(N);
% permute the data
dataPermuted = data(permIndex);
% run the loop
parfor i = 1:N
    % do something, e.g. pause for the time specified by the data
    pause(dataPermuted(i));
end
% invert the index permutation
dataInversePermuted(permIndex) = dataPermuted;
I used pause to simulate the different computation times.
I don't think this is documented anywhere, but you can quickly deduce that PARFOR runs iterations in reverse loop order (using pause and disp if you want to see it in action). So, you should simply reverse your loop. PARFOR gives you no means to explicitly control execution order, but SPMD using for-drange does (PARFOR is significantly easier to use though).
@denahiro's suggestion is also a good one.
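For reference, a minimal sketch of the loop-reversal trick, where slowStep is a hypothetical stand-in for the per-iteration work and the low indices are assumed to be the slow ones:

N = 100;
M = N + 1;
results = zeros(1, N);
parfor i = 1:N
    % iteration i performs the work for original index M - i, so (if parfor
    % hands out the highest i first) the slow low-numbered steps start earliest
    results(i) = slowStep(M - i);
end
results = fliplr(results);   % restore the original ordering of the results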