Parfor in MATLAB Problem - matlab

Why can't I use the parfor in this piece of code?
parfor i=1:r
for j=1:N/r
xr(j + (N/r) * (i-1)) = x(i + r * (j-1));
end
end
This is the error:
Error: The variable xr in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".

The issue here is that of improper indexing of the sliced array. parfor loops are run asynchronously, meaning the order in which each iteration is executed is random. From the documentation:
MATLAB workers evaluate iterations in no particular order, and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this.
You can easily verify the above statement by typing the following in the command line:
parfor i=1:100
i
end
You'll see that the ordering is arbitrary. Hence if you split a parallel job between different workers, one worker has no way of telling if a different iteration has finished or not. Hence, your variable indexing cannot depend on past/future values of the iterator.
Let me demonstrate this with a simple example. Consider the Fibonacci series 1,1,2,3,5,8,.... You can generate the first 10 terms of the series easily (in a naïve for loop) as:
f=zeros(1,10);
f(1:2)=1;
for i=3:10
f(i)=f(i-1)+f(i-2);
end
Now let's do the same with a parfor loop.
f=zeros(1,10);
f(1:2)=1;
parfor i=3:10
f(i)=f(i-1)+f(i-2);
end
??? Error: The variable f in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview"
But why does this give an error?
I've shown that iterations are executed in an arbitrary order. So let's say that a worker gets the loop index i=7 and the expression f(i)=f(i-1)+f(i-2);. It is now supposed to execute the expression and return the results to the master node. Now has iteration i=6 finished? Is the value stored in f(6) reliable? What about f(5)? Do you see what I'm getting at? Supposing f(5) and f(6) are not done, then you'll incorrectly calculate that the 7th term in the Fibonnaci series is 0!
Since MATLAB has no way of telling if your calculation can be guaranteed to run correctly and reproduce the same result each time, such ambiguous assignments are explicitly disallowed.

Related

How Should I Parallelize My Genetic Algorithm Fitness Evaluation?

I have a GA code that I developed myself. Since I'm new to coding, my code is not fast. I have a Dual-Core CPU 2.6GHz.
The only line of the code that takes a long time to run is the fitness function. I am not familiar with the GA toolbox and my fitness function is quite complex so I assume even if I knew how to use the GA toolbox, I would have to code the fitness function myself.
The algoritm's structure is as follows:
after generating the initial generation and evaluating the fitness values (which takes long but does not matter that much because this is only run once), it starts a loop which will be iterated for up to 10000 times. In each iteration, we have a new generation whose fitness values needs to be calculated. So when a new generation of 50 individuals is generated, the whole generation is fed to the fitness_function. In this function there is a for loop which calculates the fitness value for each 50 individual (so the for loop is iterated 50 times). Here is my question. How should I use parfor so that 25 individual is evaluated by one CPU core and the other 25 individuals with the other core, so that the calculation time is decreased to almost half. I already know from here
I have tried changing the for loop in the fitness_function directly to parfor and I have received the following error: "The PARFOR loop cannot run due to the way variable "Z" is used." and "Variable z is indexed in different ways. Potentially causing dependencies between iterations." Variable Z is a 50*3 matrix which stores the fitness values for each of the individuals.
The problem with your assignment into Z is that you have three different assignment statements, and that is not allowed. You need to make the assignment into Z meet the requirements for a "sliced" variable. The easiest way to do this is to make a temporary variable Zrow to store the values for the ith row of Z, and then make a single assignment, like this
parfor i = 1:50
Zrow = zeros(1, 3); % allocate to ensure parfor knows this is a temporary
...
Zrow(1) = TTT;
...
Zrow(2) = sum(FSL,1);
Zrow(3) = 0.5*Z(i,1)+0.5*Z(i,2);
% Finally, make a single sliced assignment into Z
Z(i, :) = Zrow;
end
Also, in general, it's best to have the parfor loop be the outermost one. Also, whether parfor actually gives you any speed-up depends a lot on whether the body of the loop is already being multithreaded by MATLAB's built-in multithreaded capabilities. (If it is, then parfor using only your local machine cannot make things faster because in that case, the multithreaded code is already taking full advantage of your computer's resources).

Using a for loop inside a parallel for loop in Matlab

I am trying to use a for loop inside of a parfor loop in Matlab.
The for loop is equivalent to the ballode example in here.
Inside the for loop a function ballBouncing is called which is a system of 6 differential equations.
So, what I am trying to do is to use 500 different sets of parameter values for the ODE system and run it, but for each parameter set, a sudden impulse is added, which is handled through the code in 'for' loop.
However, I don't understand how to implement this using a parfor and a for loop as below.
I could run this code by using two for loops but when the outer loop is made to be a parfor it gives the errors,
the PARFOR loop cannot run due to the way variable results is used,
the PARFOR loop cannot run due to the way variable y0 is used and
Valid indices for results are restricted in PARFOR loops
results=NaN(500,100);
x=rand(500,10);
parfor j=1:500
bouncingTimes=[10,50];%at time 10 a sudden impulse is added
refine=2;
tout=0;
yout=y0;%initial conditions of ODE system
paras=x(j,:);%parameter values for the ODE
for i=1:2
tfinal=bouncingTimes(i);
[t,y]=ode45(#(t,y)ballBouncing(t,y,paras),tstart:1:tfinal,y0,options);
nt=length(t);
tout=[tout;t(2:nt)];
yout=[yout;y(2:nt,:)];
y0(1:5)=y(nt,1:5);%updating initial conditions with the impulse
y0(6)=y(nt,6)+paras(j,10);
options = odeset(options,'InitialStep',t(nt)-t(nt-refine),...
'MaxStep',t(nt)-t(1));
tstart =t(nt);
end
numRows=length(yout(:,1));
results(1:numRows,j)=yout(:,1);
end
results;
Can someone help me to implement this using a parfor outer loop.
Fixing the assignment into results is relatively straightforward - what you need to do is ensure you always assign a whole column. Here's how I would do that:
% We will always need the full size of results in dimension 1
numRows = size(results, 1);
parfor j = ...
yout = ...; % variable size
yout(end:numRows, :) = NaN; % Expand if necessary
results(:, j) = yout(1:numRows, 1); % Shrink 'yout' if necessary
end
However, y0 is harder to deal with - the iterations of your parfor loop are not order-independent because of the way you're passing information from one iteration to the next. parfor can only handle loops where the iterations are order-independent.

How to use SPMD for different input variables and save the output in order?

I am using a simulated annealing algorithm to optimize my problem, I have to do it for 100 different input variables and save the output for all variables in order. the problem is that I don't know how to implement spmd in my code to do parallel computing so that each input run on one CPU core and the final results stored in a matrix with 100 rows. I've tried to put it before the first for loop but it only returns a composite consists of 4 elements, since my CPU has 4 cores. Here is my code
spmd
for v=1:100
posmat=loading_param(Matrix,v);
nvar=size(posmat,2);
popsize=50;
maxiter=20;
T0=1000;
Tf=1;
Tdamp=((T0-Tf)/maxiter);
nn=5;
T=T0;
%% initial population
tic
emp.var=[];
emp.fit=inf;
pop=repmat(emp,popsize,1);
for i=1:popsize
pop(i).var=randperm(nvar);
pop_double=pop(i).var;
posmat_new=tabdil(nvar,pop_double,posmat);
dis=cij(posmat_new);
pop(i).fit=fittness(dis);
end
[value,index]=min([pop.fit]);
gpop=pop(index);
%% algorithm main loop
BEST=zeros(maxiter,1);
for iter=1:maxiter
for i=1:popsize
bnpop=emp;
for j=1:nn
npop=create_new_pop(pop(j),nvar,posmat);
if npop.fit<bnpop.fit
bnpop=npop;
end
end
if bnpop.fit<pop(i).fit
pop(i)=bnpop;
else
E=bnpop.fit-pop(i).fit;
pr=exp(-E/T);
if rand<pr
pop(i)=bnpop;
end
end
end
T=T-Tdamp;
[value,index]=min([pop.fit]);
if value<gpop.fit
gpop=pop(index);
BEST(iter)=gpop.fit;
disp([ 'iter= ' num2str(iter) 'BEST=' num2str(BEST(iter))])
end
end
%% algorithm results
disp([ ' Best solution=' num2str(gpop.var)])
disp([ ' Best fittness=' num2str(gpop.fit)])
disp([ ' Best time=' num2str(toc)])
bnpop_all(d,:)=bnpop.var;
d=d+1;
end %end of main for loop
end % end of spmd
From the documentation on spmd:
Values returning from the body of an spmd statement are converted to Composite objects on the MATLAB client. A Composite object contains references to the values stored on the remote MATLAB workers, and those values can be retrieved using cell-array indexing. The actual data on the workers remains available on the workers for subsequent spmd execution, so long as the Composite exists on the client and the parallel pool remains open.
Thus the output is a composite with 4 elements, since you have 4 CPU cores, so output{1} gives you the first element, output{2} the second etc. Just concatenate those to get your output in a single matrix.
Your code at this point just runs four times, one complete 100 iteration for loop per worker. An easier way to solve this, is to use parfor instead of spmd, as you can leave your loop the same. If you want to use spmd, first cut your v into four pieces (of 25 elements each), then on each worker iterate over just those 25 elements.
Seeing your code, with its three nested loops, I suggest not parallellising now, but instead try to profile your code, find out where the bottlenecks are, and try to speed up those. Probably trying to vectorise your nested loops will improve a lot already.

How to nest multiple parfor loops

parfor is a convenient way to distribute independent iterations of intensive computations among several "workers". One meaningful restriction is that parfor-loops cannot be nested, and invariably, that is the answer to similar questions like there and there.
Why parallelization across loop boundaries is so desirable
Consider the following piece of code where iterations take a highly variable amount of time on a machine that allows 4 workers. Both loops iterate over 6 values, clearly hard to share among 4.
for row = 1:6
parfor col = 1:6
somefun(row, col);
end
end
It seems like a good idea to choose the inner loop for parfor because individual calls to somefun are more variable than iterations of the outer loop. But what if the run time for each call to somefun is very similar? What if there are trends in run time and we have three nested loops? These questions come up regularly, and people go to extremes.
Pattern needed for combining loops
Ideally, somefun is run for all pairs of row and col, and workers should get busy irrespectively of which iterand is being varied. The solution should look like
parfor p = allpairs(1:6, 1:6)
somefun(p(1), p(2));
end
Unfortunately, even if I knew which builtin function creates a matrix with all combinations of row and col, MATLAB would complain with an error The range of a parfor statement must be a row vector. Yet, for would not complain and nicely iterate over columns. An easy workaround would be to create that matrix and then index it with parfor:
p = allpairs(1:6, 1:6);
parfor k = 1:size(pairs, 2)
row = p(k, 1);
col = p(k, 2);
somefun(row, col);
end
What is the builtin function in place of allpairs that I am looking for? Is there a convenient idiomatic pattern that someone has come up with?
MrAzzman already pointed out how to linearise nested loops. Here is a general solution to linearise n nested loops.
1) Assuming you have a simple nested loop structure like this:
%dummy function for demonstration purposes
f=#(a,b,c)([a,b,c]);
%three loops
X=cell(4,5,6);
for a=1:size(X,1);
for b=1:size(X,2);
for c=1:size(X,3);
X{a,b,c}=f(a,b,c);
end
end
end
2) Basic linearisation using a for loop:
%linearized conventional loop
X=cell(4,5,6);
iterations=size(X);
for ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
3) Linearisation using a parfor loop.
%linearized parfor loop
X=cell(4,5,6);
iterations=size(X);
parfor ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{ix}=f(a,b,c);
end
4) Using the second version with a conventional for loop, the order in which the iterations are executed is altered. If anything relies on this you have to reverse the order of the indices.
%linearized conventional loop
X=cell(4,5,6);
iterations=fliplr(size(X));
for ix=1:prod(iterations)
[c,b,a]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
Reversing the order when using a parfor loop is irrelevant. You can not rely on the order of execution at all. If you think it makes a difference, you can not use parfor.
You should be able to do this with bsxfun. I believe that bsxfun will parallelise code where possible (see here for more information), in which case you should be able to do the following:
bsxfun(#somefun,(1:6)',1:6);
You would probably want to benchmark this though.
Alternatively, you could do something like the following:
function parfor_allpairs(fun, num_rows, num_cols)
parfor i=1:(num_rows*num_cols)
fun(mod(i-1,num_rows)+1,floor(i/num_cols)+1);
end
then call with:
parfor_allpairs(#somefun,6,6);
Based on the answers from #DanielR and #MrAzzaman, I am posting two functions, iterlin and iterget in place of prod and ind2sub that allow iteration over ranges also if those do not start from one. An example for the pattern becomes
rng = [1, 4; 2, 7; 3, 10];
parfor k = iterlin(rng)
[plate, row, col] = iterget(rng, k);
% time-consuming computations here %
end
The script will process the wells in rows 2 to 7 and columns 3 to 10 on plates 1 to 4 without any workers idling while more wells are waiting to be processed. In hope that this helps someone, I deposited iterlin and iterget at the MATLAB File Exchange.

Using parfor in matlab for nested loops

I want to parallelize block2 for each block1 and parallerlize outer loop too.
previous code:
for i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
changed code:
parfor i=rangei
<block1>
parfor j=rangej
<block2> dependent on <block1>
end
end
how much efficient can this get and will the changed code do the right thing?
Is the changed code valid for my requirements?
In MATLAB, parfor cannot be nested. Which means, in your code, you should replace one parfor by a for (the outer loop most likely). More generally, I advise you to look at this tutorial on parfor.
parfor cannot be nested. In nested parfor statements, only the outermost call to parfor is paralellized, which means that the inner call to parfor only adds unnecessary overhead.
To get high efficiency with parfor, the number of iterations should be much higher than the number of workers (or an exact multiple in case each iteration takes the same time), and you want a single iteration to take more than just a few milliseconds to avoid feeling the overhead from paralellization.
parfor i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
may actually fit that description, depending on the size of rangei. Alternatively, you may want to try unrolling the nested loop into a single loop, where you iterate across linear indices.
The following code uses a single parfor loop to implicitly manage two nested loops. The loop1_index and loop2_index are the ranges, and the loop1_counter and loop2_counter are the actual loop iterators. Also, the iterators are put in reverse order in order to have a better load balance, because usually the load of higher range values is bigger than those of smaller values.
loop1_index = [1:5]
loop2_index = [1:4]
parfor temp_label_parfor = 1 : numel(loop1_index) * numel(loop2_index)
[loop1_counter, loop2_counter] = ind2sub([numel(loop1_index), numel(loop2_index)], temp_label_parfor)
loop1_counter = numel(loop1_index) - loop1_counter + 1;
loop2_counter = numel(loop2_index) - loop2_counter + 1;
end
You can't use nested parfor, From your question it seems that you are working on a matrix( with parameter i,j),
try using blockproc, go through this link once blockproc