Using parfor in matlab for nested loops - matlab

I want to parallelize block2 for each block1 and parallerlize outer loop too.
previous code:
for i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
changed code:
parfor i=rangei
<block1>
parfor j=rangej
<block2> dependent on <block1>
end
end
how much efficient can this get and will the changed code do the right thing?
Is the changed code valid for my requirements?

In MATLAB, parfor cannot be nested. Which means, in your code, you should replace one parfor by a for (the outer loop most likely). More generally, I advise you to look at this tutorial on parfor.

parfor cannot be nested. In nested parfor statements, only the outermost call to parfor is paralellized, which means that the inner call to parfor only adds unnecessary overhead.
To get high efficiency with parfor, the number of iterations should be much higher than the number of workers (or an exact multiple in case each iteration takes the same time), and you want a single iteration to take more than just a few milliseconds to avoid feeling the overhead from paralellization.
parfor i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
may actually fit that description, depending on the size of rangei. Alternatively, you may want to try unrolling the nested loop into a single loop, where you iterate across linear indices.

The following code uses a single parfor loop to implicitly manage two nested loops. The loop1_index and loop2_index are the ranges, and the loop1_counter and loop2_counter are the actual loop iterators. Also, the iterators are put in reverse order in order to have a better load balance, because usually the load of higher range values is bigger than those of smaller values.
loop1_index = [1:5]
loop2_index = [1:4]
parfor temp_label_parfor = 1 : numel(loop1_index) * numel(loop2_index)
[loop1_counter, loop2_counter] = ind2sub([numel(loop1_index), numel(loop2_index)], temp_label_parfor)
loop1_counter = numel(loop1_index) - loop1_counter + 1;
loop2_counter = numel(loop2_index) - loop2_counter + 1;
end

You can't use nested parfor, From your question it seems that you are working on a matrix( with parameter i,j),
try using blockproc, go through this link once blockproc

Related

how can I make these four loop compute paralleling?

I have a problem with MathWorks Parallel Computing Toolbox in Matlab. See my code below
for k=1:length(Xab)
n1=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n1)=JXab{k};
MY_j(1,n1)=JYab{k};
MZ_j(1,n1)=Z;
end
for k=length(Xab)+1:length(Xab)+length(Xbc)
n2=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n2)=JXbc{k-length(Xab)};
MY_j(1,n2)=JYbc{k-length(Yab)};
MZ_j(1,n2)=Z;
end
for k=length(Xab)+length(Xbc)+1:length(Xab)+length(Xbc)+length(Xcd)
n3=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n3)=JXcd{k-length(Xab)-length(Xbc)};
MY_j(1,n3)=JYcd{k-length(Yab)-length(Ybc)};
MZ_j(1,n3)=Z;
end
for k=length(Xab)+length(Xbc)+length(Xcd)+1:length(Xab)+length(Xbc)+length(Xcd)+length(Xda)
n4=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n4)=JXda{k-length(Xab)-length(Xbc)-length(Xcd)};
MY_j(1,n4)=JYda{k-length(Yab)-length(Ybc)-length(Ycd)};
MZ_j(1,n4)=Z;
end
If I change the for-loop to parfor-loop, matlab warns me that MX_j is not an efficient variable. I have no idea how to solve this and how to make these for loops compute in parallel?
For me, it looks like you can combine it to one loop. Create combined cell arrays.
JX = cat(2,JXab, JXbc, JXcd, JXda);
JY = cat(2,JYab, JYbc, JYcd, JYda);
Check for the right dimension here. If your JXcc arrays are column arrays, use cat(1,....
After doing that, one single loop should do it:
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
for k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Before parallizing anything, check if this still valid. I haven't tested it. If everything's nice, you can switch to parfor.
When using parfor, the arrays must be preallocated. The following code could work (untested due to lack of test-data):
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
MX_j = zeros(1,n*length(Z));
MY_j = MX_j;
MZ_j = MX_j;
parfor k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Note: As far as I can see, the parfor loop will be much slower here. You simply assign some values... no calculation at all. The setup of the worker pool will take 99.9% of the total execution time.

Broadcast variable using parfor

I am now trying to do parallel computing in Matlab and want to use parfor loop to improve the efficiency. The problem is I can guarantee that each loop is independent with each other but I finally need to update a global variable (maybe called broadcast variable in Matlab), when I want to assign some value to it there is a problem says it can't be classified. If I still want to do it in this Matlab, how can I solve this problem or is there any other way I can try to improve the efficiency?
The code is like this:
Atoms(1:nOfAtomsInTwoDim,:)=TwoDimAtoms;
odd_type=TwoDimAtoms;
even_type=TwoDimAtoms;
even_type(:,1)=TwoDimAtoms(:,1)+LatticeSpacing/2;
even_type(:,2)=TwoDimAtoms(:,2)+LatticeSpacing/2;
parfor i=2:1:nOflayers+1
temp_type=TwoDimAtoms;
if mod(i,2)
temp_type=odd_type;
temp_type(:,3)=TwoDimAtoms(:,3)+(i-1)*LatticeSpacing/2;
else
temp_type=even_type;
temp_type(:,3)=TwoDimAtoms(:,3)+(i-1)*LatticeSpacing/2;
end
iBegin=(i-1)*nOfAtomsInTwoDim+1;
iEnd=i*nOfAtomsInTwoDim;
Atoms(iBegin,iEnd,:)=temp_type;
end
Your code isn't executable which makes it slightly tricky to work out what's going on, and as #PetrH points out I assume your indexing expression at the end is intended to be Atoms(iBegin:iEnd,:).
To make this work in parfor, you need to arrange for Atoms to be sliced (broadcast variables are inputs to the parfor loop which are constant and the same for each iteration). In other words, your indexing expression needs to be something more like
parfor i = ...
...
Atoms(i, :) = ...;
end
Having said all that, if this is your entire parfor loop, I would concentrate instead on vectorising things rather than applying parfor. It appears that the amount of work inside the parfor loop is rather small and it is unlikely to give you much benefit, whereas my guess is that vectorisation should give you much better speedup.

Matlab - Eliminating for loops

So, I am aware that there are a number of other posts about eliminating for loops but I still haven't been able to figure this out.
I am looking to rewrite my code so that it has fewer for loops and runs a little faster. The code describes an optics problem calculating the intensity of different colors after the light has propagated through a medium. I have already gotten credit for this assignment but I would like to learn of better ways than just throwing in for loops all over the place. I tried rewriting the innermost loop using recursion which worked and looked nice but was a little slower.
Any other comments/improvements are also welcome.
Thanks!
n_o=1.50;
n_eo=1.60;
d=20e-6;
N_skiv=100;
lambda=[650e-9 510e-9 475e-9];
E_in=[1;1]./sqrt(2);
alfa=pi/2/N_skiv;
delta=d/N_skiv;
points=100;
int=linspace(0,pi/2,points);
I_ut=zeros(3,points);
n_eo_theta=#(theta)n_eo*n_o/sqrt(n_o^2*cos(theta)^2+n_eo^2*sin(theta)^2);
hold on
for i=1:3
for j=1:points
J_last=J_pol2(0);
theta=int(j);
for n=0:N_skiv
alfa_n=alfa*n;
J_last=J_ret_uppg2(alfa_n, delta , n_eo_theta(theta) , n_o , lambda(i) ) * J_last;
end
E_ut=J_pol2(pi/2)*J_last*E_in;
I_ut(i,j)=norm(E_ut)^2;
end
end
theta_grad=linspace(0,90,points);
plot(theta_grad,I_ut(1,:),'r')
plot(theta_grad,I_ut(2,:),'g')
plot(theta_grad,I_ut(3,:),'b')
And the functions:
function matris=J_proj(alfa)
matris(1,1)=cos(alfa);
matris(1,2)=sin(alfa);
matris(2,1)=-sin(alfa);
matris(2,2)=cos(alfa);
end
function matris=J_pol2(alfa)
J_p0=[1 0;0 0];
matris=J_proj(-alfa)*J_p0*J_proj(alfa);
end
function matris=J_ret_uppg2(alfa_n,delta,n_eo_theta,n_o,lambda)
k0=2*pi/lambda;
J_r0_u2(1,1)=exp(1i*k0*delta*n_eo_theta);
J_r0_u2(2,2)=exp(1i*k0*n_o*delta);
matris=J_proj(-alfa_n)*J_r0_u2*J_proj(alfa_n);
end
Typically you cannot get rid of a for-loop if you are doing a calculation that depends on a previous answer, which seems to be the case with the J_last-variable.
However I saw at least one possible improvement with the n_eo_theta inline-function, instead of doing that calculation 100 times, you could instead simply change this line:
n_eo_theta=#(theta)n_eo*n_o/sqrt(n_o^2*cos(theta)^2+n_eo^2*sin(theta)^2);
into:
theta_0 = 1:100;
n_eo_theta=n_eo*n_o./sqrt(n_o^2*cos(theta_0).^2+n_eo^2*sin(theta_0).^2);
This would run as is, although you should also want to remove the variable "theta" in the for-loop. I.e. simply change
n_eo_theta(theta)
into
n_eo_theta(j)
The way of using the "." prefix in the calculations is the furthermost tool for getting rid of for-loops (i.e. using element-wise calculations). For instance; see element-wise multiplication.
You can use matrices!!!!
For example, you have the statement:
theta=int(j)
which is inside a nested loop. You can replace it by:
theta = [int(1:points);int(1:points);int(1:points)];
or:
theta = int(repmat((1:points), 3, 1));
Then, you have
alfa_n=alfa * n;
you can replace it by:
alfa_n = alfa .* (0:N_skiv);
And have all the calculation done in a row like fashion. That means, instead looping, you will have the values of a loop in a row. Thus, you perform the calculations at the rows using the MATLAB's functionalities and not looping.

How to nest multiple parfor loops

parfor is a convenient way to distribute independent iterations of intensive computations among several "workers". One meaningful restriction is that parfor-loops cannot be nested, and invariably, that is the answer to similar questions like there and there.
Why parallelization across loop boundaries is so desirable
Consider the following piece of code where iterations take a highly variable amount of time on a machine that allows 4 workers. Both loops iterate over 6 values, clearly hard to share among 4.
for row = 1:6
parfor col = 1:6
somefun(row, col);
end
end
It seems like a good idea to choose the inner loop for parfor because individual calls to somefun are more variable than iterations of the outer loop. But what if the run time for each call to somefun is very similar? What if there are trends in run time and we have three nested loops? These questions come up regularly, and people go to extremes.
Pattern needed for combining loops
Ideally, somefun is run for all pairs of row and col, and workers should get busy irrespectively of which iterand is being varied. The solution should look like
parfor p = allpairs(1:6, 1:6)
somefun(p(1), p(2));
end
Unfortunately, even if I knew which builtin function creates a matrix with all combinations of row and col, MATLAB would complain with an error The range of a parfor statement must be a row vector. Yet, for would not complain and nicely iterate over columns. An easy workaround would be to create that matrix and then index it with parfor:
p = allpairs(1:6, 1:6);
parfor k = 1:size(pairs, 2)
row = p(k, 1);
col = p(k, 2);
somefun(row, col);
end
What is the builtin function in place of allpairs that I am looking for? Is there a convenient idiomatic pattern that someone has come up with?
MrAzzman already pointed out how to linearise nested loops. Here is a general solution to linearise n nested loops.
1) Assuming you have a simple nested loop structure like this:
%dummy function for demonstration purposes
f=#(a,b,c)([a,b,c]);
%three loops
X=cell(4,5,6);
for a=1:size(X,1);
for b=1:size(X,2);
for c=1:size(X,3);
X{a,b,c}=f(a,b,c);
end
end
end
2) Basic linearisation using a for loop:
%linearized conventional loop
X=cell(4,5,6);
iterations=size(X);
for ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
3) Linearisation using a parfor loop.
%linearized parfor loop
X=cell(4,5,6);
iterations=size(X);
parfor ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{ix}=f(a,b,c);
end
4) Using the second version with a conventional for loop, the order in which the iterations are executed is altered. If anything relies on this you have to reverse the order of the indices.
%linearized conventional loop
X=cell(4,5,6);
iterations=fliplr(size(X));
for ix=1:prod(iterations)
[c,b,a]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
Reversing the order when using a parfor loop is irrelevant. You can not rely on the order of execution at all. If you think it makes a difference, you can not use parfor.
You should be able to do this with bsxfun. I believe that bsxfun will parallelise code where possible (see here for more information), in which case you should be able to do the following:
bsxfun(#somefun,(1:6)',1:6);
You would probably want to benchmark this though.
Alternatively, you could do something like the following:
function parfor_allpairs(fun, num_rows, num_cols)
parfor i=1:(num_rows*num_cols)
fun(mod(i-1,num_rows)+1,floor(i/num_cols)+1);
end
then call with:
parfor_allpairs(#somefun,6,6);
Based on the answers from #DanielR and #MrAzzaman, I am posting two functions, iterlin and iterget in place of prod and ind2sub that allow iteration over ranges also if those do not start from one. An example for the pattern becomes
rng = [1, 4; 2, 7; 3, 10];
parfor k = iterlin(rng)
[plate, row, col] = iterget(rng, k);
% time-consuming computations here %
end
The script will process the wells in rows 2 to 7 and columns 3 to 10 on plates 1 to 4 without any workers idling while more wells are waiting to be processed. In hope that this helps someone, I deposited iterlin and iterget at the MATLAB File Exchange.

Parfor in MATLAB Problem

Why can't I use the parfor in this piece of code?
parfor i=1:r
for j=1:N/r
xr(j + (N/r) * (i-1)) = x(i + r * (j-1));
end
end
This is the error:
Error: The variable xr in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".
The issue here is that of improper indexing of the sliced array. parfor loops are run asynchronously, meaning the order in which each iteration is executed is random. From the documentation:
MATLAB workers evaluate iterations in no particular order, and independently of each other. Because each iteration is independent, there is no guarantee that the iterations are synchronized in any way, nor is there any need for this.
You can easily verify the above statement by typing the following in the command line:
parfor i=1:100
i
end
You'll see that the ordering is arbitrary. Hence if you split a parallel job between different workers, one worker has no way of telling if a different iteration has finished or not. Hence, your variable indexing cannot depend on past/future values of the iterator.
Let me demonstrate this with a simple example. Consider the Fibonacci series 1,1,2,3,5,8,.... You can generate the first 10 terms of the series easily (in a naïve for loop) as:
f=zeros(1,10);
f(1:2)=1;
for i=3:10
f(i)=f(i-1)+f(i-2);
end
Now let's do the same with a parfor loop.
f=zeros(1,10);
f(1:2)=1;
parfor i=3:10
f(i)=f(i-1)+f(i-2);
end
??? Error: The variable f in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview"
But why does this give an error?
I've shown that iterations are executed in an arbitrary order. So let's say that a worker gets the loop index i=7 and the expression f(i)=f(i-1)+f(i-2);. It is now supposed to execute the expression and return the results to the master node. Now has iteration i=6 finished? Is the value stored in f(6) reliable? What about f(5)? Do you see what I'm getting at? Supposing f(5) and f(6) are not done, then you'll incorrectly calculate that the 7th term in the Fibonnaci series is 0!
Since MATLAB has no way of telling if your calculation can be guaranteed to run correctly and reproduce the same result each time, such ambiguous assignments are explicitly disallowed.