The original code is like this:
for i = 1 : size(H, 1)
    for j = 1 : size(H, 2)
        H{i,j} blabla
    end
end
and I tried to adapt it into parallel code like this:
parfor ind = 1 : numel(H)
    [i, j] = ind2sub(size(H), ind);
    H{i,j} blabla
end
which generates an error saying parfor cannot run due to H{i,j}.
Then what's the error here? And how can I adapt the nested loop into parfor?
One possible solution is
for i = 1 : size(H, 1)
    parfor j = 1 : size(H, 2)
        H{i,j} blabla
    end
end
But I suspect that using a parfor within another loop will multiply the parfor overhead, which results in additional computation time.
I think the error arises because Matlab cannot tell that [i,j] is unique in each iteration, since it is the result of a function call. As far as the engine can see, H{i,j} might be accessed several times, so the iterations cannot be shown to be independent of each other.
Edit: as mentioned by patrik, you have to be sure that there is no dependence between two iterations, that is, that H{i,j} does not depend on H{k,l} for (i,j) != (k,l), and that the value a variable takes in one iteration is not used in another. This is the basic requirement for a parfor, apart from reduction assignments.
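The reduction exception mentioned above can be illustrated with a minimal sketch. The variable name total and the toy computation are made up for illustration. As far as I know, the loop also runs (serially) when no parallel pool is available.

```matlab
% A reduction variable: every iteration updates "total" with the same
% associative operation, so the order of the iterations does not matter.
total = 0;
parfor k = 1:10
    total = total + k^2;   % allowed although "total" is shared between iterations
end
disp(total)
```

Anything that reads total inside the loop for another purpose (for example an if on its current value) would break this pattern.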
Besides that, if you want to run independent computations in parallel, and if it is worth it, always parfor the outermost loop. Also remember that Matlab does not allow nested parfor; if you want to parallelize inner for-loops, you have to put them in a function that runs the parfor. Parallelizing inner loops may not bring any speed-up (it depends on how many workers there are in the parpool).
From my experience, running inner loops in parallel is not recommended. As an example (outside Matlab), I would cite LibSVM, which recommends parallelizing only the outermost loop with OpenMP if you want to speed up the computation, never the inner loops.
The reason for this recommendation is that you have a limited pool of workers, and workers may be viewed as threads; beyond a certain number of threads, the computation runs slower because of the time spent switching between them. Matlab may manage this part very well, but the point is that your pool of workers is limited in size. If each outermost iteration takes a lot of time and you have many iterations, you gain nothing by parallelizing the inner loops, because each worker is already busy running a whole iteration (including its inner loops).
Nevertheless, it's always a good idea to test each option; some of them may, counter-intuitively, be better suited to your problem!
Why not simply use the linear index to assign into H? For example:
H = cell(4, 4);
parfor idx = 1:16
[i, j] = ind2sub([4, 4], idx);
H{idx} = rand(i, j); % or whatever
end
Otherwise, it's always best to make the outermost loop the PARFOR loop. The following also works:
H = cell(4, 4);
parfor r = 1:4
for c = 1:4
H{r, c} = rand(r, c);
end
end
Related
I'm trying to run this code in Matlab
a = ones(4,4);
b=[1,0,0,1;0,0,0,1;0,1,0,0;0,0,0,0];
b(:,:,2)=[0,1,1,0;1,1,1,0;1,0,1,1;1,1,1,1];
parfor i = 1:size(b,3)
c = b(:,:,i)
a(c) = i;
end
but get the error:
Error: The variable a in a parfor cannot be classified.
See Parallel for Loops in MATLAB, "Overview".
There are restrictions in how you can write into arrays inside the body of a parfor loop. In general, you will need to use sliced arrays.
The reason behind this issue is that Matlab needs to prevent different workers from accessing the same data, which would lead to unpredictable results (as the order in which the parfor loops through i is not determined).
So, although in your example the workers don't operate on the same entries of a, because of the way you index a (with an array of logicals), it is currently not possible for Matlab to decide whether this is the case or not (in other words, Matlab cannot classify a).
Edit: For completeness I add some code that is equivalent to your example, although I assume that your actual problem involves more complicated logical indexing?
a = ones(4,4,4);
parfor i = 1:size(a,1)
a(i, :, :) = zeros(4, 4) + i; % this is sliced indexing
end
Edit: As the OP example was modified, the above code is not equivalent to the example anymore.
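For the logical-index version of the question, one possible workaround is to let each iteration write its own slice and merge the slices after the loop. This is only a sketch: the names contrib, layer and mask are made up, and b is stored as a logical array here (an assumption; the question builds it from 0/1 doubles).

```matlab
% Masks from the question, stored as logicals.
b = false(4, 4, 2);
b(:, :, 1) = logical([1,0,0,1; 0,0,0,1; 0,1,0,0; 0,0,0,0]);
b(:, :, 2) = logical([0,1,1,0; 1,1,1,0; 1,0,1,1; 1,1,1,1]);

n = size(b, 3);
contrib = zeros(4, 4, n);          % one slice per iteration (sliced output)
parfor i = 1:n
    mask = b(:, :, i);             % sliced input
    layer = zeros(4, 4);           % temporary, local to this iteration
    layer(mask) = i;               % logical indexing on a temporary is fine
    contrib(:, :, i) = layer;
end

% Merge sequentially, so the result does not depend on worker scheduling.
a = ones(4, 4);
for i = 1:n
    layer = contrib(:, :, i);
    hit = layer ~= 0;
    a(hit) = layer(hit);
end
```

If the masks overlap, the sequential merge decides which value wins (here the highest i), which the parfor version in the question could not guarantee.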
I have a problem with MathWorks Parallel Computing Toolbox in Matlab. See my code below
for k=1:length(Xab)
n1=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n1)=JXab{k};
MY_j(1,n1)=JYab{k};
MZ_j(1,n1)=Z;
end
for k=length(Xab)+1:length(Xab)+length(Xbc)
n2=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n2)=JXbc{k-length(Xab)};
MY_j(1,n2)=JYbc{k-length(Yab)};
MZ_j(1,n2)=Z;
end
for k=length(Xab)+length(Xbc)+1:length(Xab)+length(Xbc)+length(Xcd)
n3=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n3)=JXcd{k-length(Xab)-length(Xbc)};
MY_j(1,n3)=JYcd{k-length(Yab)-length(Ybc)};
MZ_j(1,n3)=Z;
end
for k=length(Xab)+length(Xbc)+length(Xcd)+1:length(Xab)+length(Xbc)+length(Xcd)+length(Xda)
n4=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n4)=JXda{k-length(Xab)-length(Xbc)-length(Xcd)};
MY_j(1,n4)=JYda{k-length(Yab)-length(Ybc)-length(Ycd)};
MZ_j(1,n4)=Z;
end
If I change the for-loop to parfor-loop, matlab warns me that MX_j is not an efficient variable. I have no idea how to solve this and how to make these for loops compute in parallel?
To me, it looks like you can combine these into one loop. Create combined cell arrays:
JX = cat(2,JXab, JXbc, JXcd, JXda);
JY = cat(2,JYab, JYbc, JYcd, JYda);
Check for the right dimension here. If your JXcc arrays are column arrays, use cat(1,....
After doing that, one single loop should do it:
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
for k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Before parallelizing anything, check that this is still valid. I haven't tested it. If everything's fine, you can switch to parfor.
When using parfor, the arrays must be preallocated. The following code could work (untested due to lack of test-data):
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
MX_j = zeros(1,n*length(Z));
MY_j = MX_j;
MZ_j = MX_j;
parfor k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Note: As far as I can see, the parfor loop will be much slower here. You simply assign some values... no calculation at all. The setup of the worker pool will take 99.9% of the total execution time.
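One caveat about the parfor version above: it indexes MX_j with a range k2 computed from k, and as far as I know MATLAB only accepts sliced indexing where the loop variable itself appears as one of the subscripts. A sketch that avoids this writes one column per iteration and reshapes afterwards. The data below are toy values, and the names MX, MY, MZ are made up for illustration.

```matlab
% Toy stand-ins for the combined cell arrays and Z from the question.
Z  = [10 20 30];
JX = {[1 2 3], [4 5 6], [7 8 9], [10 11 12]};
JY = {[2 4 6], [8 10 12], [14 16 18], [20 22 24]};

n = numel(JX);
m = numel(Z);
MX = zeros(m, n);                  % one column per k, preallocated
MY = zeros(m, n);
MZ = zeros(m, n);
parfor k = 1:n
    x = JX{k};                     % sliced input
    y = JY{k};
    MX(:, k) = x(:);               % sliced output: k is used directly as index
    MY(:, k) = y(:);
    MZ(:, k) = Z(:);               % Z is broadcast to every worker
end

% Column-major reshape puts block k at positions m*(k-1)+1 : m*k.
MX_j = reshape(MX, 1, []);
MY_j = reshape(MY, 1, []);
MZ_j = reshape(MZ, 1, []);
```

The remark about overhead still applies: with pure assignments like these, a plain for loop is likely to be faster.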
I am using parfor for parallel computing in Matlab. I am not familiar with this command. If that is possible, please look at my code below and tell me if I can write it with parfor. The error :
The parfor loop cannot be run due to the way variable pyra is used.
parfor i = 1:inter
scaled = resize(im, 1/sc^(i-1));
pyra.feat{i} = descripteurs(scaled,class);
pyra.scale(i) = 1/sc^(i-1);
for j = i+inter:inter:max_scale
scaled = reduce(scaled);
pyra.feat{j} = descripteurs(scaled,class);
pyra.scale(j) = 0.6 * pyra.scale(j-inter);
end
end
The issue is that, as written, parfor cannot verify that the iterations of the loop are independent of each other.
Specifically, the inner loop reads pyra.scale(j-inter) and writes pyra.feat{j} and pyra.scale(j), where j is not the parfor loop variable. MATLAB therefore has to assume that one iteration may touch elements that another iteration writes, and it refuses to execute the iterations in parallel (at the same time).
See more about the use of variables in parfor loops in the documentation.
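If I read the loop in the question correctly, each outer iteration i only touches the levels i, i+inter, i+2*inter, ..., so one possible restructuring is to collect each chain in local variables and assemble pyra after the loop. The sketch below is untested against the original data: resizeStub, shrink and describe are hypothetical stand-ins for resize, reduce and descripteurs, and the parameter values are made up.

```matlab
% Made-up inputs and stand-in functions (assumptions, not the real ones).
im = magic(16);
inter = 3;
max_scale = 8;
sc = 2^(1/inter);
resizeStub = @(img, f) img(1:round(size(img,1)*f), 1:round(size(img,2)*f));
shrink     = @(img) img(1:2:end, 1:2:end);
describe   = @(img) sum(img(:));

featChain  = cell(1, inter);       % sliced outputs: one chain per i
scaleChain = cell(1, inter);
parfor i = 1:inter
    idx = i:inter:max_scale;       % levels handled by this iteration
    f = cell(1, numel(idx));       % temporaries, local to the iteration
    s = zeros(1, numel(idx));
    scaled = feval(resizeStub, im, 1/sc^(i-1));
    f{1} = feval(describe, scaled);
    s(1) = 1/sc^(i-1);
    for m = 2:numel(idx)
        scaled = feval(shrink, scaled);
        f{m} = feval(describe, scaled);
        s(m) = 0.6 * s(m-1);       % same recurrence as pyra.scale(j-inter)
    end
    featChain{i}  = f;
    scaleChain{i} = s;
end

% Assemble the struct sequentially after the parallel part.
pyra.feat  = cell(1, max_scale);
pyra.scale = zeros(1, max_scale);
for i = 1:inter
    idx = i:inter:max_scale;
    pyra.feat(idx)  = featChain{i};
    pyra.scale(idx) = scaleChain{i};
end
```

The function handles are called through feval so that MATLAB does not mistake the call for indexing into a variable.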
parfor is a convenient way to distribute independent iterations of intensive computations among several "workers". One meaningful restriction is that parfor-loops cannot be nested, and invariably, that is the answer to similar questions like there and there.
Why parallelization across loop boundaries is so desirable
Consider the following piece of code where iterations take a highly variable amount of time on a machine that allows 4 workers. Both loops iterate over 6 values, clearly hard to share among 4.
for row = 1:6
parfor col = 1:6
somefun(row, col);
end
end
It seems like a good idea to choose the inner loop for parfor because individual calls to somefun are more variable than iterations of the outer loop. But what if the run time for each call to somefun is very similar? What if there are trends in run time and we have three nested loops? These questions come up regularly, and people go to extremes.
Pattern needed for combining loops
Ideally, somefun is run for all pairs of row and col, and workers should get busy irrespectively of which iterand is being varied. The solution should look like
parfor p = allpairs(1:6, 1:6)
somefun(p(1), p(2));
end
Unfortunately, even if I knew which builtin function creates a matrix with all combinations of row and col, MATLAB would complain with an error The range of a parfor statement must be a row vector. Yet, for would not complain and nicely iterate over columns. An easy workaround would be to create that matrix and then index it with parfor:
p = allpairs(1:6, 1:6);   % one (row, col) pair per column
parfor k = 1:size(p, 2)
    row = p(1, k);
    col = p(2, k);
    somefun(row, col);
end
What is the builtin function in place of allpairs that I am looking for? Is there a convenient idiomatic pattern that someone has come up with?
MrAzzman already pointed out how to linearise nested loops. Here is a general solution to linearise n nested loops.
1) Assuming you have a simple nested loop structure like this:
%dummy function for demonstration purposes
f=@(a,b,c)([a,b,c]);
%three loops
X=cell(4,5,6);
for a=1:size(X,1);
for b=1:size(X,2);
for c=1:size(X,3);
X{a,b,c}=f(a,b,c);
end
end
end
2) Basic linearisation using a for loop:
%linearized conventional loop
X=cell(4,5,6);
iterations=size(X);
for ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
3) Linearisation using a parfor loop.
%linearized parfor loop
X=cell(4,5,6);
iterations=size(X);
parfor ix=1:prod(iterations)
[a,b,c]=ind2sub(iterations,ix);
X{ix}=f(a,b,c);
end
4) Using the second version with a conventional for loop, the order in which the iterations are executed is altered. If anything relies on this you have to reverse the order of the indices.
%linearized conventional loop
X=cell(4,5,6);
iterations=fliplr(size(X));
for ix=1:prod(iterations)
[c,b,a]=ind2sub(iterations,ix);
X{a,b,c}=f(a,b,c);
end
Reversing the order when using a parfor loop is irrelevant. You can not rely on the order of execution at all. If you think it makes a difference, you can not use parfor.
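The ordering claim in step 4 can be checked with a small script. This is just a sketch with made-up sizes: it records the order in which the index triples are visited by the nested loops and by the two linearised variants.

```matlab
sz = [2 3 4];

% Order of visits in the original nested loops (c varies fastest).
nestedOrder = zeros(prod(sz), 3);
n = 0;
for a = 1:sz(1)
    for b = 1:sz(2)
        for c = 1:sz(3)
            n = n + 1;
            nestedOrder(n, :) = [a b c];
        end
    end
end

% Linearised loop with reversed sizes and reversed outputs (step 4).
flippedOrder = zeros(prod(sz), 3);
iterations = fliplr(sz);
for ix = 1:prod(iterations)
    [c, b, a] = ind2sub(iterations, ix);
    flippedOrder(ix, :) = [a b c];
end

% Plain linearisation (step 2): a varies fastest instead.
plainOrder = zeros(prod(sz), 3);
for ix = 1:prod(sz)
    [a, b, c] = ind2sub(sz, ix);
    plainOrder(ix, :) = [a b c];
end

sameAsNested  = isequal(nestedOrder, flippedOrder);
plainIsNested = isequal(nestedOrder, plainOrder);
```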
You should be able to do this with bsxfun. I believe that bsxfun will parallelise code where possible (see here for more information), in which case you should be able to do the following:
bsxfun(@somefun,(1:6)',1:6);
You would probably want to benchmark this though.
Alternatively, you could do something like the following:
function parfor_allpairs(fun, num_rows, num_cols)
    parfor i=1:(num_rows*num_cols)
        % column-major: row index varies fastest
        fun(mod(i-1,num_rows)+1,floor((i-1)/num_rows)+1);
    end
end
then call with:
parfor_allpairs(@somefun,6,6);
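Another way to get the list of all pairs the question asks for is ndgrid, which is a builtin. The sketch below stores one pair per row and loops over the rows with a single parfor. The result formula is a placeholder for somefun, and the names pairs and results are made up.

```matlab
% Build every (row, col) combination; the row index varies fastest.
[R, C] = ndgrid(1:6, 1:6);
pairs = [R(:), C(:)];              % 36-by-2, one pair per row

results = zeros(size(pairs, 1), 1);
parfor k = 1:size(pairs, 1)
    p = pairs(k, :);               % sliced input
    results(k) = 10 * p(1) + p(2); % placeholder for somefun(p(1), p(2))
end

% Back to a 6-by-6 layout, so that results(row, col) matches the pair.
results = reshape(results, 6, 6);
```

The same idea extends to three or more loops by giving ndgrid more ranges.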
Based on the answers from @DanielR and @MrAzzaman, I am posting two functions, iterlin and iterget, in place of prod and ind2sub, that allow iteration over ranges even if those do not start from one. An example for the pattern becomes
rng = [1, 4; 2, 7; 3, 10];
parfor k = iterlin(rng)
[plate, row, col] = iterget(rng, k);
% time-consuming computations here %
end
The script will process the wells in rows 2 to 7 and columns 3 to 10 on plates 1 to 4 without any workers idling while more wells are waiting to be processed. In hope that this helps someone, I deposited iterlin and iterget at the MATLAB File Exchange.
I want to parallelize block2 for each block1 and parallelize the outer loop too.
previous code:
for i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
changed code:
parfor i=rangei
<block1>
parfor j=rangej
<block2> dependent on <block1>
end
end
How efficient can this get, and will the changed code do the right thing?
Is the changed code valid for my requirements?
In MATLAB, parfor cannot be nested. Which means, in your code, you should replace one parfor by a for (the outer loop most likely). More generally, I advise you to look at this tutorial on parfor.
parfor cannot be nested. In nested parfor statements, only the outermost call to parfor is parallelized, which means that the inner call to parfor only adds unnecessary overhead.
To get high efficiency with parfor, the number of iterations should be much higher than the number of workers (or an exact multiple in case each iteration takes the same time), and you want a single iteration to take more than just a few milliseconds to avoid feeling the overhead from parallelization.
parfor i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
may actually fit that description, depending on the size of rangei. Alternatively, you may want to try unrolling the nested loop into a single loop, where you iterate across linear indices.
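When the loop is unrolled, block1 would be repeated for every (i, j) pair unless it is computed beforehand. A possible two-stage sketch is shown below. The ranges and the arithmetic are placeholders for <block1> and <block2>, and the names b1 and out are made up.

```matlab
rangei = 1:3;
rangej = 1:4;
ni = numel(rangei);
nj = numel(rangej);

% Stage 1: run <block1> once per i and keep its result.
b1 = zeros(1, ni);
parfor p = 1:ni
    b1(p) = rangei(p)^2;           % placeholder for <block1>
end

% Stage 2: one parfor over all (i, j) pairs using linear indices.
out = zeros(1, ni * nj);
parfor q = 1:ni * nj
    [p, r] = ind2sub([ni, nj], q);
    out(q) = b1(p) + rangej(r);    % placeholder for <block2>, uses block1's result
end
out = reshape(out, ni, nj);        % out(p, r) belongs to rangei(p), rangej(r)
```

Whether this beats a single parfor over i depends on how expensive block1 and block2 are compared with the parfor overhead, so it is worth timing both.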
The following code uses a single parfor loop to implicitly manage two nested loops. The loop1_index and loop2_index are the ranges, and the loop1_counter and loop2_counter are the actual loop iterators. Also, the iterators are put in reverse order in order to have a better load balance, because usually the load of higher range values is bigger than those of smaller values.
loop1_index = 1:5;
loop2_index = 1:4;
parfor temp_label_parfor = 1 : numel(loop1_index) * numel(loop2_index)
    [loop1_counter, loop2_counter] = ind2sub([numel(loop1_index), numel(loop2_index)], temp_label_parfor);
    loop1_counter = numel(loop1_index) - loop1_counter + 1;
    loop2_counter = numel(loop2_index) - loop2_counter + 1;
end
You can't use nested parfor. From your question it seems that you are working on a matrix (with parameters i, j), so
try using blockproc; have a look at the blockproc documentation.