I am now trying to do parallel computing in Matlab and want to use parfor loop to improve the efficiency. The problem is I can guarantee that each loop is independent with each other but I finally need to update a global variable (maybe called broadcast variable in Matlab), when I want to assign some value to it there is a problem says it can't be classified. If I still want to do it in this Matlab, how can I solve this problem or is there any other way I can try to improve the efficiency?
The code is like this:
Atoms(1:nOfAtomsInTwoDim,:)=TwoDimAtoms;
odd_type=TwoDimAtoms;
even_type=TwoDimAtoms;
even_type(:,1)=TwoDimAtoms(:,1)+LatticeSpacing/2;
even_type(:,2)=TwoDimAtoms(:,2)+LatticeSpacing/2;
parfor i=2:1:nOflayers+1
temp_type=TwoDimAtoms;
if mod(i,2)
temp_type=odd_type;
temp_type(:,3)=TwoDimAtoms(:,3)+(i-1)*LatticeSpacing/2;
else
temp_type=even_type;
temp_type(:,3)=TwoDimAtoms(:,3)+(i-1)*LatticeSpacing/2;
end
iBegin=(i-1)*nOfAtomsInTwoDim+1;
iEnd=i*nOfAtomsInTwoDim;
Atoms(iBegin,iEnd,:)=temp_type;
end
Your code isn't executable which makes it slightly tricky to work out what's going on, and as #PetrH points out I assume your indexing expression at the end is intended to be Atoms(iBegin:iEnd,:).
To make this work in parfor, you need to arrange for Atoms to be sliced (broadcast variables are inputs to the parfor loop which are constant and the same for each iteration). In other words, your indexing expression needs to be something more like
parfor i = ...
...
Atoms(i, :) = ...;
end
Having said all that, if this is your entire parfor loop, I would concentrate instead on vectorising things rather than applying parfor. It appears that the amount of work inside the parfor loop is rather small and it is unlikely to give you much benefit, whereas my guess is that vectorisation should give you much better speedup.
Related
The following loop results in an error in C_mat and B_mat:
%previously defined
N_RIPETIZIONI=2;
K=201;
parfor n=1:N_RIPETIZIONI*K
[r,k]=ind2sub([N_RIPETIZIONI,K],n);
B=B_mat{r};
C=C_mat{r};
end
The warning says:
The entire array or structure B_mat is a broadcast variable. This might result in unnecessary communication overhead.
The same for C_mat.
How can I fix it so that the indices of B_mat and C_mat are no more broadcast variables?
The issue is that the way you index B_mat (i.e. not using n), every thread in the parfor requires the entirety of B_mat to run. The big bottleneck in parfor code is transferring copies of the data to each node.
MATLAB is basically telling you that if you were to do this, you may actually have slower code than otherwise. Its not that B_mat is some type of variable called "broadcast", its that the way you wrote the code, each n in parfor requires a copy of B_mat.
I assume this is not your real code, so we can't really help you fix it, but hopefully this explains it.
I have a GA code that I developed myself. Since I'm new to coding, my code is not fast. I have a Dual-Core CPU 2.6GHz.
The only line of the code that takes a long time to run is the fitness function. I am not familiar with the GA toolbox and my fitness function is quite complex so I assume even if I knew how to use the GA toolbox, I would have to code the fitness function myself.
The algoritm's structure is as follows:
after generating the initial generation and evaluating the fitness values (which takes long but does not matter that much because this is only run once), it starts a loop which will be iterated for up to 10000 times. In each iteration, we have a new generation whose fitness values needs to be calculated. So when a new generation of 50 individuals is generated, the whole generation is fed to the fitness_function. In this function there is a for loop which calculates the fitness value for each 50 individual (so the for loop is iterated 50 times). Here is my question. How should I use parfor so that 25 individual is evaluated by one CPU core and the other 25 individuals with the other core, so that the calculation time is decreased to almost half. I already know from here
I have tried changing the for loop in the fitness_function directly to parfor and I have received the following error: "The PARFOR loop cannot run due to the way variable "Z" is used." and "Variable z is indexed in different ways. Potentially causing dependencies between iterations." Variable Z is a 50*3 matrix which stores the fitness values for each of the individuals.
The problem with your assignment into Z is that you have three different assignment statements, and that is not allowed. You need to make the assignment into Z meet the requirements for a "sliced" variable. The easiest way to do this is to make a temporary variable Zrow to store the values for the ith row of Z, and then make a single assignment, like this
parfor i = 1:50
Zrow = zeros(1, 3); % allocate to ensure parfor knows this is a temporary
...
Zrow(1) = TTT;
...
Zrow(2) = sum(FSL,1);
Zrow(3) = 0.5*Z(i,1)+0.5*Z(i,2);
% Finally, make a single sliced assignment into Z
Z(i, :) = Zrow;
end
Also, in general, it's best to have the parfor loop be the outermost one. Also, whether parfor actually gives you any speed-up depends a lot on whether the body of the loop is already being multithreaded by MATLAB's built-in multithreaded capabilities. (If it is, then parfor using only your local machine cannot make things faster because in that case, the multithreaded code is already taking full advantage of your computer's resources).
I have a problem with MathWorks Parallel Computing Toolbox in Matlab. See my code below
for k=1:length(Xab)
n1=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n1)=JXab{k};
MY_j(1,n1)=JYab{k};
MZ_j(1,n1)=Z;
end
for k=length(Xab)+1:length(Xab)+length(Xbc)
n2=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n2)=JXbc{k-length(Xab)};
MY_j(1,n2)=JYbc{k-length(Yab)};
MZ_j(1,n2)=Z;
end
for k=length(Xab)+length(Xbc)+1:length(Xab)+length(Xbc)+length(Xcd)
n3=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n3)=JXcd{k-length(Xab)-length(Xbc)};
MY_j(1,n3)=JYcd{k-length(Yab)-length(Ybc)};
MZ_j(1,n3)=Z;
end
for k=length(Xab)+length(Xbc)+length(Xcd)+1:length(Xab)+length(Xbc)+length(Xcd)+length(Xda)
n4=length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,n4)=JXda{k-length(Xab)-length(Xbc)-length(Xcd)};
MY_j(1,n4)=JYda{k-length(Yab)-length(Ybc)-length(Ycd)};
MZ_j(1,n4)=Z;
end
If I change the for-loop to parfor-loop, matlab warns me that MX_j is not an efficient variable. I have no idea how to solve this and how to make these for loops compute in parallel?
For me, it looks like you can combine it to one loop. Create combined cell arrays.
JX = cat(2,JXab, JXbc, JXcd, JXda);
JY = cat(2,JYab, JYbc, JYcd, JYda);
Check for the right dimension here. If your JXcc arrays are column arrays, use cat(1,....
After doing that, one single loop should do it:
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
for k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Before parallizing anything, check if this still valid. I haven't tested it. If everything's nice, you can switch to parfor.
When using parfor, the arrays must be preallocated. The following code could work (untested due to lack of test-data):
n = length(Xab)+length(Xbc)+length(Xcd)+length(Xda);
MX_j = zeros(1,n*length(Z));
MY_j = MX_j;
MZ_j = MX_j;
parfor k=1:n
k2 = length(Z)*(k-1)+1:length(Z)*k;
MX_j(1,k2)=JX{k};
MY_j(1,k2)=JY{k};
MZ_j(1,k2)=Z;
end
Note: As far as I can see, the parfor loop will be much slower here. You simply assign some values... no calculation at all. The setup of the worker pool will take 99.9% of the total execution time.
I want to parallelize block2 for each block1 and parallerlize outer loop too.
previous code:
for i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
changed code:
parfor i=rangei
<block1>
parfor j=rangej
<block2> dependent on <block1>
end
end
how much efficient can this get and will the changed code do the right thing?
Is the changed code valid for my requirements?
In MATLAB, parfor cannot be nested. Which means, in your code, you should replace one parfor by a for (the outer loop most likely). More generally, I advise you to look at this tutorial on parfor.
parfor cannot be nested. In nested parfor statements, only the outermost call to parfor is paralellized, which means that the inner call to parfor only adds unnecessary overhead.
To get high efficiency with parfor, the number of iterations should be much higher than the number of workers (or an exact multiple in case each iteration takes the same time), and you want a single iteration to take more than just a few milliseconds to avoid feeling the overhead from paralellization.
parfor i=rangei
<block1>
for j=rangej
<block2> dependent on <block1>
end
end
may actually fit that description, depending on the size of rangei. Alternatively, you may want to try unrolling the nested loop into a single loop, where you iterate across linear indices.
The following code uses a single parfor loop to implicitly manage two nested loops. The loop1_index and loop2_index are the ranges, and the loop1_counter and loop2_counter are the actual loop iterators. Also, the iterators are put in reverse order in order to have a better load balance, because usually the load of higher range values is bigger than those of smaller values.
loop1_index = [1:5]
loop2_index = [1:4]
parfor temp_label_parfor = 1 : numel(loop1_index) * numel(loop2_index)
[loop1_counter, loop2_counter] = ind2sub([numel(loop1_index), numel(loop2_index)], temp_label_parfor)
loop1_counter = numel(loop1_index) - loop1_counter + 1;
loop2_counter = numel(loop2_index) - loop2_counter + 1;
end
You can't use nested parfor, From your question it seems that you are working on a matrix( with parameter i,j),
try using blockproc, go through this link once blockproc
I have this piece of code which I want to run in parallel. One of the arguments passed to the function containing the parfor loop is a function handle which then is executed inside the parfor loop. Like this
[X] = randstep( X,params,roomfun )
...
parfor i=1:N
while ~ok
X(:,i) = A*X(:,i);
if roomfun(X(:,i))
ok = 1;
end
end
end
However, MATLAB complains about roomfun, saying it is indexed but not sliced. This is of course not the case since it is function which can be executed fine without the other loop iterations.
Is there any way I can tell matlab this is a function or maybe another way I can organize the loops in order to get this running in parallel?
First of all, this is only a warning, not an error. So the parfor loop should run just fine.
Secondly, the warning about something being indexed but not sliced comes from mlint, which, while pretty good, doesn't understand everything, so if it interprets something wrong, don't sweat it.
Thirdly, in R2012b, I don't even get the warning anymore (mlint has gotten smarter), which further shows that there's nothing to worry about.
Finally, if there really was a slicing problem, it's still not an issue that would prevent the loop from running in parallel - all the warning says is that an array is sent in its entirety to the workers, which can slow down the parfor loop in case the array is large.