When I have the Parallel Computing Toolbox installed and use parfor in my code, MATLAB starts a pool automatically once it reaches the parfor loop. However, this sometimes makes debugging difficult, which is why I would like to prevent MATLAB from opening a pool in certain situations. So, how can I tell MATLAB not to open a pool? Obviously I could go through my code and replace all parfor loops with normal for loops, but this is tedious and I might forget to undo my changes.
Edit: To clarify, I would ideally like the parfor loop to behave exactly like a for loop when some setting or control variable is toggled. For example, I should then also be able to place breakpoints inside the loop.
Under Home -> Parallel -> Parallel Preferences you can deselect the check box "Automatically create a parallel pool (if one doesn't already exist) when parallel keywords are executed." This makes all parfor loops behave like normal for loops.
I'll get back to you if I figure out a way to do this in the code as opposed to using the check box.
Update: it turns out it is indeed possible to change this setting through code, although I would not recommend it, as it involves changing MATLAB's preferences file. This is taken from the Undocumented MATLAB blog by Yair Altman.
ps = parallel.Settings;
ps.Pool
ans =
  PoolSettings with properties:
                            AutoCreate: 1
                RestartOnClusterChange: 1
    RestartOnPreferredNumWorkersChange: 1
                           IdleTimeout: 30
                   PreferredNumWorkers: 12
where you need to set the AutoCreate switch to 0.
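For completeness, a minimal sketch of flipping that switch in code (note this edits your saved preferences, so you may want to restore it when you are done debugging):
ps = parallel.Settings;
ps.Pool.AutoCreate = false;   % parfor now runs on the client; no pool is started
% ... debug your code ...
ps.Pool.AutoCreate = true;    % restore the default behaviour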
As an alternative, I'd suggest wrapping everything inside your parfor loop in a function, so the loop body becomes a single call:
parfor i = 1:N
    output(i) = yourFunction(..);   % 'function' is a reserved word, so the wrapper needs another name
end
Now modify your script/function to have a Parallel switch at the top:
if Parallel
    parfor i = 1:N
        output(i) = yourFunction(..);
    end
else
    for i = 1:N
        output(i) = yourFunction(..);
    end
end
You can then edit and debug the function itself, and set the switch at the top of your program to execute in parallel or in serial.
As well as the normal syntax
parfor i = 1:10
you can also use
parfor (i = 1:10, N)
where N is the maximum number of workers to be used in the loop. N can be a variable set by other parts of the code, so you can effectively turn parallelism on and off by setting N to 1 or 0.
Edit: to be clear, this only controls the number of workers on which the code is executed (and if N is zero, whether a pool is started at all). If no pool exists, the code will execute on the client. Nevertheless, the code remains a parfor loop, which does not have the same semantics as a for loop - there are restrictions on the loop code for parfor loops that do not exist for for loops, and there is no guarantee on the order in which the loop iterations are executed.
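To make that concrete, here is a small sketch of such a toggle (useParallel, the worker count, and the loop body are placeholders, not from the answer):
useParallel = false;   % flip to true to run on the pool
if useParallel
    N = 12;            % allow up to 12 workers
else
    N = 0;             % 0 => run on the client; no pool is started
end
results = zeros(1, 10);
parfor (i = 1:10, N)
    results(i) = 2*i;  % stand-in for the real loop body
end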
When you use parfor, you're doing more than just saying "speed this up please". You're telling MATLAB "I can guarantee that the iterations of this loop are independent and can be executed in any order, so you will be OK if you try to parallelize it". Because you've guaranteed that, MATLAB is able to speed things up by using different semantics than it would for a for loop.
The only way to completely get for loop behaviour is to use for, and if you need to switch back and forth for debugging purposes you'll need to comment and uncomment the for/parfor (or perhaps use an if/else block, switching between a for and a parfor depending on some variable).
I think the way to go here is not to disable the parfor, but rather to let it behave like a simple for.
This should be possible by setting the number of workers to 1.
parpool(1)
Depending on your code, you may be able to do this once before you run it, or you may need to do it (conditionally) wherever your code sets the number of workers.
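A minimal sketch of such a conditional setup, assuming you want exactly one worker while debugging:
p = gcp('nocreate');   % get the current pool without starting a new one
if isempty(p)
    parpool(1);        % start a single-worker pool
elseif p.NumWorkers ~= 1
    delete(p);         % shut down the existing pool
    parpool(1);
end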
Related
Could you explain why the following code with parfor in MATLAB does not work, and how to fix it?
R = 10;
Power = zeros(2,R);
parfor s = 1:R
    Power(1,s) = 1;
    Power(2,s) = 2;
end
It doesn't work because you have one variable (Power) that is sent to the different workers, and you want to write to it from different cores.
How could you write to the same variable from different workers? Who stores the memory? How would the workers communicate which parts have been written and which have not? The structure of the code is very important in parallel computing, as you need to be aware of what memory you send to which worker. Just choosing the wrong approach to passing variables can make your code slower than the non-parallel version.
The code you show can be changed to:
R = 10;
Power1 = zeros(1,R);
Power2 = zeros(1,R);
parfor s = 1:R
    Power1(1,s) = 1;
    Power2(1,s) = 2;
end
Power = [Power1; Power2];
I suggest you go to http://uk.mathworks.com/help/distcomp/parallel-for-loops-parfor.html and read the "Concepts" section, especially the part on variable types in parfor loops, which the MATLAB error message directs you to.
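Incidentally, the sliced-variable rules described there suggest another fix: assign the whole column in a single, consistent indexing expression, so that Power qualifies as a sliced output variable. A sketch:
R = 10;
Power = zeros(2,R);
parfor s = 1:R
    Power(:,s) = [1; 2];   % one indexing form for all occurrences => sliced variable
end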
I have a program that calls an m-file containing a parfor loop for its calculations. As you know, in MATLAB R2014a we don't need to open a pool explicitly using parpool or the like; parfor takes care of that itself.
My question is about closing the pool. With this structure (only parfor), does MATLAB close the pool after the parfor finishes? I call this parfor every 10 seconds, and I don't want MATLAB to close the pool on every iteration of my system.
Thanks.
From the documentation of parpool:
If you set your parallel preferences to automatically create a parallel pool when necessary, you do not need to explicitly call the parpool command. You might explicitly create a pool to control when you incur the overhead time of setting it up, so the pool is ready for subsequent parallel language constructs.
It is true that we don't have to use parpool, but it makes sense to use it if you want to control the overhead it causes.
As for your question - take a look at the Parallel Computing Toolbox preferences (Home -> Parallel -> Parallel Preferences).
I believe the option that was bothering you is the pool's idle timeout ("Shut down and delete a parallel pool after it is idle for ... minutes"). If the default timeout is too short, you can either extend it or disable the automatic shutdown altogether.
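The idle timeout can also be adjusted from code on the current pool object; a small sketch (120 minutes is an arbitrary example value):
p = gcp();             % get the current pool, starting one if necessary
p.IdleTimeout = 120;   % minutes of idleness before the pool shuts down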
I'm running a large piece of code, so I would like to make sure everything is correct when it starts to run. One thing I do is plot some data to see if it makes sense. I have several such plots (figure(1), figure(2), ...), all buried inside different 'if' statements. But I found that only some of them pop up while the code is running; the others don't show up until the code has finished. I have checked that all the if conditions are true.
Since my code is a little large I can't put it here. Can someone tell me what determines when figures pop up (while the code is running, or only after it has finished)? Thanks a lot!
You can use the drawnow function to update your figures while code is still running. If you want your figure window to "pop up" at any given time, you may also need to explicitly call figure(figNum).
Note that while it can be fun to watch your figures update in real time, you may incur a severe performance penalty by doing so. If you have long-running loops, you might consider only showing every N-th update, where N is 10, 100, 1000, or some other appropriate value:
for iter = 1:1e6
    plot(x, y);
    if mod(iter, 1000) == 0   % refresh the display every 1000 iterations
        drawnow;
    end
end
If you are just interested in plotting right at the beginning as a sanity check, you might run the first iteration of your code outside the for loop and call drawnow to make sure it plots. Then enter the for loop starting at the second iteration. The advantage over testing iter == 1 inside the loop is that you don't waste cycles on a conditional at every iteration.
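A rough sketch of that idea, where do_work and numIter are placeholders for your loop body and iteration count:
numIter = 1e6;
results = zeros(1, numIter);
results(1) = do_work(1);   % first iteration outside the loop
plot(results(1), 'o');
drawnow;                   % the sanity-check plot appears immediately
for iter = 2:numIter
    results(iter) = do_work(iter);
end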
I am running a small script with a parfor loop. The script starts with the following lines:
parfor i = 1:length(vX)
    fprintf('%d/%d\n', i, length(vX));
    ...
So apparently I should see printouts right away. When I run it with a pool of two workers, opened by matlabpool(2), I see no printout. When I close the workers, keeping the parfor loop, I see printouts, but only when I hit Ctrl-C. When I change the parfor into a regular for, I see the output. Note that I never saw the loop run to completion as it is quite long, but the printout is on the second line of the script and should happen immediately, unless there is some buffer-flushing issue I am not aware of in MATLAB.
What is happening?
I suspect that the delay is due to the overhead of setting up a parfor loop.
In my experience it can take several seconds, sometimes up to 15, just to initialize the loop. This is particularly true when the code in my loop uses variables with large memory footprints, as the variables must be copied to the workers.
The copying of data to the workers is done over the network, and a basic network activity monitor will reveal it. I find this useful for determining how much of my parfor execution time is eaten up initializing the workers.
I use fprintf all the time in my parfor loops; however, I try to be creative about the generated output, as the loop iterations are not executed sequentially.
If all you are looking for is a progress indicator check out: http://www.mathworks.com/matlabcentral/fileexchange/32101-progress-monitor-progress-bar-that-works-with-parfor
I had the same problem and tried for months to find a solution. But in the end I just resorted to using
disp(['text ', num2str(variable)]);
I'm working with a long-running parfor loop in MATLAB.
parfor iter = 1:1000
    chunk_of_work(iter);
end
There are generally about 2-3 timing outliers per run. That is to say for every 1000 chunks of work performed there are 2-3 that take about 100 times longer than the rest. As the loop nears completion, the workers that evaluated the outliers continue to run while the rest of the workers have no computational load.
This is consistent with the parfor loop distributing work statically, which is in contrast with the documentation for the Parallel Computing Toolbox found here:
"Work distribution is dynamic. Instead of being allocated a fixed
iteration range, the workers are allocated a new iteration only after
they finish processing their current iteration, which results in an
even work load distribution."
Any ideas about what's going on?
I think the doc you quote has a pretty good description of what is considered a static allocation of work: each worker "being allocated a fixed iteration range". For 4 workers, this would mean the first being assigned iter 1:250, the second iter 251:500, ... or 1:4:1000 for the first, 2:4:1000 for the second, and so on.
You did not say exactly what you observe, but what you describe is well consistent with dynamic workload distribution: first, the four (example) workers work on one iter each; the first one that finishes works on a fifth; the next one that is done (which may well be the same worker, if three of the first four take somewhat longer) works on a sixth; and so on. Now suppose your outliers are numbers 20, 850 and 900 in the order in which MATLAB chooses to process the loop iterations, and each takes 100 times as long as the rest. This only means that the 21st to 320th iterations will be handled by three of the four workers while one is busy with the 20th (by #320 it will be done, assuming a roughly even distribution of non-outlier calculation time). The worker assigned the 850th iteration will, however, continue to run even after another has solved #1000, and the same goes for #900. In fact, if there were about 1100 iterations, the one working on #900 would finish at roughly the same time as the others.
[edited, as the original wording implied MATLAB would still assign the iterations of the parfor loop in order from 1 to 1000, which should not be assumed]
So, long story short: unless you find a way to process your outliers first (which of course requires you to know a priori which ones are the outliers, and to find a way to make MATLAB start the parfor loop processing with them), dynamic workload distribution alone cannot avoid the effect you observe.
Addition: I think, however, that your observation that as "the loop nears completion, the workers that evaluated the outliers continue to run" (note the plural) seems to imply at least one of the following:
The outliers somehow are among the last iterations MATLAB starts to process
You have many workers, in the order of magnitude of the number of iterations
Your estimate of the number of outliers (2-3) or your estimate of their computation time penalty (factor 100) is too low
The work distribution in PARFOR is somewhat deterministic. You can observe precisely what's going on by having each worker log to disk how things go, but basically it turns out that PARFOR divides your loop up into chunks in a deterministic way, but farms them out dynamically. Unfortunately, there's currently no way to control that chunking.
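As an illustration, a sketch of that kind of logging (the log-file naming is made up; getCurrentTask identifies the executing worker and returns empty on the client, so this assumes the loop runs on a pool):
parfor iter = 1:1000
    t = getCurrentTask();                              % which worker am I on?
    fid = fopen(sprintf('worker_%d.log', t.ID), 'a');
    fprintf(fid, 'worker %d ran iteration %d\n', t.ID, iter);
    fclose(fid);
    chunk_of_work(iter);
end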
However, if you cannot predict which of your 1000 cases are going to be outliers, it's hard to imagine an efficient scheme for distributing the work.
If you can predict your outliers, you might be able to take advantage of the fact that roughly speaking, PARFOR executes loop iterations in reverse order, so you could put them at the "end" of the loop so work starts on them immediately.
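A sketch of that reordering, assuming you know the outlier indices in advance and that chunk_of_work returns one result (outlierIdx and order are made-up names):
outlierIdx = [20 850 900];                          % known-slow iterations (example)
order = [setdiff(1:1000, outlierIdx), outlierIdx];  % put outliers at the "end" of the loop
tmp = zeros(1, 1000);
parfor k = 1:1000
    tmp(k) = chunk_of_work(order(k));               % order is broadcast to the workers
end
results(order) = tmp;                               % undo the permutation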
The problem you face is well described in @arne.b's answer; I have nothing to add to that.
But the Parallel Computing Toolbox does contain functions for decomposing a job into tasks for independent execution. From your question it's not possible to conclude whether or not this is suitable for your application. If it is, the general strategy is to break the job into tasks of some size and have each processor tackle a task; when finished, it goes back to the stack of unfinished tasks and starts on another.
You might be able to decompose your problem so that one task replaces one loop iteration (lots of tasks, lots of overhead in managing the computation, but the best load balancing), or so that one task replaces N loop iterations (fewer tasks, less overhead, poorer load balancing). Jobs and tasks are a little trickier to implement than parfor, too; a sketch follows.
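For illustration, a minimal sketch of the one-task-per-iteration extreme, assuming the 'local' cluster profile and that chunk_of_work returns one output:
c = parcluster('local');
job = createJob(c);
for idx = 1:1000
    createTask(job, @chunk_of_work, 1, {idx});   % one task per loop iteration
end
submit(job);
wait(job);
out = fetchOutputs(job);                         % 1000-by-1 cell array of results
delete(job);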
As an alternative to PARFOR, in R2013b and later, you can use PARFEVAL and divide up the work any way you see fit. You could even cancel the 'timing outliers' once you've got sufficient results, if that's appropriate. There is, of course, overhead when dividing up your existing loop into 1000 individual remote PARFEVAL calls. Perhaps that's a problem, perhaps not. Here's the sort of thing I'm imagining:
for idx = 1:1000
    futures(idx) = parfeval(@chunk_of_work, 1, idx);
end
done = false; numComplete = 0;
timer = tic();
while ~done
    [idx, result] = fetchNext(futures, 10); % wait up to 10 seconds
    if ~isempty(idx)
        numComplete = numComplete + 1;
        % stash result
    end
    done = (numComplete == 1000) || (toc(timer) > 100);
end
% cancel outstanding work, has no effect on completed futures
cancel(futures);