I have a parfor loop in MATLAB. While it is running, only one process is using the CPU (top and the system monitor show the same CPU usage; see the attached screenshot), and the parfor loop doesn't run any faster. Why?
Ubuntu 12.04 LTS, 64-bit MATLAB R2012b.
pools = matlabpool('size');
if pools ~= 10
    if pools > 0
        matlabpool('close');
    end
    matlabpool local 10;  % 10 workers + the client session = 11 MATLAB processes in the system monitor
end
parfor i = 1:num_utt
    dojob();
end
Thanks, Marcin & Edric.
I ran a small test case as you suggested, and I noticed that the problem is caused by the code inside the loop accessing data defined outside the loop. In http://www.mathworks.com/help/distcomp/advanced-topics.html this is referred to as accessing broadcast variables:
"At the start of a parfor-loop, the values of any broadcast variables are sent to all workers. Although this type of variable can be useful or even essential, broadcast variables that are large can cause a lot of communication between client and workers. In some cases it might be more efficient to use temporary variables for this purpose, creating and assigning them inside the loop."
In my case the broadcast variable holds a lot of data, so passing it to the workers was the problem.
After I removed some of the data, the parfor loop worked fine.
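For reference, a minimal sketch of the kind of change that helped (uttData is a hypothetical name; a cell array indexed by the loop variable is a sliced variable, so each worker only receives the elements it needs):
% Hedged sketch: pass per-iteration data as a sliced variable instead of
% broadcasting one large variable to every worker.
uttData = cell(num_utt, 1);   % hypothetical: one cell per iteration, filled beforehand
parfor i = 1:num_utt
    dojob(uttData{i});        % sliced access: only uttData{i} is sent to the worker
end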
In my current setup I have a for loop in which I extract different types of data from an SQL database hosted on Amazon EC2. This extraction is done in the function extractData(variableName). After that, the data gets parsed and stored as a .mat file in parsestoreData(data):
variables = {'A','B','C','D','E'};
for i = 1:length(variables)
    data = extractData(variables{i});
    parsestoreData(data);
end
I would like to parallelize this extraction and parsing of the data to speed up the process. I believe I could do this by using parfor instead of for in the example above.
However, I am worried that the extraction will not be improved, as the SQL database will slow down when multiple requests are made to it at the same time.
I am therefore wondering whether MATLAB can handle this issue in a smart way, in terms of parallelization?
The workers in a parallel pool running parfor are basically full MATLAB processes without a UI, and they default to running in "single computational thread" mode. I'm not sure whether parfor will benefit you in this case - the parfor loop simply arranges for the MATLAB workers to execute the iterations of your loop in parallel. You can estimate for yourself how well your problem will parallelise by launching multiple full desktop MATLAB sessions and setting them off running your problem simultaneously. I would run something like this:
maxNumCompThreads(1);
while true
    t = tic();
    data = extractData(...);
    parsestoreData(data);
    toc(t)
end
and then check how the times reported by toc vary as the number of MATLAB clients varies. If the times remain constant, you could reasonably expect parfor to give you benefit (because it means the body can be parallelised effectively). If, however, the times increase significantly as you run more MATLAB clients, then it's almost certain that parfor would experience the same (relative) slow-down.
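If the timings do hold up, a sketch of the parallel version of your loop might look like this (same extractData and parsestoreData as in your question):
% Hedged sketch: parallel version of the extraction loop, assuming the
% timing test showed the iterations do not slow each other down.
variables = {'A','B','C','D','E'};
parfor i = 1:length(variables)
    data = extractData(variables{i});   % each worker issues its own query
    parsestoreData(data);               % parse and save to a .mat file
end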
I have a MATLAB loop that looks something like this:
S = zeros(20, 1);
for i = 1:1:20
    system(command);
    S(i) = fetch_results_of_command_from_file();
end
where command is a string for a system call that runs a Python script, fetch_results_of_command_from_file reads the results of that call from a file, and the result is stored in the return vector S.
Some details: 'command' calls a Python script. That script reads some data from a file, then performs an optimization procedure, then writes the results back to file. The Python script is small - less than 1,000 lines - and lives on a shared disk. All commands call this script, but use different input data - one input file per worker. The optimization itself is single-threaded. The cores all reside on a local machine which has over 40 CPUs, 2 cores each.
Each iteration of this loop runs in about 2 minutes, and everything is fine, but slow.
I am on a local machine with over 20 cores, so I want to parallelize my code as follows:
S = zeros(20, 1);
parfor i = 1:1:20
    system(command);
    S(i) = fetch_results_of_command_from_file();
end
It seems each system call deploys to a different processor just fine. I would expect this loop to run in roughly 2 minutes plus a little overhead for the parfor loop, since it is embarrassingly parallel.
Unfortunately, no system call finishes even after more than 10 minutes (it never even gets to the fetch command). Somehow, parfor is slowing every command down, as though they were being run serially. I do have the Parallel Computing Toolbox installed, so that shouldn't be the problem. What could be going on?
When I have the Parallel Computing Toolbox installed and use parfor in my code, MATLAB starts the pool automatically once it reaches the parfor loop. This, however, makes it difficult to debug at times, which is why I would like to prevent MATLAB from opening a pool in certain situations. So, how can I tell MATLAB not to open a pool? Obviously I could go through my code and remove all parfor loops and replace them with normal for loops, but this is tedious and I might forget to undo my changes.
Edit: To be specific, I would ideally like the parfor loop to behave exactly like a for loop when I set some control variable or similar. That is, I should, for example, also be able to place breakpoints in the loop.
Under Home -> Parallel -> Parallel Preferences, you can deselect the check box "Automatically create a parallel pool (if one doesn't already exist) when parallel keywords are executed." This makes all parfor loops behave like normal for loops.
I'll get back to you if I figure out a way to do this in the code as opposed to using the check box.
Update: it turns out it is indeed possible to change the settings through code, although I would not recommend this, as it involves changing MATLAB's preference file. This is taken from the Undocumented MATLAB blog by Yair Altman.
ps = parallel.Settings;
ps.Pool
ans =
  PoolSettings with properties:
    AutoCreate: 1
    RestartOnClusterChange: 1
    RestartOnPreferredNumWorkersChange: 1
    IdleTimeout: 30
    PreferredNumWorkers: 12
where you need to change the AutoCreate switch to 0.
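Based on that post (this relies on undocumented behaviour, so treat it as a sketch rather than a supported API), the switch can be flipped programmatically like this:
% Undocumented (per the Undocumented MATLAB blog): disable automatic pool
% creation when parallel keywords such as parfor are reached.
ps = parallel.Settings;
ps.Pool.AutoCreate = false;   % parfor then runs on the client if no pool exists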
As an alternative, I'd suggest wrapping everything inside your parfor loop in a function, and then calling
parfor i = 1:N
    output = myFunction(...);   % myFunction is your wrapped loop body
end
Now modify your script/function to have a Parallel switch on top:
if Parallel
    parfor i = 1:N
        output = myFunction(...);
    end
else
    for i = 1:N
        output = myFunction(...);
    end
end
You can edit and debug the function itself and set your switch on top of your program to execute in parallel or serial.
As well as the normal syntax
parfor i = 1:10
you can also use
parfor (i = 1:10, N)
where N is the maximum number of workers to be used in the loop. N can be a variable set by other parts of the code, so you can effectively turn on and off parallelism by setting the variable N to 1 or 0.
Edit: to be clear, this only controls the number of workers on which the code is executed (and if N is zero, whether a pool is started at all). If no pool exists, the code will execute on the client. Nevertheless, the code remains a parfor loop, which does not have the same semantics as a for loop - there are restrictions on the loop code for parfor loops that do not exist for for loops, and there is no guarantee on the order in which the loop iterations are executed.
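A small sketch of how the N switch might be used (useParallel and doIteration are hypothetical names standing in for your own flag and loop body):
% Hedged sketch: toggle parallelism by changing the worker limit N.
useParallel = false;   % hypothetical flag set elsewhere in your code
if useParallel
    N = 4;             % allow up to 4 workers
else
    N = 0;             % 0 workers: all iterations run on the client
end
results = zeros(1, 10);
parfor (i = 1:10, N)
    results(i) = doIteration(i);   % doIteration is a hypothetical loop body
end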
When you use parfor, you're doing more than just saying "speed this up please". You're saying to MATLAB "I can guarantee to you that the iterations of this loop are independent, and can be executed in any order, so you will be OK if you try to parallelize it". Because you've guaranteed that, MATLAB is able to speed things up by using different semantics than it would do for a for loop.
The only way to completely get for loop behaviour is to use for, and if you need to switch back and forth for debugging purposes you'll need to comment and uncomment the for/parfor (or perhaps use an if/else block, switching between a for and a parfor depending on some variable).
I think the way to go here is not to disable the parfor, but rather to let it behave like a simple for loop.
This should be possible by setting the number of workers to 1.
parpool(1)
Depending on your code you may be able to just do this once before you run the code, or perhaps you need to do it (conditionally) each time you set the number of workers anywhere in your code.
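A minimal sketch of doing this conditionally (gcp('nocreate') just returns the current pool, or empty if none exists):
% Hedged sketch: make sure a single-worker pool is open, so parfor
% iterations execute one at a time.
pool = gcp('nocreate');   % query the current pool without creating one
if isempty(pool)
    parpool(1);           % open a single-worker pool
elseif pool.NumWorkers ~= 1
    delete(pool);         % close the existing, larger pool
    parpool(1);
end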
I have a quad-core desktop computer
I have the Parallel Computing toolbox in Matlab.
I have a script file that I need to run simultaneously on each core
I'm not sure what the most efficient way to do this is. I know I can create a 'matlabpool' with 4 local workers, but how do I then assign the same script to each one? Or can I use the 'batch' command to run the script on a specific thread, and then do that for each one?
Thank you!
You can run a single script across multiple cores using the Parallel Computing Toolbox, by calling
matlabpool open local 4
and using parfor instead of for loops to execute whatever is in your loop across four workers. I'm not sure whether the Parallel Computing Toolbox supports running the entirety of the script individually on each core; that will likely not be supported by your hardware anyway.
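A minimal sketch of that approach (doWork is a hypothetical stand-in for whatever one iteration of your script computes):
% Hedged sketch: open a pool of 4 local workers (older matlabpool syntax,
% as used above) and spread the loop iterations across them.
matlabpool open local 4
results = zeros(1, 4);        % sliced output, one element per iteration
parfor i = 1:4
    results(i) = doWork(i);   % doWork stands in for the body of your script
end
matlabpool close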
Not sure if this works, but here is something to try:
When parallelizing calculations, they are usually wrapped in something like parfor.
So I would recommend doing the same with your script: make sure that all required inputs and outputs have the necessary dimensions and just call:
parfor ii = 1:4
    myscript;
end
Side note: before trying this kind of thing, you may want to check your CPU utilization. If it is already high, that means the inner part of the code already uses parallel processing, and you should not expect much speedup.
I was wondering: when we run spmd blocks and create individual lab workers, how much memory is allocated to each of them?
I have an 8 core machine and I used 8 lab workers.
Thanks.
When you launch workers using the matlabpool command in Parallel Computing Toolbox, each worker process starts the same - they're essentially ordinary MATLAB processes but with no desktop visible. They consume memory as and when you create arrays on them. For example, in the following case, each worker uses the same amount of memory to store x:
spmd
    x = zeros(1000);
end
But in the following case, each worker consumes a different amount of memory to store their copy of x:
spmd
    x = zeros(100 * labindex);
end
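If you want to check this for yourself, a small sketch like the following reports how much memory each worker's copy of x occupies (assuming x is a double array, at 8 bytes per element):
% Hedged sketch: print the size of x on every worker.
spmd
    fprintf('Worker %d: x uses %d bytes\n', labindex, numel(x) * 8);
end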