I was wondering: when we run spmd blocks and create individual lab workers, how much memory is allocated to each of them?
I have an 8-core machine and I used 8 lab workers.
Thanks.
When you launch workers using the matlabpool command in Parallel Computing Toolbox, each worker process starts out the same: it is essentially an ordinary MATLAB process with no desktop visible. Workers consume memory as and when you create arrays on them. For example, in the following case, each worker uses the same amount of memory to store x:
spmd
    x = zeros(1000);    % every worker allocates an identical 1000-by-1000 array (~8 MB)
end
But in the following case, each worker consumes a different amount of memory to store its copy of x:
spmd
    x = zeros(100 * labindex);    % worker k allocates a (100*k)-by-(100*k) array
end
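If you want to see this for yourself, one quick check (a minimal sketch, assuming a pool is already open) is to ask whos for the storage of each worker's copy inside the block:

spmd
    x = zeros(100 * labindex);   % worker k holds a (100*k)-by-(100*k) matrix
    info = whos('x');            % storage used by this worker's copy
    fprintf('worker %d: %.1f MB\n', labindex, info.bytes / 1e6);
end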
Related
I am writing MATLAB code which does some operations on a large matrix. First I create three 3-D arrays:
dw2 = 0.001;
W2 = 0:dw2:1;                      % 1001 points
dp = 0.001;
P1 = dp:dp:1;                      % 1000 points
dI = 0.001;
I = 1:-dI:0;                       % 1001 points
[II,p1,ww2] = ndgrid(I,P1,W2);     % three 1001-by-1000-by-1001 arrays, roughly 8 GB each
Then my code basically does the following
G = 0:0.1:10;
Y = zeros(length(G),1);
for i = 1:length(G)
    g = G(i);
    Y(i) = myfunction(II,p1,ww2,g);
end
This code takes roughly 100s, with each iteration being nearly 10s.
However, after I start a parallel pool and change the loop to parfor:
ProcessPool with properties:
Connected: true
NumWorkers: 48
Cluster: local
AttachedFiles: {}
AutoAddClientPath: true
IdleTimeout: 30 minutes (30 minutes remaining)
SpmdEnabled: true
Then it seems to run forever. The maximum number of workers is 48; I've also tried 2, 5, and 10, and all of these are slower than the non-parallel version. Is that because MATLAB copies II, p1 and ww2 48 times, and that causes the problem? Also, myfunction involves a lot of vectorization, and I have already optimized it; will that lead to slow parfor performance? Is there a way to utilize (some of) the 48 workers to speed up the code? Any comments are highly appreciated. I need to run millions of cases, so I really hope that I can utilize the 48 workers in some way.
It seems that you have large data, and a lot of cores. It is likely that you simply run out of memory, which is why things get so slow.
I would suggest that you set up your workers as threads, not separate processes.
You can do this with parpool('threads'). Your code must conform to some limitations, as not all code can be run this way; see the MATLAB documentation on thread-based environments for details.
In thread-based parallelism you have shared memory (arrays are not copied). In process-based parallelism you have 48 copies of MATLAB running on your computer at the same time, each needing its own copy of your data. The latter system was originally designed to work on a compute cluster, and was later retrofitted to work on a single machine with two or four cores; I don't think it was ever meant for 48 cores.
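As a rough sketch (assuming the II, p1 and ww2 arrays from the question are already in the workspace, and that myfunction is compatible with thread-based pools, which require R2020a or later), the change is small:

pool = parpool('threads');      % workers share memory; the big arrays are not copied
G = 0:0.1:10;
Y = zeros(length(G), 1);
parfor i = 1:length(G)
    Y(i) = myfunction(II, p1, ww2, G(i));
end
delete(pool);                   % shut the pool down when finished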
If you cannot use threads with your code, configure your parallel pool to have fewer workers, for example parpool('local',8).
For more information, see this documentation page.
I've had some parfor code running for around a day to perform a grid search over classifier parameters. From the output I can tell that I'm about 95% of the way through the search. I started my pool with 8 workers. Looking at Task Manager, it appears that only two of the workers are still running; this is my assumption, given that two MATLAB.exe processes are at 700 MB and six are at 170 MB. My real concern is that all 8 of these MATLAB.exe instances show static memory usage, i.e. the memory usage is not jumping around, which is what I would typically see. In the past, when not using parfor, I would assume this meant the program had crashed and I'd have to restart. The MATLAB GUI is responsive and usable.
I'm unsure what to make of this when using parallel computing, though. Has anyone experienced this before? I'm running MATLAB R2013a.
I don't think there's cause for concern just yet. The MATLAB processes will always use some memory even when idle and 170 MB is not unusual. In fact on my machine, if I start a pool of 4 workers using 'local', and do nothing, each worker uses around 250 MB. The worker processes will continue to exist and remain in an idle state until you close the pool.
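If you do want that memory back once the run completes, closing the pool shuts the worker processes down; on R2013a that is:

matlabpool close   % terminates the idle workers and frees their memory

On R2013b and later, delete(gcp('nocreate')) achieves the same thing.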
I'm using the TreeBagger class provided by MATLAB (R2014a and R2014b), in conjunction with the distributed computing toolbox. I have a local cluster running with 30 workers, on a Windows 7 machine with 40 cores.
I call the TreeBagger constructor to generate a regression forest (an ensemble containing 32 trees), passing an options structure with 'UseParallel' set to 'always'.
However, TreeBagger seems to make use of only 8 or so of the 30 available workers (judging by per-process CPU usage, observed using Task Manager). When I try to test the pool with a simple parfor loop:
parfor i=1:30
    a = fft(rand(20000));   % a deliberately heavy, independent task per iteration
end
Then all 30 workers are engaged.
My question is:
(How) can I force TreeBagger to use all available resources?
Based on the documentation for the TreeBagger class, it would appear that the operations required are quite memory intensive. Without knowing more about MATLAB's internal scheduling, it seems likely that distributing the workload across fewer workers, each with more memory available, is what the scheduler believes will be the most efficient way to solve the problem.
The number of workers used or available may also depend on the number of physical cores on the system (which is different from the number of hyper-threaded logical cores), as well as on the resources MATLAB is allowed to consume.
Splitting memory-intensive tasks across fewer than the maximum number of workers is a common technique in HPC for certain classes of problem.
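If you want to try overriding that, one option (a sketch only; X and Y stand in for your training data, and the syntax shown is the R2014a-era form) is to open the pool at the size you want before constructing the forest:

matlabpool open local 30                        % parpool('local', 30) on newer releases
paroptions = statset('UseParallel', 'always');  % newer releases also accept the logical true
forest = TreeBagger(32, X, Y, ...
    'Method', 'regression', 'Options', paroptions);

Bear in mind that TreeBagger parallelizes over the individual trees, so an ensemble of 32 trees can occupy at most 32 workers in any case.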
maxNumCompThreads is deprecated, but does it still work in R2014a?
I tried to force a script to use a single computational thread, but it uses 2 logical cores:
maxNumCompThreads(1);            % limit MATLAB to a single computational thread
signal = rand(1, 1000000);
for i = 1:100
    cwt(signal, 1:10, 'sym2');   % continuous wavelet transform, the heavy step
    i                            % echo the iteration number
end
Any idea why?
Setting the -singleCompThread option when starting MATLAB does work fine (the script then uses one core only).
Note that my computer has hyperthreading, so 2 logical cores is actually only 1 physical core; however, MATLAB usually counts logical cores, not physical ones (e.g. when setting the number of workers in a parallel pool).
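For reference, the startup-flag approach that does behave can be confirmed from inside the session once it is up:

% launched from the shell with:
%   matlab -singleCompThread
maxNumCompThreads   % returns 1 when the flag is in effect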
I have a parfor loop in MATLAB, but when it is running only one process is using the CPU (top and System Monitor show the same CPU usage; see the attached screenshot), and the parfor loop doesn't run any faster. Why?
Ubuntu 12.04 LTS, 64-bit MATLAB R2012b
pools = matlabpool('size');
if pools ~= 10
    if pools > 0
        matlabpool('close');
    end
    matlabpool local 10;   % 10 workers + the client session = 11 MATLAB processes in System Monitor
end
parfor i = 1:num_utt
    dojob();
end
Thanks, Marcin & Edric,
I ran a small test case as you suggested, and noticed that the problem was caused by the inner-loop code accessing outer-loop data; in http://www.mathworks.com/help/distcomp/advanced-topics.html this is called accessing broadcast variables:
At the start of a parfor-loop, the values of any broadcast variables are sent to all workers. Although this type of variable can be useful or even essential, broadcast variables that are large can cause a lot of communication between client and workers. In some cases it might be more efficient to use temporary variables for this purpose, creating and assigning them inside the loop.
In my case the broadcast variable holds a lot of data, so sending it to the workers was the problem.
After I removed some of the data, the parfor loop works fine.
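For anyone hitting the same thing, the doc's suggestion translates into something like the sketch below, where buildChunk is a hypothetical stand-in for whatever produces the data a single iteration actually needs:

parfor i = 1:num_utt
    chunk = buildChunk(i);   % temporary variable, created on the worker itself,
                             % so the large array is never broadcast from the client
    dojob(chunk);
end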