I've got three functions (qrcalc, zcalc, pcalc) with three unique set of inputs which I want to run in parallel. This is my attempt but it doesn't work:
function [outall]=parallelfunc(in1,in2,in3)
if parpool('size') == 0 % checking to see if pool is already open
A=feature('numCores');
parpool('local',A);
else
parpool close
A=feature('numCores');
parpool('local',A);
end
spmd
if labindex==2
out1=qrcalc(in1);
elseif labindex==3
out2=zcalc(in2);
elseif labindex==4
out3=pcalc(in3);
end
outall=[out1;out2;out3];
end
Error: Error using parallelattempt>(spmd body) (line 20) Error
detected on worker 3. An UndefinedFunction error was thrown on the
workers for 'out1'. This may be because the file containing 'out1' is
not accessible on the workers. Specify the required files for this
parallel pool using the command: addAttachedFiles(pool, ...). See the
documentation for parpool for more details.
Error in parallelattempt>(spmd) (line 11) spmd
Error in parallelattempt (line 11) spmd
Are there any suggestions for how this can be done?
Here is a version of the code that does not require the custom functions. Therefore I replaced them with zeros, magic and ones:
function [outall]=parallelattempt(in1,in2,in3)
poolobj = gcp;
addAttachedFiles(poolobj,{'zeros.m','ones.m','magic.m'})
spmd
if labindex==2
out1=zeros(in1);
elseif labindex==3
out2=magic(in2);
elseif labindex==4
out3=ones(in3);
end
outall=[out1;out2;out3];
end
If you use an spmd-statement, the code inside will be sent to all workers. By the use of labindex you only create the variables outX on one specific worker. The problem is, that outall=[out1;out2;out3]; should now be executed on workers where two outX-variables are not declared. The direct fix for this error is to declare the variables before the spmd-statement (out1=[]; out2=[]; out3=[];). But this is not the best solution.
You can use a single variable inside the spmd-statement instead of several ones (outX), lets call this variable out. Now the code executes on each worker and stores its result in out, which is a Composite-object. Concatenation is not necessary because it is done automatically. Additionally, you can specify with spmd (3) at the beginning of the block that only 3 workers should be used. Composite-objects can be indexed like cell arrays where the index equals to the number of the worker/lab. Therefore we can concatenate it after the block.
This is the specific code for that part:
spmd (3)
if labindex==1
out = qrcalc(in1);
elseif labindex==2
out = zcalc(in2);
elseif labindex==3
out = pcalc(in3);
end
end
outall = [out{1};out{2};out{3}];
Note that the creation of the pool will be done automatically, if none exists. You may need to attach the files of your functions before the statement.
An even better approach in my opinion is to use parfeval here. This does exactly what you want to achieve in the first place - it solves your initial problem. The outX variables get calculated in parallel (non-blocking). With the function fetchOutputs you can block execution until the result is calculated. Use it on all the outX-variables and concatenate it in the same line.
Here is the code for that:
out1 = parfeval(#qrcalc,1,in1);
out2 = parfeval(#zcalc,1,in2);
out3 = parfeval(#pcalc,1,in3);
outall = [fetchOutputs(out1);fetchOutputs(out2);fetchOutputs(out3)];
Related
I asked today a question about Parallel Computing with Matlab-Simulink. Since my earlier question is a bit messy and there are a lot of things in the code which doesnt really belong to the problem.
My problem is
I want to simulate something in a parfor-Loop, while my Simulink-Simulation uses the "From Workspace" block to integrate the needed Data from the workspace into the simulation. For some reason it doesnt work.
My code looks as follows:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
apool = parpool('local');
end
parfor k = 1 : 2
load_system(strcat(path,'\DemoMDL'))
set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
SimOut(k) = sim('DemoMDL')
end
delete(apool);
My simulation looks as follows
The DemoData-File is just a zeros(100,20)-Matrix. It's an example for Data.
Now if I simulate the Script following error message occures:
Errors
Error using DemoScript (line 9)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Caused by:
Error using parallel_function>make_general_channel/channel_general (line 907)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Error using parallel_function>make_general_channel/channel_general (line 907)
Undefined function or variable 'DemoData'.
Now do you have an idea why this happens??
The strange thing is, that if I try to acces the 'DemoData' inside the parfor-Loop it works. For excample with that code:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
apool = parpool('local');
end
parfor k = 1 : 2
load_system(strcat(path,'\DemoMDL'))
set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
fprintf(num2str(DemoData))
end
delete(apool);
Thats my output without simulating and displaying the Data
'>>'DemoScript
00000000000000000 .....
Thanks a lot. That's the original question with a lot more (unnecessary) details:
EarlierQuestion
I suspect the issue is that when MATLAB is pre-processing the parfor loop to determine what variables need to be passed to the workers it does not know what DemoData is. In your first example it's just a string, so no data gets sent over. In your second example it explicitly knows about the variable and hence does pass it over.
You could try either using the Model Workspace, or perhaps just inserting the line
DemoData = DemoData;
in the parfor loop code.
Your error is because workers did not have access to DemoData in the client workspace.
When running parallel simulations with Simulink it would be easier to manage data from workspace if you move them to model workspace. Then each worker can access this data from its model workspace. You can load a MAT file or write MATLAB code to initialize data in model workspace. You can access model workspace using the Simulink model menu View->Model Explorer->Model Workspace.
Also see documentation at http://www.mathworks.com/help/simulink/ug/running-parallel-simulations.html that talks about "Resolving workspace access issues".
You can also move the line
load DemoData
to within the parfor loop. Doing this, you assure that the data will be available in each worker base workspace, wich is accessible to the model, instead of the client workspace.
When an out-of-memory error is raised in a parfor, is there any way to kill only one Matlab slave to free some memory instead of having the entire script terminate?
Here is what happens by default when an out-of-memory error occurs in a parfor: the script terminated, as shown in the screenshot below.
I wish there was a way to just kill one slave (i.e. removing a worker from parpool) or stop using it to release as much memory as possible from it:
If you get a out of memory in the master process there is no chance to fix this. For out of memory on the slave, this should do it:
The simple idea of the code: Restart the parfor again and again with the missing data until you get all results. If one iteration fails, a flag (file) is written which let's all iterations throw an error as soon as the first error occurred. This way we get "out of the loop" without wasting time producing other out of memory.
%Your intended iterator
iterator=1:10;
%flags which indicate what succeeded
succeeded=false(size(iterator));
%result array
result=nan(size(iterator));
FLAG='ANY_WORKER_CRASHED';
while ~all(succeeded)
fprintf('Another try\n')
%determine which iterations should be done
todo=iterator(~succeeded);
%initialize array for the remaining results
partresult=nan(size(todo));
%initialize flags which indicate which iterations succeeded (we can not
%throw erros, it throws aray results)
partsucceeded=false(size(todo));
%flag indicates that any worker crashed. Have to use file based
%solution, don't know a better one. #'
delete(FLAG);
try
parfor falseindex=1:sum(~succeeded)
realindex=todo(falseindex);
try
% The flag is used to let all other workers jump out of the
% loop as soon as one calculation has crashed.
if exist(FLAG,'file')
error('some other worker crashed');
end
% insert your code here
%dummy code which randomly trowsexpection
if rand<.5
error('hit out of memory')
end
partresult(falseindex)=realindex*2
% End of user code
partsucceeded(falseindex)=true;
fprintf('trying to run %d and succeeded\n',realindex)
catch ME
% catch errors within workers to preserve work
partresult(falseindex)=nan
partsucceeded(falseindex)=false;
fprintf('trying to run %d but it failed\n',realindex)
fclose(fopen(FLAG,'w'));
end
end
catch
%reduce poolsize by 1
newsize = matlabpool('size')-1;
matlabpool close
matlabpool(newsize)
end
%put the result of the current iteration into the full result
result(~succeeded)=partresult;
succeeded(~succeeded)=partsucceeded;
end
After quite bit of research, and a lot of trial and error, I think I may have a decent, compact answer. What you're going to do is:
Declare some max memory value. You can set it dynamically using the MATLAB function memory, but I like to set it directly.
Call memory inside your parfor loop, which returns the memory information for that particular worker.
If the memory used by the worker exceeds the threshold, cancel the task that worker was working on. Now, here it get's a bit tricky. Depending on the way you're using parfor, you'll either need to delete or cancel either the task or worker. I've verified that it works with the code below when there is one task per worker, on a remote cluster.
Insert the following code at the beginning of your parfor contents. Tweak as necessary.
memLimit = 280000000; %// This doesn't have to be in parfor. Everything else does.
memData = memory;
if memData.MemUsedMATLAB > memLimit
task = getCurrentTask();
cancel(task);
end
Enjoy! (Fun question, by the way.)
One other option to consider is that since R2013b, you can open a parallel pool with 'SpmdEnabled' set to false - this allows MATLAB worker processes to die without the whole pool being shut down - see the doc here http://www.mathworks.co.uk/help/distcomp/parpool.html . Of course, you still need to arrange somehow to shutdown the workers.
I'm trying to parallelize my code, and I finally got the parfor loops set up such that Matlab doesn't crash every time. However, I've now got an error that I can't seem to figure out.
I have a driver script (Driver12.m) that calls the script that I'm trying to parallelize (Worker12.m). If I run Worker12.m directly, it usually finishes with no problem. However, every time I try to run it from Driver12.m, it either 1) causes Matlab to crash, or 2) throws a strange error at me. Here's some of my code:
%Driver script
run('(path name)/Worker12.m');
%Relevant worker script snippet
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #1: "Matlab has encountered an unexpected error and needs to close."
Outcome #2: "An UndefinedFunction error was thrown on the workers for ''. This might be because the file containing '' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details. Caused by: Undefined function or variable ""."
However, if I run Worker12.m directly, it works fine. It's only when I run it from the driver script that I get issues. Obviously, this error message from Outcome #2 isn't all that useful. Any suggestions?
Edit: So I created a toy example that reproduces an error, but now both my toy example and the original code are giving me a new, 3rd error. Here's the toy example:
%Driver script
run('parpoolexample.m')
%parpoolexample.m
clear all
new_TS = rand([1000,32,400]);
[number_of_ranges,total_working_channels,~] = size(new_TS);
R_P = zeros(total_working_channels,total_working_channels,number_of_ranges);
R_V = zeros(total_working_channels,total_working_channels,number_of_ranges);
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #3: "Index exceeds matrix dimensions (line 7)."
So, at the 'parfor' line, it's saying that I'm exceeding the matrix dimensions, even though I believe that should not be the case. Now I can't even get my original script to recreate Outcomes #1 or #2.
Don't use run with parallel language constructs like parfor and spmd. Unfortunately it doesn't work very well. Instead, use cd or addpath to let MATLAB see your script.
I'm sorry that currently I'm not able to boil my code down to a minimal example.
It's a huge bunch of image processing code.
I have a loop that iterates over images (descriptors in variable stphogs) and for each image runs a detection.
function hogpatches = extractDetectionsFromImages(stphogs, poselet)
hogpatches = cell(1,length(stphogs));
parfor i = 1:length(stphogs)
tmp = extractDetectionsFromImage(stphogs(i), poselet); %e.g. 1x6 struct
if ~isempty(tmp)
hogpatches{i} = tmp;
end
end
hogpatches = cell2mat(hogpatches);
end
So this is the main loop. But the function calls in extractDetectionsFromImage go very deep.
My problem: Running this with a normal for-loop gives the correct result. When using PARFOR as above, hogpatches only contains 5 instead of 18 structs.
Where can I start to look for the error? I had a global variable the program did change. I removed that already. There is still a global variable 'config' which is however only read.. Any other hints? What could be the problem?
EDIT:
Even if I just run one iteration (size of stphogs is 1), the parfor fails. It doesn't have anything to do with the isempty part. Problem persists if I remove that.
EDIT2:
Ok here I boiled it to a minimal working example.
It is indeed caused by a global variable:
function parGlobalTest()
global testVar;
testVar = 123;
parfor i = 1:1
fprintf('A Value: %d\n', testVar);
testFunction();
end
end
function testFunction()
global testVar;
fprintf('B Value: %d\n', testVar);
end
In this example. The output for A will be 123, for B it will be nothing (undefined).
Why is that?
Ok, here is my solution:
function syncTestVar()
global testVar;
save('syncvar.mat', 'testVar');
pctRunOnAll global testVar;
pctRunOnAll load('syncvar.mat');
end
If someone has a better approach please tell me... This one works though
Btw: The save/load is needed because in my real program, testVar is a complex struct
I have a matlab script which executes 5 algorithm sequentially. All these 5 algorithms needs to run for 10 different initialization.
Whenever there is an error in i-th initialization, the script exit with an error message. I fix the issue(say, data issue) and start running the script again which executes from the first initialization.
I dont want to my code to run for previously executed initialization. ( from 1 run to i-1 the run)
One way is to reassign the value of index to start from i, which in turn require to modify the scrip everytime again and again.
Is there any way to restart the script from the i-th initialization onwards which dont require to modify the script?
I suggest that you use try and catch, and check which indexes succeeded.
function errorIndexes = myScript(indexes)
errorIndexes = [];
errors = {};
for i = indexes
try
%Do something
catch me
errorIndexes(end+1) = i;
errors{end+1} = me;
end
end
end
On the outside you should have a main file like that:
function RunMyScript()
if exist('unRunIndexes.mat','file')
unRunIndexes= load('unRunIndexes.mat');
else
unRunIndexes= 1:n;
end
unRunIndexes= myScript( indexes)
save('unRunIndexes.mat',unRunIndexes);
end
Another technique you might like to consider is checkpointing. I've used something similar with long-running (more than one day) loops operating in an environment where the machine could become unavailable at any time, e.g. distributed clusters of spare machines in a lab.
Basically, you check to see if a 'checkpoint' file exists before you start your loop. If it does than that suggests the loop did not finish successfully last time. It contains information on where the loop got up to as well as any other state that you need to get going again.
Here's a simplified example:
function myFunction()
numIter = 10;
startIter = 1;
checkpointFilename = 'checkpoint.mat';
% Check for presence of checkpoint file suggesting the last run did not
% complete
if exist(checkpointFilename, 'file')
s = load(checkpointFilename);
startIter = s.i;
fprintf('Restarting from iteration %d\n', startIter);
end
for i = startIter:numIter
fprintf('Starting iteration %d\n', i);
expensiveComputation();
save(checkpointFilename, 'i');
end
% We succefully finished. Let's delete our checkpoint file
delete(checkpointFilename);
function expensiveComputation()
% Pretend to do lots of work!
pause(1);
end
end
Running and breaking out with ctrl-c part way through looks like this:
>> myFunction
Starting iteration 1
Starting iteration 2
Starting iteration 3
Starting iteration 4
Operation terminated by user during myFunction/expensiveComputation (line 27)
In myFunction (line 18)
expensiveComputation();
>> myFunction
Restarting from iteration 4
Starting iteration 4
Starting iteration 5
...
You can type (in the command line):
for iter=l:n,
%%% copy - paste your code inside the loop
end