I'm sorry that currently I'm not able to boil my code down to a minimal example.
It's a huge bunch of image processing code.
I have a loop that iterates over images (descriptors in variable stphogs) and for each image runs a detection.
function hogpatches = extractDetectionsFromImages(stphogs, poselet)
hogpatches = cell(1,length(stphogs));
parfor i = 1:length(stphogs)
tmp = extractDetectionsFromImage(stphogs(i), poselet); %e.g. 1x6 struct
if ~isempty(tmp)
hogpatches{i} = tmp;
end
end
hogpatches = cell2mat(hogpatches);
end
So this is the main loop. But the function calls in extractDetectionsFromImage go very deep.
My problem: Running this with a normal for-loop gives the correct result. When using PARFOR as above, hogpatches only contains 5 instead of 18 structs.
Where can I start to look for the error? I had a global variable the program did change. I removed that already. There is still a global variable 'config' which is however only read.. Any other hints? What could be the problem?
EDIT:
Even if I just run one iteration (size of stphogs is 1), the parfor fails. It doesn't have anything to do with the isempty part. Problem persists if I remove that.
EDIT2:
Ok here I boiled it to a minimal working example.
It is indeed caused by a global variable:
function parGlobalTest()
global testVar;
testVar = 123;
parfor i = 1:1
fprintf('A Value: %d\n', testVar);
testFunction();
end
end
function testFunction()
global testVar;
fprintf('B Value: %d\n', testVar);
end
In this example. The output for A will be 123, for B it will be nothing (undefined).
Why is that?
Ok, here is my solution:
function syncTestVar()
global testVar;
save('syncvar.mat', 'testVar');
pctRunOnAll global testVar;
pctRunOnAll load('syncvar.mat');
end
If someone has a better approach please tell me... This one works though
Btw: The save/load is needed because in my real program, testVar is a complex struct
Related
I asked today a question about Parallel Computing with Matlab-Simulink. Since my earlier question is a bit messy and there are a lot of things in the code which doesnt really belong to the problem.
My problem is
I want to simulate something in a parfor-Loop, while my Simulink-Simulation uses the "From Workspace" block to integrate the needed Data from the workspace into the simulation. For some reason it doesnt work.
My code looks as follows:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
apool = parpool('local');
end
parfor k = 1 : 2
load_system(strcat(path,'\DemoMDL'))
set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
SimOut(k) = sim('DemoMDL')
end
delete(apool);
My simulation looks as follows
The DemoData-File is just a zeros(100,20)-Matrix. It's an example for Data.
Now if I simulate the Script following error message occures:
Errors
Error using DemoScript (line 9)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Caused by:
Error using parallel_function>make_general_channel/channel_general (line 907)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Error using parallel_function>make_general_channel/channel_general (line 907)
Undefined function or variable 'DemoData'.
Now do you have an idea why this happens??
The strange thing is, that if I try to acces the 'DemoData' inside the parfor-Loop it works. For excample with that code:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
apool = parpool('local');
end
parfor k = 1 : 2
load_system(strcat(path,'\DemoMDL'))
set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
fprintf(num2str(DemoData))
end
delete(apool);
Thats my output without simulating and displaying the Data
'>>'DemoScript
00000000000000000 .....
Thanks a lot. That's the original question with a lot more (unnecessary) details:
EarlierQuestion
I suspect the issue is that when MATLAB is pre-processing the parfor loop to determine what variables need to be passed to the workers it does not know what DemoData is. In your first example it's just a string, so no data gets sent over. In your second example it explicitly knows about the variable and hence does pass it over.
You could try either using the Model Workspace, or perhaps just inserting the line
DemoData = DemoData;
in the parfor loop code.
Your error is because workers did not have access to DemoData in the client workspace.
When running parallel simulations with Simulink it would be easier to manage data from workspace if you move them to model workspace. Then each worker can access this data from its model workspace. You can load a MAT file or write MATLAB code to initialize data in model workspace. You can access model workspace using the Simulink model menu View->Model Explorer->Model Workspace.
Also see documentation at http://www.mathworks.com/help/simulink/ug/running-parallel-simulations.html that talks about "Resolving workspace access issues".
You can also move the line
load DemoData
to within the parfor loop. Doing this, you assure that the data will be available in each worker base workspace, wich is accessible to the model, instead of the client workspace.
I've got three functions (qrcalc, zcalc, pcalc) with three unique set of inputs which I want to run in parallel. This is my attempt but it doesn't work:
function [outall]=parallelfunc(in1,in2,in3)
if parpool('size') == 0 % checking to see if pool is already open
A=feature('numCores');
parpool('local',A);
else
parpool close
A=feature('numCores');
parpool('local',A);
end
spmd
if labindex==2
out1=qrcalc(in1);
elseif labindex==3
out2=zcalc(in2);
elseif labindex==4
out3=pcalc(in3);
end
outall=[out1;out2;out3];
end
Error: Error using parallelattempt>(spmd body) (line 20) Error
detected on worker 3. An UndefinedFunction error was thrown on the
workers for 'out1'. This may be because the file containing 'out1' is
not accessible on the workers. Specify the required files for this
parallel pool using the command: addAttachedFiles(pool, ...). See the
documentation for parpool for more details.
Error in parallelattempt>(spmd) (line 11) spmd
Error in parallelattempt (line 11) spmd
Are there any suggestions for how this can be done?
Here is a version of the code that does not require the custom functions. Therefore I replaced them with zeros, magic and ones:
function [outall]=parallelattempt(in1,in2,in3)
poolobj = gcp;
addAttachedFiles(poolobj,{'zeros.m','ones.m','magic.m'})
spmd
if labindex==2
out1=zeros(in1);
elseif labindex==3
out2=magic(in2);
elseif labindex==4
out3=ones(in3);
end
outall=[out1;out2;out3];
end
If you use an spmd-statement, the code inside will be sent to all workers. By the use of labindex you only create the variables outX on one specific worker. The problem is, that outall=[out1;out2;out3]; should now be executed on workers where two outX-variables are not declared. The direct fix for this error is to declare the variables before the spmd-statement (out1=[]; out2=[]; out3=[];). But this is not the best solution.
You can use a single variable inside the spmd-statement instead of several ones (outX), lets call this variable out. Now the code executes on each worker and stores its result in out, which is a Composite-object. Concatenation is not necessary because it is done automatically. Additionally, you can specify with spmd (3) at the beginning of the block that only 3 workers should be used. Composite-objects can be indexed like cell arrays where the index equals to the number of the worker/lab. Therefore we can concatenate it after the block.
This is the specific code for that part:
spmd (3)
if labindex==1
out = qrcalc(in1);
elseif labindex==2
out = zcalc(in2);
elseif labindex==3
out = pcalc(in3);
end
end
outall = [out{1};out{2};out{3}];
Note that the creation of the pool will be done automatically, if none exists. You may need to attach the files of your functions before the statement.
An even better approach in my opinion is to use parfeval here. This does exactly what you want to achieve in the first place - it solves your initial problem. The outX variables get calculated in parallel (non-blocking). With the function fetchOutputs you can block execution until the result is calculated. Use it on all the outX-variables and concatenate it in the same line.
Here is the code for that:
out1 = parfeval(#qrcalc,1,in1);
out2 = parfeval(#zcalc,1,in2);
out3 = parfeval(#pcalc,1,in3);
outall = [fetchOutputs(out1);fetchOutputs(out2);fetchOutputs(out3)];
I'm trying to parallelize my code, and I finally got the parfor loops set up such that Matlab doesn't crash every time. However, I've now got an error that I can't seem to figure out.
I have a driver script (Driver12.m) that calls the script that I'm trying to parallelize (Worker12.m). If I run Worker12.m directly, it usually finishes with no problem. However, every time I try to run it from Driver12.m, it either 1) causes Matlab to crash, or 2) throws a strange error at me. Here's some of my code:
%Driver script
run('(path name)/Worker12.m');
%Relevant worker script snippet
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #1: "Matlab has encountered an unexpected error and needs to close."
Outcome #2: "An UndefinedFunction error was thrown on the workers for ''. This might be because the file containing '' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details. Caused by: Undefined function or variable ""."
However, if I run Worker12.m directly, it works fine. It's only when I run it from the driver script that I get issues. Obviously, this error message from Outcome #2 isn't all that useful. Any suggestions?
Edit: So I created a toy example that reproduces an error, but now both my toy example and the original code are giving me a new, 3rd error. Here's the toy example:
%Driver script
run('parpoolexample.m')
%parpoolexample.m
clear all
new_TS = rand([1000,32,400]);
[number_of_ranges,total_working_channels,~] = size(new_TS);
R_P = zeros(total_working_channels,total_working_channels,number_of_ranges);
R_V = zeros(total_working_channels,total_working_channels,number_of_ranges);
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #3: "Index exceeds matrix dimensions (line 7)."
So, at the 'parfor' line, it's saying that I'm exceeding the matrix dimensions, even though I believe that should not be the case. Now I can't even get my original script to recreate Outcomes #1 or #2.
Don't use run with parallel language constructs like parfor and spmd. Unfortunately it doesn't work very well. Instead, use cd or addpath to let MATLAB see your script.
I am looking for a way to save 'everything' in the matlab session when it is stopped for debugging.
Example
function funmain
a=1;
if a>1
funsub(1)
end
funsub(2)
end
function funsub(c)
b = c + 1;
funsubsub(c)
end
function funsubsub(c)
c = c + 2; %Line with breakpoint
end
When I finally reach the line with the breakpoint, I can easily navigate all workspaces and see where all function calls are made.
The question
How can I preserve this situation?
When debugging nested programs that take a long time to run, I often find myself waiting for a long time to reach a breakpoint. And sometimes I just have to close matlab, or want to try some stuff and later return to this point, so therefore finding a way to store this state would be quite desirable. I work in Windows Server 2008, but would prefer a platform independant solution that does not require installation of any software.
What have I tried
1. Saving all variables in the workspace: This works sometimes, but often I will also need to navigate other workspaces
2. Saving all variables in the calling workspace: This is already better as I can run the lowest function again, but may still be insufficient. Doing this for all nested workspaces is not very convenient, and navigating the saved workspaces may be even worse.
Besides the mentioned inconveniences, this also doesn't allow me to see the exact route via which the breakpoint is reached. Therefore I hope there is a better solution!
Code structure example
The code looks a bit like this
function fmain
fsub1()
fsub2()
fsub3()
end
function fsub1
fsubsub11
fsubsub12
...
fsubsub19
end
function fsub2
fsubsub21
fsubsub22
...
fsubsub29
end
function fsub3
fsubsub31
fsubsub32
...
fsubsub39
end
function fsubsub29
fsubsubsub291
fsubsubsub292% The break may occur in here
...
fsubsubsub299
The break can of course occur anywhere, and normally I would be able to navigate the workspace and all those above it.
Checkpointing
What you're looking to implement is known as checkpointing code. This can be very useful on pieces of code that run for a very long time. Let's take a very simple example:
f=zeros(1e6,1);
for i=1:1e6
f(i) = g(i) + i*2+5; % //do some stuff with f, not important for this example
end
This would obviously take a while on most machines so it would be a pain if it ran half way, and then you had to restart. So let's add a checkpoint!
f=zeros(1e6,1);
i=1; % //start at 1
% //unless there is a previous checkpoint, in which case skip all those iterations
if exist('checkpoint.mat')==2
load('checkpoint.mat'); % //this will load f and i
end
while i<1e6+1
f(i) = g(i) + i*2+5;
i=i+1;
if mod(i,1000)==0 % //let's save our state every 1000 iterations
save('checkpoint.mat','f','i');
end
end
delete('checkpoint.mat') % //make sure to remove it when we're done!
This allows you to quit your code midway through processing without losing all of that computation time. Deciding when and how often to checkpoint is the balance between performance and lost time!
Sample Code Implementation
Your Sample code would need to be updated as follows:
function fmain
sub1done=false; % //These really wouldn't be necessary if each function returns
sub2done=false; % //something, you could just check if the return exists
sub3done=false;
if exist('checkpoint_main.mat')==2, load('checkpoint_main.mat');end
if ~sub1done
fprintf('Entering fsub1\n');
fsub1()
fprintf('Finished with fsub1\n');
sub1done=true;
save('checkpoint_main.mat');
end
if ~sub2done
fprintf('Entering fsub2\n');
fsub2()
fprintf('Finished with fsub2\n');
sub2done=true;
save('checkpoint_main.mat');
end
if ~sub3done
fprintf('Entering fsub3\n');
fsub3()
fprintf('Finished with fsub3\n');
sub3done=true;
save('checkpoint_main.mat');
end
delete('checkpoint_main.mat');
end
function fsub2
subsub21_done=false;subsub22_done=false;...subsub29_done=false;
if exist('checkpoint_fsub2')==2, load('checkpoint_fsub2');end
if ~subsub21_done
fprintf('\tEntering fsubsub21\n');
fsubsub21
fprintf('\tFinished with fsubsub21\n');
subsub21_done=true;
save('checkpoint_fsub2.mat');
end
...
if ~subsub29_done
fprintf('\tEntering fsubsub29\n');
fsubsub29
fprintf('\tFinished with fsubsub29\n');
subsub29_done=true;
save('checkpoint_fsub2.mat');
end
delete('checkpoint_fsub2.mat');
end
function fsubsub29
subsubsub291_done=false;...subsubsub299_done=false;
if exist('checkpoint_fsubsub29.mat')==2,load('checkpoint_fsubsub29.mat');end
if ~subsubsub291_done
fprintf('\t\tEntering fsubsubsub291\n');
fsubsubsub291
fprintf('\t\tFinished with fsubsubsub291\n');
subsubsub291_done=true;
save('checkpoint_fsubsub29.mat');
end
if ~subsubsub292_done
fprintf('\t\tEntering fsubsubsub292\n');
fsubsubsub292% The break may occur in here
fprintf('\t\tFinished with fsubsubsub292\n')
subsubsub292_done=true;
save(checkpoint_fsubsub29.mat');
end
delete('checkpoint_fsubsub29.mat');
end
So in this structure if you restarted the program after it was killed it would resume back to the last saved checkpoint. So for example if the program died in subsubsub291, the program would skip fsub1 altogether, just loading the result. And then it would skip subsub21 all the way down to subsub29 where it would enter subsub29. Then it would skip subsubsub291 and enter 292 where it left off, having loaded all of the variables in that workspace and in previous workspaces. So if you backed out of 292 into 29 you would have the same workspace as if the code just ran. Note that this will also print a nice tree structure as it enters and exits functions to help debug execution order.
Reference:
https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=14781653
After a bit of googling, I found that using putvar (custom function from here: http://au.mathworks.com/matlabcentral/fileexchange/27106-putvar--uigetvar ) solved this.
I have a matlab script which executes 5 algorithm sequentially. All these 5 algorithms needs to run for 10 different initialization.
Whenever there is an error in i-th initialization, the script exit with an error message. I fix the issue(say, data issue) and start running the script again which executes from the first initialization.
I dont want to my code to run for previously executed initialization. ( from 1 run to i-1 the run)
One way is to reassign the value of index to start from i, which in turn require to modify the scrip everytime again and again.
Is there any way to restart the script from the i-th initialization onwards which dont require to modify the script?
I suggest that you use try and catch, and check which indexes succeeded.
function errorIndexes = myScript(indexes)
errorIndexes = [];
errors = {};
for i = indexes
try
%Do something
catch me
errorIndexes(end+1) = i;
errors{end+1} = me;
end
end
end
On the outside you should have a main file like that:
function RunMyScript()
if exist('unRunIndexes.mat','file')
unRunIndexes= load('unRunIndexes.mat');
else
unRunIndexes= 1:n;
end
unRunIndexes= myScript( indexes)
save('unRunIndexes.mat',unRunIndexes);
end
Another technique you might like to consider is checkpointing. I've used something similar with long-running (more than one day) loops operating in an environment where the machine could become unavailable at any time, e.g. distributed clusters of spare machines in a lab.
Basically, you check to see if a 'checkpoint' file exists before you start your loop. If it does than that suggests the loop did not finish successfully last time. It contains information on where the loop got up to as well as any other state that you need to get going again.
Here's a simplified example:
function myFunction()
numIter = 10;
startIter = 1;
checkpointFilename = 'checkpoint.mat';
% Check for presence of checkpoint file suggesting the last run did not
% complete
if exist(checkpointFilename, 'file')
s = load(checkpointFilename);
startIter = s.i;
fprintf('Restarting from iteration %d\n', startIter);
end
for i = startIter:numIter
fprintf('Starting iteration %d\n', i);
expensiveComputation();
save(checkpointFilename, 'i');
end
% We succefully finished. Let's delete our checkpoint file
delete(checkpointFilename);
function expensiveComputation()
% Pretend to do lots of work!
pause(1);
end
end
Running and breaking out with ctrl-c part way through looks like this:
>> myFunction
Starting iteration 1
Starting iteration 2
Starting iteration 3
Starting iteration 4
Operation terminated by user during myFunction/expensiveComputation (line 27)
In myFunction (line 18)
expensiveComputation();
>> myFunction
Restarting from iteration 4
Starting iteration 4
Starting iteration 5
...
You can type (in the command line):
for iter=l:n,
%%% copy - paste your code inside the loop
end