Trouble with running a parallelized script from a driver script - matlab

I'm trying to parallelize my code, and I finally got the parfor loops set up such that Matlab doesn't crash every time. However, I've now got an error that I can't seem to figure out.
I have a driver script (Driver12.m) that calls the script that I'm trying to parallelize (Worker12.m). If I run Worker12.m directly, it usually finishes with no problem. However, every time I try to run it from Driver12.m, it either 1) causes Matlab to crash, or 2) throws a strange error at me. Here's some of my code:
%Driver script
run('(path name)/Worker12.m');
%Relevant worker script snippet
parfor q=1:number_of_ranges
    timenumber = squeeze(new_TS(q,:,:));
    timenumber_shift = circshift(timenumber, [0 1]);
    for m = 1:total_working_channels
        timenumberm = timenumber(m,:);
        for n = 1:total_working_channels
            R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
            R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
        end
    end
end
Outcome #1: "Matlab has encountered an unexpected error and needs to close."
Outcome #2: "An UndefinedFunction error was thrown on the workers for ''. This might be because the file containing '' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details. Caused by: Undefined function or variable ""."
However, if I run Worker12.m directly, it works fine. It's only when I run it from the driver script that I get issues. Obviously, this error message from Outcome #2 isn't all that useful. Any suggestions?
Edit: So I created a toy example that reproduces an error, but now both my toy example and the original code are giving me a new, 3rd error. Here's the toy example:
%Driver script
run('parpoolexample.m')
%parpoolexample.m
clear all
new_TS = rand([1000,32,400]);
[number_of_ranges,total_working_channels,~] = size(new_TS);
R_P = zeros(total_working_channels,total_working_channels,number_of_ranges);
R_V = zeros(total_working_channels,total_working_channels,number_of_ranges);
parfor q=1:number_of_ranges
    timenumber = squeeze(new_TS(q,:,:));
    timenumber_shift = circshift(timenumber, [0 1]);
    for m = 1:total_working_channels
        timenumberm = timenumber(m,:);
        for n = 1:total_working_channels
            R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
            R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
        end
    end
end
Outcome #3: "Index exceeds matrix dimensions (line 7)."
So, at the 'parfor' line, it's saying that I'm exceeding the matrix dimensions, even though I believe that should not be the case. Now I can't even get my original script to recreate Outcomes #1 or #2.

Don't use run with parallel language constructs such as parfor and spmd; unfortunately, that combination doesn't work very well. Instead, use cd or addpath so that MATLAB can see your script, and then call the script by name (for example, Worker12).
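A minimal sketch of that advice, assuming the worker script lives in a folder you can add to the path. To keep it self-contained and runnable, it first writes a tiny stand-in worker script; the folder under tempdir and the script name demoWorker are hypothetical placeholders for '(path name)' and Worker12.m. With real code, only the addpath line and the call by name are needed.

```matlab
% Hypothetical stand-in for the folder that holds Worker12.m
workerDir = fullfile(tempdir, 'demo_worker_dir');
if ~exist(workerDir, 'dir')
    mkdir(workerDir);
end

% Demo only: write a tiny worker script that contains a parfor loop
fid = fopen(fullfile(workerDir, 'demoWorker.m'), 'w');
fprintf(fid, 'result = zeros(1, 4);\n');
fprintf(fid, 'parfor q = 1:4\n    result(q) = q^2;\nend\n');
fclose(fid);

% The part that matters: put the folder on the path, then call the
% script by name instead of going through run(...)
addpath(workerDir);
demoWorker;          % a script runs in the caller's workspace, so 'result' is visible here
rmpath(workerDir);   % optional clean-up once the worker has finished
```

Because the script is called by name, its variables (here result) appear in the driver's workspace just as they would with run.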

Related

Matlab unit tests alternate between pass and fail (pass "odd" runs, fail on "even")

I have some unit tests for code that is doing some very minor image manipulation (combining several small images into a larger image). When I was running the tests, I noticed that three out of four of them failed on the line where they are reading an image from a directory (fails with an index out of bounds error).
However, if I run it again, they all pass. When I was writing the code as well, I noticed that whenever I set a breakpoint in my code, I'd have to run the unit tests twice because (after the very first time) it would run through the tests without hitting any breakpoints.
I have my repo organized like this:
src/
    /* source code .m files are in here */
unit_tests/
    images/
        squares/
            - img1.png
            - img2.png
            ...
            - imgn.png
    - unit_tests.m
And I have a line in my setup (inside unit_tests.m) to generate and add paths for all of the code:
function tests = unit_tests()
    addpath(genpath('..'));
    tests = functiontests(localfunctions);
end
The unit tests all have this format:
function testCompositeImage_2x3(testCase)
    squares = dir('images/squares/*.png');
    num_images = length(squares);
    img = imread([squares(1).folder filesep squares(1).name]); % all same size squares
    rows = 2;
    cols = 3;
    buffer = 2;
    for idx = 1:num_images - (rows*cols)
        imarray = cell(1,(rows*cols));
        n = 1;
        for ii = idx:idx +(rows*cols) -1
            imarray{n} = imread([squares(ii).folder filesep squares(ii).name]);
            n = n + 1;
        end
        newimg = createCompositeImage(rows,cols,imarray, buffer);
        expCols = cols*size(img,1) + (cols+1)*2*buffer;
        expRows = rows*size(img,2) + (rows+1)*2*buffer;
        assert(checksize(newimg, expRows, expCols, 3) == true);
    end
end
("checksize" is just a helper I wrote that returns a boolean b/c assert doesn't compare matrices)
When I launch a fresh matlab session and run the unit tests (using the "Run Tests" button in the editor tab), they pass with this output:
>> runtests('unit_tests\unit_tests.m')
Running unit_tests
.......
Done unit_tests
__________
ans =
1×7 TestResult array with properties:
Name
Passed
Failed
Incomplete
Duration
Details
Totals:
7 Passed, 0 Failed, 0 Incomplete.
0.49467 seconds testing time.
Running it a second time (again, by pressing the button):
>> runtests('unit_tests')
Running unit_tests
..
================================================================================
Error occurred in unit_tests/testCompositeImage_2x2 and it did not run to completion.
---------
Error ID:
---------
'MATLAB:badsubscript'
--------------
Error Details:
--------------
Index exceeds array bounds.
Error in unit_tests>testCompositeImage_2x2 (line 47)
img = imread([squares(1).folder filesep squares(1).name]); % all same size
================================================================================
/*similar error info for the other two failing tests...*/
...
Done unit_tests
__________
Failure Summary:
     Name                                Failed  Incomplete  Reason(s)
    ==================================================================
     unit_tests/testCompositeImage_2x2     X         X        Errored.
    ------------------------------------------------------------------
     unit_tests/testCompositeImage_2x3     X         X        Errored.
    ------------------------------------------------------------------
     unit_tests/testCompositeImage_3x2     X         X        Errored.
ans =
1×7 TestResult array with properties:
Name
Passed
Failed
Incomplete
Duration
Details
Totals:
4 Passed, 3 Failed (rerun), 3 Incomplete.
0.0072287 seconds testing time.
The fact that it's failing on basically the first line because it's not reading anything from the folder leads me to suspect that even when the other 4 tests supposedly pass, they are in fact not running at all. And yet, if I run the tests again, they all pass. Run it a 4th time, and they fail once again.
At first, I thought that perhaps the unit tests were executing too quickly (only on even-numbered runs somehow?), so that they ran before the addpath/genpath call in the setup had finished. I added a pause statement and re-ran the tests, but I had the same issue; the only difference was that it waited for the requisite number of seconds before failing. If I run it again, no problem: all my tests pass.
I am completely at a loss as to why this is happening; I am using vanilla matlab (R2018a) running on a Win10 machine and don't have anything fancy going on. I feel like you should be able to run your unit tests as many times as you like and expect the same result! Is there something I've just somehow overlooked? Or is this some bizarre feature?
Adding my fix just in case someone else runs into the same issue.
As pointed out by Cris, something about the line
addpath(genpath('..'));
causes the GUI to go into a weird state where pressing the "Run Tests" button alternates between calling runtests('unit_tests\unit_tests.m') and runtests('unit_tests') which in turn causes the tests to alternately pass and fail. It does not seem to be an issue with the path variable itself (as it always contains - at a minimum - the necessary directories), but rather something intrinsic to matlab itself. The closest I could get to the root of the issue was the call to the (compiled) dir function inside the genpath function.
The "correct" solution was to remove the line from the unit_tests function entirely and add it to a setupOnce function:
function tests = unit_tests()
    tests = functiontests(localfunctions);
end
function setupOnce(testCase)
    addpath(genpath('..'));
end
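Separately from the setupOnce fix, the line that fails in each test is a dir call with a relative pattern, 'images/squares/*.png', which MATLAB resolves against the current folder. The sketch below is a complementary precaution, not part of the fix above, and its folder names are hypothetical: it shows that such a pattern finds nothing once the current folder differs, while a pattern built from an absolute base folder does not depend on it. Inside a test file, that base folder could come from fileparts(mfilename('fullpath')).

```matlab
% Hypothetical layout mirroring unit_tests/images/squares
baseDir = fullfile(tempdir, 'demo_unit_tests');
imgDir = fullfile(baseDir, 'images', 'squares');
for d = {baseDir, fullfile(baseDir, 'images'), imgDir}
    if ~exist(d{1}, 'dir')
        mkdir(d{1});
    end
end
for k = 1:3   % three empty placeholder "images"
    fclose(fopen(fullfile(imgDir, sprintf('img%d.png', k)), 'w'));
end

oldDir = cd(fullfile(baseDir, 'images'));   % current folder is NOT baseDir
relHits = dir('images/squares/*.png');      % resolved against pwd: finds nothing
absHits = dir(fullfile(imgDir, '*.png'));   % independent of pwd
cd(oldDir);                                 % restore the previous folder

fprintf('relative: %d files, absolute: %d files\n', numel(relHits), numel(absHits));
```

An empty result from dir is exactly what makes squares(1) fail with "Index exceeds array bounds".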
A hack (not recommended) which doesn't require a setupOnce function is the following:
function tests = unit_tests()
    pth = fullfile(fileparts(fileparts(mfilename('fullpath'))),'src');
    paths = strsplit(genpath(pth), pathsep);
    for idx = 1:length(paths) - 1 % last element is empty
        addpath(paths{idx});
    end
    tests = functiontests(localfunctions);
end
I needed to relaunch matlab for the changes to take effect. This worked for my setup using r2018a running on Win10.

Simulink-Simulation with parfor (Parallel Computing)

Earlier today I asked a question about parallel computing with MATLAB/Simulink. Since that question is a bit messy and its code contains a lot of things that don't really belong to the problem, here is a cleaner version.
My problem:
I want to simulate something in a parfor loop, while my Simulink simulation uses the "From Workspace" block to bring the needed data from the workspace into the simulation. For some reason it doesn't work.
My code looks as follows:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
    apool = parpool('local');
end
parfor k = 1 : 2
    load_system(strcat(path,'\DemoMDL'))
    set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
    SimOut(k) = sim('DemoMDL')
end
delete(apool);
My simulation looks as follows
The DemoData file is just a zeros(100,20) matrix; it's example data.
Now, if I run the script, the following error message occurs:
Errors
Error using DemoScript (line 9)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Caused by:
Error using parallel_function>make_general_channel/channel_general (line 907)
Error evaluating parameter 'DataInput' in 'DemoMDL/Mask'
Error using parallel_function>make_general_channel/channel_general (line 907)
Undefined function or variable 'DemoData'.
Do you have any idea why this happens?
The strange thing is that if I try to access DemoData inside the parfor loop, it works. For example, with this code:
load DemoData
path = pwd;
apool = gcp('nocreate');
if isempty(apool)
    apool = parpool('local');
end
parfor k = 1 : 2
    load_system(strcat(path,'\DemoMDL'))
    set_param('DemoMDL/Mask', 'DataInput', 'DemoData')
    fprintf(num2str(DemoData))
end
delete(apool);
That's my output when I only display the data instead of simulating:
>> DemoScript
00000000000000000 .....
Thanks a lot. That's the original question with a lot more (unnecessary) details:
EarlierQuestion
I suspect the issue is that when MATLAB is pre-processing the parfor loop to determine what variables need to be passed to the workers it does not know what DemoData is. In your first example it's just a string, so no data gets sent over. In your second example it explicitly knows about the variable and hence does pass it over.
You could try either using the Model Workspace, or perhaps just inserting the line
DemoData = DemoData;
in the parfor loop code.
Your error is because workers did not have access to DemoData in the client workspace.
When running parallel simulations with Simulink it would be easier to manage data from workspace if you move them to model workspace. Then each worker can access this data from its model workspace. You can load a MAT file or write MATLAB code to initialize data in model workspace. You can access model workspace using the Simulink model menu View->Model Explorer->Model Workspace.
Also see documentation at http://www.mathworks.com/help/simulink/ug/running-parallel-simulations.html that talks about "Resolving workspace access issues".
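You can also move the line
load DemoData
to the workers, so that the data ends up in each worker's base workspace, which is accessible to the model, instead of only in the client workspace. Note that a bare load written directly inside the parfor body is rejected as a transparency violation; one way to run it in the base workspace of every worker is
pctRunOnAll load DemoData
before the parfor loop.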

Execute 3 functions in parallel on 3 workers

I've got three functions (qrcalc, zcalc, pcalc) with three unique sets of inputs, which I want to run in parallel. This is my attempt, but it doesn't work:
function [outall]=parallelfunc(in1,in2,in3)
    if parpool('size') == 0 % checking to see if pool is already open
        A=feature('numCores');
        parpool('local',A);
    else
        parpool close
        A=feature('numCores');
        parpool('local',A);
    end
    spmd
        if labindex==2
            out1=qrcalc(in1);
        elseif labindex==3
            out2=zcalc(in2);
        elseif labindex==4
            out3=pcalc(in3);
        end
        outall=[out1;out2;out3];
    end
Error: Error using parallelattempt>(spmd body) (line 20) Error
detected on worker 3. An UndefinedFunction error was thrown on the
workers for 'out1'. This may be because the file containing 'out1' is
not accessible on the workers. Specify the required files for this
parallel pool using the command: addAttachedFiles(pool, ...). See the
documentation for parpool for more details.
Error in parallelattempt>(spmd) (line 11) spmd
Error in parallelattempt (line 11) spmd
Are there any suggestions for how this can be done?
Here is a version of the code that does not require the custom functions; I replaced them with zeros, magic and ones:
function [outall]=parallelattempt(in1,in2,in3)
    poolobj = gcp;
    addAttachedFiles(poolobj,{'zeros.m','ones.m','magic.m'})
    spmd
        if labindex==2
            out1=zeros(in1);
        elseif labindex==3
            out2=magic(in2);
        elseif labindex==4
            out3=ones(in3);
        end
        outall=[out1;out2;out3];
    end
If you use an spmd statement, the code inside is sent to all workers. By using labindex, you create each outX variable on only one specific worker. The problem is that outall=[out1;out2;out3]; is then executed on every worker, and on each of them at least two of the outX variables are undefined. The direct fix for this error is to declare the variables before the spmd statement (out1=[]; out2=[]; out3=[];), but this is not the best solution.
You can use a single variable inside the spmd statement instead of several (outX); let's call this variable out. The code then executes on each worker and stores its result in out, which becomes a Composite object on the client. No concatenation is needed inside the block, because the per-worker results are collected in the Composite automatically. Additionally, you can write spmd (3) at the beginning of the block to specify that only 3 workers should be used. Composite objects can be indexed like cell arrays, where the index is the worker (lab) number, so we can concatenate the results after the block.
This is the specific code for that part:
spmd (3)
    if labindex==1
        out = qrcalc(in1);
    elseif labindex==2
        out = zcalc(in2);
    elseif labindex==3
        out = pcalc(in3);
    end
end
outall = [out{1};out{2};out{3}];
Note that the creation of the pool will be done automatically, if none exists. You may need to attach the files of your functions before the statement.
An even better approach in my opinion is to use parfeval here. This does exactly what you want to achieve in the first place - it solves your initial problem. The outX variables get calculated in parallel (non-blocking). With the function fetchOutputs you can block execution until the result is calculated. Use it on all the outX-variables and concatenate it in the same line.
Here is the code for that:
out1 = parfeval(@qrcalc,1,in1);
out2 = parfeval(@zcalc,1,in2);
out3 = parfeval(@pcalc,1,in3);
outall = [fetchOutputs(out1);fetchOutputs(out2);fetchOutputs(out3)];

Different results using PARFOR and FOR

I'm sorry that currently I'm not able to boil my code down to a minimal example.
It's a huge bunch of image processing code.
I have a loop that iterates over images (descriptors in variable stphogs) and for each image runs a detection.
function hogpatches = extractDetectionsFromImages(stphogs, poselet)
    hogpatches = cell(1,length(stphogs));
    parfor i = 1:length(stphogs)
        tmp = extractDetectionsFromImage(stphogs(i), poselet); %e.g. 1x6 struct
        if ~isempty(tmp)
            hogpatches{i} = tmp;
        end
    end
    hogpatches = cell2mat(hogpatches);
end
So this is the main loop. But the function calls in extractDetectionsFromImage go very deep.
My problem: Running this with a normal for-loop gives the correct result. When using PARFOR as above, hogpatches only contains 5 instead of 18 structs.
Where can I start looking for the error? I had a global variable that the program changed; I removed that already. There is still a global variable 'config', but it is only read. Any other hints? What could be the problem?
EDIT:
Even if I just run one iteration (size of stphogs is 1), the parfor fails. It doesn't have anything to do with the isempty part. Problem persists if I remove that.
EDIT2:
OK, here I boiled it down to a minimal working example.
It is indeed caused by a global variable:
function parGlobalTest()
    global testVar;
    testVar = 123;
    parfor i = 1:1
        fprintf('A Value: %d\n', testVar);
        testFunction();
    end
end
function testFunction()
    global testVar;
    fprintf('B Value: %d\n', testVar);
end
In this example, the output for A is 123, but for B it is empty (undefined).
Why is that?
Ok, here is my solution:
function syncTestVar()
    global testVar;
    save('syncvar.mat', 'testVar');
    pctRunOnAll global testVar;
    pctRunOnAll load('syncvar.mat');
end
If someone has a better approach please tell me... This one works though
Btw: The save/load is needed because in my real program, testVar is a complex struct
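Why B prints nothing: each parfor worker is a separate MATLAB process with its own global workspace, so the global testVar that testFunction declares on a worker is never assigned there. A prints 123, presumably because testVar appears directly in the loop body, so its value is sent to the workers along with the loop. An alternative to synchronizing globals (a different technique from the pctRunOnAll solution above) is to pass the value into the called function explicitly. A minimal sketch, in which the anonymous function useConfig is a hypothetical stand-in for testFunction:

```matlab
% Value that would otherwise live in a global (e.g. testVar or config)
testVar = 123;

% Hypothetical stand-in for testFunction: it receives the value as an input
% instead of declaring a global
useConfig = @(cfg, i) cfg + i;

results = zeros(1, 4);
parfor i = 1:4
    % testVar is a broadcast variable, so every worker gets its own copy
    results(i) = useConfig(testVar, i);
end
disp(results)
```

The same idea would apply to the read-only config global: pass config as an extra argument down the call chain that starts at extractDetectionsFromImage.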

How to restart matlab script from where it left off?

I have a MATLAB script that executes 5 algorithms sequentially. All 5 algorithms need to run for 10 different initializations.
Whenever there is an error in the i-th initialization, the script exits with an error message. I fix the issue (say, a data issue) and run the script again, which starts over from the first initialization.
I don't want my code to rerun the initializations that were already executed (runs 1 to i-1).
One way is to change the starting value of the index to i, but that requires modifying the script again every time.
Is there any way to restart the script from the i-th initialization onwards without modifying the script?
I suggest that you use try and catch, and check which indexes succeeded.
function errorIndexes = myScript(indexes)
    errorIndexes = [];
    errors = {};
    for i = indexes
        try
            %Do something
        catch me
            errorIndexes(end+1) = i;
            errors{end+1} = me;
        end
    end
end
On the outside, you could have a main file like this:
function RunMyScript(n)
    % n: total number of initializations
    if exist('unRunIndexes.mat','file')
        s = load('unRunIndexes.mat');
        unRunIndexes = s.unRunIndexes;
    else
        unRunIndexes = 1:n;
    end
    unRunIndexes = myScript(unRunIndexes);
    save('unRunIndexes.mat','unRunIndexes');
end
Another technique you might like to consider is checkpointing. I've used something similar with long-running (more than one day) loops operating in an environment where the machine could become unavailable at any time, e.g. distributed clusters of spare machines in a lab.
Basically, you check whether a 'checkpoint' file exists before you start your loop. If it does, then that suggests the loop did not finish successfully last time. It contains information on where the loop got up to, as well as any other state that you need to get going again.
Here's a simplified example:
function myFunction()
    numIter = 10;
    startIter = 1;
    checkpointFilename = 'checkpoint.mat';
    % Check for presence of checkpoint file suggesting the last run did not
    % complete
    if exist(checkpointFilename, 'file')
        s = load(checkpointFilename);
        startIter = s.i + 1;   % resume after the last completed iteration
        fprintf('Restarting from iteration %d\n', startIter);
    end
    for i = startIter:numIter
        fprintf('Starting iteration %d\n', i);
        expensiveComputation();
        save(checkpointFilename, 'i');
    end
    % We successfully finished. Let's delete our checkpoint file
    delete(checkpointFilename);
    function expensiveComputation()
        % Pretend to do lots of work!
        pause(1);
    end
end
Running and breaking out with ctrl-c part way through looks like this:
>> myFunction
Starting iteration 1
Starting iteration 2
Starting iteration 3
Starting iteration 4
Operation terminated by user during myFunction/expensiveComputation (line 27)
In myFunction (line 18)
expensiveComputation();
>> myFunction
Restarting from iteration 4
Starting iteration 4
Starting iteration 5
...
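The resume arithmetic is easy to get off by one: the checkpoint records the last iteration that completed, so the loop should pick up at the next one. The sketch below exercises that logic unattended. It is a script version of the same checkpoint pattern, with the checkpoint file placed in tempdir and a simulated error standing in for a data error or Ctrl-C (failAt is a hypothetical variable used only for this demo).

```matlab
checkpointFilename = fullfile(tempdir, 'demo_checkpoint.mat');
if exist(checkpointFilename, 'file')
    delete(checkpointFilename);   % start from a clean state
end
numIter = 6;
completed = [];                   % iterations that ran to completion, in order

for attempt = 1:2
    if attempt == 1
        failAt = 4;               % first attempt dies during iteration 4
    else
        failAt = Inf;             % second attempt runs to the end
    end
    startIter = 1;
    if exist(checkpointFilename, 'file')
        s = load(checkpointFilename);
        startIter = s.i + 1;      % resume after the last completed iteration
    end
    try
        for i = startIter:numIter
            if i == failAt
                error('demo:interrupted', 'Simulated failure in iteration %d', i);
            end
            completed(end+1) = i; %#ok<AGROW>
            save(checkpointFilename, 'i');
        end
        delete(checkpointFilename);   % finished: remove the checkpoint
    catch err
        fprintf('Attempt %d stopped: %s\n', attempt, err.message);
    end
end
disp(completed)
```

Each iteration appears exactly once in completed, and the checkpoint file is gone after the second attempt.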
You can type (in the command line):
for iter=l:n,
    %%% copy - paste your code inside the loop
end