How to restart a MATLAB script from where it left off?

I have a MATLAB script which executes 5 algorithms sequentially. All 5 algorithms need to run for 10 different initializations.
Whenever there is an error in the i-th initialization, the script exits with an error message. I fix the issue (say, a data issue) and start the script again, which then executes from the first initialization.
I don't want my code to rerun the initializations that have already completed (runs 1 to i-1).
One way is to reassign the loop index to start from i, but that requires modifying the script every time.
Is there any way to restart the script from the i-th initialization onwards without modifying the script?

I suggest that you use try and catch, and check which indexes succeeded.
function errorIndexes = myScript(indexes)
errorIndexes = [];
errors = {};
for i = indexes
try
%Do something
catch me
errorIndexes(end+1) = i;
errors{end+1} = me;
end
end
end
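On the outside you should have a main file like this:
function RunMyScript()
n = 10; % number of initializations, as in the question
if exist('unRunIndexes.mat','file')
% load returns a struct, so pull the saved variable out of it
s = load('unRunIndexes.mat');
unRunIndexes = s.unRunIndexes;
else
unRunIndexes = 1:n;
end
% run only the outstanding indexes and keep the ones that failed again
unRunIndexes = myScript(unRunIndexes);
% save expects the variable name as a string
save('unRunIndexes.mat','unRunIndexes');
end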

Another technique you might like to consider is checkpointing. I've used something similar with long-running (more than one day) loops operating in an environment where the machine could become unavailable at any time, e.g. distributed clusters of spare machines in a lab.
Basically, you check whether a 'checkpoint' file exists before you start your loop. If it does, then the loop did not finish successfully last time. The file contains information on where the loop got up to, as well as any other state that you need to get going again.
Here's a simplified example:
function myFunction()
numIter = 10;
startIter = 1;
checkpointFilename = 'checkpoint.mat';
% Check for presence of checkpoint file suggesting the last run did not
% complete
if exist(checkpointFilename, 'file')
s = load(checkpointFilename);
startIter = s.i + 1; % s.i is the last iteration that completed
fprintf('Restarting from iteration %d\n', startIter);
end
for i = startIter:numIter
fprintf('Starting iteration %d\n', i);
expensiveComputation();
save(checkpointFilename, 'i');
end
% We successfully finished. Let's delete our checkpoint file
delete(checkpointFilename);
function expensiveComputation()
% Pretend to do lots of work!
pause(1);
end
end
Running and breaking out with ctrl-c part way through looks like this:
>> myFunction
Starting iteration 1
Starting iteration 2
Starting iteration 3
Starting iteration 4
Operation terminated by user during myFunction/expensiveComputation (line 27)
In myFunction (line 18)
expensiveComputation();
>> myFunction
Restarting from iteration 4
Starting iteration 4
Starting iteration 5
...
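The answer mentions that the checkpoint file can also hold "any other state that you need to get going again". A minimal sketch of that idea follows, assuming a hypothetical function checkpointDemo, a hypothetical failAt argument that simulates a crash, and a checkpoint file in tempdir; none of these names come from the answer above. It saves the partial results together with the loop index and resumes at the iteration after the last one saved.

```matlab
function results = checkpointDemo(numIter, failAt)
% Hypothetical sketch: checkpoint the loop index and the partial results.
% failAt is the iteration at which a crash is simulated (0 = never).
checkpointFilename = fullfile(tempdir, 'checkpoint_demo_state.mat');
startIter = 1;
results = zeros(1, numIter);

% A leftover checkpoint means the previous run did not finish.
if exist(checkpointFilename, 'file') == 2
    s = load(checkpointFilename);
    startIter = s.i + 1;      % s.i is the last iteration that completed
    results = s.results;      % restore the work done so far
    fprintf('Restarting from iteration %d\n', startIter);
end

for i = startIter:numIter
    if i == failAt
        error('checkpointDemo:simulated', ...
              'Simulated failure in iteration %d', i);
    end
    results(i) = i^2;         % stand-in for the expensive computation
    save(checkpointFilename, 'i', 'results');
end

% Finished successfully, so the checkpoint is no longer needed.
if exist(checkpointFilename, 'file') == 2
    delete(checkpointFilename);
end
end
```

Calling checkpointDemo(5, 3) stops with the simulated error in iteration 3. A following call to checkpointDemo(5, 0) picks up at iteration 3 and returns the complete result vector.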

You can type the loop directly at the command line, with l set to the index of the first initialization that has not yet run:
for iter=l:n,
%%% copy - paste your code inside the loop
end

Related

Matlab unit tests alternate between pass and fail (pass "odd" runs, fail on "even")

I have some unit tests for code that is doing some very minor image manipulation (combining several small images into a larger image). When I was running the tests, I noticed that three out of four of them failed on the line where they are reading an image from a directory (fails with an index out of bounds error).
However, if I run it again, they all pass. When I was writing the code as well, I noticed that whenever I set a breakpoint in my code, I'd have to run the unit tests twice because (after the very first time) it would run through the tests without hitting any breakpoints.
I have my repo organized like this:
src/
/* source code .m files are in here */
unit_tests/
images/
squares/
- img1.png
- img2.png
...
- imgn.png
- unit_tests.m
And I have a line in my setup (inside unit_tests.m) to generate and add paths for all of the code:
function tests = unit_tests()
addpath(genpath('..'));
tests = functiontests(localfunctions);
end
The unit tests all have this format:
function testCompositeImage_2x3(testCase)
squares = dir('images/squares/*.png');
num_images = length(squares);
img = imread([squares(1).folder filesep squares(1).name]); % all same size squares
rows = 2;
cols = 3;
buffer = 2;
for idx = 1:num_images - (rows*cols)
imarray = cell(1,(rows*cols));
n = 1;
for ii = idx:idx +(rows*cols) -1
imarray{n} = imread([squares(ii).folder filesep squares(ii).name]);
n = n + 1;
end
newimg = createCompositeImage(rows,cols,imarray, buffer);
expCols = cols*size(img,1) + (cols+1)*2*buffer;
expRows = rows*size(img,2) + (rows+1)*2*buffer;
assert(checksize(newimg, expRows, expCols, 3) == true);
end
end
("checksize" is just a helper I wrote that returns a boolean b/c assert doesn't compare matrices)
When I launch a fresh matlab session and run the unit tests (using the "Run Tests" button in the editor tab), they pass with this output:
>> runtests('unit_tests\unit_tests.m')
Running unit_tests
.......
Done unit_tests
__________
ans =
1×7 TestResult array with properties:
Name
Passed
Failed
Incomplete
Duration
Details
Totals:
7 Passed, 0 Failed, 0 Incomplete.
0.49467 seconds testing time.
Running it a second time (again, by pressing the button):
>> runtests('unit_tests')
Running unit_tests
..
================================================================================
Error occurred in unit_tests/testCompositeImage_2x2 and it did not run to completion.
---------
Error ID:
---------
'MATLAB:badsubscript'
--------------
Error Details:
--------------
Index exceeds array bounds.
Error in unit_tests>testCompositeImage_2x2 (line 47)
img = imread([squares(1).folder filesep squares(1).name]); % all same size
================================================================================
/*similar error info for the other two failing tests...*/
...
Done unit_tests
__________
Failure Summary:
Name Failed Incomplete Reason(s)
==================================================================
unit_tests/testCompositeImage_2x2 X X Errored.
------------------------------------------------------------------
unit_tests/testCompositeImage_2x3 X X Errored.
------------------------------------------------------------------
unit_tests/testCompositeImage_3x2 X X Errored.
ans =
1×7 TestResult array with properties:
Name
Passed
Failed
Incomplete
Duration
Details
Totals:
4 Passed, 3 Failed (rerun), 3 Incomplete.
0.0072287 seconds testing time.
The fact that it's failing on basically the first line because it's not reading anything from the folder leads me to suspect that even when the other 4 tests supposedly pass, they are in fact not running at all. And yet, if I run the tests again, they all pass. Run it a 4th time, and they fail once again.
At first, I thought that perhaps the unit tests were executing too quickly (only on even-numbered runs somehow?) and it was running the unit tests before the addpath/genpath functions in the setup had finished, so I added a pause statement and re-ran the tests, but I had the same issue only this time it would wait for the requisite number of seconds before going ahead and failing. If I run it again, no problem - all my tests pass.
I am completely at a loss as to why this is happening; I am using vanilla matlab (R2018a) running on a Win10 machine and don't have anything fancy going on. I feel like you should be able to run your unit tests as many times as you like and expect the same result! Is there something I've just somehow overlooked? Or is this some bizarre feature?
Adding my fix just in case someone else runs into the same issue.
As pointed out by Cris, something about the line
addpath(genpath('..'));
causes the GUI to go into a weird state where pressing the "Run Tests" button alternates between calling runtests('unit_tests\unit_tests.m') and runtests('unit_tests') which in turn causes the tests to alternately pass and fail. It does not seem to be an issue with the path variable itself (as it always contains - at a minimum - the necessary directories), but rather something intrinsic to matlab itself. The closest I could get to the root of the issue was the call to the (compiled) dir function inside the genpath function.
The "correct" solution was to remove the line from the unit_tests function entirely and add it to a setupOnce function:
function tests = unit_tests()
tests = functiontests(localfunctions);
end
function setupOnce(testCase)
addpath(genpath('..'));
end
A hack (not recommended) which doesn't require a setupOnce function is the following:
function tests = unit_tests()
pth = fullfile(fileparts(fileparts(mfilename('fullpath'))),'src');
paths = regexp(genpath(pth), ';', 'split');
for idx = 1:length(paths) - 1 % last element is empty
addpath(paths{idx});
end
tests = functiontests(localfunctions); % the output must still be assigned
end
I needed to relaunch matlab for the changes to take effect. This worked for my setup using r2018a running on Win10.
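A sketch of how the setupOnce approach could be made independent of the current folder, on the assumption that the relative paths ('..' and 'images/squares') are what make the outcome depend on how the tests are launched. The TestData field names and the test name testImagesAreFound are hypothetical; the folder layout is the one described in the question.

```matlab
function tests = unit_tests()
tests = functiontests(localfunctions);
end

function setupOnce(testCase)
% Resolve everything relative to this file, not to the current folder.
testDir = fileparts(mfilename('fullpath'));
testCase.TestData.imageDir = fullfile(testDir, 'images', 'squares');
% addpath returns the previous path, which is kept for teardownOnce.
srcDir = fullfile(fileparts(testDir), 'src');
testCase.TestData.oldPath = addpath(genpath(srcDir));
end

function teardownOnce(testCase)
% Restore the path that was in effect before the tests ran.
path(testCase.TestData.oldPath);
end

function testImagesAreFound(testCase)
% Fails with a readable message instead of an index error if dir is empty.
squares = dir(fullfile(testCase.TestData.imageDir, '*.png'));
verifyNotEmpty(testCase, squares);
end
```

With the image folder stored in TestData, each test can build its file names from testCase.TestData.imageDir rather than from a path relative to wherever MATLAB happens to be.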

Execute 3 functions in parallel on 3 workers

I've got three functions (qrcalc, zcalc, pcalc) with three unique set of inputs which I want to run in parallel. This is my attempt but it doesn't work:
function [outall]=parallelfunc(in1,in2,in3)
if parpool('size') == 0 % checking to see if pool is already open
A=feature('numCores');
parpool('local',A);
else
parpool close
A=feature('numCores');
parpool('local',A);
end
spmd
if labindex==2
out1=qrcalc(in1);
elseif labindex==3
out2=zcalc(in2);
elseif labindex==4
out3=pcalc(in3);
end
outall=[out1;out2;out3];
end
Error: Error using parallelattempt>(spmd body) (line 20) Error
detected on worker 3. An UndefinedFunction error was thrown on the
workers for 'out1'. This may be because the file containing 'out1' is
not accessible on the workers. Specify the required files for this
parallel pool using the command: addAttachedFiles(pool, ...). See the
documentation for parpool for more details.
Error in parallelattempt>(spmd) (line 11) spmd
Error in parallelattempt (line 11) spmd
Are there any suggestions for how this can be done?
Here is a version of the code that does not require the custom functions. Therefore I replaced them with zeros, magic and ones:
function [outall]=parallelattempt(in1,in2,in3)
poolobj = gcp;
addAttachedFiles(poolobj,{'zeros.m','ones.m','magic.m'})
spmd
if labindex==2
out1=zeros(in1);
elseif labindex==3
out2=magic(in2);
elseif labindex==4
out3=ones(in3);
end
outall=[out1;out2;out3];
end
If you use an spmd statement, the code inside is sent to all workers. Because of the labindex checks, each outX variable is created on only one specific worker. The problem is that outall=[out1;out2;out3]; is then executed on every worker, and on each of them at least two of the outX variables are not defined. The direct fix for this error is to declare the variables before the spmd statement (out1=[]; out2=[]; out3=[];). But this is not the best solution.
You can use a single variable inside the spmd statement instead of several (outX); let's call this variable out. Now the code executes on each worker and stores its result in out, which is a Composite object on the client. Concatenation inside the block is not necessary because the per-worker values are collected in the Composite automatically. Additionally, you can specify with spmd (3) at the beginning of the block that only 3 workers should be used. Composite objects can be indexed like cell arrays, where the index is the number of the worker/lab. Therefore we can concatenate the results after the block.
This is the specific code for that part:
spmd (3)
if labindex==1
out = qrcalc(in1);
elseif labindex==2
out = zcalc(in2);
elseif labindex==3
out = pcalc(in3);
end
end
outall = [out{1};out{2};out{3}];
Note that the creation of the pool will be done automatically, if none exists. You may need to attach the files of your functions before the statement.
An even better approach in my opinion is to use parfeval here. This does exactly what you want to achieve in the first place - it solves your initial problem. The outX variables get calculated in parallel (non-blocking). With the function fetchOutputs you can block execution until the result is calculated. Use it on all the outX-variables and concatenate it in the same line.
Here is the code for that:
out1 = parfeval(@qrcalc,1,in1);
out2 = parfeval(@zcalc,1,in2);
out3 = parfeval(@pcalc,1,in3);
outall = [fetchOutputs(out1);fetchOutputs(out2);fetchOutputs(out3)];
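For trying this out without the custom functions, here is a small sketch that uses the built-ins zeros, magic and ones as stand-ins, as the question did. It assumes the Parallel Computing Toolbox is installed; the variable names f1, f2 and f3 are arbitrary.

```matlab
% Each parfeval call returns a future immediately; the work runs on the pool.
f1 = parfeval(@zeros, 1, 2, 3);   % stand-in for qrcalc: zeros(2,3)
f2 = parfeval(@magic, 1, 3);      % stand-in for zcalc:  magic(3)
f3 = parfeval(@ones,  1, 1, 3);   % stand-in for pcalc:  ones(1,3)

% fetchOutputs blocks until the corresponding future has finished.
outall = [fetchOutputs(f1); fetchOutputs(f2); fetchOutputs(f3)];
disp(size(outall))
```

The second argument of parfeval is the number of outputs requested from the function, and the remaining arguments are passed to the function itself.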

Trouble with running a parallelized script from a driver script

I'm trying to parallelize my code, and I finally got the parfor loops set up such that Matlab doesn't crash every time. However, I've now got an error that I can't seem to figure out.
I have a driver script (Driver12.m) that calls the script that I'm trying to parallelize (Worker12.m). If I run Worker12.m directly, it usually finishes with no problem. However, every time I try to run it from Driver12.m, it either 1) causes Matlab to crash, or 2) throws a strange error at me. Here's some of my code:
%Driver script
run('(path name)/Worker12.m');
%Relevant worker script snippet
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #1: "Matlab has encountered an unexpected error and needs to close."
Outcome #2: "An UndefinedFunction error was thrown on the workers for ''. This might be because the file containing '' is not accessible on the workers. Use addAttachedFiles(pool, files) to specify the required files to be attached. See the documentation for 'parallel.Pool/addAttachedFiles' for more details. Caused by: Undefined function or variable ""."
However, if I run Worker12.m directly, it works fine. It's only when I run it from the driver script that I get issues. Obviously, this error message from Outcome #2 isn't all that useful. Any suggestions?
Edit: So I created a toy example that reproduces an error, but now both my toy example and the original code are giving me a new, 3rd error. Here's the toy example:
%Driver script
run('parpoolexample.m')
%parpoolexample.m
clear all
new_TS = rand([1000,32,400]);
[number_of_ranges,total_working_channels,~] = size(new_TS);
R_P = zeros(total_working_channels,total_working_channels,number_of_ranges);
R_V = zeros(total_working_channels,total_working_channels,number_of_ranges);
parfor q=1:number_of_ranges
timenumber = squeeze(new_TS(q,:,:));
timenumber_shift = circshift(timenumber, [0 1]);
for m = 1:total_working_channels
timenumberm = timenumber(m,:);
for n = 1:total_working_channels
R_P(m,n,q) = mean(timenumberm.*conj(timenumber(n,:)),2);
R_V(m,n,q) = mean(timenumberm.*conj(timenumber_shift(n,:)),2);
end
end
end
Outcome #3: "Index exceeds matrix dimensions (line 7)."
So, at the 'parfor' line, it's saying that I'm exceeding the matrix dimensions, even though I believe that should not be the case. Now I can't even get my original script to recreate Outcomes #1 or #2.
Don't use run with parallel language constructs like parfor and spmd. Unfortunately it doesn't work very well. Instead, use cd or addpath to let MATLAB see your script.
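A minimal sketch of the addpath route, using a throwaway script written to a temporary folder as a stand-in for Worker12.m. The script name demoWorker and its contents are hypothetical; without the Parallel Computing Toolbox the parfor loop simply runs serially.

```matlab
% Create a small script in a temporary folder (stand-in for Worker12.m).
scriptDir = tempname;
mkdir(scriptDir);
fid = fopen(fullfile(scriptDir, 'demoWorker.m'), 'w');
fprintf(fid, 'total = 0;\nparfor q = 1:4\n    total = total + q^2;\nend\n');
fclose(fid);

% Driver part: make the folder visible, then call the script by name
% instead of going through run('...').
addpath(scriptDir);
demoWorker;
disp(total)

% Clean up the path entry added above.
rmpath(scriptDir);
```

In the original setup this would amount to an addpath call on the folder that holds Worker12.m, followed by a plain Worker12 line in Driver12.m.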

Save the debug state in matlab

I am looking for a way to save 'everything' in the matlab session when it is stopped for debugging.
Example
function funmain
a=1;
if a>1
funsub(1)
end
funsub(2)
end
function funsub(c)
b = c + 1;
funsubsub(c)
end
function funsubsub(c)
c = c + 2; %Line with breakpoint
end
When I finally reach the line with the breakpoint, I can easily navigate all workspaces and see where all function calls are made.
The question
How can I preserve this situation?
When debugging nested programs that take a long time to run, I often find myself waiting for a long time to reach a breakpoint. And sometimes I just have to close matlab, or want to try some stuff and later return to this point, so finding a way to store this state would be quite desirable. I work on Windows Server 2008, but would prefer a platform-independent solution that does not require installing any software.
What have I tried
1. Saving all variables in the workspace: This works sometimes, but often I will also need to navigate other workspaces
2. Saving all variables in the calling workspace: This is already better as I can run the lowest function again, but may still be insufficient. Doing this for all nested workspaces is not very convenient, and navigating the saved workspaces may be even worse.
Besides the mentioned inconveniences, this also doesn't allow me to see the exact route via which the breakpoint is reached. Therefore I hope there is a better solution!
Code structure example
The code looks a bit like this
function fmain
fsub1()
fsub2()
fsub3()
end
function fsub1
fsubsub11
fsubsub12
...
fsubsub19
end
function fsub2
fsubsub21
fsubsub22
...
fsubsub29
end
function fsub3
fsubsub31
fsubsub32
...
fsubsub39
end
function fsubsub29
fsubsubsub291
fsubsubsub292% The break may occur in here
...
fsubsubsub299
The break can of course occur anywhere, and normally I would be able to navigate the workspace and all those above it.
Checkpointing
What you're looking to implement is known as checkpointing code. This can be very useful on pieces of code that run for a very long time. Let's take a very simple example:
f=zeros(1e6,1);
for i=1:1e6
f(i) = g(i) + i*2+5; % //do some stuff with f, not important for this example
end
This would obviously take a while on most machines so it would be a pain if it ran half way, and then you had to restart. So let's add a checkpoint!
f=zeros(1e6,1);
i=1; % //start at 1
% //unless there is a previous checkpoint, in which case skip all those iterations
if exist('checkpoint.mat')==2
load('checkpoint.mat'); % //this will load f and i
end
while i<1e6+1
f(i) = g(i) + i*2+5;
i=i+1;
if mod(i,1000)==0 % //let's save our state every 1000 iterations
save('checkpoint.mat','f','i');
end
end
delete('checkpoint.mat') % //make sure to remove it when we're done!
This allows you to quit your code midway through processing without losing all of that computation time. Deciding when and how often to checkpoint is the balance between performance and lost time!
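One way to handle the "how often" question is to checkpoint by elapsed time instead of by iteration count, so the cost of saving stays roughly constant however fast the iterations are. A sketch under assumed values: the file location, the 0.5 second interval and the loop body are placeholders, not part of the answer above.

```matlab
checkpointFile = fullfile(tempdir, 'checkpoint_timed_demo.mat'); % assumed location
saveInterval = 0.5;          % seconds between checkpoints (assumed value)
n = 2000;
f = zeros(n, 1);
i = 1;

% Resume from an earlier, unfinished run if a checkpoint is present.
if exist(checkpointFile, 'file') == 2
    load(checkpointFile, 'f', 'i');
end

lastSave = tic;
while i <= n
    f(i) = i*2 + 5;          % stand-in for the real work
    i = i + 1;
    % Save only when enough wall-clock time has passed since the last save.
    if toc(lastSave) >= saveInterval
        save(checkpointFile, 'f', 'i');
        lastSave = tic;
    end
end

% Finished: remove the checkpoint if one was written.
if exist(checkpointFile, 'file') == 2
    delete(checkpointFile);
end
```

With this variant the most work that can be lost is about saveInterval seconds, regardless of how long a single iteration takes.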
Sample Code Implementation
Your Sample code would need to be updated as follows:
function fmain
sub1done=false; % //These really wouldn't be necessary if each function returns
sub2done=false; % //something, you could just check if the return exists
sub3done=false;
if exist('checkpoint_main.mat')==2, load('checkpoint_main.mat');end
if ~sub1done
fprintf('Entering fsub1\n');
fsub1()
fprintf('Finished with fsub1\n');
sub1done=true;
save('checkpoint_main.mat');
end
if ~sub2done
fprintf('Entering fsub2\n');
fsub2()
fprintf('Finished with fsub2\n');
sub2done=true;
save('checkpoint_main.mat');
end
if ~sub3done
fprintf('Entering fsub3\n');
fsub3()
fprintf('Finished with fsub3\n');
sub3done=true;
save('checkpoint_main.mat');
end
delete('checkpoint_main.mat');
end
function fsub2
subsub21_done=false;subsub22_done=false;...subsub29_done=false;
if exist('checkpoint_fsub2.mat')==2, load('checkpoint_fsub2.mat');end
if ~subsub21_done
fprintf('\tEntering fsubsub21\n');
fsubsub21
fprintf('\tFinished with fsubsub21\n');
subsub21_done=true;
save('checkpoint_fsub2.mat');
end
...
if ~subsub29_done
fprintf('\tEntering fsubsub29\n');
fsubsub29
fprintf('\tFinished with fsubsub29\n');
subsub29_done=true;
save('checkpoint_fsub2.mat');
end
delete('checkpoint_fsub2.mat');
end
function fsubsub29
subsubsub291_done=false;...subsubsub299_done=false;
if exist('checkpoint_fsubsub29.mat')==2,load('checkpoint_fsubsub29.mat');end
if ~subsubsub291_done
fprintf('\t\tEntering fsubsubsub291\n');
fsubsubsub291
fprintf('\t\tFinished with fsubsubsub291\n');
subsubsub291_done=true;
save('checkpoint_fsubsub29.mat');
end
if ~subsubsub292_done
fprintf('\t\tEntering fsubsubsub292\n');
fsubsubsub292% The break may occur in here
fprintf('\t\tFinished with fsubsubsub292\n');
subsubsub292_done=true;
save('checkpoint_fsubsub29.mat');
end
delete('checkpoint_fsubsub29.mat');
end
So in this structure, if you restarted the program after it was killed, it would resume from the last saved checkpoint. For example, if the program died in subsubsub292, it would skip fsub1 altogether, just loading the result. Then, inside fsub2, it would skip subsub21 through subsub28 and enter subsub29. There it would skip subsubsub291 and enter subsubsub292 where it left off, having loaded all of the variables in that workspace and in the previous workspaces. So if you backed out of 292 into 29, you would have the same workspace as if the code had just run. Note that this also prints a nice tree structure as it enters and exits functions, which helps when debugging execution order.
Reference:
https://wiki.hpcc.msu.edu/pages/viewpage.action?pageId=14781653
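The repeated "if not done, run, mark done, save" blocks above can be folded into one helper. This is only a sketch of that refactoring: the function name runSteps and its arguments are hypothetical, and unlike the code above it checkpoints only the done flags, not the whole workspace.

```matlab
function runSteps(steps, checkpointFile)
% Hypothetical helper: run each function handle in the cell array steps
% once, remembering in checkpointFile which steps have already finished.
done = false(1, numel(steps));
if exist(checkpointFile, 'file') == 2
    s = load(checkpointFile, 'done');
    done = s.done;               % flags from the interrupted run
end
for k = 1:numel(steps)
    if ~done(k)
        fprintf('Entering step %d (%s)\n', k, func2str(steps{k}));
        steps{k}();
        fprintf('Finished step %d\n', k);
        done(k) = true;
        save(checkpointFile, 'done');
    end
end
% All steps completed, so the checkpoint can go.
if exist(checkpointFile, 'file') == 2
    delete(checkpointFile);
end
end
```

For the example above, fmain would reduce to something like runSteps({@fsub1, @fsub2, @fsub3}, 'checkpoint_main.mat'), with fsub2 and fsubsub29 calling the helper with their own checkpoint files.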
After a bit of googling, I found that using putvar (custom function from here: http://au.mathworks.com/matlabcentral/fileexchange/27106-putvar--uigetvar ) solved this.

Controlling a matlab script (Pause, Reset)

I am trying to create a matlab script (m-file) which shall be controlled by an external VBA script.
The matlab script shall do the same operation every time (the params may change, but that does not matter here) for a certain number of loops.
If I see it right, I can use matlab functions in VBA like this: http://www.mathworks.de/help/techdoc/matlab_external/f135590.html#f133975
My main problem is how to implement the matlab part of this problem...at the moment my control part looks like this:
start.m:
run = 1;
reset = 0;
while run ~= 0 % Loop until external reset of 'run' to '0'
if reset ~= 0
doReset(); % Reset the parameters for the processing
reset = 0;
disp('I did a reset');
end
disp('I am processing');
doProcess();
pause(1)
end
disp('I am done');
The reset part works fine when the value is changed by the script, but when I manually try to change the value of 'run' or 'reset' in my workspace, nothing happens... my script doesn't abort, and the reset-if doesn't do its work either...
It seems to me that the script doesn't recognize any changes in the workspace?!
later the variables 'run' and 'reset' shall be set or unset by the VBA script.
Is there any plausible reason why I can't abort the loop by hand?
Thanks for any advice!
greets, poeschlorn
Edit:
It seems that the script loads the variables once before starting and never again during runtime...is there a possibility to have explicit access to a workspace variable?
Edit 2:
I use Matlab 2010b with no additional Toolboxes at the moment
Edit 3:
I found out, that there are several 'workspaces' or RAMs in Matlab. If my function is running, the variables are stored in 'base' (?) workspace, which is not the matlab workspace on which you can click and change every value. So I have to get access to this ominous 'base' space and change the flag 'run' to zero.
I assume your problem is simply that your loop is blocking execution of the external interface. While the loop runs you cannot access the other interfaces.
I wanted to do a similar thing -- allow control of a matlab loop by an external program (either Ruby or another matlab instance). The most flexible solution by far was using UDP. There is a great toolbox called PNET for matlab, and I assume VB must have a socket library too. I simply open a UDP port on both sides, and use simple text commands to control and give feedback.
obj.conn = pnet('udpsocket', 9999);
while run ~= 0
command = ''; % clear the last command so each one is handled only once
nBytes = pnet(obj.conn, 'readpacket');
if nBytes > 0
command = pnet(obj.conn, 'read', nBytes, 'string');
end
switch command
case '--reset--'
doReset(); % Reset the parameters for the processing
reset = 0;
disp('I did a reset');
case '--abort--'
run = 0;
disp('Going to abort');
case '--echo--'
pnet(obj.conn, 'write', '--echo--');
pnet(obj.conn, 'writepacket', remoteAddress, remotePort);
end
doProcess();
end
This way I can build my own extensible control interface without worrying about blocking from the loop, it can work cross-platform and cross-language, can work within a machine or across the network.
UPDATE:
To talk between two UDP clients, you need to set up two complementary UDP ports; both are clients (this example is all in matlab; pretend obj here is a structure, in my case it is a class I wrap around the pnet functionality):
obj = struct();
obj.success = 0;
obj.client1Port = 9999;
obj.client2Port = 9998;
obj.client1Address = '127.0.0.1';
obj.client2Address = '127.0.0.1';
obj.conn1 = pnet('udpsocket', obj.client1Port);
obj.conn2 = pnet('udpsocket', obj.client2Port);
pnet(obj.conn1, 'write', '--echo--')
pnet(obj.conn1, 'writepacket', obj.client2Address, obj.client2Port);
nBytes = pnet(obj.conn2, 'readpacket');
if nBytes > 0
command = pnet(obj.conn2, 'read', nBytes, 'string');
if regexpi(command,'--echo--')
obj.success = obj.success+1;
fprintf('Client 2 received this message: %s\n',command);
pnet(obj.conn2, 'write', '--echo--')
pnet(obj.conn2, 'writepacket', obj.client1Address, obj.client1Port);
end
end
nBytes = pnet(obj.conn1, 'readpacket');
if nBytes > 0
command = pnet(obj.conn1, 'read', nBytes, 'string');
if regexpi(command,'--echo--')
obj.success = obj.success+1;
fprintf('Client 1 got this back: %s\n',command);
end
end
if obj.success == 2
fprintf('\nWe both sent and received messages!\n');
end
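If installing a socket toolbox is not an option, the same polling pattern can be approximated with flag files. This is a different, simpler technique than the UDP approach above, not a variant of pnet: the loop checks for files that the external program creates. The file names are hypothetical, and the sketch simulates the external controller inside the loop so that it runs on its own.

```matlab
stopFile  = fullfile(tempdir, 'demo_stop.flag');   % hypothetical file name
resetFile = fullfile(tempdir, 'demo_reset.flag');  % hypothetical file name
if exist(stopFile, 'file') == 2, delete(stopFile); end
if exist(resetFile, 'file') == 2, delete(resetFile); end

count = 0;      % stand-in for the processing state
nResets = 0;
maxIter = 50;   % safety cap so the sketch always terminates

for k = 1:maxIter
    % A reset request: consume the flag and reset the state.
    if exist(resetFile, 'file') == 2
        delete(resetFile);
        count = 0;
        nResets = nResets + 1;
        disp('I did a reset');
    end
    % An abort request: consume the flag and leave the loop.
    if exist(stopFile, 'file') == 2
        delete(stopFile);
        disp('Going to abort');
        break;
    end

    count = count + 1;   % stand-in for doProcess()

    % Simulated external controller; in practice VBA would create the files.
    if k == 3, fclose(fopen(resetFile, 'w')); end
    if k == 5, fclose(fopen(stopFile, 'w')); end
end
```

The external program only needs to be able to create an empty file, which is easy from VBA, but commands cannot carry data the way the UDP messages can.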
Is your script a script m-file or a function?
If it's a function, you'll be losing the scope of the workspace variables which is why it's not working. I'd turn your code into a function like this:
function processRun(run,reset)
while run ~= 0 % Loop until external reset of 'run' to '0'
if reset ~= 0
doReset; % Reset the parameters for the processing
reset = 0;
disp('I did a reset');
end
disp('I am processing');
[run,reset] = doProcess;
pause(1)
end
You can then set the values of run and reset every time you call the function from VBA.
If you have a script, try removing the run and reset lines from the top, and set their values in the workspace before you run the script. I think you're overwriting your workspace values by running the script file.
Sorry, I don't have enough rep to make a comment so I'll quote it here:
@Adam Leadbetter: Thanks, this makes sense. The only thing I have trouble with is how to pause (after this reset and then resume) the script when it has been started by run=1 as param... – poeschlorn Feb 25 at 7:17
If you want to break out of the loop once reset has been set to one, and then wait for the loop to continue again once run = 1, isn't that pretty much the same as just starting over again?
function processRun()
run = 1;
while run ~= 0 % keep processing until doProcess() returns 0
run = doProcess();
end
If doProcess() returns 0, then the function processRun() will end (like the behaviour you want to have on reset); the next time processRun is called it starts over, with "reset"/default values.
Or am I missing something?