I have a Matlab .m script that sets up and trains a neural network ("nn") using Matlab's Neural Network Toolbox. The script launches a GUI that shows training progress etc. Training a network usually takes a long time.
I'm running these experiments on a computer with 64 processor cores. I want to train several networks at the same time without having to run multiple Matlab sessions.
So I want to:
1. Start training of a neural network
2. Modify the script that creates the network so it creates a different one
3. Start training of the modified network
4. Modify the script to create yet another network...
Repeat steps 1-4 several times.
The problem is that when I run the script it blocks the Matlab terminal, so I cannot do anything else until the script executes its last command - and that takes a long time. How can I run all those computations in parallel? I do have the Parallel Computing Toolbox.
EDIT: Matlab bug??
Update: This problem seems to happen only on R2012a; it looks fixed in R2012b.
A very strange error occurs when I try the command sequence recommended in Edric's answer.
Here is my code:
>> job = batch(c, @nn, 1, {A(:, 1:end-1), A(:, end)});
>> wait(job);
>> r = fetchOutputs(job)
Error using parallel.Job/fetchOutputs (line 677)
An error occurred during execution of Task with ID 1.
Caused by:
Error using nntraintool (line 35)
Java is not available.
Here are lines 27-37 of nntraintool (part of Matlab's Neural Network Toolbox) where the error originated:
if ~usejava('swing')
  if (nargin == 1) && strcmp(command,'check')
    result = false;
    result2 = false;
    return
  else
    disp('java used');
    error(message('nnet:Java:NotAvailable'));
  end
end
So it looks like the problem is that the GUI cannot be used when the job is executed using the batch command (because Swing is not available). The strange thing is that the nn function does not launch any GUI in its current form. The error is caused by train, which launches a GUI by default, but in nn I have switched that off:
net.trainParam.showWindow = false;
net = train(net, X, y);
More interestingly, if the same nn function is launched normally (>> nn(A(:, 1:end-1), A(:, end));), it never enters the outer if statement of nntraintool on line 27 (I have checked that using the debugger). So with the same function and the same arguments, the expression ~usejava('swing') evaluates to 0 when the command is launched normally but to 1 when launched using batch.
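One way to confirm this difference directly (a minimal diagnostic sketch, not from the original post; the 'local' profile name is the usual default) is to ask the batch worker itself what Java support it reports:

```matlab
% Diagnostic sketch: query usejava on the worker, not on the desktop client.
c = parcluster('local');
j = batch(c, @() [usejava('jvm'), usejava('swing')], 1, {});
wait(j);
out = fetchOutputs(j);
disp(out{1})  % batch workers run headless, so 'swing' is typically false
```

If the worker reports swing as unavailable while the desktop session reports it as available, that matches the behavior described above.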
What do you think about this? It looks like an ugly Matlab or Neural Network Toolbox bug :(((
With Parallel Computing Toolbox, you can run up to 12 'local workers' to execute your scripts (to run more than that, you'd need to purchase additional MATLAB Distributed Computing Server licences). Given your workflow, the best thing might be to use the BATCH command to submit a series of non-interactive jobs. Note that you will not be able to see any GUI from the workers. You might do something like this (using R2012a+ syntax):
c = parcluster('local'); % get the 'local' cluster object
job = batch(c, 'myNNscript'); % submit script for execution
% now edit 'myNNscript'
job2 = batch(c, 'myNNscript'); % submit script for execution
...
wait(job); load(job) % get the results
Note that the BATCH command automatically attaches a copy of the script to run to the job, so that you are free to make changes to it after submission.
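Putting those steps together, the submit-edit-resubmit workflow might be sketched like this (untested; 'myNNscript' is the script name assumed above):

```matlab
% Sketch: submit several variants of the same script, then collect results.
c = parcluster('local');
jobs = {};
jobs{end+1} = batch(c, 'myNNscript');   % submit first variant
% ... now edit 'myNNscript' to build a different network ...
jobs{end+1} = batch(c, 'myNNscript');   % submit second variant
% ... repeat as needed, then later:
for k = 1:numel(jobs)
    wait(jobs{k});
    load(jobs{k});  % loads that job's workspace variables into the session
end
```

Because each batch call snapshots the script at submission time, editing the file between submissions is safe.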
Related
I'm trying to accelerate our test environment by using the Parallel Computing Toolbox from MathWorks. However, I am unable to start several Matlab instances in parallel (up to now we run our tests sequentially, each one starting a new Matlab instance via an ActiveX server).
So when I run the code below,
ML=ver('Matlab');
ML_Path=matlabroot;
ML_Ver=ML.Version;
parfor i = 1:3
NewMatlab = actxserver(['matlab.application.single.',ML_Ver])
Answer = NewMatlab.Feval('test',1);
NewMatlab.Quit;
NewMatlab.release;
end
the Matlab instances are called sequentially (test is just a very simple script that sums up a few numbers).
However if I start a new Matlab via command line
ML=ver('Matlab');
ML_Path=matlabroot;
ML_Ver=ML.Version;
parfor i = 1:3
dos('matlab -nodesktop -minimize -wait -batch "test"');
end
it works. I see that these two methods handle starting Matlab quite differently, but the first approach would be
If you want each iteration of your test to run in a completely separate MATLAB instance, you could use the batch function, like this:
for i = 1:3
j(i) = batch(@test, nOut, {argsIn...});
end
% Later, collect results
for i = 1:3
wait(j(i)), fetchOutputs(j(i))
end
Or, you could simply use parfor directly
parpool() % If necessary
parfor i = 1:3
out{i} = test(...)
end
(You only need to call parpool if no pool is currently open, and you have your preferences set so that a pool is not automatically created when you hit the parfor).
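That pool check can be made explicit with gcp('nocreate'), which returns the current pool without creating one (a sketch; 'test' and its argument are placeholders from the question):

```matlab
% Open a pool only if none is already running.
if isempty(gcp('nocreate'))
    parpool();
end
out = cell(1, 3);
parfor i = 1:3
    out{i} = test(i);  % placeholder call; replace with the real test function
end
```

Preallocating out as a cell array also keeps parfor's variable classification happy.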
I have code from Matlab 2010a that I want to run in Matlab 2019a, using parallelism.
matlabpool open 4 % prepares Matlab to run on 4 parallel processors
j1 = batch('parallel1', 'matlabpool', 0);
pause(1)
j2 = batch('parallel2', 'matlabpool', 0);
pause(1)
j3 = batch('parallel3', 'matlabpool', 0);
pause(1)
j4 = batch('parallel4', 'matlabpool', 0);
matlabpool close
But the code doesn't run in this version of Matlab, because I have to use parpool.
So I'm asking if someone knows how to convert or change this part of the code to run in my new Matlab version.
The literal translation of your code is to do this:
parpool(4) % Creates a parallel pool with 4 workers
j1 = batch('parallel1', 'Pool', 0) % creates a batch job with no pool
... % etc.
However, I'm curious as to whether this is actually what you want to do. The parpool(4) command launches 4 worker processes to be used by your desktop MATLAB - for when you use parfor, spmd, or parfeval. Each batch command spawns an additional worker process, which cannot access the workers from the parallel pool.
The first step is to check the original documentation. Since the R2010a documentation is no longer online, here is the corresponding R2013a documentation, which still explains matlabpool:
'Matlabpool' — An integer specifying the number of workers to make into a MATLAB pool for the job in addition to the worker running the batch job itself. The script or function uses this pool for execution of statements such as parfor and spmd that are inside the batch code. Because the MATLAB pool requires N workers in addition to the worker running the batch, there must be at least N+1 workers available on the cluster. You do not have to have a MATLAB pool already running to execute batch; and the new pool that batch opens is not related to a MATLAB pool you might already have open. (See Run a Batch Parallel Loop.) The default value is 0, which causes the script or function to run on only the single worker without a MATLAB pool.
In current MATLAB versions this option has been replaced by the 'Pool' parameter. 0 is still the default behavior, so you can use:
j1 = batch('parallel1');
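For completeness, a sketch of the whole translated sequence (untested; it assumes the scripts 'parallel1' through 'parallel4' exist on the path):

```matlab
% Each batch job runs on its own worker; no pool is needed when 'Pool' is 0,
% so the old "matlabpool open 4" line can simply be dropped.
j1 = batch('parallel1');   % 'Pool', 0 is the default
j2 = batch('parallel2');
j3 = batch('parallel3');
j4 = batch('parallel4');
wait(j1); wait(j2); wait(j3); wait(j4);
load(j1)  % retrieve each job's workspace variables as needed
```

This keeps the original structure (four independent jobs) while removing the deprecated matlabpool calls entirely.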
This has got me stumped.
I've written a function parObjectiveFunction that runs several simulations in parallel using createJob and createTask. It takes as an argument an objectiveFunction which is passed deeper into the code to calculate the objective function value for each simulation.
When I run parObjectiveFunction from the directory where objectiveFunction is found, it works as expected, but when I go one level up, it can no longer find objectiveFunction. The specific error I get is
Error using parallel.Job/fetchOutputs (line 1255)
An error occurred during execution of Task with ID 1.
Error in parObjectiveFunction (line 35)
taskoutput = fetchOutputs(job);
Caused by:
Error using behaviourObjective/getPenalty (line 57)
Undefined function 'objectiveFunction' for input arguments of type 'double'.
(behaviourObjective is an object)
This is weird for several reasons.
objectiveFunction is definitely on the path, and when I try which objectiveFunction, it points to the correct function.
I have other components of the deeper code in other directories, and they are found without issue (they are objects rather than functions, but that shouldn't make a difference).
There's a line of code in parObjectiveFunction that runs the simulation, and when I run that directly in the matlab command window it finds objectiveFunction without issue.
I get the same results on my local machine and an HPC server.
My first thought was that the individual task might have its own path which didn't include objectiveFunction, but then that should cause problems for the other components (it doesn't). The problem is compounded because I can't work out how to debug the parallel code.
What am I doing wrong? Code that produced the issue is below.
Are there any known issues where Matlab can't find functions when using parallel processing with createJob, createTask, submit and fetchOutputs?
How can you debug in Matlab when the issue occurs only when operating in parallel? None of my print statements appear.
To make something work for external testing would take quite a bit of hacking, but for the sake of the question, the parallel function is:
function penalty = parObjectiveFunction(params, objectiveFunction, N)
% Takes a vector of input parameters, runs N instances of them in parallel
% then assesses the output through the objectiveFunction
n = params(1);
np = params(2);
ees = params(3);
ms = params(4);
cct = params(5);
wt = params(6);
vf = params(7);
dt = 0.001;
bt = 10;
t = 10;
c = parcluster;
job = createJob(c);
testFunction = @(run_number)behaviourObjective(objectiveFunction,n,np,ees,ms,cct,wt,vf,t,dt,bt,run_number);
for i = 1:N
createTask(job, testFunction, 1, {i});
end
submit(job);
wait(job);
taskoutput = fetchOutputs(job);
pensum = 0;
for i = 1:N
pensum = pensum + taskoutput{i}.penalty;
end
penalty = pensum/N;
end
It sounds like you need to attach some additional files to your job. You can see which files were picked up by MATLAB's dependency analysis by running listAutoAttachedFiles, i.e.
job.listAutoAttachedFiles()
If this isn't showing your objectiveFunction, then you can manually attach this by modifying the AttachedFiles property of the job.
It seems as though objectiveFunction is a function_handle though, so you might need to do something like this:
f = functions(objectiveFunction)
job.AttachedFiles = {f.file}
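If you know up front which file the workers will need, you can also attach it when the job is created (a sketch; using which to resolve the file path is an assumption, not from the original answer):

```matlab
% Attach the file containing objectiveFunction at job creation time.
c = parcluster;
job = createJob(c, 'AttachedFiles', {which('objectiveFunction')});
```

Attaching at creation time avoids having to modify the job after dependency analysis has already run.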
I am using the nftool GUI to set up a regression neural network.
My database has various NaN (missing values). When I run the GUI, everything seems to go right. It gives me the performance and the regression graph.
I read that in code you can add a processFcn named 'fixunknowns' to the network.
My question is: in the GUI, is the neural network applying fixunknowns? How is the GUI processing these NaNs?
When I generate the script, the fixunknowns function does not appear.
I wonder if it is only possible to treat these NaN values in code? Or perhaps the GUI implements fixunknowns automatically?
Thank you.
When you get to the end of nftool, click 'advanced script'. This script will repeat exactly what you have done in nftool. In it you will see
% Choose Input and Output Pre/Post-Processing Functions
% For a list of all processing functions type: help nnprocess
net.input.processFcns = {'removeconstantrows','mapminmax'};
net.output.processFcns = {'removeconstantrows','mapminmax'};
This would indicate that fixunknowns is not run (unless it is called by a function we can't see in this script). So you can add this line:
net.input.processFcns = {'fixunknowns'};
Note that you should not run fixunknowns on the output. If there are NaNs in the output, delete the entire row/sample.
If you make a change in the GUI, you will have to re-add fixunknowns each time.
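One way to add fixunknowns without discarding the processing functions the GUI already chose is to prepend it to the existing list (an untested sketch; the line above replaces the list outright):

```matlab
% Keep 'removeconstantrows' and 'mapminmax', and run 'fixunknowns' first.
net.input.processFcns = [{'fixunknowns'}, net.input.processFcns];
```

Running fixunknowns first ensures the NaNs are encoded before the other preprocessing functions see the data.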
I am trying to train a neural network, by using the train function. The thing is that I want to do this remotely over the internet by using a SSH connection.
However, I am receiving the following error:
??? Error using ==> nntraintool at 28
NNTRAINTOOL requires Java which is not available
Error in ==> trainbr>train_network at 257
[userStop,userCancel] = nntraintool('check');
Error in ==> trainbr at 116
[net,tr] = train_network(net,tr,data,fcns,param);
Error in ==> network.train at 107
[net,tr] = feval(net.trainFcn,net,X,T,Xi,Ai,EW,net.trainParam);
Error in ==> ClassifierScript at 28
[MFLDefectSNetwork, tr] = train(MFLDefectSNetwork, TrainingInputSet, TrainingSTargets);
I think I receive this error because of the training interface that is displayed when you perform neural net training. If so, could you please tell me how I can turn that visual interface off, so that I can run this over the SSH connection?
I believe you can solve this by setting the trainParam.showWindow parameter of your network object to false before calling train. For example, if your network object is stored in the variable net, you would do this before you train:
net.trainParam.showWindow = false;
This MATLAB Newsgroup thread also suggests that you may have to comment out some lines in nntraintool, which you can open in the editor with the command edit nntraintool.
(Disclaimer: the following is untested. I currently only have access to a Windows installation of MATLAB)
Try the following sequence of commands to start MATLAB (note that you should NOT use the -nojvm option):
# on your machine
ssh -x user@host
# on the host
unset DISPLAY
matlab -nodisplay
Once in MATLAB, you can explicitly check that Java is available:
>> usejava('jvm')
>> java.lang.String('str')
Next, proceed to create and use the neural network (you just have to suppress training feedback):
% load sample dataset
load simpleclass_dataset
% create and train neural network
net = newpr(simpleclassInputs, simpleclassTargets, 20);
net.trainParam.showWindow = false; % no GUI (as @gnovice suggested)
net.trainParam.showCommandLine = true; % display in command line
net.trainParam.show = 1; % display every iteration
net = train(net, simpleclassInputs, simpleclassTargets);
% predict and evaluate performance
simpleclassOutputs = sim(net, simpleclassInputs);
[c,cm] = confusion(simpleclassTargets,simpleclassOutputs)
As a side note, even though we disabled all display, we can still plot stuff (although invisible) and export figures to files, as I have shown in previous related questions...
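As a minimal sketch of that idea (the plotted data here is a placeholder, not from the original answer):

```matlab
% Create an invisible figure and export it to a PNG file - works with
% "matlab -nodisplay" over SSH, since nothing has to be rendered on screen.
fig = figure('Visible', 'off');
plot(1:10, (1:10).^2);            % placeholder data; plot anything you like
print(fig, '-dpng', 'training_curve.png');
close(fig);
```

The resulting file can then be copied back over SSH and inspected locally.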