Will createJob/createTask work for my function? What is the difference between creating multiple jobs and creating multiple tasks in one job? - matlab

I want to run multiple completely independent scripts in parallel. They differ from each other only by 1 or 2 parameters, so I wrote the main part as a function and pass the parameters via createJob and createTask as follows:
% Run_DMRG_HubbardKondo
UList = [1, 2, 4, 8];
J_UList = [-1, 0:0.2:2];
c = parcluster;
c.NumThreads = 3;
j = createJob(c);
for iU = 1:numel(UList)
for iJ_U = 1:numel(J_UList)
t = createTask(j, @DMRG_HubbardKondo, 0, {{UList(iU), J_UList(iJ_U)}});
end
end
submit(j);
wait(j,'finished')
delete(j);
clear j t
exit
function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
end
What if I call createJob multiple times, each with a single createTask? I know createJob has some options such as AttachedFiles. But with respect to independence, is there any difference between createJob and createTask? I ask about independence because DMRG_HubbardKondo calls setenv, as follows:
function DMRG_HubbardKondo(U_Job, J_U_Job)
...% (skipped)
DirTmp = '/tmp/swan';
setenv('LMA', DirTmp)
Para.DateStr = datestr(datetime('now'),30);
% RCDir named by parameter and datetime
Para.RCDir = [DirTmp,'/RCStore',Para.DateStr,sprintf('U%.4gJ%.4g', [U_Job,J_U_Job])];
k = [strfind(Para.Symm,'SU2'), strfind(Para.Symm,'-v')];
if ~isempty(k)
RC = Para.RCDir;
if exist(RC, 'dir')==0
mkdir(RC); % create the directory if it does not exist
fprintf('%s made.\n', RC);
end
setenv('RC_STORE', RC);
setenv('CG_VERBOSE', '0');
end
... % (skipped)
end
The main part, DMRG_HubbardKondo, uses some mex-compiled functions that work along the lines of the Wigner-Eckart theorem. Specifically, they generate and retrieve data (CG coefficients) in RCDir at every step. I guess those mex-compiled functions find the corresponding RCDir via getenv, and I want to know whether createJob/createTask will work correctly.
In summary, my questions are:
1. What is the difference between creating multiple tasks in one job and creating multiple jobs, each with one task?
2. Will createJob/createTask work for my function?
I know sbatch works if I write a script that passes the parameters to submit.sh, as follows:
function GenSubmitsh(partition,nodeNo,TLim,NCore,mem,logName,JobName,ParaName,ScriptName)
if isnan(nodeNo)
nodeStr = '##SBATCH --nodelist=auto \n';
else
nodeStr = sprintf('#SBATCH --nodelist=node%g \n',nodeNo);
end
Submitsh = sprintf([
'#!/bin/bash -l \n',...
'#SBATCH --partition=%s \n',...
nodeStr,...
'#SBATCH --exclude=node1051 \n',...
'#SBATCH --time=%s \n',...
'#SBATCH --nodes=1 \n',...
'#SBATCH --ntasks=1 \n',...
'#SBATCH --cpus-per-task=%g \n',...
'#SBATCH --mem=%s \n',...
'#SBATCH --output=%s \n',...
'#SBATCH --job-name=%s \n',...
'\n',...
'##Do not remove or change this line in GU_CLUSTER \n',...
'##export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK \n',...
'\n',...
'echo "Job Started At" \n',...
'date \n',...
'\n',...
'matlab -nodesktop -nojvm -nodisplay -r "ParaName=''%s'',%s" \n',...
'\n',...
'echo "Job finished at" \n',...
'date \n'],...
partition,TLim,NCore,mem,logName,JobName,ParaName,ScriptName);
fileID = fopen('Submit.sh','w');
fprintf(fileID,'%s',Submitsh);
fclose(fileID);
end
I hope createJob/createTask will work equivalently (i.e. completely independently).

There are only minor differences between multiple createJob calls each with a single createTask vs. single createJob with multiple createTask calls. I would say it is generally better to use a single Job with multiple Tasks, unless you have a specific reason not to. Here are some considerations:
Having a single Job object allows some of the stages of the submission process to be done once instead of multiple times (e.g. some pieces of attaching files etc.)
It is possible (although admittedly awkward) to vectorise the calls to createTask. (This doesn't affect execution)
On the MATLAB Job Scheduler (MJS) system, you can set more properties per Job object, such as a range of workers to be used during execution
When using schedulers such as SLURM, multiple Tasks of a single Job can be submitted to the scheduler as a "job array", which I believe can be more efficient for the scheduler itself.
When using schedulers other than MJS, each Task runs in a fresh MATLAB worker process, regardless of whether it is the only Task in a Job or not.
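The vectorised createTask call mentioned above could look roughly like the sketch below. It assumes Parallel Computing Toolbox and reuses UList, J_UList and DMRG_HubbardKondo from the question; buildTaskArgs is a hypothetical helper name, not part of any toolbox. Passing a cell array of cell arrays makes createTask create one task per inner cell.

```matlab
function args = buildTaskArgs(UList, J_UList)
% Hypothetical helper: build one {U, J_U} argument list per parameter pair.
%
% Usage (assumes Parallel Computing Toolbox and a configured default cluster):
%   c = parcluster;
%   j = createJob(c);
%   createTask(j, @DMRG_HubbardKondo, 0, buildTaskArgs(UList, J_UList));
%   submit(j); wait(j);
[U, J] = ndgrid(UList, J_UList);   % every combination of U and J_U
args = arrayfun(@(u, jj) {u, jj}, U(:).', J(:).', 'UniformOutput', false);
end
```

Because setenv changes the environment of the calling MATLAB process only, tasks that each run in a fresh worker process should not see each other's RC_STORE value. That is an expectation to verify on your own cluster, not something the sketch guarantees.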

Related

Create an array that grows at a regular interval on Matlab

I have been thinking about how to create an array that grows at a regular interval (for instance every 5 seconds) in MATLAB.
I figured out 2 ways: using tic/toc or a timer. This program will become more complex later. I am not sure which way is best, but so far I am trying the timer.
Here is what I have tried :
clc;
period=5;%period at which the file should be updated
freq=4;
l=freq*period;
time=[0];
a = timer('ExecutionMode','fixedRate','Period',period,'TimerFcn',{@time_append, time,l,freq},'TasksToExecute',3 );
start(a);
function [time]=time_append(obj,event,time,l,freq)
time_append=zeros(l,1);
last_time=time(end)
for i=1:1:l
time_append(i)=last_time+i/freq;
end
time=[time;time_append];
end
After running this code, I only get a time array of length 1 containing the value 0, whereas it should contain values from 0 to 3x5 = 15. I think it is a stupid mistake, but I can't see why. I have tried debug mode, and it seems that at the line time=[time;time_append] the concatenation works, but the time array is reinitialised when we leave the function. I have also read that callback functions can't have outputs. Does anyone know how I could proceed? Using globals? Any other suggestion?
Thank you for reading
You can do this by using nested functions. Nested functions allow you to access "uplevel variables", and you can modify those. Here's one way to do it:
function [a, fcn] = buildTimer()
period=5;%period at which the file should be updated
freq=4;
l=freq*period;
time=0;
function time_append(~,~,l,freq)
time_append=zeros(l,1);
last_time=time(end);
for i=1:1:l
time_append(i)=last_time+i/freq;
end
time=[time;time_append];
end
function out = time_get()
out = time;
end
fcn = @time_get;
a = timer('ExecutionMode','fixedRate',...
'Period',period,...
'TimerFcn',{@time_append,l,freq},...
'TasksToExecute',3 );
start(a);
end
Note that the variable time is shared by time_append and time_get. The timer object invokes time_append, and updates time. You need to hand out the function handle time_get to retrieve the current value of time.
>> [a,fcn] = buildTimer; size(fcn()), pause(10); size(fcn())
ans =
21 1
ans =
61 1
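The shared-variable mechanism can be seen in isolation, without a timer, in the minimal sketch below. makeCounter and its nested functions are hypothetical names used only for illustration. Both returned handles refer to the same count variable in the parent function's workspace, just as time_append and time_get share time above.

```matlab
function [increment, getCount] = makeCounter()
% Hypothetical example: two nested functions sharing one uplevel variable.
count = 0;
    function incrementImpl()
        count = count + 1;   % modifies the parent's variable
    end
    function out = getImpl()
        out = count;         % reads the same variable
    end
increment = @incrementImpl;
getCount = @getImpl;
end
```

Calling [inc, getCount] = makeCounter(); inc(); inc(); and then getCount() should return 2, because the state lives on for as long as either handle exists.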

How can I run a MATLAB script on .csv files in two separate folders at the same time?

So I have an iterative loop that extracts data from .csv files in MATLAB's active folder and plots it. I would like to take it one step further and run the script on two folders, each with their own .csv files.
One folder is called stress and the other strain. As the name implies, they contain .csv files for stress and strain data for several samples, each of which is called E3-01, E3-02, E3-03, etc. In other words, both folders have the same number of files and the same names.
The way I see it, the process would have the following steps:
1. Look in the stress folder, look inside file E3-01, extract the data in the column labelled Stress
2. Look in the strain folder, look inside file E3-01, extract the data in the column labelled Strain
3. Combine the data together for sample E3-01 and plot it
4. Repeat steps 1-3 for all files in the folders
Like I said, I already have a script that can find the right column and extract the data. What I'm not sure about is how to tell MATLAB to alternate the folder that the script is being run on.
Instead of a script, would a function be better? Something that accepts 4 inputs: the names of the two folders and the columns to extract?
EDIT: Apologies, here's the code I have so far:
clearvars;
files = dir('*.csv');
prompt = {'Plot name:','x label:','y label:','x values:','y values:','Points to eliminate:'};
dlg_title = 'Input';
num_lines = 1;
defaultans = {'Title','x label','y label','Surface component 1.avg(epsY) [True strain]','Stress','0'};
answer = inputdlg(prompt,dlg_title,num_lines,defaultans);
name_plot = answer{1};
x_label = answer{2};
y_label = answer{3};
x_col = answer{4};
y_col = answer{5};
des_cols = {y_col,x_col};
smallest_n = 100000;
points_elim = str2double(answer{6}); % inputdlg returns text, so convert it to a number
avg_x_values = [];
avg_y_values = [];
for file = files'
M=xlsread(file.name);
[row,col]=size(M);
if smallest_n > row
smallest_n = row;
end
end
smallest_n=smallest_n-points_elim;
avg_x_values = zeros(smallest_n,size(files,1));
avg_y_values = zeros(smallest_n,size(files,1));
hold on;
set(groot, 'DefaultLegendInterpreter', 'none');
set(gca,'FontSize',20);
ii = 0;
for file = files'
ii = ii + 1;
[n,s,r] = xlsread(file.name);
colhdrs = s(1,:);
[row, col] = find(strcmpi(s,x_col));
x_values = n(1:end-points_elim,col);
[row, col] = find(strcmpi(s,y_col));
y_values = n(1:end-points_elim,col);
plot(x_values,y_values,'DisplayName',s{1,1});
avg_x_values(:,ii)=x_values(1:smallest_n);
avg_y_values(:,ii)=y_values(1:smallest_n);
end
ylabel({y_label});
xlabel({x_label});
title({name_plot});
colormap(gray);
hold off;
avg_x_values = mean(avg_x_values,2);
avg_y_values = mean(avg_y_values,2);
plot(avg_x_values,avg_y_values);
set(gca,'FontSize',20);
ylabel({y_label});
xlabel({x_label});
title({name_plot});
EDIT 2: @Adriaan I tried to write the following function to get a column from a file:
function [out_col] = getcolumn(col,file)
file = dir(file);
[n,s,r] = xlsread(file.name);
colhdrs = s(1,:);
[row, col] = find(strcmpi(s,col));
out_col = n(1:end,col);
end
but I get the error
Function 'subsindex' is not defined for values of class 'struct'.
Error in getcolumn (line 21)
y = x(:,n);
not sure why.
You can do both, of course, and it depends on preference mainly, provided you're the sole user of the script. If others are going to use it as well, use functions instead, as they can contain a proper help file and calling help functionname will then give you that help.
For instance:
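folders1 = dir('../strain/*.csv');
folders2 = dir('../stress/*.csv');
for ii = 1:numel(folders1)
% dir returns a struct array, so index with () and build the full path
operand1 = fullfile(folders1(ii).folder, folders1(ii).name);
operand2 = fullfile(folders2(ii).folder, folders2(ii).name);
%... rest of script
%
% Or function:
data = YourFunction(operand1, operand2);
end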
So all in all you can use both, although from experience I find functions easier to use in the end, as you just pass parameters and don't need to trawl through the complete code to change the parameters each run.
Additionally, you can partition off small parts of your program that each do a fixed task. If you nest your functions, and finally call just a single function in your scripts, you don't have to look at hundreds of lines of code each time you run the script, but rather can just run a single function (which can also be inside a script or function, ad infinitum).
Finally, a function has its own scope, meaning that any variables in that function stay within that function unless you explicitly set them as output (apart from global variables, but those are problematic anyway). This can be a good thing or a bad thing, depending on the rest of your code. If your function would output ~20 variables for further processing, the function probably should include more steps. It'd be a good thing if you create lots of intermediate variables (I always do), because when the function has finished running, its scope is removed from memory, saving you clear tmpVar1 tmpVar2 tmpVar3 etc. every few lines in your script.
For the script the argument in favour would be that it is easier to debug; you don't need dbstop on error and can step a bit easier through the script, keeping check of all your variables. But, after the debugging has been completed, this argument becomes moot, and thus in general I'd start with writing a script, and once it performs as desired, I rework it to a function at minimal extra effort.
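As a rough illustration of the function-based approach for the stress and strain folders, here is a minimal sketch. plotStressStrain is a hypothetical name, and the sketch assumes that both folders hold identically named CSV files whose column headers are valid MATLAB identifiers (such as Stress and Strain), so that readtable keeps them unchanged.

```matlab
function h = plotStressStrain(stressDir, strainDir, stressCol, strainCol)
% Hypothetical helper: pair identically named CSV files from two folders and
% plot the chosen stress column against the chosen strain column.
files = dir(fullfile(stressDir, '*.csv'));
h = gobjects(0);                      % handles of the plotted lines
figure; hold on;
for k = 1:numel(files)
    name = files(k).name;
    strainFile = fullfile(strainDir, name);
    if ~isfile(strainFile)            % isfile needs R2017b or newer
        warning('No matching strain file for %s; skipping.', name);
        continue;
    end
    S = readtable(fullfile(stressDir, name));
    E = readtable(strainFile);
    n = min(height(S), height(E));    % guard against unequal lengths
    h(end+1) = plot(E.(strainCol)(1:n), S.(stressCol)(1:n), ...
        'DisplayName', name); %#ok<AGROW>
end
hold off;
legend('Interpreter', 'none');
end
```

A call such as plotStressStrain('stress', 'strain', 'Stress', 'Strain') would then replace switching the active folder by hand.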

How to print to the same file from different PARFOR iterations?

Consider the following pair of functions:
function fileTop
test = fopen('test.txt','w');
fprintf(test,'In function "fileTop"\r\n');
fileMid(test)
fprintf(test,'Back in function "fileTop"');
fclose(test);
end
and:
function fileMid(fid)
for k = 1:5
pause(rand)
fprintf(fid,'In "fileMid %d" at %f\r\n',k,now);
end
end
If you just run fileTop you get a new text file (in case it's the first time) with the following content:
In function "fileTop"
In "fileMid 1" at 736847.920072
In "fileMid 2" at 736847.920073
In "fileMid 3" at 736847.920081
In "fileMid 4" at 736847.920087
In "fileMid 5" at 736847.920096
Back in function "fileTop"
which is just fine!
Now, try to change the loop in fileMid to parfor, and you get an error:
Error using fileMid (line 2)
Invalid file identifier. Use fopen to generate a valid file identifier.
Is there a way to solve this?
BTW, I don't care for the order in which the iterations are printed.
It's generally unwise to have multiple processes modifying the same resource. If the two processes happen to write to the file at the exact same time you could end up with their outputs being interleaved or overwriting one another.
A better idea would be to have each worker output to its own unique file, such as the file name plus some identifier unique to the worker. This identifier could be gotten from labindex when using spmd:
[filePath, fileName, fileExt] = fileparts(file_name);
workerFile = fullfile(filePath, [fileName '_' int2str(labindex) fileExt]);
or from the current task object when using parfor:
[filePath, fileName, fileExt] = fileparts(file_name);
task = getCurrentTask;
workerFile = fullfile(filePath, [fileName '_' int2str(task.ID) fileExt]);
Then, once the workers have completed, have the master process collect the individual data files into a single file.
The problem in the code above is that after first opening the file, I'm trying to pass its fid to the workers. A file identifier is only valid in the process that opened it, so the workers see the fid as invalid.
The way to solve this, although I'm not sure if it's recommended, is to pass the file name itself to the workers, and open and close the file in every worker. This is what fileTop looks like after the change:
function fileTop
file_name = 'test.txt';
test = fopen(file_name,'w');
fprintf(test,'In function "fileTop"\r\n');
fclose(test);
parfor k = 1:5
fileMid(file_name,k)
end
test = fopen(file_name,'a');
fprintf(test,'Back in function "fileTop"');
fclose(test);
end
And this is fileMid:
function fileMid(file_name,k)
test = fopen(file_name,'a');
pause(rand)
fprintf(test,'In "fileMid %d" at %f\r\n',k,now);
fclose(test);
end
And a possible output would be:
In function "fileTop"
In "fileMid 2" at 736847.917401
In "fileMid 4" at 736847.917404
In "fileMid 3" at 736847.917405
In "fileMid 1" at 736847.917409
In "fileMid 5" at 736847.917410
Back in function "fileTop"
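A different way to avoid concurrent writes altogether, sketched below, is to let each iteration return its text and have the client write the file once after the loop. This is a swapped-in technique, not what either answer above does. writeFromParfor is a hypothetical name, and the lines come out in index order rather than completion order, which the question says is acceptable.

```matlab
function writeFromParfor(fileName, n)
% Hypothetical sketch: collect one line per iteration, then write once.
lines = cell(n, 1);
parfor k = 1:n
    pause(rand / 10);                                % stand-in for real work
    lines{k} = sprintf('In iteration %d at %f', k, now);
end
fid = fopen(fileName, 'w');
if fid == -1
    error('Could not open %s for writing.', fileName);
end
cleanup = onCleanup(@() fclose(fid));                % close even on error
fprintf(fid, '%s\r\n', lines{:});                    % format repeats per line
end
```

Only the client ever touches the file here, so no file identifier has to cross a process boundary.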

Different behaviour of function in a for-loop or when unrolling of the loop is performed

I'm seeing odd behaviour in my functions, and since I'm not so used to MATLAB coding, I guess it is due to something really easy that I don't get.
I can't understand how this could print something different
fx(Punti(1,:),Punti(2,:))
fx(Punti(2,:),Punti(3,:))
fx(Punti(3,:),Punti(4,:))
fx(Punti(4,:),Punti(5,:))
from this
for i_unic=1:4
fx(Punti(i_unic,:),Punti(i_unic+1,:))
end
Consider fx as a generic function.
Is it possible that fx uses some variables that for some reason are erased at the end of each iteration?
EDIT
-->"Punti" is just a matrix containing the points a SCARA robot should follow
-->fx is the function "Retta" and it's the following
function retta(PuntoA,PuntoB,Asse_A,q_ini,rot,contaerro,varargin)
global SCARA40
global inizio XX YY ZZ
global seg_Nsteps
npassi = seg_Nsteps;
ipuntofin = inizio + npassi;
for ipunto = inizio : ipuntofin
P4 = PuntoA + (ipunto-inizio)*(PuntoB-PuntoA)/npassi;
q = kineinversa(Asse_A,P4,q_ini,rot);
Mec = SCARA40.fkine(q);
Pec = Mec(:,4);
if (dot((P4-Pec),(P4-Pec),3)>0.0001)
fprintf(1,'\n P4 Desid. = [%9.1f %9.1f %9.1f %9.1f ] \n',P4);
fprintf(1,'\n P4 Attuato = [%9.1f %9.1f %9.1f %9.1f ] \n',Pec);
contaerro = contaerro + 1;
else
q_ini = q;
end
SCARA40.plot(q);
XX(ipunto) = Pec(1);
YY(ipunto) = Pec(2);
ZZ(ipunto) = Pec(3);
if(nargin>6)
color = varargin{1};
else
color = 'r';
end
plot3(XX,YY,ZZ,color,'LineWidth',1 );
drawnow;
hold on
end
end
the test function with the results
Punti = [ 10,10,0,1 ;10,-10,0,1 ;-10,-10,0,1 ; -10,10,0,1 ] ;
%inizio=1
%retta(Punti(1,:)',Punti(2,:)',Asse_A,q_ini,rot,contaerro)
%inizio=21
%retta(Punti(2,:)',Punti(3,:)',Asse_A,q_ini,rot,contaerro)
%inizio=41
%retta(Punti(3,:)',Punti(4,:)',Asse_A,q_ini,rot,contaerro)
%inizio=61
inizio=1
for i=1:length(Punti)-1
retta(Punti(i,:)',Punti(i+1,:)',Asse_A,q_ini,rot,contaerro)
inizio=inizio+20;
end
the two images have been generated restarting Matlab
Addressing the question in the most general sense (since there is no sample given for the function fx or the function/variable Punti) then the reason you are getting different results is likely that the state of your variables/workspace is different when you test one case versus the other. How could this happen? Here are some obvious ways...
Your functions (or possibly other functions they call) are making use of the random number generator, and the starting state of the RNG is different when you test the loop versus unrolled loop case.
Your functions are sharing global variables that aren't reset to some default value at the start of each test case. You mention in a comment that the functions use global variables, so this is likely your problem.
Your functions aren't really functions, but scripts. Scripts all share a common workspace (the base workspace), whereas a function (and specifically each call to a function) will have its own unique workspace. If fx is actually a script, each call may change any or all of the variables in the base workspace. Furthermore, any other scripts, or anything you type into the command line, can change things as well. The contents of the base workspace may therefore be different when you test the loop versus unrolled loop case.
If I were to hazard a guess, I'd say that if you were to exit and restart MATLAB before each test case (i.e. reset everything to the same default starting state) you would probably get the same exact result for the loop versus unrolled loop case.
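The global-variable case can be reproduced with a toy example. In the sketch below, demoGlobalState, accumulate and offset are hypothetical names. The same call returns different values depending on what earlier calls left behind in the global, and resetting the global restores the original result.

```matlab
function results = demoGlobalState()
% Hypothetical demo: identical calls differ when global state is not reset.
global offset
offset = 0;
a = accumulate(1);   % fresh state
b = accumulate(1);   % same call, but state leaked from the previous call
offset = 0;          % explicit reset, as a restart of MATLAB would do
c = accumulate(1);   % matches the first call again
results = [a b c];
end

function y = accumulate(x)
global offset
offset = offset + x;
y = offset;
end
```

In the question's code, inizio, XX, YY and ZZ play the role of offset, so setting them to known values before each test case is the analogous reset.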

Using both strings and functions in Matlab UnitTest diagnostics?

Please refer to the documentation for the testCase.verifyEqual method here. The documentation says that only one of the diagnostic features can be used. My requirement is I need two diagnostics at the same time - strings and function handle. The following is simple example of what I'm trying to achieve,
classdef testArrays < matlab.unittest.TestCase
methods (Test)
function testArraysEquality(testCase)
a = 1:10;
b = 1:10;
incrementFunc = @(x)x+1;
failureCount;
for i=1:length(a)
testCase.verifyEqual(a(i),b(i),'AbsTol',10e-3,['Test failed array element# ' num2str(i) ' failure count ' num2str(incrementFunc(failureCount))]);
end
disp([num2str(failureCount) ' out of ' num2str(length(a)) ' test cases failed']);
end
end
end
The problem is that anonymous functions don't store values. On the other hand, with the 'assignin' approach shown below, the value can be incremented and stored, but cannot be returned for use inside disp(). Is there any workaround for this?
incrementFunc1 = @(x) assignin('caller', inputname(1), x+1);
You can include more than one (as well as more than one type) of diagnostic in the MATLAB Unit Test Framework by simply providing a diagnostic array to verifyEqual. You can actually do this explicitly as follows:
import matlab.unittest.diagnostics.StringDiagnostic;
import matlab.unittest.diagnostics.FunctionHandleDiagnostic;
testCase.verifyEqual(a,e, [StringDiagnostic('some string'), FunctionHandleDiagnostic(@() someFunction)]);
However, the Diagnostic.join method is provided to make that easier:
import matlab.unittest.diagnostics.Diagnostic;
testCase.verifyEqual(a,e, Diagnostic.join('some string', @() someFunction));
In order to do the increment call you are probably going to want to add a failed listener to the testCase in order to increment properly. Note that people/plugins can actually add listeners and execute these diagnostics in passing cases in addition to failing cases. As such your diagnostic messages should not assume that every time they are invoked it is in a failure condition. This not only applies to your incrementing code but also to just the message you are providing. I would suggest that instead of saying:
Test failed array element# 3 failure count 2
you should say:
Tested array element# 3 failure count 2
The framework diagnostic will let you know whether it failed or not. Anyway, takeaway, don't rely on invoking the diagnostics to determine failure count. What then? Take a look at the Events section here. You should listen explicitly for verification failed events in order to add that information to your diagnostics.
For the first solution, I am not sure why you need to provide the failure count for every failure. It seems like that would be very verbose. If you don't need that then you can do something like this:
classdef testArrays < matlab.unittest.TestCase
methods (Test)
function testArraysEquality(testCase)
a = 1:10;
b = 1:10;
failureCount = 0;
testCase.addlistener('VerificationFailed', @incrementFailureCount);
function incrementFailureCount(varargin)
% This is a nested function & has the scope and can see/modify
% the failureCount variable. This could also be done with a
% property on the class and a method that increments it
failureCount = failureCount + 1;
end
for i=1:length(a)
testCase.verifyEqual(a(i),b(i),'AbsTol',10e-3,['Tested array element # ' num2str(i)]);
end
% I suggest using log instead of disp. If you want it to show up most of the time you can
% log it at Terse (1) verbosity. However, if you don't want to see it you can turn it off.
testCase.log(1, sprintf('%d out of %d test cases failed', failureCount, length(a)));
end
end
end
Is that good enough? If you really want to show the failure count in the diagnostics for each failure, you can; it's just a bit more complicated and requires another nested function (or property access).
classdef testArrays < matlab.unittest.TestCase
methods (Test)
function testArraysEquality(testCase)
import matlab.unittest.diagnostics.Diagnostic;
a = 1:10;
b = 1:10;
failureCount = 0;
testCase.addlistener('VerificationFailed', @incrementFailureCount);
function incrementFailureCount(varargin)
failureCount = failureCount + 1;
end
function displayFailureCount
fprintf(1, 'Failure Count: %d', failureCount);
end
for i=1:length(a)
testCase.verifyEqual(a(i),b(i),'AbsTol',10e-3, ...
Diagnostic.join(...
['Tested array element #' num2str(i)], ...
@displayFailureCount));
end
testCase.log(1, sprintf('%d out of %d test cases failed', failureCount, length(a)));
end
end
end
Does that help you accomplish what you are trying to do?
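The property-based variant mentioned in the code comment above (a property on the class plus a method that increments it) might look like the sketch below. The class name testArraysWithProperty, the property FailureCount and the method incrementFailureCount are hypothetical. It relies on TestCase being a handle class, so the property update made by the listener persists.

```matlab
classdef testArraysWithProperty < matlab.unittest.TestCase
    properties
        FailureCount = 0;   % hypothetical counter, reset for each test instance
    end
    methods (Test)
        function testArraysEquality(testCase)
            a = 1:10;
            b = 1:10;
            % Count only real verification failures via the event.
            testCase.addlistener('VerificationFailed', ...
                @(~,~) testCase.incrementFailureCount());
            for i = 1:numel(a)
                testCase.verifyEqual(a(i), b(i), ...
                    sprintf('Tested array element #%d', i), 'AbsTol', 10e-3);
            end
            testCase.log(1, sprintf('%d out of %d test cases failed', ...
                testCase.FailureCount, numel(a)));
        end
    end
    methods
        function incrementFailureCount(testCase)
            testCase.FailureCount = testCase.FailureCount + 1;
        end
    end
end
```

Compared with the nested-function versions, this keeps the counter visible to other methods of the class, at the cost of a little more boilerplate.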