I am creating a set of tasks in a cluster using the qsub command.
Inside my MATLAB code I would like to create a synchronization point to check whether all of my workers have finished execution. Once they have all finished, I want to assign other tasks to them.
For example, my MATLAB function is:
function test(taskId)
do_task_1(taskId);
__sync();   % <-- check whether all the workers have finished the job successfully
do_task_2(taskId);
end
How can I do it?
P.S. I am a beginner at cluster computing.
A little late to the game, but this might still be useful to you or others. I doubt you can do this from within MATLAB; others with more experience may suggest otherwise. I would split the two tasks into separate MATLAB programs and use the -sync option with Grid Engine:
qsub -sync y do_task_1.sh
# -sync y makes qsub wait for job to finish
qsub do_task_2.sh
You can put the above commands in a script so you don't have to sit and wait for the first task to complete before submitting the second.
Alternatively, you can use `-hold_jid`:
qsub do_task_1.sh
qsub -hold_jid <job_id_of_task_1> do_task_2.sh
Related
Does the execution of addpath/rmpath/savepath in one MATLAB instance affect other instances?
Motivation: Imagine that you are developing a MATLAB package, which provides a group of functions to the users. You have multiple versions of this package being developed on a single laptop. You would like to test these different versions in multiple instances of MATLAB:
You open one MATLAB window, type run_test(DIRECTORY_OF_PACKAGE_VERSION1), and hit enter;
While the first test is running, you open another MATLAB window, type run_test(DIRECTORY_OF_PACKAGE_VERSION2), and hit enter.
See the pseudo-code below for a better idea about the tests.
No code or data is shared between different tests --- except for those embedded in MATLAB, as the tests are running on the same laptop, using the same installation of MATLAB. Below is a piece of pseudo-code for such a scenario.
% MATLAB instance 1
run_test(DIRECTORY_OF_PACKAGE_VERSION1);
% MATLAB instance 2
run_test(DIRECTORY_OF_PACKAGE_VERSION2);
% Code for the tests
function run_test(package_directory)
setup_package(package_directory);
RUN EXPERIMENTS TO TEST THE FUNCTIONS PROVIDED BY THE PACKAGE;
uninstall_package(package_directory);
end
% This is the setup of the package that you are developing.
% It should be called as a black box in the tests.
function setup_package(package_directory)
addpath(PATH_TO_THE_FUNCTIONS_PROVIDED_BY_THE_PACKAGE);
% Make the package available in subsequent MATLAB sessions
savepath;
end
% The function that uninstalls the package: remove the paths
% added by `setup_package` and delete the files etc.
function uninstall_package(package_directory)
rmpath(PATH_TO_THE_FUNCTIONS_PROVIDED_BY_THE_PACKAGE);
savepath;
end
You want to make sure of the following:
The tests do not interfere with each other;
Each test is calling functions from the correct version of the package.
Hence the questions below.
Questions:
1. Does the execution of addpath, rmpath, and savepath in one MATLAB instance affect the other instance, sooner or later?
2. More generally, what kind of commands executed in one MATLAB instance can affect the other instance?
3. What if I am running only one instance of MATLAB, but invoke a parfor loop with two loops running in parallel? Does the execution of addpath/rmpath/savepath in one loop affect the other loop, sooner or later? In general, what kind of commands executed in one parallel loop can affect the other loop? (As pointed out by @Edric, this can be complicated; so let us not worry about it. Thank you, @Edric.)
Thank you very much for any comments and insights. It would be much appreciated if you could direct me to relevant sections in the official documentation of MATLAB --- I did some searching in the documentation, but have not found an answer to my question.
BTW, in case you find that the test described in the pseudo-code is conducted in a wrong or suboptimal manner, I would be very grateful if you could recommend a better way of doing it.
The documentation page for the MATLAB Search Path specifies at the bottom:
When you change the search path, MATLAB uses it in the current session, but does not update pathdef.m. To use the modified search path in the current and future sessions, save the changes using savepath or the Save button in the Set Path dialog box. This updates pathdef.m.
So, standard MATLAB sessions are "isolated" in terms of their MATLAB Search Path unless you use savepath. After a call to savepath, new MATLAB sessions will read the updated pathdef.m on startup.
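Given that, a simple way to keep the two test sessions from interfering is to drop savepath from the tests altogether and restore the original search path when each test finishes. A minimal sketch, assuming (hypothetically) that each package version keeps its functions in a 'functions' subfolder of the directory you pass in:
function run_test(package_directory)
% Remember the current search path and restore it when this function
% exits, even if the test errors out. Since savepath is never called,
% pathdef.m stays untouched and other MATLAB sessions are unaffected.
oldPath = path();
restorePath = onCleanup(@() path(oldPath));
addpath(fullfile(package_directory, 'functions'));  % hypothetical folder layout
% ... run the experiments against this version of the package ...
end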
The situation with a parallel pool is slightly more complex. There are a couple of things that affect this. First is the parameter AutoAddClientPath that you can specify for the parpool command. When true, an attempt is made to reflect the desktop MATLAB's path on the workers. (This might not work if the workers cannot access the same folders).
When a parallel pool is running, any changes to the path on the desktop MATLAB client are sent to the workers, so they can attempt to add or remove path entries. Parallel pool workers calling addpath or rmpath do so in isolation. (I'm afraid I can't find a documentation reference for this).
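For reference, that parameter is passed to parpool as a name-value pair. A sketch (the profile name and pool size here are arbitrary):
% Start a pool whose workers do NOT copy the client's search path
pool = parpool('local', 2, 'AutoAddClientPath', false);
% Path changes made on the workers stay on the workers
spmd
addpath(tempdir);  % affects this worker's session only, not the client
end
delete(pool);  % shut the pool down when finished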
Is it possible to catch signals received (specifically SIGSEGV, SIGABRT) by child processes of a program without actually modifying it (or with minimal modification)?
The program I'm talking about is a pretty complex tool whose low-level implementation details I don't know, although I do have access to its source code. I can start it using a command like:
$ ./tool_name start # tool_name is an executable created after compiling and building its source code
It forks many child processes and I want to see if those child processes are being killed by a signal or not.
What I have thought about is to create a simple C program that calls the above command (using system()), write a signal handler for the signals I'm looking for, and do other things there. Is this the right way to keep track of signals received by child processes? Is there a better way to do the same?
I understand that some indeterminism stems from parfor's parallel nature but I don't understand why it should be entirely random. Is there any way to force parfor to respect (at least loosely) the order of the loop? More specifically I would like that in the case of:
parfor i=1:100
do_independent_stuff()
end
each worker of the pool, when asking for a new task (i.e. in this case a new iteration of the loop), to be assigned the lowest i that hasn't been computed or assigned to a worker yet.
I think it's by design that running something in parallel assumes that order is not important, at least in MATLAB. Each thread/worker should be independent of the others. However, as indicated in this question, you could try the job and task control interface to get some level of control.
Firstly, in practice, PARFOR isn't "entirely random" - you can easily observe that it sends out chunks of loop iterates in reverse order. In R2013b and later, if you need more control over ordering (if, for example, you know that certain of your independent things are likely to take a long time, and therefore wish to start computing them first), you can use PARFEVAL.
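As a rough sketch of the PARFEVAL route (do_independent_stuff is the placeholder from the question, assumed here to accept the iteration index): requests are queued in the order you submit them, so the lowest indices are handed to workers first.
pool = gcp();                         % get (or start) the current parallel pool
N = 100;
futures(1:N) = parallel.FevalFuture;  % preallocate the array of futures
for i = 1:N
% Submissions are queued FIFO, so iteration 1 is dispatched before iteration 2, etc.
futures(i) = parfeval(pool, @do_independent_stuff, 0, i);
end
wait(futures);                        % block until every request has completed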
If you need to loosely synchronize things, for instance to wait until some thread has finished or has reached some point before starting another one, the best approach is to use semaphores, locks, mutexes, etc.
I don't know whether the Parallel Computing Toolbox includes such synchronization objects, but here is a workaround to create a semaphore, for instance:
https://stackoverflow.com/a/22874669/684399
You can also use objects from the 'System.Threading' namespace (this requires .NET):
Init:
someResultAvailable = System.Threading.ManualResetEvent(false);
In some job:
... do work ...
someResultAvailable.Set();
... continue ...
In another one:
... do work ...
if ~someResultAvailable.WaitOne(10000)
    error('Timeout waiting for result from other thread');
end
... continue now knowing that results are available ...
I am writing an application similar to what was suggested here. Essentially, I am using Perl to manage the execution of multiple CPU intensive processes in parallel via fork and wait. However, I am running on a 4-core machine, and I have many more processes, all with very dissimilar expected run-times which aren't known a priori.
Ultimately, it would take more effort to estimate the run times and gang them appropriately than to simply use a queue system for each core. I want each core to be processing, with as little downtime as possible, until everything is done. Is there a preferred algorithm or mechanism for doing this? I would assume this is a common problem/use case, so I don't want to re-invent the wheel, as my wheel will probably be inferior to the 'right way'.
As a minor aside, I would prefer not to have to import additional modules (like Parallel::ForkManager) to accomplish this, but if that is the best way to go, then I will consider it.
~Thanks!
EDIT: Fixed 'here' link: Thanks to ikegami
EDIT: P::FM is too easy to use, not to... Today I Learned.
Forks::Super has some features that are good for this sort of task.
Extended syntax, but not a lot of new syntax: if you already have a program with fork and wait calls, you can still use the features of Forks::Super without too many changes. That is, your new code will still have fork and wait calls.
Job throttling: like Parallel::ForkManager, you can control how many jobs run simultaneously. When one job completes, the module can start another one, keeping your system fully utilized. You can also specify more complex logic like "run at most 6 background jobs on the weekends or between midnight and 6:00 am, but 2 background jobs the rest of the time".
Timing utilities: Forks::Super keeps track of the start time and end time of every job, letting you log and analyze how long each job took:
fork { cmd => "some command" };
...
$pid = wait;
$elapsed = $pid->{end} - $pid->{start};
print LOG "That job took ${elapsed}s\n";
CPU affinity control: I can't tell whether this is something you need, but Guarav seemed to think it mattered. You can assign background jobs to specific cores:
# restrict job to cores #0 and #2
$job = fork { sub => \&background_process, args => \@args,
cpu_affinity => 0x05 };
Is it possible in matlab to call a function when the program I'm running is idle? I don't want this to be a parallel process. Also, I would prefer a solution where I could pause and resume the function when the main program has to run again. Kind of like an interrupt in embedded systems, in my case the main program is the interrupt.
How would I do this?
You could use a timer object to start/stop the second function; see the MATLAB documentation for timer. See also this MathWorks blog entry.
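A minimal sketch of that idea (the callback name my_background_task and the 5-second period are placeholders). Timer callbacks only run when MATLAB gets a chance to process its event queue, e.g. while the main program is idle, so the secondary function will not start while the main program is busy executing:
% Create a timer that calls a (hypothetical) background task every 5 seconds
t = timer('ExecutionMode', 'fixedSpacing', ...
          'Period', 5, ...
          'TimerFcn', @(~,~) my_background_task());
start(t);            % begin scheduling the background task
run_main_program();  % callbacks fire only when MATLAB processes its event queue
stop(t);             % suspend the background task while it is not needed
delete(t);           % clean up the timer object when done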