Manually Stop Jupyter Kernel and Prevent from Restarting - jupyter

Background
I have created a Jupyter kernel A from which I launch another kernel B, in order to audit the execution of kernel B. When a user selects kernel A from the interface, kernel B is launched in the background and executes the notebook code; strace is used to audit the execution. After the audit phase, the code, data, provenance, etc. of the program execution are recorded and stored for later analysis.
Problem
After the notebook program ends, I intend to stop tracing the execution of kernel B. This does not happen unless I stop kernel B, which was launched internally by kernel A. The only way I have been able to do this is with the kill command, like so:
os.kill(os.getpid(), 9)
This does the job, but with a side-effect: Jupyter restarts the kernel automatically, which means kernels A and B are launched and start auditing the execution again. This causes race conditions and overwrites of some files, which I want to avoid.
Possible Solution
To my mind, there are two things I can do to resolve this issue:
Exit the kernel B program gracefully, so the auditing of the notebook code gets completed and stored. This does not happen with the kill command, so some other approach is needed.
Avoid automatic restart of the kernel, with or without the kill command.
I have looked into different ways to achieve the above two but have not been successful yet. Any advice on achieving either of the above two solutions would be appreciated, or perhaps another way of solving the problem.

Have you tried terminating kernel B instead of killing it, i.e. sending signal 15 (SIGTERM) instead of 9 (SIGKILL)?
import os
import signal

os.kill(os.getpid(), signal.SIGTERM)
# or
os.kill(os.getpid(), 15)
Other signals you can send include signal.SIGSTOP and signal.SIGHUP (1). The numeric values can vary by platform, so prefer the named constants.
Another option is to insert the following snippet at the top of the code:
import signal
import sys

# Signal handlers receive the signal number and the current stack frame.
def onSigExit(signum, frame):
    sys.exit(0)
    # or os._exit(0) for an immediate exit without cleanup

signal.signal(signal.SIGINT, onSigExit)
signal.signal(signal.SIGTERM, onSigExit)
Now you should be able to send
os.kill(os.getpid(), 15)
and the kernel should exit gracefully and not restart.
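As a self-contained check of this pattern outside Jupyter (the child script and names below are illustrative, not part of any Jupyter API; POSIX only), the following sketch installs the handler in a child Python process, sends it SIGTERM, and confirms a clean exit code:

```python
import signal
import subprocess
import sys
import textwrap

# Child script: installs a SIGTERM handler that exits cleanly, then waits.
child_src = textwrap.dedent("""\
    import signal, sys, time

    def on_sig_exit(signum, frame):
        # In kernel B this is where audit output would be flushed and closed.
        sys.exit(0)

    signal.signal(signal.SIGTERM, on_sig_exit)
    print("ready", flush=True)   # tell the parent the handler is installed
    time.sleep(60)               # stand-in for the kernel's event loop
""")

proc = subprocess.Popen([sys.executable, "-c", child_src],
                        stdout=subprocess.PIPE, text=True)
proc.stdout.readline()            # wait for "ready"
proc.send_signal(signal.SIGTERM)  # same effect as os.kill(proc.pid, 15)
exit_code = proc.wait()
print("exit code:", exit_code)    # 0 means sys.exit(0) ran, i.e. graceful
```

With SIGKILL (9) instead, the handler never runs and the exit status is non-zero, which is why the audit data was not being finalized.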

Related

When using pycaffe to run solver.solve(), only one iteration is executed, then the current process is killed

I am using pycaffe to do a multilabel classification task. When I run solver.solve() or solver.step(2), only one iteration is executed, then the current process is killed somehow; the IPython console says the kernel died unexpectedly. No other error information is provided.
Then, when I use a terminal to run "python Test.py", I get a "Floating point exception (core dumped)" message.
Besides, the net.forward() and net.backward() methods both work fine.
What is the reason, and how can I solve the problem?

Matlab/Simulink: run batch of simulations in parallel?

I have to run a series of simulations and save the results. Since by default Matlab only uses one core, I wonder if it is possible to open multiple worker tasks and assign different simulation runs to them?
You could run each simulation in a separate MATLAB instance and let the OS handle the process-to-core assignment.
One master MATLAB instance could synchronize the child instances, for example by checking whether the simulation result files exist.
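That fan-out can be sketched generically; here Python subprocesses stand in for the separate MATLAB instances (the worker command is a placeholder for something like matlab -nodisplay -r "...", and the file names are invented), with the master waiting and then checking the result files:

```python
import pathlib
import subprocess
import sys

# Placeholder "simulation": each worker writes a result file named after
# its argument. In the MATLAB setting this would be a matlab invocation.
worker_src = ("import sys, pathlib; "
              "pathlib.Path(f'result_{sys.argv[1]}.txt').write_text('done')")

# One OS process per simulation run; the OS assigns them to cores.
procs = [subprocess.Popen([sys.executable, "-c", worker_src, str(i)])
         for i in range(4)]
codes = [p.wait() for p in procs]   # master waits for all children

# Synchronize by checking that every result file exists, as suggested above.
done = all(pathlib.Path(f"result_{i}.txt").exists() for i in range(4))
print(codes, done)
```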
I also had the same problem, but I did not manage to work out how to do it in MATLAB; the MATLAB documentation was too advanced for me to follow.
Since I am working on Ubuntu, I found a way to do the work by calling a Unix command from MATLAB and using the GNU parallel command.
This way I managed to run my simulations in parallel on 4 cores.
unix('parallel --progress -j4 flow > /dev/null :::: Pool.txt','-echo')
You can find more info at the link
Shell, run four processes parallel
Details of the syntax can be found at https://www.gnu.org/software/parallel/
but briefly:
--progress shows the progress status
-j4 sets the number of jobs to run in parallel
flow is the name of my simulator
> /dev/null keeps the simulator's screen output from showing up
Pool.txt is a file I made with the required simulator input, basically the path and the main simulator file
'-echo' makes unix() display the command's output in the MATLAB Command Window

The client lost connection to lab X Matlab

I need help on how to tackle the Matlab error below. After a couple of successful runs using parfor, I got the error message below.
I opened a pool with 2 workers, sent function1 to worker1 and function2 to worker2. Both functions do some sort of calculations on matrices and generate a CSV at the end. It was fine until after a few runs.
The session that parfor is using has shut down
The client lost connection to lab 2. This might be due to network
problems, or the interactive matlabpool job might have errored.
We're using a VM with an Intel Xeon X7560 @ 2.27GHz (4 processors), 16GB RAM, and a 64-bit OS.
This is part of a batch run. To resolve the issue, instead of re-using the pool for every batch iteration, make sure to close it, then open a fresh matlabpool for each iteration. This seems to be more stable now, although a lot slower than the previous implementation.

Matlab process termination in slurm

I have two questions that to me seem related:
First, is it necessary to explicitly terminate Matlab in my sbatch command? I have looked through several online slurm tutorials, and in some cases the authors include an exit command:
http://www.umbc.edu/hpcf/resources-tara-2013/how-to-run-matlab.html
And in some they don't:
http://www.buffalo.edu/ccr/support/software-resources/compilers-programming-languages/matlab/PCT.html
Second, when creating a parallel pool in a job, I almost always get the following warning:
Warning: Found 4 pre-existing communicating job(s) created by pool that are
running, and 2 communicating job(s) that are pending or queued. You can use
'delete(myCluster.Jobs)' to remove all jobs created with profile local. To
create 'myCluster' use 'myCluster = parcluster('local')'
Why is this happening, and is there any way to avoid it happening to myself and to others because of me?
It depends on how you launch Matlab. Note that your two examples use distinct methods for running a matlab script; the first one uses the -r option
matlab -nodisplay -r "matrixmultiply, exit"
while the second one uses stdin redirection from a file
matlab < runjob.m
In the first solution, the Matlab process will be left running after the script is finished, that is why the exit command is needed there. In the second solution, the Matlab process is terminated as stdin closes when the end of the file is reached.
If you do not end the matlab process, Slurm will kill it when the maximum allocation time is reached, as defined by the --time option in your submission script or by the cluster (or partition) default.
To avoid the warning you mention, make sure to systematically use matlabpool close at the end of your job. If you have several instances of Matlab running on the same node, and you have a shared home directory, you will probably get the warning anyhow, as I believe the information about open matlab pools is stored in a hidden folder in your home. Rebooting will probably not help, but finding those files and removing them will (be careful though and ask the system administrator).
To avoid the warning, delete the
.matlab/local_cluster_jobs/
directory.

Problems using batch with matlabpool

I want to use some parallel features in Matlab.
And execute the following commands:
matlabpool open local 12;
batch(funcname,1,{arg},'PathDependencies',p,'Matlabpool',1);
Then all processes stay silent for the rest of the time...
But without opening matlabpool, it finishes normally.
Is there any conflict between the use of matlabpool and batch?
The matlabpool command runs a parallel job on the local scheduler to give you workers on which to run the body of your parfor loops and spmd blocks. This means that while matlabpool is open, the number of workers available to the local scheduler is reduced. Then, when you try to run a batch job, it can only run when there are workers free.
You can find out how many running jobs you have on your local scheduler either using the "job monitor" from the "Parallel" desktop menu item (your matlabpool session would show up there as a job in state running with 12 tasks), or by executing the following piece of code:
s = findResource( 'scheduler', 'Type', 'local' );
[pending, queued, running, finished] = findJob(s);
running
If you want to use batch and parfor at the same time, open one fewer worker with matlabpool than you otherwise would, so 11 in your case. If you run batch first and then matlabpool, this happens automatically, but not vice versa.
to see the queue:
c=parcluster
c.Jobs
Interestingly, if you open up a second MATLAB instance, you can get another 12 workers, but strangely not with a third. It makes sense though, I guess: if you actually used them all, the machine would thrash.