I am running MATLAB on my HPC server from the terminal. My program generates video output. The program runs fine, but produces output with a banner like the one below:
How can I avoid this banner?
Update: adding a minimal reproducible script etc.:
My slurm script in HPC:
#!/bin/sh
### sbatch config parameters must start with #SBATCH and must precede any other command. to ignore just add another # - like ##SBATCH
#SBATCH --partition main ### specify partition name where to run a job. Any node: 'main'; NVidia 2080: 'rtx2080'; 1080: 'gtx1080'
#SBATCH --time 0-10:30:00 ### limit the time of job running. Make sure it is not greater than the partition time limit (7 days)!! Format: D-H:MM:SS
#SBATCH --job-name my_job ### name of the job. replace my_job with your desired job name
#SBATCH --output my_job-id-%J.out ### output log for running job - %J is the job number variable
#SBATCH --mail-user=xxxxxxxxx@gmail.com ### user's email for sending job status notifications
#SBATCH --mail-type=BEGIN,END,FAIL ### conditions for sending the email: ALL, BEGIN, END, FAIL, REQUEUE, NONE
#SBATCH --gres=gpu:1 ### number of GPUs (can't exceed 8 gpus for now) allocating more than 1 requires the IT team permission
### Print some data to output file ###
echo "SLURM_JOBID"=$SLURM_JOBID
echo "SLURM_JOB_NODELIST"=$SLURM_JOB_NODELIST
matlab -nodisplay -nosplash -nodesktop -r "run('./mat2video.m');exit;"
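If the banner turns out to be MATLAB's startup text in the job log (rather than something drawn into the video itself), note that R2019a and later, which includes R2021a, offer `-batch`; a possible replacement for the last line of the sbatch script (a sketch, untested on a cluster):

```shell
# Possible alternative last line for the sbatch script (R2019a+).
# -batch runs mat2video.m non-interactively without the desktop or
# splash screen, and exits with a status reflecting success or failure.
matlab -batch "mat2video"
```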
My matlab script in HPC:
data = rand(100, 768, 1024);            % 100 frames of 768x1024 noise
x = size(data);
v = VideoWriter('video.avi');
open(v);
for i = 1:x(1)
    d = reshape(data(i,:,:), [x(2) x(3)]);  % extract one frame
    imshow(d);
    frame = getframe(gcf);               % capture the current figure
    writeVideo(v, frame);
end
close(v);
I submit the job via the SLURM sbatch command. The MATLAB version is R2021a.
So essentially this is a question about how to use your multi-core processor more efficiently.
I have an optimization script (written in MATLAB) that calls 20 instances of MATLAB to evaluate functions. The results are saved as .mat files, and the optimization script then takes these results and does some further work. The way I call the 20 MATLAB instances is to use MATLAB's built-in "system" function to call a batch file, which then opens 20 instances of MATLAB to evaluate the functions.
The code I'm using in batch file is:
( start matlab -nosplash -nodesktop -minimize -r "Worker01;exit"
ping -n 5 127.0.0.1 >nul
start matlab -nosplash -nodesktop -minimize -r "Worker02;exit"
ping -n 5 127.0.0.1 >nul
...... % repeat the pattern
start matlab -nosplash -nodesktop -minimize -r "Worker19;exit"
ping -n 5 127.0.0.1 >nul
start matlab -nosplash -nodesktop -minimize -r "Worker20;exit"
ping -n 5 127.0.0.1 >nul ) | set /P "="
All "start" commands are included in a parenthesis following by command
"| set/P"=""
because I want my optimization script move on after all 20 evaluations done. I learnt this technique from my another question but I don't really understand what it really does. If you can also explain this I would be very appreciated.
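For what it's worth, the trick works because every process launched inside the parentheses inherits the write end of the pipe; set /P reads from that pipe and only returns once all of the MATLAB instances have exited and closed their handles. The equivalent wait-for-all in a POSIX shell is simply wait; a minimal sketch, with sleep/echo standing in for the MATLAB workers:

```shell
#!/bin/sh
# Start the workers in the background, then block until all have exited.
log=$(mktemp)
for i in 1 2 3; do
    { sleep 0.1; echo "worker $i finished"; } >> "$log" &
done
wait                       # returns only after every background job exits
lines=$(wc -l < "$log")
echo "all $lines workers done"
```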
Anyway, this is a way to achieve parallel computing under MATLAB 2007, which doesn't have a built-in parallel computing feature. However, I found that it's not efficient to run 20 instances at the same time, because after opening about 12 instances my CPU (a Xeon server CPU, 14 cores available) usage reaches 100%. My theory is that opening more instances than the CPU can handle makes the processor less efficient. So I think the best strategy would be like this:
start the first 12 instances;
start the next one(s) on the list once any currently running instance finishes. (Even though the workers are opened at roughly the same time and do the same job, they still tend to finish at different times.)
This would make sure that computing power is fully utilized (CPU usage always 100%) without overstressing the CPU.
How could I implement this strategy in a batch file? If that is hard to do in batch, could PowerShell do it?
Please show the actual code and explain it. I'm not a programmer, so I don't know much about coding.
Thanks.
I'm thinking something like this in PowerShell...
<#
keep a queue of all jobs to be executed
keep a list of running jobs
number of running jobs cannot exceed the throttle value
#>
$throttle = 12
$queue = New-Object System.Collections.Queue
$running = New-Object System.Collections.Generic.List[System.Diagnostics.Process]
# generate x number of queue commands
# runs from 1 to x
1..20 | ForEach-Object {
# the variable $_ contains the current number
$program = "matlab"
$args = "-nosplash -nodesktop -minimize -r `"Worker$_;exit`""
# args will be
# -nosplash -nodesktop -minimize -r "Worker1;exit"
# -nosplash -nodesktop -minimize -r "Worker2;exit"
# etc
# save it
$queue.Enqueue(@($program, $args))
}
# begin executing jobs
while($queue.Count) {
# remove jobs that are done
$running.Where({ $_.HasExited }) |
ForEach-Object { [void]$running.Remove($_) }
if($running.Count -ge $throttle) {
# busy, so wait
Start-Sleep -Milliseconds 50
}
else {
# ready for new job
$cmd = $queue.Dequeue()
[void]$running.Add([System.Diagnostics.Process]::Start($cmd[0], $cmd[1]))
}
}
# wait for rest to be done
while($running.Where({ !$_.HasExited }).Count) {
Start-Sleep -Milliseconds 50
}
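The same throttle pattern can be sketched in bash for comparison; sleep stands in for the MATLAB workers here, and the job count and durations are arbitrary:

```shell
#!/bin/bash
# Throttled launcher: never more than $throttle background workers at once.
throttle=3
tally=$(mktemp)
for i in $(seq 1 8); do
    # busy, so wait: poll until our running-job count drops below the limit
    while [ "$(jobs -pr | wc -l)" -ge "$throttle" ]; do
        sleep 0.05
    done
    # launch the next worker (stand-in for: start matlab -r "Worker$i;exit")
    { sleep 0.2; echo "$i"; } >> "$tally" &
done
wait                         # block until the remaining workers finish
finished=$(wc -l < "$tally")
echo "launched and finished $finished workers"
```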
When executing a job on LSF you can specify the working directory and create an output directory, e.g.
bsub -cwd /home/workDir -outdir /home/%J program inputfile
where it will look for inputfile in the specified working directory. The -outdir option creates a new directory based on the job id.
What I'm wondering is how you pipe the results created from the run in the working directory to the newly created output dir.
You can't add a command like
mv * /home/%J
as the underlying OS has no understanding of the %J identifier. Is there an option in LSF for piping the data inside the job, where it knows the jobId?
You can use the environment variable $LSB_JOBID.
mv * /data/${LSB_JOBID}/
If you copy the data inside your job script then it will hold the compute resource during the data copy. If you're copying a small amount of data then it's not a problem. But if it's a large amount of data you can use bsub -f so that other jobs can start while the data copy is ongoing.
bsub -outdir "/data/%J" -f "/data/%J/final < bigfile" sh script.sh
bigfile is the file that your job creates on the compute host. It will be copied to /data/%J/final after the job finishes. It even works on a non-shared filesystem.
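You can dry-run the copy step outside LSF by setting the variable yourself; the fake job id and temp paths below are illustrative only, since under LSF the scheduler sets LSB_JOBID for you:

```shell
#!/bin/sh
# Simulated job epilogue; in a real job, LSF sets LSB_JOBID automatically.
workdir=$(mktemp -d)
LSB_JOBID=${LSB_JOBID:-12345}            # fake id for the dry run
outdir="$workdir/$LSB_JOBID"
mkdir -p "$outdir"
echo "result data" > "$workdir/bigfile"  # stand-in for the job's output file
mv "$workdir/bigfile" "$outdir/final"    # the move the answer describes
```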
I have a program that performs some operations on a specified log file, flushing to the disk multiple times for each execution. I'm calling this program from a perl script, which lets you specify a directory, and I run this program on all files within the directory. This can take a long time because of all the flushes.
I'd like to execute the program and run it in the background, but I don't want the pipeline to be thousands of executions long. This is a snippet:
my $command = "program $data >> $log";
myExecute("$command");
myExecute basically runs the command using system(), along with some other logging/printing functions. What I want to do is:
my $command = "program $data & >> $log";
This will obviously create a large pipeline. Is there any way to limit how many background executions are present at a time (preferably using &)? (I'd like to try 2-4).
#!/bin/bash
#
# lets call this script "multi_script.sh"
#
# wait until there are fewer than 4 instances running,
# polling with a 5 second interval
while [ "$(pgrep -c program)" -ge 4 ]; do sleep 5; done
/path/to/program "$1" &
Now call it like this:
my $command = "multi_script.sh $data" >> $log;
Your perl script will wait if the bash script waits.
Positives:
If a process crashes, its slot is freed and the next instance starts (the crashed item's data goes, of course, unprocessed).
Drawbacks:
It is important for your perl script to wait a moment between starting instances (a sleep of about a second), because of the latency between invoking the script and passing the while-loop test. If you spawn them too quickly (system spamming) you will end up with far more processes than you bargained for.
If you are able to change
my $command = "program $data & >> $log";
into
my $command = "cat $data >>/path/to/datafile";
(or even better: append $data to /path/to/datafile directly from perl),
and make the last line of your script:
system("/path/to/quadslotscript.sh");
then I have the script quadslotscript.sh here:
4 execution slots are started and stay active until the end
all slots get their input from the same datafile
when a slot finishes processing an entry, it reads a new one,
until the datafile/queue is empty
no process-table lookups during execution, only when all work is done.
the code:
the code:
#!/bin/bash
# use the datafile as a queue from which all worker slots read their input
exec 3< "/path/to/datafile"
# 4 separate processes sharing file descriptor 3
while read -u 3 DATA; do /path/to/program "$DATA" >>"$log"; done &
while read -u 3 DATA; do /path/to/program "$DATA" >>"$log"; done &
while read -u 3 DATA; do /path/to/program "$DATA" >>"$log"; done &
while read -u 3 DATA; do /path/to/program "$DATA" >>"$log"; done &
# only exit when 100% sure that all processes have ended
while pgrep "program" &>/dev/null; do wait; done
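Since two slots doing `read -u 3` on the same descriptor can in principle race on a shared file, a more conservative way to get the same fixed-size pool is `xargs -P`; a sketch, with echo standing in for /path/to/program:

```shell
#!/bin/sh
# Four-slot worker pool over a line-oriented queue file, via xargs.
# -n 1 hands one queue entry per invocation; -P 4 keeps 4 workers busy.
queue=$(mktemp)
printf '%s\n' alpha beta gamma delta epsilon > "$queue"
out=$(xargs -n 1 -P 4 sh -c 'echo "processed $0"' < "$queue" | sort)
echo "$out"
```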
I used to use a server with LSF but now I just transitioned to one with SLURM.
What is the equivalent command of bpeek (for LSF) in SLURM?
bpeek
bpeek Displays the stdout and stderr output of an unfinished job
I couldn't find the documentation anywhere. If you have some good references for SLURM, please let me know as well. Thanks!
You might also want to have a look at the sattach command.
I just learned that in SLURM there is no need for bpeek to check the current standard output and standard error, since they are written at run time to the files specified for stdout and stderr.
Here's a workaround that I use. It mimics the bpeek functionality from LSF
Create a file bpeek.sh:
#!/bin/bash
# take a slurm job id as the first argument
jobid=$1
# ask scontrol for the job record, find the StdOut= field, and strip
# the leading indentation together with the "StdOut=" prefix
stdout=$(scontrol show job "$jobid" | grep StdOut= | sed 's/^ *StdOut=//')
# show the last 10 rows of the file if no second argument is given
nrows=${2:-10}
tail -f -n "$nrows" "$stdout"
Then you can use it:
sh bpeek.sh JOBID NROWS(optional)
Or add an alias to your ~/.bashrc file (arguments typed after the alias are appended to it, so no $1/$2 placeholders are needed):
alias bpeek="sh ~/bpeek.sh"
and then use it:
bpeek JOBID NROWS(optional)
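The extraction step can be sanity-checked offline by feeding it a sample `scontrol show job` record; the record below is abridged and hypothetical but follows SLURM's indented Key=Value layout, and the sed variant here also strips scontrol's leading indentation:

```shell
#!/bin/sh
# Parse the StdOut= path out of a (sample) scontrol job record.
sample='JobId=123 JobName=my_job
   StdErr=/home/user/my_job-id-123.err
   StdOut=/home/user/my_job-id-123.out'
stdout=$(printf '%s\n' "$sample" | grep StdOut= | sed 's/^ *StdOut=//')
echo "$stdout"
```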
Here is a part of my .conf file.
env SERVICE_ROOT="/data/service_root"
env LOG_DIR="$SERVICE_ROOT/logs"
and I checked all the variables with the following:
echo "\n`env`" >> /tmp/listener.log 2>&1
I expect $LOG_DIR to be "/data/service_root/logs", but what I got is:
SERVICE_ROOT=/data/service_root
LOG_DIR=$SERVICE_ROOT/logs
Did I miss something?
A defined environment variable is not accessible to the job configuration file itself.
Upstart lets you set environment variables which will be accessible to the jobs whose job configuration files define them.
As explained in 8.2 Environment Variables:
Note that a Job Configuration File does not have access to a user's environment variables, not even the superuser. This is not possible since all job processes created are children of init which does not have a user's environment.
The defined variable $SERVICE_ROOT is accessible to the defined job:
# /etc/init/test.conf
env SERVICE_ROOT="/data/service_root"
script
export LOG_DIR="$SERVICE_ROOT/logs"
# prints "LOG_DIR='/data/service_root/logs'" to system log
logger -t $0 "LOG_DIR='$LOG_DIR'"
exec /home/vagrant/test.sh >> /tmp/test.log
end script
The variable $LOG_DIR, exported in the script block, is available to processes called within the same block.
#!/bin/bash -e
# /home/vagrant/test.sh
echo "running test.sh"
echo "\n`env`" | grep 'LOG_DIR\|SERVICE_ROOT'
After running sudo start test, the content of /tmp/test.log will be:
running test.sh
SERVICE_ROOT=/data/service_root
LOG_DIR=/data/service_root/logs
In syslog you will find:
Jul 16 01:39:39 vagrant-ubuntu-raring-64 /proc/self/fd/9: LOG_DIR='/data/service_root/logs'
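The rule in the quoted passage is ordinary process-environment behaviour and can be checked outside Upstart; a child shell sees exported variables only:

```shell
#!/bin/sh
SERVICE_ROOT="/data/service_root"      # plain shell variable, not exported
export LOG_DIR="$SERVICE_ROOT/logs"    # exported: children inherit it
child_log=$(sh -c 'echo "$LOG_DIR"')   # child sees the exported variable
child_root=$(sh -c 'echo "$SERVICE_ROOT"')  # child sees nothing here
echo "child LOG_DIR=$child_log"
echo "child SERVICE_ROOT=$child_root"
```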