GNU parallel --jobs option using multiple nodes on cluster with multiple cpus per node - hpc

I am using GNU parallel to launch code on a high-performance computing (HPC) cluster that has 2 CPUs per node. The cluster uses the TORQUE portable batch system (PBS). My question is to clarify how the --jobs option for GNU parallel works in this scenario.
When I run a PBS script calling GNU parallel without the --jobs option, like this:
#PBS -lnodes=2:ppn=2
...
parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
it looks like it only uses one CPU per node, and it also produces the following error stream:
bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles087 (). Using 1.
bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles108 (). Using 1.
This looks like one error for each node. I don't understand the first part (bash: parallel: command not found), but the second part tells me it's only using one CPU per node.
When I add the option -j2 to the parallel call, the errors go away, and I think it is then using two CPUs per node. I am still a newbie to HPC, so my way of checking this is to output date-time stamps from my code (the dummy MATLAB code takes tens of seconds to complete). My questions are:
Am I using the --jobs option correctly? Is it correct to specify -j2 because I have 2 CPUs per node? Or should I be using -jN where N is the total number of CPUs (number of nodes multiplied by number of CPUs per node)?
It appears that GNU parallel attempts to determine the number of CPUs per node on its own. Is there a way that I can make this work properly?
Is there any meaning to the bash: parallel: command not found message?

Yes: -j is the number of jobs per node.
Yes: Install 'parallel' in your $PATH on the remote hosts.
Yes: It is a consequence of parallel missing from the $PATH.
GNU Parallel logs into the remote machine and tries to determine the number of cores (using parallel --number-of-cores); this fails, so it defaults to 1 CPU core per host. By giving -j2 you tell GNU Parallel not to try to determine the number of cores.
Did you know that you can also give the number of cores in the --sshlogin as 4/myserver? This is useful if you have a mix of machines with different numbers of cores.
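For example, something along these lines (the hostnames here are made up) spreads jobs over hosts with different core counts:
parallel --sshlogin 4/bignode,2/smallnode echo ::: a b c d e f
The same n/hostname syntax also works on each line of the file given to --sshloginfile.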

This is not an answer to the 3 primary questions, but I'd like to point out some other problems with the parallel statement in the first code block.
parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
The shell expands $PBS_O_WORKDIR before parallel is executed. This means two things happen: (1) --env sees an expanded path rather than an environment variable name and so essentially does nothing, and (2) the value becomes part of the command string, which removes the need to pass $PBS_O_WORKDIR to the remote side at all and is why there wasn't an error.
The latest version of parallel, 20151022, has a --workdir option (although the tutorial lists it as alpha testing), which is probably the easiest solution here. The parallel command line would look something like:
parallel --workdir $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
matlab -nodisplay -r "primes1({})" ::: 10 20 30 40
Final note: PBS_NODEFILE may contain hosts listed multiple times if more than one processor per node is requested from qsub. This may have implications for the number of jobs run, etc.
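A quick way to see how the request expands into $PBS_NODEFILE is to count the repeated entries:
sort $PBS_NODEFILE | uniq -c
With -lnodes=2:ppn=2 each host should appear twice.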

Related

Can Linux provide an execve user-space callback?

In the past I've employed inotify for logging as well as for system functions. Now I'm in a position where I need to know each time an executable has been called, along with the complete set of command-line arguments passed to it.
Short of setting up an auditd rule, is there any method to trigger on a particular executable being called, and return its command line arguments from user-space? I know the audit daemon can do this, so perhaps that's where I should look.
Monitoring process creation and termination events is a useful skill to have in your toolbox. The article I found consists of two parts: the first introduces existing tools for different platforms, and the second explains how these tools work internally. One of the tools it describes is forkstat, which uses the netlink proc connector and has its source code available on GitHub.
Here are commands I used:
git clone https://github.com/ColinIanKing/forkstat.git
cd forkstat
make
sudo ./forkstat
In a separate ssh session I ran an ls command and observed this output:
Time Event PID Info Duration Process
09:43:49 fork 10362 parent -bash
09:43:49 fork 10433 child -bash
09:43:49 exec 10433 ls --color=auto
09:43:49 exit 10433 0 0.004s ls --color=auto
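For comparison, the auditd route mentioned in the question looks roughly like this (a sketch; the key name exec_log is arbitrary):
sudo auditctl -a always,exit -F arch=b64 -S execve -k exec_log
sudo ausearch -k exec_log -i
The first command logs every execve syscall and tags it with a key; the second replays the recorded events, including the full argument list, in a readable form.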

How to make HandBrake use the CPU with less intensity?

I've recently begun using HandBrake to process some videos I downloaded, to make them smaller. I built a small Python GUI program to automate the processing, making use of the CLI version. What I am doing is generating the command according to the video and executing it with os.system. Something like this:
import os

def process(args):
    # some algorithm to generate cmd using args
    cmd = "handbrakecli -i raw_video.mp4 -o video.mp4 -O -e x264"  # example command
    os.system(cmd)
    os.remove("raw_video.mp4")
The code works perfectly, but the problem is the overuse of my CPU. Usually this sits at 100% CPU usage for a considerable amount of time. I use the program CoreTemp to keep track of my processor temperature and it usually hits 78 °C.
I tried using BES (Battle Encoder Shirase) by saving the cmd command into a batch file called exec.bat and doing os.system("BES_1.7.7\BES.exe -J -m exec.exe 20"), but this simply does nothing.
Speed isn't important at all. Even if it takes longer, I just want to use less of my CPU, something around 50% would be great. Any idea on how I could do so?
In HandBrake you can pass advanced parameters so you only use a certain number of CPU threads.
You can use threads; see the HandBrake CLI documentation.
When using threads you can specify any number of CPU threads to use. The default is auto.
The -x parameter stands for Advanced settings in the GUI of HandBrake; that is where threads will go.
The below tells Handbrake to only use one CPU thread for the Advanced setting:
-x threads=1
You can also use the veryslow value for the --encoder-preset setting to help with the CPU load.
--encoder-preset=veryslow
I actually prefer using the --encoder-preset=veryslow preset since I see an overall better quality in the encode.
And both together:
--encoder-preset=veryslow -x threads=1
So formatted with your cmd variable:
cmd = "handbrakecli -i raw_video.mp4 -o video.mp4 -O -e x264 --encoder-preset=veryslow -x threads=1" #example command
See if that helps.
One easy way in Linux is to use taskset. You can use the terminal or make a custom shortcut/command.
For example, my CPU has 8 threads but I only want to use 6 for Handbrake.
Just start the program with taskset -c 2,3,4,5,6,7 handbrake, this way the threads 0 and 1 will be free to another task/process and the program will run on threads 2,3,4,5,6,7.
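Since the question drives the CLI from a script, the same idea applies there as well; for instance, with a 6-of-8-threads split like the one above (the file names are just the ones from the question):
taskset -c 2-7 handbrakecli -i raw_video.mp4 -o video.mp4 -O -e x264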
In Windows you can change the Target of the shortcut or use on cmd:
C:\Windows\System32\cmd.exe /C start "" /affinity FC "C:\Program Files\HandBrake\HandBrake.exe"
As far as I understand, the affinity mask is read one hex digit (four bits) at a time: in FC, the first digit F (1111) covers threads 7-4 and the second digit C (1100) covers threads 3-0. In my case I have an 8-thread CPU and this leaves threads 0 and 1 free.

Automating Month End Process

Can the month-end process be automated in Progress-based applications like Nessie? I have already searched for this, and I think it may be possible by scheduling it as a background job.
Scheduling jobs is a function of the OS or of 3rd party applications that specialize in such things (generally used in large enterprises with IT groups that obsess over that kind of stuff).
If you are using UNIX then you want to look into "cron".
If you are using Windows then "scheduled tasks".
In any event you will need to create a "wrapper" script that properly sets the background job environment and launches a Progress session. If you are using Windows you should be aware that a batch process is "headless" and that unless your batch process is doing something very strange it will not be using GUI components -- so you should probably run _progres.exe rather than prowin32.exe.
A generic (UNIX) example:
#!/bin/sh
#
DLC=/usr/dlc
PATH=$DLC/bin:$PATH
export DLC PATH
_progres -b -db /path/dbname -p batchjob.p > logfile 2>&1 &
(That is "_progres" with just 1 "s" -- this is from the days when file names were restricted to 8 characters on some operating systems.)
Windows is very similar:
@echo off
set DLC=c:\progress
set PATH=%DLC%\bin;%PATH%
_progres.exe -b -db \path\dbname -p batchjob.p > logfile 2>&1
But there are a lot of "gotchas" with Windows. If, for instance, you run a job using a login id that might actually log in, then you will have the problem that on logout all the scheduled tasks will be "helpfully" killed by the OS. Aside from stopping your job when you probably don't want it to, this may have other negative side effects like crashing the db. To get around that problem on Windows you either create a "service account" that never logs in or use a 3rd party scheduler that runs jobs "as a service".
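To put the scheduling piece together: a crontab entry along these lines would run such a wrapper at 01:00 on the first of each month (the paths are assumptions):
0 1 1 * * /opt/batch/monthend.sh >> /var/log/monthend.log 2>&1
On Windows the rough equivalent is schtasks /create /sc monthly /d 1 /st 01:00 /tn MonthEnd /tr c:\batch\monthend.bat, run under a service account as described above.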

How do I find a complete list of available Torque PBS queues?

Q: How do I find the available PBS queues on the "typical" Torque MPI system?
(asking our admin takes 24+ hours, and the system changes with constant migration)
(for example, "Std8" is one possible queue)
#PBS -q Std8
The admin finally got back to me. To get a list of queues on our HPC system, the command is:
$ qstat -q
qstat -f -Q
shows available queues and details about the limits (cputime, walltime, associated nodes etc.)
How about simply "pbsnodes" - that should probably tell you more than you care to know. Or I suppose "qstat -Q".
Run
qhost -q
to see the node-queue mapping.
Another option:
qmgr -c 'p q'
The p and q are short for print queue.
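If you just want the bare queue names for scripting, something like this should work (assuming the usual qstat -Q layout of two header lines followed by one queue per row):
qstat -Q | awk 'NR > 2 {print $1}'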

Stress testing a command-line application

I have a command-line Perl script that I want to stress test. Basically what I want to do is run multiple instances of the same script in parallel so that I can figure out at what point our machine becomes unresponsive.
Currently I am doing something like this:
$ prog > output1.txt 2>err1.txt & \
prog > output2.txt 2>err2.txt &
...
and then I am checking ps to see which instances finished and which didn't. Is there any open-source application available that can automate this process? Preferably with a web interface?
You can use xargs to run commands in parallel:
seq 1 100 | xargs -n 1 -P 0 -I{} sh -c 'prog > output{}.txt 2>err{}.txt'
This will run 100 instances in parallel.
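If you'd rather ramp up gradually than launch everything at once, cap the concurrency with -P, for example 8 at a time:
seq 1 100 | xargs -n 1 -P 8 -I{} sh -c 'prog > output{}.txt 2>err{}.txt'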
For a better testing framework (including parallel testing via 'spawn') take a look at Expect.
Why not use the crontab or Scheduled Tasks to automatically run the script?
You could write something to automatically parse the output easily.
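For example, with the output1.txt/err1.txt naming from the question, a one-liner like this lists the runs that wrote anything to stderr:
grep -l . err*.txt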
With GNU Parallel this will run one prog per CPU core:
seq 1 1000 | parallel prog \> output{}.txt 2\>err{}.txt
If you want to run 10 progs per CPU core, do:
seq 1 1000 | parallel -j1000% prog \> output{}.txt 2\>err{}.txt
Watch the intro video to learn more: http://www.youtube.com/watch?v=OpaiGYxkSuQ