How to make HandBrake use the CPU with less intensity?

I've recently begun using HandBrake to process some videos I downloaded to make them lighter. I built a small Python GUI program to automate the processing, making use of the CLI version. What I am doing is generating the command according to the video and executing it with os.system. Something like this:
import os

def process(args):
    # some algorithm to generate cmd using args
    cmd = "handbrakecli -i raw_video.mp4 -o video.mp4 -O -e x264"  # example command
    os.system(cmd)
    os.remove("raw_video.mp4")
The code works perfectly, but the problem is the overuse of my CPU. Usually, this takes 100% of CPU usage for a considerable amount of time. I use the program Core Temp to keep track of my processor temperature and it usually hits 78 °C.
I tried using BES (Battle Encoder Shirase) by saving the cmd command into a batch file called exec.bat and running os.system("BES_1.7.7\BES.exe -J -m exec.exe 20"), but this simply does nothing.
Speed isn't important at all. Even if it takes longer, I just want to use less of my CPU, something around 50% would be great. Any idea on how I could do so?

In HandBrake you can pass advanced parameters so that it only uses a certain number of CPU threads.
You can use threads; see the HandBrake CLI documentation.
When using threads you can specify any number of CPU threads to use. The default is auto.
The -x parameter stands for the Advanced settings box in the HandBrake GUI; that is where threads would go.
The line below tells HandBrake to use only one CPU thread:
-x threads=1
You can also use veryslow for the --encoder-preset setting to help with CPU load.
--encoder-preset=veryslow
I actually prefer using --encoder-preset=veryslow, since I see an overall better quality in the encode.
And both together:
--encoder-preset=veryslow -x threads=1
So formatted with your cmd variable:
cmd = "handbrakecli -i raw_video.mp4 -o video.mp4 -O -e x264 --encoder-preset=veryslow -x threads=1" #example command
See if that helps.

One easy way in Linux is to use taskset. You can use the terminal or make a custom shortcut/command.
For example, my CPU has 8 threads but I only want to use 6 for Handbrake.
Just start the program with taskset -c 2,3,4,5,6,7 handbrake; this way threads 0 and 1 will be free for other tasks/processes and the program will run on threads 2,3,4,5,6,7.
In Windows you can change the Target of the shortcut, or run from cmd:
C:\Windows\System32\cmd.exe /C start "" /affinity FC "C:\Program Files\HandBrake\HandBrake.exe"
As far as I understand, the affinity value is a bitmask read from the low bits up: each hex digit covers four threads. The first hex digit (F = 1111) covers threads 7-4 and the second (C = 1100) covers threads 3-0, so on my 8-thread CPU this leaves threads 0 and 1 free.
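If you want to work out the mask for a different thread layout, here is a small Python sketch (the bit positions are the thread numbers):
mask = 0
for cpu in range(2, 8):   # use threads 2..7, leave 0 and 1 free
    mask |= 1 << cpu
print(format(mask, 'X'))  # prints FC, the value passed to /affinity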

Related

Python 3 Popen calling rrdtool hangs indefinitely

I am trying to use Python's Popen() to retrieve graph data from multiple rrd files. Due to the complexity of the app where the following piece of code is used, I rely on the rrdtool graph parameter -Z to handle missing files for me:
#!/bin/python3
import subprocess

cmd = '/opt/rrdtool/bin/rrdtool graph - -a JSONTIME -Z --width 924 --start 1486428000 --end 1486471200 DEF:foo1=ch1.rrd:flows:MAX DEF:foo2=ch2.rrd:flows:MAX AREA:foo1#000:"ch1" AREA:foo2#606060:"ch2":STACK'
path = '/data/live/pokus/rrd/channels/'
p = subprocess.Popen(cmd, stdout=subprocess.PIPE, cwd=path, shell=True)
p.wait()
if p.returncode != 0:
    print("Error")
else:
    print(p.stdout.read().decode(encoding="utf-8"))
The code above works as expected when both files ch1.rrd and ch2.rrd are present. When one of them is missing, the whole thing hangs indefinitely until I kill the rrdtool process manually from htop. Then Python detects the nonzero return code and reports the error.
Using shell=False and shlex.split() on cmd does not help.
When I execute the same command from bash, rrdtool does the job as expected even with the missing files.
Unfortunately I can't use the rrdtool bindings for Python, and I am also stuck on Python 3.4.5. The rrdtool version is 1.6.0.
I'd be glad for any idea on how to overcome this. I would prefer a solution that does not involve testing from Python whether the files exist, and that keeps the -Z parameter in the rrdtool command. Also, using a timeout on p.wait() isn't a viable solution.
Thanks in advance
OK, I found the solution.
The reason Python (namely p.wait()) hung was that rrdtool did not know the minimum step size (parameter -S), resulting in a step size of two seconds. That made rrdtool produce enough output to fill the OS pipe buffer, which deadlocked p.wait(). According to the Python docs, Popen.communicate() is the way to go.
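For illustration, a minimal sketch of the fixed call; communicate() drains stdout while waiting, so the pipe buffer can never fill up. The -S value of 300 here is only an example, not the questioner's actual step size:
import subprocess

path = '/data/live/pokus/rrd/channels/'
# same rrdtool command as in the question, plus an explicit minimum step size (-S)
cmd = '/opt/rrdtool/bin/rrdtool graph - -a JSONTIME -Z -S 300 --width 924 --start 1486428000 --end 1486471200 DEF:foo1=ch1.rrd:flows:MAX DEF:foo2=ch2.rrd:flows:MAX AREA:foo1#000:"ch1" AREA:foo2#606060:"ch2":STACK'

p = subprocess.Popen(cmd, stdout=subprocess.PIPE, cwd=path, shell=True)
out, _ = p.communicate()  # reads stdout to EOF, then reaps the process
if p.returncode != 0:
    print("Error")
else:
    print(out.decode("utf-8"))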

Automating Month End Process

Can the month-end process be automated in Progress-based applications like Nessie? I already searched for it and I think maybe it can be done by scheduling it through background jobs.
Scheduling jobs is a function of the OS or of 3rd party applications that specialize in such things (generally used in large enterprises with IT groups that obsess over that kind of stuff).
If you are using UNIX then you want to look into "cron".
If you are using Windows then "scheduled tasks".
In any event you will need to create a "wrapper" script that properly sets the background job environment and launches a Progress session. If you are using Windows you should be aware that a batch process is "headless" and that unless your batch process is doing something very strange it will not be using GUI components -- so you should probably run _progres.exe rather than prowin32.exe.
A generic (UNIX) example:
#!/bin/sh
#
DLC=/usr/dlc
PATH=$DLC/bin:$PATH
export DLC PATH
_progres -b -db /path/dbname -p batchjob.p > logfile 2>&1 &
(That is "_progres" with just 1 "s" -- this is from the days when file names were restricted to 8 characters on some operating systems.)
Windows is very similar:
@echo off
set DLC=c:\progress
set PATH=%DLC%\bin;%PATH%
_progres.exe -b -db \path\dbname -p batchjob.p > logfile 2>&1
But there are a lot of "gotchas" with Windows. If, for instance, you run a job under a login id that might actually log in, then you will have the problem that on logout all of that account's scheduled tasks will be "helpfully" killed by the OS. Aside from stopping your job when you probably don't want it stopped, this may have other negative side effects, like crashing the db. To get around that problem on Windows you either create a "service account" that never logs in or use a 3rd party scheduler that runs jobs "as a service". A sketch of the first approach follows.
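The built-in SYSTEM account never logs out, so a task created to run under it will not be killed at logout. A hedged example using schtasks (the task name and script path are placeholders):
schtasks /Create /TN "MonthEnd" /TR "C:\scripts\monthend.bat" /SC MONTHLY /D 1 /ST 02:00 /RU SYSTEM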

GNU parallel --jobs option using multiple nodes on cluster with multiple cpus per node

I am using gnu parallel to launch code on a high performance (HPC) computing cluster that has 2 CPUs per node. The cluster uses TORQUE portable batch system (PBS). My question is to clarify how the --jobs option for GNU parallel works in this scenario.
When I run a PBS script calling GNU parallel without the --jobs option, like this:
#PBS -lnodes=2:ppn=2
...
parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
    matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
it looks like it only uses one CPU per node, and it also produces the following error stream:
bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles087 (). Using 1.
bash: parallel: command not found
parallel: Warning: Could not figure out number of cpus on galles108 (). Using 1.
This looks like one error for each node. I don't understand the first part (bash: parallel: command not found), but the second part tells me it's using one CPU per node.
When I add the option -j2 to the parallel call, the errors go away, and I think it's using two CPUs per node. I am still a newbie to HPC, so my way of checking this is to output date-time stamps from my code (the dummy MATLAB code takes tens of seconds to complete). My questions are:
Am I using the --jobs option correctly? Is it correct to specify -j2 because I have 2 CPUs per node? Or should I be using -jN where N is the total number of CPUs (number of nodes multiplied by number of CPUs per node)?
It appears that GNU parallel attempts to determine the number of CPUs per node on its own. Is there a way I can make this work properly?
Is there any meaning to the bash: parallel: command not found message?
Yes: -j is the number of jobs per node.
Yes: Install 'parallel' in your $PATH on the remote hosts.
Yes: it is a consequence of parallel missing from the $PATH.
GNU Parallel logs into the remote machine and tries to determine the number of cores (using parallel --number-of-cores), which fails, so it defaults to 1 CPU core per host. By giving -j2 you tell GNU Parallel not to try to determine the number of cores.
Did you know that you can also give the number of cores in the --sshlogin, as 4/myserver? This is useful if you have a mix of machines with different numbers of cores.
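For example, a run that tells GNU Parallel that server1 has 4 cores and server2 has 2 (the host names are placeholders):
parallel --sshlogin 4/server1,2/server2 echo {} ::: 1 2 3 4 5 6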
This is not an answer to the 3 primary questions, but I'd like to point out some other problems with the parallel statement in the first code block.
parallel --env $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
    matlab -nodisplay -r "\"cd $PBS_O_WORKDIR,primes1({})\"" ::: 10 20 30 40
The shell expands $PBS_O_WORKDIR prior to executing parallel. This means two things happen: (1) --env sees a directory path rather than an environment variable name, so it essentially does nothing, and (2) the value is expanded as part of the command string, eliminating the need to pass $PBS_O_WORKDIR at all, which is why there wasn't an error.
The latest version of parallel, 20151022, has a --workdir option (although the tutorial lists it as alpha testing), which is probably the easiest solution. The parallel command line would look something like:
parallel --workdir $PBS_O_WORKDIR --sshloginfile $PBS_NODEFILE \
    matlab -nodisplay -r "primes1({})" ::: 10 20 30 40
A final note: PBS_NODEFILE may contain hosts listed multiple times if more than one processor is requested from qsub. This may have implications for the number of jobs run, etc.

How to display the CPU percentage usage by each process in cmd

I want a Windows cmd command to display all the processes and the CPU percentage for each process.
Is there a command which gives me this result?
Thank you.
Try pslist from the Sysinternals PsTools suite.
You will need to download the tools and put them in a directory on your PATH (or chdir to wherever they are).
Use -s to see the CPU usage of each process.
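For example, to watch CPU usage for ten seconds, refreshing every two seconds (the -r refresh flag is from memory of the PsTools docs, so treat it as an assumption):
pslist -s 10 -r 2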
Perfmon can use a wildcard to get the CPU usage of each running process. It also has a text interface, typeperf, which prints the results to the console.
This command will produce a one-line CSV output of the current running process CPU usage:
typeperf "\process(*)\% processor time" -sc 1
The PID is missing from this report. If you need it, you can log the PID of each process as a separate counter and then match up the names:
typeperf "\process(*)\% processor time" "\process(*)\id process" -sc 1

Perl scripts suddenly compiling very slowly

Suddenly, compiling my Perl scripts started taking too much time (about a minute each).
It doesn't really matter what I have in the scripts; what does matter, however, is how many require and use statements I use.
I think the time is spent in compilation, but I am not sure. The thing is, if I run only the checking part, meaning perl -c script.pl, it takes about the same time.
My question is: how do I debug this, how do I find out what exactly perl is doing and what takes so much time?
You can check how long each use and require statement takes to load with something like the following (time is a Unix/Linux command, so on Windows you'll need to keep an eye on your watch):
$ time perl -c -e 'use strict;'
-e syntax OK
real 0m0.122s
user 0m0.000s
sys 0m0.008s
Just change the use/require line for each entry you have, to find which one results in the longest time; a loop that automates this is sketched below.
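To automate that over several modules, a small shell loop works (the module names here are only examples):
for m in strict warnings POSIX; do
    echo "== $m"
    time perl -e "use $m;"
done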
If you are on Windows, you could use the Process Monitor utility to see disk I/O activity. If you suspect Moose, running an isolated script could show what is loaded and when.