Subprocess exits when parent Python script is stopped by systemd - subprocess

I am new to Python and wanted to get familiar with it, so I created a Python script that uses Popen from subprocess to execute a bash script. The bash script sets up an environment for, and executes, a C++ program that runs in the background.
The intended use is to run the Python script as a service that watches for the C++ process and, if the C++ process exits, runs the bash startup script again.
Everything runs as intended if I start the Python script from the command line (./proc_watchdog.py) and then Ctrl+C it: the C++ process continues running.
If I instead run the Python script using systemd (systemctl start pythonscript.service) and then stop it (systemctl stop pythonscript.service), the C++ program exits.
The .service file:
[Unit]
Description=RustDedicated watchdog service
StartLimitIntervalSec=0
[Service]
Type=simple
Restart=always
RestartSec=1
User=scriptuser
ExecStart=/path/to/C++_Prog_Dir/proc_watchdog.py
[Install]
WantedBy=multi-user.target
The python script:
#!/usr/bin/python3
import subprocess as sp
import os
import psutil
import time
backgroundProc = "Procname"
def processWatchdog():
    waitCount = 0
    while True:
        procList = []
        for proc in psutil.process_iter():
            procList.append(proc.as_dict(attrs=['name']))
        found = 0
        for pname in procList:
            if backgroundProc == pname['name']:
                print("Process running")
                found = 1
        if found == 0:
            print("Process not found...")
            waitCount += 1
            if waitCount == 3:
                p = sp.Popen(["/path/to/C++_Prog_Dir/start.sh"])
                print("Restarting")
                p.wait()
                waitCount = 0
                print("Restarted")
        time.sleep(2)

if __name__ == '__main__':
    processWatchdog()
Bash script example:
#!/bin/bash
./c++_process &>> /dev/null &
exit 0
Can anyone help me understand why the Python script behaves differently depending on how it is executed?

Do supervision properly: have the parent wait on the child, and when the child exits, restart it. This is what DJB's daemontools does.
Change your script from:
./c++_process &>> /dev/null &
to:
exec ./c++_process &>> /dev/null
Because exec replaces the shell with the C++ program, the process that Popen started is now the C++ program itself. Your watchdog is then alerted immediately, when p.wait() returns, that the program has exited and must be restarted. With that, your watchdog can be reduced to:
import subprocess as sp
import time

def processWatchdog():
    while True:
        p = sp.Popen(["/path/to/C++_Prog_Dir/start.sh"])
        p.wait()
        time.sleep(2)

if __name__ == '__main__':
    processWatchdog()
This simply restarts the program with a 2-second delay after each exit.
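If you also want the journal to show why the child stopped, one small extension (my own sketch, not part of the answer above) is to capture the status that p.wait() returns and log it before restarting:

import subprocess as sp
import time

def processWatchdog():
    while True:
        p = sp.Popen(["/path/to/C++_Prog_Dir/start.sh"])
        # With the exec change above, start.sh is replaced by the C++ program,
        # so this wait() returns the C++ program's own exit status.
        status = p.wait()
        print("Child exited with status", status, flush=True)
        time.sleep(2)  # small back-off before restarting

if __name__ == '__main__':
    processWatchdog()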

Related

Python Windows how to get STDOUT data in real time?

I have a Windows executable that I want to run over and over. The problem is that sometimes there's an error about 1 second in, but the program doesn't exit. So what I would like to do is grab the contents of stdout, recognize there is an error, and then kill the subprocess and start it over.
When I run this executable directly, output prints to the screen just fine. But when I wrap it in a subprocess from Python, the stdout output doesn't show up until the program terminates.
I've tried basically everything posted here with no luck:
Constantly print Subprocess output while process is running
Here's my current code, I replaced the executable with a second python program just to remove any other weird variables:
parent_program.py:
import subprocess, os, sys

program = "python " + os.path.dirname(os.path.abspath(__file__)) + "/child_program.py"
with subprocess.Popen(program, shell=True, stdout=subprocess.PIPE, bufsize=1, universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')
child_program.py:
from time import sleep

for i in range(0, 10):
    print(i)
    sleep(1)
What I would expect is that I would see 1,2,3,4... printed one second at a time, as if I had just run python child_program.py, but instead I get nothing for 10 seconds and then get all the output at once.
I also thought about running the program from the CMD prompt and redirecting the output to a file (python child_program.py 2>&1 > output.txt) and then having Python read that file, but it's the same problem: the file doesn't get written until the program terminates.
Is there any way to fix this on Windows?
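One likely culprit (my assumption, not something stated in the post) is that the child block-buffers its stdout when it is connected to a pipe instead of a console. For the pure-Python reproduction above, a minimal sketch that forces the child to run unbuffered with -u lets the parent's loop see each line as it is printed:

import subprocess, os, sys

# Sketch only: run the child with "python -u" (unbuffered stdout) and read it line by line.
child = os.path.join(os.path.dirname(os.path.abspath(__file__)), "child_program.py")
with subprocess.Popen([sys.executable, "-u", child],
                      stdout=subprocess.PIPE, bufsize=1,
                      universal_newlines=True) as p:
    for line in p.stdout:
        print(line, end='')

For an arbitrary Windows executable the same idea applies, but the flushing has to happen on the child's side; the parent alone cannot force it.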

solaris: O_NDELAY set on stdin: when process exits the shell exits

TLDR: In Solaris, if O_NDELAY is set on stdin by a child process, bash exits. Why?
The following code causes interactive bash (v4.3.33) or tcsh (6.19.00) shells to exit after the process finishes running:
#include <fcntl.h>

int main() {
    fcntl( 0, F_SETFL, O_NDELAY );
    //int x = fcntl( 0, F_GETFL );
    //fcntl( 0, F_SETFL, ~(x ^ (~O_NDELAY)) );
    return 0;
}
The versions of ksh, csh and zsh we have aren't affected by this problem.
To investigate I ran bash & csh under truss (similar to strace on Linux) like this:
$ truss -eaf -o bash.txt -u'*' -{v,r,w}all bash --noprofile --norc
$ truss -eaf -o csh.txt -u'*' -{v,r,w}all csh -f
After csh finishes running the process it does the following:
fcntl( 0, F_GETFL ) = FWRITE|FNDELAY
fcntl( 0, F_SETFL, FWRITE) = 0
... which gave me an idea. I changed the program to the commented out code above so it would toggle the state of O_NDELAY. If I run it twice in a row bash doesn't exit.
This answer got me started on the right path. The man page for read (in Solaris) says:
When attempting to read a file associated with a terminal that has no data currently available:
* If O_NDELAY is set, read() returns 0
* If O_NONBLOCK is set, read() returns -1 and sets errno to EAGAIN
... so when bash tries to read stdin, read() returns 0, causing bash to assume EOF was hit.
This page indicates O_NDELAY shouldn't be used anymore, instead recommending O_NONBLOCK. I've found similar statements regarding O_NDELAY / FIONBIO for various flavors of UNIX.
As an aside, in Linux O_NDELAY == FNDELAY == O_NONBLOCK, so it's not terribly surprising I was unable to reproduce this problem in that environment.
Unfortunately, the tool that's doing this isn't one I have the source code for, though from my experimenting I've found ways to work around the problem.
If nothing else, I can make a simple program that removes O_NDELAY as above, then wrap execution of this tool in a shell script that always runs the "fixer" program after the other one.
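For illustration, such a "fixer" could be a few lines of Python rather than C (my own sketch, assuming Python is available on the host; on some platforms os.O_NDELAY is simply an alias for O_NONBLOCK):

import fcntl
import os

# Clear O_NDELAY on stdin (fd 0) so the shell's next read() blocks normally again.
flags = fcntl.fcntl(0, fcntl.F_GETFL)
ndelay = getattr(os, "O_NDELAY", os.O_NONBLOCK)
fcntl.fcntl(0, fcntl.F_SETFL, flags & ~ndelay)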

Systemd + Sys V init.d script: start works, but stop does not

I'm pretty new to writing systemd compatible init scripts. I've tried the following example:
#!/bin/sh
#
### BEGIN INIT INFO
# Provides: test
# Required-Start: $remote_fs $syslog
# Required-Stop: $remote_fs $syslog
# Default-Start: 2 3 4 5
# Default-Stop: 0 1 6
# Short-Description: Test.
# Description: Test.
### END INIT INFO
#
# Define LSB log_* functions.
# Depend on lsb-base (>= 3.0-6) to ensure that this file is present.
. /lib/lsb/init-functions
case "$1" in
stop)
log_failure_msg "Stop!"
exit 1
;;
start)
log_failure_msg "Start!"
exit 1
;;
restart)
log_failure_msg "Restart!"
exit 1
;;
status)
log_failure_msg "Status!"
exit 1
;;
*)
echo "Usage: $SELF start|stop|restart|status"
exit 1
;;
esac
# Some success paths end up returning non-zero, so exit 0 explicitly. See bug #739846.
exit 0
So far, so good. Now, I install the init script like so:
update-rc.d test defaults
update-rc.d test enable
Again, so far, so good. I've set up this init script specifically to fail (I had to do so during my testing to confirm the issue I was having.)
If I run /etc/init.d/test start, it fails as expected, and I can see the error message in my log.
If, however, I run /etc/init.d/test stop, I get nothing in my log, and the script returns successfully.
It seems like systemd is doing some kind of black magic behind the scenes and is hijacking stop somehow, but I can't figure out how, and I've been Googling around forever without success. Can anyone help me to understand why passing stop to my init script doesn't execute the corresponding code in my case block?
As a side note, the status option also doesn't work (systemd just outputs its own status info.)
I'm attempting to run this on Ubuntu 16.04. Thank you!
You'll notice the top of your script loads /lib/lsb/init-functions. You can read the code in there, and the related code in /lib/lsb/init-functions.d, to understand what you are pulling in.
The summary is that your script is likely being converted to a systemd .service in the background and is subject to a whole host of documented incompatibilities with systemd.
There is inherent extra complexity, and potential for problems, when you ask systemd to emulate and support the file formats used by the legacy Upstart and SysVinit init systems.
Since you are writing a new init script from scratch, consider writing a systemd .service file directly, removing all the additional complexity of involving additional init systems.
A minimal .service file to go in /etc/systemd/system/ could look like:
[Unit]
Description=Foo
[Service]
ExecStart=/usr/sbin/foo-daemon
[Install]
WantedBy=multi-user.target
More details in man systemd.service. Some additional learning now will save you some debugging later!
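Assuming you save the unit as, say, foo.service, running systemctl daemon-reload and then systemctl enable --now foo.service should register and start it, and systemctl status foo.service will show its state and recent log lines.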

Running Python Script in Background Infinitely

I am trying to write a Python script which runs on another server, such that even if I close my connection to that server from my PC's terminal, it keeps running there. While the script is alive, it runs indefinitely, listening for events from a website (UI); when an event occurs it starts up the appropriate Docker containers and keeps listening for PostgreSQL events.
When I tried to use nohup (to run the script in the background), it did run in the background but was unable to listen to any of the events. Has anyone worked on something similar before? Please share your thoughts.
I am sharing a part of my script.
self.pool = await asyncpg.create_pool(user='alg_user', password='algy', database='alg', host='brain', port=6543)
async with self.pool.acquire() as conn:
    def enqueue_listener(*args):
        self.queue.put_nowait(args)
    await conn.add_listener('task_created', enqueue_listener)
    print("Added the listener")
    while True:
        print("---- Listening for new job ----")
        conn2, pid, channel, payload = await self.queue.get()
        x = re.sub("[^\w]", " ", payload).split()
        print(x)
        if x[5] == '1':
            tsk = 'TASK_ID=%s' % str(x[1])
            if x[3] == '1':
                command = "docker run --rm -it -e ALGORITHM_ID=1 -e TASK_ID=%s --network project_default project/docked_prol:1.0" % (str(x[1]))
                subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
            if x[3] == '8':
                command = "docker run --rm -it -e ALGORITHM_ID=8 -e TASK_ID=%s --network project_default project/docked_pro:1.0" % (str(x[1]))
                subprocess.Popen(command, shell=True, stdout=subprocess.PIPE)
The script runs absolutely fine when run manually; I just need some advice on implementation methodology.
First of all, I am here 3 years later.
To run a script infinitely as a background task, you need a process manager tool. PM2 is my favorite process manager; it is written in Node.js, but since it is a CLI it can run any terminal task.
Basically, you install Node.js and npm to get pm2. (You can visit nodejs.org to download the installer.)
You need to install pm2 as a global module, using npm install -g pm2 in your terminal
You can check that it is installed simply with pm2 -v
Then you can start your Python script in your terminal using pm2 start file_name.py
It will create a background process to run your script and restart it forever.
If your task is a long-running one-off and you don't want it restarted when it finishes, you can disable restarting by adding the parameter --no-autorestart to the command (pm2 start file_name.py --no-autorestart)
If you want to see the logs or the state of the task, you can just use pm2 status, pm2 logs and pm2 monit.
If you want to stop the task, you can use pm2 stop task_name
You can use pm2 reload all or pm2 update to bring all the tasks back up
You can kill the task using pm2 kill
For more information you can visit PM2 Python Documentation
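Note that pm2's process list does not survive a reboot on its own; the usual approach is to run pm2 startup once (to install a boot hook) and pm2 save after starting your tasks, so pm2 can resurrect them when the server comes back up.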
Running something in the background via nohup will only work if the process/script runs without needing external input, because there is no way to provide manual input to a background process.
First, try checking if the process is still running in background (ps -fe|grep processname).
If it's running, then check the nohup.out file to see where the process is getting stuck. This file gets generated in the same directory where you started the process, and it will give you some idea of what is going on inside the process.

Wait for all processes started by a command or script executed with Perl's system function

With the system function I launch a bash script. The system function waits for the bash script to finish execution and returns the script's exit status.
The bash script in question has a loop that executes the same script n times with different parameters. When the loop condition is no longer valid, the loop terminates and exit is invoked. As a result, the child processes of the script executed from Perl's system function are left behind as zombies.
The system function does not wait for those child processes, only for the first script it launched.
My scenario is:
perl system function ---launch---> my bash script ---launch---> bash script
---launch---> bash script
---launch---> bash script
.............................
.............................
.............................
---launch---> bash script
To wait until all processing is done, do I have to change the bash script, or can I resolve this directly with Perl's system function?
Change the bash script you call from Perl so it does:
bash subprocess &
bash subprocess &
bash subprocess &
...
wait
then it will wait for all its own children to complete before it exits itself. For example
sleep 5 &
sleep 5 &
sleep 5 &
wait
will take about 5 seconds to run (not 15), because the three sleeps run concurrently and wait returns only after the last one finishes.