Solaris: O_NDELAY set on stdin: when process exits the shell exits

TLDR: In Solaris, if O_NDELAY is set on stdin by a child process, bash exits. Why?
The following code causes interactive bash (v4.3.33) or tcsh (6.19.00) shells to exit after the process finishes running:
#include <fcntl.h>

int main() {
    fcntl( 0, F_SETFL, O_NDELAY );
    //int x = fcntl( 0, F_GETFL );
    //fcntl( 0, F_SETFL, ~(x ^ (~O_NDELAY)) );
    return 0;
}
The versions of ksh, csh and zsh we have aren't affected by this problem.
To investigate I ran bash & csh under truss (similar to strace on Linux) like this:
$ truss -eaf -o bash.txt -u'*' -{v,r,w}all bash --noprofile --norc
$ truss -eaf -o csh.txt -u'*' -{v,r,w}all csh -f
After csh finishes running the process it does the following:
fcntl( 0, F_GETFL ) = FWRITE|FNDELAY
fcntl( 0, F_SETFL, FWRITE) = 0
... which gave me an idea. I changed the program to the commented-out code above so it would toggle the state of O_NDELAY. If I run it twice in a row, bash doesn't exit.

This answer got me started on the right path. The man page for read (in Solaris) says:
When attempting to read a file associated with a terminal that has no data currently available:
* If O_NDELAY is set, read() returns 0
* If O_NONBLOCK is set, read() returns -1 and sets errno to EAGAIN
... so when bash tries to read stdin, read() returns 0, which bash interprets as EOF.
This page indicates O_NDELAY shouldn't be used anymore, instead recommending O_NONBLOCK. I've found similar statements regarding O_NDELAY / FIONBIO for various flavors of UNIX.
As an aside, in Linux O_NDELAY == FNDELAY == O_NONBLOCK, so it's not terribly surprising I was unable to reproduce this problem in that environment.
Unfortunately, the tool that's doing this isn't one I have the source code for, though from my experimenting I've found ways to work around the problem.
If nothing else I can make a simple program that removes O_NDELAY as above then wrap execution of this tool in a shell script that always runs the "fixer" program after the other one.

Related

System call to run pbmtextps: ghostscript

I'm coding a Perl script to generate images with text in them. I'm on a Linux machine. I'm using pbmtextps. When I try to run pbmtextps in Perl with a system call like this
system("pbmtextps -fontsize 24 SampleText > out.pbm");
I get this error message
pbmtextps: failed to run Ghostscript process: rc=-1
However, if I run the exact same pbmtextps command from the command-line outside of Perl, it runs with no errors.
Why does it cause the ghostscript error when I run it from inside a Perl script?
ADDITIONAL INFO: I tried to hack around this by creating a C code called mypbmtextps.c which does the exact same thing with a C system call. That works from the command line. No errors. But then when I call that C program from the Perl script, I get the same ghostscript error.
ANSWER: I solved it. The problem was this line in the Perl script:
$SIG{CHLD} = 'IGNORE';
When I got rid of that (which I need for other things, but not in this script) it worked okay. If anyone knows why that would cause a problem, please add that explanation.
Ah-ha. Well, SIGCHLD is required for wait(), and so is required for perl to be able to retrieve the exit status of the child process created by system(). In particular, system() always returns -1 when SIGCHLD is ignored, and $? is likewise unavailable.
What printed the error message? pbmtextps, or your perl script?
As far as I know, the signal handler for your perl process shouldn't affect the signal handler for the child processes, but this could depend on your version of perl and your OS version. On my Linux Mint 13 with Perl 5.14.2 the inner perl script prints 0, with the outer script printing -1:
perl -e '$SIG{CHLD}= "IGNORE"; print system(q{perl -e "print system(q(/bin/sleep 1))"})'
Is your perl script modifying the environment?
You can test with
system("env > /tmp/env.perl");
and then compare it to the environment from your shell:
env > /tmp/env.shell
diff /tmp/env.shell /tmp/env.perl
Is the perl script also being run from the shell, or is it being run from some other process like cron or apache? (in particular, you should check $PATH)

How to tell bash not to issue warnings "cannot set terminal process group" and "no job control in this shell" when it can't assert job control?

To create a new interactive bash shell I call bash -i. Due to issues with my environment, bash cannot assert job control (I'm using cygwin bash in GNU emacs) and issues warnings ("cannot set terminal process group" and "no job control in this shell"). - I have to live with the disabled job control in my environment, but I would like to get rid of the warning:
How can I tell bash not to assert job control and not to issue these warnings? I obviously still want the shell as an interactive one.
Note: I have tried set -m in .bashrc, but bash still writes out the warnings on start up - the ~/.bashrc file might be executed after the shell tries to assert job control. Is there a command line option which would work?
man bash says of set options that "The options can also be specified as arguments to an invocation of the shell." Note you will need +m, not -m. Admittedly the manual isn't quite clear on that.
However looking at bash source code (version 4.2), apparently it ignores the state of this flag. I would say this is a bug.
Applying the following small patch makes bash honor the m flag on startup. Unfortunately this means you will have to recompile bash.
--- jobs.c.orig 2011-01-07 16:59:29.000000000 +0100
+++ jobs.c 2012-11-09 03:34:49.682918771 +0100
@@ -3611,7 +3611,7 @@
}
/* We can only have job control if we are interactive. */
- if (interactive == 0)
+ if (interactive == 0 || !job_control)
{
job_control = 0;
original_pgrp = NO_PID;
Tested on my linux machine where job control is available by default, so the error messages you see under cygwin are not printed here. You can still see that bash honors the +m now, though.
$ ./bash --noprofile --norc
$ echo $-
himBH
$ fg
bash: fg: current: no such job
$ exit
$ ./bash --noprofile --norc +m
$ echo $-
hiBH
$ fg
bash: fg: no job control

Odd behavior with Perl system() command

Note that I'm aware that this is probably not the best or most optimal way to do this but I've run into this somewhere before and I'm curious as to the answer.
I have a perl script that is called from an init that runs and occasionally dies. To quickly debug this, I put together a quick wrapper perl script that basically consists of
# $path set from library call.
while (1) {
    system("$path/command.pl " . join(" ", @ARGV) . " >>/var/log/outlog 2>&1");
    sleep 30;  # Added this one later. See below...
}
Fire this up from the command line and it runs fine and as expected. command.pl is called and the script basically halts there until the child process dies then goes around again.
However, when called from a start script (actually via start-stop-daemon), the system command returns immediately, leaving command.pl running. Then it goes around for another go. And again and again. (This was not fun without the sleep command.). ps reveals the parent of (the many) command.pl to be 1 rather than the id of the wrapper script (which it is when I run from the command line).
Anyone know what's occurring?
Maybe the command.pl is not being run successfully. Maybe the file doesn't have execute permission (do you need to say perl command.pl?). Maybe you are running the command from a different directory than you thought, and the command.pl file isn't found.
There are at least three things you can check:
* the standard error output of your command. For now you are swallowing it by saying 2>&1. Remove that part and observe what errors the system command produces.
* the return value of system. system returns the child's wait status; 0 means the command ran and exited successfully, while a nonzero value means it failed or could not be run at all.
* Perl's error variable $!. If there was a problem launching the command, Perl will set $!, which may or may not be helpful.
To summarize, try:
my $ec = system("command.pl >> /var/log/outlog");
if ($ec != 0) {
    # $ec is the raw wait status; the actual exit code is $ec >> 8
    warn "exit code was $ec, \$! is $!";
}
Update: if multiple instances of the command keep showing up in your ps output, then it sounds like the program is forking and running itself in the background. If that is indeed what the command is supposed to do, then what you do NOT want to do is run it in an endless loop.
Perhaps when run from a daemon the system command is using a different shell than the one used when you are running as yourself. Maybe the shell used by the daemon does not recognize the >& construct.
Instead of system("..."), try the exec("...") function, if that works for you. (Note that exec replaces the current process and never returns on success.)

Returning an exit code from a shell script that was called from inside a perl script

I have a perl script (verifyCopy.pl) that uses system() to call a shell script (intercp.sh).
From inside the shell script, I have set up several exit's with specific exit codes and I'd like to be able to do different things based on which exit code is returned.
I've tried using $?, and I have tried assigning the return value of system("./intercp.sh") to a variable and then checking that, but the value is always 0.
Is this because even though something inside the shell script fails, the actual script succeeds in running?
I tried adding a trap in the shell script (i.e. trap testexit EXIT and testexit() { exit 222; }) but that didn't work either.
$? should catch the exit code from your shell script.
$ cat /tmp/test.sh
#!/bin/sh
exit 2
$ perl -E 'system("/tmp/test.sh"); say $?'
512
Remember that $? is encoded in the traditional manner, so $? >> 8 gives the exit code, $? & 0x7F gives the signal, and $? & 0x80 is true if core was dumped. See perlvar for details.
Your problem may be one of several things: maybe your shell script isn't actually exiting with the exit code (maybe you want set -e); maybe you have a signal handler for SIGCHLD eating the exit code; etc. Try testing with the extremely simple shell script above to see if it's a problem in your perl script or your shell script.

Emacs shell-command restart on crash

Is it possible to detect when a long running process started with shell-command crashes, so it can automatically be restarted? Without manually checking its buffer and restarting by hand.
I wouldn't handle this from Emacs at all. Instead, I'd write a wrapper script around my original long-running process that restarts the process if it dies in a particular way. For example, if your program dies by getting the SIGABRT signal, the wrapper script might look like this:
#!/bin/bash
while true
do
    your-original-command --switch some args
    if [ $? -ne 134 ]; then break; fi
    echo "Program crashed; restarting"
done
I got the value 134 for the SIGABRT signal by doing this:
perl -e 'kill ABRT => $$'; echo $?
This is all assuming some kind of Unix-y system.