I have to start a pdf viewer from a Perl script. The viewer should
become detached from the parent process and the terminal that the parent process was run from. If I close the parent or the terminal the
viewer should still be kept running. I considered three approaches (using evince as the pdf viewer command):
Using system and sh:
system 'evince test.pdf &';
Using fork():
$SIG{CHLD} = "IGNORE"; #reap children as they complete
my $pid = fork();
if ( $pid == 0 ) {
exec 'evince', 'test.pdf';
}
Using Proc::Daemon:
use Proc::Daemon;
my $daemon = Proc::Daemon->new(
work_dir => '/tmp/evince',
child_STDOUT => '>>stdout.txt',
child_STDERR => '>>stderr.txt',
);
my $pid = $daemon->Init();
if ( $pid == 0 ) {
exec 'evince', 'test.pdf';
}
What would be the difference between these approaches? Which approach would you recommend?
system 'evince test.pdf &';
In my experience, this is likely to really be:
system "evince $pdf_file &";
If $pdf_file is user input, then we get shell-injection bugs, such as passing in a pdf name of $(rm -rf /) or even just ;rm -rf /. And what if the name has a space in it? Well, you can avoid all that if you quote it, right?
system "evince \"$pdf_file\" &";
Well, no, now all I have to do is give you a filename of ";rm -rf "/. And what if my pdf has a double quote in its name? You could use single quotes, but the same problem comes up if the filename has single quotes in it, and the shell injection isn't really any harder. You could come up with an elaborate shellify function that properly quotes a string all so that the shell can unquote it and get back to the original entry ... but that seems like so much more work than your other options, neither of which suffers from these problems.
$SIG{CHLD} = "IGNORE"; #reap children as they complete
my $pid = fork();
if ( $pid == 0 ) {
exec 'evince', 'test.pdf';
}
Setting a global $SIG{CHLD} is nice and easy ... unless you need to handle other children as they die. So only you can tell whether that's acceptable or not. And, again in my experience, not even always then. I've been bitten by this one - though rarely. I had this mixed in with an application that, elsewhere, used AnyEvent, and managed to break AE's subprocess handling. (The same would likely hold true if you mixed this with any event system, I just happened to be using AE.)
Also, this is missing the stdout and stderr redirects - and stdin redirect. That's easy enough to add - inside your if, before the exec, just close and reopen the filehandles as you need, e.g.:
close STDOUT; open STDOUT, '>', '/dev/null';
close STDERR; open STDERR, '>', '/dev/null';
close STDIN; open STDIN, '<', '/dev/null';
No big deal. However, Proc::Daemon does set up a few more things for you to ensure signals don't reach from one to the other process, in either direction. This depends on how severe you need to get.
For most of my purposes, I've found #2 to be sufficient. I've only reached for Proc::Daemon on a few projects, but that's where a) I have full control over the module installation, and b) it really matters. Starting a pdf viewer wouldn't normally be such a case.
I avoid #1 at all costs - I have had some fairly significant bites with shell injection, and now try to avoid the shell at all times.
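As a rough illustration of option #2 with the redirects and a bit of the detachment that Proc::Daemon would otherwise handle, here is a minimal sketch. The helper name spawn_detached is hypothetical, and the double fork plus setsid is one common way to detach, not necessarily what Proc::Daemon does internally. It avoids the global $SIG{CHLD} by reaping the short-lived first child directly, and it uses the list form of exec so no shell is involved.

```perl
use strict;
use warnings;
use POSIX qw(setsid);

# Hypothetical helper: start @cmd detached from this process and its terminal.
sub spawn_detached {
    my @cmd = @_;
    defined( my $pid = fork() ) or die "fork failed: $!";
    if ( $pid == 0 ) {
        # First child: start a new session so there is no controlling terminal.
        setsid() != -1 or POSIX::_exit(1);
        defined( my $pid2 = fork() ) or POSIX::_exit(1);
        POSIX::_exit(0) if $pid2;    # first child exits right away

        # Grandchild: detach the standard handles, then run the command.
        open STDIN,  '<', '/dev/null' or POSIX::_exit(1);
        open STDOUT, '>', '/dev/null' or POSIX::_exit(1);
        open STDERR, '>', '/dev/null' or POSIX::_exit(1);
        exec { $cmd[0] } @cmd or POSIX::_exit(127);    # list form: no shell
    }
    waitpid( $pid, 0 );    # reap the first child; init adopts the grandchild
    return $? == 0;
}

spawn_detached( 'evince', 'test.pdf' );
```

The return value only says whether the intermediate child got as far as the second fork; it does not tell you whether the viewer itself started.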
Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except the obvious race conditions in lines 1-3. Even if using proper tempfile() there is no way (I can think of) to ensure that the file streamed to at line 2 is the same opened at line 3. One solution might be pipes, but the errors during cmd might occur late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into a tempfile opened file handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
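A slightly fuller sketch of the pipe approach, with error checking added. The printf command here is only a hypothetical stand-in for cmd $blah; the list form of open bypasses the shell, so it covers a single command rather than a cmd | cmd2 pipeline.

```perl
use strict;
use warnings;

# Hypothetical stand-in for "cmd $blah": printf(1) emits one line per argument.
my @cmd = ( 'printf', '%s\n', 'first', 'second' );

# List form of a pipe open: the command runs without a shell.
open( my $fh, '-|', @cmd ) or die "cannot start $cmd[0]: $!";
my @lines = <$fh>;

# close() on a pipe handle waits for the child and sets $?.
close($fh) or die "$cmd[0] failed, wait status $?";

chomp @lines;
print "got: $_\n" for @lines;
```

Checking the result of close is what surfaces a late failure of the command, which was one of the worries in the question.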
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
Using tmpnam is discouraged; the POSIX module's version was deprecated in Perl v5.22 and removed in v5.26.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing at roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite each other's output. Then, one of those programs thinks it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
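To make the "check, then create" gap concrete, here is a small sketch of the usual remedy: ask the operating system to create the file only if it does not already exist, in one atomic step, using sysopen with O_EXCL. The file name race-demo-$$ is made up for the demonstration.

```perl
use strict;
use warnings;
use Fcntl qw(O_RDWR O_CREAT O_EXCL);
use File::Spec;

# Hypothetical name, used only for this demonstration.
my $name = File::Spec->catfile( File::Spec->tmpdir, "race-demo-$$" );

# O_CREAT | O_EXCL: succeed only if this call actually creates the file.
sysopen( my $first, $name, O_RDWR | O_CREAT | O_EXCL, 0600 )
    or die "unexpected: cannot create $name: $!";

# A second attempt at the same name is refused instead of sharing the file.
my $second_ok = sysopen( my $second, $name, O_RDWR | O_CREAT | O_EXCL, 0600 );
print $second_ok ? "second open succeeded\n" : "second open refused: $!\n";

close $first;
unlink $name or warn "could not remove $name: $!";
```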
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
my( $fh, $name ) = File::Temp::tempfile();
This uses a default template to create the name. tempfile is a plain function, so call it as one rather than as a class method. With the object interface, File::Temp also cleans up the mess when the object goes out of scope:
{
my $tmp = File::Temp->new; # stringifies to the filename
print $tmp ...;
...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = File::Temp::tempfile(
sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
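For comparison, a minimal sketch of the same idea with File::Temp's object interface, which takes the template as a named argument and removes the file when the object goes out of scope. The template scratch-XXXXXX is an arbitrary choice for the example.

```perl
use strict;
use warnings;
use File::Temp ();

my $name;
{
    # TMPDIR => 1 places the file in the system temp directory.
    my $tmp = File::Temp->new( TEMPLATE => 'scratch-XXXXXX', TMPDIR => 1 );
    $name = $tmp->filename;

    print {$tmp} "hello\n";
    $tmp->flush;

    print "inside scope, file exists: ", ( -e $name ? "yes" : "no" ), "\n";
}    # $tmp is destroyed here and File::Temp unlinks the file

print "after scope, file exists: ", ( -e $name ? "yes" : "no" ), "\n";
```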
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you want scratch space without a named file that something outside the program could find. Opening it read-write means you can output some stuff then move around that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die ...
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
$fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
@lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
@lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH
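The /dev/fd/<fileno> variant mentioned above could look roughly like the sketch below. It assumes a system that provides /dev/fd (Linux and most modern Unix-likes do). One detail that is easy to miss: Perl marks descriptors above $^F (normally 2) as close-on-exec, so that flag has to be cleared before the child can see the descriptor.

```perl
use strict;
use warnings;
use Fcntl qw(F_GETFD F_SETFD FD_CLOEXEC);
use File::Temp ();

my $tmp = File::Temp->new;

# Clear close-on-exec, or the shell started by system() will not inherit
# the descriptor and /dev/fd/N will not refer to our temp file.
my $flags = fcntl( $tmp, F_GETFD, 0 ) or die "fcntl F_GETFD: $!";
fcntl( $tmp, F_SETFD, $flags & ~FD_CLOEXEC ) or die "fcntl F_SETFD: $!";

my $fd = fileno($tmp);
system("echo 'How now brown cow' >> /dev/fd/$fd") == 0
    or die "command failed, wait status $?";

# Rewind before reading; on some systems the child shares our file offset.
seek( $tmp, 0, 0 ) or die "seek: $!";
my @lines = <$tmp>;
print @lines;
```

This way the external command never needs the file's name, only the inherited descriptor number.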
I have the following Perl code and would like it to display exactly as invoking /bin/ls in the terminal would display. For example on a terminal sized to 100 columns, it would print up to 100 characters worth of output before inserting a newline. Instead this code prints 1 file per line of output. I feel like it involves assigning some terminal settings to the IO::Pty instance, but I've tried variations of that without luck.
UPDATE: I replaced the <$READER> with a call to sysread hoping the original code might just have a buffering issue, but the output received from sysread is still one file per line.
UPDATE: I added code showing my attempt at changing the IO::Pty's size via the clone_winsize_from method. This didn't result in the output being any different.
UPDATE: As best I can tell (from reading the IPC::Open3 code for version 1.12), it seems you cannot pass a variable of type IO::Handle without open3 creating a pipe rather than dup'ing the filehandle. This means isatty doesn't return a true value when ls invokes it, and ls then forces itself into "one file per line" mode.
I think I just need to do a fork/exec and handle the I/O redirection myself.
#!/usr/bin/env perl
use IPC::Open3;
use IO::Pty;
use strict;
my $READER = IO::Pty->new();
$READER->slave->clone_winsize_from(\*STDIN);
my $pid = open3(undef, $READER, undef, "/bin/ls");
while(my $line = <$READER>)
{
print $line;
}
waitpid($pid, 0) or die "Error waiting for pid: $!\n";
$READER->close();
I think $READER is getting overwritten with a pipe created by open3, which can be avoided by changing
my $READER = ...;
my $pid = open3(undef, $READER, undef, "/bin/ls");
to
local *READER = ...;
my $pid = open3(undef, '>&READER', undef, "/bin/ls");
See the docs.
You can pass the -C option to ls to force it to use columnar output (without getting IO::Pty involved).
The IO::Pty docs describe a clone_winsize_from(\*FH) method. You might try cloning your actual pty's dimensions.
I see that you're setting up the pty only as stdout of the child process. You might need to set it up also as its stdin — when the child process sends the "query terminal size" escape sequence to its stdout, it would need to receive the response on its stdin.
I have a Perl process that starts another one with the system command. I leave it running in the background and pass it some variables like this:
my $var1 = "Hello";
my $var2 = "World";
system "./another_process.pl $var1 $var2 &";
But the system command only returns the result, and I need to get the PID. I want to do something like fork. What should I do? How can I do something like fork, but with different scripts?
Thanks in advance!
Perl has a fork function.
See perldoc perlfaq8 - How do I start a process in the background?
(contributed by brian d foy)
There's not a single way to run code
in the background so you don't have to
wait for it to finish before your
program moves on to other tasks.
Process management depends on your
particular operating system, and many
of the techniques are in perlipc.
Several CPAN modules may be able to
help, including IPC::Open2 or
IPC::Open3, IPC::Run, Parallel::Jobs,
Parallel::ForkManager, POE,
Proc::Background, and Win32::Process.
There are many other modules you might
use, so check those namespaces for
other options too. If you are on a
Unix-like system, you might be able to
get away with a system call where you
put an & on the end of the command:
system("cmd &")
You can also try using fork, as
described in perlfunc (although this
is the same thing that many of the
modules will do for you).
STDIN, STDOUT, and STDERR are shared
Both the main process and the
backgrounded one (the "child" process)
share the same STDIN, STDOUT and
STDERR filehandles. If both try to
access them at once, strange things
can happen. You may want to close or
reopen these for the child. You can
get around this with opening a pipe
(see open) but on some systems this
means that the child process cannot
outlive the parent.
Signals
You'll have to catch the SIGCHLD
signal, and possibly SIGPIPE too.
SIGCHLD is sent when the backgrounded
process finishes. SIGPIPE is sent when
you write to a filehandle whose child
process has closed (an untrapped
SIGPIPE can cause your program to
silently die). This is not an issue
with system("cmd&").
Zombies
You have to be prepared to "reap" the
child process when it finishes.
$SIG{CHLD} = sub { wait };
$SIG{CHLD} = 'IGNORE';
You can also use a double fork. You
immediately wait() for your first
child, and the init daemon will wait()
for your grandchild once it exits.
unless ($pid = fork) {
unless (fork) {
exec "what you really wanna do";
die "exec failed!";
}
exit 0;
}
waitpid($pid, 0);
See Signals in perlipc for other
examples of code to do this. Zombies
are not an issue with
system("prog &").
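As an illustration of the reaping advice, here is a sketch of a SIGCHLD handler that collects every finished child without blocking, in the style shown in perlipc. The %status hash and the three dummy children are invented for the example.

```perl
use strict;
use warnings;
use POSIX qw(:sys_wait_h);

my %status;    # pid => exit code, filled in by the handler

$SIG{CHLD} = sub {
    local ( $!, $? );    # do not clobber the main program's $! and $?
    # Several children may have exited before the handler runs, so loop.
    while ( ( my $kid = waitpid( -1, WNOHANG ) ) > 0 ) {
        $status{$kid} = $? >> 8;
    }
};

my @kids;
for my $code ( 0, 1, 2 ) {
    defined( my $pid = fork() ) or die "fork failed: $!";
    POSIX::_exit($code) if $pid == 0;    # dummy child: exit immediately
    push @kids, $pid;
}

# Nap in short intervals until the handler has seen every child.
select( undef, undef, undef, 0.05 ) while keys(%status) < @kids;
print "pid $_ exited with code $status{$_}\n" for @kids;
```

A handler like this is also where a global $SIG{CHLD} = 'IGNORE' would get in the way, because ignored children leave no status to collect.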
It's true that you can use fork/exec, but I think it will be much easier to simply use the pipe form of open. Not only is the return value the pid you are looking for, you can be connected to either the stdin or stdout of the process, depending on how you open. For instance:
my $pid = open my $handle, "foo|";
will return the pid of foo and connect you to its stdout, so that if you read from $handle you get a line of output from foo. Using "|foo" instead will allow you to write to foo's stdin.
You can also use open2 and open3 to do both simultaneously, though that comes with major caveats: you can run into unexpected issues due to I/O buffering.
Use fork and exec.
If you need to get the PID of a Perl script, you can use the $$ variable. You can put it in your another_process.pl and have it write its PID to a file. Can you be clearer about what you mean by "something like fork"? You can always use the fork/exec combination.
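A minimal sketch of the fork/exec combination applied to the question: fork returns the child's PID to the parent, and the list form of exec passes the variables as separate arguments without a shell. The one-liner run through $^X is only a stand-in for ./another_process.pl.

```perl
use strict;
use warnings;
use POSIX ();

my ( $var1, $var2 ) = ( 'Hello', 'World' );

# Stand-in for ./another_process.pl: a one-liner that reports its arguments.
my @cmd = ( $^X, '-e', 'print "child $$ got: @ARGV\n"', $var1, $var2 );

defined( my $pid = fork() ) or die "fork failed: $!";
if ( $pid == 0 ) {
    # Child: replace this process with the other program.
    exec { $cmd[0] } @cmd or POSIX::_exit(127);
}

# Parent: $pid is the PID of the new process.
print "started child with PID $pid\n";

# The parent is free to do other work here; reap the child when convenient.
waitpid( $pid, 0 );
print "child exit code: ", $? >> 8, "\n";
```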
I have a module that uses IPC::Open3 (or IPC::Open2, both exhibit this problem) to call an external binary (bogofilter in this case) and feed it some input via the child-input filehandle, then reads the result from the child-output handle. The code works fine when run in most environments. However, the main use of this module is in a web service that runs under Apache 2.2.6. And under that environment, I get the error:
Cannot fdopen STDOUT: Invalid argument
This only happens when the code runs under Apache. Previously, the code constructed a horribly complex command, which included a here-document for the input, and ran it with back-ticks. THAT worked, but was very slow and prone to breaking in unique and perplexing ways. I would hate to have to revert to the old version, but I cannot crack this.
Could it be because mod_perl 2 closes STDOUT? I just discovered this and posted about it:
http://marc.info/?l=apache-modperl&m=126296015910250&w=2
I think it's a nasty bug, but no one seems to care about it thus far. Post a follow up on the mod_perl list if your problem is related and you want it to get attention.
Jon
Bogofilter returns different exit codes for spam/nonspam.
You can "fix" this by redirecting stdout to /dev/null
system("bogofilter < $input > /dev/null") >> 8;
This will return 0 for spam, 1 for nonspam, and 2 for unknown (the >> 8 is needed because system returns the raw wait status, which carries the exit code in its high byte).
Note: the lack of an environment may also prevent bogofilter from finding its wordlist, so pass that in explicitly as well:
system("bogofilter -d /path/to/.bogofilter/ < $input > /dev/null") >> 8;
(where /path/to/.bogofilter contains the wordlist.db)
You can't retrieve the actual rating that bogofilter gave that way, but it does get you something.
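To spell out what the >> 8 is doing, here is a small sketch that decodes the value system returns. The helper name describe_status is made up, and sh -c 'exit 2' stands in for a bogofilter run that reports "unknown".

```perl
use strict;
use warnings;

# Hypothetical helper: turn system()'s return value into a description.
sub describe_status {
    my ($status) = @_;
    return "failed to start" if $status == -1;
    # The low 7 bits hold the signal number if the child was killed.
    return sprintf "killed by signal %d", $status & 127 if $status & 127;
    # The exit code lives in the high byte.
    return sprintf "exit code %d", $status >> 8;
}

# Stand-in command that exits with code 2.
my $status = system( 'sh', '-c', 'exit 2' );
print describe_status($status), "\n";    # prints "exit code 2"
```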
If your code is only going to be run on Linux/Unix systems it is easy to write an open3 replacement that does not fail because STDOUT is not a real file handle:
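use POSIX ();

sub my_open3 {
# untested!
pipe my($inr), my($inw) or die "pipe: $!";
pipe my($outr), my($outw) or die "pipe: $!";
pipe my($errr), my($errw) or die "pipe: $!";
my $pid = fork;
defined $pid or die "fork: $!";
unless ($pid) {
# child: POSIX::dup2 takes file descriptor numbers, not handles
POSIX::dup2(fileno($inr), 0);
POSIX::dup2(fileno($outw), 1);
POSIX::dup2(fileno($errw), 2);
exec @_ or POSIX::_exit(1);
}
# parent: close the child's ends so reads see EOF when the child exits
close $_ for $inr, $outw, $errw;
return ($inw, $outr, $errr);
}
my ($in, $out, $err) = my_open3('ls /etc/');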
Caveat Emptor: I am not a perl wizard.
As @JonathanSwartz suggested, I believe the issue is that apache2 mod_perl closes STDIN and STDOUT. That shouldn't be relevant to what IPC::Open3 is doing, but it has a bug in it, described here.
In summary (this is the part I'm not super clear on), open3 tries to match the child process's STDIN/OUT/ERR to your process, or duplicate it if that is what was requested. Due to some undocumented ways that open('>&=X') works, it generally works fine, except in the case where STDIN/OUT/ERR are closed.
Another link that gets deep into the details.
One solution is to fix IPC::Open3, as described in both of those links. The other, which worked for me, is to temporarily open STDIN/OUT in your mod_perl code and then close it afterwards:
my ($save_stdin,$save_stdout);
open $save_stdin, '<&STDIN'; # STDIN is a read handle, so dup it for reading
open $save_stdout, '>&STDOUT';
open STDIN, '<&=0';
open STDOUT, '>&=1';
#make your normal IPC::Open3::open3 call here
close(STDIN);
close(STDOUT);
open STDIN, '<&', $save_stdin;
open STDOUT, '>&', $save_stdout;
Also, I noticed a bunch of complaints around the net about IPC::Run3 suffering from the same problems, so if anyone runs into the same issue, I suspect the same solution would work.
The following snippet of code is used to find the PID of a user's terminal, by using ptree and grabbing the third PID from the results it returns. All terminal PID's are stored in a hash with the user's login as the key.
## If process is a TERMINAL.
## The command ptree is used to get the terminal's process ID.
## The user can then use this ID to peek the user's terminal.
if ($PID =~ /(\w+)\s+(\d+) .+basic/) {
$user = $1;
if (open(PTREE, "ptree $2 |")) {
while ($PTREE = <PTREE>) {
if ($PTREE =~ /(\d+)\s+-pksh-ksh/) {
$terminals{$user} = $terminals{$user} . " $1";
last;
}
next;
}
close(PTREE);
}
next;
}
Below is a sample ptree execution:
ares./home_atenas/lmcgra> ptree 29064
485 /usr/lib/inet/inetd start
23054 /usr/sbin/in.telnetd
23131 -pksh-ksh
26107 -ksh
29058 -ksh
29064 /usr/ob/bin/basic s=61440 pgm=/usr/local/etc/logon -q -nr trans
412 sybsrvr
I'd like to know if there is a better way to code this. This is the part of the script that takes longest to run.
Note: this code, along with other snippets, are inside a loop and are executed a couple of times.
I think the main problem is that this code is in a loop. You don't need to run ptree and parse the results more than once! You need to figure out a way to run ptree once and put its output into a data structure that you can use later. Some kind of simple hash will probably suffice. You may even be able to just keep around your %terminals hash and keep reusing it.
Some nitpicks...
Both of your "next" statements seem
unnecessary to me... you should be
able to just remove them.
Replace
$terminals{$user} = $terminals{$user} . " $1";
with:
$terminals{$user} .= " $1";
Replace the bareword PTREE which you
are using as a filehandle with
$ptreeF or some such... using
barewords became unnecessary for
filehandles about 10 years ago :)
I don't know why your $PID variable
is all caps... it could be confusing
to readers of your code because it
looks like there is something
special about that variable, and
there isn't.
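Putting those nitpicks together, the matching logic could be pulled into a small function like the sketch below. It runs against the sample ptree output from the question rather than a live ptree, and the function name terminal_pid is made up.

```perl
use strict;
use warnings;

# Sample ptree output copied from the question.
my $sample = <<'END';
485 /usr/lib/inet/inetd start
23054 /usr/sbin/in.telnetd
23131 -pksh-ksh
26107 -ksh
29058 -ksh
29064 /usr/ob/bin/basic s=61440 pgm=/usr/local/etc/logon -q -nr trans
412 sybsrvr
END

# Hypothetical helper: return the PID of the first -pksh-ksh line, if any.
sub terminal_pid {
    my ($ptree_output) = @_;
    for my $line ( split /\n/, $ptree_output ) {
        return $1 if $line =~ /(\d+)\s+-pksh-ksh/;
    }
    return;
}

my %terminals;
my $user = 'lmcgra';    # login taken from the sample prompt
my $tpid = terminal_pid($sample);
$terminals{$user} .= " $tpid" if defined $tpid;
print "terminals for $user:$terminals{$user}\n";    # terminals for lmcgra: 23131
```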
I think you'll get the best performance improvement by avoiding the overhead of repeatedly executing an external command (ptree, in this case). I'd look for a CPAN module that provides a direct interface to the data structures that ptree is reading. Check the Linux:: namespace, maybe? (I'm not sure if ptree is setuid; that may complicate things.)
The above advice aside, some additional style and robustness notes based on the posted snippet only (forgive me if the larger code invalidates them):
I'd start by using strict, at the very least. Lexical filehandles would also be a good idea.
You appear to be silently ignoring the case when you cannot open() the ptree command. That could happen for many reasons, some of which I can't imagine you wanting to ignore, such as…
You're not using the full path to the ptree command, but rather assuming it's in your path—and that the one in your path is the right one.
How many users are on the system? Can you invert this? List all -pksh-ksh processes in the system along with their EUIDs, and build the map from that - that might be only one execution of ps/ptree.
I was thinking of using ps to get the parent's PID, but I would need to loop to get the great-grandparent's PID. That's the one I need. Thanks. – lamcro
Sorry, there are many users and each can have up to three terminals open. The whole script is used to find those terminals that are using a file. I use fuser to find the processes that use a file. Then use ptree to find the terminal's pid. – lamcro
If you have (or can get) a list of PIDs using a file, and just need all of the grand-parents of that PID, there's an easier way, for sure.
#!perl
use warnings;
use strict;
#***** these PIDs are gotten with fuser or some other method *****
my($fpids) = [27538, 31812, 27541];
#***** get all processes, assuming linux PS *****
my($cmd) = "ps -ef";
open(PS, "$cmd |") || die qq([ERROR] Cannot open pipe from "$cmd" - $!\n);
my($processlist) = {};
while (<PS>) {
chomp;
my($user, $pid, $ppid, $rest) = split(/ +/, $_, 4);
$processlist->{$pid} = $ppid;
}
close PS;
#***** lookup grandparent *****
foreach my $fpid (@$fpids) {
my($parent) = $processlist->{$fpid} || 0;
my($grandparent) = $processlist->{$parent} || 0;
if ($grandparent) {
#----- do something here with grandparent's pid -----
print "PID:GRANDPID - $fpid:$grandparent\n";
}
else {
#----- some error condition -----
print "ERROR - Cannot determine GrandPID: $fpid ($parent)\n";
}
}
Which for me produces:
ERROR - Cannot determine GrandPID: 27538 (1)
PID:GRANDPID - 31812:2804
PID:GRANDPID - 27541:27538
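Since the comments ask for the great-grandparent rather than the grandparent, the same PID-to-parent map can be walked any number of levels. A sketch follows; the %ppid_of map is hard-coded from the sample ptree output in the question (assuming that output is a single parent chain), where it would normally be filled from ps as above. The helper name ancestor is made up.

```perl
use strict;
use warnings;

# pid => parent pid, assumed from the question's sample ptree chain.
my %ppid_of = (
    29064 => 29058,
    29058 => 26107,
    26107 => 23131,
    23131 => 23054,
    23054 => 485,
    485   => 1,
);

# Hypothetical helper: follow the parent links $levels times.
sub ancestor {
    my ( $map, $pid, $levels ) = @_;
    for ( 1 .. $levels ) {
        $pid = $map->{$pid};
        return unless defined $pid;    # ran off the top of the known tree
    }
    return $pid;
}

my $great_grandparent = ancestor( \%ppid_of, 29064, 3 );
print "great-grandparent of 29064: $great_grandparent\n";    # 23131
```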
Have you considered using 'who -u' to tell you which process is the login shell for a given tty instead of using ptree? This would simplify your search - irrespective of the other changes you should also make.
I just did some trivial timings here based on your script (calling "cat ptree.txt" instead of ptree itself) and confirmed my thoughts that all of your time is spent creating new sub-processes and running ptree itself. Unless you can factor away the need to call ptree (maybe there's a way to open up the connection once and reuse it, like with nslookup), you won't see any real gains.