IPC::Open3 Fails Running Under Apache - perl

I have a module that uses IPC::Open3 (or IPC::Open2; both exhibit this problem) to call an external binary (bogofilter in this case), feed it some input via the child-input filehandle, and then read the result from the child-output handle. The code works fine when run in most environments. However, the main use of this module is in a web service that runs under Apache 2.2.6. And under that environment, I get the error:
Cannot fdopen STDOUT: Invalid argument
This only happens when the code runs under Apache. Previously, the code constructed a horribly complex command, which included a here-document for the input, and ran it with back-ticks. THAT worked, but was very slow and prone to breaking in unique and perplexing ways. I would hate to have to revert to the old version, but I cannot crack this.

Could it be because mod_perl 2 closes STDOUT? I just discovered this and posted about it:
http://marc.info/?l=apache-modperl&m=126296015910250&w=2
I think it's a nasty bug, but no one seems to care about it thus far. Post a follow up on the mod_perl list if your problem is related and you want it to get attention.
Jon
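If you want to confirm whether that is what you are hitting, a quick diagnostic (my own sketch, not from the post above) is to check whether STDOUT is still backed by a file descriptor just before the open3 call:
my $fd = fileno(STDOUT);
warn defined $fd
    ? "STDOUT is open on fd $fd\n"
    : "STDOUT has no underlying file descriptor - open3's fdopen will fail\n";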

Bogofilter returns different exit codes for spam/nonspam.
You can "fix" this by redirecting stdout to /dev/null
system("bogofilter < $input > /dev/null") >> 8;
Will return 0 for spam, 1 for nonspam, 2 for unknown (the >> 8 is because perl helpfully corrects the exit code, this fixes the damage).
Note: the lack of an environment may also prevent bogofilter from finding its wordlist, so pass that in explicitly as well:
system("bogofilter -d /path/to/.bogofilter/ < $input > /dev/null") >> 8;
(where /path/to/.bogofilter contains the wordlist.db)
You can't retrieve the actual rating that bogofilter gave that way, but it does get you something.
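For completeness, a small untested sketch that wraps the call above and turns the exit status into something readable (it assumes $input holds the path to the message, as in the snippets above):
my $status = system("bogofilter -d /path/to/.bogofilter/ < $input > /dev/null");
die "could not run bogofilter: $!" if $status == -1;    # system() itself failed
my $code = $status >> 8;                                # extract the real exit code
my %verdict = (0 => 'spam', 1 => 'nonspam', 2 => 'unknown');
print "bogofilter says: ", $verdict{$code} // "error (exit $code)", "\n";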

If your code only needs to run on Linux/Unix systems, it is easy to write an open3 replacement that does not fail when STDOUT is not a real file handle:
use POSIX ();

sub my_open3 {
    # untested sketch: Unix-only, no error reporting back from exec
    pipe(my $inr,  my $inw)  or die "pipe: $!";
    pipe(my $outr, my $outw) or die "pipe: $!";
    pipe(my $errr, my $errw) or die "pipe: $!";
    my $pid = fork;
    defined $pid or die "fork: $!";
    unless ($pid) {
        # child: wire the pipe ends onto fds 0, 1 and 2, then exec the command
        POSIX::dup2(fileno $inr,  0);
        POSIX::dup2(fileno $outw, 1);
        POSIX::dup2(fileno $errw, 2);
        exec @_;
        POSIX::_exit(1);
    }
    # parent: close the ends that belong to the child
    close $_ for $inr, $outw, $errw;
    return ($inw, $outr, $errr);
}
my ($in, $out, $err) = my_open3('ls /etc/');

Caveat Emptor: I am not a perl wizard.
As @JonathanSwartz suggested, I believe the issue is that apache2 mod_perl closes STDIN and STDOUT. That shouldn't be relevant to what IPC::Open3 is doing, but open3 has a bug in it, described here.
In summary (this is the part I'm not super clear on), open3 tries to match the child process's STDIN/OUT/ERR to your process's, or to duplicate them if that is what was requested. Due to some undocumented ways that open('>&=X') works, this generally works fine, except in the case where STDIN/OUT/ERR are closed.
Another link that gets deep into the details.
One solution is to fix IPC::Open3, as described in both of those links. The other, which worked for me, is to temporarily open STDIN/STDOUT in your mod_perl code and then close them afterwards:
my ($save_stdin, $save_stdout);
open $save_stdin,  '<&STDIN';    # save whatever STDIN and STDOUT currently are
open $save_stdout, '>&STDOUT';
open STDIN,  '<&=0';             # re-attach STDIN and STDOUT to the real fds 0 and 1
open STDOUT, '>&=1';
# make your normal IPC::Open3::open3 call here
close(STDIN);
close(STDOUT);
open STDIN,  '<&', $save_stdin;  # restore the saved handles
open STDOUT, '>&', $save_stdout;
Also, I noticed a bunch of complaints around the net about IPC::Run3 suffering from the same problems, so if anyone runs into the same issue, I suspect the same solution would work.

How to pipe to and read from the same tempfile handle without race conditions?

Was debugging a perl script for the first time in my life and came over this:
$my_temp_file = File::Temp->tmpnam();
system("cmd $blah | cmd2 > $my_temp_file");
open(FIL, "$my_temp_file");
...
unlink $my_temp_file;
This works pretty much like I want, except for the obvious race conditions in lines 1-3. Even when using a proper tempfile() there is no way (that I can think of) to ensure that the file written to on line 2 is the same one opened on line 3. One solution might be pipes, but errors during cmd might surface late because of limited pipe buffering, and that would complicate my error handling (I think).
How do I:
Write all output from cmd $blah | cmd2 into an already-opened tempfile handle?
Read the output without re-opening the file (risking race condition)?
You can open a pipe to a command and read its contents directly with no intermediate file:
open my $fh, '-|', 'cmd', $blah;
while( <$fh> ) {
...
}
With short output, backticks might do the job, although in this case you have to be more careful to scrub the inputs so they aren't misinterpreted by the shell:
my $output = `cmd $blah`;
There are various modules on CPAN that handle this sort of thing, too.
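One such module is Capture::Tiny, which runs a block of code and hands back whatever was written to STDOUT and STDERR along with the block's return value; a minimal sketch (cmd and cmd2 stand in for the commands from the question):
use Capture::Tiny qw(capture);

my ($stdout, $stderr, $status) = capture {
    system("cmd $blah | cmd2");    # the usual shell-quoting caveats for $blah still apply
};
die "pipeline failed: $stderr" if $status != 0;   # $status is system's raw wait status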
Some comments on temporary files
The comments mentioned race conditions, so I thought I'd write a few things for those wondering what people are talking about.
In the original code, Andreas uses File::Temp, a module from the Perl Standard Library. However, they use the tmpnam POSIX-like call, which has this caveat in the docs:
Implementations of mktemp(), tmpnam(), and tempnam() are provided, but should be used with caution since they return only a filename that was valid when function was called, so cannot guarantee that the file will not exist by the time the caller opens the filename.
This approach is discouraged, and the corresponding POSIX::tmpnam was removed from Perl v5.22's POSIX module.
That is, you get back the name of a file that does not exist yet. After you get the name, you don't know if that filename was made by another program. And, that unlink later can cause problems for one of the programs.
The "race condition" comes in when two programs that probably don't know about each other try to do the same thing as roughly the same time. Your program tries to make a temporary file named "foo", and so does some other program. They both might see at the same time that a file named "foo" does not exist, then try to create it. They both might succeed, and as they both write to it, they might interleave or overwrite the other's output. Then, one of those programs think it is done and calls unlink. Now the other program wonders what happened.
In the malicious exploit case, some bad actor knows a temporary file will show up, so it recognizes a new file and gets in there to read or write data.
But this can also happen within the same program. Two or more versions of the same program run at the same time and try to do the same thing. With randomized filenames, it is probably exceedingly rare that two running programs will choose the same name at the same time. However, we don't care how rare something is; we care how devastating the consequences are should it happen. And, rare is much more frequent than never.
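The way safe tempfile code avoids that race is to create and open the file in a single atomic step, failing if the name already exists. A rough illustration of the idea (File::Temp does this for you, with a randomized name and retries):
use Fcntl qw(O_RDWR O_CREAT O_EXCL);

my $name = "/tmp/myapp.$$." . int rand 1_000_000;   # hypothetical name; File::Temp randomizes better
sysopen(my $fh, $name, O_RDWR | O_CREAT | O_EXCL, 0600)
    or die "someone else already created $name: $!";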
File::Temp
Knowing all that, File::Temp handles the details of ensuring that you get a filehandle:
use File::Temp qw(tempfile);
my( $fh, $name ) = tempfile();
This uses a default template to create the name. When the filehandle goes out of scope, File::Temp also cleans up the mess.
{
    my( $fh, $name ) = tempfile();
    print $fh ...;
    ...;
} # file cleaned up
Some systems might automatically clean up temp files, although I haven't cared about that in years. Typically it was a batch thing (say, once a week).
I often go one step further by giving my temporary filenames a template, where the Xs are literal characters the module recognizes and fills in with randomized characters:
my( $fh, $name ) = tempfile(
    sprintf "$0-%d-XXXXXX", time );
I'm often doing this while I'm developing things so I can watch the program make the files (and in which order) and see what's in them. In production I probably want to obscure the source program name ($0) and the time; I don't want to make it easier to guess who's making which file.
A scratchpad
I can also open a temporary file with open by not giving it a filename. This is useful when you need a scratch file that nothing outside the program will ever see. Opening it read-write means you can output some stuff then move around in that file (we show a fixed-length record example in Learning Perl):
open(my $tmp, "+>", undef) or die "Could not open anonymous temp file: $!";
print $tmp "Some stuff\n";
seek $tmp, 0, 0;
my $line = <$tmp>;
File::Temp opens the temp file in O_RDWR mode so all you have to do is use that one file handle for both reading and writing, even from external programs. The returned file handle is overloaded so that it stringifies to the temp file name so you can pass that to the external program. If that is dangerous for your purpose you can get the fileno() and redirect to /dev/fd/<fileno> instead.
All you have to do is mind your seeks and tells. :-) Just remember to always set autoflush!
use File::Temp;
use Data::Dump;
my $fh = File::Temp->new;
$fh->autoflush;
system "ls /tmp/*.txt >> $fh" and die $!;
my @lines = <$fh>;
printf "%s\n\n", Data::Dump::pp(\@lines);
print $fh "How now brown cow\n";
seek $fh, 0, 0 or die $!;
my @lines2 = <$fh>;
printf "%s\n", Data::Dump::pp(\@lines2);
Which prints
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
]
[
"/tmp/cpan_htmlconvert_DPzx.txt\n",
"/tmp/cpan_htmlconvert_DunL.txt\n",
"/tmp/cpan_install_HfUe.txt\n",
"/tmp/cpan_install_XbD6.txt\n",
"/tmp/cpan_install_yzs9.txt\n",
"How now brown cow\n",
]
HTH

Perl::Critic in Brutal Mode

So I've recently started using Perl::Critic to check the quality of the code I've written. I'm running it in brutal mode, and there is one suggestion it makes that I don't understand as being an issue. The output is:
Return value of flagged function ignored - print at line 197, column 13. See pages 208,278 of PBP. (Severity: 1)
The line in question is basically a call to the print function with a short message that goes to the console. Why, then, should I capture the return value, which will almost certainly always be 1? I can't think of any use case where it wouldn't be.
Is brutal mode being 'too brutal'? Or am I missing something? I should add that I did read pages 208 and 278 of the PBP and the answer is not clear to me.
I agree that most of the time print will not fail. But you can exclude print from this policy by creating a .perlcriticrc file and adding these lines to it (as I do):
# Check all builtins except "print"
[InputOutput::RequireCheckedSyscalls]
functions = :builtins
exclude_functions = print
This is described in Perl::Critic::Policy::InputOutput::RequireCheckedSyscalls
Also, if you disagree with all the policies of the Brutal setting, you can just use one of the other 4 less brutal settings. The tool is highly configurable.
Here is a trivial case where print can fail (printing to a closed filehandle):
open my $fh, '>', 'out' or die "Cannot open out: $!";
print $fh "555\n";
close $fh;
print $fh "888\n" or die "print failed: $!";
# we shouldn't get here
print "777\n";
In such short code, it is obvious that you just closed the filehandle, and you would never then try to print to it. But, if you had a lot of (poorly designed) code, maybe it would occur.
There are other reasons print could fail, such as if another process came along and deleted a directory or disabled write permissions on your open file.
I created a script for myself to run perlcritic which makes it easy to access the POD for a given policy: Sort and summarize perlcritic output
One use case where print "something"; fails is when STDOUT has been opened to a file and the file system is full. But in my projects I also do not check the return value of print.
To implement the fix for
print "$updated_service_name\n";
Use
my $printed = (print "$updated_service_name\n");
if (!$printed) {
    die "Unable to write to stdout\n";
}
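Alternatively, the check can be folded into a single statement:
print "$updated_service_name\n" or die "Unable to write to stdout: $!\n";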

Start a pdf viewer from a Perl script

I have to start a pdf viewer from a Perl script. The viewer should become detached from the parent process and the terminal that the parent process was run from. If I close the parent or the terminal the viewer should still be kept running. I considered three approaches (using evince as the pdf viewer command):
Using system and sh:
system 'evince test.pdf &';
Using fork():
$SIG{CHLD} = "IGNORE"; #reap children as they complete
my $pid = fork();
if ( $pid == 0 ) {
    exec 'evince', 'test.pdf';
}
Using Proc::Daemon:
use Proc::Daemon;
my $daemon = Proc::Daemon->new(
    work_dir     => '/tmp/evince',
    child_STDOUT => '>>stdout.txt',
    child_STDERR => '>>stderr.txt',
);
my $pid = $daemon->Init();
if ( $pid == 0 ) {
    exec 'evince', 'test.pdf';
}
What would be the difference between these approaches? Which approach would you recommend?
system 'evince test.pdf &';
In my experience, this is likely to really be:
system "evince $pdf_file &";
If $pdf_file is user input, then we get shell-injection bugs, such as passing in a pdf name of $(rm -rf /) or even just ;rm -rf /. And what if the name has a space in it? Well, you can avoid all that if you quote it, right?
system 'evince "$pdf_file" &';
Well, no, now all I have to do is give you a filename of ";rm -rf "/. And what if my pdf has a double quote in its name? You could use single quotes, but the same problem comes up if the filename has single quotes in it, and the shell injection isn't really any harder. You could come up with an elaborate shellify function that properly quotes a string all so that the shell can unquote it and get back to the original entry ... but that seems like so much more work than your other options, neither of which suffers from these problems.
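As an aside, the multi-argument form of system bypasses the shell entirely, so none of this quoting matters; the trade-off is that there is no shell and hence no & to put the viewer in the background (a side note, not a drop-in replacement for the approaches discussed below):
system('evince', $pdf_file) == 0
    or warn "evince exited with status ", $? >> 8, "\n";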
$SIG{CHLD} = "IGNORE"; #reap children as they complete
my $pid = fork();
if ( $pid == 0 ) {
    exec 'evince', 'test.pdf';
}
Setting a global $SIG{CHLD} is nice and easy ... unless you need to handle other children as they die. So only you can tell whether that's acceptable or not. And, again in my experience, not even always then. I've been bitten by this one - though rarely. I had this mixed in with an application that, elsewhere, used AnyEvent, and managed to break AE's subprocess handling. (The same would likely hold true if you mixed this with any event system, I just happened to be using AE.)
Also, this is missing the stdout and stderr redirects - and stdin redirect. That's easy enough to add - inside your if, before the exec, just close and reopen the filehandles as you need, e.g.:
close STDOUT; open STDOUT, '>', '/dev/null';
close STDERR; open STDERR, '>', '/dev/null';
close STDIN; open STDIN, '<', '/dev/null';
No big deal. However, Proc::Daemon does set up a few more things for you to ensure signals don't reach from one to the other process, in either direction. This depends on how severe you need to get.
For most of my purposes, I've found #2 to be sufficient. I've only reached for Proc::Daemon on a few projects, but that's where a) I have full control over the module installation, and b) it really matters. Starting a pdf viewer wouldn't normally be such a case.
I avoid #1 at all costs - I have had some fairly significant bites with shell injection, and now try to avoid the shell at all times.
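For reference, here is roughly what approach #2 looks like with the redirects folded in, plus an optional POSIX::setsid call (my addition, not something the answer above relies on) so the viewer is also detached from the terminal's session:
use POSIX qw(setsid);

$SIG{CHLD} = 'IGNORE';                 # reap children as they complete
my $pid = fork();
die "fork failed: $!" unless defined $pid;
if ( $pid == 0 ) {
    setsid();                          # start a new session: no controlling terminal
    open STDIN,  '<', '/dev/null';
    open STDOUT, '>', '/dev/null';
    open STDERR, '>', '/dev/null';
    exec 'evince', 'test.pdf';
    POSIX::_exit(1);                   # only reached if exec fails
}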

How do I influence the width of Perl IPC::Open3 output?

I have the following Perl code and would like it to display exactly as invoking /bin/ls in the terminal would display. For example on a terminal sized to 100 columns, it would print up to 100 characters worth of output before inserting a newline. Instead this code prints 1 file per line of output. I feel like it involves assigning some terminal settings to the IO::Pty instance, but I've tried variations of that without luck.
UPDATE: I replaced the <$READER> with a call to sysread hoping the original code might just have a buffering issue, but the output received from sysread is still one file per line.
UPDATE: I added code showing my attempt at changing the IO::Pty's size via the clone_winsize_from method. This didn't result in the output being any different.
UPDATE: As best I can tell (from reading IPC::open3 code for version 1.12) it seems you cannot pass a variable of type IO::Handle without open3 creating a pipe rather than dup'ing the filehandle. This means isatty doesn't return a true value when ls invokes it and ls then forces itself into "one file per line" mode.
I think I just need to do a fork/exec and handle the I/O redirection myself.
#!/usr/bin/env perl
use IPC::Open3;
use IO::Pty;
use strict;

my $READER = IO::Pty->new();
$READER->slave->clone_winsize_from(\*STDIN);

my $pid = open3(undef, $READER, undef, "/bin/ls");
while (my $line = <$READER>) {
    print $line;
}
waitpid($pid, 0) or die "Error waiting for pid: $!\n";
$READER->close();
I think $READER is getting overwritten with a pipe created by open3, which can be avoided by changing
my $READER = ...;
my $pid = open3(undef, $READER, undef, "/bin/ls");
to
local *READER = ...;
my $pid = open3(undef, '>&READER', undef, "/bin/ls");
See the docs.
You can pass the -C option to ls to force it to use columnar output (without getting IO::Pty involved).
The IO::Pty docs describe a clone_winsize_from(\*FH) method. You might try cloning your actual pty's dimensions.
I see that you're setting up the pty only as stdout of the child process. You might need to set it up also as its stdin — when the child process sends the "query terminal size" escape sequence to its stdout, it would need to receive the response on its stdin.
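Following up on the question's idea of doing the fork/exec and the redirection by hand, here is a rough, untested sketch using IO::Pty's documented make_slave_controlling_terminal and close_slave methods, so that ls sees a real tty on its stdout and produces columnar output:
#!/usr/bin/env perl
use strict;
use warnings;
use IO::Pty;
use POSIX ();

my $pty = IO::Pty->new;
$pty->slave->clone_winsize_from(\*STDIN);   # copy our terminal's size onto the pty

my $pid = fork;
die "fork failed: $!" unless defined $pid;

if ($pid == 0) {                            # child
    $pty->make_slave_controlling_terminal;  # new session, pty slave as controlling tty
    my $slave = $pty->slave;
    open STDIN,  '<&', $slave or die $!;
    open STDOUT, '>&', $slave or die $!;    # ls's isatty(STDOUT) is now true
    open STDERR, '>&', $slave or die $!;
    close $pty;                             # child keeps only the slave side
    exec '/bin/ls';
    POSIX::_exit(1);
}

$pty->close_slave;                          # parent keeps only the master side
while (defined(my $line = <$pty>)) {
    print $line;
}
waitpid $pid, 0;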

exit code of system() call with a single scalar argument in Perl

There is a system() call in a Perl script with multiple pipes, using a single scalar argument. The call looks more or less like this:
system("zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2");
The file in question (foo.gz) is quite large, about 2GB compressed in size. I guess that's why it was originally done via a system call.
Questions:
The problem now is that this system call always seems to return 0, whether one of the commands in the pipeline fails or not. I assume this is because it gets invoked via sh -c '...'. Is that correct?
Is there a way to check if a system() call was successful if only a single scalar argument is passed?
Is there a better way to process a large file like this, in a way thats equally or more efficient (in terms of speed mainly)?
Thanks for any hints as I am not really familiar with Perl.
Two things:
When you do a system call, the value returned is the exit status of the last command in the pipeline. Thus, you're getting the status code of the bzip2 command.
The reason the program is doing this is because the people who wrote the program probably didn't know any better. I've seen Perl programs use system calls for finding the basename of the file, doing a find, and even doing a copy/rename/move. These are all things that can be done faster and easier inside the Perl program. And, you don't have the whole Windows/Unix compatibility issues.
You're always better off using Perl modules for things like this. In this case, I bet the Perl modules will be even faster than the shell pipeline, and you'll have more control over the entire operation.
There's a set of modules under the IO::Compress namespace that can handle Gzip, Zip, and Bzip2.
I use Archive::Zip which is a great module, but you want to use the Bzip2 compression algorithm, and Archive::Zip can't handle that.
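To make that concrete, here is a sketch of the whole pipeline done with the IO::Compress family (the module names are real; the field indices mirror the awk '{print $2,$3,$4,$6}' step from the question):
use IO::Uncompress::Gunzip qw($GunzipError);
use IO::Compress::Bzip2    qw($Bzip2Error);

my $in  = IO::Uncompress::Gunzip->new('/foo.gz')
    or die "gunzip failed: $GunzipError\n";
my $out = IO::Compress::Bzip2->new('/foo.processed.bz2')
    or die "bzip2 failed: $Bzip2Error\n";

while (defined(my $line = $in->getline)) {
    next unless $line =~ /^.{6}X|Y|Z/;                   # the grep step, as written in the question
    my @fields = split ' ', $line;
    $out->print(join(' ', @fields[1, 2, 3, 5]) . "\n");  # awk's $2,$3,$4,$6 (0-based indices here)
}
$in->close;
$out->close;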
system() returns what the /bin/sh shell returns. When multiple commands are pipelined, the shell forks a new process for each of them and the status code of the last command in the chain is returned, in this case bzip2.
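If you want to keep the pipeline in the shell but still notice failures in its earlier stages, one option (bash-specific, and my suggestion rather than anything from the question) is to run it under bash with pipefail set, which makes the pipeline fail if any stage fails:
my $pipeline = q{zcat /foo.gz | grep '^.{6}X|Y|Z' | awk '{print $2,$3,$4,$6}' | bzip2 > /foo.processed.bz2};
system('/bin/bash', '-o', 'pipefail', '-c', $pipeline) == 0
    or die "pipeline failed with exit status ", $? >> 8, "\n";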
Based on your comments and answers, I'd now do it like this:
$infile =~ s/(.*\.gz)\s*$/gzip -dc < $1|/;   # turn the .gz path into a "gzip -dc ... |" pipe open
open(OUTFH, "| bzip2 > $outfile") or die "Can't open $outfile: $!";
open(INFH, $infile) or die "Can't open $infile: $!";
while (my $line = <INFH>) {
    if ($line =~ /^.{6}X|Y|Z/) {
        # TODO: the awk part...
        print OUTFH $line;
    }
}
close(INFH);
close(OUTFH);
Please feel free to comment and vote up/down.
You'd be better off doing the text processing from within Perl itself - that's what Perl is for :)
system() returns the command's wait status, not its output (the actual exit code is in the high byte, $? >> 8). To capture the command's output, call it via backticks: `command` rather than system('command').