How to check if command executed with IPC::open3 is hung? - perl

I'm using the following script to capture STDIN, STDOUT and STDERR from the command passed as an argument.
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3;
local(*CMD_IN, *CMD_OUT, *CMD_ERR);
my $pid = open3(*CMD_IN, *CMD_OUT, *CMD_ERR, $ARGV[0]);
close(CMD_IN);
my #stdout_output = <CMD_OUT>;
my #stderr_output = <CMD_ERR>;
close(CMD_OUT);
close(CMD_ERR);
waitpid ($pid, 0); # reap the exit code
print "OUT:\n", #stdout_output;
print "ERR:\n", #stderr_output;
It all works good with the exception that I'm not sure how to monitor if the command passed is hung. Could you please suggest a way?
I've borrowed this snippet originally from 'Programming Perl'.

You can use select or IO::Select and provide a timeout. If you want to read both from stdout and stderr, you should do that anyway (see the documentation of IPC::Open3).
Here's an example program using IO::Select:
#!/usr/bin/perl
use strict;
use warnings;
use IO::Select;
use IPC::Open3;
use Symbol 'gensym';
my ($cmd_in, $cmd_out, $cmd_err);
$cmd_err = gensym;
my $pid = open3($cmd_in, $cmd_out, $cmd_err, $ARGV[0]);
close($cmd_in);
my $select = IO::Select->new($cmd_out, $cmd_err);
my $stdout_output = '';
my $stderr_output = '';
while (my #ready = $select->can_read(5)) {
foreach my $handle (#ready) {
if (sysread($handle, my $buf, 4096)) {
if ($handle == $cmd_out) {
$stdout_output .= $buf;
}
else {
$stderr_output .= $buf;
}
}
else {
# EOF or error
$select->remove($handle);
}
}
}
if ($select->count) {
print "Timed out\n";
kill('TERM', $pid);
}
close($cmd_out);
close($cmd_err);
waitpid($pid, 0); # reap the exit code
print "OUT:\n", $stdout_output;
print "ERR:\n", $stderr_output;
Notes:
I use lexical vars for file handles. This requires the use of gensym for the stderr handle.
The argument to can_read is the timeout in seconds.
I use sysread for non-buffered IO.
I terminate the child if there's a read timeout.

I came up with the following solution heavily based on this answer.
However using select and avoiding signals as in nwellnhof's example looks much cleaner which is why I accepted it. I'm posting it here if somebody is interested:
my $pid = open3(*CMD_IN, *CMD_OUT, *CMD_ERR, $cmd);
if ($pid > 0){
eval{
local $SIG{ALRM} = sub {kill 9, $pid;};
alarm 6;
waitpid($pid, 0);
alarm 0;
};
}

Related

perl redirect stdout to lexical filehandle

I'm trying to write a helper function that runs a perl function in another process and returns a closure that produces a line of output at a time when called.
I figured out a way of doing this using pipe that mixes old and new-style filehandles. I used an old-style one for the sink in order to use the open(STDOUT, ">&thing") syntax and a new-style one for the source since it needs to be captured by a closure and I didn't want to burden the caller with providing a filehandle.
Is there a way of using a new-style filehandle in a construction with the same meaning as open(STDOUT, ">&thing")?
#!/usr/bin/env perl
# pipe.pl
# use pipe() to create a pair of fd's.
# write to one and read from the other.
#
# The source needs to be captured by the closure and can't be
# destructed at the end of get_reader(), so it has to be lexical.
#
# We need to be able to redirect stdout to sink in such a way that
# we actually dup the file descriptor (so shelling out works as intended).
# open(STDOUT, ">&FILEHANDLE") achieves this but appears to require an
# old-style filehandle.
use strict;
use warnings;
sub get_reader {
local *SINK;
my $source;
pipe($source, SINK) or die "can't open pipe!";
my $cpid = fork();
if ($cpid == -1) {
die 'failed to fork';
}
elsif ($cpid == 0) {
open STDOUT, ">&SINK" or die "can't open sink";
system("echo -n hi");
exit;
}
else {
return sub {
my $line = readline($source);
printf "from child (%s)\n", $line;
exit;
}
}
}
sub main {
my $reader = get_reader();
$reader->();
}
main();
When run, this produces
from child (hi)
as expected.
sub get_reader {
my ($cmd) = #_;
open(my $pipe, '-|', #$cmd);
return sub {
return undef if !$pipe;
my $line = <$pipe>;
if (!defined($line)) {
close($pipe);
$pipe = undef;
return undef;
}
chomp($line);
return $line;
};
}
If that's not good enough (e.g. because you also need to redirect the child's STDIN or STDERR), you can use IPC::Run instead.
use IPC::Run qw( start );
sub get_reader {
my ($cmd) = #_;
my $buf = '';
my $h = start($cmd, '>', \$buf);
return sub {
return undef if !$h;
while (1) {
if ($buf =~ s/^([^\n]*)\n//) {
return $1;
}
if (!$h->pump())) {
$h->finish();
$h = undef;
return substr($buf, 0, length($buf), '') if length($buf);
return undef;
}
}
};
}
Either way, you can now do
my $i = get_reader(['prog', 'arg', 'arg']);
while (defined( my $line = $i->() )) {
print "$line\n";
}
Either way, error handling left to you.

Reading STDOUT and STDERR of external command with no wait

I would like to execute external command rtmpdump and read it's STDOUT and STDERR separately, but not to wait till such command ends, but read its partial outputs in bulks, when available...
What is a safe way to do it in Perl?
This is a code I have that works "per-line" basis:
#!/usr/bin/perl
use warnings;
use strict;
use Symbol;
use IPC::Open3;
use IO::Select;
sub execute {
my($cmd) = #_;
print "[COMMAND]: $cmd\n";
my $pid = open3(my $in, my $out, my $err = gensym(), $cmd);
print "[PID]: $pid\n";
my $sel = new IO::Select;
$sel->add($out, $err);
while(my #fhs = $sel->can_read) {
foreach my $fh (#fhs) {
my $line = <$fh>;
unless(defined $line) {
$sel->remove($fh);
next;
}
if($fh == $out) {
print "[OUTPUT]: $line";
} elsif($fh == $err) {
print "[ERROR] : $line";
} else {
die "[ERROR]: This should never execute!";
}
}
}
waitpid($pid, 0);
}
But the above code works in text mode only, I believe. To use rtmpdump as a command, I need to collect partial outputs in binary mode, so do not read STDOUT line-by-line as it is in the above code.
Binary output of STDOUT should be stored in variable, not printed.
Using blocking functions (e.g. readline aka <>, read, etc) inside a select loop defies the use of select.
$sel->add($out, $err);
my %bufs;
while ($sel->count) {
for my $fh ($sel->can_read) {
my $rv = sysread($fh, $bufs{$fh}, 128*1024, length($bufs{$fh}));
if (!defined($rv)) {
# Error
die $! ;
}
if (!$rv) {
# Eof
$sel->remove($fh);
next;
}
if ($fh == $err) {
while ($bufs{$err} =~ s/^(.*\n)//) {
print "[ERROR] $1";
}
}
}
}
print "[ERROR] $bufs{$err}\n" if length($bufs{$err});
waitpid($pid, 0);
... do something with $bufs{$out} ...
But it would be much simpler to use IPC::Run.
use IPC::Run qw( run );
my ($out_buf, $err_buf);
run [ 'sh', '-c', $cmd ],
'>', \$out_buf,
'2>', sub {
$err_buf .= $_[0];
while ($err_buf =~ s/^(.*\n)//) {
print "[ERROR] $1";
}
};
print "[ERROR] $err_buf\n" if length($err_buf);
... do something with $out_buf ...
If you're on a POSIX system, try using Expect.pm. This is exactly the sort of problem it is designed to solve, and it also simplifies the task of sending keystrokes to the spawned process.

How to multithread seeing if a webpage exists in Perl?

I'm writing a Perl script that takes in a list of URLs and checks to see if they exist. (Note that I only care if they exist; I don’t care what their contents are. Here’s the important part of the program.
use LWP::Simple qw($ua head);
if (head($url))
{
$numberAlive ++;
}
else
{
$numberDead ++;
}
Right now the program works fine; however, I want it to run faster. Thus I'm considering making it multithreaded. I assume that the slow part of my program is contacting the server for each URL; therefore, I'm looking for a way in which I can send out requests to the URLs of other webpages on my list while I'm waiting for the first response. How can I do this? As far as I can tell, the head routine doesn't have a callback that can get called once the server has responded.
Begin with familiar-looking front matter.
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # for // (defined-or)
use IO::Handle;
use IO::Select;
use LWP::Simple;
use POSIX qw/ :sys_wait_h /;
use Socket;
Global constants control program execution.
my $DEBUG = 0;
my $EXIT_COMMAND = "<EXIT>";
my $NJOBS = 10;
URLs to check arrive one per line on a worker’s end of the socket. For each URL, the worker calls LWP::Simple::head to determine whether the resource is fetchable. The worker then writes back to the socket a line of the form url : *status* where *status* is either "YES" or "NO" and represents the space character.
If the URL is $EXIT_COMMAND, then the worker exits immediately.
sub check_sites {
my($s) = #_;
warn "$0: [$$]: waiting for URL" if $DEBUG;
while (<$s>) {
chomp;
warn "$0: [$$]: got '$_'" if $DEBUG;
exit 0 if $_ eq $EXIT_COMMAND;
print $s "$_: ", (head($_) ? "YES" : "NO"), "\n";
}
die "NOTREACHED";
}
To create a worker, we start by creating a socketpair. The parent process will use one end and each worker (child) will use the other. We disable buffering at both ends and add the parent end to our IO::Select instance. We also note each child’s process ID so we can wait for all workers to finish.
sub create_worker {
my($sel,$kidpid) = #_;
socketpair my $parent, my $kid, AF_UNIX, SOCK_STREAM, PF_UNSPEC
or die "$0: socketpair: $!";
$_->autoflush(1) for $parent, $kid;
my $pid = fork // die "$0: fork: $!";
if ($pid) {
++$kidpid->{$pid};
close $kid or die "$0: close: $!";
$sel->add($parent);
}
else {
close $parent or die "$0: close: $!";
check_sites $kid;
die "NOTREACHED";
}
}
To dispatch URLs, the parent grabs as many readers as are available and hands out the same number of URLs from the job queue. Any workers that remain after the job queue is empty receive the exit command.
Note that print will fail if the underlying worker has already exited. The parent must ignore SIGPIPE to prevent immediate termination.
sub dispatch_jobs {
my($sel,$jobs) = #_;
foreach my $s ($sel->can_write) {
my $url = #$jobs ? shift #$jobs : $EXIT_COMMAND;
warn "$0 [$$]: sending '$url' to fd ", fileno $s if $DEBUG;
print $s $url, "\n" or $sel->remove($s);
}
}
By the time control reaches read_results, the workers have been created and received work. Now the parent uses can_read to wait for results to arrive from one or more workers. A defined result is an answer from the current worker, and an undefined result means the child has exited and closed the other end of the socket.
sub read_results {
my($sel,$results) = #_;
warn "$0 [$$]: waiting for readers" if $DEBUG;
foreach my $s ($sel->can_read) {
warn "$0: [$$]: reading from fd ", fileno $s if $DEBUG;
if (defined(my $result = <$s>)) {
chomp $result;
push #$results, $result;
warn "$0 [$$]: got '$result' from fd ", fileno $s if $DEBUG;
}
else {
warn "$0 [$$]: eof from fd ", fileno $s if $DEBUG;
$sel->remove($s);
}
}
}
The parent must keep track of live workers in order to collect all results.
sub reap_workers {
my($kidpid) = #_;
while ((my $pid = waitpid -1, WNOHANG) > 0) {
warn "$0: [$$]: reaped $pid" if $DEBUG;
delete $kidpid->{$pid};
}
}
Running the pool executes the subs above to dispatch all URLs and return all results.
sub run_pool {
my($n,#jobs) = #_;
my $sel = IO::Select->new;
my %kidpid;
my #results;
create_worker $sel, \%kidpid for 1 .. $n;
local $SIG{PIPE} = "IGNORE"; # writes to dead workers will fail
while (#jobs || keys %kidpid || $sel->handles) {
dispatch_jobs $sel, \#jobs;
read_results $sel, \#results;
reap_workers \%kidpid;
}
warn "$0 [$$]: returning #results" if $DEBUG;
#results;
}
Using an example main program
my #jobs = qw(
bogus
http://stackoverflow.com/
http://www.google.com/
http://www.yahoo.com/
);
my #results = run_pool $NJOBS, #jobs;
print $_, "\n" for #results;
the output is
bogus: NO
http://www.google.com/: YES
http://stackoverflow.com/: YES
http://www.yahoo.com/: YES
Another option is HTTP::Async.
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Async;
my $numberAlive = 0;
my $numberDead = 0;
my #urls = ('http://www.perl.com','http://www.example.xyzzy/foo.html');
my $async = HTTP::Async->new;
# you might want to wrap this in a loop to deal with #urls in batches
foreach my $url (#urls){
$async->add( HTTP::Request->new( HEAD => $url ) );
}
while ( my $response = $async->wait_for_next_response ) {
if ($response->code == 200){$numberAlive ++;}
else{$numberDead ++;}
}
print "$numberAlive Alive, $numberDead Dead\n";
Worker-based parallelisation (using your choice of threads or processes):
use strict;
use warnings;
use feature qw( say );
use threads; # or: use forks;
use LWP::Simple qw( head );
use Thread::Queue::Any qw( );
use constant NUM_WORKERS => 10; # Or whatever.
my $req_q = Thread::Queue::Any->new();
my $resp_q = Thread::Queue::Any->new();
my #workers;
for (1..NUM_WORKERS) {
push #workers, async {
while (my $url = $req_q->dequeue()) {
my $is_alive = head($url) ? 1 : 0;
$resp_q->enqueue($is_alive);
}
};
}
$req_q->enqueue($_) for #urls;
my ($alive, $dead);
for (1..#urls) {
my $is_alive = $resp_q->dequeue();
++( $is_alive ? $alive : $dead );
}
$req_q->enqueue(undef) for #workers;
$_->join for #workers;
say $alive;
say $dead;

Using the Inkscape shell from perl

Inkscape has a shell mode invoked like this
inkscape --shell
where you can execute commands like this:
some_svg_file.svg -e some_png_output.png -y 1.0 -b #ffffff -D -d 150
which will generate a PNG file, or like this:
/home/simone/some_text.svg -S
which gives you the bounding box of all elements in the file in the return message like this
svg2,0.72,-12.834,122.67281,12.942
layer1,0.72,-12.834,122.67281,12.942
text2985,0.72,-12.834,122.67281,12.942
tspan2987,0.72,-12.834,122.67281,12.942
The benefit of this is that you can perform operations on SVG files without having to restart Inkscape every time.
I would like to do something like this:
sub do_inkscape {
my ($file, $commands) = #_;
# capture output
return $output
}
Things work OK if I use open2 and forking like this:
use IPC::Open2;
$pid = open2(\*CHLD_OUT, \*CHLD_IN, 'inkscape --shell');
$\ = "\n"; $/ = ">";
my $out; open my $fh, '>', \$out;
if (!defined($kidpid = fork())) {
die "cannot fork: $!";
} elsif ($kidpid == 0) {
while (<>) { print CHLD_IN $_; }
} else {
while (<CHLD_OUT>) { chop; s/\s*$//gmi; print "\"$_\""; }
waitpid($kidpid, 0);
}
but I can't find out how to input only one line, and capture only that output without having to restart Inkscape every time.
Thanks
Simone
You don't need to fork, open2 handles that by itself. What you need to do is find a way of detecting when inkscape is waiting for input.
Here's a very basic example of how you could achieve that:
#! /usr/bin/perl
use strict;
use warnings;
use IPC::Open2;
sub read_until_prompt($) {
my ($fh) = (#_);
my $done = 0;
while (!$done) {
my $in;
read($fh, $in, 1);
if ($in eq '>') {
$done = 1;
} else {
print $in;
}
}
}
my ($is_in, $is_out);
my $pid = open2($is_out, $is_in, 'inkscape --shell');
read_until_prompt($is_out);
print "ready\n";
print $is_in "test.svg -S\n";
read_until_prompt($is_out);
print $is_in "quit\n";
waitpid $pid, 0;
print "done!\n";
The read_until_prompt reads from inkscapes output until it finds a > character, and assumes that when it sees one, inkscape is ready.
Note: This is too simple, you will probably need more logic in there to make it work more reliably if a > can appear outside the prompt in the output you're expecting. There is also no error checking at all in the above script, which is bad.

How do you tell if a pipe opened process has terminated?

Assuming a handle created with the following code:
use IO::File;
my $fh = IO::File->new;
my $pid = $fh->open('some_long_running_proc |') or die $!;
$fh->autoflush(1);
$fh->blocking(0);
and then read with a loop like this:
while (some_condition_here) {
my #lines = $fh->getlines;
...
sleep 1;
}
What do I put as some_condition_here that will return false if the process on the other end of the pipe has terminated?
Testing for $fh->eof will not work since the process could still be running without printing any new lines. Testing for $fh->opened doesn't seem to do anything useful.
Currently I am using $pid =! waitpid($pid, WNOHANG) which seems to work in POSIX compliant environments. Is this the best way? What about on Windows?
On using select,
use strict;
use warnings;
use IO::Select qw( );
sub process_msg {
my ($client, $msg) = #_;
chomp $msg;
print "$client->{id} said '$msg'\n";
return 1; # Return false to close the handle.
}
my $select = IO::Select->new();
my %clients;
for (...) {
my $fh = ...;
$clients{fileno($fh)} = {
id => '...'
buf => '',
# ...
};
$select->add($fh);
}
while (my #ready = $select->can_read) {
for my $fh (#ready) {
my $client = $clients{ fileno($fh) };
our $buf; local *buf = \( $client->{buf} );
my $rv = sysread($fh, $buf, 64*1024, length($buf));
if (!$rv) {
if (defined($rv)) {
print "[$client->{id} ended]\n";
} else {
print "[Error reading from $client->{id}: $!]\n";
}
print "[Incomplete message received from $client->{id}]\n"
if length($buf);
delete $clients{ fileno($fh) };
$select->remove($fh);
next;
}
while ($buf =~ s/^(.*\n)//) {
if (!process_msg($client, "$1")) {
print "[Dropping $client->{id}]\n";
delete $clients{ fileno($fh) };
$select->remove($fh);
last;
}
}
}
}
What's wrong with waiting for an actual EOF?
while (<$fh>) {
...
sleep 1;
}
You've set the handle for non-blocking reads, so it should just do the right thing. Indeed, given your example, you don't even need to set non-blocking and can get rid of the sleep.
Are there other things that you want to do while waiting on some_long_running_proc? If so, select is probably in your future.
There a number of options.
readline aka <$fh> will return false on eof (or error).
eof will return true on eof.
read (with block size > 0) will return defined and zero on eof.
sysread (with block size > 0) will return defined and zero on eof.
You can use select or make the handle non-blocking before any of the above to check without blocking.
You use select() to ascertain whether there is any data, or an exceptional condition such as a close.
Personally I prefer to use IO::Multiplex, especially where you're multiplexing input from several different descriptors, but that may not apply in this case.