xargs equivalent in Perl

I would like to do what xargs does in shell scripting: run external commands through the exec() family of system calls, as child processes, in parallel, and wait for them to finish (ideally with a timeout).
How can this be done in Perl?

You could use the Proc::Background module.
Particularly interesting is the sub timeout_system(..).
Here's an example coming from the Proc::Background module page:
use Proc::Background;
timeout_system($seconds, $command, $arg1);
timeout_system($seconds, "$command $arg1");
my $proc1 = Proc::Background->new($command, $arg1, $arg2);
my $proc2 = Proc::Background->new("$command $arg1 1>&2");
$proc1->alive;
$proc1->die;
$proc1->wait;
my $time1 = $proc1->start_time;
my $time2 = $proc1->end_time;
# Add an option to kill the process with die when the variable is
# DESTROYed.
my $opts = {'die_upon_destroy' => 1};
my $proc3 = Proc::Background->new($opts, $command, $arg1, $arg2);
$proc3 = undef;

sub fasync(&); # forward declaration so the block syntax below parses
my @join;
push @join, fasync {
local $SIG{ALRM} = sub { die "alarm\n" };
alarm 10;
# exec(..);
sleep 20;
print "job1\n";
};
push @join, fasync {
print "job2\n";
};
# wait for jobs
$_->() for @join;
sub fasync(&) {
my ($worker) = @_;
my $pid = fork() // die "can't fork!";
if ($pid == 0) {
$worker->();
exit(0);
}
return sub {
my ($flags) = @_;
return waitpid($pid, $flags // 0);
}
}
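The fork-based answer above starts every job at once. xargs also caps how many commands run at a time (its -P option), so here is a minimal sketch of that part only, assuming a Unix-like system. The helper name run_parallel and the sample commands are made up for illustration, and timeouts are left out.

```perl
use strict;
use warnings;

# Hypothetical helper: run each command (an array ref of program + args)
# with at most $max_parallel children alive at once. Returns a hash that
# maps each command's index to its exit status.
sub run_parallel {
    my ($max_parallel, @commands) = @_;
    my (%index_of, %status);
    my $next = 0;

    while ($next < @commands || %index_of) {
        # Start new children while there is spare capacity.
        while ($next < @commands && keys(%index_of) < $max_parallel) {
            my $pid = fork() // die "can't fork: $!";
            if ($pid == 0) {
                my @cmd = @{ $commands[$next] };
                exec { $cmd[0] } @cmd or exit 127;    # only reached if exec fails
            }
            $index_of{$pid} = $next++;
        }
        # Block until any one child finishes, then record its status.
        my $pid = waitpid(-1, 0);
        last if $pid <= 0;
        next unless exists $index_of{$pid};
        my $index = delete $index_of{$pid};
        $status{$index} = $? >> 8;
    }
    return %status;
}

my %status = run_parallel(2, ['true'], ['false'], ['sh', '-c', 'exit 3']);
print "command $_ exited with $status{$_}\n" for sort { $a <=> $b } keys %status;
```

Passing each command as a list and using the exec { ... } form avoids going through a shell, so arguments with spaces survive intact.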

Related

Run a subroutine in the background until a condition

How could I rewrite this so that info runs in the background until $aw equals "result"?
#!/usr/bin/env perl
use 5.12.0;
use warnings;
use Term::ReadLine;
my $term = Term::ReadLine->new( 'something' );
$term->ornaments( 0 );
sub info {
# in the real script this runs some computations instead of the sleep
# and returns some information.
my ( $time ) = @_;
sleep $time;
return $time * 2;
}
my $value_returned_by_info = info( 10 ); # run this in the background
my $aw;
$aw = $term->readline( 'User input: ' );
if ( $aw eq 'result' ) {
# if info() is still running in the background:
# wait until info() returns because "$value_returned_by_info" is needed.
say $value_returned_by_info;
}
else {
# if info() is still running in the background:
# leave info() running in the background because "$value_returned_by_info" is not needed here.
say $aw;
}
$aw = $term->readline( 'User input: ' );
if ( $aw eq 'result' ) {
# if info() is still running in the background:
# wait until info() returns because "$value_returned_by_info" is needed.
say $value_returned_by_info;
}
else {
# if info() is still running in the background:
# leave info() running in the background because "$value_returned_by_info" is not needed here.
say $aw;
}
$aw = $term->readline( 'User input: ' );
if ( $aw eq 'result' ) {
# if info() is still running in the background:
# wait until info() returns because "$value_returned_by_info" is needed.
say $value_returned_by_info;
}
else {
# if info() is still running in the background:
# leave info() running in the background because "$value_returned_by_info" is not needed here.
say $aw;
}
say "End";
I agree with user5402. The mention of running in the background and the sleeping info function raise a lot of questions.
I wonder if maybe you're looking for a more tidy way of re-prompting for input when the given input is not correct. If that's the case, then maybe the IO::Prompter module works for you.
#!/usr/bin/env perl
use 5.10.0;
use strict;
use warnings;
use IO::Prompter;
sub info {
my ($time) = @_;
sleep $time;
return $time * 2;
}
my $expect = info(10);
my $aw;
PROMPT:
{
$aw = IO::Prompter::prompt( 'Enter number', -i );
if ( $aw eq $expect ) {
say "$aw :)";
}
else {
say "$aw :(";
redo PROMPT;
}
}
say "End";
If info can run in a separate process then you can just use fork. Otherwise you will have to use a threaded version of perl.
Example of using fork:
sub start_info {
my @params = @_;
my $pipe;
my $pid = open($pipe, "-|");
if (!$pid) {
# this code will run in a sub-process
# compute $result from @params
# and print result to STDOUT
sleep(10);
my $result = "p = $params[0] - pid $$";
print $result;
exit(0);
};
my $r;
return sub {
return $r if defined($r);
$r = <$pipe>; # read a single line
waitpid $pid, 0;
$r;
};
}
sub prompt {
print "Hit return: ";
<STDIN>;
}
my $info1 = start_info(4);
prompt();
print "result = ", $info1->(), "\n";
my $info2 = start_info(30);
prompt();
print "result = ", $info2->(), "\n";

How to multithread seeing if a webpage exists in Perl?

I'm writing a Perl script that takes in a list of URLs and checks to see if they exist. (Note that I only care if they exist; I don’t care what their contents are.) Here’s the important part of the program.
use LWP::Simple qw($ua head);
if (head($url))
{
$numberAlive ++;
}
else
{
$numberDead ++;
}
Right now the program works fine; however, I want it to run faster. Thus I'm considering making it multithreaded. I assume that the slow part of my program is contacting the server for each URL; therefore, I'm looking for a way in which I can send out requests to the URLs of other webpages on my list while I'm waiting for the first response. How can I do this? As far as I can tell, the head routine doesn't have a callback that can get called once the server has responded.
Begin with familiar-looking front matter.
#! /usr/bin/env perl
use strict;
use warnings;
use 5.10.0; # for // (defined-or)
use IO::Handle;
use IO::Select;
use LWP::Simple;
use POSIX qw/ :sys_wait_h /;
use Socket;
Global constants control program execution.
my $DEBUG = 0;
my $EXIT_COMMAND = "<EXIT>";
my $NJOBS = 10;
URLs to check arrive one per line on a worker’s end of the socket. For each URL, the worker calls LWP::Simple::head to determine whether the resource is fetchable. The worker then writes back to the socket a line of the form "url: status", where status is either "YES" or "NO".
If the URL is $EXIT_COMMAND, then the worker exits immediately.
sub check_sites {
my($s) = @_;
warn "$0: [$$]: waiting for URL" if $DEBUG;
while (<$s>) {
chomp;
warn "$0: [$$]: got '$_'" if $DEBUG;
exit 0 if $_ eq $EXIT_COMMAND;
print $s "$_: ", (head($_) ? "YES" : "NO"), "\n";
}
die "NOTREACHED";
}
To create a worker, we start by creating a socketpair. The parent process will use one end and each worker (child) will use the other. We disable buffering at both ends and add the parent end to our IO::Select instance. We also note each child’s process ID so we can wait for all workers to finish.
sub create_worker {
my($sel,$kidpid) = @_;
socketpair my $parent, my $kid, AF_UNIX, SOCK_STREAM, PF_UNSPEC
or die "$0: socketpair: $!";
$_->autoflush(1) for $parent, $kid;
my $pid = fork // die "$0: fork: $!";
if ($pid) {
++$kidpid->{$pid};
close $kid or die "$0: close: $!";
$sel->add($parent);
}
else {
close $parent or die "$0: close: $!";
check_sites $kid;
die "NOTREACHED";
}
}
To dispatch URLs, the parent grabs as many readers as are available and hands out the same number of URLs from the job queue. Any workers that remain after the job queue is empty receive the exit command.
Note that print will fail if the underlying worker has already exited. The parent must ignore SIGPIPE to prevent immediate termination.
sub dispatch_jobs {
my($sel,$jobs) = @_;
foreach my $s ($sel->can_write) {
my $url = @$jobs ? shift @$jobs : $EXIT_COMMAND;
warn "$0 [$$]: sending '$url' to fd ", fileno $s if $DEBUG;
print $s $url, "\n" or $sel->remove($s);
}
}
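The SIGPIPE remark above can be seen in isolation. This is a minimal sketch, not part of the original answer: the helper name write_to_closed_pipe is made up, and a plain pipe stands in for the socketpair. With SIGPIPE ignored, the failed write shows up as a false return value from print and an EPIPE errno rather than terminating the process.

```perl
use strict;
use warnings;
use IO::Handle;
use Errno qw(EPIPE);

# Hypothetical helper: write to a pipe whose read end is already closed.
# Returns (print succeeded?, errno was EPIPE?) as 1/0 flags.
sub write_to_closed_pipe {
    # Ignore SIGPIPE first, so it stays ignored until the handles are gone.
    local $SIG{PIPE} = 'IGNORE';

    pipe(my $reader, my $writer) or die "pipe: $!";
    $writer->autoflush(1);               # make print hit the pipe immediately
    close $reader or die "close: $!";    # nobody is left to read

    my $ok    = print {$writer} "hello\n";
    my $errno = $! + 0;                  # capture errno right after the write
    close $writer;                       # result ignored; the write already failed

    return ($ok ? 1 : 0, $errno == EPIPE ? 1 : 0);
}

my ($ok, $epipe) = write_to_closed_pipe();
print "print succeeded: $ok, errno is EPIPE: $epipe\n";
```

This is why dispatch_jobs can simply write "print ... or $sel->remove($s)": a dead worker turns into a failed print that the parent handles itself.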
By the time control reaches read_results, the workers have been created and received work. Now the parent uses can_read to wait for results to arrive from one or more workers. A defined result is an answer from the current worker, and an undefined result means the child has exited and closed the other end of the socket.
sub read_results {
my($sel,$results) = @_;
warn "$0 [$$]: waiting for readers" if $DEBUG;
foreach my $s ($sel->can_read) {
warn "$0: [$$]: reading from fd ", fileno $s if $DEBUG;
if (defined(my $result = <$s>)) {
chomp $result;
push @$results, $result;
warn "$0 [$$]: got '$result' from fd ", fileno $s if $DEBUG;
}
else {
warn "$0 [$$]: eof from fd ", fileno $s if $DEBUG;
$sel->remove($s);
}
}
}
The parent must keep track of live workers in order to collect all results.
sub reap_workers {
my($kidpid) = @_;
while ((my $pid = waitpid -1, WNOHANG) > 0) {
warn "$0: [$$]: reaped $pid" if $DEBUG;
delete $kidpid->{$pid};
}
}
Running the pool executes the subs above to dispatch all URLs and return all results.
sub run_pool {
my($n,@jobs) = @_;
my $sel = IO::Select->new;
my %kidpid;
my @results;
create_worker $sel, \%kidpid for 1 .. $n;
local $SIG{PIPE} = "IGNORE"; # writes to dead workers will fail
while (@jobs || keys %kidpid || $sel->handles) {
dispatch_jobs $sel, \@jobs;
read_results $sel, \@results;
reap_workers \%kidpid;
}
warn "$0 [$$]: returning @results" if $DEBUG;
@results;
}
Using an example main program
my @jobs = qw(
bogus
http://stackoverflow.com/
http://www.google.com/
http://www.yahoo.com/
);
my @results = run_pool $NJOBS, @jobs;
print $_, "\n" for @results;
the output is
bogus: NO
http://www.google.com/: YES
http://stackoverflow.com/: YES
http://www.yahoo.com/: YES
Another option is HTTP::Async.
#!/usr/bin/perl
use strict;
use warnings;
use HTTP::Request;
use HTTP::Async;
my $numberAlive = 0;
my $numberDead = 0;
my @urls = ('http://www.perl.com','http://www.example.xyzzy/foo.html');
my $async = HTTP::Async->new;
# you might want to wrap this in a loop to deal with @urls in batches
foreach my $url (@urls){
$async->add( HTTP::Request->new( HEAD => $url ) );
}
while ( my $response = $async->wait_for_next_response ) {
if ($response->code == 200){$numberAlive ++;}
else{$numberDead ++;}
}
print "$numberAlive Alive, $numberDead Dead\n";
Worker-based parallelisation (using your choice of threads or processes):
use strict;
use warnings;
use feature qw( say );
use threads; # or: use forks;
use LWP::Simple qw( head );
use Thread::Queue::Any qw( );
use constant NUM_WORKERS => 10; # Or whatever.
my $req_q = Thread::Queue::Any->new();
my $resp_q = Thread::Queue::Any->new();
my @workers;
for (1..NUM_WORKERS) {
push @workers, async {
while (my $url = $req_q->dequeue()) {
my $is_alive = head($url) ? 1 : 0;
$resp_q->enqueue($is_alive);
}
};
}
$req_q->enqueue($_) for @urls;
my ($alive, $dead) = (0, 0); # start at zero so an unused counter still prints 0
for (1..@urls) {
my $is_alive = $resp_q->dequeue();
++( $is_alive ? $alive : $dead );
}
$req_q->enqueue(undef) for @workers;
$_->join for @workers;
say $alive;
say $dead;

Help passing reference to class subroutine in Perl

I am trying to pass a routine to another subroutine within a Perl module. But when I pass the sub reference, the passed-in ref no longer has the object data. Maybe it's not possible to do it this way. The lines I have a question about are the "unless" lines below:
sub get_flag_end {
my $self = shift;
return ( -e "$self->{file}" );
}
sub wait_for_end {
my $self = shift;
my $timeout = shift;
my $poll_interval = shift;
# Is it even possible to pass the object subroutine and retain the object's data?
#unless ( $self->timeout( $timeout, $poll_interval, $self->get_flag_end ) ) { # does not work
unless ( $self->timeout( $timeout, $poll_interval, \&get_flag_end ) ) { # call happens but members are empty
die "!!!ERROR!!! Timed out while waiting for wait_for_end: timeout=$timeout, poll_interval=$poll_interval \n";
}
}
sub timeout {
my $self = shift;
my $timeout = shift;
my $poll_interval = shift;
my $test_condition = shift;
until ($test_condition->() || $timeout <= 0) {
$timeout -= $poll_interval;
sleep $poll_interval;
}
return $timeout > 0; # condition was met before timeout
}
I know that I could change the "get_flag_end" routine to take the value as an argument to the subroutine but what if there was a bunch of stuff done in "get_flag_end" and I needed more members from the object. I simplified the code a bit to make it a little easier to follow.
Just make a closure and pass that in:
sub wait_for_end {
my $self = shift;
my $timeout = shift;
my $poll_interval = shift;
my $callback = sub { $self->get_flag_end() };
unless ( $self->timeout( $timeout, $poll_interval, $callback ) ) {
die "!!!ERROR!!! Timed out while waiting for wait_for_end: timeout=$timeout, poll_interval=$poll_interval \n";
}
}
Update:
The other option is, since timeout is a method of the same class, pass in a method name.
sub wait_for_end {
my $self = shift;
my $timeout = shift;
my $poll_interval = shift;
unless ( $self->timeout( $timeout, $poll_interval, 'get_flag_end' ) ) {
die "!!!ERROR!!! Timed out while waiting for wait_for_end: timeout=$timeout, poll_interval=$poll_interval \n";
}
}
sub timeout {
my $self = shift;
my $timeout = shift;
my $poll_interval = shift;
my $method = shift;
# Do whatever
# Now call your method.
$self->$method();
}
In your $test_condition->() line, you are calling the subroutine but not passing it any arguments. Chances are what you meant was $test_condition->($self), or perhaps $self->$test_condition.
Here is a refactor of your code, correcting a few other issues:
sub get_flag_end {
my $self = shift;
return -e $self->{file}; # no need to quote the variable
}
sub wait_for_end {
my ($self, $timeout, $poll_interval) = @_; # unpack many args at once
unless ( $self->timeout( $timeout, $poll_interval, $self->can('get_flag_end') ) ) {
die "!!!ERROR!!! Timed out while waiting for wait_for_end: timeout=$timeout, poll_interval=$poll_interval \n";
}
}
sub timeout {
my ($self, $timeout, $poll_interval, $test_condition) = @_;
until ($self->$test_condition || $timeout <= 0) {
$timeout -= $poll_interval;
sleep $poll_interval;
}
return $timeout > 0; # condition was met before timeout
}
Depending on the rest of your implementation, creating a subroutine that knows its invocant might be better. You can do this in Perl with a closure:
unless ( $self->timeout( $timeout, $poll_interval, sub {$self->get_flag_end} )){
Here a new subroutine is created, which remembers the value of $self. You would call it without arguments $test_condition->()
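The difference between the three ways of passing the callback can be checked with a small stand-alone script. This is a sketch for illustration only: the class name FlagWatcher, its constructor, and the returned strings are made up, and get_flag_end here just reports whether it received an invocant.

```perl
use strict;
use warnings;

package FlagWatcher;    # hypothetical class name, for illustration only

sub new {
    my ($class, %args) = @_;
    return bless { file => $args{file} }, $class;
}

# Reports whether the method was handed an object.
sub get_flag_end {
    my $self = shift;
    return defined $self ? "self has file '$self->{file}'" : "no self";
}

package main;

my $obj = FlagWatcher->new(file => '/tmp/flag');

my $bare    = \&FlagWatcher::get_flag_end;    # plain code ref, no object attached
my $closure = sub { $obj->get_flag_end };     # closes over $obj
my $method  = $obj->can('get_flag_end');      # code ref that still needs an invocant

print $bare->(), "\n";          # prints: no self
print $closure->(), "\n";       # prints: self has file '/tmp/flag'
print $obj->$method(), "\n";    # prints: self has file '/tmp/flag'
```

The bare reference is what the question's \&get_flag_end produced, which is why the object's members appeared empty inside the callback.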

How can I implement a timeout for a qx(command)?

How could I add a timeout to this piece of code? If the "hwinfo --usb" command has not returned anything after a certain amount of time, I want to stop the command and return or die from the sub _usb_device.
#!/usr/bin/env perl
use warnings;
use strict;
sub _usb_device {
my @array;
{
local $/ = "";
@array = qx( hwinfo --usb );
}
...
...
}
Timeouts are usually done with alarms.
sub _usb_device
{
# Scope array
my @array;
# Try shell command
eval
{
local $SIG{ALRM} = sub { die "timeout\n" };
local $/ = "";
alarm 10;
@array = qx( hwinfo --usb );
alarm 0;
};
# Catch and rethrow non-timeout errors
die $@ if $@ && $@ ne "timeout\n";
# Done
return @array;
}
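One caveat with the alarm approach: the die leaves qx, but the external command itself may keep running, since nothing signals it. A possible variation, sketched here under the assumption of a Unix-like system, opens the command as a pipe so the parent knows the child's PID and can kill it on timeout. The helper name run_with_timeout is made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: run @cmd and return its output lines, or die with
# "timeout\n" after $seconds, killing the child first.
# Pass the command as a list of two or more elements to bypass the shell.
sub run_with_timeout {
    my ($seconds, @cmd) = @_;
    my $pid = open(my $out, '-|', @cmd) or die "cannot run $cmd[0]: $!";
    my @lines;
    my $ok = eval {
        local $SIG{ALRM} = sub { die "timeout\n" };
        alarm $seconds;
        @lines = <$out>;    # blocks until the command closes its stdout
        alarm 0;
        1;
    };
    my $err = $@;
    alarm 0;                # make sure no alarm is left pending
    if (!$ok) {
        kill 'KILL', $pid;  # stop the child so it does not linger
        close $out;         # reaps the killed child
        die $err;
    }
    close $out;
    return @lines;
}

print run_with_timeout(5, 'echo', 'hello');
```

For the question's case the call would look like run_with_timeout(10, 'hwinfo', '--usb'), wrapped in an eval if the caller wants to return instead of dying.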

How can I timeout a forked process that might hang?

I am writing a Perl script that will write some inputs and send those inputs to an external program. There is a small but non-zero chance that this program will hang, and I want to time it out:
my $pid = fork;
if ($pid > 0){
eval{
local $SIG{ALRM} = sub { die "TIMEOUT!"};
alarm $num_secs_to_timeout;
waitpid($pid, 0);
alarm 0;
};
}
elsif ($pid == 0){
exec('echo blahblah | program_of_interest');
exit(0);
}
As it stands now, after $num_secs_to_timeout, program_of_interest still persists. I tried to kill it in the anonymous subroutine for $SIG{ALRM} as follows:
local $SIG{ALRM} = sub{kill 9, $pid; die "TIMEOUT!"}
but this doesn't do anything. program_of_interest is still persisting. How do I go about killing this process?
I was able to kill my exec()ed process by killing its process group, as shown in the answer to the question "In perl, killing child and its children when child was created using open". I modified my code as follows:
my $pid = fork;
if ($pid > 0){
eval{
local $SIG{ALRM} = sub {kill 9, -$PID; die "TIMEOUT!"};
alarm $num_secs_to_timeout;
waitpid($pid, 0);
alarm 0;
};
}
elsif ($pid == 0){
setpgrp(0,0);
exec('echo blahblah | program_of_interest');
exit(0);
}
After timeout, program_of_interest is successfully killed.
The above code (by strictlyrude27) didn't work out of the box, because -$PID is spelt in capitals.
(BTW: there's also: http://www.gnu.org/software/coreutils/manual/html_node/timeout-invocation.html)
Here's an example with test:
#!/usr/bin/perl
use strict;
use warnings;
use File::Basename;
my $prg = basename $0;
my $num_secs_sleep = 2;
my $num_secs_to_timeout = 1;
my $orig_program = "sleep $num_secs_sleep; echo \"Look ma, survived!\"";
my $program = $orig_program;
my $expect = "";
if (@ARGV){
if($ARGV[0] eq "test"){
test();
exit 0;
} elsif (@ARGV == 1) {
$num_secs_to_timeout = $ARGV[0];
} elsif (@ARGV == 2) {
$program = $ARGV[0];
$num_secs_to_timeout = $ARGV[1];
} else {
die "Usage: $prg [ \"test\" | [program] seconds ] "
}
}
if($orig_program eq $program) {
if(@ARGV < 2) {
$expect = $num_secs_to_timeout > $num_secs_sleep ?
"(we expected to survive.)" : "(we expected to TIME OUT!)";
}
print STDERR "sleeping: $num_secs_sleep seconds$/";
}
print STDERR <<END;
timeout after: $num_secs_to_timeout seconds,
running program: '$program'
END
if($orig_program eq $program) {
print STDERR "$expect$/";
}
exit Timed::timed($program, $num_secs_to_timeout);
sub test {
eval "use Test::More qw(no_plan);";
my $stdout;
close STDOUT;
open STDOUT, '>', \$stdout or die "Can't open STDOUT: $!";
Timed::timed("sleep 1", 3);
is($stdout, undef);
Timed::timed("sleep 2", 1);
is($stdout, "TIME OUT!$/");
}
################################################################################
package Timed;
use strict;
use warnings;
sub timed {
my $retval;
my ($program, $num_secs_to_timeout) = @_;
my $pid = fork;
if ($pid > 0){ # parent process
eval{
local $SIG{ALRM} =
sub {kill 9, -$pid; print STDOUT "TIME OUT!$/"; $retval = 124;};
alarm $num_secs_to_timeout;
waitpid($pid, 0);
alarm 0;
};
return defined($retval) ? $retval : $?>>8;
}
elsif ($pid == 0){ # child process
setpgrp(0,0);
exec($program);
} else { # forking not successful
}
}
Hmmm your code works for me, after some minor modifications - which I assume are changes made by yourself to make the code into a generic example.
So that leaves me with two ideas:
You removed the problem when you created the sample code - try creating a small sample that actually runs (I had to change 'program_of_interest' and $num_secs_to_timeout to real values to test it). Make sure the sample has the same problem.
It's something to do with the program_of_interest you're running. As far as I know, you can't mask a kill 9, but maybe there's something going on. Have you tried testing your code with a really simple script? I created one for my testing that goes while (1) { print "hi\n"; sleep 1; }
Something else.
Good luck...
The only way SIGKILL can be ignored is if the process is stuck in an uninterruptible system call. Check the state of the hung process (with ps aux); if the state is D, the process can't be killed.
You might also want to check that the function is being called by outputting something from it.