Why does Perl crash when using LibXML after a fork?

I'm having a hard time with this problem, but I've narrowed it down to using XML::LibXML after I've done a fork in Perl. I'm running Strawberry Perl 5.12.0 on Windows XP, and XML::LibXML 1.70.
I have a script where I need to run several processes concurrently, take the XML output, and process the results. If I run this on a Solaris machine, it runs just fine and I have no issues. However, if I run it on Windows, I get a crash window stating that 'perl.exe has encountered a problem and needs to close.'
Here is a sample program that will generate the error:
use strict;
use warnings;
use Carp;
use XML::LibXML;
use Try::Tiny;
my $cmds = ['cmd1', 'cmd2'];
my @pids = ();
foreach my $cmd (@{$cmds}) {
    my $pid = fork();
    if ($pid) {
        # parent
        push(@pids, $pid);
    } elsif ($pid == 0) {
        XML::LibXML->load_xml(string => '<root />'); # <-- this will crash it
        exit 0;
    }
}
foreach my $ch_pid (@pids) {
    try {
        waitpid($ch_pid, 0);
    } catch {
        carp("Error on waitpid: $!");
    };
}
exit 0;
If I only have one process, or if I don't fork, then it will work successfully. If I remove the load_xml call (and have nothing in the child), then it will work successfully.
Does anyone know what may be causing this and how to fix it?

Special considerations need to be made when using XML::LibXML with threads. Some of these (particularly the ones about initially loading the module) also apply to forks.
You can try removing the compile-time load (the use XML::LibXML; line) and instead loading the module only once you have forked:
} elsif ($pid == 0) {
    require XML::LibXML;
    XML::LibXML->load_xml(string => '<root />');
    exit 0;
}
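For completeness, here is a minimal sketch of the whole loop with the deferred load in place (same placeholder commands as above); the only changes from the original script are that the use XML::LibXML; line is gone, the require happens inside the child, and a failed fork is reported. Whether this is enough to avoid the crash on a given Strawberry Perl build is worth verifying with a single child first.
use strict;
use warnings;
use Carp;
use Try::Tiny;

my $cmds = ['cmd1', 'cmd2'];
my @pids = ();
foreach my $cmd (@{$cmds}) {
    my $pid = fork();
    if ($pid) {
        push(@pids, $pid);                    # parent: remember the child
    } elsif (defined $pid && $pid == 0) {
        require XML::LibXML;                  # load only after the fork
        XML::LibXML->load_xml(string => '<root />');
        exit 0;
    } else {
        carp("fork failed: $!");
    }
}
foreach my $ch_pid (@pids) {
    try { waitpid($ch_pid, 0) } catch { carp("Error on waitpid: $!") };
}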

Related

Understanding how Perl fork works

What would be the right way to fork processes so that each of them runs a different subroutine (sub1, sub2, ..., subN)? After reading a lot of previous threads and material, I feel that I understand the logic, but I'm a bit confused about how to write it in the cleanest way possible (readability is important to me).
Consider 4 subs, each of which gets different arguments. It feels like the most efficient way would be to create 4 forks, each of which runs a different sub. The code would look something like this:
my $forks = 0;
foreach my $i (1..4) {
    if ($i == 1) {
        my $pid = fork();
        if ($pid == 0) {
            $forks++;
            run1();
            exit;
        }
    } elsif ($i == 2) {
        my $pid = fork();
        if ($pid == 0) {
            $forks++;
            run2();
            exit;
        }
    } elsif ($i == 3) {
        my $pid = fork();
        if ($pid == 0) {
            $forks++;
            run3();
            exit;
        }
    } elsif ($i == 4) {
        my $pid = fork();
        if ($pid == 0) {
            $forks++;
            run4();
            exit;
        }
    }
}
for (1 .. $forks) {
    my $pid = wait();
    print "Parent saw $pid exiting\n";
}
print "done\n";
Some points:
This will work only if all of the forks were successful. But I would like to run the subs even if a fork failed (even though then it will not be parallel). In that case, I guess we need to take the subs out of the if and exit only if $pid was 0. Something like:
my $pid = fork();
run1();
$forks++ if ($pid == 0);
exit if ($pid == 0);
But it still doesn't feel right.
Is using exit the right way to end the child process? If the processes were ended with exit, should I still use wait? Will that prevent zombies?
Maybe the most interesting question: what do I do if there are 15 function calls? I would like to somehow create 15 forks, but I can't write 15 if-else branches - the code would not be readable that way. At first I thought it would be possible to put those function calls into an array (somehow) and loop over that array, but after some research I didn't find a way to do that.
If possible, I prefer not to use any additional modules like Parallel::ForkManager.
Is there a clean and simple way to solve it?
There are a few questions to clear up here.
A basic example
use warnings;
use strict;
use feature 'say';
my @coderefs;
for my $i (1..4) {
    push @coderefs, sub {
        my @args = @_;
        say "Sub #$i with args: @args";
    };
}
my @procs;
for my $i (0 .. $#coderefs) {
    my $pid = fork // do {
        warn "Can't fork: $!";
        # retry, or record which subs failed so to run later
        next;
    };
    if ($pid == 0) {
        $coderefs[$i]->("In $$: $i");
        exit;
    }
    push @procs, $pid;
    #sleep 1;
}
say "Started: @procs";
for my $pid (@procs) {
    my $goner = waitpid $pid, 0;
    say "$goner exited with $?";
}
We generate anonymous subroutines and store those code references in an array, then go through that array and start that many processes, running a sub in each. After that the parent waitpids on these in the order in which they were started, but normally you'll want to reap as they exit; see docs listed below.
A child process always exits, or you'd have multiple processes executing all of the rest of the code in the program. Once a child process exits the kernel will notify the parent, and the parent can "pick up" that notification ("reap" the exit status of the child process) via wait/waitpid, or use a signal handler to handle/ignore it.
If the parent never does this after the child has exited, the OS keeps the information about the (exited) child process in the process table; that's a zombie. So you do need to wait, so that the OS is done with the child process (and you can check up on how it went). Or, you can indicate in a signal handler that you don't care about the child's exit.† Modern systems reap would-be zombies eventually, but not always and you cannot rely on that; clean up after yourself.
Note that you'll need to be reading perlipc, fork, wait and waitpid, perlvar ... and yet other resources that will come up while working on all this. It will take a little playing and some trial and error. Once you get it all down you may want to start using modules, at least for some types of tasks.
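For example, to reap children as they finish instead of waiting in start order, the parent can poll with a non-blocking waitpid; a rough sketch, reusing the @procs array (and the say feature) from the example above, with WNOHANG imported from POSIX:
use POSIX ':sys_wait_h';    # for WNOHANG

my $remaining = scalar @procs;
while ($remaining > 0) {
    while ((my $pid = waitpid(-1, WNOHANG)) > 0) {
        say "$pid exited with status ", $? >> 8;
        $remaining--;
    }
    # do other useful work here, or nap briefly before polling again
    select(undef, undef, undef, 0.1);
}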
† To ignore SIGCHLD, so that exited children do not have to be waited for:
$SIG{CHLD} = 'IGNORE';
Or, you can run code in the handler (but it is well advised to keep it minimal):
$SIG{CHLD} = sub { ... };
These signal "dispositions" are inherited by fork-ed processes (but not across execve).
See the docs listed above, and the basics of the %SIG variable in perlvar. Also see man(7) signal. All this is generally *nix business.
This is a global variable, affecting all code in the interpreter. In order to limit the change to the nearest scope, use local:
local $SIG{CHLD} = ...
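Putting those pieces together, a common minimal handler is just a reap loop, scoped with local so the change does not leak out of the enclosing block; a sketch:
use POSIX ':sys_wait_h';

{
    local $SIG{CHLD} = sub {
        # reap every child that has finished, without blocking
        1 while waitpid(-1, WNOHANG) > 0;
    };
    # fork and manage children inside this block ...
}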

Signalling problems when combining Net::OpenSSH and threads

I wrote a quite large program that executes commands on many remote hosts, but I ran into a serious problem, and I don't know how to fix it.
After a lot of trying around, I was able to extract the minimum code to reproduce the problem reliably on my machine:
use warnings;
use strict;
use threads;
use threads::shared;
use Data::Dumper;
use POSIX ":sys_wait_h";
use Net::OpenSSH;
use Time::HiRes qw( usleep );
my @LIST = qw(host038b host039a host039b host040a host040b host041a host041b host043a
host043b host044a host044b host045a host045b host046a host046b host047a host047b host049a
host049b host050a host050b host054a host054b host055a host055b host056a host056b host057a
host057b host058a host059a host059b host060a host060b host062a host062b host063a host068a
host068b host069a host069b host071a host071b host072a host073a host073b host075a host075b
host078a host078b host082a host082b host087a host087b host089a host089b host090a host090b
host091a host091b host092a host092b host096a host096b host097a host097b host098a host099a
host099b host100a);
my ($SSH, $CPID, %PIDS, @DONE);
sub _testthread {
    # Read stdout pipe
    my $SCROUT = shift;
    while (<$SCROUT>) {
        print $_;    # I normally write that to a logfile
    }
    return (0);
}
foreach (@LIST) {
    $SSH->{$_} = Net::OpenSSH->new($_, async => 1,
        master_opts => [ -o => "PasswordAuthentication=no" ]);
}
$SIG{CHLD} = sub {
    my $WPID;
    push(@DONE, { 'PID' => $WPID, 'RC' => $?, 'ERR' => $! })
        while (($WPID = waitpid(-1, WNOHANG)) > 0);
};
foreach (@LIST) {
    my ($SCRFH, $SCROUT, undef, $CPID) = $SSH->{$_}->open_ex({stdin_pipe => 1,
        stdout_pipe => 1}, '/bin/bash -s');
    $PIDS{$CPID} = 'ACTIVE';
    threads->new('_testthread', $SCROUT);
    print $SCRFH "sleep 2\n";
    print $SCRFH "echo test `hostname`\n";
    print $SCRFH "exit 0\n";
    close $SCRFH;
    usleep 10000;
}
while (grep(/^ACTIVE/, values(%PIDS)) > 0) {
    print Dumper \%PIDS;
    while (@DONE) {
        my $DONE = shift(@DONE);
        $PIDS{$DONE->{PID}} = 'DONE';
    }
    sleep 1;
}
$_->join foreach (threads->list);
With the preinstalled perl 5.10, this segfaults most of the time, even after taking out some of the more complicated constructs for redirecting the open_ex output to a file descriptor.
With a newly compiled perl 5.18.2, this script hangs indefinitely most of the time, because it does not seem to receive every SIG{CHLD}, even though I am using safe signalling (as far as I understand).
To reproduce the problem, the following things seem to be necessary:
A sufficient number of hosts in @LIST
letting open_ex (or derived methods of Net::OpenSSH) fork
giving the STDOUT file handle of that fork to a thread
using a signal handler for SIG{CHLD}
As my bigger program which uses this structure is mostly unusable, I would be very happy if somebody could help me find a solution, maybe an alternative.
Thanks, and greetings,
Mazze
You are trying to mix signals and threads, and that's always going to be a bad idea. The solution is simply to stop using threads; whatever you're doing can be done more cleanly in a different way.
Perhaps consider some kind of asynchronous/event-driven IO system for performing these concurrent IO tasks.
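As a hedged sketch of that idea for this particular case (no threads, no SIGCHLD handler): collect the stdout pipes that open_ex returns and multiplex them with the core IO::Select module, reaping the ssh children at the end. The host list and the remote command are placeholders, and error handling is omitted:
use strict;
use warnings;
use IO::Select;
use Net::OpenSSH;

my @hosts = qw(host038b host039a);    # placeholder host list
my $sel = IO::Select->new;
my (%ssh, %host_of, @pids);

for my $host (@hosts) {
    $ssh{$host} = Net::OpenSSH->new($host,
        master_opts => [ -o => "PasswordAuthentication=no" ]);
    my (undef, $out, undef, $pid) =
        $ssh{$host}->open_ex({stdout_pipe => 1}, 'echo test `hostname`');
    $host_of{fileno $out} = $host;
    $sel->add($out);
    push @pids, $pid;
}

while ($sel->count) {
    for my $fh ($sel->can_read) {
        if (defined(my $line = <$fh>)) {
            print "$host_of{fileno $fh}: $line";    # normally written to a logfile
        } else {
            $sel->remove($fh);                      # EOF from this host
            close $fh;
        }
    }
}
waitpid $_, 0 for @pids;    # reap the ssh child processes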

Perl (Tk): how to run a system command asynchronously and react to its output?

I'm writing a wrapper to an external command ("sox", if this can help) with Perl "Tk".
I need to run it asynchronously, of course, to avoid blocking Tk's MainLoop().
But I need to read its output to notify the user about the command's progress.
I am testing a solution like this one, using IPC::Open3:
{
    $| = 1;
    $pid = open3(gensym, ">&STDERR", \*FH, $cmd) or error("Error running command \"$cmd\"");
}
while (defined($ch = FH->getc)) {
    notifyUser($ch) if ($ch =~ /$re/);
}
waitpid $pid, 0;
$retval = $? >> 8;
POSIX::close($_) for 3 .. 1024; # close all open handles (arbitrary upper bound)
But of course the while loop blocks MainLoop until $cmd terminates.
Is there some way to read the output handle asynchronously?
Or should I go with standard fork stuff?
The solution should work under win32, too.
For non-blocking read of a filehandle, take a look at Tk::fileevent.
Here's an example script how one can use a pipe, a forked process, and fileevent together:
use strict;
use IO::Pipe;
use Tk;

my $pipe = IO::Pipe->new;
if (!fork) { # Child XXX check for failed forks missing
    $pipe->writer;
    $pipe->autoflush(1);
    for (1..10) {
        print $pipe "something $_\n";
        select undef, undef, undef, 0.2;
    }
    exit;
}
$pipe->reader;

my $mw = tkinit;
my $text;
$mw->Label(-textvariable => \$text)->pack;
$mw->Button(-text => "Button", -command => sub { warn "Still working!" })->pack;
$mw->fileevent($pipe, 'readable', sub {
    if ($pipe->eof) {
        warn "EOF reached, closing pipe...";
        $mw->fileevent($pipe, 'readable', '');
        return;
    }
    warn "pipe is readable...\n";
    chomp(my $line = <$pipe>);
    $text = $line;
});
MainLoop;
Forking may or may not work under Windows. Also one needs to be cautious when forking within Tk; you must make sure that only one of the two processes is doing X11/GUI stuff, otherwise bad things will happen (X11 errors, crashes...). A good approach is to fork before creating the Tk MainWindow.

Open3 outputting to std error only and not asynchronously

I am relatively new to perl programming and I am trying to figure out how open3 works. Here is the code.
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Open3;

my $dir = "/home/vman/Documents/Dev/perl_scripts/Provenance/temp";
my $fileHandle;
my $bashPid;

print "Starting main program\n";

my $pid = fork();

if ($pid) # Parent process
{
    print("Start transfer.\n");
    $bashPid = $pid;
    # Attaching an strace to the executed command which happens in the child process
    open3(\*WRITE, \*READ, \*ERROR, "strace", "-f", "-F", "-e", "trace=open,execve", "-p", $bashPid, "-s", "2097152", "-q");
    while (<READ>)
    {
        print("Here1\n");
        print("$_");
    }
    while (<ERROR>)
    {
        print("$_");
    }
    print("Finish transfer.\n");
}
elsif ($pid == 0)
{
    if (scalar(@ARGV == 0))
    {
        exit
    }
    my $args = join(' ', @ARGV);
    exec($args);
}
else
{
    die("Could not fork.");
}

close(READ);
close(WRITE);
close(ERROR);

waitpid($bashPid, 0);
print "End of main program\n";
I want to run an strace on a bash process, then capture all the output while it is being produced. Then I will take that output, parse it to see which files are being changed by which process, and save those changes in a MySQL database. For now, all I am trying to do is attach an strace to an existing bash process and get the output of that strace printed in the running bash terminal, just to make sure that the output is being read asynchronously.
One of the problems is that I am getting the output through the ERROR filehandle. I am a little confused as to why this is happening. Am I using the correct argument order for open3, and if there is an error, why is the correct output even making it to stderr?
The second problem is that I am getting the output only when exec ends, which is no good since it needs to happen while exec is running. I thought open3 ran asynchronously.
As suggested, this is what I did, and it works perfectly.
#!/usr/bin/perl
use strict;
use warnings;
use IPC::Run3;

my $bashPid;

print "Starting main program\n";

my $pid = fork();

if ($pid) # Parent process
{
    print("Start transfer.\n");
    $bashPid = $pid;
    # Attaching an strace to the executed command which happens in the child process
    my $command = "strace -fFe trace=open,execve -p $bashPid -s 2097152 -q";
    run3($command, \*STDIN, \*STDOUT, \*STDERR);
    if ($?)
    {
        die "something went horribly wrong";
    }
    while (<STDERR>)
    {
        print($_);
    }
    print("Finish transfer.\n");
}
elsif ($pid == 0) # Child process
{
    if (scalar(@ARGV == 0))
    {
        exit
    }
    my $args = join(' ', @ARGV);
    exec($args);
}
else
{
    die("Could not fork.");
}

close(STDIN);
close(STDOUT);
close(STDERR);

waitpid($bashPid, 0);
print "End of main program\n";
One of the problems is that I am getting the output through the ERROR filehandle
Correct. strace writes to STDERR.
The second problem I have is that I am getting the output only when exec ends which is no good since it needs to be done while exec is running. I thought open3 runs asynchronously.
That's because you only start reading from the child's STDERR after the child closes its STDOUT when it ends.
In fact, you're lucky you haven't deadlocked yet. By reading one stream at a time as you are currently doing, you'll deadlock as soon as strace has produced enough output to fill the pipe.
You need to read from both the child's STDOUT and STDERR as the output comes in. You could do this with the help of select, polling non-blocking handles, or threads. None of those options is as simple as ditching open3 and using a higher-level module that handles this for you. The simpler IPC::Run3 and the fully featured IPC::Run are good choices.
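For instance, with IPC::Run3 the child's STDOUT and STDERR can each be captured into its own scalar, which sidesteps the ordering problem entirely; a small hedged sketch with a placeholder command (strace writes its trace to STDERR, so the trace ends up in $err):
use strict;
use warnings;
use IPC::Run3;

# placeholder command line; adjust to the process you actually want to trace
my @cmd = ('strace', '-f', '-e', 'trace=open,execve', 'ls');

my ($out, $err);
run3(\@cmd, \undef, \$out, \$err);    # stdin from the null device, capture both streams
die "command failed: $?" if $?;

print "stdout was:\n$out"             if defined $out && length $out;
print "stderr (the trace) was:\n$err" if defined $err && length $err;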

How do I fork correctly in a perl module for znc?

I'm currently writing an IRC bot. The scripts are loaded as Perl modules in ZNC, but the bot gets disconnected with an Input/Output error if I create a forked process. This is a working example script without fork, but it causes the bot to freeze until the script finishes its task.
package imdb;
use warnings;
use strict;

sub new
{
    my ($class) = @_;
    my $self = {};
    bless( $self, $class );
    return( $self );
}

sub OnChanMsg
{
    my ($self, $nick, $channel, $text) = @_;
    #unless (my $pid = fork()) {
    my $result = a_slow_process($text);
    ZNC::PutIRC( "PRIVMSG $channel :$result" );
    # exit;
    #}
    return( ZNC::CONTINUE );
}

sub OnShutdown
{
    my ( $me ) = @_;
}

sub a_slow_process {
    my $input = shift;
    sleep 10;
    return "You said $input.";
}

1;
The fork code that is causing the error is commented out. How do I fix this?
Edited to add: I was told that ZNC::PutIRC should not be put in the child process.
A fork() call has effects on open file and socket handles, including:
File descriptors (and sometimes locks on those descriptors) are shared, while everything else is copied.
...
Beginning with v5.6.0, Perl will attempt to flush all files opened for output before forking the child process, but this may not be supported on some platforms (see perlport). To be safe, you may need to set $| ($AUTOFLUSH in English) or call the "autoflush()" method of "IO::Handle" on any open handles in order to avoid duplicate output.
and in general it is not a good idea to set up a socket connection in one process and try to read/write on that connection in a child process.
A workaround might be to make a new ZNC connection in the child process (after a_slow_process() is done), write your private message, and then close the new connection.
If you're not averse to rewriting your module in C++, ZNC has a CExecSock which wraps popen2() and should do what you need. You can look at the shell.cpp module for example usage.