process hangs when writing large data to pipe - perl

I have a problem with hung processes with my Perl program, and I think I have isolated it to whenever I write significant amounts of data to a pipe.
Below is all of the code that I think is relevant from my program. When the program hangs, it hangs on the line in ResponseConstructor.pm: print { $self->{writer} } $data;.
I've tested with different data sizes, and it doesn't appear to hang at an exact size. It may become more likely with size, though: sizes around 32KB sometimes work, sometimes don't. Every time I've tried a 110KB string it has failed.
I believe I've also ruled out the contents of the data as a cause, because the same data sometimes causes a hang and other times doesn't.
This is the first time I've used pipes in a program, so I'm not sure what to try next. Any ideas?
use POSIX ":sys_wait_h";

STDOUT->autoflush(1);

pipe(my $pipe_reader, my $pipe_writer);
$pipe_writer->autoflush(1);

my $pid = fork;
if ($pid) {
    # I am the parent
    close $pipe_writer;

    while (waitpid(-1, WNOHANG) <= 0) {
        # do some stuff while waiting for child to send data on pipe
    }

    # process the data it got
    open(my $fh, '>', "myoutfile.txt");
    while (my $line = <$pipe_reader>) {
        print $fh $line;
    }
    close $pipe_reader;
    close $fh;
}
else {
    # I am the child
    die "cannot fork: $!" unless defined $pid;
    close $pipe_reader;

    my $response = ResponseConstructor->new($pipe_writer);
    if ([a condition where we want to return small data]) {
        $response->respond('small data');
        exit;
    }
    elsif ([a condition where we want to return big data]) {
        $response->respond('imagine this is a really big string');
    }
}
ResponseConstructor.pm:
package ResponseConstructor;

use strict;
use warnings;

sub new {
    my $class  = shift;
    my $writer = shift;

    my $self = {
        writer => $writer
    };
    bless($self, $class);
    return($self);
}

# Writes the response then closes the writer (pipe)
sub respond {
    my $self = shift;
    my $data = shift;

    print { $self->{writer} } $data;
    close $self->{writer};
}

1;

You probably shouldn't ignore the pipe while the child is writing data to it: you can use select on the pipe (instead of waitpid) to see whether there is data to read during your waiting loop. If you really do want a larger pipe buffer so you can read everything at once after the child exits, you can use a socketpair instead of a pipe, and then use setsockopt to make the buffer as large as you need.
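For illustration, here is a rough sketch of the select-based approach (my code, not tested against the original program; it reuses the question's $pipe_reader handle and waitpid loop):

use IO::Select;
use POSIX ":sys_wait_h";

my $sel  = IO::Select->new($pipe_reader);
my $data = '';
while (waitpid(-1, WNOHANG) <= 0) {
    # ... do some other work while waiting ...
    # then drain whatever the child has written so far, without blocking
    while ($sel->can_read(0)) {                        # zero timeout: poll only
        my $n = sysread($pipe_reader, $data, 64*1024, length $data);
        last unless $n;                                # 0 = EOF, undef = error
    }
}
# the child has exited; read anything still left in the pipe
1 while sysread($pipe_reader, $data, 64*1024, length $data);
close $pipe_reader;

And the socketpair alternative might look roughly like this (the 256KB buffer size is an arbitrary choice for the example):

use Socket qw(AF_UNIX SOCK_STREAM PF_UNSPEC SOL_SOCKET SO_SNDBUF SO_RCVBUF);

socketpair(my $pipe_reader, my $pipe_writer, AF_UNIX, SOCK_STREAM, PF_UNSPEC)
    or die "socketpair: $!";
setsockopt($pipe_writer, SOL_SOCKET, SO_SNDBUF, 256*1024) or die "setsockopt: $!";
setsockopt($pipe_reader, SOL_SOCKET, SO_RCVBUF, 256*1024) or die "setsockopt: $!";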

Related

How to get a user-configurable buffer for printing?

I'd like to have a print function supporting a user-configurable buffer, so that what's in the buffer is printed only when the buffer exceeds a threshold.
I need to write multiple files, so I have multiple filehandles to write to, and for this an object-oriented module might be handier.
I imagine something like this:
my $printer1 = Print::Buffer->new({ size => 1000, filehandle => \$OUT1 });

for (my $i=1; $i<1000; $i++) {
    $printer1->print("This string will be eventually printed ($i/1000)");
}

# and at the end print the remaining buffer
$printer1->flush();
Any recommendations? I'm probably not using the right keywords, since searching for print/buffer didn't turn up clear matches on CPAN.
UPDATE:
Thanks everyone for the very useful comments. As some of you pointed out, the problem is more complex than I initially thought, and probably a bad idea. (This question arose because I was printing very large files [>100Gb] with a print statement at each loop iteration, and noticed that printing only every hundredth iteration gave a speedup, but that could depend on how the loop was changed...)
UPDATE 2:
I need/want to accept an answer. Both have been instructive and both are useful. I tested both, and each needs further work before I can benchmark the improvement (if any, see the update above). The tied filehandle is a lesser-known feature that I loved, which is why I accepted that answer. They were both equally close to the desired answer in my opinion. Thank you all very much for the discussion and the insights.
I'd like to have a print function supporting a user-configurable buffer, [...]
I imagine something like this: [...]
It's not hard to write something like that. Here's a basic sketch.
File PrintBuffer.pm
package PrintBuffer;

use warnings;
use strict;
use IO::Handle;  # for autoflush on a plain filehandle/glob

sub new {
    my ($class, %args) = @_;

    my $self = {
        _size => $args{size} // 64*1024,
        _fh   => $args{filehandle} // *STDOUT,
        _buf  => ''
    };
    $self->{_fh}->autoflush;  # want it out once it's printed

    bless $self, $class;
}
sub print {
    my ($self, $string) = @_;

    $self->{_buf} .= $string;

    if ( length($self->{_buf}) > $self->{_size} ) {
        print { $self->{_fh} } $self->{_buf};
        $self->{_buf} = '';
    }

    return $self;
}
sub DESTROY {
    my $self = shift;

    print { $self->{_fh} } $self->{_buf} if $self->{_buf} ne '';
    $self->{_buf} = '';
}

1;
There's a bit more to do here, and a whole lot that can be added, and since it relies only on basic tools one can add/change as desired.† For one, I can imagine a size method to manipulate the buffer size of an existing object (print if there's already more data than the new size), and flush.
Note that the DESTROY method ensures the buffer gets printed as the object goes out of scope and is destroyed, which seems a reasonable thing to do.
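For example, the size and flush methods mentioned above might look something like this (my sketch, not part of the original answer, using the same _buf/_size/_fh fields):

# flush pending data, and change the buffer size of an existing object
sub flush {
    my $self = shift;

    print { $self->{_fh} } $self->{_buf} if length $self->{_buf};
    $self->{_buf} = '';
    return $self;
}

sub size {
    my ($self, $new_size) = @_;

    if (defined $new_size) {
        $self->{_size} = $new_size;
        # print right away if there is already more data than the new size
        $self->flush if length($self->{_buf}) > $new_size;
    }
    return $self->{_size};
}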
A driver
use warnings;
use strict;
use feature 'say';
use PrintBuffer;
my $fout = shift // die "Usage: $0 out-file\n";
open my $fh, '>', $fout or die "Can't open $fout: $!";
my $obj_file = PrintBuffer->new(size => 100, filehandle => $fh);
my $obj_stdout = PrintBuffer->new(size => 100);
$obj_file->print('a little bit');
$obj_stdout->print('a little bit');
say "printed 'a little bit' ..."; sleep 10;
$obj_file->print('out'x30); # push it over a 100 chars
$obj_stdout->print('out'x30);
say "printed 'out'x30 ... "; sleep 10;
$obj_file->print('again...'); # check DESTROY
$obj_stdout->print('again');
say "printed 'again' (and we're done)";
Check the size of output file in another terminal after each informational print.
I tried PerlIO::buffersize, brought up by Grinnz in a comment, and it seems to work "as advertised", as they say. It doesn't allow you to do all you may wish but it may be a ready solution for basic needs. Note that this doesn't work with the :encoding layer in use.
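For reference, its basic use is along these lines (a minimal sketch; check the module's documentation for details):

# ask for a ~1MB write buffer on this handle via the :buffersize layer
open my $fh, '>:buffersize(1048576)', 'big_output.txt' or die "Can't open: $!";
print $fh "lots of data...\n" for 1 .. 1_000_000;
close $fh;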
Thanks to ikegami for comments and tests (linked in comments).
† The print works with an autoflush-ed handle. Still, the first change could be to use syswrite instead, which is unbuffered and attempts to directly write all that's asked of it, via one write(2) call. But since there's no guarantee that all of it got written, we also need to check:
use Carp;  # for croak

WRITE: {
    my $bytes_written = 0;

    while ( $bytes_written < length $self->{_buf} ) {
        my $rv = syswrite(
            $self->{_fh},
            $self->{_buf},
            length($self->{_buf}) - $bytes_written,
            $bytes_written
        );
        croak "Error writing: $!" if not defined $rv;
        $bytes_written += $rv;
    }

    $self->{_buf} = '';
};
I've put this in a block only to limit the scope of $bytes_written and of any other variables one may wish to introduce so as to reduce the number of dereferences of $self (but note that $self->{_buf} may be quite large, and copying it "to optimize" dereferencing may end up slower).
Naively we'd only need syswrite(FH, SCALAR) but if it happens that not all of SCALAR gets written then we need to continue writing from past what was written, thus the need to use the form with length-to-write and offset as well.
Since this is unbuffered it mustn't be mixed with buffered IO (or that need be done very carefully); see the docs. Also, :encoding layers can't be used with it. Consider these restrictions against other capabilities that may be wanted in this class.
I don't see a general solution on CPAN, either. But this is straightforward enough with tied filehandles. Something like
use Symbol;

sub Print::Buffer::new {
    my ($class, $mode, $file, @opts) = @_;

    open(my $fh, $mode, $file) or die "failed to open '$file': $!";

    my $x = Symbol::gensym;
    tie *$x, "Print::Buffer", fh => $fh, @opts;
    $x;
}
sub Print::Buffer::TIEHANDLE {
    my $pkg = shift;
    my $self = { @_ };
    $self->{bufsize} //= 16 * 1024 * 1024;
    $self->{buffer} = "";
    bless $self, $pkg;
}
sub Print::Buffer::PRINT {
    my ($self, @msg) = @_;
    $self->{buffer} .= join($,, @msg);
    $self->_FLUSH if length($self->{buffer}) > $self->{bufsize};
}
sub Print::Buffer::_FLUSH {
    my $self = shift;
    print {$self->{fh}} $self->{buffer};
    $self->{buffer} = "";
}

sub Print::Buffer::CLOSE {
    my $self = shift;
    $self->_FLUSH;
    close( $self->{fh} );
}

sub Print::Buffer::DESTROY {
    my $self = shift;
    $self->_FLUSH;
}
# ----------------------------------------
my $fh1 = Print::Buffer->new(">", "/tmp/file1",
                             bufsize => 16*1024*1024);
for (my $i=1; $i<1000; $i++) {
    print $fh1 "This string will be eventually printed ($i/1000)\n";
}

Reading from Perl pipe child process hanging on last element of handle array

When running the following code, the last server is not printed - the script 'hangs' after the second-last array element.
my %readers;
my $command = "pgrep -f weblogic.Name";

foreach my $server (@servers) {
    pipe($readers{$server}, WRITER);
    unless (my $pid = fork()) {
        my $response = qx(ssh -q oracle\@$server "$command");
        print WRITER $response;
        exit();
    }
}

foreach my $server (@servers) {
    my $fh = $readers{$server};
    my @procs = <$fh>;
    chomp(@procs);
    for my $proc (@procs) {
        printf "%s\t%s\n", substr($server,8), $proc;
    }
}
print "end\n";
The output is as follows:
$ ./get_stuck.pl
92 18196
93 27420
94 17635
95 10258
96 10831
There should be a server '97' output after '96', yet there is not, and the script just hangs/stops at that point.
If I change the reader section to use a string instead of an array as follows:
foreach my $server (@servers) {
    my $fh = $readers{$server};
    my $procs = <$fh>;
    printf "%s\n", $server;
}
...then the script prints all servers including '97'. However, if there are multiple results from the command, this will only print the first result (it seems to stop at the newline). In other words, if the command returns 3 process ids for a given server, only the first process id is printed.
Any suggestions on why using an array causes the script to hang on the last element? Or perhaps (less desirable) how I might use a string, but retrieve all results?
I haven't actually tried it, but:
This code looks like you're deadlocking yourself.
< > in list context slurps the whole file, i.e. it reads until it reaches end-of-file (EOF).
The file handle in question refers to a pipe.
The read end of a pipe reaches EOF when all open handles to the write end are closed.
You never close WRITER in the parent (the child implicitly closes it on exit).
Thus the parent is stuck reading from the pipe while holding the write end open.
The reason this only happens for the last array element is because you're using a bareword filehandle (WRITER), which is effectively a global variable. Re-opening the same handle implicitly closes it first; i.e. the (n+1)th iteration of the loop closes the nth pipe. Only the last WRITER is left open.
If I'm right, then the fix is:
foreach my $server (@servers) {
    pipe($readers{$server}, WRITER);
    unless (my $pid = fork()) {
        my $response = qx(ssh -q oracle\@$server "$command");
        print WRITER $response;
        exit();
    }
    close WRITER;  # always close WRITER in the parent
}
But I would also recommend changing the code to this:
foreach my $server (@servers) {
    pipe($readers{$server}, my $WRITER);
    defined(my $pid = fork()) or die "$0: fork: $!\n";
    unless ($pid) {
        my $response = qx(ssh -q oracle\@$server "$command");
        print $WRITER $response;
        exit();
    }
    close $WRITER;
}
I.e. check fork for errors and use a lexical variable instead of a bareword filehandle. In this case the close is actually optional because $WRITER is implicitly closed at the end of its scope (the current loop iteration) because there are no other references to it.
You could simplify it a bit more by using pipe open:
foreach my $server (@servers) {
    open $readers{$server}, '-|', 'ssh', '-q', "oracle\@$server", $command
        or die "$0: ssh: $!\n";
}
Finally,
my $fh = $readers{$server};
my @procs = <$fh>;
could be reduced to
my @procs = readline $readers{$server};
(I don't like the < > operator. In my opinion always writing either readline or glob explicitly makes it more readable.)

Perl (Tk): how to run a system command asynchronously, being able to react to its output?

I'm writing a wrapper around an external command ("sox", if that helps) with Perl Tk.
I need to run it asynchronously, of course, to avoid blocking Tk's MainLoop().
But I also need to read its output to notify the user about the command's progress.
I am testing a solution like this one, using IPC::Open3:
{
    $| = 1;
    $pid = open3(gensym, ">&STDERR", \*FH, $cmd) or error("Error running command \"$cmd\"");
}

while (defined($ch = FH->getc)) {
    notifyUser($ch) if ($ch =~ /$re/);
}

waitpid $pid, 0;
$retval = $? >> 8;

POSIX::close($_) for 3 .. 1024; # close all open handles (arbitrary upper bound)
But of course the while loop blocks MainLoop until $cmd terminates.
Is there some way to read the output handle asynchronously?
Or should I go with standard fork stuff?
The solution should work under win32, too.
For non-blocking read of a filehandle, take a look at Tk::fileevent.
Here's an example script how one can use a pipe, a forked process, and fileevent together:
use strict;
use IO::Pipe;
use Tk;

my $pipe = IO::Pipe->new;
if (!fork) { # Child XXX check for failed forks missing
    $pipe->writer;
    $pipe->autoflush(1);
    for (1..10) {
        print $pipe "something $_\n";
        select undef, undef, undef, 0.2;
    }
    exit;
}

$pipe->reader;

my $mw = tkinit;
my $text;
$mw->Label(-textvariable => \$text)->pack;
$mw->Button(-text => "Button", -command => sub { warn "Still working!" })->pack;

$mw->fileevent($pipe, 'readable', sub {
    if ($pipe->eof) {
        warn "EOF reached, closing pipe...";
        $mw->fileevent($pipe, 'readable', '');
        return;
    }
    warn "pipe is readable...\n";
    chomp(my $line = <$pipe>);
    $text = $line;
});

MainLoop;
Forking may or may not work under Windows. Also one needs to be cautious when forking within Tk; you must make sure that only one of the two processes is doing X11/GUI stuff, otherwise bad things will happen (X11 errors, crashes...). A good approach is to fork before creating the Tk MainWindow.

How do you tell if a pipe opened process has terminated?

Assuming a handle created with the following code:
use IO::File;
my $fh = IO::File->new;
my $pid = $fh->open('some_long_running_proc |') or die $!;
$fh->autoflush(1);
$fh->blocking(0);
and then read with a loop like this:
while (some_condition_here) {
    my @lines = $fh->getlines;
    ...
    sleep 1;
}
What do I put as some_condition_here that will return false if the process on the other end of the pipe has terminated?
Testing for $fh->eof will not work since the process could still be running without printing any new lines. Testing for $fh->opened doesn't seem to do anything useful.
Currently I am using $pid != waitpid($pid, WNOHANG), which seems to work in POSIX-compliant environments. Is this the best way? What about on Windows?
On using select,
use strict;
use warnings;

use IO::Select qw( );

sub process_msg {
    my ($client, $msg) = @_;
    chomp $msg;
    print "$client->{id} said '$msg'\n";
    return 1;  # Return false to close the handle.
}

my $select = IO::Select->new();
my %clients;

for (...) {
    my $fh = ...;
    $clients{fileno($fh)} = {
        id  => '...',
        buf => '',
        # ...
    };
    $select->add($fh);
}

while (my @ready = $select->can_read) {
    for my $fh (@ready) {
        my $client = $clients{ fileno($fh) };
        our $buf; local *buf = \( $client->{buf} );

        my $rv = sysread($fh, $buf, 64*1024, length($buf));
        if (!$rv) {
            if (defined($rv)) {
                print "[$client->{id} ended]\n";
            } else {
                print "[Error reading from $client->{id}: $!]\n";
            }
            print "[Incomplete message received from $client->{id}]\n"
                if length($buf);

            delete $clients{ fileno($fh) };
            $select->remove($fh);
            next;
        }

        while ($buf =~ s/^(.*\n)//) {
            if (!process_msg($client, "$1")) {
                print "[Dropping $client->{id}]\n";
                delete $clients{ fileno($fh) };
                $select->remove($fh);
                last;
            }
        }
    }
}
What's wrong with waiting for an actual EOF?
while (<$fh>) {
    ...
    sleep 1;
}
You've set the handle for non-blocking reads, so it should just do the right thing. Indeed, given your example, you don't even need to set non-blocking and can get rid of the sleep.
Are there other things that you want to do while waiting on some_long_running_proc? If so, select is probably in your future.
There are a number of options.
readline aka <$fh> will return false on eof (or error).
eof will return true on eof.
read (with block size > 0) will return defined and zero on eof.
sysread (with block size > 0) will return defined and zero on eof.
You can use select or make the handle non-blocking before any of the above to check without blocking.
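For example, a zero-timeout select makes such a check non-blocking (a sketch only, assuming $fh is the pipe handle from the question):

use IO::Select;

my $sel = IO::Select->new($fh);
if ($sel->can_read(0)) {                      # 0-second timeout: poll, don't wait
    my $n = sysread($fh, my $buf, 64*1024);
    if    (!defined $n) { warn "read error: $!\n"; }
    elsif ($n == 0)     { print "EOF - the process closed its end\n"; }
    else                { print "read $n bytes\n"; }
}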
You use select() to ascertain whether there is any data, or an exceptional condition such as a close.
Personally I prefer to use IO::Multiplex, especially where you're multiplexing input from several different descriptors, but that may not apply in this case.

How can I prevent the parent from blocking when writing to a child?

Recently I had a problem using (pipe |-) when I wanted to communicate between two processes.
Basically, the child process couldn't process STDIN as fast as it was filled up by the parent. This caused the parent to wait until STDIN was free, which made it run slowly.
How big can STDIN's buffer be, and is it possible to modify it? If yes, what is the best-practice size?
Here is some code sample to show what I mean:
if ($child_pid = open($child, "|-"))
{
    $child->autoflush(1);
    # PARENT process
    while (1)
    {
        # Read packet from socket, save in $packet
        process_packet($packet);
        # forward packet to child
        print $child $packet;
    }
}
else
{
    die "Cannot fork: $!" unless defined $child_pid;
    # CHILD process
    my $line;
    while ($line = <STDIN>)
    {
        chomp $line;
        another_process_packet($line);
    }
}
In this sample, another_process_packet is slower than process_packet. The reason I wrote the code like this is that I want to use the same data coming from the socket, and to actually read it only once.
Thanks in advance.
You can of course buffer in the parent process, and only write to the child when the child's fd is writable (i.e., writing won't block). You can do this yourself with the right args to syswrite, or use an event loop:
use AnyEvent;
use AnyEvent::Handle;

# make child, assume you write to it via $fh

my $done = AnyEvent->condvar;
my $h = AnyEvent::Handle->new( fh => $fh );

while( you do stuff ){
    my $data = ...;
    $h->push_write($data); # this will never block
}

$h->on_drain(sub { $done->send });
$done->wait; # now you block, waiting for all writes to actually complete
Edit: This used to be untested, but I tested it, and it works. (I used perl -ne "sleep 1; print $_" as the slow child.) Writes proceed during the while loop, if possible, but never block the loop. At the end, you actually block until all the writes have completed.
My test scripts are on gist.github: http://gist.github.com/126488
You can see how the child blocks the blocking loop, but how it doesn't block the non-blocking loop. Obvious when you put it that way ;)
(Finally, as a general rule of thumb; if you are interacting with the network or with other processes, you should probably be using an event loop.)
The size is set in the kernel. You can either recompile the kernel with a higher limit or use an intermediary buffer process.
The process handle has a method named blocking. Just set blocking to 0, and the parent process will not block.
if ($child_pid = open($child, "|-"))
{
    $child->blocking(0); # Key to the solution.
    $child->autoflush(1);
    # PARENT process
    while (1)
    {
        # Read packet from socket, save in $packet
        process_packet($packet);
        # forward packet to child
        print $child $packet;
    }
}
else
{
    die "Cannot fork: $!" unless defined $child_pid;
    # CHILD process
    my $line;
    while ($line = <STDIN>)
    {
        chomp $line;
        another_process_packet($line);
    }
}