Following setup:
Linux, Debian-based, 4.4.14-v7+ armv7l GNU/Linux
Perl v5.20.2, built for arm-linux-gnueabihf-thread-multi-64int
A Perl script which should read a stream of data (hex chars, different length per line).
example stream output:
00 AA BB 11 22 33 44 ...
00 AB BB 11 22 33 44 ...
Depending on specific values, the script should perform specific actions.
This works fine; however, when the stream stops sending data, i.e. the stream is finished, the while loop does not stop and the script waits for more lines.
The stream itself sends e.g. 5 seconds of data, and then the analysis script should do_stuff;
once the analysis script has finished its calculations, it starts the stream again.
However, I am unable to figure out why either the next statement is not executed or the while loop does not terminate when no further lines arrive.
If I manually start the stream process again, the analysis script continues just fine until there are no more lines again.
Simplified code:
#!/usr/bin/perl
# Script to analyse data
use warnings;
use strict;
use POSIX ();

sub stop_stream($) {
    my $pid = shift;
    my $cmd = "/bin/kill -9 $pid";
    system($cmd);
}

while (1) {
    my $streampid = open( STREAM, "/usr/local/bin/stream |" );
    STREAM: while ( my $row = <STREAM> ) {
        chomp $row;
        my @o = split /\s+/, $row;
        # do_stuff;
        # ...
        # This is the row on which the script hangs until new lines arrive on the filehandle.
        next STREAM if $o[1] ne "AA";
        # ...
        # do_other_stuff;
        # ...
    }
    stop_stream( $streampid );
    close STREAM;
}
Resources I tried in order to figure out the issue:
http://perldoc.perl.org/perlsyn.html#Compound-Statements
http://www.perlmonks.org/?node_id=1065701
and numerous others.
I also searched Stack Overflow with combinations of "perl while loop close filehandle", without success.
OK, the root of the problem here is that a while ( <FILEHANDLE> ) loop will block if the file handle is open but there is no data to be read.
So it's likely that /usr/local/bin/stream continues piping data until killed, and so your reads block.
The easy solution is to use something like IO::Select, which provides can_read:
use IO::Select;

open ( my $stream, '-|', '/usr/local/bin/stream' ) or die $!;

# Register this filehandle with a new select object.
my $select = IO::Select->new( $stream );

# 2s stream timeout.
# This works because can_read returns a list of filehandles ready for reading;
# with only one filehandle registered, it can be used as a simple logic test.
while ( $select->can_read(2) ) {
    my $row = <$stream>;
    #etc.
}
#etc.
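For completeness, here is one way the whole restart loop might look with the timeout folded in. This is only a sketch, reusing the /usr/local/bin/stream command and the "AA" check from the question, with placeholders for the real work:

use strict;
use warnings;
use IO::Select;

while (1) {
    # open returns the child's pid for a '-|' pipe open
    my $pid = open( my $stream, '-|', '/usr/local/bin/stream' ) or die $!;
    my $select = IO::Select->new( $stream );

    while ( $select->can_read(2) ) {   # empty list after 2s of silence
        my $row = <$stream>;
        last unless defined $row;      # also stop on EOF
        chomp $row;
        my @o = split /\s+/, $row;
        next if !defined $o[1] or $o[1] ne "AA";
        # do_other_stuff with @o ...
    }

    kill 'KILL', $pid;                 # stop the producer
    close $stream;
    # do_stuff: run the analysis, then loop to restart the stream
}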
Related
Let's say I have a C program like this:
...
puts("What is your name?");
scanf("%s", name);
puts("Thank You!");
...
This program asks you to type your name, accepts the input, displays a "Thank You!" message, and exits.
I want to write a program which automatically writes a name to my C program, receives the output, and prints it. The following Perl program works fine:
use strict;
use IPC::Open3;
my $bin = './myniceprogram';
my $pid = open3(\*CHLD_IN, \*CHLD_OUT, \*CHLD_ERR, "$bin")
or die "open3() failed $!";
my $r;
print CHLD_IN "A"x10 . "\n";
$r = <CHLD_OUT>;
print "$r";
$r = <CHLD_OUT>;
print "$r";
waitpid $pid, 0;
It produces the following output:
What is your name?
Thank You!
However, I want to read the first line from my C program ("What is your name?") BEFORE writing to it. But if I change the order of reads/writes in my Perl program, it just hangs:
use strict;
use IPC::Open3;
my $bin = './myniceprogram';
my $pid = open3(\*CHLD_IN, \*CHLD_OUT, \*CHLD_ERR, "$bin")
or die "open3() failed $!";
my $r;
$r = <CHLD_OUT>;
print "$r";
print CHLD_IN "A"x10 . "\n";
$r = <CHLD_OUT>;
print "$r";
waitpid $pid, 0;
With strace I can see that it's stuck on a read call:
[myuser#myhost mydir]$ strace perl test2.pl
...
close(6) = 0
close(8) = 0
read(5,
But why?
You are suffering from buffering. The child program is probably using the normal convention that STDOUT is line-buffered when connected to a terminal, but block-buffered otherwise.
If this is the case, you can fool it into flushing its buffer when it outputs a line feed by using a pseudo-tty (pty). A simple way of doing this is by executing unbuffer ./myniceprogram. IPC::Run also provides a simple mechanism for using pseudo-ttys. (Simply use <pty< and >pty> instead of < and >.)
But that's not going to work for "binary data" (data not organized into LF-terminated lines). A program that uses line-buffering for binary data should be considered buggy. If it's not meant to be interactive, it should use block buffering at all times. But this one is meant to be interactive, so it should flush its buffer at the appropriate time or avoid buffering entirely. Fixing this problem will require modifying myniceprogram.
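For illustration, here is a minimal IPC::Run sketch of the pseudo-tty approach (an untested sketch; the prompt pattern and the ten-A input are taken from the question):

use IPC::Run qw( start pump finish );

my ($in, $out) = ('', '');
# '<pty<' and '>pty>' attach the child to a pseudo-tty, so its stdio
# layer line-buffers STDOUT instead of block-buffering it.
my $h = start ['./myniceprogram'], '<pty<', \$in, '>pty>', \$out;

pump $h until $out =~ /What is your name\?/;  # read the prompt first
$in .= "A" x 10 . "\n";                       # then answer it
finish $h;                                    # drain remaining output, reap child
print $out;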
Similarly, you probably want to disable buffering on CHLD_IN, the handle your program writes to (these methods come from IO::Handle, so add use IO::Handle; if it is not already loaded). There are two ways of doing this:
Turn on autoflush for the handle:
CHLD_IN->autoflush( 1 );
Flush the handle after every write (print):
CHLD_IN->flush();
Can you hook the opening of the DATA handle for a module while Perl is still compiling? By that I mean: is there a way to insert code that will run after Perl has opened the DATA glob for reading but before the compilation phase has ceased?
Failing that, can you at least see the raw text after __DATA__ before the compiler opens it up?
In response to Ikegami, on recent scripts that I have been working on, I have been using __DATA__ section + YAML syntax to configure the script. I've also been building up a vocabulary of YAML configuration handlers where the behavior is requested by use-ing the modules. And in some scripts that are quick-n-dirty, but not quite enough to forgo strict, I wanted to see if I could expose variables from the YAML specification.
It's been slightly annoying, though, to just save data in the import subs and then wait for an INIT block to process the YAML. But it's been doable.
The file handle in DATA is none other than the handle the parser uses to read the code found before __DATA__. If that code is still being compiled, then __DATA__ hasn't been reached yet, so the handle hasn't been stored in DATA.
You could do something like the following instead:
open(my $data_fh, '<', \<<'__EOI__');
.
. Hunk of text readable via $data_fh
.
__EOI__
I don’t know where you want the hook. Probably in UNITCHECK.
use warnings;
sub i'm {
    print "in @_\n";
    print scalar <DATA>;
}
BEGIN     { i'm "BEGIN"     }
UNITCHECK { i'm "UNITCHECK" }
CHECK     { i'm "CHECK"     }
INIT      { i'm "INIT"      }
END       { i'm "END"       }
i'm "main";
exit;
__END__
Data line one.
Data line two.
Data line three.
Data line four.
Data line five.
Data line six.
Produces this when run:
in BEGIN
readline() on unopened filehandle DATA at /tmp/d line 5.
in UNITCHECK
Data line one.
in CHECK
Data line two.
in INIT
Data line three.
in main
Data line four.
in END
Data line five.
You can use any of the blocks that run after compilation but before runtime to change the *DATA handle. Here is a short example using INIT to change *DATA to uppercase.
while (<DATA>) {
    print;
}

INIT { # after compile time, so DATA is opened, but before runtime
    local $/;
    my $file = uc <DATA>;
    open *DATA, '<', \$file;
}
__DATA__
hello,
world!
prints:
HELLO,
WORLD!
Which one of the blocks to use depends on other factors in your program. More detail about the various timed blocks can be found on the perlmod manpage.
I'm afraid not, if I understood your question right. It's written in the docs:
Note that you cannot read from the DATA filehandle in a BEGIN block: the BEGIN block is executed as soon as it is seen (during compilation), at which point the corresponding __DATA__ (or __END__) token has not yet been seen.
There's another way, though: read the file containing the DATA section as a normal text file, parse that section, then require the script file itself (which will be done at run time). I don't know whether it'll be relevant in your case.
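A minimal sketch of that idea (my illustration; it assumes the script's own path is in $0 and that __DATA__ appears exactly once):

BEGIN {
    open my $fh, '<', $0 or die "Can't read $0: $!";
    while (<$fh>) { last if /^__DATA__$/ }   # skip past the code
    my $data = do { local $/; <$fh> };       # slurp the DATA section
    close $fh;
    # ... parse $data here, while the rest of the file is still compiling ...
}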
perlmod says:
CHECK code blocks are run just after the initial Perl compile phase ends and before the run time begins, in LIFO order.
Maybe you are looking for something like this?
use feature 'say';

CHECK {
    say "Reading from <DATA> ...";
    while (<DATA>) {
        print;
        $main::count++;
    }
}

say "Read $main::count lines from <DATA>";
__DATA__
1
2
3
4
5
This produces the following output:
Reading from <DATA> ...
1
2
3
4
5
Read 5 lines from <DATA>
I found out that ::STDIN actually gives me access to the stream '-', and that I can save the current location with tell() and then seek() back to it when I'm done.
Using that method, I could read the __DATA__ section in the import sub!
sub import {
    my ( $caller, $file ) = ( caller 0 )[0,1];
    my $yaml;
    if ( $file eq '-' ) {
        my $place = tell( ::STDIN );
        local $RS;                      # $RS is $/, from the English module
        $yaml = <::STDIN>;
        seek( ::STDIN, $place, 0 );
    }
    else {
        open( my $inh, '<', $file );
        local $_ = '';
        while ( defined() and !m/^__DATA__$/ ) { $_ = <$inh>; }
        local $RS;
        $yaml = <$inh>;
        close $inh;
    }
    if ( $yaml ) {
        my ( $config ) = YAML::XS::Load( $yaml );
        no strict 'refs';
        while ( my ( $n, $v ) = each %$config ) {
            *{"$caller\::$n"} = ref $v ? $v : \$v;
        }
    }
    return;
}
This worked on Strawberry Perl 5.16.2, so I don't know how portable it is. But right now, to me, this is working.
Just some background: I used to do a bit of programming with Windows Script Files. One thing I liked about the wsf format was that you could specify globally useful objects outside of the code: <object id="xl" progid="Application.Excel" />. I have always liked the look of programming by specification and letting some modular handler sort the data out. Now I can get similar behavior through a YAML handler: excel: !ActiveX: Excel.Application.
This works for me.
The test is here, in case you're interested:
use strict;
use warnings;
use feature 'say';
use English qw<$RS>;
use Test::More;
use data_mayhem;  # <-- that's my module

is( $k, 'Excel.Application' );
is( $l[1], 'two' );

{
    local $RS;
    my $data = <DATA>;
    isnt( $data, '' );
    say $data;
}

done_testing;
__DATA__
---
k : !ActiveX Excel.Application
l :
- one
- two
- three
I am using IO::Select to keep track of a variable number of file handles for reading. Documentation I've come across strongly suggests not to combine the select statement with <> (readline) for reading from the file handles.
My situation:
I will only ever use each file handle once, i.e. when the select offers me the file handle, it will be completely used and then removed from the select. I will be receiving a hash and a variable number of files. I do not mind if this blocks for a time.
For more context, I am a client sending information to be processed by my servers. Each file handle is a different server I'm talking to. Once the server is finished, a hash result will be sent back to me from each one. Inside that hash is a number indicating the number of files to follow.
I wish to use readline in order to integrate with existing project code for transferring Perl objects and files.
Sample code:
my $read_set = IO::Select->new;
my $count    = @agents_to_run;    # array comes as an argument
for my $agent ( @agents_to_run ) {
    ( my $sock, my $peerhost, my $peerport )
        = server( $config_settings{ $agent }->{ 'Host' },
                  $config_settings{ $agent }->{ 'Port' } );
    $read_set->add( $sock );
}
while ( $count > 0 ) {
    my @rh_set = $read_set->can_read();
    for my $rh ( @rh_set ) {
        my %results = <$rh>;
        my $num_files = $results{'numFiles'};
        my @files = ();
        for ( my $i = 0; $i < $num_files; $i++ ) {
            $files[$i] = <$rh>;
        }
        # process results, close fh, decrement count, etc.
    }
}
Using readline (aka <>) is quite wrong for two reasons: It's buffered, and it's blocking.
Buffering is bad
More precisely, buffering using buffers that cannot be inspected is bad.
The system can do all the buffering it wants, since you can peek into its buffers using select.
Perl's IO system cannot be allowed to do any buffering because you cannot peek into its buffers.
Let's look at an example of what can happen using readline in a select loop.
1. "abc\ndef\n" arrives on the handle.
2. select notifies you that there is data to read.
3. readline tries to read a chunk from the handle.
4. "abc\ndef\n" is placed in Perl's buffer for the handle.
5. readline returns "abc\n".
At this point, you call select again, and you want it to let you know that there is more to read ("def\n"). However, select will report there is nothing to read since select is a system call, and the data has already been read from the system. That means you will have to wait for more to come in before being able to read "def\n".
The following program illustrates this:
use IO::Select qw( );
use IO::Handle qw( );

sub producer {
    my ($fh) = @_;
    for (;;) {
        print($fh time(), "\n") or die;
        print($fh time(), "\n") or die;
        sleep(3);
    }
}

sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    while ($sel->can_read()) {
        my $got = <$fh>;
        last if !defined($got);
        chomp $got;
        print("It took ", (time()-$got), " seconds to get the msg\n");
    }
}

pipe(my $rfh, my $wfh) or die;
$wfh->autoflush(1);
fork() ? producer($wfh) : consumer($rfh);
Output:
It took 0 seconds to get the msg
It took 3 seconds to get the msg
It took 0 seconds to get the msg
It took 3 seconds to get the msg
It took 0 seconds to get the msg
...
This can be fixed using non-buffered IO:
sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    my $buf = '';
    while ($sel->can_read()) {
        sysread($fh, $buf, 64*1024, length($buf)) or last;
        while ($buf =~ s/^(.*)\n//) {
            my $got = $1;
            print("It took ", (time()-$got), " seconds to get the msg\n");
        }
    }
}
Output:
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
It took 0 seconds to get the msg
...
Blocking is bad
Let's look at an example of what can happen using readline in a select loop.
1. "abcdef" arrives on the handle.
2. select notifies you that there is data to read.
3. readline tries to read a chunk from the socket.
4. "abcdef" is placed in Perl's buffer for the handle.
5. readline hasn't received a newline, so it tries to read another chunk from the socket.
6. There is no more data currently available, so it blocks.
This defeats the purpose of using select.
[ Demo code forthcoming ]
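Until then, here is a sketch in the style of the buffering demo above (my illustration, not the author's promised demo): the producer sends each line in two pieces, and readline blocks on the second piece even though select reported the handle readable.

use IO::Select qw( );
use IO::Handle qw( );

sub producer {
    my ($fh) = @_;
    for (;;) {
        print($fh "partial") or die;   # no trailing newline yet
        sleep(5);
        print($fh " line\n") or die;   # the newline arrives much later
    }
}

sub consumer {
    my ($fh) = @_;
    my $sel = IO::Select->new($fh);
    while ($sel->can_read()) {
        my $got = <$fh>;               # blocks ~5s despite can_read()
        last if !defined($got);
        print("got: $got");
    }
}

pipe(my $rfh, my $wfh) or die;
$wfh->autoflush(1);
fork() ? producer($wfh) : consumer($rfh);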
Solution
You have to implement a version of readline that doesn't block, and only uses buffers you can inspect. The second part is easy because you can inspect the buffers you create.
Create a buffer for each handle.
When data arrives from a handle, read it but no more. When data is waiting (as we know from select), sysread will return what's available without waiting for more to arrive. That makes sysread perfect for this task.
Append the data read to the appropriate buffer.
For each complete message in the buffer, extract it and process it.
Adding a handle:
$select->add($fh);
$clients{fileno($fh)} = {
    buf => '',
    ...
};
select loop:
use experimental qw( refaliasing declared_refs );

while (my @ready = $select->can_read) {
    for my $fh (@ready) {
        my $client = $clients{fileno($fh)};
        my \$buf = \( $client->{buf} );  # Make $buf an alias for $client->{buf}
        my $rv = sysread($fh, $buf, 64*1024, length($buf));
        if (!$rv) {
            delete $clients{fileno($fh)};
            $select->remove($fh);
            if (!defined($rv)) {
                ... # Handle error
            }
            elsif (length($buf)) {
                ... # Handle eof with partial message
            }
            else {
                ... # Handle eof
            }
            next;
        }
        while ($buf =~ s/^(.*)\n//) {
            my $msg = $1;
            ... # Process message.
        }
    }
}
By the way, this is much easier to do using threads, and this doesn't even handle writers!
Note that IPC::Run can do all the hard work for you if you're communicating with a child process, and that asynchronous IO can be used as an alternative to select.
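For the child-process case, a minimal IPC::Run sketch might look like this (the ./server command and the newline-terminated protocol are stand-ins, not part of the answer above):

use IPC::Run qw( start pump finish );

my ($in, $out) = ('', '');
my $h = start ['./server'], \$in, \$out;   # hypothetical child command

$in .= "request\n";             # queue data for the child's stdin
pump $h until $out =~ /\n/;     # IPC::Run does the select/sysread work
print "got: $out";
finish $h;                      # close the pipes and reap the child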
After much discussion with @ikegami, we determined that in my extremely specific case the readline is actually not an issue. I'm still leaving ikegami's as the accepted answer because it is far and away the best way to handle the general situation, and a wonderful writeup.
Readline (aka <>) is acceptable in my situation due to the following facts:
The handle is only returned once from the select statement, and then it is closed/removed
I only send one message through the file handle
I do not care if read handles block
I am accounting for timeouts and closed handle returns from select (error checking not included in the sample code above)
I'm writing a Perl script which uses an external script. The external script must run from a specific directory so I found the following useful:
use IPC::System::Simple qw(capture);
my @args = ('external script path...', 'arg1', ...);
my $out = capture( [0], "cd $dir ; @args" );
Sometimes the external script writes stuff to STDERR but still returns 0. I wish to capture these times and confess (or die). Since I don't control the return value of the external script, I thought maybe I could capture its STDERR so I'll have something like this:
my ($out, $err) = cool_capture( [0], "cd $dir ; @args" );
say "Output was: $out";
if ($err) {
die "Error: this was written to STDERR: $err";
}
What can I do?
This is covered in the Perl FAQ.
Presuming test_app is a program that outputs one line to stdout and one line to stderr:
use IPC::Open3;
use Symbol 'gensym';
my($wtr, $rdr, $err);
$err = gensym;
my $pid = open3($wtr, $rdr, $err, 'test_app');
waitpid($pid, 0);
my $status = $? >> 8;
my $stdout = <$rdr>;
my $stderr = <$err>;
print "out output: $stdout\n";
print "err output: $stderr\n";
print "Exit code: $status\n";
EDIT: Per the request, updated to include capturing the exit code. You could also have asked perldoc IPC::Open3, which says
waitpid( $pid, 0 );
my $child_exit_status = $? >> 8;
And which you should read anyway for its cautions and caveats.
If significant output is being written to stdout and/or stderr, or you're both reading and writing to the process, you need to be a lot more careful with your I/O handling to avoid various blocking problems.
use IPC::Open3;
use IO::Select;

my ($wtr, $rdr, $err);
my $pid = IPC::Open3::open3($wtr, $rdr, $err, @_);
close($wtr);

my $stdout = '';
my $stderr = '';
my $s = IO::Select->new;
$s->add($rdr) if $rdr;
$s->add($err) if $err;

while (my @ready = $s->can_read) {
    foreach my $ioh (@ready) {
        my $bytes_read = sysread($ioh, my $chunk = '', 1024);
        die "read error: $!" unless defined $bytes_read;
        if ($bytes_read) {
            ($ioh eq $rdr ? $stdout : $stderr) .= $chunk;
        }
        else {
            $s->remove($ioh);
        }
    }
}

my $pid1;
for (;;) {
    last if kill(0, $pid);
    $pid1 = wait();
    #
    # Wait until we see the process or -1 (no active processes);
    #
    last if ($pid1 == $pid || $pid1 <= 0);
}
Finish reading before you shutdown the process. If you're writing to the process's stdin, you'd also need to add $wtr and syswrite to the above select loop.
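As a rough sketch of that writer side (my illustration, assuming $input holds everything to send and $wtr was not closed up front):

my $wsel = IO::Select->new($wtr);
while ( length $input ) {
    last unless $wsel->can_write;          # is the child ready for input?
    my $n = syswrite( $wtr, $input, 4096 );
    die "write error: $!" unless defined $n;
    substr( $input, 0, $n, '' );           # drop what was written
}
close $wtr;                                # signal EOF when done

In a real program you would interleave these can_write checks with the can_read loop above so that reads and writes make progress together.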
EDIT
Rationale:
The above is probably overkill for simple cases. This advanced handling of input and output comes into play when you're likely to move more than a few K of data.
You wouldn't need it if you were executing a 'df' command for example.
However, it's when system buffers for any of stdin, stdout or stderr fill up that blocking becomes likely and things can get more involved.
If the child process fills up its stderr and/or stdout buffers, it will likely block and wait for you to clear them. But if you're waiting for the process to finish before you read from stdout or stderr, that's a deadlock. You'll likely see that the system call never finishes and the child process never completes.
There's a similar possibility of deadlock if stdin is being written to, but the child process is unable to consume the input. This is particularly likely in a 'pipe' situation where the child process is consuming input and writing to stdout.
The select loop is about progressively clearing the buffers to avoid blocking. Both stdout and stderr are monitored concurrently.
If you're writing to stdin and reading from stdout (a pipe), you'll want to keep stdout and stderr clear and only write to stdin when it's ready to receive input.
Simply waiting for the process to finish, then reading stdout/stderr probably works 90% of the time. This reply is just to give you somewhere to go if things get more complicated and processes start to block or go into deadlock.
EDIT2
As for which to use, I'd say start simple, test hard.
Go with Sorpigal's approach, but try to stress test with higher data volumes and under more difficult loads and conditions than you'd ever expect in a live system.
I need to run a string through a Java program and then retrieve the output. The Java program accepts the string through standard input. The following works:
my $output = `echo $string | java -jar java_program.jar`;
There is one problem: $string could be just about anything. Any thoughts on a good solution to this problem?
I suggest you look at the IPC::Run3 module. It has a very simple interface and allows you to capture both STDERR and STDOUT. Here is a small example:
use IPC::Run3;
## store command output here
my ($cmd_out, $cmd_err);
my $cmd_input = "put your input string here";
run3([ 'java', '-jar', 'java_program.jar'], \$cmd_input, \$cmd_out, \$cmd_err);
print "command output [$cmd_out] error [$cmd_err]\n";
See the IPC::Run3 comparison with other modules.
If you can use CPAN modules (and I'm assuming most people can), look at Ivan's answer on using IPC::Run3. It should handle everything you need.
If you can't use modules, here's how to do things the plain vanilla way.
You can use a pipe to do your input, and it will avoid all those command line quoting issues:
open PIPE, "| java -jar java_program.jar";
print PIPE "$string";
close(PIPE);
It looks like you actually need the output of the command, though. You could open two pipes with something like IPC::Open2 (to and from the java process) but you risk putting yourself in deadlock trying to deal with both pipes at the same time.
You can avoid that by having java output to a file, then reading from that file:
open PIPE, "| java -jar java_program.jar > output.txt";
print PIPE "$string";
close(PIPE);
open OUTPUT, "output.txt";
while (my $line = <OUTPUT>) {
    # do something with $line
}
close(OUTPUT);
The other option is to do things the other way around. Put $string in a temporary file, then use it as input to java:
open INPUT, "> input.txt";
print INPUT "$string";
close(INPUT);
open OUTPUT, "java -jar java_program.jar < input.txt |";
while (my $line = <OUTPUT>) {
    # do something with the output
}
close(OUTPUT);
Note that this isn't the greatest way to handle temporary files; I've just used output.txt and input.txt for simplicity. Look at the File::Temp docs for cleaner ways to create temporary files.
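As a sketch, the second variant with File::Temp might look like this (my adaptation of the code above):

use File::Temp qw( tempfile );

my ($in_fh, $in_name) = tempfile( UNLINK => 1 );  # removed automatically at exit
print $in_fh $string;
close $in_fh;

open OUTPUT, "java -jar java_program.jar < $in_name |";
while (my $line = <OUTPUT>) {
    # do something with the output
}
close(OUTPUT);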
Have you looked into IPC::Run?
Syntax similar to this might be what you are looking for:
use IPC::Run qw( run );

my $input = $string;
my ($out, $err);
run [ 'java', '-jar', 'java_program.jar' ], \$input, \$out, \$err;
Create a pipeline just like your shell would.
Here's our scary string:
my $str = "foo * ~ bar \0 baz *";
We'll build our pipeline backwards, so first we gather the output from the Java program:
my $pid1 = open my $fh1, "-|";
die "$0: fork: $!" unless defined $pid1;

if ($pid1) {
    # grab output from Java program
    while (<$fh1>) {
        chomp;
        my @c = unpack "C*" => $_;
        print "$_\n => @c\n";
    }
}
Note the special "-|" argument to Perl's open operator.
If you open a pipe on the command '-' , i.e., either '|-' or '-|' with 2-arguments (or 1-argument) form of open(), then there is an implicit fork done, and the return value of open is the pid of the child within the parent process, and 0 within the child process … The filehandle behaves normally for the parent, but i/o to that filehandle is piped from/to the STDOUT/STDIN of the child process.
The unpack is there to peek into the contents of the data read from the pipe.
In your program, you'll want to run the Java program, but the code below uses a reasonable facsimile:
else {
    my $pid2 = open my $fh2, "-|";
    die "$0: fork: $!" unless defined $pid2;

    if ($pid2) {
        $| = 1;
        open STDIN, "<&=" . fileno($fh2)
            or die "$0: dup: $!";
        # exec "java", "-jar", "java_program.jar";
        # simulate Java program
        exec "perl", "-pe", q(
            BEGIN { $" = "][" }
            my @a = split " ", scalar reverse $_;
            $_ = "[@a]\n";
        );
        die "$0: exec failed";
    }
Finally, the humble grandchild simply prints the scary string (which arrives on the standard input of the Java program) and exits. Setting $| to a true value flushes the currently selected filehandle and puts it in unbuffered mode.
    else {
        print $str;
        $| = 1;
        exit 0;
    }
}
Its output:
$ ./try
[*][zab][][rab][~][*][oof]
=> 91 42 93 91 122 97 98 93 91 0 93 91 114 97 98 93 91 126 93 91 42 93 91 111 111 102 93
Note that the NUL survives the trip.
The built-in IPC::Open2 module provides a function to handle bidirectional piping without an external file.
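A minimal sketch of that approach, subject to the deadlock caveat mentioned earlier (write everything, close the child's stdin, then read):

use IPC::Open2 qw( open2 );

my $pid = open2( my $from_java, my $to_java,
                 'java', '-jar', 'java_program.jar' );
print $to_java $string;
close $to_java;                   # EOF tells the child its input is done
while (my $line = <$from_java>) {
    # do something with $line
}
waitpid $pid, 0;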